Indexing and Query Processing for Similarity Search on Spatial Time Series Data

Spatial time series datasets are a common type of scientific data, usually collected by satellites, sensor networks, and medical instruments. Correlation-based similarity queries are crucial for analyzing spatial time series data in many application domains such as epidemiology, ecology, climatology, and census statistics. Existing query processing techniques adapt dimensionality reduction techniques and assume that the power spectrum after transformation is highly skewed. However, this assumption does not hold for many spatial time series datasets. We proposed a spatial cone tree structure and query processing algorithms that facilitate correlation-based queries by exploiting spatial autocorrelation. The algorithms have been successfully applied to the discovery of interacting relationships in a collaborative research project with NASA scientists.
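To make the query type concrete, the sketch below shows a brute-force correlation-based similarity query in Python. It is not the spatial cone tree itself, whose structure is not described here. It only illustrates the query that such an index would accelerate. It relies on a standard fact: after subtracting the mean and scaling to unit length, the Pearson correlation of two series equals the cosine of the angle between them. The function names and the sample data are hypothetical.

```python
import math


def normalize(series):
    """Center a series on its mean and scale it to unit length."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    return [c / norm for c in centered]


def correlation(a, b):
    """Pearson correlation, computed as the dot product of unit vectors."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def correlated_pairs(series_by_location, threshold):
    """Brute-force baseline: compare every pair of locations once."""
    names = sorted(series_by_location)
    result = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = correlation(series_by_location[p], series_by_location[q])
            if r >= threshold:
                result.append((p, q, r))
    return result


# Hypothetical data: three locations with three observations each.
data = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
pairs = correlated_pairs(data, threshold=0.9)
```

The baseline compares every pair of locations, which is quadratic in the number of locations. Because spatially neighboring series tend to be highly correlated, their unit vectors point in similar directions, which suggests why grouping them can let an index accept or reject many pairs at once.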

Data Mining and Database Design on Emerging Applications

Database systems are the preferred infrastructure for a wide range of emerging applications that require robust data management capabilities, and data mining provides supplementary data analysis and knowledge discovery capabilities for such databases. Consequently, database and data mining research are likely to interact with various application domains including information security, geographic information science, bioinformatics, commerce, climatology, and epidemiology. We proposed novel database designs and data mining techniques by exploiting the nature of the data and applications, e.g., intrusion detection on data streams collected by electronic monitoring systems during my internship at United Technologies Research Center, and knowledge discovery on spatio-temporal data collected by satellites and sensors in my collaboration with scientists from the NASA Ames Research Center.

Detecting Outliers in Topological Data Sets

The identification of outliers can lead to the discovery of unexpected, interesting, and implicit knowledge. Existing methods are designed for detecting outliers in multidimensional geometric data sets. We proposed a neighborhood-based algorithm to detect outliers in topological data sets, formally defined the new notion of topological outliers, and discussed the statistical foundation. We also applied our algorithm to traffic data to demonstrate its effectiveness and usefulness experimentally.
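As a rough illustration of the neighborhood-based idea, the Python sketch below flags nodes of a graph whose attribute value differs unusually from the average of their neighbors. It is a minimal sketch under stated assumptions, not the proposed algorithm: the difference-from-neighborhood-average statistic, the z-score threshold of 2.0, the function name, and the sample sensor readings are all assumptions made for this example.

```python
import statistics


def neighborhood_outliers(values, neighbors, threshold=2.0):
    """Return nodes whose difference from their neighborhood average
    is more than `threshold` standard deviations from the mean difference.

    values:    node -> attribute value (e.g. a traffic reading)
    neighbors: node -> list of adjacent nodes in the topology
    """
    diffs = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated nodes have no neighborhood to compare with
        avg = sum(values[n] for n in nbrs) / len(nbrs)
        diffs[node] = values[node] - avg

    all_diffs = list(diffs.values())
    mu = statistics.mean(all_diffs)
    sigma = statistics.pstdev(all_diffs)
    if sigma == 0:
        return []  # every node matches its neighborhood equally well

    return sorted(
        node for node, d in diffs.items()
        if abs((d - mu) / sigma) > threshold
    )


# Hypothetical chain of seven sensors along one road; s4 reads far too low.
values = {"s1": 60, "s2": 60, "s3": 60, "s4": 10, "s5": 60, "s6": 60, "s7": 60}
neighbors = {
    "s1": ["s2"],
    "s2": ["s1", "s3"],
    "s3": ["s2", "s4"],
    "s4": ["s3", "s5"],
    "s5": ["s4", "s6"],
    "s6": ["s5", "s7"],
    "s7": ["s6"],
}
outliers = neighborhood_outliers(values, neighbors)
```

Here neighborhoods come from graph adjacency rather than geometric distance, which is the point of working on topological data: two sensors count as neighbors because they are connected, not because their coordinates are close.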

High Performance Spatial Visualization of Traffic Data

The visualization of loop-detector traffic data can help identify useful patterns embedded in the data. Many current visualization techniques do not scale to large data sets and are not practical for interactive visualization and "what if" analysis. We developed spatial algorithms that help speed up visualization, provided a data warehousing framework for integrating different multi-dimensional views of traffic data, and built web-based spatial tools to generate critical visualizations of highway traffic data. The Minnesota Department of Transportation found this interactive visualization system to be more effective than conventional manual approaches in identifying faulty sensors and in analyzing traffic patterns, e.g., bottleneck locations, source-sink, and rush-hour periods. The visualization toolkit is available here.

Last updated: Jan 19, 2005