Spatial time series datasets are a common
type of scientific data, usually collected by satellites, sensor
networks, and medical instruments. Correlation-based similarity
queries are crucial for data analysis of spatial time series
data in many application domains such as epidemiology, ecology,
climatology, and census statistics. Existing query processing
techniques adopt dimensionality reduction techniques and assume
that the power spectrum after transformation is highly skewed,
i.e., that a few coefficients capture most of the signal energy.
However, this assumption does not hold for many spatial time
series datasets. We proposed a spatial cone tree structure and query
processing algorithms to facilitate correlation-based queries
by exploiting spatial autocorrelation. The algorithms have been
successfully applied to the discovery of interacting relationships
in a collaborative research project with NASA scientists.
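To make the pruning idea concrete, here is a minimal Python sketch under stated assumptions. It is not the spatial cone tree from this work. It relies on two standard facts. First, once each series is centered and scaled to unit length, Pearson correlation equals the dot product, i.e., the cosine of the angle between the two vectors. Second, angles on the unit sphere obey the triangle inequality. The `Cone` class, its single-level layout, and its method names are hypothetical. A real tree would nest cones and would group spatially neighboring locations.

```python
import math
from typing import List, Sequence


def normalize(x: Sequence[float]) -> List[float]:
    """Center a series and scale it to unit length (assumes a non-constant series)."""
    mean = sum(x) / len(x)
    centered = [a - mean for a in x]
    norm = math.sqrt(sum(a * a for a in centered))
    return [a / norm for a in centered]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed as the dot product of the normalized series."""
    return dot(normalize(x), normalize(y))


def angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle between two unit vectors, clamped against rounding error."""
    return math.acos(max(-1.0, min(1.0, dot(u, v))))


class Cone:
    """Hypothetical single-level cone over a group of series (e.g., neighboring
    locations): a unit center direction plus the largest member-to-center angle."""

    def __init__(self, series: List[Sequence[float]]):
        self.members = [normalize(s) for s in series]
        total = [sum(col) for col in zip(*self.members)]
        norm = math.sqrt(sum(a * a for a in total))
        # Assumes the members do not cancel each other out (norm > 0).
        self.center = [a / norm for a in total]
        self.half_angle = max(angle(self.center, m) for m in self.members)

    def can_prune(self, q: Sequence[float], threshold: float) -> bool:
        # corr >= threshold  <=>  angle <= acos(threshold).
        # By the triangle inequality on the unit sphere, every member is at
        # least angle(q, center) - half_angle away from q.
        lower_bound = angle(normalize(q), self.center) - self.half_angle
        return lower_bound > math.acos(threshold)

    def query(self, q: Sequence[float], threshold: float) -> List[int]:
        """Indices of members with correlation >= threshold; [] if the cone is pruned."""
        if self.can_prune(q, threshold):
            return []
        qn = normalize(q)
        return [i for i, m in enumerate(self.members) if dot(qn, m) >= threshold]
```

Spatial autocorrelation is presumably what makes such cones pay off. Neighboring series tend to be similar, so their cone is narrow, and the angular bound can then discard the whole group without computing any member's correlation.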
Data Mining and Database Design on Emerging Applications
Database systems are the preferred infrastructure
for a wide range of emerging applications that require robust
data management capabilities, and data mining provides supplementary
data analysis and knowledge discovery capabilities for such
databases. Consequently, database and data mining research are
likely to interact with various application domains including
information security, geographic information science, bioinformatics,
commerce, climatology, and epidemiology. We proposed novel database
designs and data mining techniques that exploit the nature
of the data and applications, e.g., intrusion detection on data
streams collected by electronic monitoring systems during my internship
at United Technologies Research Center, and knowledge discovery
on spatio-temporal data collected by satellites and sensors
in my collaboration with scientists from the NASA Ames
Research Center.
Detecting Outliers in Topological Data Sets
The identification of outliers can lead to the
discovery of unexpected, interesting, and implicit knowledge.
Existing methods are designed for detecting outliers in multidimensional
geometric data sets. We proposed a neighborhood-based algorithm
to detect outliers in topological data sets, formally defined
the new notion of topological outliers, and discussed the statistical
foundation. We also applied our algorithm to traffic data to
experimentally demonstrate its effectiveness and usefulness.
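The following Python sketch illustrates one common neighborhood-based test on a graph. It was chosen for illustration and is not taken from the original algorithm. Each node's value is compared with the mean of its graph neighbors, and nodes whose difference has an extreme z-score are flagged. The function name, the adjacency-dictionary input, and the default threshold `theta=2.0` are assumptions. The formal definition of topological outliers in this work may differ.

```python
import statistics
from typing import Dict, Hashable, List, Sequence


def neighborhood_outliers(
    values: Dict[Hashable, float],
    neighbors: Dict[Hashable, Sequence[Hashable]],
    theta: float = 2.0,
) -> List[Hashable]:
    """Hypothetical test: flag nodes whose difference from their neighbors'
    mean lies more than theta standard deviations from the average difference."""
    diffs = {}
    for node, v in values.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            continue  # an isolated node has no neighborhood to compare against
        diffs[node] = v - statistics.mean(values[n] for n in nbrs)
    d = list(diffs.values())
    mu = statistics.mean(d)
    sigma = statistics.pstdev(d)
    if sigma == 0:
        return []  # every node deviates from its neighbors by the same amount
    # Assumes node identifiers are mutually comparable so they can be sorted.
    return sorted(node for node, s in diffs.items() if abs(s - mu) / sigma > theta)
```

For traffic data, nodes could be detector stations and edges could link consecutive stations along a highway. A station that reports far lower volume than both of its neighbors would then stand out.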
High Performance Spatial Visualization of Traffic Data
The visualization of loop-detector traffic data
can help identify useful patterns embedded in the data. Many
current visualization techniques do not scale to large data
sets and are not practical for interactive visualization and
"what if" analysis. We developed spatial algorithms
that help speed up visualization, provided a
data warehousing framework for integrating different multidimensional
views of traffic data, and built web-based spatial tools to
generate critical visualizations of highway traffic data. The
Minnesota Department of Transportation found this interactive
visualization system to be more effective than conventional
manual approaches in identifying faulty sensors and in analyzing
traffic patterns, e.g., bottleneck locations, source-sink, and
rush-hour periods.
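As a rough illustration of integrating multidimensional views, the Python sketch below rolls up loop-detector counts along station, date, and hour-of-day dimensions. It then reads the busiest hour off one of those views. The record layout, the dimension names, and the helpers `roll_up` and `peak_hour` are hypothetical. They are not the data warehousing framework described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical record layout: (station_id, date, minute_of_day, vehicle_count)
Reading = Tuple[str, str, int, int]


def roll_up(readings: Iterable[Reading], dims: Sequence[str]) -> Dict[tuple, int]:
    """Sum vehicle counts over every dimension not listed in dims.

    Supported dimensions: "station", "date", "hour".
    """
    cube: Dict[tuple, int] = defaultdict(int)
    for station, date, minute, count in readings:
        coords = {"station": station, "date": date, "hour": minute // 60}
        cube[tuple(coords[d] for d in dims)] += count
    return dict(cube)


def peak_hour(readings: Iterable[Reading]) -> int:
    """Hour of day with the largest total volume across all stations."""
    by_hour = roll_up(readings, ("hour",))
    return max(by_hour, key=by_hour.get)[0]
```

Each call to `roll_up` with a different `dims` tuple yields a different view of the same readings, such as station by hour or station alone. In this sketch the views are computed on demand rather than stored.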
The visualization toolkit is available here.