TITLE:

What is special about mining spatial datasets?

PRESENTER:

Shashi Shekhar

AFFILIATION:

Computer Science Department, University of Minnesota.

URL:

http://www.cs.umn.edu/~shekhar

ABSTRACT:

The importance of spatial data mining is growing with the increasing incidence and importance of large geo-spatial datasets such as maps, virtual globes, repositories of remote-sensing images, the decennial census, and collections of trajectories (e.g. GPS tracks). Applications include M(obile)-commerce (e.g. location-based services), Public Health (predicting hot spots and spread of disease), Public Safety (finding crime hot spots), Public Security (e.g. common operational picture), and Environment and Climate (global change, land-use classification).

Classical data mining techniques often perform poorly when applied to spatial datasets, for several reasons. First, spatial data is embedded in a continuous space, whereas classical datasets (e.g. transactions) are often discrete. Second, spatial patterns are often local, whereas classical data mining techniques often focus on global patterns. Finally, a common assumption in classical statistical analysis is that data samples are independently generated. For spatial data this independence assumption is generally false, because values observed at nearby locations tend to be highly correlated with one another. For example, people with similar characteristics, occupations and backgrounds tend to cluster together in the same neighborhoods. In spatial statistics this tendency is called spatial autocorrelation. Ignoring spatial autocorrelation when analyzing data with spatial characteristics may produce hypotheses or models that are inaccurate or inconsistent with the dataset.
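As an illustration of the spatial autocorrelation described above, the following is a minimal sketch of Moran's I, a standard spatial-statistics measure of how similar values are at neighboring locations. It is not taken from the talk: the function names (rook_neighbors, morans_i), the 4x4 grid, the binary weights for cells that share an edge, and the two toy patterns are all assumptions chosen for illustration.

```python
def rook_neighbors(rows, cols):
    """Map each grid cell index to the indices of cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def morans_i(values, nbrs):
    """Moran's I with binary weights: w_ij = 1 if j is a neighbor of i, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Total weight W counts every ordered neighbor pair (i, j).
    total_weight = sum(len(js) for js in nbrs.values())
    # Sum of products of deviations over all neighboring pairs.
    cross = sum(dev[i] * dev[j] for i, js in nbrs.items() for j in js)
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Hypothetical 4x4 study area, stored row by row.
nbrs = rook_neighbors(4, 4)

# Similar values sit next to each other: left half 1, right half 0.
clustered = [1, 1, 0, 0] * 4

# Every cell differs from all of its edge neighbors.
checker = [(r + c) % 2 for r in range(4) for c in range(4)]

print("clustered:", morans_i(clustered, nbrs))
print("checkerboard:", morans_i(checker, nbrs))
```

Under these assumptions the clustered pattern gives a clearly positive value (2/3) and the checkerboard gives -1, while independently generated samples would be expected to give a value close to zero. A strongly non-zero value is a sign that the independence assumption of classical methods does not hold for the data.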

Thus, new methods are needed to analyze spatial data and detect spatial patterns. This talk surveys some of these methods, including those for discovering spatial interactions (e.g. co-locations, tele-connections), detecting spatial outliers, and predicting locations, along with emerging ideas on spatio-temporal pattern mining.

KEYWORDS: Spatial Datasets, Auto-correlation, Spatial data mining.

ACKNOWLEDGMENTS: This work was supported in part by the National Science Foundation, the U.S. Department of Defense, the National Aeronautics and Space Administration, the Federal Highway Authority, and the University of Minnesota (e.g. Center for Transportation Studies).

NOTE: Some of the results discussed in this talk appeared in the following publications:

  1. S. Shekhar, V. R. Raju and M. Celik, Spatial and Spatio-temporal Data Mining: Recent Advances, Next Generation of Data Mining, Chapman & Hall/CRC, 2008, ISBN 1420085867 (Ed. H. Kargupta, J. Han, P. Yu, R. Motwani, V. Kumar). Proc. NSF 2nd workshop on Future Directions in Data Mining (2007).
  2. S. Shekhar, P. Zhang, V. R. Raju and Y. Huang, Trends in Spatial Data Mining, Data Mining: Next Generation Challenges and Future Directions, MIT Press, 2004, ISBN 0-262-61203-8 (Ed. H. Kargupta et al). Proc. NSF 1st workshop on Future Directions in Data Mining (2003).
  3. Architecture Technology Corporation, Spatial Data Mining Toolkit for Generating MSDS (aka TopoAssistant) (Topic No. A03-129), SBIR Phase I, US Army Topographic Eng. Center, June 2004, Final Report.
  4. Mining Colocation patterns from spatial datasets.
  5. S. Shekhar and S. Chawla, Spatial Databases: A Tour (Chapter 7 on Spatial Data Mining), Prentice Hall 2003, ISBN 0-13-017480-7.
  6. A Summary of Spatial Statistics and Spatial Data Mining Softwares compiled by Dr. B. Kazar in 2004-2005.