TITLE:

What is special about spatial and spatio-temporal data science?

PRESENTER:

Shashi Shekhar: Biography, Homepage, Picture

AFFILIATION:

Computer Science Department, University of Minnesota.

URL:

http://www.cs.umn.edu/~shekhar

VIDEOS:

SLIDES:

ABSTRACT:

The importance of spatial and spatio-temporal data science is growing with the increasing incidence and importance of large datasets such as trajectories, maps, remote-sensing images, census and geo-social media. Applications include Public Health (e.g., monitoring spread of disease, spatial disparity, food deserts), Public Safety (e.g., crime hot spots), Public Security (e.g., common operational picture), Environment and Climate (change detection, land-cover classification), M(obile)-commerce (e.g., location-based services), etc.

Classical data science and machine learning techniques often perform poorly on spatial and spatio-temporal data sets for several reasons. First, these datasets are embedded in continuous space, where implicit relationships (e.g., distance) carry important information. Second, the cost of spurious patterns is high in many spatial application domains, which calls for guardrails (e.g., statistical significance tests) to reduce false positives and chance patterns. Third, classical statistical analysis and machine learning commonly assume that data samples are generated independently from identical distributions. This assumption generally does not hold for such data because of spatio-temporal autocorrelation and variability. Ignoring autocorrelation and variability when analyzing data with spatial and spatio-temporal characteristics may produce hypotheses or models that are inaccurate or inconsistent with the data.
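As a small illustration of why the independence assumption fails, spatial autocorrelation can be quantified with Moran's I, a standard statistic that is not specific to this talk. The sketch below is a minimal pure-Python version that assumes a regular grid, binary rook-adjacency weights, and toy data; the function names `rook_neighbors` and `morans_i` are hypothetical and chosen only for this example.

```python
def rook_neighbors(rows, cols):
    """Map each grid cell index to the cells that share an edge with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            neighbors[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbors


def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 when j is a neighbor of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(len(js) for js in neighbors.values())
    # Sum of products of deviations over all neighboring (i, j) pairs.
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    variance_sum = sum(d * d for d in dev)
    return (n * cross) / (total_weight * variance_sum)


neighbors = rook_neighbors(4, 4)
clustered = [1, 1, 0, 0] * 4  # left half high, right half low
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(morans_i(clustered, neighbors))     # positive: similar values sit together
print(morans_i(checkerboard, neighbors))  # negative: neighbors always differ
```

Both toy grids contain the same values, eight ones and eight zeros, so any method that ignores location treats them as identical. Only the spatial arrangement distinguishes them, which is the information an i.i.d. model discards.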

Thus, new methods are needed to analyze spatial and spatio-temporal data. This talk surveys common and emerging methods for spatial classification and prediction (e.g., spatial autoregression, geographically weighted regression (GWR)), as well as techniques for discovering interesting, useful and non-trivial patterns such as hotspots (e.g., circular, linear, arbitrary shapes), spatiotemporal interactions (e.g., co-locations, cascade, tele-connections), spatial outliers, and their spatio-temporal counterparts.
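To make one of these pattern families concrete, the sketch below flags spatial outliers with a simple neighborhood-difference test: each location is compared with the average of its spatial neighbors, and locations whose difference is unusually far from the typical difference are reported. This is a minimal illustration under stated assumptions (locations arranged along a line such as a road, neighbors taken as the adjacent locations, a z-score threshold of 2.0, made-up readings), not a reproduction of any specific algorithm from the papers listed on this page. The names `chain_neighbors` and `spatial_outliers` are hypothetical.

```python
from statistics import mean, pstdev


def chain_neighbors(n):
    """Neighbors of each location along a line: the adjacent locations."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def spatial_outliers(values, neighbors, threshold=2.0):
    """Return indices whose value differs unusually from its neighbors' mean."""
    # Difference between each value and the average of its neighbors.
    diffs = [
        values[i] - mean([values[j] for j in neighbors[i]])
        for i in range(len(values))
    ]
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return []  # all locations agree with their neighbors equally well
    # Standardize the differences and keep the extreme ones.
    return [i for i, d in enumerate(diffs) if abs(d - mu) / sigma > threshold]


# Hypothetical readings from ten sensors along a road (assumed data).
readings = [10, 11, 10, 12, 30, 11, 10, 12, 11, 10]
print(spatial_outliers(readings, chain_neighbors(len(readings))))  # [4]
```

The test is local: a reading is judged against its spatial neighborhood rather than against the global distribution, so a value that is ordinary overall can still be flagged if it disagrees with its surroundings.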

KEYWORDS: Spatial, Spatio-temporal, Auto-correlation, Data Mining, Machine Learning, Statistics.

ACKNOWLEDGMENTS: This work was supported in part by the National Science Foundation, the U.S. Department of Defense, the National Aeronautics and Space Administration, the Federal Highway Authority, and the University of Minnesota (e.g., Center for Transportation Studies).

SURVEY PAPERS

  1. Transdisciplinary Foundations of Geospatial Data Science (html, pdf), ISPRS International Journal of Geo-Information, 6(12), 2017. doi:10.3390/ijgi6120395. (with Y. Xie, E. Eftelioglu, R. Ali, X. Tang, Y. Li, and R. Doshi)
  2. Spatiotemporal Data Mining: A Computational Perspective, ISPRS International Journal of Geo-Information, 4(4):2306-2338, 2015 (DOI: 10.3390/ijgi4042306). (w/ Z. Jiang, R. Ali, E. Eftelioglu, X. Tang, V. Gunturi, and X. Zhou).
  3. Identifying patterns in spatial information: a survey of methods (pdf), S. Shekhar, M. R. Evans, J. M. Kang and P. Mohan, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3):193-214, May/June 2011. (DOI: 10.1002/widm.25).
  4. Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, 29(10):2318-2331, June 2017. (DOI: 10.1109/TKDE.2017.2720168). (w/ A. Karpatne et al.).
  5. Parallel Processing over Spatial-Temporal Datasets from Geo, Bio, Climate and Social Science Communities: A Research Roadmap. IEEE BigData Congress 2017: 232-250 (with S. Prasad et al.).
  6. Spatial and Spatio-temporal Data Mining: Recent Advances (pdf), S. Shekhar, V. R. Raju and M. Celik, Next Generation of Data Mining, Chapman & Hall/CRC, 2008, ISBN 1420085867 (Ed. H. Kargupta, J. Han, P. Yu, R. Motwani, V. Kumar). Proc. NSF 2nd workshop on Future Directions in Data Mining (2007).
  7. Trends in Spatial Data Mining (pdf), S. Shekhar, P. Zhang, V. R. Raju and Y. Huang, Data Mining: Next Generation Challenges and Future Directions, MIT Press, 2004, ISBN 0-262-61203-8 (Ed. H. Kargupta et al). Proc. NSF 1st workshop on Future Directions in Data Mining (2003).
  8. Spatial Data Mining Toolkit for Generating MSDS (aka TopoAssistant) (Topic No. A03-129), SBIR Phase I, US Army Topographic Eng. Center, June 2004, Architecture Technology Corporation, Final Report, Slides.
  9. Mining Colocation patterns from spatial datasets (slides, papers).
  10. Spatial Databases: A Tour (Chapter 7 on Spatial Data Mining), S. Shekhar and S. Chawla, Prentice Hall 2003, ISBN 0-13-017480-7.
  11. A Summary of Spatial Statistics and Spatial Data Mining Software compiled by Dr. B. Kazar in 2004-2005.

PAPERS ON SPECIFIC PATTERN FAMILIES

  1. Discovering colocation patterns from spatial data sets: a general approach, IEEE Trans. on Know. and Data Eng., 16(12), 2004 (w/ Y. Huang et al.).
  2. A join-less approach for mining spatial colocation patterns, IEEE Trans. on Know. and Data Eng., 18(10), 2006. (w/ J. Yoo).
  3. Cascading Spatio-Temporal Pattern Discovery, IEEE Trans. Knowl. Data Eng. 24(11): 1977-1992, 2012 (w/ P. Mohan et al.).
  4. Detecting graph-based spatial outliers: algorithms and applications, Proc. ACM Intl. Conf. on Knowledge Discovery & Data Mining, 2001 (with Q. Lu et al.)
  5. A unified approach to detecting spatial outliers, Springer GeoInformatica, 7 (2), 2003. (w/ C. Lu, et al.)
  6. Discovering Flow Anomalies: A SWEET Approach, IEEE Intl. Conf. on Data Mining, 2008 (w/ J. Kang).
  7. Discovering personally meaningful places: An interactive clustering approach, ACM Trans. on Info. Systems (TOIS) 25 (3), 2007. (with C. Zhou et al.)
  8. A K-Main Routes Approach to Spatial Network Activity Summarization, IEEE Trans. on Know. & Data Eng., 26(6), 2014. (with D. Oliver et al.)
  9. Significant Linear Hotspot Discovery, IEEE Trans. Big Data 3(2): 140-153, 2017 (w/ X. Tang et al.)
  10. Ring-Shaped Hotspot Detection, IEEE Trans. Know. and Data Eng., 28(12): 3367-3381, 2016, (w/ E. Eftelioglu et al.)
  11. Spatial contextual classification and prediction models for mining geospatial data, IEEE Transactions on Multimedia, 4(2), 2002. (with P. Schrater et al.)
  12. Focal-Test-Based Spatial Decision Tree Learning, IEEE Trans. Knowl. Data Eng. 27(6): 1547-1559, 2015 (summary in Proc. IEEE Intl. Conf. on Data Mining, 2013) (w/ Z. Jiang et al.).
  13. Spatiotemporal change footprint pattern discovery: an inter-disciplinary survey, Wiley Interdisc. Rev.: Data Mining and Know. Discovery 4(1), 2014. (with X. Zhou et al.)

NOTE: This talk has been presented at the following forums: