TITLE:

High Performance Computing With Spatial Datasets

PRESENTER:

Shashi Shekhar

AFFILIATION:

Computer Science Department, University of Minnesota.

URL:

http://www.cs.umn.edu/~shekhar

SLIDES: PowerPoint (9 MB)

ABSTRACT:

The importance of geo-spatial data is growing with the increasing availability of large geo-spatial datasets such as maps, remote-sensing images, and the decennial census. Applications include geo-spatial intelligence; real-time situation assessment (e.g., during disaster response); high-fidelity terrain visualization (e.g., Google Earth, flight simulators); location-based services; predicting the clustering or spread of disease; finding crime hot spots; and Mission to Planet Earth (global change and climatology, land-use classification). Many of these applications impose stringent performance and response-time constraints that today's sequential Geographic Information Systems (GIS) often cannot meet, because of the large volume of geo-spatial datasets and the complexity of geo-spatial data-items, including imagery and extended objects (e.g., polygons and line-strings).

High performance computing, e.g., parallelization of GIS, may meet the requirements of some of these applications. In this talk, we illustrate this point through two case studies. First, we focus on real-time terrain visualization in the context of flight simulators, whose workload can be modeled as range queries on geo-spatial datasets. Our work with the GIS range-query operation shows that data partitioning is an effective approach to achieving high performance in GIS. Because partitioning extended spatial objects is difficult, special techniques that go beyond random partitioning, such as systematic declustering, are needed. Experiments also show that replication of data may be needed to facilitate dynamic load balancing, as the cost of local processing is often less than the cost of data transfer for spatial objects. Second, we describe our recent effort to parallelize spatial data mining algorithms. In particular, we present preliminary results on parallelizing algorithms that estimate parameters for the spatial auto-regression model, which generalizes the linear regression model to address the lack of independence among nearby spatial data-points.
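
To make the declustering idea concrete, the following is a minimal sketch, not the method used in the talk. It assumes the spatial data has been bucketed into a uniform grid of equal-cost cells, and it uses a simple (i + j) mod p rule as a stand-in for systematic declustering; the function names, the grid size, and the query window are all hypothetical choices made for illustration. It compares that rule with random assignment by counting how many cells of one range query land on each processor.

```python
import random


def systematic_assignment(n, p):
    """Assign each cell (i, j) of an n x n grid to processor (i + j) mod p."""
    return {(i, j): (i + j) % p for i in range(n) for j in range(n)}


def random_assignment(n, p, seed=0):
    """Assign each cell of an n x n grid to a uniformly random processor."""
    rng = random.Random(seed)
    return {(i, j): rng.randrange(p) for i in range(n) for j in range(n)}


def range_query_loads(assignment, x0, y0, w, h, p):
    """Count, per processor, the cells inside the w x h window at (x0, y0)."""
    loads = [0] * p
    for i in range(x0, x0 + w):
        for j in range(y0, y0 + h):
            loads[assignment[(i, j)]] += 1
    return loads


if __name__ == "__main__":
    n, p = 16, 4               # hypothetical grid size and processor count
    window = (3, 5, 4, 4)      # hypothetical query: x0, y0, width, height
    for name, assignment in [
        ("systematic", systematic_assignment(n, p)),
        ("random", random_assignment(n, p)),
    ]:
        loads = range_query_loads(assignment, *window, p)
        # The busiest processor bounds the parallel response time.
        print(name, loads, "max load:", max(loads))
```

In this toy setting the systematic rule gives every processor exactly 4 of the 16 cells in the window, whereas random assignment can do no better than that and will usually leave one processor with more. Real extended objects such as polygons vary in size and processing cost and can span several cells, which is part of why partitioning them is harder than this sketch suggests.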

KEYWORDS: Spatial Datasets, High-Performance, Parallel, Geographic Information Systems, Range Query, Spatial Auto-regression.

NOTE: Some of the results discussed in this talk appeared in the following publications:

  1. S. Shekhar and S. Chawla, Spatial Databases: A Tour (Chapters 5 and 7), Prentice Hall 2003, ISBN 0-13-017480-7.
  2. B. M. Kazar, S. Shekhar, D. J. Lilja, D. Boley, A Parallel Formulation of the Spatial Auto-Regression Model for Mining Large Geo-Spatial Datasets, Proc. of 2004 SIAM International Conf. on Data Mining Workshop on High Performance and Distributed Mining (HPDM2004), Florida, USA, April 2004.
  3. S. Shekhar, S. Ravada, G. Turner, D. Chubb, and V. Kumar. Declustering and Load-Balancing Methods for Parallelizing Geographic Information Systems, IEEE Trans. on Knowledge and Data Eng, IEEE, Vol. 10, No. 4, July-Aug. 1998.
  4. S. Shekhar, S. Ravada, G. Turner, D. Chubb, and V. Kumar. Parallelizing a GIS on a Shared Address Space Architecture, Computer (Special Issue on Shared Memory Multiprocessors), IEEE, Vol. 29, No. 12, Dec. 1996.