Knowledge Discovery in Spatial-Temporal Databases

Investigators

    Fiez, Tim
    Lazarevic, Aleksandar
    Obradovic, Zoran
    Pokrajac, Dragoljub
    Vucetic, Slobodan

Problem
The objective of this project is to develop a system for knowledge discovery in large spatial and spatial-temporal databases, including the integration of local models without exchanging confidential or proprietary information.
Results
    To achieve this objective, we have:
     - developed and tested novel exploratory data analysis methods for spatial data;
     - developed machine learning algorithms for building and selectively applying multiple expert modules;
     - tested model development and prediction capabilities using multiple non-centralized data sets; and
     - prototyped a software package for knowledge discovery from spatial data.


  Sampling optimization

    A procedure for evaluating spatial sampling techniques in terms of sampling cost and the accuracy of the interpolated features was developed in our lab. It was applied to modify grid sampling so as to achieve similar expected accuracy with half as much data.
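    The evaluation procedure itself is not described here. As a rough illustration only, the sketch below scores regular grid sampling at several spacings by two numbers: cost (assumed proportional to the number of samples) and accuracy (RMSE of an inverse-distance-weighted map against the full field). The synthetic field, the IDW interpolator, and the names `make_field`, `grid_sample`, `idw` and `evaluate` are hypothetical choices, not the lab's method.

```python
import math
import random


def make_field(n, seed=0):
    """Hypothetical smooth test field: a sum of three random sinusoids on an n x n grid."""
    rng = random.Random(seed)
    waves = [(rng.uniform(0.05, 0.3), rng.uniform(0.05, 0.3), rng.uniform(0, 2 * math.pi))
             for _ in range(3)]
    return [[sum(math.sin(fx * i + fy * j + ph) for fx, fy, ph in waves)
             for j in range(n)] for i in range(n)]


def grid_sample(n, spacing):
    """Sample locations on a regular grid with the given spacing (in cells)."""
    return [(i, j) for i in range(0, n, spacing) for j in range(0, n, spacing)]


def idw(samples, values, i, j, power=2.0):
    """Inverse-distance-weighted interpolation at cell (i, j)."""
    num = den = 0.0
    for (si, sj), v in zip(samples, values):
        d2 = (si - i) ** 2 + (sj - j) ** 2
        if d2 == 0:
            return v  # the cell itself was sampled
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den


def evaluate(field, samples, cost_per_sample=1.0):
    """Return (sampling cost, RMSE of the interpolated map over all cells)."""
    n = len(field)
    values = [field[i][j] for i, j in samples]
    sq_err = 0.0
    for i in range(n):
        for j in range(n):
            sq_err += (idw(samples, values, i, j) - field[i][j]) ** 2
    return cost_per_sample * len(samples), math.sqrt(sq_err / (n * n))


if __name__ == "__main__":
    field = make_field(40)
    for spacing in (2, 3, 4, 6):
        cost, rmse = evaluate(field, grid_sample(40, spacing))
        print(f"spacing={spacing}  cost={cost:.0f}  RMSE={rmse:.3f}")
```

    Listing cost against RMSE for several candidate designs gives a trade-off curve; a modified design that reaches roughly the same RMSE at about half the cost would correspond to the kind of result reported above.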

  Exploratory data analysis

    A spatial data partitioning procedure was developed for training and testing spatial regression methods. For heterogeneous spatial databases with unstable driving attributes (typical in the earth sciences), an adaptive and spatial attribute boosting algorithm is proposed as an effective technique for increasing modeling accuracy by manipulating training data distributions.
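    The partitioning procedure is not spelled out on this page. One common way to split spatial data for training and testing, shown here as an assumed stand-in, is to assign whole square blocks rather than individual points to folds, so that test points are not surrounded by spatially autocorrelated training points. The block size and the names `block_id` and `spatial_folds` are hypothetical.

```python
import math


def block_id(x, y, block_size):
    """Index (bx, by) of the square block containing point (x, y)."""
    return math.floor(x / block_size), math.floor(y / block_size)


def spatial_folds(points, block_size, k):
    """Assign point indices to k folds block by block (hypothetical scheme).

    All points in a block share a fold and, for k >= 2, blocks that share
    an edge always fall into different folds, so each test fold is spatially
    separated from much of the training data.
    """
    folds = [[] for _ in range(k)]
    for idx, (x, y) in enumerate(points):
        bx, by = block_id(x, y, block_size)
        folds[(bx + by) % k].append(idx)
    return folds


if __name__ == "__main__":
    grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
    for fold, idx in enumerate(spatial_folds(grid, block_size=5, k=2)):
        test = set(idx)
        train = [i for i in range(len(grid)) if i not in test]
        print(f"fold {fold}: {len(train)} train / {len(test)} test points")
```

    With `k=2` the folds form a checkerboard of blocks. Larger blocks give more spatial separation between training and test points at the cost of fewer, coarser folds.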

  Data partitioning

    To identify more homogeneous subfields and design corresponding expert models, we have developed data partitioning methods based on spatial clustering, on sequential development of local regressors and the corresponding data distribution models, and on iterative data partitioning using spatial error analysis. All of these multiple-expert approaches produced better predictions than a single global model when tested on real-life agricultural data. The data partitioning and local regression algorithms were also successfully adapted to a distributed environment in which data mining is restricted to exchanging local models and essential statistics, without communicating raw data.
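    As an illustration of the first approach (spatial clustering followed by local regressors), the sketch below clusters sample locations with plain k-means, fits one least-squares line per cluster, and predicts with the expert whose cluster centroid is nearest. Each expert is kept as `(centroid, a, b)`, loosely mirroring the distributed setting in which only local models and summary statistics are exchanged. The k-means initialization, the single-feature linear model, the nearest-centroid selection rule, and the synthetic two-field data in `make_demo_data` are all assumptions, not the project's algorithms.

```python
import math


def kmeans(coords, k, iters=20):
    """Plain k-means on 2-D locations, initialized with k evenly spaced points."""
    step = max(1, len(coords) // k)
    centers = [coords[i * step] for i in range(k)]

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in coords]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, assign()


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def train_local_experts(coords, feature, target, k):
    """Partition by spatial clustering and fit one regressor per cluster.

    Each expert is (centroid, a, b): only coefficients and a summary
    statistic describe the partition, never the raw observations.
    """
    centers, labels = kmeans(coords, k)
    experts = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            a, b = fit_line([feature[i] for i in idx], [target[i] for i in idx])
            experts.append((centers[c], a, b))
    return experts


def predict(experts, point, f):
    """Apply the expert whose centroid is nearest to the query location."""
    _, a, b = min(experts, key=lambda e: math.dist(e[0], point))
    return a + b * f


def make_demo_data():
    """Hypothetical data: two separate fields with different feature-target rules."""
    coords, feature, target = [], [], []
    for x in list(range(0, 50, 5)) + list(range(100, 150, 5)):
        for y in range(0, 100, 5):
            f = (7 * x + 3 * y) % 11
            coords.append((x, y))
            feature.append(f)
            target.append(2 * f if x < 50 else 10 - f)
    return coords, feature, target


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


if __name__ == "__main__":
    coords, feature, target = make_demo_data()
    experts = train_local_experts(coords, feature, target, k=2)
    local = [predict(experts, p, f) for p, f in zip(coords, feature)]
    a, b = fit_line(feature, target)
    global_pred = [a + b * f for f in feature]
    print("local RMSE:", rmse(local, target), "global RMSE:", rmse(global_pred, target))
```

    On this toy data, where the feature-target rule differs between the two fields, the two local experts reproduce the target essentially exactly while a single global line cannot.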

  Model characterization

  To fully characterize our knowledge discovery algorithms for a large distributed system, we have developed a spatial data simulator that generates feature layers statistically similar to real spatial data and computes a target layer according to previously observed rules and expert knowledge. The simulator is used to analyze the influence of sensor error, unexplained variance, sampling density and data distribution on the quality of spatial data prediction in precision agriculture.
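  A minimal sketch of such a simulator, under assumed choices: feature layers are white noise smoothed with a moving-average window (giving spatial autocorrelation), the target is computed from a hypothetical rule `target_rule` plus Gaussian "unexplained" noise, and sensor error is modeled as Gaussian noise on the feature readings. Applying the true rule to noisy readings isolates how much prediction error sensor error alone adds. None of these names, rules or parameters come from the project.

```python
import math
import random


def smooth_layer(n, radius, rng):
    """Spatially correlated layer: white noise averaged over a (2r+1)^2 window, then standardized."""
    noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    layer = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cells = [noise[a][b]
                     for a in range(max(0, i - radius), min(n, i + radius + 1))
                     for b in range(max(0, j - radius), min(n, j + radius + 1))]
            layer[i][j] = sum(cells) / len(cells)
    flat = [v for row in layer for v in row]
    mean = sum(flat) / len(flat)
    sd = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    return [[(v - mean) / sd for v in row] for row in layer]


def target_rule(f1, f2):
    """Hypothetical 'expert rule' linking two feature layers to the target."""
    return 10 + 2 * f1 - 1.5 * max(f2, 0)


def simulate(n=30, radius=3, unexplained_sd=0.5, seed=0):
    """Generate two feature layers and a target layer with unexplained noise."""
    rng = random.Random(seed)
    f1, f2 = smooth_layer(n, radius, rng), smooth_layer(n, radius, rng)
    target = [[target_rule(f1[i][j], f2[i][j]) + rng.gauss(0, unexplained_sd)
               for j in range(n)] for i in range(n)]
    return f1, f2, target


def prediction_rmse(f1, f2, target, sensor_sd, seed=1):
    """RMSE of the true rule applied to feature readings corrupted by sensor noise."""
    rng = random.Random(seed)
    n = len(target)
    sq = 0.0
    for i in range(n):
        for j in range(n):
            obs1 = f1[i][j] + rng.gauss(0, sensor_sd)
            obs2 = f2[i][j] + rng.gauss(0, sensor_sd)
            sq += (target_rule(obs1, obs2) - target[i][j]) ** 2
    return math.sqrt(sq / (n * n))


if __name__ == "__main__":
    f1, f2, target = simulate()
    for sensor_sd in (0.0, 0.25, 0.5, 1.0):
        print(f"sensor sd={sensor_sd:.2f}  RMSE={prediction_rmse(f1, f2, target, sensor_sd):.3f}")
```

  With no sensor error the RMSE stays near `unexplained_sd`, the floor set by unexplained variance; sampling density could be studied the same way by evaluating only a subset of cells.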

  Technology transfer

  Currently, we are developing a data mining software package that integrates our algorithms for spatial and distributed data inspection, preprocessing, and partitioning into an easy-to-use toolbox.