Data Mining for Scientific and Engineering Applications

Bob Grossman, University of Illinois at Chicago
Chandrika Kamath, Lawrence Livermore National Laboratory
Vipin Kumar, AHPCRC - University of Minnesota

Due to advances in information technology and high performance computing, very large data sets are becoming available in many scientific disciplines. The rate at which such data are produced far outstrips our ability to analyze them manually. For example, a computational simulation can generate terabytes of data within a few hours, whereas human analysts may take several weeks to analyze these data sets. Other examples include digital sky surveys and data sets from medical imaging, bioinformatics, and remote sensing. As a result, various scientific communities are increasingly interested in exploring emerging data mining techniques for the analysis of these large data sets.

Data mining is the semi-automatic discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in data. Traditional data analysis is assumption-driven: a hypothesis is formed and then validated against the data. Data mining, in contrast, is discovery-driven: patterns are extracted automatically from the data. The goal of the tutorial is to provide researchers and practitioners in supercomputing with an introduction to data mining and its application to several scientific and engineering domains, including astrophysics, medical imaging, computational fluid dynamics, structural mechanics, and ecology.