Data Mining for Scientific and Engineering Applications

edited by
Robert L. Grossman
University of Illinois at Chicago, USA
Chandrika Kamath
Lawrence Livermore National Laboratory, CA, USA
Philip Kegelmeyer
Sandia National Laboratories, Livermore, CA, USA
Vipin Kumar
Army High Performance Computing Research Center (AHPCRC), Minneapolis, MN, USA
Raju R. Namburu
Army Research Laboratory, Aberdeen Proving Ground, MD, USA
 

Copyright © 2001
Kluwer Academic Publishers
ISBN 1-4020-0033-2

                                   

Advances in technology are making massive data sets common in many scientific disciplines, such as astronomy, medical imaging, bioinformatics, combinatorial chemistry, remote sensing, and physics. To find useful information in these data sets, scientists and engineers are turning to data mining techniques. This book is a collection of papers based on the first two in a series of workshops on mining scientific data sets. It illustrates the diversity of problems and application areas that can benefit from data mining, as well as the issues and challenges that differentiate scientific data mining from its commercial counterpart. While the focus of the book is on mining scientific data, the work is of broader interest, as many of the techniques can be applied equally well to data arising in business and web applications.

Audience: This work would be an excellent text for students and researchers who are familiar with the basic principles of data mining and want to learn more about applying data mining to their own problems in science or engineering.

Foreword 

Dr. N. Radhakrishnan

ix
List of Contributors xi
List of Reviewers xvii
Preface xix

On Mining Scientific Datasets

Chandrika Kamath

1

Understanding High Dimensional and Large Data Sets: Some Mathematical Challenges and Opportunities 

Jagadish Chandra

23

Data Mining at the Interface of Computer Science and Statistics 

Padhraic Smyth

35

Mining Large Image Collections

Michael C. Burl

63

Mining Astronomical Databases 

Roberta M. Humphreys, Juan E. Cabanela, and Jeffrey Kriessler

85

Searching for Bent-Double Galaxies in the FIRST Survey

Chandrika Kamath, Erick Cantú-Paz, Imola K. Fodor and Nu Ai Tang

95

A Dataspace Infrastructure for Astronomical Data 

Robert Grossman, Emory Creel, Marco Mazzucco, and Roy Williams

115

Data Mining Applications in Bioinformatics 

Naren Ramakrishnan and Ananth Y. Grama

125

Mining Residue Contacts in Proteins 

Mohammed J. Zaki and Chris Bystroff

141

KDD Services at the Goddard Earth Sciences Distributed Active Archive Center

Christopher Lynnes and Robert Mack

165

Data Mining in Integrated Data Access and Data Analysis Systems 

Ruixin Yang, Menas Kafatos, Kwang-Su Yang, and X. Sean Wang

183

Spatial Data Mining for Classification, Visualisation and Interpretation with ARTMAP Neural Network

Weiguo Liu, Sucharita Gopal, and Curtis Woodcock

201

Real Time Feature Extraction for the Analysis of Turbulent Flows 

I. Marusic, G.V. Candler, V. Interrante, P.K. Subbareddy, and A. Moss

223

Data Mining for Turbulent Flows 

Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar

239

EVITA: Efficient Visualization and Interrogation of Tera-Scale Data

Raghu Machiraju, James E. Fowler, David Thompson, Bharat Soni, and Will Schroeder

257

Towards Ubiquitous Mining of Distributed Data 

Hillol Kargupta, Krishnamoorthy Sivakumar, Weiyun Huang, Rajeev Ayyagari, Rong Chen, Byung-Hoon Park, and Erik Johnson

281

Decomposable Algorithms for Data Mining 

Raj Bhatnagar

307

HDDI™: Hierarchical Distributed Dynamic Indexing 

William M. Pottenger, Yong-Bin Kim, and Daryl D. Meling 

319

Parallel Algorithms for Clustering High-Dimensional Large-Scale Datasets 

Harsha Nagesh, Sanjay Goil, and Alok Choudhary

335

Efficient Clustering of Very Large Document Collections 

Inderjit S. Dhillon, James Fan, and Yuqiang Guan

357

A Scalable Hierarchical Algorithm for Unsupervised Clustering

Daniel Boley

383

High-Performance Singular Value Decomposition 

David B. Skillicorn and Xiaolan Yang

401

Mining High-Dimensional Scientific Data Sets Using Singular Value Decomposition 

Ekaterina Maltseva, Clara Pizzuti, and Domenico Talia

425

Spatial Dependence in Data Mining 

James P. LeSage and R. Kelley Pace

439

SPARC: Spatial Association Rule-Based Classification

Jiawei Han, Anthony K.H. Tung, and Jing He

461

What's Spatial About Spatial Data Mining: Three Case Studies

Shashi Shekhar, Yan Huang, Weili Wu, C.T. Lu, and S. Chawla

487

Predicting Failures in Event Sequences 

Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara

515

Efficient Algorithms for Mining Long Patterns in Scientific Data Sets 

Ramesh C. Agarwal and Charu C. Aggarwal

541

Probabilistic Estimation in Data Mining 

Edwin P.D. Pednault and Chidanand Apte

567

Classification Using Association Rules: Weaknesses and Enhancements 

Bing Liu, Yiming Ma, and Ching-Kian Wong

591

