
Edited by
Robert L. Grossman, University of Illinois at Chicago, USA
Chandrika Kamath, Lawrence Livermore National Laboratory, CA, USA
Philip Kegelmeyer, Sandia National Laboratories, Livermore, CA, USA
Vipin Kumar, Army High Performance Computing Research Center (AHPCRC), Minneapolis, MN, USA
Raju R. Namburu, Army Research Laboratory, Aberdeen Proving Ground, MD, USA
Copyright © 2001
Kluwer Academic Publishers
ISBN 1402000332
Advances in technology are making massive data sets common in many scientific disciplines, such as astronomy, medical imaging, bioinformatics, combinatorial chemistry, remote sensing, and physics. To find useful information in these data sets, scientists and engineers are turning to data mining techniques. This book is a collection of papers based on the first two in a series of workshops on mining scientific datasets. It illustrates the diversity of problems and application areas that can benefit from data mining, as well as the issues and challenges that differentiate scientific data mining from its commercial counterpart. While the focus of the book is on mining scientific data, the work is of broader interest, as many of the techniques can be applied equally well to data arising in business and web applications.
Audience: This work would be an excellent text for students and researchers who are familiar with the basic principles of data mining and want to learn more about applying data mining to problems in science or engineering.



Foreword
Dr. N. Radhakrishnan

ix 

List of Contributors 
xi 

List of Reviewers 
xvii 

Preface 
xix 

On Mining Scientific Datasets
Chandrika Kamath

1 

Understanding High Dimensional and Large Data Sets: Some Mathematical Challenges and Opportunities
Jagadish Chandra

23 

Data Mining at the Interface of Computer Science and Statistics
Padhraic Smyth

35 

Mining Large Image Collections
Michael C. Burl

63 

Mining Astronomical Databases
Roberta M. Humphreys, Juan E. Cabanela, and Jeffrey Kriessler

85 

Searching for Bent-Double Galaxies in the FIRST Survey
Chandrika Kamath, Erick Cantú-Paz, Imola K. Fodor, and Nu Ai Tang

95 

A Dataspace Infrastructure for Astronomical Data
Robert Grossman, Emory Creel, Marco Mazzucco, and Roy Williams

115 

Data Mining Applications in Bioinformatics
Naren Ramakrishnan and Ananth Y. Grama

125 

Mining Residue Contacts in Proteins
Mohammed J. Zaki and Chris Bystroff

141 

KDD Services at the Goddard Earth Sciences Distributed Archive Center
Christopher Lynnes and Robert Mack

165 

Data Mining in Integrated Data Access and Data Analysis Systems
Ruixin Yang, Menas Kafatos, Kwang-Su Yang, and X. Sean Wang

183 

Spatial Data Mining for Classification, Visualisation and Interpretation with Artmap Neural Network
Weiguo Liu, Sucharita Gopal, and Curtis Woodcock

201 

Real Time Feature Extraction for the Analysis of Turbulent Flows
I. Marusic, G.V. Candler, V. Interrante, P.K. Subbareddy, and A. Moss

223 

Data Mining for Turbulent Flows
Eui-Hong (Sam) Han, George Karypis, and Vipin Kumar

239 

Evita - Efficient Visualization and Interrogation of Tera-Scale Data
Raghu Machiraju, James E. Fowler, David Thompson, Bharat Soni, and Will Schroeder

257 

Towards Ubiquitous Mining of Distributed Data
Hillol Kargupta, Krishnamoorthy Sivakumar, Weiyun Huang, Rajeev Ayyagari, Rong Chen, Byung-Hoon Park, and Erik Johnson

281 

Decomposable Algorithms for Data Mining
Raj Bhatnagar

307 

HDDI™: Hierarchical Distributed Dynamic Indexing
William M. Pottenger, Yong-Bin Kim, and Daryl D. Meling

319 

Parallel Algorithms for Clustering High-Dimensional Large-Scale Datasets
Harsha Nagesh, Sanjay Goil, and Alok Choudhary

335 

Efficient Clustering of Very Large Document Collections
Inderjit S. Dhillon, James Fan, and Yuqiang Guan

357 

A Scalable Hierarchical Algorithm for Unsupervised Clustering
Daniel Boley

383 

High-Performance Singular Value Decomposition
David B. Skillicorn and Xiaolan Yang

401 

Mining High-Dimensional Scientific Data Sets Using Singular Value Decomposition
Ekaterina Maltseva, Clara Pizzuti, and Domenico Talia

425 

Spatial Dependence in Data Mining
James P. LeSage and R. Kelley Pace

439 

Sparc: Spatial Association Rule-Based Classification
Jiawei Han, Anthony K.H. Tung, and Jing He

461 

What's Spatial About Spatial Data Mining: Three Case Studies
Shashi Shekhar, Yan Huang, Weili Wu, C.T. Lu, and S. Chawla

487 

Predicting Failures in Event Sequences
Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara

515 

Efficient Algorithms for Mining Long Patterns in Scientific Data Sets
Ramesh C. Agarwal and Charu C. Aggarwal

541 

Probabilistic Estimation in Data Mining
Edwin P.D. Pednault and Chidanand Apte

567 

Classification Using Association Rules: Weaknesses and Enhancements
Bing Liu, Yiming Ma, and Ching-Kian Wong

591 


