NSF Grant IIS-0308264 Data Mining for Rare Class Analysis
Data Mining for Rare Class Analysis
National Science Foundation Award Number: IIS-0308264 (September 1, 2003 - August 31, 2007)
Contact Information:
Vipin Kumar, PI
Department of Computer Science and Engineering
4-192, EE/CSci Building
University of Minnesota
Minneapolis, MN 55455
Phone (612) 625 0726
E-mail: kumar at cs.umn.edu
URL: http://www.cs.umn.edu/~kumar
Jaideep Srivastava, co-PI
Department of Computer Science and Engineering
5-209, EE/CSci Building
University of Minnesota
Minneapolis, MN 55455
Phone (612) 625 4012
E-mail: srivasta at cs.umn.edu
URL: http://www.cs.umn.edu/people/faculty/index.php?id=165
List of Supported Students and Staff:
Postdoctoral Researcher(s):
Aleksandar Lazarevic
Michael Steinbach
Graduate Students:
Sandeep Mane
Nishith Pathak
Aysel Ozgur
Gyorgy Simon
Varun Chandola
Rohit Gupta
Shyam Boriah
Project Award Information:
Award Number: IIS-0308264
Duration: September 1, 2003 - August 31, 2007
Title: Data Mining for Rare Class Analysis
Keywords: data mining, classification, rare class, data mining applications
Project Summary:
This project systematically addresses the rare class problem, which is an important issue in building predictive models. It involves developing novel methods for selecting features to build predictive models in the context of rare class learning. Specifically, a feature-based approach has been developed to find local patterns and features using association analysis. The project also involves developing new predictive modeling methods that are specifically suited to rare classes, as well as adaptive techniques for data streams, since the data in many rare class analysis applications arrive as a set of time-oriented streams. Techniques have also been developed for analyzing network traffic to detect scans aimed at identifying network vulnerabilities.
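To illustrate why rare classes complicate predictive modeling, the following minimal Python sketch evaluates a trivial model that always predicts the majority class. The data set (1,000 records, 10 of them in the rare class) and the helper name `evaluate` are hypothetical and chosen only for illustration; they are not taken from the project.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, recall, precision) for binary labels; 1 marks the rare class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # rare records found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # rare records missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # majority records kept
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Hypothetical data: 1,000 records, 10 of which belong to the rare class.
y_true = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the majority class.
always_majority = [0] * len(y_true)

accuracy, recall, precision = evaluate(y_true, always_majority)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

Under these assumptions the trivial model is correct on 99% of the records yet finds none of the rare class records. This is why work on rare classes generally relies on measures such as recall and precision rather than on overall accuracy.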
Publications:
Steinbach, M., Tan, P., and Kumar, V. 2004. Support envelopes: a technique for exploring the structure of association patterns. In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA, August 22 - 25, 2004). KDD '04. ACM Press, New York, NY, 296-305. DOI= http://doi.acm.org/10.1145/1014052.1014086.
Steinbach, M., Tan, P., Xiong, H., and Kumar, V. 2004. Generalizing the notion of support. In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA, August 22 - 25, 2004). KDD '04. ACM Press, New York, NY, 689-694.
Vipin Kumar, Parallel and Distributed Computing for Cyber Security. An article based on the keynote talk by the author at 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004). DS Online Journal, Volume 6, number 10, October 2005.
Aleksandar Lazarevic and Vipin Kumar, Feature Bagging for Outlier Detection, Proceedings of the Eleventh ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (SIGKDD 2005) Chicago, August, 2005.
Sandeep Mane, Jamshid Vayghan, Jaideep Srivastava, Philip Yu and Gedas Adomavicius, Data Mining Techniques for Automated Evaluation of Sales Opportunities: A Case Study In International workshop on Customer Relationship Management: Data Mining Meets Marketing, 2005.
Mane, S., Srivastava, J., and Hwang, S. Estimating missed actual positives using independent classifiers. In Proceedings of the Eleventh ACM SIGKDD international Conference on Knowledge Discovery in Data Mining (Chicago, Illinois, USA, August 21 - 24, 2005). KDD '05. ACM Press, New York, NY, 648-653, 2005.
J. Srivastava, P. Desikan, and V. Kumar, Web Mining – Concepts, Applications and Research Directions, Book Chapter in Recent Advances in Data Mining and Granular Computing (mathematical aspects of knowledge discovery), T.Y. Lin and Wesley Chu, eds., Springer-Verlag, expected 2005.
Michael Steinbach and Vipin Kumar, Generalizing the Notion of Confidence, Fifth IEEE International Conference on Data Mining (ICDM '05), pp 402-409, Houston, TX, 27-30 November, 2005. Also to appear in Knowledge and Information Systems.
Gyorgy Simon, Eric Eilertson, Vipin Kumar, Zhi-Li Zhang and Hui Xiong, Scan Detection: A Data Mining Approach, Proceedings of the Sixth SIAM International Conference on Data Mining, April 20-22, 2006, Bethesda, MD.
Mark Shaneck, Yongdae Kim, Vipin Kumar, Privacy Preserving Nearest Neighbor Search, To appear in the 2006 IEEE International Workshop on Privacy Aspects of Data Mining, December, 2006.
Hui Xiong, Gaurav Pandey, Michael Steinbach, Vipin Kumar, Enhancing Data Analysis with Noise Removal, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 18, no. 3, pp. 304-319, March, 2006.
Hui Xiong, Michael Steinbach, Arafin Ruslim, Vipin Kumar, Characterizing Pattern Preserving Clustering, Submitted to ACM Transactions on Information Systems, 2006.
Varun Chandola, Arindam Banerjee, and Vipin Kumar. Outlier Detection - A Survey. University of Minnesota Technical Report 07-017, August, 2007.
Sandeep Mane and Jaideep Srivastava, False Negative Estimation and Feature Subsets Selection, submitted to IEEE TKDE in Dec, 2007.
Gyorgy Simon, Vipin Kumar, and Zhi-Li Zhang, Estimating False Negatives for Classification Problems with Cluster Structure, University of Minnesota Technical Report, 2007. Also published in Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, MN
Research Contributions:
New algorithms and techniques have been developed for the classification of rare classes, and these techniques have been applied in a number of areas. Specific techniques include hierarchical classification models, local regression (RBA), outlier detection techniques, feature bagging, the hyperclique association pattern finding algorithm, methods for estimating true and false positives, and scan detection algorithms. Some of the work on scan detection has been incorporated into the Minnesota Intrusion Detection System (MINDS), which is the subject of a National Science Foundation (NSF) news clip.
Contributions to Resources for Research and Education:
Kumar has co-authored the following introductory textbook on data mining: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison-Wesley, ISBN: 0321321367, April 2005.
The book appeared in print in Spring 2005 and has since been adopted extensively worldwide, including at major universities such as Stanford, the University of Texas at Austin, and UIUC. The book has been translated, or is being translated, into several languages, including Chinese, Portuguese, and Greek.
Kumar, Srivastava, and Lazarevic have co-authored the following survey:
A. Lazarevic, V. Kumar, and J. Srivastava, A Survey of Intrusion Detection Systems, in Managing Cyber Threats: Issues, Approaches and Challenges, edited by V. Kumar, J. Srivastava, and A. Lazarevic, Kluwer Academic Publishers, May 2005.
Kumar is a co-author of the following survey on protein function prediction:
Gaurav Pandey, Vipin Kumar, and Michael Steinbach. Computational approaches for protein function prediction: A Survey, Technical Report 06-028, Department of Computer Science, University of Minnesota, October 2006. (Submitted to ACM Computing Surveys.)
Rare classes are an important issue for protein function prediction since functional classes are often quite imbalanced.
The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been reviewed or approved by the University of Minnesota.