Recent Updates
[02-03-2018] I am pleased to announce that our textbook, Introduction to Data Mining, 2nd edition, is finally out! More information can be found on the book's website, which contains companion materials such as slides and additional resources for instructors and students. Here is the link to the book's Pearson page for ordering copies.
[01-25-2018] Presented my work at the 2nd Workshop on Physics Informed Machine Learning, which was highly relevant to my research on theory-guided data science. It was great interacting with everyone from different disciplines working at the intersection of machine learning and the natural sciences.
[12-15-2017] Presented my latest work on theory-guided data science for geoscience problems at AGU Fall Meeting 2017.
[12-09-2017] Presented my work on physics-guided neural networks at the Deep Learning for Physical Sciences Workshop at NIPS 2017.
[11-21-2017] Gave an invited keynote talk at the ICDM 2017 Workshop on Data Mining for Earth System Science.
[11-13-2017] My survey article on spatio-temporal data mining was accepted by ACM Computing Surveys. A preprint of the survey is available on arXiv.
[10-31-2017] Started a collaboration with DC Water and other agencies (USGS, EPA, Xylem, LimnoTech, ESRI) to create a "digital twin" of the Anacostia Watershed. Our team will contribute to monitoring environmental processes (e.g., land use changes) and modeling water quality indicators using hybrid physics-data models.
[10-31-2017] A preprint of my article on physics-guided neural networks (currently in review at SDM 2018) is available on arXiv.
[10-30-2017] Gave an invited talk at the Big Data and Sustainability session of the Annual Meeting of the American Institute of Chemical Engineers (AIChE).
[10-22-2017] Our grant proposal on "Model Integration Through Knowledge-Rich Data and Process Composition (MINT)," led by Prof. Yolanda Gil, was accepted for the DARPA World Modelers Program.
[10-12-2017] Our paper was accepted at the IEEE International Conference on Big Data 2017.
[10-11-2017] We have secured one of the University of Michigan's submission slots for the upcoming NSF Science and Technology Center (STC) program. This multi-institution project, led by Prof. Christiane Jablonowski, seeks to establish an interdisciplinary center for extreme weather and physics-aware data science.
[10-01-2017] Started as a Postdoctoral Associate at the University of Minnesota with Prof. Vipin Kumar. The primary focus of my postdoctoral research is advancing theory-guided data science.
[09-27-2017] Defended my Ph.D. dissertation on "Predictive Learning with Heterogeneity in Populations". Officially, Dr. Anuj Karpatne!
[09-19-2017] Our grant proposal for "NSF Innovations at the Nexus of Food, Energy and Water Systems (INFEWS)" was accepted for funding. See this NSF news story for more information about the project.
[08-23-2017] Attended the SAMSI Climate Opening Workshop in Raleigh, NC, which featured great talks and discussions among a close-knit group of leading experts in statistics and machine learning for climate science.
[07-25-2017] I served as an instructor for the summer school on "Intelligent Systems for Geosciences (IS-GEO)" at the University of Texas at Austin. It was great interacting with motivated students and researchers from diverse academic and professional backgrounds and getting them excited about the growing field of machine learning and its importance in advancing scientific discovery in the geosciences.
[07-19-2017] I am serving as a convener for the session on "Intelligent Systems for Geosciences: Accelerating Discovery and Building Community" at the AGU Fall Meeting, Dec 10-15, 2017.
[07-14-2017] Our NSF Expeditions in Computing proposal on "Water in the 21st century: A data-guided approach" advanced to the final round of review (acceptance rate below 4%) and has been invited for a reverse site visit at NSF headquarters in November 2017. This 5-year, $10M grant builds upon my work on theory-guided data science for creating hybrid models of physics and data science in hydrology.
[06-29-2017] I served on a panel on "Theory-guided Data Science" at the 29th International Conference on Scientific and Statistical Database Management (SSDBM) in Chicago. We had a lively and engaging discussion, and it was great to learn from everyone on the panel as well as in the audience.
[06-28-2017] My perspective article on theory-guided data science was published in IEEE TKDE. Even before publication, a preprint of this article had received 1500+ reads on ResearchGate (and counting). This overwhelming response indicates the promise of integrating scientific knowledge with data science methods, a trend that is simultaneously being realized in several scientific disciplines.
[05-25-2017] Our paper on monitoring surface water dynamics (in collaboration with Dr. Dennis Lettenmaier's research group at UCLA) was accepted for publication in Remote Sensing of Environment (RSE) 2017, a top-tier journal in remote sensing.
[05-16-2017] Our paper was accepted at KDD 2017.
[04-29-2017] I served on a panel at the SDM Workshop on Mining Big Data in Climate and Environment, where the topic of discussion was "Understanding and Narrowing Gaps Between Data Science and Mechanistic Theories in Physical Sciences". It was great interacting with everyone and discussing the future of theory-guided data science.
[10-11-2016] Our paper was accepted at the IEEE International Conference on Big Data 2016.
[09-21-2016] Our global surface water monitoring system has been invited to contribute to the next generation of Essential Climate Variables (ECV), which will support the climate change adaptation and mitigation efforts of the United Nations Framework Convention on Climate Change (UNFCCC).
[08-23-2016] My work on monitoring surface water dynamics was featured as the central highlight of an NSF news story!
[08-14-2016] Presented our work on modeling the food-energy-water nexus in critical biodiverse landscapes in Cambodia at ACM KDD Workshop on Data Science for Food, Energy and Water. It was a great experience to interact with everyone at KDD!
[06-15-2016] My paper was published in IEEE Geoscience and Remote Sensing Magazine 2016, a top-tier publication venue in remote sensing.
[03-28-2016] Officially became a co-author of the second edition of the textbook "Introduction to Data Mining." I am excited to be part of this challenging but immensely gratifying journey!
[12-15-2015] My paper was published in IEEE Computing in Science & Engineering 2015.
[11-17-2015] Presented my paper at the International Conference on Data Mining (ICDM) 2015.
[05-02-2015] Presented my paper at the SIAM International Conference on Data Mining (SDM) 2015.
[04-28-2015] Received the University of Minnesota Doctoral Dissertation Award 2015-16.
[12-15-2014] Received the University of Minnesota Informatics Institute Fellowship 2015-16.
[04-26-2014] Presented my paper at the SIAM International Conference on Data Mining (SDM) 2014.
[05-27-2013] Will be working as a summer intern at IBM Research, Yorktown Heights, NY, for the next three months. I am excited to work on spatio-temporal problems in analyzing crime data sets as part of the Smarter Planet Group at IBM.
[10-26-2012] Presented two papers at the NASA Conference on Intelligent Data Understanding (CIDU) 2012.

Short Intro: I am interested in developing data mining methods to solve scientific and socially relevant problems. My research focuses on theory-guided data science, a novel paradigm of research that combines scientific knowledge with data science methods for accelerating scientific discovery. I am also a co-author of the second edition of the leading textbook, Introduction to Data Mining. I received my Ph.D. in September 2017 under the guidance of Prof. Vipin Kumar.

NEW: I am applying for tenure-track faculty positions this year. Here are my Job Market Application Materials!
UPDATE: As of Feb 14, 2018, I am interviewing with the following universities (if you are a student interested in working with me, please drop me an email and we can chat during my visit):
  1. Temple University [01-19-2018]
  2. Michigan State University [02-16-2018]
  3. Georgia Tech [02-22-2018]
  4. University of Georgia [02-23-2018]
  5. University of Illinois at Chicago [02-27-2018]
  6. Virginia Tech [03-01-2018 - 03-02-2018]
  7. University of Iowa [03-19-2018]

Projects

Theory-guided Data Science

TKDE 2017 ArXiv
Theory-guided data science is an emerging paradigm of scientific discovery that aims to integrate scientific knowledge with data science methods to produce physically consistent results. My research lays the foundation of this paradigm, and I am currently exploring it for problems in diverse disciplines such as hydrology, climate science, and computational chemistry.

Introduction to Data Mining (Second Edition)

Book2017
The second edition of this textbook presents new and improved content on several essential topics in data mining, such as model overfitting, model evaluation, deep learning, class imbalance, and anomaly detection. Additionally, it introduces an entirely new chapter on avoiding false discoveries in data mining, a topic that, despite its importance, is not covered at the required depth and breadth in alternative resources.

Spatio-temporal Data Mining

CSUR2017 KDD2017 BigData2017 BigData2016 GRSM2016 CISE2015 ArXiv
Space and time introduce several challenges and opportunities for classical data mining algorithms, given the variety of data types, representations, problems, and methods in spatio-temporal settings. My recent survey provides an overarching structure to the vast and diverse field of spatio-temporal data mining. A recurring theme of my research is to equip data mining methods with a better ability to handle spatio-temporal data from the Earth and environmental sciences.

Predictive Learning with Heterogeneity in Populations

Thesis2017 ICDM2015 SDM2015 SDM2014 CIDU2012
A central challenge in applying standard predictive learning methods to real-world problems is the heterogeneity in data populations, i.e., different groups of instances exhibit different predictive relationships. My dissertation research introduced several novel ways of addressing this challenge, building on ideas from multi-task learning and group-specific local learning.
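As a toy illustration of why heterogeneity matters, the sketch below uses made-up data in which two groups have opposite linear relationships between x and y. Plain per-group least squares stands in for the idea of group-specific learning; it is not the multi-task or group-specific local learning method from the dissertation, and the function names are hypothetical.

```python
# Toy illustration of population heterogeneity (hypothetical data and helper
# names; NOT the dissertation's method). Two groups have opposite x-y slopes.

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(points, a, b):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def compare_global_vs_group(groups):
    """Fit one model on the pooled data, plus one separate model per group."""
    pooled = [p for pts in groups.values() for p in pts]
    a, b = fit_line(pooled)
    global_errors = {name: mse(pts, a, b) for name, pts in groups.items()}
    local_errors = {name: mse(pts, *fit_line(pts)) for name, pts in groups.items()}
    return (a, b), global_errors, local_errors


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
groups = {
    "group_a": [(x, 2.0 * x) for x in xs],         # y increases with x
    "group_b": [(x, -2.0 * x + 8.0) for x in xs],  # y decreases with x
}

(slope, intercept), global_errors, local_errors = compare_global_vs_group(groups)
print("global model: y = %.1f*x + %.1f" % (slope, intercept))  # y = 0.0*x + 4.0
print("global-model MSE per group:", global_errors)    # 8.0 for each group
print("group-specific MSE per group:", local_errors)   # 0.0 for each group
```

Pooling the two groups cancels their opposite trends and produces a flat line that misfits both, whereas a separate model per group recovers each relationship exactly.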

Global System for Mapping Surface Water Dynamics

RSE2017 CompSus2016
My research has enabled a global surface water monitoring system that provides the first global history of surface water at 8-day intervals over the last 15 years, using high-resolution satellite data. This system captures vital information about changes in surface water, such as droughts, dam construction, river meandering, and melting glacial lakes, and it was featured in an NSF news story.

Publications

Book

[B1] P. Tan, M. Steinbach, A. Karpatne, and V. Kumar, Introduction to Data Mining, Pearson Addison–Wesley (Second Edition), ISBN-13: 978-0133128901, 2018 [Book Website].  

Journal Articles

[J10] A. Karpatne, G. Atluri, J. Faghmous, M. Steinbach, A. Banerjee, A. Ganguly, S. Shekhar, N. Samatova, and V. Kumar, Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), 29(10), 2318–2331, 2017 [arXiv, DOI].  
[J9] G. Atluri*, A. Karpatne*, and V. Kumar, Spatio-temporal Data Mining: A Survey of Problems and Methods, ACM Computing Surveys, 2017 (accepted; * equal contribution) [arXiv].  
[J8] A. Karpatne, I. Ebert-Uphoff, S. Ravela, H. A. Babaie, and V. Kumar, Machine Learning for the Geosciences: Challenges and Research Opportunities, IEEE TKDE, 2017 (in review) [arXiv].  
[J7] A. Khandelwal*, A. Karpatne*, M.E. Marlier*, J. Kim, D. P. Lettenmaier, and V. Kumar, An Approach for Global Monitoring of Surface Water Extent Variations using MODIS Data, Remote Sensing of Environment, Elsevier, 2017 (* equal contribution) [DOI].  
[J6] A. Karpatne, Z. Jiang, R. R. Vatsavai, S. Shekhar, and V. Kumar, Monitoring Land Cover Changes: A Machine Learning Perspective, IEEE Geoscience and Remote Sensing Magazine, 4(2), 8–21, 2016. [DOI].  
[J5] A. Karpatne and S. Liess, A Guide to Earth Science Data: Summary and Research Challenges, IEEE Computing in Science & Engineering, 17(6), 14–18, 2015. [DOI].  
[J4] F. Schrodt, J. Kattge, H. Shan, F. Fazayeli, J. Joswig, A. Banerjee, M. Reichstein, G. Bonisch, S. Diaz, J. Dickie, A. Gillison, A. Karpatne, S. Lavorel, P.W. Leadley, C. Wirth, I. Wright, S.J. Wright, and P.B. Reich, BHPMF - A Hierarchical Bayesian Approach to Gap-filling and Trait Prediction for Macroecology and Functional Biogeography, Global Ecology and Biogeography, 24(12), 1510–1521, 2015. [DOI].
[J3] R. Khemchandani, A. Karpatne, and S. Chandra, Twin Support Vector Regression for the Simultaneous Learning of a Function and its Derivatives, International Journal of Machine Learning and Cybernetics, 4(1), 51–63, 2013. [DOI].
[J2] R. Khemchandani, A. Karpatne, and S. Chandra, Proximal Support Tensor Machines, International Journal of Machine Learning and Cybernetics, 4(6), 703–712, 2013. [DOI].
[J1] R. Khemchandani, A. Karpatne, and S. Chandra, Generalized Eigenvalue Proximal Support Vector Regressor, Expert Systems with Applications, 38(10), 13136–13142, 2011 [DOI].

Peer-reviewed Conference Papers

[C9] A. Karpatne, W. Watkins, J. Read, and V. Kumar, Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling, SIAM International Conference on Data Mining (SDM), 2018 (in review) [arXiv].  
[C8] X. Jia, Y. Hu, A. Khandelwal, A. Karpatne, and V. Kumar, Joint Sparse Auto-encoder: A Semi-supervised Spatio-temporal Approach in Mapping Large-scale Croplands, IEEE International Conference on Big Data, 2017.  
[C7] S. Agrawal, G. Atluri, A. Karpatne, S. Chatterjee, S. Liess, and V. Kumar, Tripoles: A New Class of Relationships in Time Series Data, ACM International Conference on Knowledge Discovery and Data Mining (KDD), 697–706, 2017 [DOI].  
[C6] X. Jia, X. Chen, A. Karpatne, and V. Kumar, Identifying Dynamic Changes with Noisy Labels in Spatial-temporal Data: A Study on Large-scale Water Monitoring Application, IEEE International Conference on Big Data, 1328–1333, 2016 [DOI].  
[C5] A. Karpatne and V. Kumar, Adaptive heterogeneous ensemble learning using the context of test instances, IEEE International Conference on Data Mining (ICDM), 787–792, 2015. [DOI].  
[C4] A. Karpatne, A. Khandelwal, and V. Kumar, Ensemble learning methods for binary classification with multi-modality within the classes, SDM, (82) 730–738, 2015. [DOI].  
[C3] A. Karpatne, A. Khandelwal, S. Boriah, and V. Kumar, Predictive learning in the presence of heterogeneity and limited training data, SDM, (29) 253–261, 2014. [DOI].  
[C2] A. Karpatne, M. Blank, M. Lau, S. Boriah, K. Steinhaeuser, M. Steinbach, and V. Kumar, Importance of vegetation type in forest cover estimation, NASA Conference on Intelligent Data Understanding (CIDU), 71–78, 2012. [DOI].  
[C1] X. Chen*, A. Karpatne*, Y. Chamber*, V. Mithal, M. Lau, K. Steinhaeuser, S. Boriah, M. Steinbach, V. Kumar, C.S. Potter, S.A. Klooster, T. Abraham, J.D. Stanley, and J.C. Castilla-Rubio, A new data mining framework for forest fire mapping, CIDU, 104–111, 2012 (* equal contribution). [DOI].

Book Chapters

[BC2] A. Karpatne, A. Khandelwal, X. Chen, V. Mithal, J. Faghmous, and V. Kumar, Global monitoring of inland water dynamics: State-of-the-art, challenges, and opportunities, In Computational Sustainability, J. Lassig, K. Kersting, and K. Morik (Eds.), Springer, 121–147, 2016. [DOI].  
[BC1] A. Karpatne, J. Faghmous, J. Kawale, L. Styles, M. Blank, V. Mithal, X. Chen, A. Khandelwal, S. Boriah, K. Steinhaeuser, M. Steinbach, and V. Kumar, Earth science applications of sensor data, In Managing and Mining Sensor Data, C. Aggarwal (Ed.), Springer, 505–530, 2013. [DOI].

Peer-reviewed Workshop Proceedings

[W7] A. Karpatne and V. Kumar, Learning Physics-based Models in Hydrology under the Framework of Generative Adversarial Networks, American Geophysical Union (AGU) Fall Meeting, 2017.
[W6] A. Karpatne, W. Watkins, J. Read, and V. Kumar, Physics-guided Learning of Neural Networks: An Application in Lake Temperature Modeling, NIPS Workshop on Deep Learning for Physical Sciences, 2017.
[W5] A. Karpatne, H. Babaie, S. Ravela, V. Kumar, and I. Ebert-Uphoff, Machine Learning for the Geosciences: Opportunities, Challenges, and Implications for the ML process, SDM Workshop on Mining Big Data in Climate and Environment, 2017.
[W4] S. Gopal, A. Karpatne, and V. Kumar, Modeling the Food-Energy-Water Nexus in Critical Biodiverse Landscapes: A Case Study of Tonle Sap, Cambodia and Tulalip Tribe, USA, ACM KDD Workshop on Data Science for Food, Energy and Water, 2016 [Video].
[W3] A. Karpatne, A. Khandelwal, R. Anderson, M. Blank, S. Boriah, and V. Kumar, Group-specific local learning for global lake monitoring, Fourth International Workshop on Climate Informatics, 2014.
[W2] A. Karpatne, J. Faghmous, M. Blank, R. Anderson, S. Boriah, S. Liess, and V. Kumar, Understanding the Influence of Sea Surface Temperatures on Terrestrial Ecosystem Disturbances, Third International Workshop on Climate Informatics, 2013.
[W1] A. Karpatne, M. Blank, J. Middleton, S. Boriah, K. Steinhaeuser, M. Steinbach, S. Chatterjee, and V. Kumar, Understanding relationships between fire activity and sea surface temperature anomalies, American Geophysical Union (AGU) Fall Meeting, 2012.

Ph.D. Dissertation

Predictive Learning with Heterogeneity in Populations, University of Minnesota, 2017.  

Talks

Panel Discussions

[PD2] Theory-guided Data Science: A New Paradigm for Scientific Discovery, Panel Discussion at International Conference on Scientific and Statistical Database Management, June 29, 2017.
[PD1] Understanding and Narrowing Gaps Between Data Science and Mechanistic Theories in Physical Sciences, Panel Discussion at SDM Workshop on Mining Big Data in Climate and Environment, April 29, 2017.

Invited Talks

[T4] How Can Physics Inform Deep Learning Methods in Earth System Science?: Recent Progress and Future Prospects, Invited Keynote Talk at ICDM Workshop on Data Mining in Earth System Science, November 18, 2017.
[T3] Theory-guided Data Science: A New Paradigm for Scientific Discovery in the Era of Big Data, Invited Talk at American Institute of Chemical Engineers (AIChE) Annual Meeting, October 30, 2017.
[T2] Theory-guided Data Science: A New Paradigm for Advancing Scientific Discovery in the Big Data Era, Invited Seminar Talk at University of Notre Dame, February 9, 2017.
[T1] Global Monitoring of Inland Surface Water Dynamics Using Remote Sensing Data, 96th American Meteorological Society Annual Meeting, January 11–14, 2016.