Bio-Anomaly Detection and Preventing DNS DDoS Amplification Attacks

Will post about research and materials soon.

Team: Senior Research Scientist Dr. Siva Rajagopalan, Research Scientist Dr. Jun Huh Ho, and Researcher Henrik Holmes at Honeywell; Professor Dr. Nina Fefferman and Research Scientist Dr. Ali Hamieh at Rutgers University; and, of course, me :)

Read the Web

Never Ending Language Learning (NELL) was my first project in machine learning. It was part of my summer internship programme, funded by the governments of India and Brazil. In summer 2011, I started working in the MaLL Lab of the Federal University of Sao Carlos, Brazil. It was quite exciting, as it was something new that I had never experienced before: the fun of doing research. On the first day, my adviser, Prof. Estevam Hruschka, introduced me to the world of machine learning and briefly explained one of its applications, NELL. It is a machine learning system running at CMU that extracts structured information from unstructured web pages. If successful, this will result in a knowledge base (i.e., a relational database) of structured information that mirrors the content of the Web.

We consider the problem of semi-supervised learning to extract category instances (e.g., country(USA), city(New York)) from web pages, starting with a handful of labeled training examples for each category or relation, plus hundreds of millions of unlabeled web documents. We believe that this problem can be addressed by simultaneously learning classifiers for many different categories and relations with a new approach, the Coupled Bayesian Sets algorithm. It is based on Bayesian Sets and relies on an ontology that defines constraints coupling the training of these classifiers. Experimental results show that simultaneously learning a coupled collection of classifiers yields much more accurate extractions than training classifiers with the original Bayesian Sets algorithm, Naive Bayes, BaS-all, or Coupled Pattern Learner (the category extractor used in NELL). You can get the latest updates here. You can also download our work on NELL.

Download Paper Download ECML PPT
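
To make the idea concrete, here is a minimal sketch of the original Bayesian Sets scoring rule for binary features, the baseline that Coupled Bayesian Sets builds on. It is not the coupled algorithm from our paper: the ontology constraints are left out entirely. The toy noun phrases, the context patterns, and the prior scale `c=2.0` are assumptions chosen only for illustration.

```python
import math


def bayesian_sets_scores(items, seed_ids, c=2.0):
    """Score every item by how well it fits the set of seed items.

    items: dict mapping an item name to the set of binary features it has.
    seed_ids: names of the labeled seed examples for one category.
    c: scale of the Beta prior (an assumed default, not taken from the paper).
    """
    features = sorted(set().union(*items.values()))
    n_items = len(items)
    n_seeds = len(seed_ids)
    const = 0.0
    weights = {}
    for f in features:
        # Prior mean is the empirical frequency of the feature, clamped so
        # that both Beta hyperparameters stay strictly positive.
        mean = sum(f in feats for feats in items.values()) / n_items
        mean = min(max(mean, 1e-6), 1 - 1e-6)
        alpha, beta = c * mean, c * (1 - mean)
        seed_count = sum(f in items[s] for s in seed_ids)
        alpha_post = alpha + seed_count
        beta_post = beta + n_seeds - seed_count
        const += (math.log(alpha + beta) - math.log(alpha + beta + n_seeds)
                  + math.log(beta_post) - math.log(beta))
        weights[f] = (math.log(alpha_post) - math.log(alpha)
                      - math.log(beta_post) + math.log(beta))
    # The log score is linear in the binary feature vector of each item.
    return {name: const + sum(weights[f] for f in feats)
            for name, feats in items.items()}


# Hypothetical noun phrases with the context patterns they occur in.
toy_items = {
    "Brazil": {"countries such as _", "capital of _"},
    "India": {"countries such as _", "capital of _"},
    "France": {"countries such as _", "capital of _", "visa for _"},
    "Paris": {"city of _", "mayor of _"},
    "Tokyo": {"city of _", "mayor of _", "visa for _"},
}
scores = bayesian_sets_scores(toy_items, ["Brazil", "India"])
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

Items that share context patterns with the seeds rise to the top. In a coupled setting, one such scorer per category would additionally be constrained by the ontology (for example, mutually exclusive categories), which this sketch does not attempt.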

Personalize Expedia Hotel Searches

We picked Personalize Expedia Hotel Searches - ICDM 2013 as our machine learning course project. The project explores learning-to-rank problems in machine learning. Learning to Rank (LeToR) is an important class of machine learning problems that focuses on finding an optimal ordering of documents as a function of a user query. In this project we consider hotel search and click-through data provided by the popular travel website, with the goal of developing a model that returns a list of hotels ranked by likelihood of customer purchase. We use modified logistic regression and random forest as the frameworks for our ranking model. Both models gave satisfactory results but, regrettably, could not match the performance of the winning algorithm. To my knowledge, that was LambdaMART, the same algorithm that won the Yahoo! 2010 Learning to Rank Challenge. So, if you have a LeToR problem in hand, begin with it. Download the project report from here.

Download Report
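
As a rough illustration of the ranking step described above, the sketch below orders the hotels within each search by a predicted purchase probability and scores the ordering with NDCG@k. The probabilities are assumed to come from some already-trained model (logistic regression, random forest, or anything else). The ids, the scores, the relevance grades, and the choice of NDCG as the metric are illustrative assumptions, not values from our report.

```python
import math
from collections import defaultdict


def rank_by_score(rows):
    """Group (search_id, hotel_id, score) rows and sort each group by score."""
    by_search = defaultdict(list)
    for search_id, hotel_id, score in rows:
        by_search[search_id].append((score, hotel_id))
    return {
        search_id: [hotel for _, hotel in sorted(pairs, key=lambda p: -p[0])]
        for search_id, pairs in by_search.items()
    }


def ndcg_at_k(ranked_relevance, k):
    """NDCG@k for a list of relevance grades given in ranked order."""
    def dcg(grades):
        return sum((2 ** g - 1) / math.log2(pos + 2)
                   for pos, g in enumerate(grades[:k]))

    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0


# Hypothetical model outputs for a single search with three hotels.
predictions = [(1, "A", 0.2), (1, "B", 0.7), (1, "C", 0.5)]
# Hypothetical relevance grades: a booking counts more than a click.
relevance = {"A": 0, "B": 5, "C": 1}

ranking = rank_by_score(predictions)
grades = [relevance[hotel] for hotel in ranking[1]]
print(ranking[1], ndcg_at_k(grades, k=3))
```

This is a pointwise setup: the model scores each hotel independently and the ranking falls out of a sort. LambdaMART instead optimizes a ranking metric more directly, which is one plausible reason it does better on this kind of task.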

Africa Soil Property Prediction Challenge

This is another course project that I enjoyed working on. It deals with developing regression models for predicting the physical and chemical properties of soil from infrared spectral measurements, so it involves modeling high-dimensional data together with domain knowledge of the infrared spectrum. Multiple versions of SVM and random forest were considered, with an appropriate feature selection process. One thing I learnt is that, to avoid disparities between training and testing error, your model must generalize well, and sometimes the K-fold cross-validation error is not sufficient to reveal such disparities. Still, it remains the most promising method for estimating the test error.

Download Report
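
For readers new to K-fold cross-validation, here is a minimal sketch of the procedure mentioned above. It uses a one-dimensional least-squares line as a stand-in model and synthetic data. The real project used SVM and random forest on spectral features, so the model, the data, and the RMSE metric here are assumptions for illustration only.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_rmse(xs, ys, fit, predict, k=5, seed=0):
    """Average RMSE over k folds, each held out once for validation."""
    fold_rmse = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / k


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic, noise-free data: the fitted line should recover it closely.
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_error = cross_val_rmse(xs, ys, fit_line, predict_line, k=5)
print(cv_error)
```

The caveat from the text still applies: if the test samples differ systematically from the training samples, a random K-fold split like this one can report a lower error than the model will see on the test set.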