
Research Projects:

Discovery of Error-Tolerant Patterns from Binary and Real-valued Data

Because traditional frequent pattern mining algorithms work only with binary (boolean) attributes, real-valued attributes must first be transformed into binary ones, which often results in loss of information. We developed a novel Error-Tolerant Frequent Itemset (ETFI) algorithm for binary as well as real-valued data, which sequentially discovers all ETFIs in a bottom-up fashion from any real-valued data set.
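To make the information-loss point concrete, here is a minimal sketch of threshold-based binarization. The function name `binarize`, the threshold of 0.5, and the sample values are illustrative assumptions, not part of the ETFI work; the sketch only shows why binarizing real-valued data can hide differences between rows.

```python
def binarize(row, threshold):
    """Map each real value to 1 if it meets the threshold, else 0."""
    return [1 if value >= threshold else 0 for value in row]

# Hypothetical real-valued rows (for example, normalized measurements).
sample_1 = [0.51, 0.95, 0.10]
sample_2 = [0.99, 0.52, 0.49]

binary_1 = binarize(sample_1, threshold=0.5)
binary_2 = binarize(sample_2, threshold=0.5)

# Both rows collapse to the same binary pattern even though the
# underlying values differ substantially.
print(binary_1, binary_2)
```

After binarization the two rows are indistinguishable, which is the kind of loss that motivates mining patterns from the real values directly.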

Quantitative Evaluation of Error-Tolerant Pattern Mining Algorithms

Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life data sets, this limits the recovery of frequent patterns as they are fragmented due to random noise and other errors in the data. We implemented a suite of algorithms to discover approximate frequent itemsets in the presence of noise.
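The difference between strict and approximate support can be sketched as follows. This is a simplified illustration under stated assumptions, not one of the algorithms in the suite: the function names, the `max_missing` parameter, and the toy transactions are hypothetical, and the relaxation shown here (allowing a fixed number of missing items per transaction) is just one simple way to tolerate errors.

```python
def strict_support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def tolerant_support(transactions, itemset, max_missing):
    """Fraction of transactions missing at most `max_missing` items of the itemset."""
    hits = sum(1 for t in transactions if len(itemset - t) <= max_missing)
    return hits / len(transactions)

# Toy data: the pattern {a, b, c} is fragmented by "noise" that drops one item.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"d"}),
]
itemset = frozenset({"a", "b", "c"})

strict = strict_support(transactions, itemset)
tolerant = tolerant_support(transactions, itemset, max_missing=1)
print(strict, tolerant)
```

Under the strict definition only one of the five transactions supports the itemset, while allowing a single missing item per transaction recovers four of the five.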

Pattern Mining based Integrative Biomarker Discovery Using Gene Expression and Protein-protein Interaction Network

Most complex biological problems, such as biomarker discovery and protein function prediction, require more information than any individual biological data source provides. For example, biomarkers or functional modules identified using information from both gene expression and protein-protein interaction data are more reliable and biologically plausible than those obtained from individual data sources. This is because both gene expression data and protein interaction data are noisy, so a set of genes that are co-expressed and also physically interact with each other is more likely to be significant and biologically relevant. We developed an association mining based framework to perform an integrated analysis of microarray gene-expression data and protein-protein interaction data in order to efficiently discover active sub-network based biomarkers.

Discovery of Quality Markers of Colonoscopy for Detection of Colorectal Cancer: Characterization of Missed Neoplasia (In Collaboration with Mayo Clinic Rochester)

Colonoscopy is the most accepted screening method for the detection of colorectal cancer or its precursor lesions, colorectal polyps. Although colonoscopy has contributed to a decline in the number of colorectal cancer related deaths, not all cancers or large polyps are detected at the time of colonoscopy. Hence, it is important to develop methods to understand and quantify the causes of these omissions. In the past, researchers have studied the effect of various factors (mostly one at a time) in predicting adenomas during colonoscopy. For example, it was recently shown that the endoscopist performing the procedure is a more powerful predictor of adenoma detection during colonoscopy than the age or gender of the patient, which were earlier considered to be the most powerful predictors. The goal of our data mining effort is to systematically identify all the factors (not just the known ones) and, more importantly, combinations of them that may result in missed adenomas and polyps during colonoscopy.

Discovery and Characteristics of Patients with Idiosyncratic Drug-Induced Liver Injury (In Collaboration with Mayo Clinic Rochester)

The most common reason for new drugs failing the final clinical trials for FDA approval is liver injury. Idiosyncratic drug-induced liver injury, as its name suggests, cannot at present be predicted. Thus, identifying which patients are more likely to develop an idiosyncratic drug reaction would be a major scientific achievement. It is hypothesized that there are identifiable clinical, environmental, and genetic differences between patients who have had such a reaction and those who have not. The goal is to develop data mining based techniques to identify patients with drug-induced liver injury and then determine whether these patients have one or more inherited or acquired genome-based susceptibility factors that result in drug-induced liver injury.

Data Mining for Prediction of Degree of Liver Fibrosis (In Collaboration with Mayo Clinic Rochester)

Liver cirrhosis is a common lethal disease that is most often caused by alcoholism and viral hepatitis. Life expectancy is greatly influenced by the degree of liver fibrosis, which in most fibrosis classifications is scored from F0 (no fibrosis) to F4 (liver cirrhosis). Currently, the most accurate way of measuring fibrosis is liver biopsy. However, because it is invasive, liver biopsy is not performed frequently, and physicians instead rely on less accurate laboratory tests, with their inherent deficiencies. Therefore, the goal of this work is to develop and apply data mining based techniques to first identify the right features (combinations of commonly available laboratory tests, obtained over time during routine patient care), and then use them to predict liver cirrhosis and hepatocellular cancer at an early stage. It is believed that such techniques hold the promise of empowering physicians to improve diagnostic processes without the need for invasive procedures.

Unsupervised Techniques for Finding Overlapping Co-clusters in the Data

Overlapping co-clustering is an interesting clustering problem that is of immense use in several real-life domains, such as gene expression data, documents, and movie recommender systems, where overlapping co-clusters are desired. Most current approaches deal with either the co-clustering aspect or the overlapping aspect of this problem, but not both. We explored two unsupervised learning approaches, a frequent pattern mining based approach and an alternate minimization based approach, to generate overlapping co-clusters in a given data matrix. Our primary focus is to apply these approaches to gene expression data. We performed experiments on both synthetic and real gene-expression datasets to show the correctness of the algorithms and the applicability of the proposed approaches to microarray data analysis.



The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.