Research
The two major directions of research in the group are:
Computational Approaches for Protein Function Prediction [Poster describing this research]
Knowledge of the functions of proteins in various organisms is a crucial link in the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recently, the wide availability of a variety of high-throughput experimental genomic data has motivated the development of several computational techniques for producing hypotheses about protein function that can be verified in the laboratory. This poster describes our group's efforts to advance the state of the art in computational protein function prediction by applying data mining techniques to issues that have not been sufficiently addressed in previous work. These issues include effective pre-processing of genomic data sets to enhance their functional content, effective use of the arrangement of functional classes in hierarchies such as the Gene Ontology to reflect the inter-relationships between them, and the combination of heterogeneous data sources to provide richer input data to function prediction algorithms. Our results using interaction networks, gene expression data, and phylogenetic profiles indicate that these efforts are indeed useful for producing more accurate hypotheses about protein function and functional relationships that can be verified effectively by functional genomics research.
Data Mining for Connecting Disease Characteristics with Genomic and Phenotypic Factors [Poster describing this research]
The recent availability of electronic medical records and individual genomic information has created the possibility of using this combined patient information to discover important connections between the phenotypic expression of disease (signs, symptoms, laboratory tests, and images) and genetic factors (DNA sequence, gene copy number, epigenetics). Current techniques for discovering such connections have a number of limitations and are sometimes unsuitable for the noisy, high-dimensional data sets that will contain this new wealth of phenotypic and genomic patient data. We collaborate with several researchers to develop data mining techniques for the following important healthcare problems:
- Prediction of the Degree of Liver Fibrosis (Collaboration with Mayo Clinic, Rochester, MN)
- Characterization of Missed Neoplasia during Colonoscopy (Collaboration with Mayo Clinic, Rochester, MN)
- Connecting Disease and Genomic/Medical Characteristics (Collaboration with IBM Rochester and IBM T. J. Watson Research Center)