Data mining for biomedical informatics at the UMN

Research

The two major directions of research in the group are:

Data Mining for Connecting Disease Characteristics with Genomic and Clinical Factors

The recent availability of individual genomic, proteomic and metabolomic information and electronic medical records has created the possibility of using this combined patient information to discover important connections between the phenotypic expression of disease (signs, symptoms, laboratory tests and images) and genetic factors (DNA sequence, gene copy number, epigenetics, gene expression, metabolomics, etc.). Current techniques used to discover such connections have a number of limitations and are sometimes unsuitable for the noisy, high-dimensional data sets that will hold this new wealth of clinical and genomic patient data. We collaborate with several researchers to develop data mining techniques for the following important healthcare problems.

Collaborators

Computational Approaches for Protein Function Prediction [Poster describing this research]

Knowledge of the functions of proteins in various organisms is a crucial link in the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recently, the wide availability of a variety of high-throughput experimental genomic data has motivated the development of several computational techniques for producing hypotheses about protein function that can be verified in the laboratory. This poster describes our group's efforts in advancing the state of the art in computational protein function prediction by applying data mining techniques to issues that have not been sufficiently addressed in previous work. These issues include effective pre-processing of genomic data sets to enhance their functional content, effective use of the hierarchical arrangement of functional classes, such as in the Gene Ontology, to reflect the inter-relationships between them, and the combination of heterogeneous data sources to provide richer input to function prediction algorithms. Our results using interaction networks, gene expression data and phylogenetic profiles indicate that these efforts are indeed useful for producing more accurate hypotheses about protein function and functional relationships that can be verified effectively by functional genomics research.
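As a rough illustration of two of the ideas above, combining heterogeneous data sources and respecting a functional hierarchy such as the Gene Ontology, here is a minimal Python sketch. It is not the group's method. It assumes a simple weighted-sum combination of networks, a weighted neighbour-vote ("guilt by association") predictor, and a toy hierarchy with made-up term names. The functions `ancestors`, `propagate`, `combine_networks` and `score_functions`, and all protein and term names, are hypothetical.

```python
from collections import defaultdict

# Toy GO-like hierarchy: term -> list of parent terms.
# HYPOTHETICAL term names, not taken from the real Gene Ontology.
PARENTS = {
    "kinase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
    "transporter_activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents):
    """Return the term together with every ancestor reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(annotations, parents):
    """True-path rule: a protein annotated with a term also carries its ancestors."""
    full = {}
    for protein, terms in annotations.items():
        closed = set()
        for t in terms:
            closed |= ancestors(t, parents)
        full[protein] = closed
    return full


def combine_networks(networks, weights):
    """Merge several weighted protein networks by a weighted sum of edge scores."""
    combined = defaultdict(float)
    for net, w in zip(networks, weights):
        for (a, b), score in net.items():
            combined[tuple(sorted((a, b)))] += w * score  # undirected edge key
    return dict(combined)


def score_functions(protein, network, annotations):
    """Weighted neighbour vote: fraction of edge weight pointing at each term."""
    scores, total = defaultdict(float), 0.0
    for (a, b), w in network.items():
        if protein not in (a, b):
            continue
        neighbour = b if a == protein else a
        total += w
        for term in annotations.get(neighbour, ()):
            scores[term] += w
    return {t: s / total for t, s in scores.items()} if total else {}


if __name__ == "__main__":
    # Two hypothetical evidence sources over the same proteins.
    ppi = {("P1", "P2"): 1.0, ("P1", "P3"): 1.0}
    coexpression = {("P2", "P1"): 0.8, ("P1", "P4"): 0.6}
    known = propagate(
        {"P2": {"kinase_activity"}, "P3": {"transporter_activity"},
         "P4": {"kinase_activity"}},
        PARENTS,
    )
    network = combine_networks([ppi, coexpression], [0.5, 0.5])
    ranked = sorted(score_functions("P1", network, known).items(),
                    key=lambda kv: -kv[1])
    for term, s in ranked:
        print(f"{term}: {s:.2f}")
```

Because annotations are propagated up the hierarchy before voting, any neighbour that supports a specific term also supports all of its ancestors. In this sketch, an ancestor term therefore never scores lower than its descendants, which keeps the predictions consistent with the hierarchy. The equal source weights are an arbitrary choice for illustration.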

Collaborators