Term Identification

The goal of this research is to explore statistical techniques for identifying terms and collocations in raw text and for determining the syntactic structure of noun-phrase terms.

Software

LogLikelihood for 3-grams
LogLikelihood for 4-grams
LogLikelihood for 5-grams
LogLikelihood Modeling for 3-grams
LogLikelihood Modeling for 4-grams
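
As an illustration of the kind of statistic these packages are named for, the following is a minimal sketch of the log-likelihood ratio (G^2) for a 3-gram. It assumes the full-independence model, in which the three words are expected to occur independently of one another; the software listed above may fit other models, and the function name `trigram_g2` and its input format are hypothetical, chosen only for this example.

```python
import math
from collections import Counter
from itertools import product


def trigram_g2(trigrams, target):
    """Log-likelihood ratio (G^2) for `target` under full independence.

    `trigrams` is a list of (w1, w2, w3) tuples observed in a corpus and
    `target` is the 3-gram being scored. Both are assumptions made for
    this sketch, not the interface of any listed package.
    """
    w1, w2, w3 = target
    total = len(trigrams)

    # Observed 2x2x2 contingency table: index 1 means the word in that
    # position matches the target word, index 0 means it does not.
    observed = Counter(
        (int(a == w1), int(b == w2), int(c == w3)) for a, b, c in trigrams
    )

    # Marginal totals for each of the three positions.
    m1 = [sum(n for (i, _, _), n in observed.items() if i == x) for x in (0, 1)]
    m2 = [sum(n for (_, j, _), n in observed.items() if j == x) for x in (0, 1)]
    m3 = [sum(n for (_, _, k), n in observed.items() if k == x) for x in (0, 1)]

    g2 = 0.0
    for i, j, k in product((0, 1), repeat=3):
        n = observed[(i, j, k)]
        if n == 0:
            continue  # empty cells contribute nothing to the sum
        # Expected count if the three positions were independent.
        expected = m1[i] * m2[j] * m3[k] / total ** 2
        g2 += n * math.log(n / expected)
    return 2.0 * g2
```

A higher score means the observed counts depart further from what independence would predict, which is the usual basis for ranking candidate collocations.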

Presentations

Incorporating Ngram Statistics in the Normalization of Clinical Notes

Publications

Determining the Syntactic Structure of Medical Terms in Clinical Notes. Bridget T. McInnes, Ted Pedersen, and Serguei V. Pakhomov. In Proceedings of the BioNLP Workshop at ACL, June 29, 2007, Prague, Czech Republic. (slides: ppt)

Resolving Structural Ambiguity of Medical Terms with Statistical Model Fitting. Serguei V. Pakhomov and Bridget T. McInnes. Linguistic Society of America (LSA) Presentation, 2005.

Extending the Log Likelihood Measure to Improve Collocation Identification. Bridget Thomson McInnes. Master of Science Thesis. Department of Computer Science, University of Minnesota, Duluth, December 2004.

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.