Term Identification

The goal of this research is to explore statistical techniques for identifying terms and collocations in raw text and for determining the syntactic structure of noun-phrase terms.

Software

LogLikelihood for 3-grams
LogLikelihood for 4-grams
LogLikelihood for 5-grams
LogLikelihood Modeling for 3-grams
LogLikelihood Modeling for 4-grams

Presentations

Incorporating Ngram Statistics in the Normalization of Clinical Notes

Publications

Determining the Syntactic Structure of Medical Terms in Clinical Notes. Bridget T. McInnes, Ted Pedersen, and Serguei V. Pakhomov. In Proceedings of the BioNLP Workshop at ACL, June 29, 2007, Prague, Czech Republic. (paper: pdf; slides: ppt)

Resolving Structural Ambiguity of Medical Terms with Statistical Model Fitting. Serguei V. Pakhomov and Bridget T. McInnes. Linguistic Society of America (LSA) Presentation, 2005. (abstract: pdf)

Extending the Log Likelihood Measure to Improve Collocation Identification. Bridget Thomson McInnes. December 2004, University of Minnesota Duluth. (Master's thesis: ps)