Main»Home Page

Home Page

Retrieving pertinent information from biological scientific literature requires cutting-edge text mining methods which may be able to recognize the meaning of the very ambiguous names of biological entities. Aliases of a same gene share a common vocabulary in their respective collections of PubMed abstracts. This may be true even when these aliases are not associated with the same subset of documents. This gene-specific vocabulary defines a unique fingerprint that can be used to disclose ambiguous aliases. We present herein an original method for automatically assessing the ambiguity levels of gene aliases in large gene terminologies based exclusively in the content of their associated literature. The method can deal with the two major problems restricting the usage of current text mining tools: 1) different names associated with the same gene; and 2) one name associated with multiple genes, or even with non-gene meanings. Additionally, this method does not require training examples.