Enhanced Semantic Access to the Protein Engineering Literature using Ontologies Populated by Text Mining

Abstract

The biomedical literature is growing at an ever-increasing rate, which heightens the need to support scientists with advanced, automated means of accessing knowledge. We investigate a novel approach employing description logics (DL)-based queries posed to formal ontologies that have been created from the results of text mining full-text research papers. In this paradigm, an OWL-DL ontology is populated with instances detected through natural language processing (NLP). The generated ontology can be queried by biologists using DL reasoners or integrated into bioinformatics workflows for further automated analyses. We demonstrate the feasibility of this approach with a system targeting the protein mutation literature.
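To make the "populate, then query" idea concrete, here is a toy sketch in plain Python. It is not the authors' system and does not use OWL or a real DL reasoner: the class names, the small subclass table, the organism list, and the single-letter mutation pattern (e.g. "D101N") are all hypothetical assumptions chosen for illustration. Text-mined mentions become instances of asserted classes, and a query for a class also returns instances of its subclasses, a very small stand-in for the subsumption reasoning a DL reasoner performs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy class hierarchy (child -> parent); NOT the paper's ontology.
SUBCLASS_OF = {
    "PointMutation": "Mutation",
    "Mutation": "BiologicalEntity",
    "Organism": "BiologicalEntity",
}

# Hypothetical pattern for single-letter point-mutation mentions such as "D101N":
# wild-type residue, sequence position, mutant residue (20 standard amino acids).
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")

# Hypothetical gazetteer of organism names to look up in the text.
ORGANISMS = ["Escherichia coli", "Bacillus subtilis"]


@dataclass
class ToyOntology:
    # instance name -> most specific asserted class
    instances: dict = field(default_factory=dict)

    def populate_from_text(self, text: str) -> None:
        """Add instances for mutation and organism mentions found in text."""
        for match in MUTATION_RE.finditer(text):
            self.instances[match.group(0)] = "PointMutation"
        for name in ORGANISMS:
            if name in text:
                self.instances[name] = "Organism"

    def superclasses(self, cls: str) -> list:
        """Return cls and all its ancestors by following SUBCLASS_OF."""
        chain = [cls]
        while cls in SUBCLASS_OF:
            cls = SUBCLASS_OF[cls]
            chain.append(cls)
        return chain

    def instances_of(self, cls: str) -> list:
        """Return (sorted) instances whose asserted class is cls or a subclass of it."""
        return sorted(
            name for name, asserted in self.instances.items()
            if cls in self.superclasses(asserted)
        )
```

For example, after populating from "In Escherichia coli, mutants D101N and W106A reduced activity.", asking for `Mutation` returns both mutation mentions even though they were only asserted as `PointMutation`. A real system would instead assert individuals into an OWL-DL ontology and leave such inferences to a reasoner.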

Keywords: text mining; semantic web; ontological NLP; protein mutations; automated reasoning in bioinformatics; querying OWL-DL ontologies; description logics.

Screenshot: An automatically populated organism entity in SWOOP

Reference

Witte, R., Kappler, T. and Baker, C.J.O. (2007) ‘Enhanced semantic access to the protein engineering literature using ontologies populated by text mining’, Int. J. Bioinformatics Research and Applications, Vol. 3, No. 3, pp.389–413.

BibTeX entry:

@Article{WKB_IJBRA2007,
  author =  {Ren{\'{e}} Witte and Thomas Kappler and Christopher J. O. Baker},
  title =   {Enhanced semantic access to the protein engineering literature 
             using ontologies populated by text mining},
  journal = {Int.\ J.\ Bioinformatics Research and Applications (IJBRA)},
  year =    {2007},
  volume =  {3},
  number =  {3},
  pages =   {389--413},
  note =    {PMID: 18048198}
}


Software

For downloading our open source software, please refer to the successor project, Open Mutation Miner (OMM).

Download

Postprint: witte_etal_ijbra2007.pdf
MD5 checksum: 4fc8775371b2fe7a64aac4e73a2e5b53
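The checksum above can be used to confirm that the downloaded PDF arrived intact. A minimal sketch using Python's standard `hashlib` module follows; the function names `md5_of_file` and `verify` are illustrative, and the local file path is an assumption about where the download was saved.

```python
import hashlib

# Checksum published on this page for witte_etal_ijbra2007.pdf.
EXPECTED_MD5 = "4fc8775371b2fe7a64aac4e73a2e5b53"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

Usage: `verify("witte_etal_ijbra2007.pdf")` returns `True` if the local copy matches the published checksum. MD5 is adequate for detecting accidental corruption during download, but it is not a defence against deliberate tampering.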

Copyright © 2007 Inderscience Enterprises Ltd. This is the authors' postprint version of the work. It is posted here by permission of Inderscience Publishers for your personal use. Not for redistribution. The definitive version was published in the International Journal of Bioinformatics Research and Applications (IJBRA), DOI: 10.1504/IJBRA.2007.015009