Combining Biological Databases and Text Mining to support New Bioinformatics Applications

Alicante, Spain

Abstract

A large amount of biological knowledge today is only available from full-text research papers. Since neither manual database curators nor users can keep up with the rapidly expanding volume of scientific literature, natural language processing approaches are becoming increasingly important for bioinformatic projects.

In this paper, we go beyond simply extracting information from full-text articles by describing an architecture that supports targeted access to information from biological databases using the results derived from text mining of research papers, thereby integrating information from both sources within a biological application.

The described architecture is currently being used to extract information about protein mutations from full-text research papers. Text mining results drive the retrieval of sequence information from protein databases and the employment of algorithmic sequence analysis tools, which facilitate further data access from protein structure databases. Complex mapping of NLP-derived text annotations to protein structures allows the rendering, with 3D structure visualization, of information not available in databases of mutation annotations.
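To make the idea of linking text-mined mutations to sequence data more concrete, here is a minimal sketch. It is not the system described in the paper: the regular expression, the function names, the example sentence, and the toy sequence are all assumptions made for illustration. It only shows how a point mutation mentioned in text (written as wild-type residue, position, mutant residue) could be checked against a protein sequence before being mapped any further.

```python
import re

# Hypothetical pattern for point mutations written as
# <wild-type residue><position><mutant residue>, e.g. "N5D".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def extract_mutations(text):
    """Return (wild_type, position, mutant) tuples found in the text."""
    return [(w, int(p), m) for w, p, m in MUTATION.findall(text)]


def consistent_with_sequence(mutation, sequence):
    """Check that the wild-type residue occupies the 1-based position."""
    wild_type, position, _mutant = mutation
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type


# Made-up sentence and sequence, for illustration only.
sentence = "The N5D variant retained activity, whereas W3A and G2S did not."
toy_sequence = "MKWANQ"

mutations = extract_mutations(sentence)
# Keep only mentions whose wild-type residue agrees with the sequence.
grounded = [m for m in mutations if consistent_with_sequence(m, toy_sequence)]
```

In this toy case, G2S is dropped because position 2 of the made-up sequence holds K rather than G. A real pipeline would retrieve candidate sequences from a protein database and use sequence analysis tools rather than a fixed string.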

Reference

René Witte and Christopher J. O. Baker, Combining Biological Databases and Text Mining to support New Bioinformatics Applications. 10th International Conference on Applications of Natural Language to Information Systems (NLDB 2005), Springer LNCS 3513, pp. 310-321, June 15-17, 2005, Alicante, Spain. DOI: 10.1007/11428817_28

BibTeX entry (also for download):

@InProceedings{witte-baker-nldb05,
  author = 	 {Ren{\'{e}} Witte and Christopher J.\,O. Baker},
  title = 	 {{Combining Biological Databases and Text Mining to 
                 support New Bioinformatics Applications}},
  booktitle =	 {Natural Language Processing and Information Systems: 
                 10th International Conference on Applications of 
                 Natural Language to Information Systems (NLDB 2005)},
  pages =	 {310--321},
  year =	 {2005},
  volume =	 {3513},
  series =	 {LNCS},
  address =	 {Alicante, Spain},
  month =	 {June 15--17},
  publisher =	 {Springer-Verlag},
  note =	 {\url{http://dx.doi.org/10.1007/11428817_28}}
}

You can also visit the conference website.

Software

For downloading our open source software, please refer to the successor project, Open Mutation Miner (OMM).

Download

Official web page at SpringerLink, DOI: 10.1007/11428817_28.
In: Springer Lecture Notes in Computer Science, LNCS 3513.
Local copy: witte_baker_nldb05.pdf.
MD5 checksum: acadfef13e019c9998d2ba392f87ccf4
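
The checksum above can be compared against a downloaded copy. The following is a minimal sketch using Python's standard `hashlib` module. The file name is taken from the listing above and is assumed to be in the current directory; the function names are illustrative.

```python
import hashlib
import os

# Checksum as listed on this page.
EXPECTED_MD5 = "acadfef13e019c9998d2ba392f87ccf4"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


# Assumed location of the downloaded copy; nothing happens if it is absent.
pdf_path = "witte_baker_nldb05.pdf"
if os.path.exists(pdf_path):
    print("checksum OK" if matches_expected(pdf_path) else "checksum MISMATCH")
```

MD5 is suitable here only for detecting accidental corruption during download, not as a security guarantee.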

Copyright © 2005 Springer-Verlag. This is the author's version of the work. It is posted here by permission of Springer for your personal use. Not for redistribution.