
Towards a Systematic Evaluation of Protein Mutation Extraction Systems

Abstract

The development of text analysis systems targeting the extraction of information about mutations from research publications is an emerging topic in biomedical research. Current systems differ in both scope and approach, which prevents a meaningful comparison of their performance and, therefore, the identification of possible synergies. To overcome this "evaluation bottleneck," we developed a comprehensive framework for the systematic analysis of mutation extraction systems, precisely defining tasks and corresponding evaluation metrics that will allow a comparison of existing and future applications.

Keywords: mutation extraction systems; mutation evaluation tasks; mutation evaluation metrics
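
As a rough illustration of the kind of precisely defined evaluation metrics such a framework provides, the sketch below scores one hypothetical extraction run against a gold standard using precision, recall, and F-measure over normalized point-mutation mentions (e.g., "A123V"). The notation and both mention sets are invented for illustration and are not taken from the paper.

    # Minimal sketch: scoring an extraction run against a gold standard.
    # The mutation normalization (wild type, position, mutant residue)
    # is a common convention, not necessarily any given system's.

    def score(gold: set[str], extracted: set[str]) -> tuple[float, float, float]:
        """Return (precision, recall, F1) for normalized mutation mentions."""
        tp = len(gold & extracted)                 # correctly extracted mutations
        precision = tp / len(extracted) if extracted else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # Hypothetical example: gold annotations vs. one system's output.
    gold = {"A123V", "R56Q", "G77D"}
    extracted = {"A123V", "R56Q", "T99S"}
    print(score(gold, extracted))  # (0.666..., 0.666..., 0.666...)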


Task-Dependent Visualization of Coreference Resolution Results

A single coreference chain visualized as a Topic Map

Abstract

Graphical visualizations of coreference chains support a system developer in analyzing the behavior of a resolution algorithm. In this paper, we state explicit use cases for coreference chain visualizations and show how they can be resolved by transforming chains into other, standardized data formats, namely Topic Maps and Ontologies.
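
As a minimal sketch of the transformation idea (assuming a simplified, XTM-style vocabulary rather than the paper's actual mapping), the code below renders one coreference chain as a topic whose mentions become occurrences:

    import xml.etree.ElementTree as ET

    def chain_to_topic_map(chain_id: str, mentions: list[tuple[str, int]]) -> str:
        """Render one coreference chain as a simplified XTM-style topic."""
        tm = ET.Element("topicMap")
        topic = ET.SubElement(tm, "topic", id=chain_id)
        # Use the first mention as the topic's base name.
        name = ET.SubElement(topic, "baseName")
        ET.SubElement(name, "baseNameString").text = mentions[0][0]
        for surface, offset in mentions:
            occ = ET.SubElement(topic, "occurrence")
            ET.SubElement(occ, "resourceData").text = f"{surface}@{offset}"
        return ET.tostring(tm, encoding="unicode")

    # Hypothetical chain: three mentions of the same entity.
    print(chain_to_topic_map("chain-1",
                             [("the prime minister", 12),
                              ("she", 58),
                              ("Ms. Smith", 104)]))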

Processing of Beliefs extracted from Reported Speech in Newspaper Articles

A fuzzy believer?

Abstract

The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on one topic. The goal of our artificial believer system presented in this paper is to extract and analyze statements of opinion from newspaper articles.

Beliefs are modeled using a fuzzy-theoretic approach applied after NLP-based information extraction. A fuzzy believer models a human agent, deciding what statements to believe or reject based on different, configurable strategies.
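
The toy sketch below illustrates the flavor of such configurable strategies: each statement carries fuzzy degrees of belief contributed by different sources, and a strategy aggregates them to decide acceptance. The aggregation operators and thresholds are illustrative choices, not those of the paper.

    # Toy fuzzy-believer sketch: degrees in [0, 1] express how strongly
    # the modeled agent believes a statement reported in the news.

    def believe_majority(degrees: list[float], threshold: float = 0.5) -> bool:
        """Accept a statement if the average reported support exceeds a threshold."""
        return sum(degrees) / len(degrees) > threshold

    def believe_certain(degrees: list[float]) -> bool:
        """A cautious strategy: accept only if every source strongly supports it."""
        return min(degrees) > 0.8   # min acts as a fuzzy AND over sources

    # Hypothetical statement supported by three sources with varying strength.
    support = [0.9, 0.6, 0.7]
    print(believe_majority(support))  # True  -- average 0.733 > 0.5
    print(believe_certain(support))   # False -- weakest source only at 0.6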

Next-Generation Summarization: Contrastive, Focused, and Update Summaries

Conference Hotel, Borovets, Bulgaria

Abstract

Classical multi-document summaries focus on the common topics of a document set and omit distinctive themes particular to a single document—thereby often suppressing precisely that kind of information a user might need for a specific task. This can be avoided through advanced multi-document summaries that take a user's context and history into account, by delivering focused, contrastive, or update summaries. To facilitate the generation of these different summaries, we propose to generate all types from a single data structure, topic clusters, which provide for an abstract representation of a set of documents. Evaluations carried out on five years' worth of data from the DUC summarization competition prove the feasibility of this approach.
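
A rough sketch of how a single topic-cluster structure could serve all three summary types: sentences grouped by subtopic and tagged with source document and date, then filtered differently per task. The representation and the selection rules are simplifications for illustration, not the paper's algorithms.

    from dataclasses import dataclass

    @dataclass
    class Sentence:
        text: str
        doc: str        # source document id
        date: int       # publication date (e.g., days since epoch)

    # A topic cluster groups sentences about one subtopic across documents.
    TopicCluster = list[Sentence]

    def focused(clusters, query_terms):
        """Keep clusters that mention the user's query terms."""
        return [c for c in clusters
                if any(t in s.text.lower() for s in c for t in query_terms)]

    def contrastive(clusters, doc):
        """Keep clusters whose sentences come from a single document only."""
        return [c for c in clusters if {s.doc for s in c} == {doc}]

    def update(clusters, last_read):
        """Keep clusters containing material newer than what the user has seen."""
        return [c for c in clusters if max(s.date for s in c) > last_read]

The point of the shared structure is that only the filter changes per task; the downstream sentence selection and ordering can stay identical.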

Connecting Wikis and Natural Language Processing Systems

Palais des Congrès, Montréal, Canada

Abstract

We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results from an initial implementation based on the GATE framework for NLP and the MediaWiki system.

General Terms: Design, Human Factors, Languages
Keywords: Self-aware Wiki System, Wiki/NLP Integration
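
As a minimal sketch of the NLP-facing half of such an integration, the code below retrieves a page's wikitext through the standard MediaWiki web API and hands it to a stubbed-out analysis step; the paper's actual pipeline runs in GATE on the Java side, which is only approximated by a placeholder here.

    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"   # any MediaWiki instance works

    def fetch_wikitext(title: str) -> str:
        """Fetch the current wikitext of a page via the MediaWiki API."""
        params = urllib.parse.urlencode({
            "action": "query", "prop": "revisions", "rvprop": "content",
            "titles": title, "format": "json",
        })
        with urllib.request.urlopen(f"{API}?{params}") as resp:
            pages = json.load(resp)["query"]["pages"]
        page = next(iter(pages.values()))
        return page["revisions"][0]["*"]          # legacy content key

    def analyze(text: str) -> list[str]:
        """Placeholder for an NLP pipeline (the paper uses GATE for this)."""
        return sorted({w for w in text.split() if w.istitle()})[:10]

    print(analyze(fetch_wikitext("Natural language processing")))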

Fuzzy Coreference Resolution for Summarization

Venice

Abstract

We present a fuzzy-theory-based approach to coreference resolution and its application to text summarization.

Automatic determination of coreference between noun phrases is fraught with uncertainty. We show how fuzzy sets can be used to design a new coreference algorithm which captures this uncertainty in an explicit way and allows us to define varying degrees of coreference.

The algorithm is evaluated within a system that participated in the 10-word summary task of the DUC 2003 competition.
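
A compact sketch of the core idea: heuristics assign each mention pair a coreference degree in [0, 1] instead of a binary decision, and crisp chains are obtained only at the end by cutting the fuzzy relation at a threshold (an alpha-cut) and closing it transitively. The pair scores and threshold below are invented for illustration.

    # Fuzzy pairwise degrees: how strongly two mentions corefer, per
    # heuristics. (Values here are invented; a real system derives them
    # from features such as string match, agreement, distance, ...)
    degrees = {
        ("Ms. Smith", "she"): 0.8,
        ("Ms. Smith", "the minister"): 0.6,
        ("she", "the chair"): 0.3,
    }

    def alpha_cut_chains(degrees, alpha=0.5):
        """Threshold the fuzzy relation and merge mentions transitively."""
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]      # path compression
                x = parent[x]
            return x
        for (a, b), mu in degrees.items():
            if mu >= alpha:                        # keep only confident pairs
                parent[find(a)] = find(b)
        chains = {}
        for m in {m for pair in degrees for m in pair}:
            chains.setdefault(find(m), []).append(m)
        return list(chains.values())

    print(alpha_cut_chains(degrees, alpha=0.5))
    # e.g. [['Ms. Smith', 'she', 'the minister'], ['the chair']]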

Using Knowledge-poor Coreference Resolution for Text Summarization

Edmonton

Abstract

We present a system that produces 10-word summaries based on the single summarization strategy of outputting noun phrases representing the most important text entities (as represented by noun phrase coreference chains). The coreference chains were computed using fuzzy set theory combined with knowledge-poor coreference heuristics.
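
A toy sketch of that single strategy: rank chains by size as a proxy for entity importance, then emit one representative noun phrase per chain until the ten-word budget is exhausted. Both the ranking criterion and the choice of representative are simplifications of whatever the actual system does.

    def ten_word_summary(chains: list[list[str]], budget: int = 10) -> str:
        """Emit representative NPs of the largest chains within a word budget."""
        summary, used = [], 0
        # Larger chains = more frequently mentioned entities = more important.
        for chain in sorted(chains, key=len, reverse=True):
            rep = max(chain, key=len)          # longest mention as representative
            words = len(rep.split())
            if used + words <= budget:
                summary.append(rep)
                used += words
        return "; ".join(summary)

    chains = [["Ms. Smith", "she", "the minister", "her"],
              ["the budget bill", "it", "the bill"],
              ["parliament"]]
    print(ten_word_summary(chains))
    # the minister; the budget bill; parliament   (6 of 10 words used)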

Multi-ERSS and ERSS 2004

Abstract

Last year, we presented a system, ERSS, which constructed 10-word summaries in the form of a list of noun phrases. It was based on a knowledge-poor extraction of noun phrase coreference chains implemented on a fuzzy set-theoretic basis. This year, we present the performance of an improved version, ERSS 2004, and an extension of the same basic system: Multi-ERSS constructs 100-word extract summaries for clusters of texts. With very few modifications, we ran ERSS 2004 on Tasks 1 and 3 and Multi-ERSS on Tasks 2, 4, and 5, scoring generally above average in all but the linguistic quality aspects.

The FungalWeb Ontology: Application Scenarios

Abstract

The FungalWeb Ontology aims to support the data integration needs of enzyme biotechnology from inception to product roll-out. Serving as a knowledge base for decision support, the conceptualization seeks to link fungal species with enzymes, enzyme substrates, enzyme classifications, enzyme modifications, enzyme retail, and applications. We demonstrate how the FungalWeb Ontology supports this remit by presenting application scenarios, conceptualizations of the ontological frame able to support these scenarios, and semantic queries typical of a biotech manager. Queries to the knowledge base are answered with description logic (DL) and automated reasoning tools.
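
The sketch below approximates one such semantic query as SPARQL over an RDF serialization of the ontology, using rdflib. This stands in for, but is weaker than, genuine DL reasoning, and all class and property names are invented; the real FungalWeb vocabulary is not reproduced here.

    from rdflib import Graph

    g = Graph()
    g.parse("fungalweb.owl", format="xml")   # hypothetical local copy

    # "Which fungal species produce an enzyme that acts on cellulose?"
    # The fw: names below are invented for illustration only.
    query = """
    PREFIX fw: <http://example.org/fungalweb#>
    SELECT ?species ?enzyme WHERE {
        ?species a fw:FungalSpecies ;
                 fw:produces ?enzyme .
        ?enzyme fw:actsOn fw:Cellulose .
    }
    """
    for species, enzyme in g.query(query):
        print(species, enzyme)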

A Self-Learning Context-Aware Lemmatizer for German

Vancouver Waterfront

Abstract

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.
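
A toy sketch of the self-learning idea: while processing documents, the lemmatizer derives lemmas for unseen noun forms (here via drastically simplified suffix rules standing in for the paper's context-aware method) and stores them in a growing full-form lexicon that is consulted first on later occurrences.

    # Drastically simplified suffix rules; real German inflection is far
    # messier, and the paper derives lemmas from context, not rules alone.
    SUFFIX_RULES = [("innen", "in"), ("en", "e"), ("s", "")]

    class SelfLearningLemmatizer:
        def __init__(self):
            self.lexicon: dict[str, str] = {}   # grown full-form lexicon

        def lemmatize(self, noun: str) -> str:
            if noun in self.lexicon:            # reuse what was learned
                return self.lexicon[noun]
            lemma = noun
            for suffix, repl in SUFFIX_RULES:
                if noun.endswith(suffix):
                    lemma = noun[: -len(suffix)] + repl
                    break
            self.lexicon[noun] = lemma          # self-learning step
            return lemma

    lem = SelfLearningLemmatizer()
    print(lem.lemmatize("Lampen"))       # Lampe
    print(lem.lemmatize("Autos"))        # Auto
    print(lem.lemmatize("Lehrerinnen"))  # Lehrerin
    print(len(lem.lexicon))              # 3 entries learned so far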
