Recent posts
Semantic Technologies in System Maintenance (STSM 2008)
Abstract
This paper gives a brief overview of the International Workshop on Semantic Technologies in System Maintenance. It describes a number of semantic technologies (e.g., ontologies, text mining, and knowledge integration techniques) and identifies diverse tasks in software maintenance where the use of semantic technologies can be beneficial, such as traceability, system comprehension, software artifact analysis, and information integration.
Enhancing the OpenOffice.org Word Processor with Natural Language Processing Capabilities
Abstract
Today's knowledge workers are often overwhelmed by the vast amount of readily available natural language documents that are potentially relevant for a given task. Natural language processing (NLP) and text mining techniques can deliver automated analysis support, but they are often not integrated into commonly used desktop clients, such as word processors. We present a plug-in for the OpenOffice.org word processor Writer that provides access to any kind of NLP analysis service mediated through a service-oriented architecture. Semantic Assistants can now provide services such as information extraction, question-answering, index generation, or automatic summarization directly within an end user's application.
Professional Activities
I have been involved in a number of review and event organization activities.
New Job, New Website
Submitted by rene on Sat, 2008-05-31 19:00.
A Semantic Wiki Approach to Cultural Heritage Data Management
Abstract
Providing access to cultural heritage data beyond book digitization and information retrieval projects is important for delivering advanced semantic support to end users, in order to address their specific needs. We introduce a separation of concerns for heritage data management by explicitly defining different user groups and analyzing their particular requirements. Based on this analysis, we developed a comprehensive system architecture for accessing, annotating, and querying textual historic data. Novel features are the deployment of a Wiki user interface, natural language processing services for end users, metadata generation in OWL ontology format, SPARQL queries on textual data, and the integration of external clients through Web Services. We illustrate these ideas with the management of a historic encyclopedia of architecture.
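The idea of running SPARQL-style queries over text-derived metadata can be sketched with a minimal in-memory triple pattern matcher, a toy stand-in for a real OWL store; all predicate and entity names below are hypothetical illustrations, not the system's actual vocabulary:

```python
# Minimal sketch: matching a triple pattern against text-derived metadata,
# in the spirit of SPARQL queries over an OWL annotation store.
# All predicate and entity names are hypothetical.

triples = [
    ("doc:page12", "ex:mentions", "ex:Vitruvius"),
    ("doc:page12", "ex:hasTopic", "ex:ColumnOrders"),
    ("doc:page47", "ex:mentions", "ex:Vitruvius"),
]

def match(pattern, store):
    """Return variable bindings (pattern items starting with '?') for
    every triple in the store that matches the pattern."""
    results = []
    for triple in store:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                binding[p] = t
            elif p != t:
                break
        else:
            results.append(binding)
    return results

# Analogous to: SELECT ?page WHERE { ?page ex:mentions ex:Vitruvius }
pages = match(("?page", "ex:mentions", "ex:Vitruvius"), triples)
print([b["?page"] for b in pages])  # ['doc:page12', 'doc:page47']
```

A production system would of course use a real triple store with a SPARQL endpoint; the sketch only shows how pattern variables bind against metadata generated from the historic texts.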
Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles
Abstract
Reported speech, in the form of direct and indirect speech, is an important indicator of evidentiality in traditional newspaper texts, but increasingly also in new media that rely heavily on citation and quotation of previous postings, as for instance in blogs or newsgroups. This paper details the basic processing steps for reported speech analysis and reports on the performance of an implementation in the form of a GATE resource.
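As a rough illustration of the direct-speech case only, a single simplified pattern can capture a quoted span followed by a reporting clause. This is not the actual GATE grammar; the reporting verbs listed are merely examples:

```python
import re

# Hedged sketch: one simplified pattern for direct reported speech of the
# shape  "..." , said X.  The real GATE component uses far richer patterns
# and verb lists; these are illustrative only.
REPORTING_VERBS = r"(said|says|claimed|announced|told)"

pattern = re.compile(
    r'"(?P<content>[^"]+)"\s*,?\s*' + REPORTING_VERBS + r"\s+(?P<source>[A-Z]\w+)"
)

text = '"The markets will recover", said Smith on Tuesday.'
m = pattern.search(text)
print(m.group("content"), "|", m.group("source"))
# The markets will recover | Smith
```

Indirect speech ("Smith claimed that ...") requires syntactic analysis rather than surface patterns, which is where the full pipeline described in the paper comes in.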
Deadline extended for STSM
Submitted by rene on Wed, 2008-04-16 22:36.
Traceability in Software Engineering - Past, Present and Future
CASCON 2007 Workshop Report
IBM Technical Report: TR-74-211
October 25, 2007
Abstract
Many changes have occurred in software engineering research and practice since 1968, when software engineering was established as a research domain. One of these research areas is traceability; a key aspect of any engineering discipline, it enables engineers to understand the relations and dependencies among the various artifacts in a system.
Call for Papers: International Workshop on Semantic Technologies in System Maintenance (STSM 2008)
Submitted by rene on Tue, 2008-02-12 21:19.
Together with Jürgen Rilling, Dragan Gašević, and Jeff Z. Pan, I'm organizing the first International Workshop on Semantic Technologies in System Maintenance (STSM 2008), which will be co-located with the 16th IEEE International Conference on Program Comprehension (ICPC 2008) in Amsterdam, The Netherlands.
Detailed information on the workshop, submission guidelines, and other news is now available from the workshop's webpage.
Workshop on Semantic Technologies in System Maintenance at ICPC 2008
Submitted by rene on Wed, 2008-01-23 21:33.
A Unified Ontology-Based Process Model for Software Maintenance and Comprehension
Abstract
In this paper, we present a formal process model to support the comprehension and maintenance of software systems. The model provides a formal ontological representation that supports the use of reasoning services across different knowledge resources. In the presented approach, we employ our Description Logic knowledge base to support maintenance process management, as well as detailed analyses among resources, e.g., the traceability between various software artifacts. The resulting unified process model provides users with active guidance in selecting and utilizing these resources that is context-sensitive to a particular comprehension task. We illustrate both the technical foundation, based on our existing SOUND environment, and the general objectives and goals of our process model.
Keywords: Software maintenance, process modeling, ontological reasoning, software comprehension, traceability, text mining.
An Ontological Software Comprehension Process Model
Abstract
Comprehension is an essential part of software maintenance. Only software that is well understood can evolve in a controlled manner. In this paper, we present a formal process model to support the comprehension of software systems by using ontologies and Description Logics. This formal representation supports the use of reasoning services across different knowledge resources and therefore enables us to provide users with guidance during the comprehension process that is context-sensitive to their particular comprehension task.
Keywords: Software maintenance, program comprehension, process modeling, ontological reasoning
An Ontology-based Approach for the Recovery of Traceability Links
Abstract
Traceability links provide support for software engineers in understanding the relations and dependencies among software artifacts created during the software development process. In this research, we focus on re-establishing traceability links between existing source code and documentation to support reverse engineering. We present a novel approach that addresses this issue by creating formal ontological representations for both the documentation and source code artifacts.
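A deliberately simplified sketch of the underlying idea, linking normalized code identifiers to documentation terms, could look as follows. This is an illustration only, not the paper's ontological approach; the identifiers, sentences, and overlap threshold are made up:

```python
import re

# Illustrative sketch: recover candidate traceability links by splitting
# code identifiers into terms and comparing them with documentation text.
# The real approach builds formal ontologies for both artifact types.

def split_identifier(name):
    """Split camelCase / ALLCAPS / digit runs into lowercase terms."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {p.lower() for p in parts}

def link_candidates(identifiers, doc_sentences, min_overlap=2):
    """Propose (identifier, sentence) links when enough terms overlap."""
    links = []
    for ident in identifiers:
        terms = split_identifier(ident)
        for sentence in doc_sentences:
            words = set(sentence.lower().split())
            if len(terms & words) >= min_overlap:
                links.append((ident, sentence))
    return links

code = ["parseConfigFile", "renderHtmlPage"]
docs = ["The loader will parse each config file at startup.",
        "Caching improves overall performance."]
print(link_candidates(code, docs))
```

Plain term overlap is brittle, which is precisely the motivation for the ontological representation in the paper: once both artifacts live in one formal model, links can be inferred rather than merely string-matched.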
A Context-Driven Software Comprehension Process Model
Abstract
Comprehension is an essential part of software evolution. Only software that is well understood can evolve in a controlled manner. In this paper, we present a formal process model to support the comprehension of software systems by using ontologies and Description Logics. This formal representation supports the use of reasoning services across different knowledge resources and therefore enables us to provide users with guidance during the comprehension process that is context-sensitive to their particular comprehension task. As part of the process model, we also adopt a new interactive story metaphor to represent the interactions between users and the comprehension process.
Keywords: Software evolution, program comprehension, process modeling, story metaphor, ontological reasoning
Ontology-based Program Comprehension Tool Supporting Website Architectural Evolution
Abstract
A challenge of existing program comprehension approaches is to provide consistent and flexible representations for software systems. Maintainers have to match their mental models with the different representations these tools provide. In this paper, we present a novel approach that addresses this issue by providing a consistent ontological representation for both source code and documentation. The ontological representation unifies information from various sources, and therefore reduces the maintainers’ comprehension efforts. In addition, representing software artifacts in a formal ontology enables maintainers to formulate hypotheses about various properties of software systems. These hypotheses can be validated through an iterative exploration of information derived by our ontology inference engine. The implementation of our approach is presented in detail, and a case study is provided to demonstrate the applicability of our approach during the architectural evolution of a website content management system.
Keywords: Program Comprehension, Software Evolution, Ontology, Automated Reasoning
Tutorial: Applications for the Semantic Web
Description
The Semantic Web vision is considered the next generation of the Web, enabling the sharing of data, resources, and knowledge between parties that belong to different organizations, cultures, and/or communities. Ontologies and rules play a central role in the Semantic Web for publishing community vocabularies and policies, for annotating resources, and for turning Web applications into inference-enabled collaboration platforms. After a short introduction to the basic concepts, standards, and tools of the Semantic Web, we present how today's Semantic Web tools, languages, and techniques can be used in various applications. We start with the use of Semantic Web technologies for providing online educators with feedback about how their students use online courses in learning management systems. Next, we demonstrate the use of Semantic Web technologies and text mining techniques to improve the software development process and software maintenance. Finally, we explain the use of Semantic Web technologies in multimedia-enhanced applications.
Tutorial: Introduction to Text Mining
Tutorial Description
Do you suffer from a lack of information? Or do you rather feel overwhelmed by the sheer amount of available (online) content, such as emails, news, web pages, and electronic documents? The rather young field of Text Mining developed from the observation that most knowledge today - more than 80% of the data stored in databases - is hidden within documents written in natural languages and thus cannot be automatically processed by traditional information systems.
Text Mining, "also known as intelligent text analysis, text data mining or knowledge-discovery in text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text." Text Mining is a highly interdisciplinary field, drawing on foundations and technologies from fields like computational linguistics, database systems, and artificial intelligence, but applying these in new and often unconventional ways.
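The simplest instance of this process, extracting frequent content terms from unstructured text, can be sketched in a few lines. This is a toy example, not a full text mining pipeline; the stopword list and sample text are made up for illustration:

```python
import re
from collections import Counter

# Toy text-mining step: find the most frequent content terms in free text
# after tokenizing and removing stopwords. Real systems add linguistic
# processing (lemmatization, entity recognition, relation extraction).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "from", "into"}

def top_terms(text, n=3):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

doc = ("Text mining extracts knowledge from text. Mining text collections "
       "turns unstructured text into structured knowledge.")
print(top_terms(doc))  # [('text', 4), ('mining', 2), ('knowledge', 2)]
```

Even this crude frequency view hints at what a document is about; the tutorial covers the linguistically informed techniques that turn such surface statistics into actual knowledge discovery.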
Text Mining: Wissensgewinnung aus natürlichsprachigen Dokumenten
(This webpage is about a technical report on Text Mining, written in German. Try Google Translate for an English version.)
Interner Bericht (Internal Report) 2006-5, Fakultät für Informatik, Universität Karlsruhe (TH), Germany
Edited by René Witte and Jutta Mülle
ISSN 1432-7864
200 pages, 75 figures
Mutation Miner - Textual Annotation of Protein Structures
Abstract
Protein structure visualization tools render images that allow the user to explore structural features of a protein. Context-specific information relating to a particular protein or protein family is not easily integrated and must be uploaded from databases or provided through manual curation of input files. We describe a mixed natural language processing and protein sequence analysis approach for the retrieval of mutation-specific annotations from full-text articles for rendering with protein structures.
Fuzzy Extensions for Reverse Engineering Repository Models
Abstract
Reverse Engineering is a process fraught with imperfections. The importance of explicitly dealing with imprecise, possibly inconsistent data when interacting with the reverse engineer has been pointed out before.
In this paper, we go one step further: we argue that the complete reverse engineering process must be augmented with a formal representation model capable of modeling imperfections. This includes automatic as well as human-centered tools.
We show how this can be achieved by merging a fuzzy set-theory based knowledge representation model with a reverse engineering repository. Our approach is not only capable of modeling a wide range of different kinds of imperfections (uncertain as well as vague information), but also admits robust processing models by defining explicit degrees of certainty and their modification through fuzzy belief revision operators.
The repository-centered approach is proposed as the foundation for a new generation of reverse engineering tools. We show how various RE tasks can benefit from our approach and present first design ideas for fuzzy reverse engineering tools.
Mutation Miner (CPI 2005)
Introduction
Biological researchers today have access to vast amounts of exponentially growing research data in a structured form within several publicly accessible databases. A large proportion of salient information is, however, still hidden within individual research papers, since costly manual database curation efforts are overwhelmed by the scale of new information being generated. In the domain of protein engineering, critical units of information required from the literature include the identity of the mutated protein, the identity and position of the wild-type residues that are mutated, the identity of the resulting mutant residues, and the impacts of the mutations on functional properties of the proteins.
Mutation Miner is a system designed to automate the extraction of mutations, and of textual annotations describing the impacts of mutations on protein properties (mutation annotations), from full-text scientific literature. Furthermore, the system retrieves and carries out bioinformatic analyses on mutated sequences, providing the mapped coordinates of mutants on a selected structure. Integration of multiple formatted mutation annotations with associated residue coordinates facilitates their rendering with structure visualization tools. We describe the architecture and tools that support Mutation Miner (text mining/NLP, sequence analysis, structure visualization) and present performance evaluations that demonstrate the feasibility of this approach.
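The entity-extraction step alone can be illustrated with a simple pattern for point mutations in the common wNm notation (wild-type residue, position, mutant residue). This is a deliberate simplification of the actual Mutation Miner NLP pipeline, and the sample sentence is invented:

```python
import re

# Sketch of one extraction step: recognizing point-mutation mentions such
# as A123V (wild-type residue A, position 123, mutant residue V).
# The full system also normalizes protein names and maps positions onto
# sequences; this pattern covers only the one-letter wNm notation.
AA = "ACDEFGHIKLMNPQRSTVWY"  # one-letter amino acid codes
MUTATION = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")

sentence = ("The A123V and G56D substitutions reduced thermostability, "
            "while E789K had no measurable effect.")
mutations = [(m.group(1), int(m.group(2)), m.group(3))
             for m in MUTATION.finditer(sentence)]
print(mutations)  # [('A', 123, 'V'), ('G', 56, 'D'), ('E', 789, 'K')]
```

Real articles also use three-letter ("Ala123Val") and verbose ("alanine 123 was replaced by valine") variants, which is why the full pipeline relies on NLP rather than a single regular expression.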
Mutation Miner (ISMB 2005)
Introduction
(Identical to the introduction of the Mutation Miner (CPI 2005) entry above.)
Empowering the Enzyme Biotechnologist with Ontologies
Introduction
The FungalWeb Ontology is a knowledge representation vehicle designed to integrate information relevant to industrial applications of enzymes. The ontology integrates information from established sources and supports complex queries to the instantiated FungalWeb knowledge base. The ontology represents prototype Semantic Web technology customized to the domain of industrial enzymes with a focus on enzyme discovery, commercial enzyme products and vendors, and the industrial applications and benefits of industrial enzymes. Using a series of application scenarios we demonstrate the utility of this 'Semantic Web' infrastructure to the enzyme biotechnologist.
Ontology Design for Biomedical Text Mining
Abstract
Text Mining in biology and biomedicine requires a large amount of domain-specific knowledge. Publicly accessible resources hold much of the information needed, yet their practical integration into natural language processing (NLP) systems is fraught with manifold hurdles, especially the problem of semantic disconnectedness throughout the various resources and components. Ontologies can provide the necessary framework for a consistent semantic integration, while additionally delivering formal reasoning capabilities to NLP.
In this chapter, we address four important aspects relating to the integration of ontologies and NLP: (i) an analysis of the different integration alternatives and their respective advantages; (ii) the design requirements for an ontology supporting NLP tasks; (iii) the creation and initialization of an ontology using publicly available tools and databases; and (iv) the connection of common NLP tasks with an ontology, including technical aspects of ontology deployment in a text mining framework. A concrete application example—text mining of enzyme mutations—is provided to motivate and illustrate these points.
Keywords: Text Mining, NLP, Ontology Design, Ontology Population, Ontological NLP
Fuzzy Set Theory-Based Belief Processing for Natural Language Texts
Introduction
The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on a topic. The goal of the artificial believer system we present in this paper is to extract and analyze opinionated statements from newspaper articles.
Beliefs are modeled with a fuzzy-theoretic approach applied after NLP-based information extraction. A fuzzy believer models a human agent, deciding what statements to believe or reject based on different, configurable strategies.
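One possible belief-selection strategy can be sketched as follows. This is a minimal illustration with made-up claims and degrees; the actual fuzzy believer uses richer representations and fuzzy belief revision operators:

```python
# Minimal sketch of a fuzzy believer strategy: each extracted statement
# carries fuzzy degrees of support and contradiction in [0, 1], gathered
# from different sources; a strategy decides which claims to hold.
# Claims and degrees below are hypothetical.

def believe_majority(statements):
    """Hold a claim when its strongest support exceeds its strongest
    contradiction (fuzzy union of degrees via max)."""
    held = []
    for claim, (support, against) in statements.items():
        if max(support, default=0.0) > max(against, default=0.0):
            held.append(claim)
    return held

# Hypothetical degrees attached to statements from different news sources.
statements = {
    "merger-approved": ([0.8, 0.6], [0.3]),
    "ceo-resigns": ([0.4], [0.7, 0.5]),
}
print(believe_majority(statements))  # ['merger-approved']
```

Swapping in a different decision function (e.g., believing only the most recent or the most credible source) corresponds to the configurable strategies mentioned above.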