Documents written in natural languages constitute a major part of the software engineering lifecycle artifacts. Especially during software maintenance or reverse engineering, semantic information conveyed in these documents can provide important knowledge for the software engineer. In this paper, we present a text mining system capable of populating a software ontology with information detected in documents.
Graphical visualizations of coreference chains support a system developer in analyzing the behavior of a resolution algorithm. In this paper, we state explicit use cases for coreference chain visualizations and show how they can be addressed by transforming chains into other, standardized data formats, namely Topic Maps and ontologies.
The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on one topic. The goal of our artificial believer system presented in this paper is to extract and analyze statements of opinion from newspaper articles.
Beliefs are modeled using a fuzzy-theoretic approach applied after NLP-based information extraction. A fuzzy believer models a human agent, deciding what statements to believe or reject based on different, configurable strategies.
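The decision process can be illustrated with a toy belief strategy. This is a minimal sketch of our own, not the system's actual implementation; the function name and the opinion representation (polarity plus confidence degree) are illustrative assumptions:

```python
def majority_strategy(opinions):
    """Toy 'fuzzy believer' strategy: given opinions on one statement as
    (polarity, degree) pairs, believe the polarity with the larger total
    degree of support. A stand-in for the configurable strategies above."""
    support = {}
    for polarity, degree in opinions:
        # Accumulate the fuzzy support for each polarity.
        support[polarity] = support.get(polarity, 0.0) + degree
    return max(support, key=support.get)
```

Other strategies (e.g. preferring the most recent or the most reliable source) would plug in the same way, which is what makes the believer configurable.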
Classical multi-document summaries focus on the common topics of a document set and omit distinctive themes particular to a single document, thereby often suppressing precisely the kind of information a user might need for a specific task. This can be avoided through advanced multi-document summaries that take a user's context and history into account by delivering focused, contrastive, or update summaries. To facilitate the generation of these different summaries, we propose to generate all types from a single data structure, topic clusters, which provide an abstract representation of a set of documents. Evaluations carried out on five years' worth of data from the DUC summarization competition demonstrate the feasibility of this approach.
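The topic-cluster idea can be sketched in a few lines. This is a minimal illustration under assumed simplifications (bag-of-words vectors, cosine similarity, greedy single-pass grouping), not the system evaluated on the DUC data; the threshold value and all function names are ours:

```python
from collections import Counter
from math import sqrt

def vectorize(sentence):
    # Toy bag-of-words vector via whitespace tokenization.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_topic_clusters(docs, threshold=0.3):
    """Greedily group sentences from all documents into topic clusters,
    recording which documents contribute to each cluster."""
    clusters = []  # each: {"centroid": Counter, "members": [(doc_id, sent)]}
    for doc_id, doc in enumerate(docs):
        for sent in doc:
            vec = vectorize(sent)
            best, best_sim = None, threshold
            for c in clusters:
                sim = cosine(vec, c["centroid"])
                if sim > best_sim:
                    best, best_sim = c, sim
            if best is None:
                clusters.append({"centroid": vec, "members": [(doc_id, sent)]})
            else:
                best["centroid"].update(vec)
                best["members"].append((doc_id, sent))
    return clusters

def common_topics(clusters, n_docs):
    # Clusters spanning all documents: candidates for a generic summary.
    return [c for c in clusters
            if len({d for d, _ in c["members"]}) == n_docs]

def distinctive_topics(clusters):
    # Clusters confined to one document: candidates for contrastive summaries.
    return [c for c in clusters
            if len({d for d, _ in c["members"]}) == 1]
```

The same cluster structure thus serves both summary types: generic summaries draw on the widely shared clusters, contrastive ones on the single-document clusters.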
We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results from an initial implementation based on the GATE framework for NLP and the MediaWiki system.
General Terms: Design, Human Factors, Languages
Keywords: Self-aware Wiki System, Wiki/NLP Integration
As the long-standing mainstay of modeling and manipulating imperfect information, fuzzy sets are an obvious candidate for representing uncertain beliefs.
Unfortunately, unadorned fuzzy sets are too limited to capture complex or potentially inconsistent beliefs, because all too often they reduce to absurdities ("nothing is possible") or trivialities ("everything is possible").
However, we show that combining the syntax of propositional logic with the semantics of fuzzy sets yields a rich framework for expressing and manipulating uncertain beliefs, one that admits Gärdenfors-style expansion, revision, and contraction operators and is moreover amenable to easy integration with conventional "crisp" information processing.
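The shape of such a belief base can be sketched as follows. This is our own drastic simplification for illustration, restricted to atomic propositions with membership degrees; the method names mirror the Gärdenfors operators, but the update rules here are toy assumptions, not the paper's framework:

```python
class FuzzyBeliefBase:
    """Toy belief base: atomic propositions and their negations,
    each mapped to a membership degree in [0, 1]."""

    def __init__(self):
        self.degrees = {}  # (atom, polarity) -> membership degree

    def expand(self, atom, polarity, degree):
        # Expansion: add the belief, keeping the strongest support so far.
        key = (atom, polarity)
        self.degrees[key] = max(self.degrees.get(key, 0.0), degree)

    def revise(self, atom, polarity, degree):
        # Revision: accept the new belief and weaken its negation so the
        # pair stays consistent (degrees of p and not-p sum to at most 1).
        self.expand(atom, polarity, degree)
        opposite = (atom, not polarity)
        if opposite in self.degrees:
            self.degrees[opposite] = min(self.degrees[opposite], 1.0 - degree)

    def contract(self, atom, polarity):
        # Contraction: give up the belief entirely.
        self.degrees.pop((atom, polarity), None)

    def degree(self, atom, polarity=True):
        return self.degrees.get((atom, polarity), 0.0)
```

The point of the revision rule is to avoid the absurd and trivial fixpoints mentioned above: contradicting evidence lowers, rather than annihilates, the opposing belief.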
The model presented here addresses many of the shortcomings of traditional approaches to building fuzzy data models, which will hopefully lead to a wider adoption of fuzzy technologies for the creation of information systems.
fuzzy belief revision, fuzzy information systems, soft computing, fuzzy object-oriented data model
We present a fuzzy-theory based approach to coreference resolution and its application to text summarization.
Automatic determination of coreference between noun phrases is fraught with uncertainty. We show how fuzzy sets can be used to design a new coreference algorithm which captures this uncertainty in an explicit way and allows us to define varying degrees of coreference.
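The idea of varying degrees of coreference can be sketched as follows. This is a toy illustration under assumed simplifications (a single string-match heuristic, the maximum as fuzzy OR, and an alpha-cut over connected components to form chains); it is not the evaluated algorithm, and all names and degree values are ours:

```python
def string_match_degree(np1, np2):
    # Knowledge-poor heuristic: degree of string overlap between two NPs.
    a, b = np1.lower(), np2.lower()
    if a == b:
        return 1.0
    if a in b or b in a:
        return 0.8
    shared = (set(a.split()) & set(b.split())) - {"the", "a", "an"}
    return 0.6 if shared else 0.0

def coreference_degree(np1, np2, heuristics=(string_match_degree,)):
    # Combine heuristic evidence with the fuzzy OR (maximum) t-conorm.
    return max(h(np1, np2) for h in heuristics)

def fuzzy_chains(nps, alpha=0.5):
    """Alpha-cut: link NP pairs whose coreference degree is >= alpha,
    then return connected components as coreference chains."""
    parent = list(range(len(nps)))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(nps)):
        for j in range(i + 1, len(nps)):
            if coreference_degree(nps[i], nps[j]) >= alpha:
                parent[find(i)] = find(j)

    chains = {}
    for i, np in enumerate(nps):
        chains.setdefault(find(i), []).append(np)
    return list(chains.values())
```

Raising or lowering `alpha` makes the chains stricter or looser, which is exactly the knob that explicit degrees of coreference provide over a crisp yes/no decision.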
The algorithm is evaluated within a system that participated in the 10-word summary task of the DUC 2003 competition.
We present a system that produces 10-word summaries based on the single summarization strategy of outputting noun phrases representing the most important text entities (as represented by noun phrase coreference chains). The coreference chains were computed using fuzzy set theory combined with knowledge-poor coreference heuristics.
The different stages in the life-cycle of content (creation, storage, retrieval, and analysis) are usually regarded as distinct and isolated steps. In this paper, we examine the synergies resulting from their integration within a single architecture.
Our goal is to employ such an architecture to improve user support for knowledge-intensive tasks. We present an ongoing case study from the area of building architecture.
Software reverse engineering (RE) is often hindered not by the lack of available data, but by an overabundance of it: the (semi-)automatic analysis of static and dynamic code information, data, and documentation results in a huge heap of often incomparable data. Additionally, the gathered information is typically fraught with various kinds of imperfections, for example conflicting information found in software documentation vs. program code.
Our approach to this problem is twofold: for the management of the diverse RE results, we propose the use of a repository that supports an iterative and incremental discovery process with the aid of a reverse engineer. To deal with imperfections, we propose to enhance the repository model with additional representation and processing capabilities based on fuzzy set theory and fuzzy belief revision.
fuzzy reverse engineering, meta model, extension framework, iterative process, knowledge evolution