At first glance, agent technology seems more like a hostile intruder into the database world. On the other hand, the two could easily complement each other, since agents carry out information processes whereas databases supply information to processes. Nonetheless, viewing agent technology from a database perspective seems to call into question some of the basic paradigms of database technology, particularly the premise of the semantic consistency of a database. The paper argues that the ensuing uncertainty in distributed databases can be modelled by beliefs, and develops the basic concepts for adjusting peer-to-peer databases to the individual beliefs in single nodes and the collective beliefs in the entire distributed database.

The FungalWeb Ontology aims to support the data integration needs of enzyme biotechnology from inception to product rollout. Serving as a knowledge base for decision support, the conceptualization seeks to link fungal species with enzymes, enzyme substrates, enzyme classifications, enzyme modifications, enzyme retail, and applications. We demonstrate how the FungalWeb Ontology supports this remit by presenting application scenarios, conceptualizations of the ontological frame that can support these scenarios, and semantic queries typical of a biotech manager. Queries to the knowledge base are answered with description logic (DL) and automated reasoning tools.

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.

We present ERSS 2005, our entry to this year's DUC competition. With only slight modifications to last year's version, made to accommodate the more complex context information present in DUC 2005, we achieved performance similar to last year's entry, ranking roughly in the upper third on the ROUGE-1 and Basic Elements scores.

We also participated in the additional manual evaluation based on the new Pyramid method and performed further evaluations based on the Basic Elements method and the automatic generation of Pyramids. Interestingly, the ranking of our system differs greatly across these measures; we attempt to analyse this effect through the correlations between the different results, computed with Spearman's rank correlation coefficient.
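Spearman's coefficient is the Pearson correlation computed on ranks instead of raw scores, so it measures how similarly two evaluation measures order the same set of systems. The Python sketch below computes it with tie-averaged ranks; the function names and the five-system scores are hypothetical placeholders, not the actual DUC 2005 data.

```python
"""Spearman rank correlation between two evaluation measures (sketch).

The system scores in the example are hypothetical, not DUC 2005 results.
"""
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from best (1) to worst; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with order[start].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores of five systems under two evaluation measures.
    rouge_1 = [0.34, 0.36, 0.31, 0.38, 0.33]
    pyramid = [0.20, 0.18, 0.22, 0.25, 0.19]
    print(f"Spearman rho = {spearman(rouge_1, pyramid):.3f}")  # prints: Spearman rho = 0.100
```

Without ties this equals the textbook shortcut 1 - 6Σd²/(n(n²-1)); the rank-based Pearson form is used here because it stays correct when several systems share the same score.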

Constructing focused, context-based multi-document summaries requires an analysis of the context questions as well as their corresponding document sets. We present a fuzzy cluster graph algorithm that, based on fuzzy coreference chains, finds entities and their connections between the context and the documents, and we describe the design and implementation of the ERSS summarizer, which realizes these ideas.
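The general idea of linking context and documents through fuzzy coreference chains can be illustrated with a small Python sketch under strong simplifying assumptions: a hypothetical token-overlap heuristic supplies the fuzzy coreference degree for each pair of noun phrases, and chains are formed by transitively grouping pairs whose degree reaches an alpha-cut threshold. This is not the ERSS fuzzy cluster graph algorithm; the function names, the heuristic, and the example phrases are all illustrative.

```python
"""Toy fuzzy coreference chains via an alpha-cut over pairwise degrees.

Hypothetical simplification; ERSS's actual heuristics and cluster graph
algorithm are not reproduced here.
"""
from itertools import combinations


def degree(np1: str, np2: str) -> float:
    """Hypothetical fuzzy coreference degree: token-set Jaccard overlap."""
    a, b = set(np1.lower().split()), set(np2.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_chains(phrases: list[str], alpha: float) -> list[set[int]]:
    """Group phrase indices whose pairwise degree is >= alpha (transitively)."""
    parent = list(range(len(phrases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrases)), 2):
        if degree(phrases[i], phrases[j]) >= alpha:
            parent[find(i)] = find(j)  # union the two chains

    chains: dict[int, set[int]] = {}
    for i in range(len(phrases)):
        chains.setdefault(find(i), set()).add(i)
    return list(chains.values())


def context_links(context: list[str], docs: list[str], alpha: float = 0.5):
    """Return chains that contain phrases from both the context and the documents."""
    phrases = context + docs
    n_ctx = len(context)
    links = []
    for chain in fuzzy_chains(phrases, alpha):
        ctx = sorted(phrases[i] for i in chain if i < n_ctx)
        doc = sorted(phrases[i] for i in chain if i >= n_ctx)
        if ctx and doc:
            links.append((ctx, doc))
    return links


if __name__ == "__main__":
    context = ["the European Central Bank", "interest rates"]
    docs = ["the Central Bank", "rates", "the weather in Frankfurt"]
    for ctx, doc in context_links(context, docs):
        print(ctx, "<->", doc)
```

Raising `alpha` yields fewer but more reliable chains; lowering it links more phrases at the risk of spurious coreference.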

Protein structure visualization tools render images that allow the user to explore structural features of a protein. Context-specific information relating to a particular protein or protein family, however, is not easily integrated and must be uploaded from databases or provided through manual curation of input files. Protein engineers spend considerable time iteratively reviewing both the literature and protein structure visualizations manually annotated with mutated residues. Meanwhile, text mining tools are increasingly used to extract specific units of raw text from scientific literature and have demonstrated the potential to support the activities of protein engineers.

The transfer of mutation-specific raw-text annotations to protein structures requires integrated data processing pipelines that can co-ordinate information retrieval, information extraction, protein sequence retrieval, sequence alignment, and mutant residue mapping. We describe the Mutation Miner pipeline designed for this purpose and present case study evaluations of the key steps in the process. Starting with the literature on mutations made to three protein families (haloalkane dehalogenase, biphenyl dioxygenase, and xylanase), we enumerate the relevant documents available for text mining analysis, their available electronic formats, and the number of mutations made to each protein family. We review the efficiency of NLP-driven protein sequence retrieval from databases and report on the effectiveness of Mutation Miner in mapping annotations to protein structure visualizations. We highlight the feasibility and practicability of the approach.
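The mutant residue mapping step can be illustrated with a small Python sketch. It assumes that mutations are mentioned in the common one-letter form (wild-type residue, position, mutant residue, e.g. A4V) and that the sequence used in the literature and the sequence of the structure have already been aligned into gapped strings of equal length. The regular expression, the function names, and the toy sequences are hypothetical and do not reproduce Mutation Miner's implementation.

```python
"""Map point mutations (e.g. "A4V") from one protein sequence onto another.

Sketch only: both sequences are assumed to be pre-aligned gapped strings;
in a real pipeline the alignment comes from a sequence alignment step.
"""
import re
from typing import Optional

# Wild-type residue, 1-based position, mutant residue (one-letter amino acid codes).
MUTATION_RE = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")


def find_mutations(text: str) -> list[tuple[str, int, str]]:
    """Extract (wild type, position, mutant) triples from one-letter mentions."""
    return [(wt, int(pos), mt) for wt, pos, mt in MUTATION_RE.findall(text)]


def position_map(aligned_src: str, aligned_dst: str) -> dict[int, int]:
    """Map 1-based residue positions in src to dst through a gapped alignment."""
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    i = j = 0  # residues consumed so far in src and dst
    for a, b in zip(aligned_src, aligned_dst):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and b != "-":
            mapping[i] = j
    return mapping


def map_mutations(text: str, aligned_src: str, aligned_dst: str) -> list[tuple[str, Optional[int]]]:
    """Return (mutation, structure position or None) for each mention in text.

    A mention is mapped only if its wild-type residue matches the source
    sequence at that position and the position is not aligned to a gap.
    """
    pmap = position_map(aligned_src, aligned_dst)
    src = aligned_src.replace("-", "")
    results = []
    for wt, pos, mt in find_mutations(text):
        matches = 1 <= pos <= len(src) and src[pos - 1] == wt
        results.append((f"{wt}{pos}{mt}", pmap.get(pos) if matches else None))
    return results


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical): literature sequence vs. structure chain.
    literature_seq = "MKT-AYIAK"
    structure_seq = "MKTGAY-AK"
    text = "The mutants K2R, A4V, I6L and W3F were characterized."
    for mutation, target in map_mutations(text, literature_seq, structure_seq):
        print(mutation, "->", target)
```

Checking the wild-type residue before mapping guards against mentions whose numbering refers to a different sequence; mentions that fall into a gap of the alignment are reported as unmapped (`None`).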


Together with Juergen Rilling from Concordia University and Philippe Charland from DRDC Canada, I'm organizing a workshop at CASCON 2007: Traceability in Software Engineering—Past, Present and Future. It takes place on October 25 at the Sheraton Parkway Toronto North Hotel and Convention Centre, Ontario, Canada.

A new technical report on Text Mining (in German) is now available. It is a collection of reports written by students in a Hauptseminar (advanced seminar) given by yours truly and Jutta Mülle at Universität Karlsruhe, Germany.

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.

The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

I just posted a small update to my multi-lingual noun phrase chunker (MuNPEx) for GATE.

Supported languages are now English, German, French, and Spanish (beta).

Invited Talk, Semantic Support for Software Requirements Engineering: The ReqWiki Approach, March 13, 2015, Institute for Infocomm Research (I2R), Singapore.

Invited Talk, Bringing Summarization to End Users: Semantic Assistants for Integrating NLP Web Services and Desktop Clients, May 24, 2011, Workshop on Text Summarization (TS11), St. John's, Newfoundland, Canada.

Keynote Talk, Natural Language Processing for the Masses: The Semantic Assistants Project, November 10, 2010, Atlantic Workshop on Semantics and Services (AWOSS10.2), Moncton, NB, Canada.

Invited Talk, NLP for the Masses: Integrating GATE with Desktop Clients, September 1, 2010, 3rd GATE Training Course and Developer Sprint (FIG3), Concordia University, Montréal, Québec, Canada.

Invited Talk, Am Anfang war das Mock-up: Zur Genese des DEAFél aus Sicht des Informatikers (In the Beginning Was the Mock-up: On the Genesis of the DEAFél from a Computer Scientist's Perspective), July 2, 2010, Heidelberger Akademie der Wissenschaften (Heidelberg Academy of Sciences and Humanities), Germany.

Invited Talk, Software Engineering and Natural Language Processing: Friends or Foes? February 26, 2010, University of Sheffield, UK.

Invited Talk, Software Engineering and Natural Language Processing: Friends or Foes? July 30, 2009, Friedrich Schiller University of Jena, Germany.

Invited Talk, Software Engineering and Natural Language Processing: Friends or Foes? February 26, 2009, Colloques du DIRO, Université de Montréal, Québec, Canada.

Invited Talk, Semantic Software Engineering, January 14, 2008, Concordia University, Montréal, Québec, Canada.

Invited Talk, Ontology and Text Mining: Connecting your documents with the real world, July 5, 2007, University of Aberdeen, Scotland, UK.

Invited Speaker, Empowering Software Maintainers with Semantic Web Technologies, June 6, 2007, 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2007), Innsbruck, Austria.

Invited Talk, Ontology and Text Mining: Connecting your documents with the real world, February 28, 2007, Institute for Infocomm Research (I2R), Knowledge Discovery Department, Singapore.

Invited Talk, Ontology and Text Mining: Connecting your documents with the real world, February 23, 2007, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea.

Invited Talk, Text Mining, June 28, 2005, Artificial Intelligence Seminar Series, University of Luxembourg.

Invited Talk, Text Mining, December 9, 2004, European Media Lab (EML), Villa Bosch, Heidelberg, Germany.

Invited Talk, Representing and Processing Uncertain and Imprecise Information, April 15, 2002, Concordia University, Montréal, Québec, Canada.



Dr. René Witte is currently a tenured associate professor in the Department of Computer Science and Software Engineering at Concordia University in Montréal, Canada, where he heads the Semantic Software Lab. Previously, he worked at Universität Karlsruhe (TH) (now Karlsruhe Institute of Technology, KIT) in Germany in Prof. Dr. Peter C. Lockemann's research group at the Institute for Program Structures and Data Organization (IPD). He also has more than five years of professional work experience in the IT and software industry. Dr. Witte received his Diplom (the German equivalent of an M.Sc.) in Informatics (Computer Science) in 1996 and his "Dr.-Ing." (Doctor of Engineering) in 2002, both from the Faculty of Informatics at Universität Karlsruhe. Over the last 10 years, he has co-authored more than 70 publications and received four best paper and poster awards. His research has been funded by major funding agencies and industry, including NSERC, MITACS, and the DRDC, and spans both national and international collaborations. He has also given invited talks and conference keynote speeches on multiple occasions and is a frequent reviewer for international conferences, workshops, and projects.

(This web page is about my book, "Architecture of Fuzzy Information Systems", which is written in German.)


Because of the models and technologies they employ, today's information systems assume that the data they manage is always precise, certain, and consistent. Reality, however, looks different: information is in fact often imprecise, vague, uncertain, or inconsistent.

Especially in complex information systems that aim to represent reality as faithfully as possible, one does not want to lose these so-called imperfections, but rather to represent them explicitly in order to derive benefits from them for both development and the user: a bank, for instance, has a strong interest in a correct description of a customer's creditworthiness; an environmental information system must convey credible data about the pollution level of a region, and a traffic control system likewise about the risk of congestion. Business-to-business marketplaces need information about the reliability of business partners, and digital libraries about the relevance of retrieved text passages.

So-called fuzzy theory can be used to model such imprecise and uncertain data; it is already successfully used in industry in many other areas, such as control engineering. For information systems, however, there has so far been no systematic approach to extending existing models, technologies, and architectures that remains compatible with established standards and embeds the new capabilities in an orthogonal way. This book, which is based on the author's doctoral dissertation, presents for the first time a complete architectural model for the development of fuzzy information systems. After an introduction to the necessary foundations of fuzzy theory, a model suitable for information systems is built up formally, and it is shown how this model can be realized with common object-oriented languages. Finally, for system development, a suitable reference architecture is presented that is modelled on current multi-tier client/server architectures.

In addition, the book offers practitioners two concrete application examples, a fuzzy decision support system and a fuzzy text analysis system, which are used to describe the development of fuzzy applications in detail.
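To give a flavor of fuzzy-set modelling, here is a minimal Python sketch, not taken from the book, that represents imprecise attributes with trapezoidal membership functions and combines them with the standard Zadeh operators (minimum for fuzzy AND, maximum for fuzzy OR). The linguistic terms (high income, low debt ratio), their breakpoints, and the creditworthiness rule are hypothetical, chosen only to echo the bank example.

```python
"""Fuzzy sets with trapezoidal membership functions (illustrative sketch).

The linguistic terms and breakpoints below are hypothetical.
"""
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Trapezoid:
    """Membership rises linearly on (a, b), is 1 on [b, c], falls on (c, d)."""
    a: float
    b: float
    c: float
    d: float

    def __call__(self, x: float) -> float:
        if self.b <= x <= self.c:
            return 1.0
        if self.a < x < self.b:
            return (x - self.a) / (self.b - self.a)
        if self.c < x < self.d:
            return (self.d - x) / (self.d - self.c)
        return 0.0


def f_and(*degrees: float) -> float:
    """Zadeh conjunction (minimum t-norm)."""
    return min(degrees)


def f_or(*degrees: float) -> float:
    """Zadeh disjunction (maximum t-conorm)."""
    return max(degrees)


def f_not(degree: float) -> float:
    """Standard fuzzy complement."""
    return 1.0 - degree


# Hypothetical linguistic terms for a bank's creditworthiness assessment.
high_income = Trapezoid(30, 60, INF, INF)       # annual income in thousands
low_debt_ratio = Trapezoid(-INF, -INF, 20, 50)  # debt-to-income ratio in percent


def creditworthy(income_k: float, debt_pct: float) -> float:
    """Degree to which 'income is high AND debt ratio is low' holds."""
    return f_and(high_income(income_k), low_debt_ratio(debt_pct))


if __name__ == "__main__":
    print(f"creditworthiness degree: {creditworthy(45, 40):.3f}")
```

Instead of a yes/no answer, the customer is creditworthy to a degree between 0 and 1, which is the kind of imperfection the text argues an information system should keep rather than discard.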

