Recent posts

The FungalWeb Ontology: Application Scenarios

Abstract

The FungalWeb Ontology aims to support the data integration needs of enzyme biotechnology from inception to product rollout. Serving as a knowledge base for decision support, the conceptualization seeks to link fungal species with enzymes, enzyme substrates, enzyme classifications, enzyme modifications, enzyme retail, and applications. We demonstrate how the FungalWeb Ontology supports this remit by presenting application scenarios, conceptualizations of the ontological frame that can support these scenarios, and semantic queries typical of a Biotech Manager. Queries to the knowledge base are answered with description logic (DL) and automated reasoning tools.

A Self-Learning Context-Aware Lemmatizer for German

Abstract

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.

ERSS 2005: Coreference-Based Summarization Reloaded

Abstract

We present ERSS 2005, our entry to this year's DUC competition. With only slight modifications from last year's version to accommodate the more complex context information present in DUC 2005, we achieved performance similar to last year's entry, ranking roughly in the upper third on the ROUGE-1 and Basic Element scores.

We also participated in the additional manual evaluation based on the new Pyramid method and performed further evaluations based on the Basic Elements method and the automatic generation of Pyramids. Interestingly, the ranking of our system differs greatly between the different measures; we attempt to analyse this effect based on correlations between the different results using the Spearman coefficient.
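As a rough illustration of the kind of rank comparison mentioned above, the following sketch computes the Spearman coefficient between two score lists for the same set of systems. It is not the evaluation code used for ERSS 2005; the function names and the example scores are invented for illustration only.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical scores for five systems under two measures (made-up numbers).
    measure_a = [0.36, 0.34, 0.31, 0.29, 0.27]
    measure_b = [0.18, 0.25, 0.14, 0.22, 0.11]
    print(f"Spearman coefficient: {spearman(measure_a, measure_b):.3f}")
```

A value close to 1 would mean the two measures rank the systems in nearly the same order; values near 0 indicate that the rankings are largely unrelated.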

Context-based Multi-Document Summarization using Fuzzy Coreference Cluster Graphs

The IPD cluster computing cluster summaries using a clustering algorithm :)

Abstract

Constructing focused, context-based multi-document summaries requires an analysis of the context questions, as well as their corresponding document sets. We present a fuzzy cluster graph algorithm that finds entities and their connections between context and documents based on fuzzy coreference chains, and we describe the design and implementation of the ERSS summarizer implementing these ideas.

Mutation mining - A prospector's tale

Screenshot of ProSAT/Webmol with MutationMiner annotations

Abstract

Protein structure visualization tools render images that allow the user to explore structural features of a protein. Context-specific information relating to a particular protein or protein family is, however, not easily integrated and must be uploaded from databases or provided through manual curation of input files. Protein engineers spend considerable time iteratively reviewing both literature and protein structure visualizations manually annotated with mutated residues. Meanwhile, text mining tools are increasingly used to extract specific units of raw text from scientific literature and have demonstrated the potential to support the activities of protein engineers.

The transfer of mutation-specific raw-text annotations to protein structures requires integrated data processing pipelines that can coordinate information retrieval, information extraction, protein sequence retrieval, sequence alignment, and mutant residue mapping. We describe the Mutation Miner pipeline designed for this purpose and present case study evaluations of the key steps in the process. Starting with literature about mutations made to the protein families haloalkane dehalogenase, biphenyl dioxygenase, and xylanase, we enumerate the relevant documents available for text mining analysis, the available electronic formats, and the number of mutations made to a given protein family. We review the efficiency of NLP-driven protein sequence retrieval from databases and report on the effectiveness of Mutation Miner in mapping annotations to protein structure visualizations. We highlight the feasibility and practicability of the approach.

Keywords

Text mining - Protein structure annotation - Protein mutation - Data mining - Haloalkane dehalogenase - Biphenyl dioxygenase - Xylanase

Workshop on Traceability at CASCON 2007

Together with Juergen Rilling from Concordia University and Philippe Charland from the DRDC Canada I'm organizing a workshop at CASCON 2007: Traceability in Software Engineering—Past, Present and Future. It's on October 25 at the Sheraton Parkway Toronto North Hotel and Convention Centre, Ontario, Canada.

Technical Report on Text Mining (in German)

A new technical report on Text Mining (in German) is now available. This is a collection of reports written by students within a Hauptseminar, which was given by yours truly and Jutta Mülle at Universität Karlsruhe, Germany.

Durm German Lemmatizer v1.0 Released

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.

The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

Multi-lingual Noun Phrase Chunker Updated

I just posted a small update to my multi-lingual noun phrase chunker (MuNPEx) for GATE.

Changes in v0.2 are:
o preliminary Spanish support (see below)
o renamed from "NPE" to "MuNPEx" in a blatant attempt at Googlewhacking
o small cleanups
o now comes with a sample NE transducer for number markup to improve chunking
Supported languages are now English, German, French, and Spanish (beta).

Selected Talks

Invited Talks and Keynotes

Invited Talk, Semantic Support for Software Requirements Engineering: The ReqWiki Approach, March 13, 2015, Institute for Infocomm Research (I2R), Singapore.

Invited Talk, Bringing Summarization to End Users: Semantic Assistants for Integrating NLP Web Services and Desktop Clients, May 24, 2011, Workshop on Text Summarization (TS11), St. John's, Newfoundland, Canada.

Keynote Talk, Natural Language Processing for the Masses: The Semantic Assistants Project, November 10th, 2010, Atlantic Workshop on Semantics and Services (AWOSS10.2), Moncton, NB, Canada.

Invited Talk, NLP for the Masses: Integrating GATE with Desktop Clients, September 1st, 2010, 3rd GATE Training Course and Developer Sprint (FIG3), Concordia University, Montréal, Québec, Canada.

Invited Talk, Am Anfang war das Mock-up: Zur Genese des DEAFél aus Sicht des Informatikers, July 2nd, 2010, Heidelberger Akademie der Wissenschaften, Germany.

Invited Talk, Software Engineering and Natural Language Processing: Friends or Foes? February 26th, 2010. University of Sheffield, UK.

Invited Talk, Software Engineering and Natural Language Processing: Friends or Foes? July 30th, 2009. Friedrich Schiller University of Jena, Germany.

Invited Talk, Software Engineering and Natural Language Processing: Friends or Foes? February 26th, 2009. Colloques du DIRO, Université de Montréal, Québec, Canada.

Invited Talk, Semantic Software Engineering. January 14th, 2008. Concordia University, Montréal, Québec, Canada.

Invited Talk, Ontology and Text Mining: Connecting your documents with the real world. July 5th, 2007. University of Aberdeen, Scotland, UK.

Invited Speaker, Empowering Software Maintainers with Semantic Web Technologies. June 6th, 2007. 3rd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2007), Innsbruck, Austria.

Invited Talk, Ontology and Text Mining: Connecting your documents with the real world. February 28th, 2007. Institute for Infocomm Research (I2R), Knowledge Discovery Department, Singapore.

Invited Talk, Ontology and Text Mining: Connecting your documents with the real world. February 23rd, 2007. Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea.

Invited Talk, Text Mining. June 28th, 2005. Artificial Intelligence Seminar Series, University of Luxembourg.

Invited Talk, Text Mining. December 9th, 2004. European Media Lab (EML), Villa Bosch, Heidelberg, Germany.

Invited Talk, Representing and Processing Uncertain and Imprecise Information. April 15th, 2002. Concordia University, Montréal, Québec, Canada.

Research

Biography

Dr. René Witte is currently a tenured associate professor within the Department of Computer Science and Software Engineering at Concordia University in Montréal, Canada, where he heads the Semantic Software Lab. Previously, he worked at Universität Karlsruhe (TH) (now Karlsruhe Institute of Technology, KIT) in Germany within Prof. Dr. Peter C. Lockemann's research group at the Institute for Program Structures and Data Organization (IPD). He also has more than five years of professional work experience in the IT and software industry. Dr. Witte received his Diplom (the German equivalent of an M.Sc.) in Informatics (Computer Science) in 1996 and his "Dr.-Ing." (Doctor of Engineering) in 2002, both from the Faculty of Informatics at Universität Karlsruhe. During the last 10 years, he co-authored more than 70 publications and received four best paper and poster awards. His research has been funded by major funding agencies and industry, including NSERC, MITACS, and the DRDC, and spans both national and international collaborations. He has also given invited talks and conference keynote speeches on multiple occasions and is a frequent reviewer for international conferences, workshops, and projects.

Architektur von Fuzzy-Informationssystemen

(This web page is about my book, "Architecture of Fuzzy Information Systems", which is written in German. You can try a Google translation.)

Architektur von Fuzzy-Informationssystemen

by René Witte

ISBN 3-8311-4149-5

330 pages, 82 figures

Copyright © 2002 René Witte
All rights reserved by the author.

Where to buy

Description

Because of the models and technologies they employ, today's information systems assume that the data they manage is always precise, certain, and consistent. Reality looks different: information is in fact often imprecise, vague, uncertain, or inconsistent.

Particularly in complex information systems that aim to model reality as faithfully as possible, one does not want to lose these so-called imperfections but rather to represent them explicitly, so that both development and the end user can benefit from them: a bank, for example, has a strong interest in a correct description of a customer's creditworthiness; an environmental information system must convey credible data about the pollution levels of a region, as must a traffic control system about possible congestion risks. Business-to-business marketplaces need information about the reliability of business partners, and electronic libraries about the relevance of retrieved text passages.

Fuzzy theory, which is already used successfully in industry in many other areas such as control engineering, can be used to model such imprecise and uncertain data. For information systems, however, there has so far been no systematic approach for extending existing models, technologies, and architectures that remains compatible with established standards and embeds the new capabilities in an orthogonal way. This book, which is based on the author's dissertation, presents for the first time a complete architecture model for the development of fuzzy information systems. After an introduction to the necessary foundations of fuzzy theory, a model suitable for information systems is formally constructed, and it is shown how this model can be realized with common object-oriented languages. Finally, for system development, a suitable reference architecture is presented that follows current multi-tier client/server architectures.

In addition, the book offers practitioners two concrete application examples, a fuzzy decision support system and a fuzzy text analysis system, which are used to describe the development of fuzzy applications in detail.

Welcome to Dr. René Witte's Homepage




Welcome to my personal homepage. Here you can find information on my current research as well as my publications and other activities. More research-related information is published on semanticsoftware.info, where you can also contact me. For the socially networked, I'm also on Google+, LinkedIn, Xing and Twitter.