Durm German Lemmatizer v1.0 Released

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.

The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

The Durm German Lemmatization System consists of a number of GATE components and resources that perform morphological analysis and lemmatization for German nouns. It includes:
o The Case Tagger, which adds case information (Nominativ, Genitiv, Dativ, Akkusativ) to nouns;
o The POS-based Number Tagger, which adds number information (singular, plural);
o The Morphological Analyzer, which classifies nouns into morphological classes;
o The Lemmatizer, which annotates nouns with their lemma.
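
As a rough illustration of how the four components could build on one another, here is a minimal Python sketch in which each stage adds one feature to a noun token. It assumes the components run in the order listed above. The names TOY_ANALYSES, make_stage and run_pipeline, the morphological class label, and the hard-coded analysis are all hypothetical; none of this is the Durm or GATE API, and the real Case Tagger is statistical rather than a table lookup.

```python
from typing import Callable, Dict, List

Token = Dict[str, str]
Stage = Callable[[Token], Token]

# Hand-written toy analysis for illustration only; NOT taken from the Durm resources.
# The "morph_class" label is a made-up description, not one of Durm's actual classes.
TOY_ANALYSES: Dict[str, Dict[str, str]] = {
    "Häusern": {
        "case": "Dativ",
        "number": "plural",
        "morph_class": "plural in -er with umlaut (hypothetical label)",
        "lemma": "Haus",
    },
}


def make_stage(feature: str) -> Stage:
    """Build a stand-in stage that copies one feature from the toy table."""
    def stage(token: Token) -> Token:
        analysis = TOY_ANALYSES.get(token["string"], {})
        enriched = dict(token)  # leave the input token unmodified
        if feature in analysis:
            enriched[feature] = analysis[feature]
        return enriched
    return stage


# Assumed order: case, number, morphological class, lemma.
PIPELINE: List[Stage] = [
    make_stage("case"),
    make_stage("number"),
    make_stage("morph_class"),
    make_stage("lemma"),
]


def run_pipeline(token: Token) -> Token:
    """Pass a token through every stage in order, accumulating features."""
    for stage in PIPELINE:
        token = stage(token)
    return token


annotated = run_pipeline({"string": "Häusern"})
```

After the last stage, the token for "Häusern" carries case, number, morphological class, and lemma features, while a noun missing from the toy table passes through unchanged.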

Additionally, it comes with two main resources:
o Case Tagger Probabilities, a set of resource files with statistical information for the HMM module;
o German Lexicon, an automatically created and updated German lexicon containing lemma, number, and case information for nouns.
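
To make the lexicon description more concrete, the following Python sketch shows one way an entry holding lemma, number, and case information might be represented and looked up. The LexiconEntry and Lexicon classes and their field names are assumptions made for illustration; they do not reflect the actual file format or API of the Durm lexicon.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LexiconEntry:
    """One reading of a noun form (hypothetical layout)."""
    surface: str  # the inflected form as it appears in text
    lemma: str
    number: str   # "singular" or "plural"
    case: str     # "Nominativ", "Genitiv", "Dativ" or "Akkusativ"


class Lexicon:
    """A toy in-memory lexicon keyed by surface form."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[LexiconEntry]] = {}

    def add(self, entry: LexiconEntry) -> None:
        # A surface form may have several readings; skip exact duplicates.
        bucket = self._entries.setdefault(entry.surface, [])
        if entry not in bucket:
            bucket.append(entry)

    def lookup(self, surface: str) -> List[LexiconEntry]:
        # Return a copy so callers cannot modify the stored list.
        return list(self._entries.get(surface, []))


lexicon = Lexicon()
lexicon.add(LexiconEntry("Häusern", "Haus", "plural", "Dativ"))
lexicon.add(LexiconEntry("Hauses", "Haus", "singular", "Genitiv"))
```

Storing a list of readings per surface form reflects the fact that many German noun forms are ambiguous in case or number, which is why contextual case tagging is useful in the first place.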

The Durm Lemmatizer was developed by Praharshana Perera within René Witte's Text Mining Group at the IPD, Universität Karlsruhe, Germany. All components, as well as the lexicon, are distributed as free/open source software under the GNU GPL.

For the motivation and theoretical background behind this system, have a look at our paper "A Self-Learning Context-Aware Lemmatizer for German".