Languages

A Self-Learning Context-Aware Lemmatizer for German

Vancouver Waterfront

Abstract

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.

Durm German Lemmatizer v1.0 Released

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.

The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

Multi-lingual Noun Phrase Chunker Updated

I just posted a small update to my multi-lingual noun phrase chunker (MuNPEx) for GATE.

Changes in v0.2 are:
o preliminary Spanish support (see below)
o renamed from "NPE" to "MuNPEx" in a blatant attempt on Googlewhacking
o small cleanups
o now comes with a sample NE transducer for number markup to improve chunking
Supported languages are now English, German, French, and Spanish (beta).

Syndicate content