A Self-Learning Context-Aware Lemmatizer for German

Abstract

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.

Reference

Praharshana Perera and René Witte, A Self-Learning Context-Aware Lemmatizer for German. Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), pp. 636–643, October 6–8, 2005, Vancouver, B.C., Canada.

BibTeX entry:

@InProceedings{perera-witte:2005:HLTEMNLP,
  author    = {Praharshana Perera and Ren\'{e} Witte},
  title     = {{A Self-Learning Context-Aware Lemmatizer for German}},
  booktitle = {Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005)},
  month     = {October 6--8},
  year      = {2005},
  address   = {Vancouver, British Columbia, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {636--643},
  url       = {http://www.aclweb.org/anthology/H/H05/H05-1080}
}

Software

The Durm German Lemmatization System is available as free/open source software.

Download

URL: http://acl.ldc.upenn.edu//H/H05/H05-1080.pdf. Also available: local copy.
MD5 checksum: 967bc9caf77b5ab09dbfecfa6b74b973
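
To check that a downloaded copy of the paper matches the checksum above, you can compute the file's MD5 digest locally. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the file name used in the usage note is an assumption, not part of the original download instructions.

```python
import hashlib

# Checksum listed on this page for the paper PDF.
EXPECTED_MD5 = "967bc9caf77b5ab09dbfecfa6b74b973"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected hex string.
    The comparison ignores case because hexdigest() returns lowercase."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("H05-1080.pdf")` returns `True` if the local file (hypothetical name) is byte-identical to the published PDF. MD5 is adequate for detecting a corrupted or truncated download, but it does not protect against deliberate tampering.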

Copyright © 2005 ACL.