Tutorial: Introduction to Text Mining

Tutorial Description

Do you suffer from a lack of information? Or do you rather feel overwhelmed by the sheer amount of available (online) content, such as emails, news, web pages, and electronic documents? The relatively young field of Text Mining grew out of the observation that most knowledge today (more than 80% of the data stored in databases) is hidden in documents written in natural language and therefore cannot be processed automatically by traditional information systems.

Text Mining, "also known as intelligent text analysis, text data mining or knowledge-discovery in text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text." Text Mining is a highly interdisciplinary field: it draws on foundations and technologies from fields such as computational linguistics, database systems, and artificial intelligence, but applies them in new and often unconventional ways.

There is also strong commercial interest already, both in applied research performed by companies such as Google and IBM and in industrially deployed applications, especially in the pharmaceutical domain and within government intelligence agencies.

In this tutorial, we give an introduction to the field of text mining and provide participants with the theoretical and technical foundations needed to understand current and emerging research and technology. We then examine several applications in detail, such as the automatic summarization of documents, the extraction of biological knowledge from research papers, and the analysis of users' opinions about products expressed on websites.

The target audience is researchers and practitioners with a background in databases and information systems but no experience in natural language processing (NLP) or computational linguistics. The tutorial therefore provides the theoretical foundations and terminology needed to understand issues and methods in the text mining domain. A strong focus is placed on practical applications of text mining, and publicly available tools and resources are introduced throughout the presentation.

Additional Information

This tutorial was presented at the 10th International Conference on Extending Database Technology (EDBT 2006), 26-31 March 2006, Munich, Germany.

Copyright © 2006 René Witte. All rights reserved.