Text Mining and Software Engineering: An Integrated Source Code and Document Analysis Approach



Documents written in natural languages constitute a major part of the artifacts produced during the software engineering lifecycle. Especially during software maintenance or reverse engineering, semantic information conveyed in these documents can provide important knowledge for the software engineer. In this paper, we present a text mining system capable of populating a software ontology with information detected in documents. A particular novelty is the integration of results from automated source code analysis into an NLP pipeline, which allows software artifacts represented in code and in natural language to be cross-linked on a semantic level.


René Witte, Qiangqiang Li, Yonggang Zhang, and Juergen Rilling. Text Mining and Software Engineering: An Integrated Source Code and Document Analysis Approach. IET Software Journal, Volume 2, Issue 1, 2008, pp. 3-16. DOI: 10.1049/iet-sen:20070110. Special Section on Natural Language in Software Engineering.

