BUILDING SEMANTIC NETWORKS FROM PLAIN TEXT AND WIKIPEDIA WITH APPLICATION TO SEMANTIC RELATEDNESS AND NOUN COMPOUND PARAPHRASING
The construction of suitable and scalable representations of semantic knowledge is a core challenge in Semantic Computing. Manually created resources such as WordNet have proven useful for many AI and NLP tasks, but they are inherently limited in coverage and scalability. Moreover, they have been challenged by simple distributional models trained on very large corpora, which calls the advantage of structured knowledge representations into question. We present a framework for building large-scale semantic networks automatically from plain text and Wikipedia articles using only linguistic analysis tools. Our constructed resources cover up to 2 million concepts and were built in less than 6 days. Using the task of measuring semantic relatedness, we show that our networks achieve results comparable to the best WordNet-based methods as well as the best distributional methods, while using a corpus several orders of magnitude smaller. In addition, we show that we can outperform both types of methods by combining the scores of our two network variants. Initial experiments on noun compound paraphrasing show similar results, underlining both the quality and the flexibility of the constructed resources.
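
To make the combination step concrete, the following is a minimal illustrative sketch in Python, assuming min-max normalization of each network's relatedness scores followed by simple averaging. The abstract does not specify the actual combination method, so the normalization scheme, the averaging rule, all function names, and the example scores are assumptions, not the paper's implementation.

    def normalize(scores):
        """Min-max normalize a list of raw relatedness scores to [0, 1]."""
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return [0.0 for _ in scores]
        return [(s - lo) / (hi - lo) for s in scores]

    def combine(scores_a, scores_b):
        """Combine two score lists for the same word pairs by averaging
        their normalized values (one hypothetical combination rule)."""
        na, nb = normalize(scores_a), normalize(scores_b)
        return [(a + b) / 2.0 for a, b in zip(na, nb)]

    if __name__ == "__main__":
        # Hypothetical relatedness scores for three word pairs, one list
        # from the plain-text network and one from the Wikipedia network.
        plain_text_scores = [0.12, 0.85, 0.40]
        wikipedia_scores = [3.1, 9.7, 5.0]
        print(combine(plain_text_scores, wikipedia_scores))

Normalizing before averaging matters here because the two networks may produce scores on very different scales; averaging raw values would let the network with the larger range dominate the combined ranking.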