Wikipedia-based Semantic Interpretation for Natural Language Processing

2009 ◽  
Vol 34 ◽  
pp. 443-498 ◽  
Author(s):  
E. Gabrilovich ◽  
S. Markovitch

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
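At its core, ESA reduces to two operations: mapping a text to a TF-IDF-weighted vector over Wikipedia concepts (the aggregate of its words' concept vectors) and comparing such vectors with cosine similarity. The following minimal sketch illustrates the idea with a hypothetical, hand-filled word-to-concept index standing in for the weights that the real model derives by indexing all of Wikipedia:

```python
import math
from collections import defaultdict

# Toy inverted index: word -> {Wikipedia concept: TF-IDF weight}.
# In the real ESA model these weights come from indexing all of Wikipedia;
# the entries below are illustrative placeholders.
WORD_CONCEPT_WEIGHTS = {
    "bank":  {"Bank (finance)": 0.82, "River": 0.31},
    "loan":  {"Bank (finance)": 0.77, "Debt": 0.64},
    "river": {"River": 0.91, "Water": 0.40},
}

def esa_vector(text):
    """Map a text to a weighted vector over Wikipedia concepts."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in WORD_CONCEPT_WEIGHTS.get(word, {}).items():
            vec[concept] += weight   # sum of word vectors; scale is irrelevant for cosine
    return vec

def relatedness(text_a, text_b):
    """Cosine similarity between two ESA concept vectors."""
    a, b = esa_vector(text_a), esa_vector(text_b)
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(relatedness("bank loan", "river bank"))   # similarity in [0, 1]; higher for more related texts
```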

2013 ◽  
Vol 21 (2) ◽  
pp. 167-200 ◽  
Author(s):  
SEBASTIAN PADÓ ◽  
TAE-GIL NOH ◽  
ASHER STERN ◽  
RUI WANG ◽  
ROBERTO ZANOLI

A key challenge at the core of many Natural Language Processing (NLP) tasks is the ability to determine which conclusions can be inferred from a given natural language text. This problem, called the Recognition of Textual Entailment (RTE), has initiated the development of a range of algorithms, methods, and technologies. Unfortunately, research on Textual Entailment (TE), like semantics research more generally, is fragmented into studies focussing on various aspects of semantics such as world knowledge, lexical and syntactic relations, or more specialized kinds of inference. This fragmentation has problematic practical consequences. Notably, interoperability among the existing RTE systems is poor, and reuse of resources and algorithms is mostly infeasible. This also makes systematic evaluations very difficult to carry out. Finally, textual entailment presents a wide array of approaches to potential end users with little guidance on which to pick. Our contribution to this situation is the novel EXCITEMENT architecture, which was developed to enable and encourage the consolidation of methods and resources in the textual entailment area. It decomposes RTE into components with strongly typed interfaces. We specify (a) a modular linguistic analysis pipeline and (b) a decomposition of the ‘core’ RTE methods into top-level algorithms and subcomponents. We identify four major subcomponent types, including knowledge bases and alignment methods. The architecture was developed with a focus on generality, supporting all major approaches to RTE and encouraging language independence. We illustrate the feasibility of the architecture by constructing mappings of major existing systems onto the architecture. The practical implementation of this architecture forms the EXCITEMENT open platform. It is a suite of textual entailment algorithms and components which contains the three systems named above, including linguistic-analysis pipelines for three languages (English, German, and Italian), and comprises a number of linguistic resources. By addressing the problems outlined above, the platform provides a comprehensive and flexible basis for research and experimentation in textual entailment and is available as open source software under the GNU General Public License.
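The decomposition into a linguistic analysis pipeline, top-level entailment algorithms, and strongly typed subcomponents (such as knowledge bases and alignment methods) can be pictured as a set of abstract interfaces. The sketch below is schematic only; the actual platform is implemented in Java and its real interfaces differ, so all names here are illustrative:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum

# Schematic sketch of the component decomposition described above.
# The actual EXCITEMENT platform is implemented in Java; all names here
# are illustrative, not the platform's real API.

class Decision(Enum):
    ENTAILMENT = "entailment"
    NON_ENTAILMENT = "non-entailment"

@dataclass
class AnalyzedPair:              # output of the linguistic analysis pipeline
    text: str
    hypothesis: str
    annotations: dict            # tokens, lemmas, parses, etc.

class LinguisticPipeline(ABC):   # (a) modular linguistic analysis
    @abstractmethod
    def analyze(self, text: str, hypothesis: str) -> AnalyzedPair: ...

class KnowledgeBase(ABC):        # one subcomponent type
    @abstractmethod
    def related(self, lhs: str, rhs: str) -> bool: ...

class AlignmentMethod(ABC):      # another subcomponent type
    @abstractmethod
    def align(self, pair: AnalyzedPair) -> list[tuple[int, int]]: ...

class EntailmentAlgorithm(ABC):  # (b) top-level 'core' RTE algorithm
    @abstractmethod
    def decide(self, pair: AnalyzedPair) -> Decision: ...
```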


2020 ◽  
pp. 259-269
Author(s):  
H.I. Hoherchak ◽  

The article describes ways of applying knowledge bases to the analysis of natural language texts and to solving some of their processing tasks. The basic problems of natural language processing that underlie semantic analysis are considered: tokenization, part-of-speech tagging, dependency parsing, and coreference resolution. The basic concepts of knowledge base theory are presented, and an approach to filling knowledge bases based on the Universal Dependencies framework and coreference resolution is proposed. Examples of applying knowledge bases filled from natural language texts to practical problems are given, including checking constructed syntactic and semantic models for consistency and question answering.
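A minimal sketch of such a filling step, assuming a hand-annotated sentence in CoNLL-U-style fields: simple subject-predicate-modifier facts are read off a Universal Dependencies parse and added to the knowledge base (in the full approach, coreference resolution would first replace pronouns with their antecedents):

```python
# Minimal sketch: fill a knowledge base with (subject, relation, object) facts
# extracted from a Universal Dependencies parse. The token format mirrors
# CoNLL-U fields (id, form, head, deprel); the sentence below is hand-annotated
# for illustration.
SENTENCE = [
    {"id": 1, "form": "Kyiv",    "head": 3, "deprel": "nsubj"},
    {"id": 2, "form": "is",      "head": 3, "deprel": "cop"},
    {"id": 3, "form": "capital", "head": 0, "deprel": "root"},
    {"id": 4, "form": "of",      "head": 5, "deprel": "case"},
    {"id": 5, "form": "Ukraine", "head": 3, "deprel": "nmod"},
]

def extract_facts(tokens):
    """Collect simple subject-root-modifier triples from one UD sentence."""
    by_head = {}
    for tok in tokens:
        by_head.setdefault(tok["head"], []).append(tok)
    facts = []
    for root in by_head.get(0, []):                       # sentence roots
        deps = by_head.get(root["id"], [])
        subjects = [t["form"] for t in deps if t["deprel"].startswith("nsubj")]
        objects = [t["form"] for t in deps if t["deprel"] in ("obj", "nmod")]
        for s in subjects:
            for o in objects:
                facts.append((s, root["form"], o))
    return facts

knowledge_base = set(extract_facts(SENTENCE))
print(knowledge_base)   # {('Kyiv', 'capital', 'Ukraine')}
```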


2021 ◽  
Vol 12 ◽  
Author(s):  
Changcheng Wu ◽  
Junyi Li ◽  
Ye Zhang ◽  
Chunmei Lan ◽  
Kaiji Zhou ◽  
...  

Nowadays, most courses on massive open online course (MOOC) platforms are xMOOCs, which are based on the traditional instruction-driven principle, and the lecture is still the key component of the course. Thus, analyzing the lectures of xMOOC instructors would be helpful for evaluating course quality and providing feedback to instructors and researchers. The current study aimed to portray the lecture styles of instructors in MOOCs from the perspective of natural language processing. Specifically, 129 course transcripts were downloaded from two major MOOC platforms. Two semantic analysis tools (Linguistic Inquiry and Word Count and Coh-Metrix) were used to extract semantic features including self-reference, tone, affect, cognitive words, cohesion, complex words, and sentence length. On the basis of students' comments, course video review, and the results of cluster analysis, we found four different lecture styles: “perfect,” “communicative,” “balanced,” and “serious.” Significant differences were found between the lecture styles within different disciplines for note taking, discussion posts, and overall course satisfaction. Future studies could use fine-grained log data to verify these results and explore how natural language processing can be used to improve instructors' lectures in both MOOCs and traditional classes.
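The clustering step can be illustrated with a short sketch: per-transcript feature vectors (values invented here; in the study they come from LIWC and Coh-Metrix) are standardized and grouped into four clusters, which are then interpreted as lecture styles:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Illustrative sketch of the clustering step: each row stands for one course
# transcript described by semantic features (e.g., self-reference, tone,
# cognitive words, cohesion, mean sentence length). The numbers are made up;
# the study extracted such features from 129 transcripts.
features = np.array([
    [2.1, 65.0, 11.2, 0.42, 18.3],
    [0.8, 40.5,  9.7, 0.35, 22.1],
    [3.4, 72.3, 13.0, 0.51, 15.9],
    [1.2, 55.1, 10.4, 0.38, 20.6],
    [2.9, 68.7, 12.1, 0.47, 17.2],
    [0.5, 38.2,  8.9, 0.33, 23.8],
    [3.1, 70.1, 12.8, 0.49, 16.4],
    [1.0, 44.6,  9.3, 0.36, 21.5],
])

X = StandardScaler().fit_transform(features)          # put features on one scale
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(labels)   # cluster index per transcript; clusters are then interpreted
                # qualitatively (e.g., 'communicative', 'serious', ...)
```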


2021 ◽  
Vol 1 (2) ◽  
pp. 21-28
Author(s):  
Dastan Hussen Maulud ◽  
Subhi R. M. Zeebaree ◽  
Karwan Jacksi ◽  
Mohammed Mohammed Sadeeq ◽  
Karzan Hussein Sharif

Semantic analysis is an essential feature of the NLP approach. It captures, in an appropriate format, the meaning of a sentence or paragraph in its context. Semantics is the study of meaning in language: the vocabulary used conveys the import of the subject through the interrelationships between linguistic classes. This article surveys semantic interpretation in the area of Natural Language Processing. The findings suggest that the reviewed papers that relied on the sentiment analysis approach achieved the best accuracy, with minimal prediction error.
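A minimal baseline of the kind of sentiment analysis the surveyed papers rely on can be sketched with TF-IDF features and a linear classifier; the tiny labeled dataset below is invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Minimal sentiment-analysis baseline: TF-IDF features plus a linear classifier.
# The labeled examples are invented for illustration only.
texts  = ["great lecture, very clear", "boring and confusing",
          "loved the examples", "waste of time", "helpful and well paced",
          "terrible audio, hard to follow"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["clear and helpful", "confusing lecture"]))
```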


2018 ◽  
Vol 19 (1) ◽  
pp. 61-79
Author(s):  
Yu-Yun Chang ◽  
Shu-Kai Hsieh

In Generative Lexicon Theory (GLT) (Pustejovsky 1995), co-composition is one of the generative devices proposed to explain cases of verbal polysemous behavior where more than one function application is allowed. The English baking verbs were used as examples to illustrate how their arguments co-specify the verb through qualia unification. Some studies (Blutner 2002; Carston 2002; Falkum 2007) have stated that pragmatic information and world knowledge need to be considered as well. Therefore, this study examines whether GLT can be put into practice in a real-world Natural Language Processing (NLP) application using collocations. We conducted a fine-grained logical polysemy disambiguation task, taking the open-source Leiden Weibo Corpus as a resource and using a Support Vector Machine (SVM) classifier. Within the classifier, we took collocated verbs under GLT as the main features. In addition, measure words and syntactic patterns were extracted as additional features for comparison. Our study investigates the logical polysemy of the Chinese verb kao ‘bake’. We find that GLT can help in identifying logically polysemous cases and that the additional features help the classifier achieve higher performance.
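A schematic version of such a classifier is sketched below, assuming invented collocation features and sense labels rather than the actual Leiden Weibo Corpus annotations:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Schematic disambiguation setup: each occurrence of kao 'bake' is described by
# collocation features (collocated verb, measure word, syntactic pattern), and an
# SVM predicts which logical sense is intended. Feature values and labels are
# invented for illustration, not drawn from the Leiden Weibo Corpus.
instances = [
    {"colloc_verb": "chi", "measure": "ge",   "pattern": "V+N"},
    {"colloc_verb": "mai", "measure": "xie",  "pattern": "V+N"},
    {"colloc_verb": "zuo", "measure": "ge",   "pattern": "V+V"},
    {"colloc_verb": "chi", "measure": "kuai", "pattern": "V+N"},
    {"colloc_verb": "xue", "measure": "ci",   "pattern": "V+V"},
    {"colloc_verb": "mai", "measure": "ge",   "pattern": "V+N"},
]
senses = ["creation", "creation", "change_of_state",
          "creation", "change_of_state", "creation"]

clf = make_pipeline(DictVectorizer(sparse=False), SVC(kernel="linear"))
clf.fit(instances, senses)
print(clf.predict([{"colloc_verb": "chi", "measure": "ge", "pattern": "V+N"}]))
```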


Author(s):  
TIAN-SHUN YAO

Based on the word-based theory of natural language processing, a word-based Chinese language understanding system has been developed. In light of psychological language analysis and the features of the Chinese language, this theory of natural language processing is presented together with a description of the computer programs based on it. The heart of the system is the definition of a Total Information Dictionary and the World Knowledge Source used in the system. The purpose of this research is to develop a system that can understand not only individual Chinese sentences but also whole texts.
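The abstract does not specify the dictionary's design, but a "total information" entry presumably bundles lexical, syntactic, semantic, and world-knowledge information for each word. A purely speculative sketch of such a record, with field names invented for illustration:

```python
from dataclasses import dataclass, field

# Speculative sketch of a 'total information' entry for one word, bundling
# lexical, syntactic, semantic, and world-knowledge fields in a single record.
# The field names are guesses; the paper's actual dictionary design is not
# described in the abstract.
@dataclass
class WordEntry:
    word: str
    pos: str                                         # part of speech
    senses: list[str] = field(default_factory=list)
    syntactic_frames: list[str] = field(default_factory=list)
    world_knowledge: dict = field(default_factory=dict)

entry = WordEntry(
    word="银行",
    pos="noun",
    senses=["financial institution"],
    syntactic_frames=["N", "N+的+N"],
    world_knowledge={"is_a": "organization", "offers": ["loans", "accounts"]},
)
print(entry.word, entry.world_knowledge["is_a"])
```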


2021 ◽  
Vol 47 (05) ◽  
Author(s):  
NGUYỄN CHÍ HIẾU

In recent years, knowledge graphs have been applied in many fields such as search engines, semantic analysis, and question answering. However, there are many obstacles to building knowledge graphs, including methodologies, data, and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use Natural Language Processing and deep learning methodologies to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.
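The final stage of such a pipeline can be sketched as assembling extracted entity-relation triples into a graph; the triples below are hard-coded placeholders for what NER and relation-extraction models would produce:

```python
import networkx as nx

# Sketch of the last stage of a knowledge-graph pipeline: entity/relation triples
# extracted from heterogeneous documents (hard-coded here; in practice produced by
# NER and relation-extraction models) are assembled into a graph.
triples = [
    ("BERT", "is_a", "language model"),
    ("BERT", "used_for", "question answering"),
    ("question answering", "subfield_of", "information retrieval"),
]

graph = nx.MultiDiGraph()
for subj, rel, obj in triples:
    graph.add_edge(subj, obj, relation=rel)

# Simple query over the graph: what is BERT used for?
print([o for _, o, d in graph.out_edges("BERT", data=True)
       if d["relation"] == "used_for"])
```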


Author(s):  
John Carroll

This article introduces the concepts and techniques of natural language (NL) parsing, which means using a grammar to assign a syntactic analysis to a string of words, a lattice of word hypotheses output by a speech recognizer, or similar input. The level of detail required depends on the language processing task being performed and the particular approach to the task that is being pursued. The article first describes approaches that produce ‘shallow’ analyses. It also outlines approaches to parsing that analyse the input in terms of labelled dependencies between words. Producing hierarchical phrase structure requires grammars that have at least context-free (CF) power. CF algorithms that are widely used in NL parsing are described. To support detailed semantic interpretation, more powerful grammar formalisms are required, but these are usually parsed using extensions of CF parsing algorithms. The article then describes unification-based parsing. Finally, it discusses three important issues that have to be tackled in real-world applications of parsing: evaluation of parser accuracy, parser efficiency, and measurement of grammar/parser coverage.
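A classic example of such a CF algorithm is CKY parsing over a grammar in Chomsky normal form; the toy recognizer below uses an illustrative grammar and sentence:

```python
from collections import defaultdict
from itertools import product

# Tiny CKY recognizer for a context-free grammar in Chomsky normal form,
# illustrating the kind of CF parsing algorithm discussed above.
LEXICON = {          # A -> word
    "she": {"NP"}, "fish": {"NP", "V"}, "eats": {"V"}, "tasty": {"Adj"},
}
RULES = {            # A -> B C
    ("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Adj", "NP"): {"NP"},
}

def cky(words):
    n = len(words)
    chart = defaultdict(set)                    # chart[i, j]: labels spanning words[i:j]
    for i, w in enumerate(words):
        chart[i, i + 1] = set(LEXICON.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for b, c in product(chart[i, k], chart[k, j]):
                    chart[i, j] |= RULES.get((b, c), set())
    return "S" in chart[0, n]

print(cky("she eats tasty fish".split()))   # True
```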

