Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR

2021 · Vol 2021 · pp. 1-7
Author(s): Jeongin Kim, Taekeun Hong, Pankoo Kim

One of the most common problems in natural language analysis is finding synonyms of out-of-vocabulary (OOV) words. When a person tries to understand a sentence containing an OOV word, they determine the most appropriate meaning of a replacement word from the meanings of the co-occurring words in the same context, drawing on the conceptual system they have learned. In this study, a word-to-vector and conceptual relationship (Word2VnCR) algorithm is proposed that replaces an OOV word, which would otherwise lead to an erroneous morphemic analysis, with an appropriate synonym. The Word2VnCR algorithm improves on the conventional Word2Vec algorithm, which struggles to suggest a replacement word because it does not assess the similarity of candidate words. After word-embedding learning is conducted on the training dataset, replacement candidates for the OOV word are extracted. The semantic similarity of each extracted candidate is measured against the words neighboring the OOV word, and the candidate with the highest similarity value is selected as the replacement. To evaluate the performance of the proposed Word2VnCR algorithm, a comparative experiment was conducted against the Word2Vec algorithm. The experimental results indicate that the proposed algorithm achieves higher accuracy than Word2Vec.
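The selection step described above lends itself to a short illustration. The sketch below is not the authors' implementation: it simply ranks candidate replacements by their average cosine similarity to the words surrounding the OOV token, with the gensim KeyedVectors model, the candidate list, and the context window all assumed inputs.

```python
# Hedged sketch of the candidate-scoring step: rank each replacement candidate
# by its average cosine similarity to the OOV word's neighboring words.
from gensim.models import KeyedVectors

def best_replacement(candidates, context_words, kv: KeyedVectors):
    """Return the candidate most similar, on average, to the context words."""
    scored = []
    for cand in candidates:
        if cand not in kv:
            continue  # skip candidates missing from the embedding vocabulary
        sims = [kv.similarity(cand, w) for w in context_words if w in kv]
        if sims:
            scored.append((sum(sims) / len(sims), cand))
    return max(scored)[1] if scored else None

# Hypothetical usage: pick a stand-in for an unknown token given its neighbors.
# best_replacement(["automobile", "vehicle"], ["road", "driver", "engine"], kv)
```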

Author(s): Bin Wang, Angela Wang, Fenxiao Chen, Yuncheng Wang, C.-C. Jay Kuo

An extensive evaluation of a large number of word embedding models for language processing applications is conducted in this work. First, we introduce popular word embedding models and discuss the desired properties of word models and evaluation methods (or evaluators). Then, we categorize evaluators into two types: intrinsic and extrinsic. Intrinsic evaluators test the quality of a representation independently of specific natural language processing tasks, while extrinsic evaluators use word embeddings as input features to a downstream task and measure changes in performance metrics specific to that task. We report experimental results of intrinsic and extrinsic evaluators on six word embedding models. It is shown that different evaluators focus on different aspects of word models, and some are more correlated with natural language processing tasks than others. Finally, we adopt correlation analysis to study the performance consistency of extrinsic and intrinsic evaluators.
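As an illustration of what an intrinsic evaluator looks like in practice, the sketch below computes the standard word-similarity score: the Spearman correlation between human ratings and embedding cosine similarities. The pair format and the gensim KeyedVectors model are assumptions, not the paper's exact protocol.

```python
# Minimal word-similarity evaluator: correlate human ratings with cosine
# similarities produced by the embedding model.
from scipy.stats import spearmanr
from gensim.models import KeyedVectors

def word_similarity_score(pairs, kv: KeyedVectors):
    """pairs: iterable of (word1, word2, human_rating)."""
    human, model = [], []
    for w1, w2, rating in pairs:
        if w1 in kv and w2 in kv:  # only score pairs covered by the vocabulary
            human.append(rating)
            model.append(kv.similarity(w1, w2))
    rho, _ = spearmanr(human, model)
    return rho  # higher correlation = better intrinsic quality
```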


2020 · Vol 34 (10) · pp. 13969-13970
Author(s): Atsuki Yamaguchi, Katsuhide Fujita

In human-human negotiation, reaching a rational agreement can be difficult, and negotiations sometimes break down because of conflicts of interest. If artificial intelligence can play a role in assisting human-human negotiation, it can help avoid negotiation breakdowns and lead to a rational agreement. Therefore, this study focuses on end-to-end tasks for predicting the outcome of a negotiation dialogue in natural language. Our task is modeled using a gated recurrent unit (GRU) and a pre-trained language model (BERT) as baselines. Experimental results demonstrate that the proposed tasks are feasible on two negotiation dialogue datasets and that the baselines can detect signs of a breakdown at an early stage, even when given only a partial dialogue history.
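A minimal sketch of a GRU baseline for this kind of outcome prediction is given below; it is not the authors' implementation, and the vocabulary size, hidden size, and number of outcome classes are placeholders.

```python
# Toy GRU classifier: encode a (possibly partial) dialogue as token ids and
# predict the negotiation outcome from the final hidden state.
import torch
import torch.nn as nn

class DialogueOutcomeGRU(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)     # (batch, seq_len, embed_dim)
        _, last_hidden = self.gru(embedded)  # last_hidden: (1, batch, hidden_dim)
        return self.classifier(last_hidden.squeeze(0))  # (batch, num_classes)

# logits = DialogueOutcomeGRU()(torch.randint(1, 10000, (4, 120)))
```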


Author(s): Michaela Regneri, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, ...

Recent work has shown that integrating visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general-purpose corpus that aligns high-quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other. Experimental results demonstrate that a text-based model of similarity between actions improves substantially when combined with visual information from videos depicting the described actions.
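One simple way to combine the two signals, shown below purely as an illustration rather than the authors' model, is late fusion: blend a text-based similarity with a video-based similarity using a mixing weight. The feature extractors and the weight alpha are assumptions.

```python
# Late-fusion sketch: mix text-based and video-based similarity scores for a
# pair of action descriptions.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def action_similarity(text_vec1, text_vec2, video_vec1, video_vec2, alpha=0.5):
    """Blend text-based and video-based similarity for two action descriptions."""
    text_sim = cosine(text_vec1, text_vec2)
    visual_sim = cosine(video_vec1, video_vec2)
    return alpha * text_sim + (1 - alpha) * visual_sim
```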


Complexity · 2020 · Vol 2020 · pp. 1-14
Author(s): Leilei Kong, Zhongyuan Han, Yong Han, Haoliang Qi

Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS). DPIM-ISS introduces linguistic features manifested as syntactic features to produce more explicit structures, and it encodes the semantic representation of a sentence over different syntactic structures by interacting semantics with syntax. DPIM-ISS then learns paraphrase patterns from this syntax-aware semantic representation using a convolutional neural network with a convolution-pooling structure. Experiments are conducted on the Microsoft Research Paraphrase (MSRP) corpus and on the PAN 2010 and PAN 2012 corpora for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms classical word-matching approaches, syntax-similarity approaches, convolutional neural network-based models, and several deep paraphrase identification models.
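The interaction-then-CNN idea can be illustrated with the simplified sketch below, which omits the syntax-aware encoding that DPIM-ISS adds: it builds a word-level interaction matrix between the two sentences and classifies it with a convolution-pooling network. All sizes are placeholders.

```python
# Simplified interaction + convolution-pooling classifier for paraphrase
# identification (not the DPIM-ISS architecture itself).
import torch
import torch.nn as nn

class InteractionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool2d((8, 8))
        self.classifier = nn.Linear(16 * 8 * 8, 2)  # paraphrase / not paraphrase

    def forward(self, sent1, sent2):  # (batch, len, dim) token embeddings
        # Cosine-style interaction matrix between every pair of tokens.
        sent1 = nn.functional.normalize(sent1, dim=-1)
        sent2 = nn.functional.normalize(sent2, dim=-1)
        interaction = torch.bmm(sent1, sent2.transpose(1, 2)).unsqueeze(1)
        features = self.pool(torch.relu(self.conv(interaction)))
        return self.classifier(features.flatten(1))
```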


Author(s): Son Doan, Susumu Horiguchi

Text categorization involves assigning a natural language document to one or more predefined classes. One of the most interesting issues is feature selection. We propose an approach based on multicriteria ranking of features, a new procedure for feature selection, and apply it to text categorization. Experimental results on the Reuters-21578 and 20Newsgroups benchmark datasets with the naive Bayes algorithm show that our proposal outperforms conventional feature selection in text categorization performance.
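The following sketch illustrates one way a multicriteria ranking of features could be realized before naive Bayes training; averaging the ranks produced by chi-square and mutual information is only an illustration, not the authors' exact procedure.

```python
# Combine two feature-ranking criteria by averaging their ranks, then keep the
# top-k features for a naive Bayes text categorizer.
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB

def select_features(X, y, k=1000):
    """X: document-term matrix, y: labels. Return indices of the top-k features."""
    rank_chi2 = np.argsort(np.argsort(-chi2(X, y)[0]))        # rank 0 = best chi-square
    rank_mi = np.argsort(np.argsort(-mutual_info_classif(X, y)))
    combined_rank = (rank_chi2 + rank_mi) / 2.0               # lower = better under both criteria
    return np.argsort(combined_rank)[:k]

# idx = select_features(X_train, y_train)
# clf = MultinomialNB().fit(X_train[:, idx], y_train)
```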


2004 · Vol 13 (04) · pp. 813-828
Author(s): JING XIAO, TAT-SENG CHUA, JIMIN LIU

The ability to extract desired pieces of information from natural language texts is an important task with a growing number of potential applications. This paper presents a novel pattern-rule induction learning system, GRID, which emphasizes the use of the global feature distribution across all training instances in order to make better decisions during rule induction. GRID uses chunks as contextual units instead of tokens and incorporates features at the lexical, syntactic, and semantic levels simultaneously. The features chosen in GRID are general, and they were applied successfully to both semi-structured text and free text. Our experimental results on publicly available webpage corpora and the MUC-4 test set indicate that our approach is effective.
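As a toy illustration of chunk-level extraction rules in the spirit of GRID (not the system's actual data structures), the sketch below represents a rule as a sequence of constraints over chunks and returns the chunk marked as the slot filler when the pattern matches.

```python
# Toy chunk-level extraction rule: each position constrains the chunk type and
# (optionally) the semantic class; one position is marked as the slot filler.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    chunk_type: str      # e.g. "NP", "VP"
    semantic_class: str  # e.g. "PERSON", "ORG", "OTHER"

def match_rule(rule, chunks):
    """rule: list of (chunk_type, semantic_class or None, is_slot) triples."""
    for start in range(len(chunks) - len(rule) + 1):
        window = chunks[start:start + len(rule)]
        if all(c.chunk_type == t and (s is None or c.semantic_class == s)
               for c, (t, s, _) in zip(window, rule)):
            return next((c.text for c, (_, _, slot) in zip(window, rule) if slot), None)
    return None

# rule = [("NP", "ORG", True), ("VP", None, False), ("NP", "PERSON", False)]
# match_rule(rule, chunks)  # -> text of the matching ORG chunk, or None
```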


2020
Author(s): Masashi Sugiyama

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn separate vectors for each sense of a word. Therefore, in this project, we explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-Parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses of a word incrementally. We evaluate all the systems on the word similarity task and show that IMSSG outperforms the other models.
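A minimal sketch of the sense-assignment step used in MSSG-style training is shown below: the current context is averaged and compared against each sense's context-cluster center, and the closest sense is the one updated for this occurrence. The array shapes and the cosine scoring are standard choices but still assumptions here.

```python
# Pick which sense of a word the current occurrence belongs to by comparing
# the averaged context vector against each sense's cluster center.
import numpy as np

def assign_sense(context_vecs, sense_cluster_centers):
    """context_vecs: (n_context, dim); sense_cluster_centers: (n_senses, dim)."""
    context_mean = context_vecs.mean(axis=0)
    sims = sense_cluster_centers @ context_mean / (
        np.linalg.norm(sense_cluster_centers, axis=1) * np.linalg.norm(context_mean) + 1e-9
    )
    return int(np.argmax(sims))  # index of the sense to train for this occurrence
```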


2021 · Vol 1 (2) · pp. 18-22
Author(s): Strahil Sokolov, Stanislava Georgieva

This paper presents a new approach to the processing and categorization of text from patient documents in the Bulgarian language using Natural Language Processing and Edge AI. The proposed algorithm comprises several phases: personal data anonymization, pre-processing and conversion of text to vectors, model training, and recognition. The accuracy achieved in the experiments is comparable with that of modern approaches.
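An illustrative sketch of the listed phases follows; the regex patterns, the TF-IDF vectorizer, and the classifier are placeholders standing in for the paper's actual components.

```python
# Toy pipeline: rule-based anonymization, text-to-vector conversion, and
# classifier training for document categorization.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def anonymize(text):
    text = re.sub(r"\b\d{10}\b", "<ID>", text)                     # national-ID-like numbers
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "<EMAIL>", text)  # email addresses
    return text

def train_categorizer(documents, labels):
    cleaned = [anonymize(d) for d in documents]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    return model.fit(cleaned, labels)

# categorizer = train_categorizer(patient_docs, doc_categories)
# predicted = categorizer.predict([anonymize(new_document)])
```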


Author(s): Zhiguo Wang, Wael Hamza, Radu Florian

Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or apply only single-granularity (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model. Given two sentences P and Q, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions: P against Q and Q against P. In each matching direction, each time step of one sentence is matched against all time steps of the other sentence from multiple perspectives. Another BiLSTM layer is then used to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, a decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference, and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.
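The core multi-perspective cosine matching operation can be sketched as follows: each perspective re-weights the two vectors element-wise before taking a cosine similarity, so one vector pair yields one score per perspective. The number of perspectives and the dimensions are placeholders.

```python
# Multi-perspective cosine matching: one similarity score per learnable
# perspective for a pair of hidden-state vectors.
import torch
import torch.nn.functional as F

def multi_perspective_match(v1, v2, weights):
    """v1, v2: (dim,) vectors; weights: (num_perspectives, dim) learnable matrix."""
    weighted_v1 = weights * v1  # (num_perspectives, dim), element-wise per perspective
    weighted_v2 = weights * v2
    return F.cosine_similarity(weighted_v1, weighted_v2, dim=1)  # (num_perspectives,)

# weights = torch.nn.Parameter(torch.randn(20, 100))
# scores = multi_perspective_match(torch.randn(100), torch.randn(100), weights)
```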

