Research on Sustainable Mining Engineering

This article first gives a brief introduction to Natural Language Processing and the basics of Chinese Word Segmentation. Chinese Word Segmentation is the process of turning a sequence of Chinese characters into a sequence of Chinese words according to a set of rules. As a fundamental component of Chinese information processing, it is widely used in related areas. Accordingly, research on Chinese Word Segmentation has important theoretical and practical significance. In this paper, we mainly introduce the challenges in Chinese Word Segmentation and present the categories of Chinese Word Segmentation methods.
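The abstract does not name specific methods, so the following is only an illustrative sketch of one commonly described dictionary-based approach, forward maximum matching. The function name, the toy lexicon and the `max_len` parameter are assumptions made for this example and are not taken from the paper.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position, take the
    longest substring (up to max_len characters) found in the lexicon."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, chosen to expose a segmentation ambiguity.
toy_lexicon = {"研究", "研究生", "生命", "起源"}
segmented = forward_max_match("研究生命起源", toy_lexicon)
print(segmented)  # ['研究生', '命', '起源']
```

The greedy match prefers 研究生 ("graduate student") over 研究 / 生命 ("research" / "life"), which illustrates the kind of ambiguity that makes segmentation challenging for purely rule-based methods.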

Étudier l'écrit SMS: Un objectif du projet sms4science

Linguistik Online ◽

10.13092/lo.48.331 ◽

2011 ◽

Vol 48 (4) ◽

Author(s):

Louise-Amélie Cougnon ◽

Thomas François

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Statistical Study ◽

Text Message ◽

Scientific Research ◽

Text Messages ◽

International Project ◽

Message Length

This paper details an international project called sms4science that aims to collect text message corpora (hereafter referred to as "SMS corpora") from across the globe for scientific research. The project already has ten participating regions, including Belgium, Réunion, Switzerland and Quebec. This article first presents the initial corpora collected from these four areas (a combined total of 116,000 text messages) and the accompanying methodology. It then sets out the research possibilities these corpora open up: the corpus-based studies pertain as much to linguistics and sociolinguistics as they do to natural language processing and statistics. A specific statistical study is presented here, and its tentative conclusions outline the differences in SMS practices between regions, notably in abbreviation rate and message length. Finally, the paper delineates the obstacles the project faces and proposes fresh perspectives for the ongoing year (2011).

Multi-Sense Embeddings per Word

10.31219/osf.io/udfhn ◽

2020 ◽

Author(s):

Masashi Sugiyama

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Word embeddings have recently been used successfully in many natural language processing problems, and how to efficiently train a robust and accurate word embedding system is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn a separate vector for each sense of a word. In this project, we therefore explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses of each word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG performs better than the other models.
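The abstract gives no implementation details, so the sketch below only illustrates the sense-selection idea usually associated with multi-sense skip-gram models: average the vectors of the context words and pick the sense whose cluster centre is most similar. All names and the two-dimensional toy vectors are assumptions for illustration; training and the incremental IMSSG extension are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average(vectors):
    """Component-wise mean of a list of vectors (empty list gives [])."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pick_sense(context_words, global_vectors, sense_centers):
    """Return the index of the sense whose centre is closest to the context."""
    known = [global_vectors[w] for w in context_words if w in global_vectors]
    context_vec = average(known)
    sims = [cosine(context_vec, center) for center in sense_centers]
    return max(range(len(sims)), key=sims.__getitem__)


# Hypothetical toy data: two senses of "bank" in a 2-dimensional space.
global_vectors = {
    "river": [1.0, 0.0], "water": [0.9, 0.1],
    "money": [0.0, 1.0], "loan": [0.1, 0.9],
}
bank_sense_centers = [[1.0, 0.0], [0.0, 1.0]]  # sense 0: river bank, sense 1: financial

river_sense = pick_sense(["river", "water"], global_vectors, bank_sense_centers)
money_sense = pick_sense(["money", "loan"], global_vectors, bank_sense_centers)
print(river_sense, money_sense)  # 0 1
```

In a full model, the chosen sense vector would then be updated by the skip-gram objective and the matching cluster centre would be moved towards the context vector.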

Natural Language Processing in Serious Games: A state of the art.

International Journal of Serious Games ◽

10.17083/ijsg.v2i3.87 ◽

2015 ◽

Vol 2 (3) ◽

Cited By ~ 5

Author(s):

Davide Picca ◽

Dominique Jaccard ◽

Gérald Eberlé

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Serious Games ◽

State Of The Art ◽

Serious Game ◽

The Other ◽

Other Hand ◽

The One ◽

High Level

In recent decades, Natural Language Processing (NLP) has achieved a high level of success. Interactions between NLP and Serious Games have begun, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, to provide a simple framework for analysing potential uses of NLP in Serious Games and, on the other hand, to apply this NLP framework to existing Serious Games and give an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 serious games that exploit NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.

Researching in Shallow Parsing Based on Function Invariance of Maximum Likelihood Estimation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.274.359 ◽

2013 ◽

Vol 274 ◽

pp. 359-362

Author(s):

Shuang Zhang ◽

Shi Xiong Zhang

Keyword(s):

Natural Language Processing ◽

Maximum Likelihood ◽

Maximum Likelihood Estimation ◽

Natural Language ◽

Language Processing ◽

Likelihood Estimation ◽

The Other ◽

Applied Technology ◽

New Strategy ◽

Shallow Parsing

Abstract. Shallow parsing is a strategy that has emerged in natural language processing in recent years. It does not aim to obtain the full parse tree, but only requires the recognition of certain relatively simple structural components. It separates parsing into two subtasks: the recognition and analysis of chunks, and the analysis of the relationships among chunks. In this essay, some applied techniques of shallow parsing are introduced and a new method is tested experimentally.
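As a rough illustration of the first subtask (chunk recognition), the sketch below groups runs of determiner, adjective and noun tags into noun-phrase chunks. It is a deliberately crude rule-based example and not the maximum-likelihood method studied in the paper; the function name, the Penn Treebank style tags and the sample sentence are assumptions.

```python
def np_chunks(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into noun-phrase
    chunks. A run is kept only if it contains at least one noun."""
    chunks = []
    current_words, has_noun = [], False

    def flush():
        nonlocal current_words, has_noun
        if current_words and has_noun:
            chunks.append(" ".join(current_words))
        current_words, has_noun = [], False

    for word, tag in tagged_tokens:
        if tag in {"DT", "JJ"} or tag.startswith("NN"):
            current_words.append(word)
            has_noun = has_noun or tag.startswith("NN")
        else:
            flush()  # any other tag closes the current chunk
    flush()
    return chunks


# Hypothetical pre-tagged sentence (the tags are supplied by hand here).
sentence = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
    ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
chunks = np_chunks(sentence)
print(chunks)  # ['the little dog', 'the cat']
```

The second subtask, analysing relationships among chunks, would take these chunk spans as input rather than individual words.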

Sentiment Analysis of Review Data of a Product Using Python

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.12.16452 ◽

2018 ◽

Vol 7 (3.12) ◽

pp. 674

Author(s):

P Santhi Priya ◽

T Venkateswara Rao

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

The Other ◽

Product Reviews ◽

Prime Problem

Sentiment analysis is also known as opinion mining. It is one of the primary tasks in Natural Language Processing (NLP). Opinion mining has attracted a great deal of attention lately. In our research we take up a central problem of opinion mining, Sentiment Polarity Categorization (SPC), which is highly influential. We propose a methodology for SPC and explain it in fine detail. Beyond the theory, computations are carried out for both review-level and sentence-level categorization, with beneficial outcomes. The data presented here comes from product reviews posted on the shopping site Amazon.

NMT Multi-Sense Embeddings per Word

10.31219/osf.io/k623t ◽

2019 ◽

Author(s):

William Jin

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Word embeddings have recently been used successfully in many natural language processing problems, and how to efficiently train a robust and accurate word embedding system is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn a separate vector for each sense of a word. In this project, we therefore explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses of each word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG performs better than the other models.

The Semantics and Collocations Relation in Food Reviews

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128372 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Fazel Keshtkar ◽

Ledong Shi ◽

Syed Ahmad Chan Bukhari

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Topic Modeling ◽

State Of The Art ◽

Semantic Relation ◽

The Other ◽

Good Place ◽

The Common

Finding our favorite dishes has become a hard task since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorise dishes and find semantic relations among them, based on reviewers' comments and menu descriptions. Our goal is to implement state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations between dishes and their reviews. We also aim to extract the common patterns between different dishes across restaurants and review comments and, in reverse, to explore the dishes that share a semantic relation. A dataset of restaurant-related articles, with the dishes located within them, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to help users find a dish from a few words of description and their interests. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American, e.g. 'steak', 'hamburger'; Chinese, e.g. 'stir fry', 'dumplings'; Japanese, e.g. 'sushi'.

Measuring Language Distance of Isolated European Languages

Information ◽

10.3390/info11040181 ◽

2020 ◽

Vol 11 (4) ◽

pp. 181 ◽

Cited By ~ 1

Author(s):

Pablo Gamallo ◽

José Ramom Pichel ◽

Iñaki Alegria

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Historical Linguistics ◽

Rooted Tree ◽

The Other ◽

Historical Evolution ◽

European Languages ◽

Shed Light ◽

European Family

Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of languages by considering their distances within a rooted tree that stands for their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European rooted tree. Although it is not possible to establish phylogenetic links using basic strategies, it is possible to calculate the distances between these isolated languages and the rest using simple corpus-based techniques and natural language processing methods. The objective of this article is to select some isolated languages and measure the distances among them and between them and the other European languages, so as to shed light on the linguistic distances and proximities of these controversial languages without considering phylogenetic issues. The experiments were carried out with 40 European languages, including six languages that are isolated in their corresponding families: Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.
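The abstract does not specify which corpus-based measure is used, so the following is only a minimal sketch of one simple option: comparing character trigram frequency profiles with cosine distance. The function names and the three one-sentence samples are assumptions for illustration; real experiments would need far larger corpora and may well use a different measure.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(profile_a, profile_b):
    """1 - cosine similarity between two n-gram count profiles."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[g] * profile_b[g] for g in shared)
    norm = (math.sqrt(sum(v * v for v in profile_a.values()))
            * math.sqrt(sum(v * v for v in profile_b.values())))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical one-sentence samples standing in for full corpora.
samples = {
    "spanish": "el gato come pescado en la casa",
    "portuguese": "o gato come peixe na casa",
    "basque": "katuak arraina jaten du etxean",
}
profiles = {lang: char_ngrams(text) for lang, text in samples.items()}

d_es_pt = cosine_distance(profiles["spanish"], profiles["portuguese"])
d_es_eu = cosine_distance(profiles["spanish"], profiles["basque"])
print(f"es-pt: {d_es_pt:.3f}  es-eu: {d_es_eu:.3f}")
```

On these toy samples the Spanish and Portuguese profiles share many trigrams while Spanish and Basque share almost none, so the first distance comes out smaller. Such a surface measure says nothing about historical relatedness, which matches the article's stated aim of measuring distance without considering phylogenetic issues.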

The Empty Lexicon

International Journal of Corpus Linguistics ◽

10.1075/ijcl.1.1.07sin ◽

1996 ◽

Vol 1 (1) ◽

pp. 99-119 ◽

Cited By ~ 8

Author(s):

John McH. Sinclair

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Knowledge Base ◽

Language Processing ◽

The Other ◽

Vast Amount ◽

Made In

This paper contrasts two views on the analysis of language. In one view, language is primarily seen as a carrier of messages in sentences whose propositional content can be retrieved, and symbolised in a knowledge base. In the other, language is seen as a means of communication that deals in much more complex matters than just carrying messages. In relation to vocabulary and the design of lexicons, the model of terminology suits the first position, while in the other the lexicon is considered empty at the start and is gradually filled with the evidence of usage. Similar contrasts are made in other areas relevant to natural language processing. In one approach, the expectation is of tidiness and conformity to rules; the other stresses the inherently provisional nature of the organisation of language and, therefore, the meanings. As these two approaches encounter the vast amount of evidence stored in today's corpora, their methods and responses contrast in interesting ways.
