Measuring Language Distance of Isolated European Languages

Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 181 ◽  
Author(s):  
Pablo Gamallo ◽  
José Ramom Pichel ◽  
Iñaki Alegria

Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of languages by considering their distances within a rooted tree that stands for their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European rooted tree. Although it is not possible to establish phylogenetic links using basic strategies, it is possible to calculate the distances between these isolated languages and the rest using simple corpus-based techniques and natural language processing methods. The objective of this article is to select some isolated languages and measure the distance between them and the other European languages, so as to shed light on the linguistic distances and proximities of these controversial languages without considering phylogenetic issues. The experiments were carried out with 40 European languages, including six languages that are isolated in their corresponding families: Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.
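One common corpus-based distance of the kind the abstract mentions compares character n-gram frequency profiles of text samples. The sketch below is illustrative only (it is not the authors' exact method, and the toy samples are invented): it computes a cosine distance over character trigram counts, so that samples from closely related languages come out closer than samples from unrelated ones.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts for a text sample."""
    return Counter(text[i:i+n] for i in range(len(text) - n + 1))

def cosine_distance(a, b):
    """1 - cosine similarity between two n-gram count profiles."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

# Toy samples: Portuguese vs. Galician (related) vs. Hungarian (isolated).
pt = "os resultados mostram que a distancia entre as linguas"
gl = "os resultados amosan que a distancia entre as linguas"
hu = "az eredmenyek azt mutatjak hogy a nyelvek kozotti tavolsag"

d_close = cosine_distance(char_ngrams(pt), char_ngrams(gl))
d_far = cosine_distance(char_ngrams(pt), char_ngrams(hu))
assert d_close < d_far
```

Real experiments would of course use large comparable corpora per language rather than single sentences, but the profile-and-compare structure is the same.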

2013 ◽  
Vol 340 ◽  
pp. 126-130 ◽  
Author(s):  
Xiao Guang Yue ◽  
Guang Zhang ◽  
Qing Guo Ren ◽  
Wen Cheng Liao ◽  
Jing Xi Chen ◽  
...  

The concepts of Chinese information processing and natural language processing (NLP) and their development tendencies are summarized. Chinese information processing and natural language processing are understood differently in China and in other countries, but the work converges on the key problems of language processing. Mining engineering is very important for our country. Although the ultimate task of language processing is difficult, Chinese information processing has contributed substantially to our scientific research and social economy, and it will play an important part in mining engineering in the future.


2020 ◽  
Author(s):  
Masashi Sugiyama

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we have explored two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses per word incrementally. We evaluate all the systems on the word similarity task and show that IMSSG is better than the other models.
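The core step that distinguishes multi-sense models like MSSG from plain skip-gram is routing each occurrence of a word to one of several sense clusters based on its context, then updating that sense only. The toy class below is a heavily simplified sketch of that routing step (the class name and vectors are invented for illustration; a real model would also train sense embeddings with skip-gram objectives):

```python
import math

def cos(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv) if du and dv else 0.0

class MultiSenseWord:
    """MSSG-style sense routing: each occurrence's context vector goes to
    its nearest sense cluster, whose centroid is updated incrementally."""
    def __init__(self, dim, n_senses):
        self.centroids = [[0.0] * dim for _ in range(n_senses)]
        self.counts = [0] * n_senses

    def assign(self, context_vec):
        # Fill an empty cluster first, then pick the most similar centroid.
        for k, c in enumerate(self.counts):
            if c == 0:
                best = k
                break
        else:
            best = max(range(len(self.counts)),
                       key=lambda k: cos(self.centroids[k], context_vec))
        # Running-mean (incremental) centroid update.
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [m + (x - m) / n
                                for m, x in zip(self.centroids[best], context_vec)]
        return best

w = MultiSenseWord(dim=2, n_senses=2)
s1 = w.assign([1.0, 0.0])   # e.g. "bank" in a river-like context
s2 = w.assign([0.0, 1.0])   # e.g. "bank" in a money-like context
s3 = w.assign([0.9, 0.1])   # another river-like context
assert s1 != s2 and s3 == s1
```

The non-parametric variant (NP-MSSG) would additionally create a new cluster when the best similarity falls below a threshold, instead of fixing `n_senses` in advance.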


Author(s):  
Davide Picca ◽  
Dominique Jaccard ◽  
Gérald Eberlé

In the last decades, Natural Language Processing (NLP) has achieved a high level of success. Interactions between NLP and Serious Games have begun, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 Serious Games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


2013 ◽  
Vol 274 ◽  
pp. 359-362
Author(s):  
Shuang Zhang ◽  
Shi Xiong Zhang

Abstract. Shallow parsing is a recent strategy of language processing in the domain of natural language processing. It does not aim to obtain the full parse tree, but only requires the recognition of some simple constituents of the structure. It separates parsing into two subtasks: one is the recognition and analysis of chunks; the other is the analysis of relationships among chunks. In this paper, some applied techniques of shallow parsing are introduced and a new method is tested experimentally.
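The first subtask, chunk recognition, is often done with simple patterns over part-of-speech tags rather than full parsing. The sketch below is a minimal illustration of that idea (the tag set and grammar are assumptions, not the paper's method): a noun-phrase chunk is an optional determiner, any adjectives, then one or more nouns.

```python
def np_chunk(tagged):
    """Greedy noun-phrase chunker over (word, POS) pairs: a chunk is an
    optional determiner (DT), any adjectives (JJ), then nouns (NN/NNS)."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in ("NN", "NNS"):
            k += 1
        if k > j:  # at least one noun -> emit the chunk
            chunks.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

sent = [("the", "DT"), ("quick", "JJ"), ("parser", "NN"),
        ("finds", "VBZ"), ("simple", "JJ"), ("chunks", "NNS")]
assert np_chunk(sent) == ["the quick parser", "simple chunks"]
```

The second subtask, relating the chunks to one another, would then operate on this flat chunk sequence instead of a full tree.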


2018 ◽  
Vol 7 (3.12) ◽  
pp. 674
Author(s):  
P Santhi Priya ◽  
T Venkateswara Rao

Sentiment analysis is also known as opinion mining; it is one of the primary objectives in Natural Language Processing (NLP). Opinion mining has attracted a large audience lately. In our research we take up a central and influential problem of opinion mining, Sentiment Polarity Categorization (SPC). We propose a methodology for SPC with explanations down to a fine-grained level. Beyond the theory, computations are performed on both review-level and sentence-level categorization, with promising outcomes. The data represented here comes from product reviews on the shopping site Amazon.
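A minimal way to see the difference between the two levels of categorization mentioned in the abstract is a lexicon-based scorer: each sentence gets a polarity score, and the review label aggregates them. The lexicons and threshold below are invented for illustration and are far simpler than the paper's methodology:

```python
POS = {"great", "excellent", "love", "good", "perfect"}
NEG = {"bad", "poor", "terrible", "broken", "waste"}

def sentence_polarity(sentence):
    """Score one sentence by lexicon hits: >0 positive, <0 negative."""
    words = sentence.lower().split()
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

def review_polarity(review):
    """Review-level label from the sum of its sentence-level scores."""
    score = sum(sentence_polarity(s) for s in review.split("."))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

review = "The battery is excellent. Shipping was terrible. Overall a good buy."
assert sentence_polarity("Shipping was terrible") == -1
assert review_polarity(review) == "positive"
```

Note how a review can be positive overall while containing a negative sentence, which is exactly why the two granularities are evaluated separately.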


2019 ◽  
Author(s):  
William Jin

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we have explored two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses per word incrementally. We evaluate all the systems on the word similarity task and show that IMSSG is better than the other models.


Author(s):  
Fazel Keshtkar ◽  
Ledong Shi ◽  
Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorise and find semantic relations among various dishes based on reviewers' comments and menu descriptions. Our goal is to implement state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations within dishes and their reviews. We also aim to extract the common patterns between different dishes among restaurants and review comments and, in reverse, to explore the dishes through their semantic relations. A dataset of articles related to restaurants, with dishes located within the articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to assist users in finding a dish from several words of description and their interests. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American, e.g. 'steak', 'hamburger'; Chinese, e.g. 'stir fry', 'dumplings'; Japanese, e.g. 'sushi'.
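Before embeddings or t-SNE, the basic retrieval idea here — matching a free-text description against dish descriptions — can be sketched with a bag-of-words cosine similarity. The menu entries and query below are invented toy data, and real systems would use learned embeddings rather than raw word counts:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words counts for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

menu = {
    "sushi": "raw fish rice seaweed roll soy",
    "stir fry": "wok fried vegetables rice soy sauce",
    "hamburger": "grilled beef patty bun cheese lettuce",
}
query = bow("fried rice with soy sauce and vegetables")
best = max(menu, key=lambda d: cosine(bow(menu[d]), query))
assert best == "stir fry"
```

Replacing `bow` with averaged word2vec vectors would let the same ranking match dishes that share no surface words, which is the motivation for the embedding models in the study.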


2018 ◽  
Vol 29 (7) ◽  
pp. 1178-1184 ◽  
Author(s):  
Jonah Berger ◽  
Grant Packard

Why do some cultural items become popular? Although some researchers have argued that success is random, we suggest that how similar items are to each other plays an important role. Using natural language processing of thousands of songs, we examined the relationship between lyrical differentiation (i.e., atypicality) and song popularity. Results indicated that the more different a song’s lyrics are from its genre, the more popular it becomes. This relationship is weaker in genres where lyrics matter less (e.g., dance) or where differentiation matters less (e.g., pop) and occurs for lyrical topics but not style. The results shed light on cultural dynamics, why things become popular, and the psychological foundations of culture more broadly.
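One simple way to operationalise "lyrical differentiation" of the kind measured here is the distance between a song's word distribution and its genre's average distribution. The sketch below is an assumption-laden toy (the lyrics are invented, and the paper's actual measure is topic-model based, not raw bag-of-words):

```python
from collections import Counter
import math

def unit_bow(text):
    """L2-normalised bag-of-words vector for a lyric."""
    c = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in c.values()))
    return {w: v / norm for w, v in c.items()}

def atypicality(song, genre_songs):
    """1 - cosine similarity between a song and its genre's mean lyric vector."""
    centroid = Counter()
    for s in genre_songs:
        for w, v in unit_bow(s).items():
            centroid[w] += v / len(genre_songs)
    sv = unit_bow(song)
    norm = math.sqrt(sum(v * v for v in centroid.values()))
    dot = sum(sv[w] * centroid[w] / norm for w in sv.keys() & centroid.keys())
    return 1.0 - dot

genre = ["love you baby tonight", "baby love me tonight", "dance all night baby"]
typical = "love me baby tonight"
atypical = "concrete rivers swallow the quiet town"
assert atypicality(typical, genre) < atypicality(atypical, genre)
```

The paper's finding is that, up to the genre-level moderators it reports, higher atypicality scores of this general kind are associated with greater popularity.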


ICAME Journal ◽  
2015 ◽  
Vol 39 (1) ◽  
pp. 5-24 ◽  
Author(s):  
Dawn Archer ◽  
Merja Kytö ◽  
Alistair Baron ◽  
Paul Rayson

Abstract Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
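The normalisation step described above — mapping historical spelling variants to modern equivalents before applying NLP tools — is often implemented as an ordered list of substitution rules. The rules below are illustrative examples of well-known Early Modern English variant patterns, not the guidelines or ruleset from the Corpus of English Dialogues work:

```python
import re

# Illustrative variant-to-modern rules (assumed examples, applied in order).
RULES = [
    (re.compile(r"\bvn"), "un"),        # "vnto" -> "unto" (u/v variation)
    (re.compile(r"\biudge"), "judge"),  # i/j variation
    (re.compile(r"ie\b"), "y"),         # "merrie" -> "merry"
    (re.compile(r"\bdoe\b"), "do"),     # superfluous final -e
]

def normalise(token):
    """Apply each variant-to-modern rule in order to one lower-cased token."""
    for pattern, repl in RULES:
        token = pattern.sub(repl, token)
    return token

tokens = "doe vnto the merrie iudge".split()
assert [normalise(t) for t in tokens] == ["do", "unto", "the", "merry", "judge"]
```

The linguistically meaningful rationale the authors argue for matters precisely because such rules interact: applied blindly or in the wrong order, they can normalise a genuine modern word away, which is why manual and semi-automatic checking remains part of the process.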


1996 ◽  
Vol 1 (1) ◽  
pp. 99-119 ◽  
Author(s):  
John McH. Sinclair

This paper contrasts two views on the analysis of language. In one view, language is primarily seen as a carrier of messages in sentences whose propositional content can be retrieved, and symbolised in a knowledge base. In the other, language is seen as a means of communication that deals in much more complex matters than just carrying messages. In relation to vocabulary and the design of lexicons, the model of terminology suits the first position, while in the other the lexicon is considered empty at the start and is gradually filled with the evidence of usage. Similar contrasts are made in other areas relevant to natural language processing. In one approach, the expectation is of tidiness and conformity to rules; the other stresses the inherently provisional nature of the organisation of language and, therefore, the meanings. As these two approaches encounter the vast amount of evidence stored in today's corpora, their methods and responses contrast in interesting ways.

