Measuring Language Distance of Isolated European Languages

Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 181 ◽  
Author(s):  
Pablo Gamallo ◽  
José Ramom Pichel ◽  
Iñaki Alegria

Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of languages by considering their distances within a rooted tree that stands for their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European rooted tree. Although it is not possible to establish phylogenetic links using basic strategies, it is possible to calculate the distances between these isolated languages and the rest using simple corpus-based techniques and natural language processing methods. The objective of this article is to select some isolated languages and measure the distance between them and the other European languages, so as to shed light on the linguistic distances and proximities of these controversial languages without considering phylogenetic issues. The experiments were carried out with 40 European languages, including six languages that are isolated in their corresponding families: Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.
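One common corpus-based distance of the kind the abstract mentions compares character n-gram frequency profiles of text samples. The sketch below is illustrative only (it is not the authors' exact method, and the toy samples are invented): it computes a cosine distance over character trigram counts, so that samples from closely related languages come out closer than samples from unrelated ones.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts for a text sample."""
    return Counter(text[i:i+n] for i in range(len(text) - n + 1))

def cosine_distance(a, b):
    """1 - cosine similarity between two n-gram count profiles."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

# Toy samples: Portuguese vs. Galician (related) vs. Hungarian (isolated).
pt = "os resultados mostram que a distancia entre as linguas"
gl = "os resultados amosan que a distancia entre as linguas"
hu = "az eredmenyek azt mutatjak hogy a nyelvek kozotti tavolsag"

d_close = cosine_distance(char_ngrams(pt), char_ngrams(gl))
d_far = cosine_distance(char_ngrams(pt), char_ngrams(hu))
assert d_close < d_far
```

Real experiments would of course use large comparable corpora per language rather than single sentences, but the profile-and-compare structure is the same.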

2013 ◽  
Vol 340 ◽  
pp. 126-130 ◽  
Author(s):  
Xiao Guang Yue ◽  
Guang Zhang ◽  
Qing Guo Ren ◽  
Wen Cheng Liao ◽  
Jing Xi Chen ◽  
...  

The concepts of Chinese information processing and natural language processing (NLP) and their development tendencies are summarized. Chinese information processing and natural language processing are understood differently in China and in other countries, but the work converges on the key problems of language processing. Mining engineering is very important for our country. Although the ultimate task of language processing is difficult, Chinese information processing has contributed substantially to our scientific research and social economy, and it will play an important part in mining engineering in the future.


2020 ◽  
Author(s):  
Masashi Sugiyama

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we have explored two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses per word incrementally. We evaluate all the systems on the word similarity task and show that IMSSG is better than the other models.
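The core step that distinguishes multi-sense models like MSSG from plain skip-gram is routing each occurrence of a word to one of several sense clusters based on its context, then updating that sense only. The toy class below is a heavily simplified sketch of that routing step (the class name and vectors are invented for illustration; a real model would also train sense embeddings with skip-gram objectives):

```python
import math

def cos(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv) if du and dv else 0.0

class MultiSenseWord:
    """MSSG-style sense routing: each occurrence's context vector goes to
    its nearest sense cluster, whose centroid is updated incrementally."""
    def __init__(self, dim, n_senses):
        self.centroids = [[0.0] * dim for _ in range(n_senses)]
        self.counts = [0] * n_senses

    def assign(self, context_vec):
        # Fill an empty cluster first, then pick the most similar centroid.
        for k, c in enumerate(self.counts):
            if c == 0:
                best = k
                break
        else:
            best = max(range(len(self.counts)),
                       key=lambda k: cos(self.centroids[k], context_vec))
        # Running-mean (incremental) centroid update.
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [m + (x - m) / n
                                for m, x in zip(self.centroids[best], context_vec)]
        return best

w = MultiSenseWord(dim=2, n_senses=2)
s1 = w.assign([1.0, 0.0])   # e.g. "bank" in a river-like context
s2 = w.assign([0.0, 1.0])   # e.g. "bank" in a money-like context
s3 = w.assign([0.9, 0.1])   # another river-like context
assert s1 != s2 and s3 == s1
```

The non-parametric variant (NP-MSSG) would additionally create a new cluster when the best similarity falls below a threshold, instead of fixing `n_senses` in advance.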


Author(s):  
Davide Picca ◽  
Dominique Jaccard ◽  
Gérald Eberlé

In the last decades, Natural Language Processing (NLP) has achieved a high level of success. Interactions between NLP and Serious Games have begun, and some Serious Games already include NLP techniques. The objectives of this paper are twofold: on the one hand, providing a simple framework to enable analysis of potential uses of NLP in Serious Games and, on the other hand, applying the NLP framework to existing Serious Games and giving an overview of the use of NLP in pedagogical Serious Games. In this paper we present 11 Serious Games exploiting NLP techniques. We present them systematically, according to the following structure: first, we highlight possible uses of NLP techniques in Serious Games; second, we describe the type of NLP implemented in each specific Serious Game; and third, we provide a link to possible purposes of use for the different actors interacting in the Serious Game.


2013 ◽  
Vol 274 ◽  
pp. 359-362
Author(s):  
Shuang Zhang ◽  
Shi Xiong Zhang

Abstract. Shallow parsing is a recent strategy of language processing in the domain of natural language processing. It does not aim to obtain the full parse tree, but only requires the recognition of some simple constituents of the structure. It separates parsing into two subtasks: one is the recognition and analysis of chunks; the other is the analysis of relationships among chunks. In this paper, some applied techniques of shallow parsing are introduced and a new method is tested experimentally.
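The first subtask, chunk recognition, is often done with simple patterns over part-of-speech tags rather than full parsing. The sketch below is a minimal illustration of that idea (the tag set and grammar are assumptions, not the paper's method): a noun-phrase chunk is an optional determiner, any adjectives, then one or more nouns.

```python
def np_chunk(tagged):
    """Greedy noun-phrase chunker over (word, POS) pairs: a chunk is an
    optional determiner (DT), any adjectives (JJ), then nouns (NN/NNS)."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in ("NN", "NNS"):
            k += 1
        if k > j:  # at least one noun -> emit the chunk
            chunks.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks

sent = [("the", "DT"), ("quick", "JJ"), ("parser", "NN"),
        ("finds", "VBZ"), ("simple", "JJ"), ("chunks", "NNS")]
assert np_chunk(sent) == ["the quick parser", "simple chunks"]
```

The second subtask, relating the chunks to one another, would then operate on this flat chunk sequence instead of a full tree.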


2018 ◽  
Vol 7 (3.12) ◽  
pp. 674
Author(s):  
P Santhi Priya ◽  
T Venkateswara Rao

Sentiment analysis is also known as opinion mining; it is one of the primary objectives in Natural Language Processing (NLP). Opinion mining has attracted a large audience lately. In our research we take up a central and influential problem of opinion mining, Sentiment Polarity Categorization (SPC). We propose a methodology for SPC with explanations down to a fine-grained level. Beyond the theory, computations are performed on both review-level and sentence-level categorization, with promising outcomes. The data represented here comes from product reviews on the shopping site Amazon.
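A minimal way to see the difference between the two levels of categorization mentioned in the abstract is a lexicon-based scorer: each sentence gets a polarity score, and the review label aggregates them. The lexicons and threshold below are invented for illustration and are far simpler than the paper's methodology:

```python
POS = {"great", "excellent", "love", "good", "perfect"}
NEG = {"bad", "poor", "terrible", "broken", "waste"}

def sentence_polarity(sentence):
    """Score one sentence by lexicon hits: >0 positive, <0 negative."""
    words = sentence.lower().split()
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

def review_polarity(review):
    """Review-level label from the sum of its sentence-level scores."""
    score = sum(sentence_polarity(s) for s in review.split("."))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

review = "The battery is excellent. Shipping was terrible. Overall a good buy."
assert sentence_polarity("Shipping was terrible") == -1
assert review_polarity(review) == "positive"
```

Note how a review can be positive overall while containing a negative sentence, which is exactly why the two granularities are evaluated separately.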


2019 ◽  
Author(s):  
William Jin

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we have explored two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses per word incrementally. We evaluate all the systems on the word similarity task and show that IMSSG is better than the other models.


Author(s):  
Fazel Keshtkar ◽  
Ledong Shi ◽  
Syed Ahmad Chan Bukhari

Finding our favorite dishes has become a hard task since restaurants are providing more choices and varieties. On the other hand, comments and reviews of restaurants are a good place to look for the answer. The purpose of this study is to use computational linguistics and natural language processing to categorise and find semantic relations among various dishes based on reviewers' comments and menu descriptions. Our goal is to implement state-of-the-art computational linguistics methods such as word embedding models (word2vec), topic modeling, PCA, and classification algorithms. For visualization, t-Distributed Stochastic Neighbor Embedding (t-SNE) was used to explore the relations within dishes and their reviews. We also aim to extract the common patterns between different dishes among restaurants and review comments and, in reverse, to explore the dishes through their semantic relations. A dataset of articles related to restaurants, with dishes located within the articles, was used to find comment patterns. We then applied t-SNE visualizations to identify the root of each feature of the dishes. As a result, our model is able to assist users in finding a dish from several words of description and their interests. Our dataset contains 1,000 articles from a food review agency on a variety of dishes from different cultures: American, e.g. 'steak', 'hamburger'; Chinese, e.g. 'stir fry', 'dumplings'; Japanese, e.g. 'sushi'.
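Before embeddings or t-SNE, the basic retrieval idea here — matching a free-text description against dish descriptions — can be sketched with a bag-of-words cosine similarity. The menu entries and query below are invented toy data, and real systems would use learned embeddings rather than raw word counts:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words counts for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

menu = {
    "sushi": "raw fish rice seaweed roll soy",
    "stir fry": "wok fried vegetables rice soy sauce",
    "hamburger": "grilled beef patty bun cheese lettuce",
}
query = bow("fried rice with soy sauce and vegetables")
best = max(menu, key=lambda d: cosine(bow(menu[d]), query))
assert best == "stir fry"
```

Replacing `bow` with averaged word2vec vectors would let the same ranking match dishes that share no surface words, which is the motivation for the embedding models in the study.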


2018 ◽  
Vol 29 (7) ◽  
pp. 1178-1184 ◽  
Author(s):  
Jonah Berger ◽  
Grant Packard

Why do some cultural items become popular? Although some researchers have argued that success is random, we suggest that how similar items are to each other plays an important role. Using natural language processing of thousands of songs, we examined the relationship between lyrical differentiation (i.e., atypicality) and song popularity. Results indicated that the more different a song’s lyrics are from its genre, the more popular it becomes. This relationship is weaker in genres where lyrics matter less (e.g., dance) or where differentiation matters less (e.g., pop) and occurs for lyrical topics but not style. The results shed light on cultural dynamics, why things become popular, and the psychological foundations of culture more broadly.
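One simple way to operationalise "lyrical differentiation" of the kind measured here is the distance between a song's word distribution and its genre's average distribution. The sketch below is an assumption-laden toy (the lyrics are invented, and the paper's actual measure is topic-model based, not raw bag-of-words):

```python
from collections import Counter
import math

def unit_bow(text):
    """L2-normalised bag-of-words vector for a lyric."""
    c = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in c.values()))
    return {w: v / norm for w, v in c.items()}

def atypicality(song, genre_songs):
    """1 - cosine similarity between a song and its genre's mean lyric vector."""
    centroid = Counter()
    for s in genre_songs:
        for w, v in unit_bow(s).items():
            centroid[w] += v / len(genre_songs)
    sv = unit_bow(song)
    norm = math.sqrt(sum(v * v for v in centroid.values()))
    dot = sum(sv[w] * centroid[w] / norm for w in sv.keys() & centroid.keys())
    return 1.0 - dot

genre = ["love you baby tonight", "baby love me tonight", "dance all night baby"]
typical = "love me baby tonight"
atypical = "concrete rivers swallow the quiet town"
assert atypicality(typical, genre) < atypicality(atypical, genre)
```

The paper's finding is that, up to the genre-level moderators it reports, higher atypicality scores of this general kind are associated with greater popularity.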


ICAME Journal ◽  
2015 ◽  
Vol 39 (1) ◽  
pp. 5-24 ◽  
Author(s):  
Dawn Archer ◽  
Merja Kytö ◽  
Alistair Baron ◽  
Paul Rayson

Abstract Corpora of Early Modern English have been collected and released for research for a number of years. With large scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics including historical linguistics and conceptual history. We summarise previous research which has shown that it is necessary to map historical spelling variants to modern equivalents in order to successfully apply natural language processing and corpus linguistics methods. Manual and semiautomatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale to achieve good results from this process. In order to do so, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
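The normalisation step described above — mapping historical spelling variants to modern equivalents before applying NLP tools — is often implemented as an ordered list of substitution rules. The rules below are illustrative examples of well-known Early Modern English variant patterns, not the guidelines or ruleset from the Corpus of English Dialogues work:

```python
import re

# Illustrative variant-to-modern rules (assumed examples, applied in order).
RULES = [
    (re.compile(r"\bvn"), "un"),        # "vnto" -> "unto" (u/v variation)
    (re.compile(r"\biudge"), "judge"),  # i/j variation
    (re.compile(r"ie\b"), "y"),         # "merrie" -> "merry"
    (re.compile(r"\bdoe\b"), "do"),     # superfluous final -e
]

def normalise(token):
    """Apply each variant-to-modern rule in order to one lower-cased token."""
    for pattern, repl in RULES:
        token = pattern.sub(repl, token)
    return token

tokens = "doe vnto the merrie iudge".split()
assert [normalise(t) for t in tokens] == ["do", "unto", "the", "merry", "judge"]
```

The linguistically meaningful rationale the authors argue for matters precisely because such rules interact: applied blindly or in the wrong order, they can normalise a genuine modern word away, which is why manual and semi-automatic checking remains part of the process.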


1996 ◽  
Vol 1 (1) ◽  
pp. 99-119 ◽  
Author(s):  
John McH. Sinclair

This paper contrasts two views on the analysis of language. In one view, language is primarily seen as a carrier of messages in sentences whose propositional content can be retrieved, and symbolised in a knowledge base. In the other, language is seen as a means of communication that deals in much more complex matters than just carrying messages. In relation to vocabulary and the design of lexicons, the model of terminology suits the first position, while in the other the lexicon is considered empty at the start and is gradually filled with the evidence of usage. Similar contrasts are made in other areas relevant to natural language processing. In one approach, the expectation is of tidiness and conformity to rules; the other stresses the inherently provisional nature of the organisation of language and, therefore, the meanings. As these two approaches encounter the vast amount of evidence stored in today's corpora, their methods and responses contrast in interesting ways.

