Étudier l'écrit SMS: Un objectif du projet sms4science

2011 ◽  
Vol 48 (4) ◽  
Author(s):  
Louise-Amélie Cougnon ◽  
Thomas François

This paper describes sms4science, an international project that aims to collect text message corpora (hereafter "SMS corpora") from across the globe for scientific research. The project already has ten participating regions, including Belgium, Réunion, Switzerland and Quebec. The article first presents the initial corpora collected from these four areas (a combined total of 116,000 text messages) and the accompanying methodology. It then sets out the research possibilities these corpora open up: the corpus-based studies pertain as much to linguistics and sociolinguistics as to natural language processing and statistics. A specific statistical study is presented, and its possible conclusions outline the differences in SMS practices between regions, notably in abbreviation rate and message length. Finally, the paper describes the obstacles the project faces and proposes fresh perspectives for the ongoing year (2011).

Author(s):  
Evrenii Polyakov ◽  
Leonid Voskov ◽  
Pavel Abramov ◽  
Sergey Polyakov

Introduction: Sentiment analysis is a complex problem whose solution essentially depends on the context, field of study and amount of text data. Analysis of publications shows that the authors often do not use the full range of possible data transformations and their combinations. Only a part of the transformations is used, limiting the ways to develop high-quality classification models. Purpose: Developing and exploring a generalized approach to building a model, which consists in sequentially passing through the stages of exploratory data analysis, obtaining a basic solution, vectorization, preprocessing, hyperparameter optimization, and modeling. Results: Comparative experiments conducted using a generalized approach for classical machine learning and deep learning algorithms in order to solve the problem of sentiment analysis of short text messages in natural language processing have demonstrated that the classification quality grows from one stage to another. For classical algorithms, such an increase in quality was insignificant, but for deep learning, it was 8% on average at each stage. Additional studies have shown that the use of automatic machine learning which uses classical classification algorithms is comparable in quality to manual model development; however, it takes much longer. The use of transfer learning has a small but positive effect on the classification quality. Practical relevance: The proposed sequential approach can significantly improve the quality of models under development in natural language processing problems.


2013 ◽  
Vol 340 ◽  
pp. 126-130 ◽  
Author(s):  
Xiao Guang Yue ◽  
Guang Zhang ◽  
Qing Guo Ren ◽  
Wen Cheng Liao ◽  
Jing Xi Chen ◽  
...  

The concepts of Chinese information processing and natural language processing (NLP) and their development trends are summarized. Chinese information processing and natural language processing are understood differently in China and in other countries, but the two lines of work appear to converge in the study of the key points of language processing. Mining engineering is very important for our country. Although the ultimate task of language processing is difficult, Chinese information processing has contributed substantially to our scientific research and social economy, and it will play an important part in mining engineering in the future.


2021 ◽  
pp. 142-147
Author(s):  
M Muliyono ◽  
S Sumijan

A chatbot is software with artificial intelligence that can imitate human conversation through text or voice messages. It conveys information according to the knowledge it has previously been given, which helps overcome the limitations of the academic section in answering students' questions. The data for this study came from a questionnaire distributed to students at the Muhammadiyah University of West Sumatra. Analysis of the questionnaire identified 40 questions that students often ask the academic section. These were then processed using Natural Language Processing (NLP), a branch of artificial intelligence that studies communication between humans and computers through natural language. The processing stage identifies the intent, processes the input and displays the results according to the input. A test using a questionnaire given to 227 students produced a score of 3.55, rated very good. A further test used 40 question-and-answer pairs: the chatbot gave 37 appropriate answers and 3 inappropriate ones, an answer accuracy of 92.5 percent. These results show that the chatbot is able to respond to the questions students ask and can make it easier for them to get information with a very good level of accuracy.
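The three processing steps named in the abstract (identify the intent, process the input, display the result) and the reported accuracy figure can be illustrated with a minimal sketch. The abstract does not say how intents are matched, so the keyword-overlap rule below is an assumption, and the intents, keywords and answers are invented for illustration. Only the counts 37 and 40 come from the abstract.

```python
import re

# Hypothetical intents, keywords and answers, invented for illustration only.
INTENTS = {
    "registration": {"register", "registration", "enroll"},
    "tuition": {"tuition", "fee", "fees", "payment"},
}
ANSWERS = {
    "registration": "Course registration is handled by the academic section.",
    "tuition": "Tuition questions are handled by the finance office.",
}
FALLBACK = "Sorry, I do not have an answer to that question."


def identify_intent(question):
    """Return the intent whose keywords overlap most with the question, or None."""
    tokens = set(re.findall(r"[a-z]+", question.lower()))
    best_intent, best_overlap = None, 0
    for intent, keywords in INTENTS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


def reply(question):
    """Map a question to a stored answer, falling back when no intent matches."""
    return ANSWERS.get(identify_intent(question), FALLBACK)


def answer_accuracy(correct, total):
    """Return the share of appropriate answers as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


print(reply("How do I register for courses?"))
# 37 appropriate answers out of 40 test questions, as reported in the abstract.
print(f"{answer_accuracy(37, 40):.1f} percent")  # prints "92.5 percent"
```

A keyword table keeps the sketch short; a production system would more plausibly use a trained intent classifier.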


2020 ◽  
Vol 209 ◽  
pp. 03015
Author(s):  
Alex Kopaygorodsky

The article deals with the application of natural language processing methods to support research on, and forecasting of, the innovative development of energy infrastructure. It considers the main NLP methods used to build an intelligent system for supporting scientific research and describes methods of building infrastructure for processing Open Linked Data and Big Data. Semantic analysis and knowledge integration are based on an ontology system. Applying the suggested methods can increase the quality of scientific research in this area and make it more effective.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
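For readers unfamiliar with BLEU, the metric used for the comparison above, the following is a simplified sketch of a sentence-level score: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization and no smoothing, so it is not necessarily the configuration the paper used, and the example sentences are invented English placeholders rather than data from the study.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: one reference, whitespace tokens, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: penalize candidates no longer than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the children walk to school every morning"
print(sentence_bleu("the children walk to school every morning", reference))
print(sentence_bleu("the children walk to school", reference))
```

The second call scores below the first only because of the brevity penalty, since every n-gram of the shorter candidate also occurs in the reference. Published results are normally computed at corpus level with an established toolkit rather than with a hand-written function like this one.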

