The Sentiment is in the Details: A Language-agnostic Approach to Sentence-level Sentiment Analysis in News Media

2021
Author(s):  
Erik de Vries

Determining the sentiment in the individual sentences of a newspaper article in an automated fashion is a major challenge. Manually created sentiment dictionaries often fail to meet the required standards. And while computer-generated dictionaries show promise, they are often limited by the availability of suitable linguistic resources. I propose and test a novel, language-agnostic and resource-efficient way of constructing sentiment dictionaries, based on word embedding models. The dictionaries are constructed and evaluated based on four corpora containing two decades of Danish, Dutch (Flanders and the Netherlands), English, and Norwegian newspaper articles, which are cleaned and parsed using Natural Language Processing. Concurrent validity is evaluated using a dataset of human-coded newspaper sentences, and compared to the performance of Polyglot dictionaries. Predictive validity is tested through two long-standing hypotheses on the negativity bias in political news. Results show that both the concurrent validity and the predictive validity are good. The dictionaries outperform their Polyglot counterparts, and are able to detect a negativity bias, which is stronger for tabloids. The method is resource-efficient in terms of manual labor when compared to manually constructed dictionaries, and requires a limited amount of computational power.

2017
Vol 39 (2)
pp. 275-301
Author(s):
DANIEL FELLMAN
ANNA SOVERI
CHARLOTTE VIKTORSSON
SARAH HAGA
JOHANNES NYLUND
...

Working memory (WM) is one of the most studied cognitive constructs in psychology because of its relevance to human performance, including language processing. The reading span task is the most widely used measure of verbal WM for sentences. However, comparable sentence-level updating tasks are missing. Hence, we sought to develop a WM updating task, termed the selective updating of sentences (SUS) task, that taps the ability to constantly update sentences. In two experiments with Finnish-speaking young adults, we examined the internal consistency and concurrent validity of the SUS task. It exhibited adequate internal consistency and correlated positively with well-established working memory measures. Moreover, the SUS task also showed positive correlations with verbal episodic memory tasks employing sentences and paragraphs. These results indicate that the SUS task is a promising new task for psycholinguistic studies addressing verbal WM updating.


Author(s):  
Justin D Martin
Fouad Hassan

This study examined media use and attitudinal predictors of public willingness to censor fake online political news among representative samples in Lebanon, Saudi Arabia, and Tunisia (total N = 2880). The study utilized research on the corrective action hypothesis (CAH) and the theory of presumed media influence (TPMI) as frameworks. The CAH holds that an individual’s belief that media are hostile and influential increases the likelihood that the individual will participate in public discourse urging countermeasures. TPMI maintains that the belief that media are influential is associated with attitudes about media, though those attitudes need not be negative. Perceived exposure to fake news online positively predicted willingness to censor fake news in all countries, aligning with some prior research on both the CAH and the TPMI. Facebook use was negatively associated with willingness to censor fake news in two of the countries, while trust in news media was a positive correlate in two countries. Implications for research on both willingness to censor and on fake news are discussed.


2017
Author(s):
Sabrina Jaeger
Simone Fulle
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
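The step of encoding a compound by summing the vectors of its substructures can be illustrated with a minimal sketch. It is not the Mol2vec implementation: the function name `encode_compound`, the substructure identifiers, and the toy three-dimensional embeddings are all hypothetical, and skipping unknown substructures is an assumption made here for simplicity.

```python
from typing import Dict, List, Sequence


def encode_compound(
    substructures: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Sum the embedding vectors of a compound's substructures.

    Substructures without an embedding are skipped (an assumption of
    this sketch, not something stated in the abstract).
    """
    total = [0.0] * dim
    for name in substructures:
        vec = embeddings.get(name)
        if vec is None:
            continue  # unknown substructure: ignored in this sketch
        for i, value in enumerate(vec):
            total[i] += value
    return total


# Hypothetical, hand-written embeddings; a real model would learn these
# from a corpus of compounds.
toy_embeddings = {
    "sub_a": [1.0, 0.0, 2.0],
    "sub_b": [0.5, 1.5, -1.0],
}

# A compound containing sub_a twice and sub_b once.
compound_vector = encode_compound(["sub_a", "sub_b", "sub_a"], toy_embeddings, dim=3)
print(compound_vector)  # [2.5, 1.5, 3.0]
```

The resulting fixed-length vector is the kind of dense input that could then be passed to a supervised model for property prediction.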


2021
pp. 1-13
Author(s):
Lamiae Benhayoun
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the gaps in terms of skills between academic training on AI in French engineering and Business Schools, and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools’ websites and scraping of a job advertisements’ website. Then, analysis based on a text mining approach with a Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI. Characterization of three classes of skills for the AI market: Technical, Soft and Interdisciplinary. Skills’ gaps concern some professional certifications and the mastery of specific tools, research abilities, and awareness of ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing. Results that provide a better understanding of the AI capability components at the individual and the organizational levels. A study that can help shape educational programs to respond to the AI market requirements.


2020
Vol 0 (0)
Author(s):
Fridah Katushemererwe
Andrew Caines
Paula Buttery

This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First, therefore, we need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, we outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.


Author(s):  
Dang Van Thin
Ngan Luu-Thuy Nguyen
Tri Minh Truong
Lac Si Le
Duy Tin Vo

Aspect-based sentiment analysis has been studied in both research and industrial communities over recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce two benchmark corpora with the largest sizes at sentence level for two tasks in Vietnamese: Aspect Category Detection and Aspect Polarity Classification. Our corpora are annotated with high inter-annotator agreement for the restaurant and hotel domains. The release of our corpora would push forward the low-resource language processing community. In addition, we deploy and compare the effectiveness of supervised learning methods with single-task and multi-task approaches based on deep learning architectures. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the neural network architectures and the single-task approach. Our corpora and source code are published at the site given in the article's footnote.


2014
Vol 40 (2)
pp. 469-510
Author(s):  
Khaled Shaalan

As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic language family, make dealing with NER a challenge. The performance of an Arabic NER component affects the overall performance of the NLP system in a positive manner. This article attempts to describe and detail the recent increase in interest and progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in the Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, the state of the art of Arabic NER research is reviewed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.


Author(s):  
Necva Bölücü
Burcu Can

Part of speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing, as it assigns a syntactic category (such as noun, verb, or adjective) to each word within a given sentence or context. Those syntactic categories can be used to further analyze the sentence-level syntax (e.g., dependency parsing) and thereby extract the meaning of the sentence (e.g., semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting without using any annotated corpora. One widely used family of methods for the tagging problem is log-linear models. Initialization of the parameters in a log-linear model is crucial for inference, and different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize the parameters of the model in a cascaded framework. We thereby transfer knowledge between two different unsupervised models to improve the PoS tagging results, where a log-linear model benefits from a Bayesian model’s expertise. We present results for Turkish as a morphologically rich language and for English as a comparatively morphologically poor language in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.


2016
Vol 23 (4)
pp. 802-811
Author(s):
Kirk Roberts
Dina Demner-Fushman

OBJECTIVE: To understand how consumer questions on online resources differ from questions asked by professionals, and how such consumer questions differ across resources. MATERIALS AND METHODS: Ten online question corpora, 5 consumer and 5 professional, with a combined total of over 40 000 questions, were analyzed using a variety of natural language processing techniques. These techniques analyze questions at the lexical, syntactic, and semantic levels, exposing differences in both form and content. RESULTS: Consumer questions tend to be longer than professional questions, more closely resemble open-domain language, and focus far more on medical problems. Consumers ask more sub-questions, provide far more background information, and ask different types of questions than professionals. Furthermore, there is substantial variance of these factors between the different consumer corpora. DISCUSSION: The form of consumer questions is highly dependent upon the individual online resource, especially in the amount of background information provided. Professionals, on the other hand, provide very little background information and often ask much shorter questions. The content of consumer questions is also highly dependent upon the resource. While professional questions commonly discuss treatments and tests, consumer questions focus disproportionately on symptoms and diseases. Further, consumers place far more emphasis on certain types of health problems (eg, sexual health). CONCLUSION: Websites for consumers to submit health questions are a popular online resource filling important gaps in consumer health information. By analyzing how consumers write questions on these resources, we can better understand these gaps and create solutions for improving information access. This article is part of the Special Focus on Person-Generated Health and Wellness Data, which was published in the May 2016 issue, Volume 23, Issue 3.


2014
Vol 2014
pp. 1-7
Author(s):
Stefan Nilsson
Berit Finnström
Evalotte Mörelius
Maria Forsner

Needle fear is a common problem in children undergoing immunization. To ensure that the individual child’s needs are met during a painful procedure it would be beneficial to be able to predict whether there is a need for extra support. The self-reporting instrument facial affective scale (FAS) could have potential for this purpose. The aim of this study was to evaluate whether the FAS can predict pain unpleasantness in girls undergoing immunization. Girls, aged 11-12 years, reported their expected pain unpleasantness on the FAS at least two weeks before and then experienced pain unpleasantness immediately before each vaccination. The experienced pain unpleasantness during the vaccination was also reported immediately after each immunization. The level of anxiety was similarly assessed during each vaccination and supplemented with stress measures in relation to the procedure in order to assess and evaluate concurrent validity. The results show that the FAS is valid to predict pain unpleasantness in 11-12-year-old girls who undergo immunizations and that it has the potential to be a feasible instrument to identify children who are in need of extra support to cope with immunization. In conclusion, the FAS measurement can facilitate caring interventions.

