Evaluation of Named Entity Recognition Algorithms in Short Texts

Author(s): Edgar Casasola Murillo, Raquel Fonseca

Abstract: One of the major consequences of the growth of social networks has been the generation of huge volumes of content. The text generated in social networks constitutes a new type of content: short, informal, sometimes ungrammatical, and noise-prone. Given the volume of information produced every day, manual processing of this data is impractical, which creates the need to explore and apply automatic processing strategies such as Entity Recognition (ER). It therefore becomes necessary to evaluate the performance of traditional ER algorithms on corpora with those characteristics. This paper presents the results of applying the AlchemyAPI and Dandelion API algorithms to a corpus provided by the SemEval-2015 Aspect Based Sentiment Analysis conference. The entities recognized by each algorithm were compared against those annotated in the collection in order to calculate precision and recall. Dandelion API obtained better results than AlchemyAPI on the given corpus.
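The comparison described in the abstract, recognized entities against annotated ones, can be sketched as a set comparison. This is a minimal illustration that assumes exact string matching; the abstract does not say how the authors matched entities, and the function name and sample entities below are hypothetical rather than taken from the SemEval-2015 corpus.

```python
def precision_recall(predicted, gold):
    """Compare predicted entities against gold annotations (exact match)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted entities that are annotated in the gold set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold entities that the algorithm recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical entities for one short review (illustration only).
gold = {"pizza", "service", "wine list"}
predicted = {"pizza", "wine list", "Brooklyn", "waiter"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the four predicted entities are correct, and two of the three annotated entities are found. A partial-match criterion, which evaluations of this kind sometimes use, would need a different overlap test.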

2020, Vol 34 (05), pp. 9274-9281
Author(s): Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, ...

For languages with no annotated resources, transferring knowledge from resource-rich languages is an effective solution for named entity recognition (NER). While all existing methods directly transfer a source-learned model to a target language, in this paper we propose to fine-tune the learned model with a few similar examples given a test case, which can benefit the prediction by leveraging the structural and semantic information conveyed in such similar examples. To this end, we present a meta-learning algorithm that finds a good model parameter initialization able to adapt quickly to the given test case, and we propose to construct multiple pseudo-NER tasks for meta-training by computing sentence similarities. To further improve the model's generalization ability across different languages, we introduce a masking scheme and augment the loss function with an additional maximum term during meta-training. We conduct extensive experiments on cross-lingual named entity recognition with minimal resources over five target languages. The results show that our approach significantly outperforms existing state-of-the-art methods across the board.
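The idea of building pseudo-tasks from sentence similarities can be sketched as a nearest-neighbour lookup. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure or the sentence representation, so cosine similarity over toy vectors is used here, and `cosine`, `most_similar`, and `build_pseudo_tasks` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query_vec, candidates, k=2):
    """Return the ids of the k candidates most similar to the query vector."""
    ranked = sorted(candidates, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [sent_id for sent_id, _ in ranked[:k]]


def build_pseudo_tasks(sentences, k=2):
    """Pair every sentence (the pseudo test case) with its k nearest neighbours."""
    tasks = {}
    for sent_id, vec in sentences:
        others = [(i, v) for i, v in sentences if i != sent_id]
        tasks[sent_id] = most_similar(vec, others, k)
    return tasks


# Toy 3-dimensional sentence vectors (hypothetical; a real system would use
# learned sentence representations).
source_sentences = [
    ("s1", [1.0, 0.0, 0.0]),
    ("s2", [0.9, 0.1, 0.0]),
    ("s3", [0.0, 1.0, 0.0]),
    ("s4", [0.0, 0.0, 1.0]),
]

tasks = build_pseudo_tasks(source_sentences)
print(tasks["s1"])
```

Each entry maps a pseudo test sentence to the similar sentences that would be used to adapt the model for it; the meta-training loop itself is outside the scope of this sketch.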


2020, Vol 49 (4), pp. 564-582
Author(s): Jibran Mir, Azhar Mahmood

Aspect-based sentiment analysis techniques have been applied in several application domains. Over the last two decades, these techniques have been developed mostly for the product and service domains. However, very few aspect-based sentiment techniques have been proposed for the movie domain. Moreover, these techniques mine only specific aspects (Script, Director, and Actor) of the movie domain, even though the movie domain is more complex than the product and service domains: it involves a NER (Named Entity Recognition) problem that cannot be ignored, since an opinion is often associated with a named entity. Consequently, this paper proposes MAIM (Movie Aspect Identification Model), which not only extracts movie-specific aspects but also identifies NEs (Named Entities) such as Person Name and Movie Title. The three main contributions are 1) the identification of infrequent aspects, 2) the identification of NEs in the movie domain, and 3) the identification of N-gram opinion words as an entity. MAIM incorporates the BiLSTM-CRF hybrid technique and, applied to the movie domain, achieves a precision of 89.9%, a recall of 88.9%, and an F1-measure of 89.4%. The experimental results show that MAIM performs better than the baseline models CRF and LSTM-CRF.
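The reported F1-measure can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below performs that arithmetic; the function name is illustrative and not part of MAIM.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.899, 0.889)
print(f"{f1:.1%}")  # prints 89.4%
```

The result agrees with the 89.4% stated in the abstract.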


2019, Vol 5, pp. e189
Author(s): Niels Dekker, Tobias Kuhn, Marieke van Erp

The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th and early 20th century source material. Because of this, it is unclear if these techniques are as suitable to modern-day literature as they are to those older novels. We present a study in which we evaluate natural language processing tools for the automatic extraction of social networks from novels as well as their network structure. We find that there are no significant differences between old and modern novels but that both are subject to a large amount of variance. Furthermore, we identify several issues that complicate named entity recognition in our set of novels and we present methods to remedy these. We see this work as a step in creating more culturally-aware AI systems.


2018
Author(s): Niels Dekker, Tobias Kuhn, Marieke van Erp

The analysis of literary works has experienced a surge in computer-assisted processing. To obtain insights into the community structures and social interactions portrayed in novels, the creation of social networks from novels has gained popularity. Many methods rely on identifying named entities and relations for the construction of these networks, but many of these tools are not specifically created for the literary domain. Furthermore, many of the studies on information extraction from literature typically focus on 19th century source material. Because of this, it is unclear whether these techniques are as suitable for modern-day science fiction and fantasy literature as they are for those 19th century classics. We present a study that compares classic literature to modern literature in terms of the performance of natural language processing tools for the automatic extraction of social networks, as well as in terms of network structure. We find that there are no significant differences between the two sets of novels but that both are subject to a high amount of variance. Furthermore, we identify several issues that complicate named entity recognition in modern novels and we present methods to remedy these.


Author(s): Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Karina Toscano-Medina, Hector Perez-Meana, Jose Portillo-Portillo, ...

In recent years, online social networks have received considerable attention in spatial modelling fields, given the critical information about real-time events that can be extracted from them; one of the most pressing applications concerns natural disasters such as earthquakes. Although data from these social networks sometimes carries embedded geographic information provided by GPS, in many cases such information is unavailable. An alternative solution is to reconstruct specific locations using probabilistic language models, more specifically those based on Named Entity Recognition (NER), which extracts names from a user's description of an event occurring in a specific place (e.g., a collapsed building on a specific avenue). In this work, we present a methodology to use Twitter as a social sensor system for disasters. The methodology scores NER locations with a kernel density estimation function for different subtopics originating from a natural disaster and maps them into a geographic space. The proposed methodology is evaluated with tweets related to the 2017 earthquake in Mexico.
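Scoring locations with a kernel density estimate can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a Gaussian kernel, a fixed bandwidth, and latitude/longitude treated as planar coordinates, none of which the abstract specifies. The function name and all coordinates are hypothetical.

```python
import math


def gaussian_kde_score(point, samples, bandwidth=0.01):
    """2-D Gaussian kernel density estimate at `point` from (lat, lon) samples.

    Coordinates are treated as planar, which is a simplification that is only
    reasonable over a small area.
    """
    if not samples:
        return 0.0
    h2 = bandwidth ** 2
    total = 0.0
    for lat, lon in samples:
        squared_distance = (point[0] - lat) ** 2 + (point[1] - lon) ** 2
        total += math.exp(-squared_distance / (2 * h2))
    # Normalise by the sample count and the 2-D Gaussian constant.
    return total / (len(samples) * 2 * math.pi * h2)


# Hypothetical geocoded mentions: three clustered reports and one isolated one.
mentions = [
    (19.432, -99.133),
    (19.433, -99.134),
    (19.431, -99.132),
    (19.300, -99.200),
]

dense = gaussian_kde_score((19.432, -99.133), mentions)
sparse = gaussian_kde_score((19.300, -99.200), mentions)
print(dense > sparse)  # prints True
```

A location mentioned by several nearby reports receives a higher density score than an isolated one, which is the property a social-sensor ranking would rely on.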

