Natural Language Processing and Its Challenges on Omotic Language Group of Ethiopia

2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Girma Yohannis Bade

This article reviews Natural Language Processing (NLP) and its challenges for the Omotic language group. All technological achievements are partially fuelled by recent developments in NLP. NLP is a component of artificial intelligence (AI) and offers facilities to companies that need to analyze their reliable business data. However, many challenges hinder the effectiveness of NLP applications for the Omotic language group (Ometo) of Ethiopia. These challenges are word irregularity, the difficulty of identifying stop words, compounding, and the languages' limited digital data resources. Thus, this study opens the way for future researchers to further investigate NLP applications for these languages.

Author(s):  
Sandeep Mathias ◽  
Diptesh Kanojia ◽  
Abhijit Mishra ◽  
Pushpak Bhattacharya

Gaze behaviour has been used as a way to gather cognitive information for a number of years. In this paper, we discuss the use of gaze behaviour in solving different tasks in natural language processing (NLP) without having to record it at test time. This is because the collection of gaze behaviour is a costly task, both in terms of time and money. Hence, in this paper, we focus on research done to alleviate the need for recording gaze behaviour at run time. We also mention different eye tracking corpora in multiple languages, which are currently available and can be used in natural language processing. We conclude our paper by discussing applications in a domain - education - and how learning gaze behaviour can help in solving the tasks of complex word identification and automatic essay grading.


2011 ◽  
Vol 474-476 ◽  
pp. 460-465
Author(s):  
Bo Sun ◽  
Sheng Hui Huang ◽  
Xiao Hua Liu

An unknown word is a word that is not included in the sub_word vocabulary but must still be cut out by the word segmentation program. Personal names, place names and translated names are the major unknown words. Unknown Chinese words are a difficult problem in natural language processing and contribute to the low rate of correct segmentation. This paper introduces the finite multi-list method, which uses a word fragment's capability to compose a word and its location in the word tree to process unknown Chinese words. In the experiment, recall is 70.67% and the correct rate is 43.65%. The results show that unknown Chinese word identification based on the finite multi-list method is feasible.


Author(s):  
Shruthi J. ◽  
Suma Swamy

In the present digital world, computers do not understand ordinary human language. This is a great barrier between humans and digital systems. Hence, researchers have developed advanced technology that provides information to users from digital machines. Natural language processing (NLP) is a branch of AI that has significant implications for the ways in which computers and humans can interact. NLP has become an essential technology in bridging the communication gap between humans and digital data. Thus, this study presents the necessity of NLP in the current computing world along with different approaches and their applications. It also highlights the key challenges in the development of new NLP models.


Author(s):  
Constantin Orasan ◽  
Ruslan Mitkov

Natural Language Processing (NLP) is a dynamic and rapidly developing field in which new trends, techniques, and applications are constantly emerging. This chapter focuses mainly on recent developments in NLP which could not be covered in other chapters of the Handbook. Topics such as crowdsourcing and processing of large datasets, which are no longer that recent but are widely used and not covered at length in any other chapter, are also presented. The chapter starts by describing how the availability of tools and resources has had a positive impact on the field. The proliferation of user-generated content has led to the emergence of research topics such as sarcasm and irony detection, automatic assessment of user-generated content, and stance detection. All of these topics are discussed in the chapter. The field of NLP is approaching maturity, a fact corroborated by the latest developments in the processing of texts for financial purposes and for helping users with disabilities, two topics that are also discussed here. The chapter presents examples of how researchers have successfully combined research in computer vision and natural language processing to enable the processing of multimodal information, as well as how the latest advances in deep learning have revitalized research on chatbots and conversational agents. The chapter concludes with a comprehensive list of further reading material and additional resources.


2021 ◽  
Vol 10 (5) ◽  
pp. 9-16
Author(s):  
Aditya Mandke ◽  
Onkar Litake ◽  
Dipali Kadam

With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy, but they are very computationally expensive to train. Not everyone has access to setups with high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers with more encoders and decoders took more time to train but achieved lower BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable for situations with time constraints.


2015 ◽  
Author(s):  
Vijaykumar Yogesh Muley ◽  
Anne Hahn ◽  
Pravin Paikrao

Natural language processing continues to gain importance in a thriving scientific community that communicates its latest results at such a frequency that following the most recent developments, even in a specific field, cannot be managed by human readers alone. Here we summarize and compare the publishing activity of previous years on a distinct topic across several countries, addressing not only publishing frequency and history but also stylistic characteristics that are accessible by means of natural language processing. Though there are no profound differences in sentence length or lexical diversity among different countries, writing styles as captured by part-of-speech tagging are similar among countries that share a history or official language or that are spatially close.
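
The two surface measures mentioned in this abstract, sentence length and lexical diversity, can be illustrated with a short sketch. The abstract does not say how either was computed; the version below assumes a naive regex-based sentence splitter and tokenizer, and uses the type-token ratio as one simple stand-in for lexical diversity. All function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naively split after '.', '!' or '?' followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase word tokens made of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mean_sentence_length(text):
    """Average number of tokens per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def type_token_ratio(text):
    """Distinct tokens divided by total tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy input for illustration only; not data from the study.
sample = "The cat sat. The cat ran away!"
print(mean_sentence_length(sample))
print(type_token_ratio(sample))
```

The type-token ratio is sensitive to text length, so comparing corpora of different sizes would need a length-normalised variant or equal-sized samples.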


2018 ◽  
Vol 10 (3) ◽  
pp. 467-493 ◽  
Author(s):  
OANA DAVID ◽  
TEENIE MATLOCK

Conceptual metaphor research has benefited from advances in discourse analytic and corpus linguistic methodologies over the years, especially given recent developments with Natural Language Processing (NLP) technologies. Such technologies are now capable of identifying metaphoric expressions across large bodies of text. Here we focus on how one particular analytic tool, MetaNet, can be used to study everyday discourse about personal and social problems, in particular, poverty and cancer, by leveraging reusable networks of primary metaphors enhanced with specific metaphor subcases. We discuss the advantages of this approach in allowing us to gain valuable insights into cross-linguistic metaphor commonalities and variation. To demonstrate its utility, we analyze corpus data from English and Spanish.


2019 ◽  
Vol 43 (4) ◽  
pp. 676-690
Author(s):  
Zehra Taskin ◽  
Umut Al

Purpose: With the recent developments in information technologies, natural language processing (NLP) practices have made tasks in many areas easier and more practical. Nowadays, especially when big data are used in most research, NLP provides fast and easy methods for processing these data. The purpose of this paper is to identify subfields of library and information science (LIS) where NLP can be used and to provide a guide based on bibliometrics and social network analyses for researchers who intend to study this subject.
Design/methodology/approach: Within the scope of this study, 6,607 publications in the field of LIS that involve NLP methods are examined and visualized using social network analysis methods.
Findings: After evaluating the obtained results, the subject categories of publications, frequently used keywords in these publications and the relationships between these words are revealed. Finally, the core journals and articles are classified thematically for researchers working in the field of LIS and planning to apply NLP in their research.
Originality/value: The results of this paper draw a general framework for the LIS field and guide researchers toward new techniques that may be useful in the field.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Arlene Casey ◽  
Emma Davidson ◽  
Michael Poon ◽  
Hang Dong ◽  
Daniel Duma ◽  
...  

Background: Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports.
Methods: We conduct an automated literature search yielding 4836 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.
Results: We present a comprehensive analysis of the 164 publications retrieved, with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results.
Conclusions: Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. Our results have significance for researchers in the field providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.
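
For reference, the F1 score cited in the results above is the harmonic mean of precision and recall. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from the review.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 from raw counts: harmonic mean of precision and recall."""
    if true_pos == 0:
        # No correct positives: precision and recall are zero or undefined.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
print(round(f1_score(true_pos=90, false_pos=10, false_neg=20), 3))
```

Because F1 depends on the positive-class counts of a particular test set, scores obtained on different datasets are not directly comparable, which is the difficulty the abstract points out.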


2010 ◽  
Vol 36 (1) ◽  
pp. 129-149 ◽  
Author(s):  
Paul Cook ◽  
Suzanne Stevenson

Newly coined words pose problems for natural language processing systems because they are not in a system's lexicon, and therefore no lexical information is available for such words. A common way to form new words is lexical blending, as in cosmeceutical, a blend of cosmetic and pharmaceutical. We propose a statistical model for inferring a blend's source words drawing on observed linguistic properties of blends; these properties are largely based on the recognizability of the source words in a blend. We annotate a set of 1,186 recently coined expressions which includes 515 blends, and evaluate our methods on a 324-item subset. In this first study of novel blends we achieve an accuracy of 40% on the task of inferring a blend's source words, which corresponds to a reduction in error rate of 39% over an informed baseline. We also give preliminary results showing that our features for source word identification can be used to distinguish blends from other kinds of novel words.
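
The error-rate figure can be related to the accuracy figure as follows. This assumes the usual definition of relative error-rate reduction, which the abstract does not spell out; $a_m$ (model accuracy) and $a_b$ (baseline accuracy) are symbols introduced here for illustration.

```latex
\[
\text{relative error reduction}
  = \frac{(1 - a_b) - (1 - a_m)}{1 - a_b}
  = \frac{a_m - a_b}{1 - a_b}
\]
```

Under this definition, the reported 39% reduction compares the model's 60% error rate (from its 40% accuracy) against the error rate of the informed baseline.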

