New Media for Endangered Languages

Author(s):  
Laura Buszard-Welcher

This chapter presents three technologies essential to enabling any language in the digital domain: language identifiers (ISO 639-3), Unicode (including fonts and keyboards), and the building of corpora to enable natural language processing. Just a few major languages of the world are well-enabled for use with electronically mediated communication. Another few hundred languages are arguably on their way to being well-enabled, if for market reasons alone. For all the remaining languages of the world, inclusion in the digital domain remains a distant possibility, and one that likely requires sustained interest, attention, and resources on the part of the language community itself. The good news is that the same technologies that enable the more widespread languages can also enable the less widespread, and even endangered ones, and bootstrapping is possible for all of them. The examples and resources described in this chapter can serve as inspiration and guidance in getting started.
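To make the Unicode step concrete, here is a minimal Python sketch using only the standard unicodedata module: it shows how text typed with combining marks can be normalized so that search, sorting, and font rendering treat equivalent spellings identically. The example word, its spelling, and the use of the ISO 639-3 code "pot" (Potawatomi) as record metadata are illustrative assumptions, not drawn from the chapter.

import unicodedata

# Illustrative assumptions: the word form is invented, and "pot"
# (Potawatomi) is used only as an example ISO 639-3 identifier.
iso_639_3 = "pot"   # language identifier to store with the text's metadata
typed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT (two codepoints)

# Normalize to NFC so equivalent spellings of the same character
# are treated identically by search, sorting, and rendering.
canonical = unicodedata.normalize("NFC", typed)

for ch in canonical:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# Prints a single codepoint: U+00E9 LATIN SMALL LETTER E WITH ACUTE.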

2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Fridah Katushemererwe ◽  
Andrew Caines ◽  
Paula Buttery

Abstract: This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First, therefore, we need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker, and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.
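A corpus-derived wordlist is already enough to bootstrap the simplest kind of spell-checker the authors describe. The Python sketch below is a minimal illustration using the standard library's difflib; the Runyankore-like word forms are placeholders, since the Runya Corpus itself is not reproduced here.

from difflib import get_close_matches

# Placeholder wordlist standing in for forms attested in a corpus such
# as the Runya Corpus; the entries below are illustrative only.
wordlist = {"omuntu", "abantu", "okusoma", "ekitabo", "ebitabo"}

def suggest(token, n=3):
    """Return the token if attested, else up to n close matches from the wordlist."""
    if token in wordlist:
        return [token]
    return get_close_matches(token, wordlist, n=n, cutoff=0.6)

print(suggest("omunto"))   # -> ['omuntu'] (one edit away from an attested form)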


Author(s):  
TIAN-SHUN YAO

Using a word-based theory of natural language processing, a word-based Chinese language understanding system has been developed. Drawing on psycholinguistic analysis and the features of the Chinese language, the theory is presented together with a description of the computer programs based on it. The heart of the system is the definition of a Total Information Dictionary and the World Knowledge Source it uses. The aim of this research is to develop a system that can understand not only individual Chinese sentences but also whole texts.
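The paper's own dictionary and programs are not shown here, but a dictionary-driven, word-based approach can be illustrated with the classic forward maximum-matching segmenter sketched below in Python; the tiny dictionary is a stand-in, not the Total Information Dictionary itself.

# Minimal forward maximum-matching segmenter; the dictionary entries
# are placeholders illustrating word-based, dictionary-driven processing.
DICTIONARY = {"中国", "人民", "中", "国", "人", "民"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

def segment(text):
    words, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at position i.
        for j in range(min(len(text), i + MAX_WORD_LEN), i, -1):
            if text[i:j] in DICTIONARY:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # fall back to a single character
            i += 1
    return words

print(segment("中国人民"))  # -> ['中国', '人民']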


2020 ◽  
Author(s):  
David DeFranza ◽  
Himanshu Mishra ◽  
Arul Mishra

Language provides an ever-present context for our cognitions and has the ability to shape them. Languages across the world can be gendered (languages in which the form of a noun, verb, or pronoun is marked as female or male) or genderless. In an ongoing debate, one stream of research suggests that gendered languages are more likely to display gender prejudice than genderless languages; another suggests that language does not have the ability to shape gender prejudice. In this research, we contribute to the debate by using a natural language processing (NLP) method that captures the meaning of a word from the context in which it occurs. Using text data from Wikipedia and the Common Crawl project (which contains text from billions of publicly facing web pages) across 45 world languages, covering the majority of the world's population, we test for gender prejudice in gendered and genderless languages. We find that gender prejudice occurs more in gendered than in genderless languages. Moreover, using the same NLP method, we examine whether the genderedness of a language influences the stereotype dimensions of warmth and competence.
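The context-based method in question can be illustrated with word embeddings and cosine similarity. In the Python sketch below, the three-dimensional vectors are invented stand-ins for real embeddings trained on Wikipedia or Common Crawl, and the word list is illustrative; the measure mirrors the common practice of comparing a word's similarity to female versus male anchor words.

import numpy as np

# Toy 3-d vectors standing in for real embeddings trained on Wikipedia
# or Common Crawl; the values below are illustrative only.
vec = {
    "she":      np.array([0.9, 0.1, 0.0]),
    "he":       np.array([0.1, 0.9, 0.0]),
    "nurse":    np.array([0.8, 0.2, 0.1]),
    "engineer": np.array([0.2, 0.8, 0.1]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_association(word):
    """Positive -> closer to 'she'; negative -> closer to 'he'."""
    return cos(vec[word], vec["she"]) - cos(vec[word], vec["he"])

for w in ("nurse", "engineer"):
    print(w, round(gender_association(w), 3))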


News is part of everyone's daily routine and enhances our knowledge of what happens around the world. Fake news is fabricated information made up with the intention to deceive, rendering the knowledge acquired from it worthless. Because fake news spreads extensively and has a negative impact on society, fake news detection has become an emerging research area. This paper presents a solution to fake news detection using deep learning and natural language processing. A deep neural network is trained on the dataset, which must first be well formatted; natural language processing techniques make this possible, and the resulting model predicts whether a news item is fake or not.
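As a hedged illustration of the preprocess-then-classify pipeline described (not the paper's actual architecture or dataset), the Python sketch below pairs TF-IDF features with a small neural network from scikit-learn; the four toy headlines and their labels are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Invented toy headlines (1 = fake, 0 = real); a real system would train
# on a large annotated dataset and a deeper network than this small MLP.
texts  = ["Aliens endorse candidate", "Parliament passes budget",
          "Miracle cure found in kitchen", "Central bank raises rates"]
labels = [1, 0, 1, 0]

# TF-IDF stands in for the NLP formatting step; the MLP is the classifier.
model = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["Miracle budget endorsed by aliens"]))  # e.g. [1] (fake)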


2021 ◽  

This is the Festschrift of Dr. Jack Rueter. The book presents peer-reviewed scientific work from Dr. Rueter's colleagues on the latest advances in natural language processing, digital resources, and endangered languages, covering languages such as historical English, Chukchi, Mansi, Erzya, Komi, Finnish, Apurinã, sign languages, Sámi languages, and Japanese. Most of the papers present work on endangered languages or on domains with a limited number of resources available for NLP. The book collects original and insightful papers from well-established researchers in NLP, linguistics, philology, and digital humanities. It is a tribute to Dr. Rueter's long career, characterized by constant altruistic work towards a greater good in building free and open-source tools and resources for endangered languages. Dr. Rueter is a true pioneer in the field of digital documentation of endangered languages.


2021 ◽  
Author(s):  
AISDL

The meteoric rise of social media news during the ongoing COVID-19 pandemic merits serious research. Freedom of speech and of association in many parts of the world, especially in developed countries, has led to substantial information sharing during the pandemic panic. Although social media has served as a communication intervention in past crises, the volume of tweets generated on Twitter during COVID-19 is without precedent. This study examines social media news trends and compares tweets on COVID-19 as a corpus drawn from Twitter. By applying Natural Language Processing (NLP) methods to the tweets, we extract and quantify similarities between tweets over time, showing that some people say the same thing about the pandemic while other Twitter users view it differently. The tools we used are spaCy, NetworkX, WordCloud, and re. This study contributes to the social media literature by characterizing the similarity and divergence between COVID-19 tweets from the public and from health agencies such as the World Health Organization (WHO). It also sheds light on the sparse and dense regions of the COVID-19 text network and their implications for policymakers. The study explains its limitations and proposes future work.
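Below is a minimal sketch of the similarity-network step, using the same libraries the study names (re, spaCy, NetworkX); the three example tweets and the 0.5 similarity threshold are illustrative assumptions, not the study's corpus or settings.

import re
import spacy
import networkx as nx

# Requires a model with word vectors, e.g.:
#   python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

# Placeholder tweets; the study's corpus of public and WHO COVID-19
# tweets is not reproduced here.
tweets = ["Wash your hands to stop the spread",
          "Handwashing stops the spread of the virus",
          "My cat ignored me all day"]

# Light cleaning with re, then pairwise vector similarity with spaCy.
docs = [nlp(re.sub(r"https?://\S+|@\w+", "", t)) for t in tweets]

G = nx.Graph()
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        sim = docs[i].similarity(docs[j])
        if sim > 0.5:                     # arbitrary threshold for this sketch
            G.add_edge(i, j, weight=round(sim, 2))

print(G.edges(data=True))  # dense clusters = users saying the same thing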


2016 ◽  
Vol 22 (3) ◽  
pp. 491-495 ◽  
Author(s):  
ROBERT DALE

Abstract: Ten years ago, Microsoft Word's grammar checker was really the only game in town. The software world, and the world of natural language processing, have changed a lot in that time, so what does the grammar checker marketplace have to offer today?


2017 ◽  
Vol 24 (1) ◽  
pp. 123-154 ◽  
Author(s):  
FRANKLIN ỌLÁDIÍPỌ̀ ASAHIAH ◽  
ỌDẸ́TÚNJÍ ÀJÀDÍ ỌDẸ́JỌBÍ ◽  
EMMANUEL RÓTÌMÍ ADÁGÚNODÒ

Abstract: A diacritic is a mark placed near or through a character to alter its original phonetic or orthographic value. Many languages around the world use diacritics in their orthography, whatever writing system the orthography is based on. In many languages, diacritics are omitted either by convention or as a matter of convenience. For readers who are not familiar with the text domain, the absence of diacritics has been known to cause mild to serious readability and comprehension problems; for natural language processing systems, it causes near-intractable problems. This situation has led to extensive research on diacritization. Several techniques have been applied to diacritic restoration (or diacritization), but existing surveys of these techniques have been restricted to certain languages, leaving gaps for practitioners to fill. Our survey examines diacritization from the angle of the resources deployed and the various formulations employed. It concludes by recommending that (a) any proposed diacritization technique should consider the features of the language and the purpose its diacritics serve, and (b) evaluation metrics be more rigorously defined for easier comparison of model performance.
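One of the simplest formulations the survey covers is lexicon-based restoration: strip diacritics by canonical decomposition, then map each bare form to its most frequent diacritized spelling. The Python sketch below illustrates this with the standard unicodedata module; the two Yoruba-like lexicon entries are placeholders rather than corpus-derived counts.

import unicodedata

def strip_diacritics(word):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical lexicon mapping bare forms to their most frequent
# diacritized spelling; real systems learn this from corpus counts.
lexicon = {"oko": "ọkọ̀", "ede": "èdè"}

def restore(text):
    return " ".join(lexicon.get(strip_diacritics(w), w) for w in text.split())

print(restore("oko ede"))  # -> 'ọkọ̀ èdè'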


2018 ◽  
Vol 3 (1) ◽  
pp. 492
Author(s):  
Denis Cedeño Moreno ◽  
Miguel Vargas Lombardo

At present, the convergence of several areas of knowledge has led to the design and implementation of ICT systems that support the integration of heterogeneous tools, such as artificial intelligence (AI), statistics, and databases (BD), among others. In computing, ontologies belong to the world of AI and refer to formal representations of an area of knowledge or domain. Ontological engineering is the discipline concerned with the study and construction of tools that accelerate the creation of ontologies from natural language. In this paper, we propose a knowledge management model based on the clinical histories of patients (HC) in Panama, built on information extraction (EI), natural language processing (PLN), and the development of a domain ontology.

Keywords: knowledge, information extraction, ontology, automatic population of ontologies, natural language processing.
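A minimal sketch of the ontology-population step using rdflib follows; the namespace, class names, and extracted patient facts are all invented for illustration, standing in for the output of the information-extraction stage over clinical histories.

from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace and entities; in the proposed model these
# instances would come from information extraction over clinical records.
CLIN = Namespace("http://example.org/clinic#")

g = Graph()
g.bind("clin", CLIN)

# Suppose the IE/NLP step extracted a patient and a diagnosis from free text.
g.add((CLIN.patient001, RDF.type, CLIN.Patient))
g.add((CLIN.patient001, CLIN.hasName, Literal("J. Doe")))
g.add((CLIN.Hypertension, RDF.type, CLIN.Diagnosis))
g.add((CLIN.patient001, CLIN.hasDiagnosis, CLIN.Hypertension))

print(g.serialize(format="turtle"))  # the populated ontology fragment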

