low resource
Recently Published Documents


TOTAL DOCUMENTS

4177
(FIVE YEARS 2762)

H-INDEX

57
(FIVE YEARS 22)

Author(s):  
Fan Xu ◽  
Yangjie Dan ◽  
Keyu Yan ◽  
Yong Ma ◽  
Mingwen Wang

Chinese dialect discrimination is a challenging natural language processing task due to scarce annotation resources. In this article, we develop a novel Chinese dialect discrimination framework with transfer learning and data augmentation (CDDTLDA) to overcome this shortage of resources. Specifically, we first use a relatively large Chinese dialect corpus to train a source-side automatic speech recognition (ASR) model. Then, we adopt a simple but effective data augmentation method (i.e., speed, pitch, and noise perturbation) to augment the target-side low-resource Chinese dialects, and fine-tune a target ASR model initialized from the source-side ASR model. Meanwhile, the potential common semantic features between the source-side and target-side ASR models are captured by a self-attention mechanism. Finally, we extract the hidden semantic representation of the target ASR model to perform Chinese dialect discrimination. Extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.
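The abstract does not give implementation details for the perturbations, but speed and noise disturbance of this kind can be sketched in plain NumPy (function names and parameter values are illustrative; production pipelines typically use toolkit front-ends such as torchaudio or Kaldi for this):

```python
import numpy as np

def add_noise(wave, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)

def change_speed(wave, factor):
    """Resample so the clip plays back `factor` times faster.
    Simple linear interpolation; this also shifts pitch, unlike a phase vocoder."""
    new_len = int(len(wave) / factor)
    new_idx = np.linspace(0, len(wave) - 1, new_len)
    return np.interp(new_idx, np.arange(len(wave)), wave)

# Toy example: a 440 Hz tone sampled at 16 kHz, augmented three ways
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
augmented = [add_noise(tone), change_speed(tone, 1.1), change_speed(tone, 0.9)]
```

Each augmented copy is then treated as an additional training utterance with the original transcript, multiplying the effective size of the low-resource corpus.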


Cytokine ◽  
2022 ◽  
Vol 150 ◽  
pp. 155775
Author(s):  
Emily R. Konrad ◽  
Jeremy Soo ◽  
Andrea L. Conroy ◽  
Sophie Namasopo ◽  
Robert O. Opoka ◽  
...  

Author(s):  
Arkadipta De ◽  
Dibyanayan Bandyopadhyay ◽  
Baban Gain ◽  
Asif Ekbal

Fake news classification is one of the most interesting problems to have attracted huge attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most current work on fake news detection is in the English language, which has limited its widespread usability, especially outside the English-literate population. Although multilingual web content has grown, fake news classification in low-resource languages remains a challenge due to the non-availability of annotated corpora and tools. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A large variety of experiments, including language-specific and domain-specific settings, are conducted. The proposed model achieves high accuracy in both domain-specific and domain-agnostic experiments, and it also outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, showing encouraging results. Cross-domain transfer experiments are also performed to assess the language-independent feature transfer of the model. We also offer a multilingual multidomain fake news detection dataset of five languages and seven different domains that could be useful for research and development in resource-scarce scenarios.
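The zero-shot protocol itself is simple to state: train on labelled data from some languages, then evaluate on a language whose labels were never seen during training. A dependency-free toy stand-in (hashed character n-grams plus a perceptron instead of multilingual BERT; all data and names here are invented for illustration) shows the shape of such an experiment:

```python
def featurize(text, n=3, dim=256):
    """Hash character n-grams into a fixed-size bag-of-features vector."""
    vec = [0.0] * dim
    for i in range(len(text) - n + 1):
        vec[hash(text[i:i + n]) % dim] += 1.0
    return vec

def train_perceptron(examples, epochs=10, dim=256):
    """Plain perceptron; examples are (text, label) pairs with label in {0, 1}."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for text, y in examples:
            x = featurize(text, dim=dim)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                sign = 1 if y == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
    return w, b

def accuracy(model, examples, dim=256):
    w, b = model
    hits = sum((1 if sum(wi * xi for wi, xi in zip(w, featurize(t, dim=dim))) + b > 0
                else 0) == y for t, y in examples)
    return hits / len(examples)

# Zero-shot setup: train on "source language" examples only (1 = fake, 0 = real)...
train_src = [("breaking!!! shocking cure", 1), ("council approves budget", 0),
             ("you won't believe this trick", 1), ("quarterly report released", 0)]
# ...then evaluate on a "target language" never seen in training.
test_tgt = [("shocking!!! miracle cure", 1), ("annual budget approved", 0)]
model = train_perceptron(train_src)
score = accuracy(model, test_tgt)
```

With multilingual BERT, the shared subword vocabulary and pre-trained encoder play the role the hashed feature space plays here, which is what makes transfer to an unseen language possible at all.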


Author(s):  
Tharindu Ranasinghe ◽  
Marcos Zampieri

Offensive content is pervasive in social media and a cause for concern to companies and government organizations. Several studies have recently been published investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, in part because most available annotated datasets contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions onto comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report macro F1 scores of 0.8415 for Bengali in the TRAC-2 shared task [23], 0.8532 for Danish and 0.8701 for Greek in OffensEval 2020 [58], 0.8568 for Hindi in the HASOC 2019 shared task [27], and 0.7513 for Spanish in SemEval-2019 Task 5 (HatEval) [7], showing that our approach compares favorably to the best systems submitted to recent shared tasks on these languages. Additionally, we report competitive performance on Arabic and Turkish using the training and development sets of the OffensEval 2020 shared task. The results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task.
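The scores above are macro-averaged F1, i.e., the unweighted mean of per-class F1, which prevents the majority (non-offensive) class from dominating the metric. A minimal reference implementation, equivalent to scikit-learn's `f1_score(..., average='macro')` for the classes present:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `macro_f1([0, 1, 0, 1], [0, 1, 1, 1])` averages an F1 of 2/3 on class 0 and 0.8 on class 1, giving about 0.733.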


Author(s):  
Ramsha Saeed ◽  
Hammad Afzal ◽  
Haider Abbas ◽  
Maheen Fatima

Increased connectivity has contributed greatly to facilitating rapid access to information and reliable communication. However, uncontrolled information dissemination has also resulted in the spread of fake news. Fake news might be spread by a group of people or organizations to serve ulterior motives, such as political or financial gain, or to damage a country's public image. Given the importance of timely detection of fake news, the research area has intrigued researchers from all over the world. Most work on detecting fake news focuses on the English language. However, automated detection of fake news is important irrespective of the language used for spreading false information. Recognizing the importance of boosting research on fake news detection for low-resource languages, this work proposes a novel semantically enriched technique to effectively detect fake news in Urdu, a low-resource language. A model based on deep contextual semantics learned by a convolutional neural network is proposed. The features learned by the convolutional neural network are combined with other n-gram-based features and fed to a conventional majority-voting ensemble classifier fitted with three base learners: Adaptive Boosting, Gradient Boosting, and a Multi-Layer Perceptron. Experiments are performed with different models, and the results show that enriching the traditional ensemble learner with deep contextual semantics along with other standard features yields the best results and outperforms the state-of-the-art Urdu fake news detection model.
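Hard majority voting over three base learners reduces to taking the modal prediction per example. A dependency-free sketch of just the voting step (the prediction lists below are invented stand-ins for the AdaBoost, Gradient Boosting, and Multi-Layer Perceptron outputs the abstract describes):

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model prediction lists into one label per example by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) breaks exact ties by first-seen order among the votes
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Three hypothetical base learners' predictions for five articles (1 = fake, 0 = real)
ada_preds = [1, 0, 1, 1, 0]
gb_preds  = [1, 0, 0, 1, 0]
mlp_preds = [0, 0, 1, 1, 1]
final = majority_vote([ada_preds, gb_preds, mlp_preds])  # -> [1, 0, 1, 1, 0]
```

An odd number of base learners, as used here, guarantees no ties in the binary case.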


Author(s):  
Saurabh R. Sangwan ◽  
M. P. S. Bhatia

Cyberspace has been recognized as a conducive environment for the use of various hostile, direct, and indirect behavioural tactics to target individuals or groups. Denigration is one of the most frequently used cyberbullying ploys to actively damage, humiliate, and disparage the online reputation of a target by sending, posting, or publishing cruel rumours, gossip, and untrue statements. Previous pertinent studies report detecting profane, vulgar, and offensive words primarily in the English language. This research puts forward a model to detect online denigration bullying in the low-resource Hindi language using attention residual networks. The proposed model, Hindi Denigrate Comment–Attention Residual Network (HDC-ARN), intends to uncover defamatory posts (denigrate comments) written in the Hindi language that vilify a person or an entity in public. Data with 942 denigrate comments and 1,499 non-denigrate comments were scraped using certain hashtags from two recent trending events in India: the Tablighi Jamaat COVID-19 spike (April 2020, Event 1) and the death of Sushant Singh Rajput (June 2020, Event 2). Only text-based features, that is, the actual content of the post, are considered. Pre-trained fastText word embeddings for Hindi are used. The model has three ResNet blocks with an attention layer that generates a post vector for a single input, which is passed through a sigmoid activation function to obtain the final output: either denigrate (positive class) or non-denigrate (negative class). An F1 score of 0.642 is achieved on the dataset.
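The final steps described above, attention pooling of token representations into a single post vector followed by a sigmoid readout, can be sketched in NumPy; the shapes and parameter names here are illustrative, not taken from HDC-ARN:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(H, w):
    """Collapse token vectors H (seq_len x dim) into one post vector.
    w (dim,) scores each token; softmax turns the scores into weights."""
    scores = np.tanh(H) @ w           # one scalar score per token
    alpha = softmax(scores)           # attention weights, sum to 1
    return alpha @ H, alpha           # weighted sum of token vectors

def classify(H, w, v, b):
    """Sigmoid over a linear readout of the post vector: P(denigrate)."""
    post_vec, alpha = attention_pool(H, w)
    prob = 1.0 / (1.0 + np.exp(-(post_vec @ v + b)))
    return prob, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))   # 6 tokens, 8-dim features (e.g., ResNet-block output)
w = rng.normal(size=8)        # attention parameters
v = rng.normal(size=8)        # classifier weights
prob, alpha = classify(H, w, v, 0.0)
```

Thresholding `prob` at 0.5 yields the denigrate / non-denigrate decision; the attention weights `alpha` also indicate which tokens drove the prediction.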

