shared task
Recently Published Documents


TOTAL DOCUMENTS

675
(FIVE YEARS 307)

H-INDEX

28
(FIVE YEARS 5)

Author(s):  
Tharindu Ranasinghe ◽  
Marcos Zampieri

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been published recently investigating methods to detect the various forms of such content (e.g., hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, in part because most available annotated datasets contain English data. In this article, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions on comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report results of 0.8415 F1 macro for Bengali in the TRAC-2 shared task [23], 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020 [58], 0.8568 F1 macro for Hindi in the HASOC 2019 shared task [27], and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) [7], showing that our approach compares favorably to the best systems submitted to recent shared tasks on these languages. Additionally, we report competitive performance on Arabic and Turkish using the training and development sets of the OffensEval 2020 shared task. The results for all languages confirm the robustness of cross-lingual contextual embeddings and transfer learning for this task.
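All of the scores above are macro-averaged F1, the standard metric in these shared tasks because offensive-language datasets are typically class-imbalanced. A minimal sketch of how macro F1 is computed (the toy labels below are illustrative, not from any of the cited datasets):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight,
    so minority classes count as much as the majority class."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy binary example: offensive (OFF) vs. not offensive (NOT)
gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "OFF", "OFF", "NOT"]
print(round(macro_f1(gold, pred), 4))  # → 0.8
```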


2021 ◽  
pp. 1-12
Author(s):  
Fazlourrahman Balouchzahi ◽  
Grigori Sidorov ◽  
Hosahalli Lakshmaiah Shashirekha

Complex learning approaches along with complicated and expensive features are not always the best or the only solution for Natural Language Processing (NLP) tasks. Despite huge progress and advancements in learning approaches such as Deep Learning (DL) and Transfer Learning (TL), there are many NLP tasks, such as Text Classification (TC), for which basic Machine Learning (ML) classifiers outperform DL or TL approaches. Moreover, an efficient feature engineering step can significantly improve the performance of ML-based systems. To check the efficacy of ML-based systems and feature engineering on TC, this paper explores characters, character sequences, syllables, word n-grams, and syntactic n-grams as features, and uses SHapley Additive exPlanations (SHAP) values to select the important features from the collection of extracted features. Voting Classifiers (VC) with soft and hard voting over four ML classifiers, namely Support Vector Machine (SVM) with Linear and Radial Basis Function (RBF) kernels, Logistic Regression (LR), and Random Forest (RF), were trained and evaluated on the Fake News Spreaders Profiling (FNSP) shared task dataset in PAN 2020. This shared task consists of profiling fake news spreaders in English and Spanish. The proposed models achieved an average accuracy of 0.785 across both languages and outperformed the best models submitted to this task.
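The two ingredients of this approach are simple enough to sketch directly: surface n-gram features and a voting ensemble. The snippet below is a minimal illustration in plain Python (the function names and the toy tie-free vote are assumptions for illustration; the paper's actual pipeline uses SHAP-based feature selection and trained SVM/LR/RF classifiers):

```python
from collections import Counter

def char_ngrams(text, n):
    """All contiguous character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """All contiguous word n-grams of length n."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def hard_vote(predictions):
    """Majority vote over per-classifier predictions for one instance.
    With an even number of voters, ties fall to the first-seen label;
    soft voting (averaging class probabilities) avoids this issue."""
    return Counter(predictions).most_common(1)[0][0]

print(char_ngrams("fake", 3))                    # ['fak', 'ake']
print(word_ngrams("fake news spreads fast", 2))  # ['fake news', 'news spreads', 'spreads fast']
print(hard_vote(["fake", "real", "fake", "fake"]))  # 'fake'
```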


Author(s):  
Jaehun Shin ◽  
Wonkee Lee ◽  
Byung-Hyun Go ◽  
Baikjin Jung ◽  
Youngkil Kim ◽  
...  

Automatic post-editing (APE) is the study of correcting translation errors in the output of an unknown machine translation (MT) system and has been considered a method of improving translation quality without any modification to conventional MT systems. Recently, several variants of Transformer that take both the MT output and its corresponding source sentence as inputs have been proposed for APE, and models introducing an additional attention layer into the encoder to jointly encode the MT output with its source sentence recorded high ranks in the WMT19 APE shared task. We examine the effectiveness of this joint-encoding strategy in a controlled environment and compare four types of decoder multi-source attention strategies that have been introduced in previous APE models. The experimental results indicate that the joint-encoding strategy is effective and that taking the final encoded representation of the source sentence is more appropriate than taking such a representation from within the same encoder stack. Furthermore, among the multi-source attention strategies combined with joint-encoding, the strategy that applies attention to the concatenated input representation and the strategy that adds up the individual attentions to each input improve the quality of APE results over using joint-encoding alone.
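The two winning multi-source attention strategies differ only in where the two encoded inputs are combined: before attention (concatenate keys and values) or after it (sum the per-source attention outputs). A single-head NumPy sketch of this contrast, with random toy tensors standing in for real encoder states (the dimensions and the simplified, projection-free attention are assumptions for illustration):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention (single head, no masking, no projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8
dec = rng.normal(size=(3, d))  # decoder queries (3 target positions)
src = rng.normal(size=(5, d))  # encoded source sentence (5 tokens)
mt = rng.normal(size=(4, d))   # encoded MT output (4 tokens)

# Strategy A: attend over the concatenated source and MT representations.
joint = np.concatenate([src, mt])
concat_out = attention(dec, joint, joint)

# Strategy B: attend to each input separately and add the results.
added_out = attention(dec, src, src) + attention(dec, mt, mt)

print(concat_out.shape, added_out.shape)  # (3, 8) (3, 8)
```

Both strategies yield one context vector per decoder position; they differ in whether the softmax normalizes over both inputs jointly (A) or over each input independently (B).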


2021 ◽  
Vol 72 (5-6) ◽  
pp. 285-290
Author(s):  
Susan Illing

Abstract The bachelor's thesis presented in this article covers the results of a shared task in the eHealth domain. It investigates whether the classification accuracy of selected clinical coding systems can be improved through the use of ensemble methods. The decisive criteria are the values of the evaluation measures Mean Average Precision and the F1 measure.
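Mean Average Precision, one of the two evaluation measures named above, rewards rankings that place relevant items (here, correct clinical codes) early. A minimal sketch of its computation (the query data below are hypothetical, not from the thesis):

```python
def average_precision(relevant, ranked):
    """AP for one query: mean of precision@k over each rank k that holds
    a relevant item, divided by the total number of relevant items."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """MAP: AP averaged over all queries, given (relevant_set, ranking) pairs."""
    return sum(average_precision(r, rk) for r, rk in queries) / len(queries)

# Hypothetical example: two queries over made-up clinical codes
q1 = ({"C1", "C3"}, ["C1", "C2", "C3"])  # AP = (1/1 + 2/3) / 2 ≈ 0.8333
q2 = ({"C2"}, ["C1", "C2"])              # AP = (1/2) / 1 = 0.5
print(round(mean_average_precision([q1, q2]), 4))  # → 0.6667
```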


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0256874
Author(s):  
Iknoor Singh ◽  
Carolina Scarton ◽  
Kalina Bontcheva

The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high-precision, high-recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoders and cross-encoders to effectively rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in both monolingual and bilingual runs.
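The first stage of such a pipeline, Okapi BM25, is a classical lexical scoring function that the later neural bi-encoder and cross-encoder stages then rerank. A compact, self-contained sketch of BM25 scoring (toy documents and the in-memory scoring loop are illustrative assumptions; production systems precompute an inverted index):

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against a tokenized query.
    k1 controls term-frequency saturation; b controls length normalization."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = []
    for doc in docs:
        score = 0.0
        for term in query:
            tf = doc.count(term)
            df = sum(1 for d in docs if term in d)
            idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [["covid", "vaccine", "safety"],
        ["mask", "guidance"],
        ["covid", "symptoms", "covid"]]
scores = bm25_scores(["covid", "vaccine"], docs)
top = max(range(len(docs)), key=scores.__getitem__)  # index of best match
print(top, [round(s, 3) for s in scores])
```

In a multistage setup, only the top-k documents by BM25 score would be passed on to the more expensive neural rerankers.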


2021 ◽  
Author(s):  
R. Gretter ◽  
Marco Matassoni ◽  
D. Falavigna ◽  
A. Misra ◽  
C.W. Leong ◽  
...  
