A Survey on Machine Reading Comprehension—Tasks, Evaluation Metrics and Benchmark Datasets

Machine Reading Comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large-scale datasets and deep learning. At present, many MRC models have already surpassed human performance on various benchmark datasets, despite the obvious gap between existing MRC models and genuine human-level reading comprehension. This shows the need to improve existing datasets, evaluation metrics, and models to move current MRC models toward “real” understanding. To address the current lack of a comprehensive survey of existing MRC tasks, evaluation metrics, and datasets, herein, (1) we analyze 57 MRC tasks and datasets and propose a more precise classification method of MRC tasks with 4 different attributes; (2) we summarize 9 evaluation metrics of MRC tasks, as well as 7 attributes and 10 characteristics of MRC datasets; and (3) we discuss key open issues in MRC research and highlight future research directions. In addition, we have collected, organized, and published our data on the companion website, where MRC researchers can directly access each MRC dataset, its papers, baseline projects, and the leaderboard.

Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Wireless Communications and Mobile Computing ◽

10.1155/2021/5375334 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Changchang Zeng ◽

Shaobo Li

Keyword(s):

Reading Comprehension ◽

Language Processing ◽

Question Answering ◽

Multiple Choice ◽

Length Distribution ◽

Research Field ◽

Evaluation Framework ◽

Language Models ◽

Training Objective ◽

Machine Reading

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task. It has wide application potential in fields such as question answering robots and human-computer interaction in mobile virtual reality systems. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. With the development of training objectives, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. Different MLMs mask token spans of different lengths. Similarly, different machine reading comprehension tasks have answers of different lengths, and the answer is often a word, phrase, or sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis is true, it can guide us on how to pretrain the MLM with a relatively suitable mask length distribution for MRC tasks. In this paper, we try to uncover how much of MLM’s success in machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer length in the MRC dataset. To address this issue, herein, (1) we propose four MRC tasks with different answer length distributions, namely, the short span extraction task, long span extraction task, short multiple-choice cloze task, and long multiple-choice cloze task; (2) four Chinese MRC datasets are created for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) ablation experiments are conducted on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis is true. On four different machine reading comprehension datasets, the model whose masking length distribution is correlated with the answer length distribution outperforms the model without such correlation.
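
The idea of drawing masked span lengths from a chosen distribution can be illustrated with a minimal sketch. It is not the authors' pretraining code: the function name `sample_span_masks`, the example length weights, the mask ratio, and the `[MASK]` placeholder are all assumptions made for illustration.

```python
import random


def sample_span_masks(tokens, length_weights, mask_ratio=0.15,
                      mask_token="[MASK]", seed=0):
    """Mask contiguous spans whose lengths follow a chosen distribution.

    length_weights maps span length -> relative weight (hypothetical values;
    in the setting described above they would mirror answer lengths).
    """
    rng = random.Random(seed)
    lengths = list(length_weights)
    weights = [length_weights[n] for n in lengths]
    budget = max(1, int(len(tokens) * mask_ratio))  # how many tokens to mask
    masked = list(tokens)
    covered = set()
    attempts = 0
    while len(covered) < budget and attempts < 1000:
        attempts += 1
        span_len = rng.choices(lengths, weights=weights, k=1)[0]
        # Never exceed the remaining budget or the sequence length.
        span_len = min(span_len, budget - len(covered), len(tokens))
        start = rng.randrange(0, len(tokens) - span_len + 1)
        span = range(start, start + span_len)
        if any(i in covered for i in span):
            continue  # skip spans that overlap an already masked position
        covered.update(span)
        for i in span:
            masked[i] = mask_token
    return masked, sorted(covered)


tokens = [f"tok{i}" for i in range(40)]
# Assumed distribution that favours longer spans (e.g. phrase-length answers).
masked, positions = sample_span_masks(tokens, {1: 0.2, 2: 0.3, 3: 0.5},
                                      mask_ratio=0.25)
```

Changing `length_weights` is the only knob needed to compare a short-span and a long-span masking scheme under the same masking budget.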

Neural Machine Reading Comprehension: Methods and Trends

Applied Sciences ◽

10.3390/app9183698 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3698 ◽

Cited By ~ 9

Author(s):

Shanshan Liu ◽

Xin Zhang ◽

Sheng Zhang ◽

Hui Wang ◽

Weiming Zhang

Keyword(s):

Reading Comprehension ◽

Deep Learning ◽

Research Field ◽

The Past ◽

Learning Techniques ◽

Comprehensive Survey ◽

Recent Trends ◽

General Architecture ◽

Machine Reading ◽

Open Issues

Machine reading comprehension (MRC), which requires a machine to answer questions based on a given context, has attracted increasing attention with the incorporation of various deep-learning techniques over the past few years. Although research on MRC based on deep learning is flourishing, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. Specifically, we give a thorough review of this research field, covering different aspects including (1) typical MRC tasks: their definitions, differences, and representative datasets; (2) the general architecture of neural MRC: the main modules and prevalent approaches to each; and (3) new trends: some emerging areas in neural MRC as well as the corresponding challenges. Finally, considering what has been achieved so far, the survey also envisages what the future may hold by discussing the open issues left to be addressed.

You can Try without Visiting: A Comprehensive Survey on Virtually Try-on Outfits

10.36227/techrxiv.13904099.v2 ◽

2021 ◽

Author(s):

Hajer Ghodhbani ◽

Adel Alimi ◽

Mohamed Neji ◽

Imran Razzak

Keyword(s):

Deep Learning ◽

Literature Review ◽

Research Field ◽

Future Research ◽

Fashion Industry ◽

Research Directions ◽

Comprehensive Literature Review ◽

Benchmark Datasets ◽

Comprehensive Survey ◽

Future Research Directions

Our work aims to conduct a comprehensive literature review of deep learning methods applied in the fashion industry and, in particular, to the image-based virtual fitting task, citing research works published in recent years. We summarize their challenges, their main frameworks, the popular benchmark datasets, and the different evaluation metrics. Some promising future research directions are also discussed to propose improvements in this research field.

A Deep Cascade Model for Multi-Document Reading Comprehension

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017354 ◽

2019 ◽

Vol 33 ◽

pp. 7354-7361 ◽

Cited By ~ 3

Author(s):

Ming Yan ◽

Jiangnan Xia ◽

Chen Wu ◽

Bin Bi ◽

Zhongzhou Zhao ◽

...

Keyword(s):

Reading Comprehension ◽

Large Scale ◽

Question Answering ◽

Answer Extraction ◽

Benchmark Datasets ◽

Previous State ◽

Document Extraction ◽

Machine Reading ◽

System Effectiveness ◽

Extraction Experiment

A fundamental trade-off between effectiveness and efficiency needs to be balanced when designing an online question answering system. Effectiveness comes from sophisticated functions such as extractive machine reading comprehension (MRC), while efficiency is obtained from improvements in preliminary retrieval components such as candidate document selection and paragraph ranking. Given the complexity of the real-world multi-document MRC scenario, it is difficult to jointly optimize both in an end-to-end system. To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension. Specifically, irrelevant documents and paragraphs are first filtered out with simple functions for efficiency. Then we jointly train three modules on the remaining texts to better track the answer: document extraction, paragraph extraction, and answer extraction. Experimental results show that the proposed method outperforms the previous state-of-the-art methods on two large-scale multi-document benchmark datasets, i.e., TriviaQA and DuReader. In addition, our online system can stably serve typical scenarios with millions of daily requests in less than 50 ms.
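
The filtering stage of such a cascade can be sketched as follows. This is only an illustration of the coarse-to-fine idea: the word-overlap score stands in for the unspecified "simple functions", and the names `cheap_score` and `cascade_select` and the cut-off parameters are hypothetical. The jointly trained extraction modules are not shown.

```python
def cheap_score(question, text):
    """Fraction of question words that also appear in the text.

    An assumed stand-in for a cheap relevance function.
    """
    q_words = set(question.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / max(1, len(q_words))


def cascade_select(question, documents, doc_keep=2, para_keep=3):
    """Filter documents first, then paragraphs, before any costly reader runs.

    documents is a list of documents, each a list of paragraph strings.
    """
    # Stage 1: keep the documents with the highest cheap score.
    kept_docs = sorted(
        documents,
        key=lambda doc: cheap_score(question, " ".join(doc)),
        reverse=True,
    )[:doc_keep]
    # Stage 2: rank the paragraphs of the surviving documents.
    paragraphs = [p for doc in kept_docs for p in doc]
    return sorted(
        paragraphs,
        key=lambda p: cheap_score(question, p),
        reverse=True,
    )[:para_keep]


docs = [
    ["the capital of france is paris", "france is in europe"],
    ["bananas are yellow", "apples are red"],
    ["berlin is the capital of germany"],
]
candidates = cascade_select("what is the capital of france", docs, para_keep=2)
```

Only the paragraphs in `candidates` would be passed on to a more expensive answer extraction model, which is where the efficiency gain of a cascade comes from.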

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

Journal Of Big Data ◽

10.1186/s40537-021-00444-8 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Laith Alzubaidi ◽

Jinglan Zhang ◽

Amjad J. Humaidi ◽

Ayad Al-Dujaili ◽

Ye Duan ◽

...

Keyword(s):

Deep Learning ◽

Language Processing ◽

Human Performance ◽

Medical Information ◽

Holistic Approach ◽

Computing Paradigm ◽

Starting Point ◽

Wide Range ◽

Benchmark Datasets ◽

Comprehensive Survey

In the last few years, the deep learning (DL) computing paradigm has been deemed the gold standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, achieving outstanding results on several complex cognitive tasks, matching or even beating human performance. One of the benefits of DL is the ability to learn from massive amounts of data. The DL field has grown fast in the last few years and has been extensively used to successfully address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Although several works have reviewed the state of the art in DL, each of them tackled only one aspect of DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL, including those enhancements recently added to the field. In particular, this paper outlines the importance of DL and presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs), which are the most utilized DL network type, and describes the development of CNN architectures together with their main features, e.g., starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we further present the challenges and suggested solutions to help researchers understand the existing research gaps. This is followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL. The paper ends with the evolution matrix, benchmark datasets, and summary and conclusion.

Keyword extraction method for machine reading comprehension based on natural language processing

Journal of Physics Conference Series ◽

10.1088/1742-6596/1955/1/012072 ◽

2021 ◽

Vol 1955 (1) ◽

pp. 012072

Author(s):

Ruiheng Li ◽

Xuan Zhang ◽

Chengdong Li ◽

Zhongju Zheng ◽

Zihang Zhou ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Extraction Method ◽

Keyword Extraction ◽

Machine Reading

Opportunistic Large Array Propagation Models: A Comprehensive Survey

Sensors ◽

10.3390/s21124206 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4206

Author(s):

Farhan Nawaz ◽

Hemant Kumar ◽

Syed Ali Hassan ◽

Haejoon Jung

Keyword(s):

Energy Efficient ◽

Large Scale ◽

Performance Metrics ◽

Experimental Studies ◽

Cooperative Transmission ◽

Analytical Models ◽

Future Research ◽

Large Array ◽

Propagation Models ◽

Comprehensive Survey

Enabled by the fifth-generation (5G) and beyond 5G communications, large-scale deployments of Internet-of-Things (IoT) networks are expected in various application fields to handle massive machine-type communication (mMTC) services. Device-to-device (D2D) communications can be an effective solution in massive IoT networks to overcome the inherent hardware limitations of small devices. In such D2D scenarios, given that a receiver can benefit from the signal-to-noise-ratio (SNR) advantage through diversity and array gains, cooperative transmission (CT) can be employed, so that multiple IoT nodes can create a virtual antenna array. In particular, Opportunistic Large Array (OLA), which is one type of CT technique, is known to provide fast, energy-efficient, and reliable broadcasting and unicasting without prior coordination, which can be exploited in future mMTC applications. However, OLA-based protocol design and operation depend on network models to characterize the propagation behavior and evaluate the performance. Further, experimental studies have shown that the most widely used model in prior studies on OLA is not accurate for networks with low node density. Therefore, stochastic models using a quasi-stationary Markov chain have been introduced, which are more complex but estimate the key performance metrics of OLA transmissions in practice more exactly. Considering that such propagation models should be selected carefully depending on system parameters such as network topology and channel environments, we provide a comprehensive survey of the analytical models and frameworks of OLA propagation in the literature, which is not available in the existing survey papers on OLA protocols. In addition, we introduce energy-efficient OLA techniques, which are of paramount importance in energy-limited IoT networks. Furthermore, we discuss future research directions to combine OLA with emerging technologies.

An Iterative Multi-Source Mutual Knowledge Transfer Framework for Machine Reading Comprehension

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/525 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xin Liu ◽

Kai Liu ◽

Xiang Li ◽

Jinsong Su ◽

Yubin Ge ◽

...

Keyword(s):

Reading Comprehension ◽

Knowledge Transfer ◽

Training Data ◽

Target Domain ◽

Domain Specific ◽

Mutual Knowledge ◽

Benchmark Datasets ◽

Knowledge Distillation ◽

The Many ◽

Machine Reading

The lack of sufficient training data in many domains poses a major challenge to the construction of domain-specific machine reading comprehension (MRC) models with satisfactory performance. In this paper, we propose a novel iterative multi-source mutual knowledge transfer framework for MRC. As an extension of the conventional knowledge transfer with one-to-one correspondence, our framework focuses on the many-to-many mutual transfer, which involves synchronous executions of multiple many-to-one transfers in an iterative manner. Specifically, to update a target-domain MRC model, we first consider other domain-specific MRC models as individual teachers, and employ knowledge distillation to train a multi-domain MRC model, which is differentially required to fit the training data and match the outputs of these individual models according to their domain-level similarities to the target domain. After being initialized by the multi-domain MRC model, the target-domain MRC model is fine-tuned to match both its training data and the output of its previous best model simultaneously via knowledge distillation. Compared with previous approaches, our framework can continuously enhance all domain-specific MRC models by enabling each model to iteratively and differentially absorb the domain-shared knowledge from others. Experimental results and in-depth analyses on several benchmark datasets demonstrate the effectiveness of our framework.
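
One way to read "fit the training data and match the outputs of these individual models according to their domain-level similarities" is as a weighted distillation loss. The sketch below is an assumption about the general shape of such an objective, not the loss from the paper: the function `multi_teacher_loss`, the temperature, the mixing factor `alpha`, and the normalisation of similarities into weights are all illustrative choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp((x - top) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_loss(student_logits, gold_index, teacher_logits_list,
                       similarities, temperature=2.0, alpha=0.5):
    """Hard-label loss plus similarity-weighted distillation from teachers.

    similarities holds one assumed domain-similarity score per teacher;
    teachers from more similar domains pull the student harder.
    """
    # Cross-entropy against the gold label (fit the training data).
    hard = -math.log(softmax(student_logits)[gold_index] + 1e-12)
    # Normalise similarities so the teacher weights sum to one.
    total = sum(similarities)
    weights = [s / total for s in similarities]
    # Softened student distribution is compared with each softened teacher.
    student_soft = softmax(student_logits, temperature)
    soft = sum(
        w * kl_divergence(softmax(t, temperature), student_soft)
        for w, t in zip(weights, teacher_logits_list)
    )
    return alpha * hard + (1 - alpha) * soft


loss = multi_teacher_loss(
    student_logits=[2.0, 0.0],
    gold_index=0,
    teacher_logits_list=[[2.0, 0.0], [0.0, 2.0]],
    similarities=[0.8, 0.2],
)
```

In this toy call the second teacher disagrees with the student but has low similarity, so it contributes less to the loss than it would with equal weights.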

A Comprehensive Taxonomy of Dynamic Texture Representation

ACM Computing Surveys ◽

10.1145/3487892 ◽

2023 ◽

Vol 55 (1) ◽

pp. 1-39

Author(s):

Thanh Tuan Nguyen ◽

Thanh Phuong Nguyen

Keyword(s):

Large Scale ◽

Environmental Changes ◽

State Of The Art ◽

The State ◽

Future Research ◽

Research Activities ◽

Potential Applications ◽

Benchmark Datasets ◽

Negative Impacts ◽

Made In

Representing dynamic textures (DTs) plays an important role in many real implementations in the computer vision community. Due to the turbulent and non-directional motions of DTs along with the negative impacts of different factors (e.g., environmental changes, noise, illumination, etc.), efficiently analyzing DTs has raised considerable challenges for the state-of-the-art approaches. Over the past 20 years, many different techniques have been introduced to handle these well-known issues and enhance performance. Those methods have made valuable contributions, but the problems have been only partially addressed, particularly recognizing DTs on large-scale datasets. In this article, we present a comprehensive taxonomy of DT representation in order to give a thorough overview of the existing methods along with overall evaluations of their performance. Accordingly, we arrange the methods into six canonical categories. Each category is then briefly presented in terms of its principal methodology stream and related variants. The effectiveness of the state-of-the-art methods is then investigated and thoroughly discussed with respect to quantitative and qualitative evaluations in classifying DTs on benchmark datasets. Finally, we point out several potential applications and the remaining challenges that should be addressed in future work. In comparison with two existing shallow DT surveys (the first is out of date, as it was made in 2005, while the newer one, published in 2016, is an inadequate overview), we believe that our proposed comprehensive taxonomy not only provides a better view of DT representation for the target readers but also stimulates future research activities.

Interactive Dual Attention Network for Text Sentiment Classification

Computational Intelligence and Neuroscience ◽

10.1155/2020/8858717 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Yinglin Zhu ◽

Wenbin Zheng ◽

Hong Tang

Keyword(s):

Language Processing ◽

Classification Performance ◽

Research Field ◽

Sentiment Classification ◽

Attention Network ◽

Linguistic Resources ◽

Benchmark Datasets ◽

Interactive Relationship ◽

Conventional Machine ◽

Result Analysis

Text sentiment classification is an essential research field of natural language processing. Recently, numerous deep learning-based methods for sentiment classification have been proposed and achieved better performances compared with conventional machine learning methods. However, most of the proposed methods ignore the interactive relationship between contextual semantics and sentimental tendency while modeling their text representation. In this paper, we propose a novel Interactive Dual Attention Network (IDAN) model that aims to interactively learn the representation between contextual semantics and sentimental tendency information. Firstly, we design an algorithm that utilizes linguistic resources to obtain sentimental tendency information from text and then extract word embeddings from the BERT (Bidirectional Encoder Representations from Transformers) pretraining model as the embedding layer of IDAN. Next, we use two Bidirectional LSTM (BiLSTM) networks to learn the long-range dependencies of contextual semantics and sentimental tendency information, respectively. Finally, two types of attention mechanisms are implemented in IDAN. One is multihead attention, which is the next layer of BiLSTM and is used to learn the interactive relationship between contextual semantics and sentimental tendency information. The other is global attention that aims to make the model focus on the important parts of the sequence and generate the final representation for classification. These two attention mechanisms enable IDAN to interactively learn the relationship between semantics and sentimental tendency information and improve the classification performance. A large number of experiments on four benchmark datasets show that our IDAN model is superior to competitive methods. Moreover, both the result analysis and the attention weight visualization further demonstrate the effectiveness of our proposed method.
