Automatic question generation based on sentence structure analysis using machine learning approach

Natural Language Engineering ◽

10.1017/s1351324921000139 ◽

2021 ◽

pp. 1-31

Author(s):

Miroslav Blšták ◽

Viera Rozinajová

Keyword(s):

Machine Learning ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Sentence Structure ◽

Question Generation ◽

Transformation Rules ◽

Input Text ◽

Automatic Question Generation

Abstract Automatic question generation is one of the most challenging tasks of Natural Language Processing. It requires “bidirectional” language processing: first, the system has to understand the input text (Natural Language Understanding), and then it has to generate questions, also in the form of text (Natural Language Generation). In this article, we introduce our framework for generating factual questions from unstructured English text. It combines traditional linguistic approaches based on sentence patterns with several machine learning methods. We first obtain lexical, syntactic and semantic information from an input text, and we then construct a hierarchical set of patterns for each sentence. A set of features is extracted from the patterns and then used for automated learning of new transformation rules. Our learning process is fully data-driven because the transformation rules are obtained from a set of initial sentence–question pairs. The advantages of this approach lie in the easy addition of new transformation rules, which allows us to generate various types of questions, and in the continuous improvement of the system through reinforcement learning. The framework also includes a question evaluation module which estimates the quality of generated questions. It serves as a filter for selecting the best questions and eliminating incorrect ones or duplicates. We have performed several experiments to evaluate the correctness of generated questions, and we have also compared our system with several state-of-the-art systems. Our results indicate that our generated questions are of higher quality than those of the state-of-the-art systems and are comparable to questions created by humans. We have also created and published an interface with all created data sets and evaluated questions, so that others can follow up on our work.
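The abstract does not specify how sentence patterns or transformation rules are represented. As a loose, hypothetical illustration of the general idea that a rule can be induced from one sentence-question pair and then reused on new sentences sharing the same pattern, the Python sketch below assumes each sentence has already been segmented into labeled slots. The slot labels (`PERSON`, `VERB_PAST`, `OBJ`) and the helpers `learn_rule` and `apply_rule` are invented for this example; it does not reproduce the authors' hierarchical patterns, feature extraction or reinforcement learning.

```python
from typing import Optional

Slot = tuple[str, str]  # (slot label, surface text); labels are hypothetical

# Learned rules: sentence pattern (tuple of slot labels) -> question template.
RULES: dict[tuple[str, ...], str] = {}


def learn_rule(slots: list[Slot], question: str) -> None:
    """Induce a question template from one sentence-question pair (toy version)."""
    template = question
    # Replace longer slot texts first so a short slot text cannot
    # overwrite part of a longer one.
    for label, text in sorted(slots, key=lambda s: len(s[1]), reverse=True):
        template = template.replace(text, "{" + label + "}")
    RULES[tuple(label for label, _ in slots)] = template


def apply_rule(slots: list[Slot]) -> Optional[str]:
    """Generate a question for a new sentence whose pattern matches a learned rule."""
    template = RULES.get(tuple(label for label, _ in slots))
    if template is None:
        return None  # no rule has been learned for this sentence pattern
    return template.format(**dict(slots))


# One training pair: "Marie Curie discovered polonium." -> "Who discovered polonium?"
learn_rule(
    [("PERSON", "Marie Curie"), ("VERB_PAST", "discovered"), ("OBJ", "polonium")],
    "Who discovered polonium?",
)
print(RULES)  # {('PERSON', 'VERB_PAST', 'OBJ'): 'Who {VERB_PAST} {OBJ}?'}
print(apply_rule(
    [("PERSON", "Alexander Fleming"), ("VERB_PAST", "discovered"), ("OBJ", "penicillin")]
))  # Who discovered penicillin?
```

Plain string replacement is only adequate for a toy: it fails when a slot's text occurs more than once in the question, which is one reason a real system would match on richer lexical, syntactic and semantic features instead.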

A novel framework for Automatic Chinese Question Generation based on multi-feature neural network model

Computer Science and Information Systems ◽

10.2298/csis171121018z ◽

2018 ◽

Vol 15 (3) ◽

pp. 487-499 ◽

Cited By ~ 3

Author(s):

Hai-Tao Zheng ◽

Jinxin Han ◽

Jinyuan Chen ◽

Arun Sangaiah

Keyword(s):

Neural Network ◽

Network Model ◽

Language Processing ◽

Neural Network Model ◽

State Of The Art ◽

Ranking Methods ◽

Question Generation ◽

Human Evaluation ◽

Automatic Question Generation

Automatic question generation from text or paragraphs is a highly challenging task that attracts broad attention in natural language processing. Because of verbose texts and fragile ranking methods, the quality of the top generated questions is poor. In this paper, we present a novel framework, Automatic Chinese Question Generation (ACQG), to generate questions from text or paragraphs. In ACQG, we use an adapted TextRank to extract key sentences and a template-based method to construct questions from these key sentences. A multi-feature neural network model is then built to rank the candidates and obtain the top questions. The automatic evaluation results reveal that the proposed framework outperforms the state-of-the-art systems in terms of perplexity. In human evaluation, questions generated by ACQG receive higher scores.
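The abstract does not describe how ACQG modifies TextRank. For orientation only, the sketch below implements standard TextRank sentence ranking as originally proposed by Mihalcea and Tarau: sentences are graph nodes, edges are weighted by a normalized word-overlap similarity, and scores are computed by weighted PageRank-style iteration with damping factor `d`. The tokenizer, parameter defaults and helper names are assumptions for this example, not ACQG's implementation.

```python
import math
import re


def tokenize(sentence: str) -> list[str]:
    # Lower-cased alphanumeric tokens; a real system would also lemmatize
    # and drop stop words.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a: list[str], b: list[str]) -> float:
    """Word-overlap similarity from the original TextRank paper."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences consist of a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_scores(sentences: list[str], d: float = 0.85,
                    max_iter: int = 100, tol: float = 1e-6) -> list[float]:
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []
    # Symmetric weighted graph: w[i][j] is the edge weight between sentences i and j.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # Each node receives score from its neighbours, in proportion to edge weight.
        new_scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if w[j][i] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def key_sentences(sentences: list[str], k: int) -> list[str]:
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with any other sentence keeps only the baseline score `1 - d`, so it is ranked last; in a pipeline like the one described, only the top-ranked sentences would be passed on to the question templates.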

Automatic Identification of Information Quality Metrics in Health News Stories

Frontiers in Public Health ◽

10.3389/fpubh.2020.515347 ◽

2020 ◽

Vol 8 ◽

Author(s):

Majed Al-Jefri ◽

Roger Evans ◽

Joon Lee ◽

Pietro Ghezzi

Keyword(s):

Machine Learning ◽

Health Care ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Information Quality ◽

Evaluation Process ◽

Health News ◽

News Stories

Objective: Many online and printed media publish health news of questionable trustworthiness, and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning.

Materials and Methods: We used a database from the website HealthNewsReview.org, which aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze claims about health care interventions. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the use of the pre-trained natural language model BERT.

Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising, with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity of the evaluation process.

Conclusion: The criteria used here are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why the difficulty varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process, in which human experts interpret the meaning of the text and make use of external knowledge in their assessment.
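The results above are reported as F1 measures. For reference, F1 is the harmonic mean of precision and recall; the short sketch below computes all three from confusion counts. The function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for one criterion (not from the study):
# precision 0.9, recall 0.75, F1 ≈ 0.818
print(precision_recall_f1(tp=45, fp=5, fn=15))
```

Because it is a harmonic mean, F1 (about 0.818 here) falls below the arithmetic mean of precision and recall (0.825) whenever the two differ, so a classifier cannot score well by maximizing only one of them.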

Deep learning for brain disorders: from data processing to disease treatment

Briefings in Bioinformatics ◽

10.1093/bib/bbaa310 ◽

2020 ◽

Author(s):

Ninon Burgos ◽

Simona Bottani ◽

Johann Faouzi ◽

Elina Thibeau-Sutre ◽

Olivier Colliot

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Language Processing ◽

State Of The Art ◽

Imaging Genetics ◽

Environmental Data ◽

Brain Disorders ◽

Disease Treatment ◽

Clinical Routine

Abstract In order to reach precision medicine and improve patients’ quality of life, machine learning is increasingly used in medicine. Brain disorders are often complex and heterogeneous, and several modalities such as demographic, clinical, imaging, genetic and environmental data have been studied to improve their understanding. Deep learning, a subfield of machine learning, provides complex algorithms that can learn from such varied data. It has become the state of the art in numerous fields, including computer vision and natural language processing, and is increasingly applied in medicine. In this article, we review the use of deep learning for brain disorders. More specifically, we identify the main applications, the disorders concerned, and the types of architectures and data used. Finally, we provide guidelines to bridge the gap between research studies and clinical routine.

A Survey of Citation Recommendation Tasks and Methods

Journal of Computing and Information Technology ◽

10.20532/cit.2020.1005160 ◽

2021 ◽

Vol 28 (3) ◽

pp. 183-205

Keyword(s):

Machine Learning ◽

Language Processing ◽

State Of The Art ◽

Scientific Production ◽

Machine Learning Methods ◽

Citation Function ◽

Key Aspects ◽

Global And Local ◽

Machine Learning Models

Scientific articles store vast amounts of knowledge amassed through many decades of research. They serve to communicate research results among scientists, but also for learning and for tracking progress in the field. However, scientific production has risen to levels that make it difficult even for experts to keep up with work in their field. As a remedy, specialized search engines are being deployed that incorporate novel natural language processing and machine learning methods. The task of citation recommendation, in particular, has attracted much interest as it holds promise for improving the quality of scientific production. In this paper, we present the state of the art in citation recommendation: we survey global and local approaches to the task, the evaluation setups and datasets, and the most successful machine learning models. In addition, we give an overview of two tasks complementary to citation recommendation: extraction of key aspects and entities from articles, and citation function classification. With this survey, we hope to lay the groundwork for understanding current efforts and to stimulate further research in this exciting and promising field.

Goal-Driven Visual Question Generation from Radiology Images

10.20944/preprints202107.0385.v1 ◽

2021 ◽

Author(s):

Mourad Sarrouti ◽

Asma Ben Abacha ◽

Dina Demner-Fushman

Keyword(s):

Natural Language ◽

Language Processing ◽

Domain Knowledge ◽

Data Augmentation ◽

Question Generation ◽

Human Evaluation ◽

Question Category ◽

Medical Domain ◽

The Impact

Visual Question Generation (VQG) from images is a rising research topic in the fields of natural language processing and computer vision. Although there have been some recent efforts towards generating questions from images in the open domain, the VQG task in the medical domain has not been well studied so far due to the lack of labeled data. In this paper, we introduce a goal-driven VQG approach for radiology images called VQGRaD that generates questions targeting specific image aspects such as modality and abnormality. In particular, we study generating natural language questions based on the visual content of the image and on additional information such as the image caption and the question category. VQGRaD encodes the dense vectors of different inputs into two latent spaces, which allows generating, for a specific question category, relevant questions about the images, with or without their captions. We also explore the impact of domain knowledge incorporation (e.g., medical entities and semantic types) and data augmentation techniques on visual question generation in the medical domain. Experiments performed on the VQA-RAD dataset of clinical visual questions showed that VQGRaD achieves a 61.86% BLEU score and outperforms strong baselines. We also performed a blinded human evaluation of the grammaticality, fluency, and relevance of the generated questions. The human evaluation demonstrated the better quality of VQGRaD outputs and showed that incorporating medical entities improves the quality of the generated questions. Using the test data and evaluation process of the ImageCLEF 2020 VQA-Med challenge, we found that the proposed data augmentation technique, which generates new training samples by applying different kinds of transformations, can mitigate the lack of data, avoid overfitting, and bring a substantial improvement in medical VQG.
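The VQGRaD result above is stated as a BLEU score. To show what the metric measures, the sketch below computes unsmoothed sentence-level BLEU with up to 4-grams: the geometric mean of clipped n-gram precisions, multiplied by a brevity penalty for candidates shorter than the closest reference. This is a generic textbook version with hypothetical function names; the abstract does not say which implementation or smoothing was used, and toolkits such as NLTK's `corpus_bleu` compute a corpus-level score that differs from averaging sentence scores.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: list[str], references: list[list[str]], max_n: int = 4) -> float:
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
        log_precision_sum += math.log(clipped / total) / max_n  # uniform weights
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


ref = "what imaging modality was used here".split()
print(sentence_bleu(ref[:5], [ref]))  # all n-gram precisions are 1; only the brevity penalty applies
```

Here the five-token candidate matches every n-gram of the reference, so the score equals the brevity penalty exp(1 - 6/5), roughly 0.82; dropping words is penalized even when everything that remains is correct.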

Goal-Driven Visual Question Generation from Radiology Images

Information ◽

10.3390/info12080334 ◽

2021 ◽

Vol 12 (8) ◽

pp. 334

Author(s):

Mourad Sarrouti ◽

Asma Ben Abacha ◽

Dina Demner-Fushman

Keyword(s):

Natural Language ◽

Language Processing ◽

Domain Knowledge ◽

Data Augmentation ◽

Question Generation ◽

Human Evaluation ◽

Question Category ◽

Medical Domain ◽

The Impact

Extracting Parallel Sentences from Nonparallel Corpora Using Parallel Hierarchical Attention Network

Computational Intelligence and Neuroscience ◽

10.1155/2020/8823906 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Shaolin Zhu ◽

Yong Yang ◽

Chun Xu

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Research Problem ◽

Shared Task ◽

Translation Systems

Collecting parallel sentences from nonparallel data is a long-standing natural language processing research problem. In particular, parallel training sentences are very important for the quality of machine translation systems. While many existing methods have shown encouraging results, they cannot learn the varying alignment weights of words in parallel sentences. To address this issue, we propose a novel parallel hierarchical attention neural network which encodes monolingual sentences versus bilingual sentences and constructs a classifier to extract parallel sentences. In particular, our attention mechanism can learn different alignment weights of words in parallel sentences. Experimental results show that our model obtains state-of-the-art performance on the English-French, English-German, and English-Chinese datasets of the BUCC 2017 shared task on parallel sentence extraction.
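The abstract does not detail the network's architecture. As a generic reference point only, the sketch below shows how scaled dot-product attention yields soft word-alignment weights between two sentences: each source word vector gets a softmax distribution over the target word vectors, and the weighted sum of target vectors forms its context vector. The word vectors are made-up toy values and the function names are hypothetical; in the paper's model the vectors would come from learned encoders and feed a parallel/non-parallel classifier, which this sketch does not reproduce.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_weights(src: list[list[float]], tgt: list[list[float]]) -> list[list[float]]:
    """Row i is source word i's attention distribution over the target words."""
    scale = math.sqrt(len(src[0]))  # scale dot products by sqrt(vector dimension)
    return [softmax([sum(a * b for a, b in zip(s, t)) / scale for t in tgt]) for s in src]


def context_vectors(weights: list[list[float]], tgt: list[list[float]]) -> list[list[float]]:
    """Attention-weighted sum of target word vectors for each source word."""
    dim = len(tgt[0])
    return [[sum(w * t[k] for w, t in zip(row, tgt)) for k in range(dim)] for row in weights]


# Toy 3-dimensional word vectors (made up for illustration).
tgt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src = [[0.0, 5.0, 0.0], [5.0, 0.0, 0.0]]
weights = alignment_weights(src, tgt)
for row in weights:
    print([round(w, 3) for w in row])
```

The first source word puts most of its weight on the second target word and the second source word on the first, i.e. the weights act as a soft, learned word alignment rather than a fixed one-to-one mapping.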

Download Full-text

Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing - FeatureEng '05

10.3115/1610230 ◽

2005 ◽

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Feature Engineering

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which outperform other learning algorithms in terms of accuracy. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

An Automatic Question Generation System using Rule-Based Approach in Bloom’s Taxonomy

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666191113143335 ◽

2019 ◽

Vol 13 ◽

Author(s):

G Deena ◽

K Raja ◽

K Kannan

Keyword(s):

Language Processing ◽

Learning Process ◽

Question Generation ◽

Test Question ◽

Rule Based ◽

Part Of Speech ◽

Core Idea ◽

Rule Based Approach ◽

Teaching Learning ◽

Automatic Question Generation

In this competitive world, education has become part of everyday life. Imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify the learner’s weak spots in the area under discussion, and assessment questions carry great weight in judging the learner's skill. Manually prepared questions are not guaranteed to be of high quality or to assess the learner’s cognitive skill fairly. Question generation is the most important part of the teaching-learning process, and generating the test questions is clearly its hardest part.

Methods: We propose an Automatic Question Generation (AQG) system which dynamically generates assessment questions from an input file.

Objective: The proposed system generates test questions that are mapped to Bloom's taxonomy to determine the learner’s cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are used to generate procedural questions at the lowest cognitive levels of Bloom's taxonomy.

Analysis: The outputs are dynamic in nature, so a different set of questions is created at each execution. The input paragraphs are selected from the computer science domain, and the output efficiency is measured using precision and recall.
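The abstract states that cloze questions are produced from part-of-speech tags plus a random function, without giving details. A minimal sketch under the assumption that the blanked word is a randomly chosen noun might look like the following; the hand-tagged input stands in for a POS tagger's output, and the names `make_cloze`, `NOUN_TAGS` and `BLANK` are hypothetical. Passing a different random seed yields a different question from the same sentence, which mirrors the "different set of questions at each execution" behaviour described above.

```python
import random
import re
from typing import Optional

# Penn Treebank noun tags; the input below is hand-tagged for illustration,
# whereas a real system would run a POS tagger first.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
BLANK = "_____"


def make_cloze(tagged: list[tuple[str, str]],
               rng: random.Random) -> Optional[tuple[str, str]]:
    """Blank out one randomly chosen noun; return (question, answer) or None."""
    candidates = [i for i, (_, tag) in enumerate(tagged) if tag in NOUN_TAGS]
    if not candidates:
        return None  # no noun to blank out
    target = rng.choice(candidates)
    words = [BLANK if i == target else word for i, (word, _) in enumerate(tagged)]
    # Re-attach punctuation that " ".join separated from the preceding word.
    question = re.sub(r"\s+([.,;:?!])", r"\1", " ".join(words))
    return question, tagged[target][0]


sentence = [("A", "DT"), ("stack", "NN"), ("stores", "VBZ"), ("elements", "NNS"),
            ("in", "IN"), ("LIFO", "NNP"), ("order", "NN"), (".", ".")]
for seed in (1, 2, 3):
    print(make_cloze(sentence, random.Random(seed)))
```

Returning the removed word alongside the question gives the answer key directly, which is what a precision/recall evaluation of the generated items would compare against.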
