Abstract Syntax as Interlingua: Scaling Up the Grammatical Framework from Controlled Languages to Robust Pipelines

2020
Vol 46 (2)
pp. 425-486
Author(s):
Aarne Ranta
Krasimir Angelov
Normunds Gruzitis
Prasanth Kolachina

Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.
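For readers unfamiliar with the abstract-syntax idea, here is a minimal, self-contained Python sketch (not GF itself, and not drawn from the article): one language-neutral tree with per-language linearization rules, which is the core of how a multilingual grammar can serve as an interlingua. The lexicon entries are invented for the example.

```python
# Toy illustration (not GF): one abstract syntax tree, many concrete syntaxes.
from dataclasses import dataclass

@dataclass
class Pred:          # abstract syntax rule: Pred : Noun -> Verb -> Sentence
    noun: str        # abstract function name, e.g. "Cat"
    verb: str        # abstract function name, e.g. "Sleep"

LEXICON = {
    "Eng": {"Cat": "the cat", "Sleep": "sleeps"},
    "Swe": {"Cat": "katten",  "Sleep": "sover"},
}

def linearize(tree: Pred, lang: str) -> str:
    """Map the language-neutral tree to a concrete string."""
    lex = LEXICON[lang]
    return f"{lex[tree.noun]} {lex[tree.verb]}"

tree = Pred("Cat", "Sleep")            # the interlingual representation
print(linearize(tree, "Eng"))          # -> "the cat sleeps"
print(linearize(tree, "Swe"))          # -> "katten sover"
```

Translation in this picture is parsing a string into the shared tree in one language and linearizing it in another; the real GF formalism adds records, parameters, and dependent types on top of this skeleton.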

2004
Vol 14 (2)
pp. 145-189
Author(s):
AARNE RANTA

Grammatical Framework (GF) is a special-purpose functional language for defining grammars. It uses a Logical Framework (LF) for a description of abstract syntax, and adds to this a notation for defining concrete syntax. GF grammars themselves are purely declarative, but can be used both for linearizing syntax trees and parsing strings. GF can describe both formal and natural languages. The key notion of this description is a grammatical object, which is not just a string, but a record that contains all information on inflection and inherent grammatical features such as number and gender in natural languages, or precedence in formal languages. Grammatical objects have a type system, which helps to eliminate run-time errors in language processing. In the same way as a LF, GF uses dependent types in abstract syntax to express semantic conditions, such as well-typedness and proof obligations. Multilingual grammars, where one abstract syntax has many parallel concrete syntaxes, can be used for reliable and meaning-preserving translation. They can also be used in authoring systems, where syntax trees are constructed in an interactive editor similar to proof editors based on LF. While being edited, the trees can simultaneously be viewed in different languages. This paper starts with a gradual introduction to GF, going through a sequence of simpler formalisms till the full power is reached. The introduction is followed by a systematic presentation of the GF formalism and outlines of the main algorithms: partial evaluation and parser generation. The paper concludes by brief discussions of the Haskell implementation of GF, existing applications, and related work.
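To make the notion of a grammatical object concrete, the following toy Python sketch (an illustration under our own simplifying assumptions, not GF code) represents words as records carrying an inflection table and inherent features, so that agreement can be computed during linearization. The French lexical data is hand-picked for the example.

```python
# A "grammatical object" is not just a string but a record with an inflection
# table and inherent features (here: gender), used to compute agreement.
noun_maison = {
    "inflection": {"Sg": "maison", "Pl": "maisons"},
    "gender": "Fem",                                  # inherent feature
}

adjective_petit = {
    # the adjective inflects for both gender and number
    "inflection": {("Masc", "Sg"): "petit",  ("Masc", "Pl"): "petits",
                   ("Fem", "Sg"):  "petite", ("Fem", "Pl"):  "petites"},
}

def noun_phrase(adj, noun, number):
    """Linearize Adj + Noun, letting the noun's gender select the adjective form."""
    gender = noun["gender"]
    return f"{adj['inflection'][(gender, number)]} {noun['inflection'][number]}"

print(noun_phrase(adjective_petit, noun_maison, "Sg"))  # -> "petite maison"
print(noun_phrase(adjective_petit, noun_maison, "Pl"))  # -> "petites maisons"
```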


2020
Author(s):
Hui Chen
Honglei Liu
Ni Wang
Yanqun Huang
Zhiqiang Zhang
...

BACKGROUND: Liver cancer remains a substantial disease burden in China. As one of the primary diagnostic means for liver cancer, the dynamic enhanced computed tomography (CT) scan provides detailed diagnostic evidence that is recorded in free-text radiology reports.
OBJECTIVE: In this study, we combined knowledge-driven deep learning methods and data-driven natural language processing (NLP) methods to extract radiological features from these reports and designed a computer-aided liver cancer diagnosis framework.
METHODS: We collected 1089 CT radiology reports in Chinese. We proposed a pre-trained, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) language model for word embedding. The embeddings served as inputs to a BiLSTM (Bidirectional Long Short-Term Memory) and CRF (Conditional Random Field) model (BERT-BiLSTM-CRF) that extracts the features of hyperintense enhancement in the arterial phase (APHE) and hypointensity in the portal and delayed phases (PDPH). We also extracted features with a traditional rule-based NLP method based on the content of the radiology reports. We then applied a random forest for liver cancer diagnosis and calculated the Gini impurity to identify the diagnosis evidence.
RESULTS: The BERT-BiLSTM-CRF model predicted the APHE and PDPH features with F1 scores of 98.40% and 90.67%, respectively. The prediction model using the combined features performed better (F1 score, 88.55%) than models using only the features obtained by BERT-BiLSTM-CRF (84.88%) or only the traditional rule-based NLP method (83.52%). The APHE and PDPH features were the two most important features for liver cancer diagnosis.
CONCLUSIONS: We proposed a BERT-based deep learning method for extracting diagnosis evidence grounded in clinical knowledge. With the recognized APHE and PDPH features, liver cancer diagnosis achieved high performance, which was further improved by adding the radiological features obtained by the traditional rule-based NLP method. The BERT-BiLSTM-CRF model achieved state-of-the-art performance in this study and could be extended to other kinds of Chinese clinical text.
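As a rough illustration of the diagnosis step described above, the sketch below trains a random forest on binary radiological feature flags and reads off Gini-based feature importances; the feature names and the synthetic data are placeholders, not the study's material.

```python
# Hedged sketch: binary radiological features -> random forest -> Gini importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["APHE", "PDPH", "pseudocapsule", "washout"]  # illustrative only
rng = np.random.default_rng(0)

X = rng.integers(0, 2, size=(200, len(feature_names)))           # 0/1 feature flags
y = ((X[:, 0] & X[:, 1]) | (rng.random(200) < 0.1)).astype(int)  # toy diagnosis labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# feature_importances_ is the mean decrease in Gini impurity across the trees
for name, imp in sorted(zip(feature_names, clf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```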


2020
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and usually correspond to the source-language input.
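For reference, corpus-level BLEU of the kind reported here can be computed as in the following hedged sketch; it assumes the sacrebleu package and invented sentence pairs, since the abstract does not name the exact evaluation toolkit.

```python
# Hedged sketch: compare two MT systems by corpus-level BLEU with sacrebleu.
import sacrebleu

# one reference per test sentence (placeholder English sentences)
references = [[
    "the children are going to school",
    "she is cooking millet bread",
]]
transformer_outputs = ["the children are going to school", "she cooks millet bread"]
rnn_outputs = ["children school are going", "she is cooking the bread"]

for name, hyps in [("transformer", transformer_outputs), ("rnn", rnn_outputs)]:
    bleu = sacrebleu.corpus_bleu(hyps, references)
    print(f"{name}: BLEU = {bleu.score:.1f}")
```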


Author(s):  
G Deena ◽  
K Raja ◽  
K Kannan

In today's competitive world, education has become part of everyday life. Imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). Assessment is one way to identify the learner's weak spots in the area under discussion, and assessment questions have a central role in judging the learner's skill. In manual preparation, the excellence and fairness of the questions used to assess the learner's cognitive skill cannot be assured. Question generation is the most important part of the teaching-learning process, and generating the test questions is clearly its toughest part. Methods: We propose an Automatic Question Generation (AQG) system that automatically and dynamically generates assessment questions from an input file. Objective: The proposed system generates test questions mapped to Bloom's taxonomy to determine the learner's cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate procedural questions at the lowest Bloom's cognitive levels. Analysis: The output is dynamic, producing a different set of questions at each execution. Here, the input paragraph is selected from the computer science domain, and output efficiency is measured using precision and recall.
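A minimal sketch of the cloze-generation idea, assuming NLTK for part-of-speech tagging (the paper's actual toolkit is not specified here): tag the sentence, pick a noun at random, and blank it out.

```python
# Cloze-type question generation from POS tags and a random choice.
# Requires NLTK data: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import random
import nltk

def make_cloze(sentence: str) -> tuple[str, str]:
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    nouns = [w for w, tag in tagged if tag.startswith("NN")]
    answer = random.choice(nouns)                 # the random function picks the blank
    question = " ".join("_____" if w == answer else w for w in tokens)
    return question, answer

q, a = make_cloze("An operating system manages hardware resources and running programs")
print(q)   # e.g. "An operating system manages hardware _____ and running programs"
print(a)   # e.g. "resources"
```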


2021
Vol 7 (15)
pp. eabe4166
Author(s):
Philippe Schwaller
Benjamin Hoover
Jean-Louis Reymond
Hendrik Strobelt
Teodoro Laino

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
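The attention-guided mapping idea can be illustrated with a toy example: given a matrix of product-to-reactant attention weights, assign each product token to the reactant token it attends to most. The tokens and numbers below are invented and stand in for the trained Transformer's weights.

```python
# Toy illustration of attention-guided mapping (invented weights, not the paper's model).
import numpy as np

reactant_tokens = ["C", "C", "O", "N"]     # placeholder atom tokens
product_tokens = ["C", "O", "N"]

# rows: product tokens, cols: reactant tokens
attention = np.array([
    [0.70, 0.15, 0.10, 0.05],
    [0.05, 0.10, 0.80, 0.05],
    [0.05, 0.05, 0.10, 0.80],
])

mapping = attention.argmax(axis=1)         # highest-attention reactant per product token
for p_idx, r_idx in enumerate(mapping):
    print(f"product {product_tokens[p_idx]}[{p_idx}] -> "
          f"reactant {reactant_tokens[r_idx]}[{r_idx}]")
```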


Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains, and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes the individual needs of students into account while alleviating the need for expert intervention and the design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.
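One plausible, purely hypothetical way to operationalize such data-driven personalization is to score candidate hints by the predicted probability that the student solves the exercise after seeing them; the features, model, and data below are illustrative assumptions, not the system described in the paper.

```python
# Hypothetical sketch: rank candidate hints by predicted success probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy training rows: [student_skill_estimate, past_success_rate, hint_specificity]
X = np.array([[0.2, 0.1, 0.9], [0.8, 0.7, 0.2], [0.5, 0.4, 0.6], [0.3, 0.2, 0.8]])
y = np.array([1, 1, 1, 0])            # did the student solve the exercise afterwards?

model = LogisticRegression().fit(X, y)

def pick_hint(student, hints):
    """Return the hint with the highest predicted success probability for this student."""
    rows = np.array([student + [h["specificity"]] for h in hints])
    probs = model.predict_proba(rows)[:, 1]
    return hints[int(probs.argmax())]["text"]

hints = [{"text": "Re-read the definition of recursion.", "specificity": 0.3},
         {"text": "Try tracing factorial(3) by hand.",    "specificity": 0.8}]
print(pick_hint([0.3, 0.2], hints))
```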


Author(s):  
Irina Wedel ◽  
Michael Palk ◽  
Stefan Voß

Social media enable companies to assess consumers' opinions, complaints, and needs. The systematic and data-driven analysis of social media to generate business value is summarized under the term Social Media Analytics, which includes statistical, network-based, and language-based approaches. We focus on textual data and investigate which conversation topics arise on Twitter during the introduction of a new product, and how the overall sentiment develops during and after the event. The analysis with Natural Language Processing tools is conducted in two languages and four different countries, so that cultural differences in tonality and customer needs can be identified for the product. Different methods of sentiment analysis and topic modeling are compared to assess their usability for social media data in the respective languages, English and German. Furthermore, we illustrate the importance of preprocessing steps when applying these methods and identify relevant product insights.
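The two analysis strands, sentiment analysis and topic modeling, can be sketched on toy English tweets as below; this assumes NLTK's VADER lexicon (English only) and scikit-learn's LDA, whereas the paper compares several methods across English and German.

```python
# Hedged sketch: lexicon-based sentiment plus LDA topic modeling on invented tweets.
# Requires NLTK data: nltk.download("vader_lexicon")
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "love the new phone, the camera is amazing",
    "battery life is terrible, really disappointed",
    "great launch event, excited about the camera",
    "the battery drains too fast, not happy",
]

# sentiment per tweet (preprocessing such as lowercasing/URL removal omitted here)
sia = SentimentIntensityAnalyzer()
for t in tweets:
    print(f"{sia.polarity_scores(t)['compound']:+.2f}  {t}")

# two topics over the same tweets
counts = CountVectorizer(stop_words="english").fit(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts.transform(tweets))
terms = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```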


1998
Vol 34 (1)
pp. 73-124
Author(s):
RUTH KEMPSON
DOV GABBAY

This paper informally outlines a Labelled Deductive System for on-line language processing. Interpretation of a string is modelled as a composite lexically driven process of type deduction over labelled premises forming locally discrete databases, with rules of database inference then dictating their mode of combination. The particular LDS methodology is illustrated by a unified account of the interaction of wh-dependency and anaphora resolution, the so-called ‘cross-over’ phenomenon, currently acknowledged to resist a unified explanation. The shift of perspective this analysis requires is that interpretation is defined as a proof structure for labelled deduction, and assignment of such structure to a string is a dynamic left-right process in which linearity considerations are ineliminable.


1993
Vol 02 (01)
pp. 47-70
Author(s):
SHARON M. TUTTLE
CHRISTOPH F. EICK

Forward-chaining rule-based programs, being data-driven, can function in changing environments in which backward-chaining rule-based programs would have problems. But debugging forward-chaining programs can be tedious: to debug a forward-chaining rule-based program, certain ‘historical’ information about the program run is needed. Programmers should be able to request such information directly, instead of having to rerun the program one step at a time or search a trace of run details. As a first step in designing an explanation system for answering such questions, this paper discusses how the ‘historical’ details of a forward-chaining program run can be stored in its Rete inference network, which is used to match rule conditions to working memory. This can be done without seriously affecting the network’s run-time performance. We call this generalization of the Rete network a historical Rete network. Various algorithms for maintaining this network are discussed, along with how it can be used during debugging. A debugging tool, MIRO, that incorporates these techniques is also discussed.
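In the spirit of the historical Rete idea, though far simpler than the network described in the paper, the following toy sketch records for each working-memory element the cycle and rule firing that produced it, so a debugger can answer "when and why did this fact appear?" The rule names and facts are invented for illustration.

```python
# Toy sketch: keep 'historical' run information alongside working memory.
from dataclasses import dataclass, field

@dataclass
class HistoricalWM:
    facts: dict = field(default_factory=dict)   # fact -> (cycle, producing rule)
    cycle: int = 0

    def assert_fact(self, fact, rule="initial"):
        self.facts[fact] = (self.cycle, rule)

    def step(self):
        self.cycle += 1

    def why(self, fact):
        """Debugging query: when and by which rule was this fact derived?"""
        cycle, rule = self.facts[fact]
        return f"{fact!r} asserted at cycle {cycle} by rule {rule!r}"

wm = HistoricalWM()
wm.assert_fact(("temperature", "high"))
wm.step()
wm.assert_fact(("alarm", "on"), rule="R1-high-temp-triggers-alarm")
print(wm.why(("alarm", "on")))
```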


2017
Vol 53 (3)
pp. 1789-1798
Author(s):
Xiaodong Liang
Scott A. Wallace
Duc Nguyen
