Solving Math Word Problems with Teacher Supervision

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/485 ◽

2021 ◽

Author(s):

Zhenwen Liang ◽

Xiangliang Zhang

Keyword(s):

Natural Language ◽

Word Problems ◽

State Of The Art ◽

Mathematical Expression ◽

Teacher Supervision ◽

Correct Solution ◽

Math Problem ◽

Latent Space ◽

Math Problems ◽

Formula Expression

Math word problems (MWPs) have been recently addressed with Seq2Seq models by `translating' math problems described in natural language to a mathematical expression, following a typical encoder-decoder structure. Although effective in solving classical math problems, these models fail when a subtle variation is applied to the word expression of a math problem, and leads to a remarkably different answer. We find the failure is because MWPs with different answers but similar math formula expression are encoded closely in the latent space. We thus designed a teacher module to make the MWP encoding vector match the correct solution and disaccord from the wrong solutions, which are manipulated from the correct solution. Experimental results on two benchmark MWPs datasets verified that our proposed solution outperforms the state-of-the-art models.

Download Full-text

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum ◽

10.1145/3458553.3458554 ◽

2019 ◽

Vol 53 (2) ◽

pp. 3-10

Author(s):

Muthu Kumar Chandrasekaran ◽

Philipp Mayr

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Digital Libraries ◽

State Of The Art ◽

Shared Task ◽

Processing Information ◽

Joint Workshop

The 4 th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state-of-the-art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5 th edition of the CL-SciSumm Shared Task.

Download Full-text

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibit relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.

Download Full-text

Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00190 ◽

2014 ◽

Vol 2 ◽

pp. 377-392 ◽

Cited By ~ 40

Author(s):

Siva Reddy ◽

Mirella Lapata ◽

Mark Steedman

Keyword(s):

Natural Language ◽

Large Scale ◽

Graph Matching ◽

State Of The Art ◽

The State ◽

Semantic Parsing ◽

Matching Problem ◽

Weak Supervision ◽

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Download Full-text

Recommending Relevant Tutorial Fragments for API-Related Natural Language Questions

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500406 ◽

2021 ◽

Vol 31 (09) ◽

pp. 1251-1275

Author(s):

Di Wu ◽

Xiao-Yuan Jing ◽

Haowen Chen ◽

Xiaohui Kong ◽

Jifeng Xuan

Keyword(s):

Natural Language ◽

State Of The Art ◽

Metric Learning ◽

Application Programming Interface ◽

Manual Annotation ◽

Candidate List ◽

Novel Approach ◽

Application Programming ◽

Programming Interface ◽

Reciprocal Rank

Application Programming Interface (API) tutorial is an important API learning resource. To help developers learn APIs, an API tutorial is often split into a number of consecutive units that describe the same topic (i.e. tutorial fragment). We regard a tutorial fragment explaining an API as a relevant fragment of the API. Automatically recommending relevant tutorial fragments can help developers learn how to use an API. However, existing approaches often employ supervised or unsupervised manner to recommend relevant fragments, which suffers from much manual annotation effort or inaccurate recommended results. Furthermore, these approaches only support developers to input exact API names. In practice, developers often do not know which APIs to use so that they are more likely to use natural language to describe API-related questions. In this paper, we propose a novel approach, called Tutorial Fragment Recommendation (TuFraRec), to effectively recommend relevant tutorial fragments for API-related natural language questions, without much manual annotation effort. For an API tutorial, we split it into fragments and extract APIs from each fragment to build API-fragment pairs. Given a question, TuFraRec first generates several clarification APIs that are related to the question. We use clarification APIs and API-fragment pairs to construct candidate API-fragment pairs. Then, we design a semi-supervised metric learning (SML)-based model to find relevant API-fragment pairs from the candidate list, which can work well with a few labeled API-fragment pairs and a large number of unlabeled API-fragment pairs. In this way, the manual effort for labeling the relevance of API-fragment pairs can be reduced. Finally, we sort and recommend relevant API-fragment pairs based on the recommended strategy. We evaluate TuFraRec on 200 API-related natural language questions and two public tutorial datasets (Java and Android). The results demonstrate that on average TuFraRec improves NDCG@5 by 0.06 and 0.09, and improves Mean Reciprocal Rank (MRR) by 0.07 and 0.09 on two tutorial datasets as compared with the state-of-the-art approach.

Download Full-text

Data-Efficient Sensor Upgrade Path Using Knowledge Distillation

Sensors ◽

10.3390/s21196523 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6523

Author(s):

Pieter Van Van Molle ◽

Cedric De De Boom ◽

Tim Verbelen ◽

Bert Vankeirsbilck ◽

Jonas De De Vylder ◽

...

Keyword(s):

Deep Neural Networks ◽

State Of The Art ◽

Original Data ◽

Radar Data ◽

Teacher Supervision ◽

Multispectral Images ◽

Test Set ◽

Time To Market ◽

Speed Up ◽

Knowledge Distillation

Deep neural networks have achieved state-of-the-art performance in image classification. Due to this success, deep learning is now also being applied to other data modalities such as multispectral images, lidar and radar data. However, successfully training a deep neural network requires a large reddataset. Therefore, transitioning to a new sensor modality (e.g., from regular camera images to multispectral camera images) might result in a drop in performance, due to the limited availability of data in the new modality. This might hinder the adoption rate and time to market for new sensor technologies. In this paper, we present an approach to leverage the knowledge of a teacher network, that was trained using the original data modality, to improve the performance of a student network on a new data modality: a technique known in literature as knowledge distillation. By applying knowledge distillation to the problem of sensor transition, we can greatly speed up this process. We validate this approach using a multimodal version of the MNIST dataset. Especially when little data is available in the new modality (i.e., 10 images), training with additional teacher supervision results in increased performance, with the student network scoring a test set accuracy of 0.77, compared to an accuracy of 0.37 for the baseline. We also explore two extensions to the default method of knowledge distillation, which we evaluate on a multimodal version of the CIFAR-10 dataset: an annealing scheme for the hyperparameter α and selective knowledge distillation. Of these two, the first yields the best results. Choosing the optimal annealing scheme results in an increase in test set accuracy of 6%. Finally, we apply our method to the real-world use case of skin lesion classification.

Download Full-text

Analyzing Compositionality-Sensitivity of NLI Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016867 ◽

2019 ◽

Vol 33 ◽

pp. 6867-6874 ◽

Cited By ~ 1

Author(s):

Yixin Nie ◽

Yicheng Wang ◽

Mohit Bansal

Keyword(s):

Natural Language ◽

High Probability ◽

State Of The Art ◽

Model Performance ◽

Analysis Tool ◽

Bag Of Words ◽

Compositional Semantics ◽

Sensitivity Testing ◽

Performance Loss ◽

Future Work

Success in natural language inference (NLI) should require a model to understand both lexical and compositional semantics. However, through adversarial evaluation, we find that several state-of-the-art models with diverse architectures are over-relying on the former and fail to use the latter. Further, this compositionality unawareness is not reflected via standard evaluation on current datasets. We show that removing RNNs in existing models or shuffling input words during training does not induce large performance loss despite the explicit removal of compositional information. Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models’ actual compositionality awareness. We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: complementing currently existing adversarial and linguistically driven diagnostic evaluations, and exposing opportunities for future work on evaluating models’ compositional understanding.

Download Full-text

A Goal-Driven Tree-Structured Neural Model for Math Word Problems

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/736 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zhipeng Xie ◽

Shichao Sun

Keyword(s):

Word Problems ◽

State Of The Art ◽

Neural Model ◽

Neural Models ◽

Top Down ◽

Feedforward Networks ◽

Whole Process ◽

Expression Tree ◽

Generate Solution ◽

Human Problem

Most existing neural models for math word problems exploit Seq2Seq model to generate solution expressions sequentially from left to right, whose results are far from satisfactory due to the lack of goal-driven mechanism commonly seen in human problem solving. This paper proposes a tree-structured neural model to generate expression tree in a goal-driven manner. Given a math word problem, the model first identifies and encodes its goal to achieve, and then the goal gets decomposed into sub-goals combined by an operator in a top-down recursive way. The whole process is repeated until the goal is simple enough to be realized by a known quantity as leaf node. During the process, two-layer gated-feedforward networks are designed to implement each step of goal decomposition, and a recursive neural network is used to encode fulfilled subtrees into subtree embeddings, which provides a better representation of subtrees than the simple goals of subtrees. Experimental results on the dataset Math23K have shown that our tree-structured model outperforms significantly several state-of-the-art models.

Download Full-text

Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/137 ◽

2019 ◽

Author(s):

Siying Wu ◽

Zheng-Jun Zha ◽

Zilei Wang ◽

Houqiang Li ◽

Feng Wu

Keyword(s):

Natural Language ◽

State Of The Art ◽

Cross Entropy ◽

Image Captioning ◽

Value Network ◽

Entropy Loss ◽

Fine Grained ◽

Performance Improvements ◽

Single Sentence ◽

Multiple State

Image paragraph generation aims to describe an image with a paragraph in natural language. Compared to image captioning with a single sentence, paragraph generation provides more expressive and fine-grained description for storytelling. Existing approaches mainly optimize paragraph generator towards minimizing word-wise cross entropy loss, which neglects linguistic hierarchy of paragraph and results in ``sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervisions consisting of hierarchical rewards and values at both sentence and word levels. The joint exploration of hierarchical rewards and values provides dense supervision cues for learning effective paragraph generator. We propose a new hierarchical policy-value architecture which exploits compositionality at token-to-token and sentence-to-sentence levels simultaneously and can preserve the semantic and syntactic constituent integrity. Extensive experiments on the Stanford image-paragraph benchmark have demonstrated the effectiveness of the proposed DHPV approach with performance improvements over multiple state-of-the-art methods.

Download Full-text

Teaching Mathematics in English for Elementary Students of Asak Paroki MKK

Social Economics and Ecology International Journal (SEEIJ) ◽

10.21512/seeij.v4i2.7363 ◽

2021 ◽

Vol 4 (2) ◽

pp. 97-106

Author(s):

Clara Herlina

Keyword(s):

Community Development ◽

Elementary Students ◽

Word Problems ◽

Development Program ◽

Classroom Teaching ◽

Teaching Mathematics ◽

Math Vocabulary ◽

The Subject ◽

Basic Concepts ◽

Math Problems

Mathematics subject is considered difficult for most elementary students, especially when the subject is given in English. To be able to do math exercises in English, the students have to understand the math vocabulary and the concepts of math. The purpose of this community development program is to increase the elementary students’ ability in solving word math problems in English. The participants in this program are twenty elementary students who are included in ASAK Paroki MKK community. The program is created in the form of classroom teaching and activities. In this program, we teach them the basic concepts of math vocabulary, understanding the word problems and the solutions to the problems. We also use several related activities to make the lessons meaningful and comprehensible. The results show that the students are able to solve math problems in English correctly and confidently.

Download Full-text

SANTM: Efficient Self-attention-driven Network for Text Matching

ACM Transactions on Internet Technology ◽

10.1145/3426971 ◽

2022 ◽

Vol 22 (3) ◽

pp. 1-21

Author(s):

Prayag Tiwari ◽

Amit Kumar Jaiswal ◽

Sahil Garg ◽

Ilsun You

Keyword(s):

Natural Language ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Matching Problems ◽

Attention Model ◽

Extra Information ◽

Textual Entailment ◽

Benchmark Datasets ◽

Text Matching

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. Self-attention model takes only one sentence as an input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted either in symmetrical or asymmetrical scopes. For instance, paraphrase detection is an asymmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of self-attention mechanism and proposes an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features while updating the rest of the components. Our model follows evaluation on two benchmark datasets cover tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with much fewer parameters.

Download Full-text