TreeGen: A Tree-Based Transformer Architecture for Code Generation

Zeyu Sun; Qihao Zhu; Yingfei Xiong; Yican Sun; Lili Mou; Lu Zhang

doi:10.1609/aaai.v34i05.6430


Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6430 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8984-8991

Author(s):

Zeyu Sun ◽

Qihao Zhu ◽

Yingfei Xiong ◽

Yican Sun ◽

Lili Mou ◽

...

Keyword(s):

Code Generation ◽

State Of The Art ◽

Structural Information ◽

Semantic Parsing ◽

Generation System ◽

Neural Architecture ◽

Percentage Points ◽

Code Generators ◽

Grammar Rules ◽

Previous State

A code generation system generates programming language code based on an input natural language description. State-of-the-art approaches rely on neural networks for code generation. However, these code generators suffer from two problems. One is the long dependency problem, where a code element often depends on another far-away code element. A variable reference, for example, depends on its definition, which may appear quite a few lines before. The other problem is structure modeling, as programs contain rich structural information. In this paper, we propose a novel tree-based neural architecture, TreeGen, for code generation. TreeGen uses the attention mechanism of Transformers to alleviate the long-dependency problem, and introduces a novel AST reader (encoder) to incorporate grammar rules and AST structures into the network. We evaluated TreeGen on a Python benchmark, HearthStone, and two semantic parsing benchmarks, ATIS and GEO. TreeGen outperformed the previous state-of-the-art approach by 4.5 percentage points on HearthStone, and achieved the best accuracy among neural network-based approaches on ATIS (89.1%) and GEO (89.6%). We also conducted an ablation test to better understand each component of our model.


A Grammar-Based Structural CNN Decoder for Code Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017055 ◽

2019 ◽

Vol 33 ◽

pp. 7055-7062 ◽

Cited By ~ 3

Author(s):

Zeyu Sun ◽

Qihao Zhu ◽

Lili Mou ◽

Yingfei Xiong ◽

Ge Li ◽

...

Keyword(s):

Neural Network ◽

Programming Language ◽

Code Generation ◽

State Of The Art ◽

Semantic Parsing ◽

Code Generator ◽

Percentage Points ◽

Grammar Rules ◽

Previous State ◽

Program Description

Code generation maps a program description to executable source code in a programming language. Existing approaches mainly rely on a recurrent neural network (RNN) as the decoder. However, we find that a program contains significantly more tokens than a natural language sentence, and thus an RNN may be ill-suited to capturing such a long sequence. In this paper, we propose a grammar-based structural convolutional neural network (CNN) for code generation. Our model generates a program by predicting the grammar rules of the programming language; we design several CNN modules, including the tree-based convolution and pre-order convolution, whose information is further aggregated by dedicated attentive pooling layers. Experimental results on the HearthStone benchmark dataset show that our CNN code generator significantly outperforms the previous state-of-the-art method by 5 percentage points; additional experiments on several semantic parsing tasks demonstrate the robustness of our model. We also conduct an in-depth ablation test to better understand each component of our model.
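To make "generating a program by predicting grammar rules" concrete, here is a minimal sketch. The toy grammar, the rule ids, and the `apply_rules` helper are all hypothetical illustrations, not taken from the paper; in a real system a neural decoder would choose each rule id, whereas here the sequence is supplied by hand. Expanding the leftmost nonterminal at each step corresponds to building the syntax tree in pre-order.

```python
# Toy grammar (hypothetical): rule id -> (left-hand nonterminal, right-hand symbols).
RULES = {
    0: ("stmt", ["NAME", "=", "expr"]),
    1: ("expr", ["expr", "+", "expr"]),
    2: ("expr", ["NAME"]),
    3: ("expr", ["NUM"]),
}
NONTERMINALS = {"stmt", "expr"}


def apply_rules(rule_ids, start="stmt"):
    """Expand the leftmost nonterminal with each rule in turn (pre-order expansion)."""
    symbols = [start]
    for rule_id in rule_ids:
        lhs, rhs = RULES[rule_id]
        # Locate the leftmost symbol that still needs expanding.
        idx = next((i for i, s in enumerate(symbols) if s in NONTERMINALS), None)
        if idx is None:
            raise ValueError("no nonterminal left to expand")
        if symbols[idx] != lhs:
            raise ValueError(f"rule {rule_id} expands {lhs!r}, but next is {symbols[idx]!r}")
        # Replace the nonterminal with the rule's right-hand side.
        symbols[idx:idx + 1] = rhs
    return symbols


if __name__ == "__main__":
    # A hand-written rule sequence standing in for the decoder's predictions.
    print(apply_rules([0, 1, 2, 3]))
```

Because every step must apply a rule whose left-hand side matches the pending nonterminal, any complete rule sequence yields a syntactically valid skeleton, which is the main appeal of rule-level prediction over token-level prediction.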


From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00025 ◽

2018 ◽

Vol 6 ◽

pp. 343-356 ◽

Cited By ~ 2

Author(s):

Egoitz Laparra ◽

Dongfang Xu ◽

Steven Bethard

Keyword(s):

Neural Network ◽

Machine Learning ◽

Comparative Analysis ◽

State Of The Art ◽

Learning Approaches ◽

Semantic Parsing ◽

Time Intervals ◽

Semantic Composition ◽

Previous State ◽

New Scoring

This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character-level multi-output neural network that outperforms the previous state of the art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemas in the same corpus.


A Pattern-Based Approach to Recognizing Time Expressions

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016335 ◽

2019 ◽

Vol 33 ◽

pp. 6335-6342

Author(s):

Wentao Ding ◽

Guanji Gao ◽

Linfeng Shi ◽

Yuzhong Qu

Keyword(s):

Question Answering ◽

State Of The Art ◽

Structural Information ◽

Main Idea ◽

Sequential Patterns ◽

Semantic Parsing ◽

Language Understanding ◽

Fine Grained ◽

Maximum Coverage ◽

Approach Time

Recognizing time expressions is a fundamental task in many applications of natural language understanding, such as reading comprehension and question answering. Several recent state-of-the-art approaches have achieved good performance on recognizing time expressions. These approaches are black boxes or are based on heuristic rules, which makes the temporal information difficult to understand. In contrast, classic rule-based or semantic parsing approaches can capture rich structural information, but their recognition performance is weaker. In this paper, we propose a pattern-based approach, called PTime, which automatically generates and selects patterns for recognizing time expressions. In this approach, time expressions in training text are abstracted into type sequences by using fine-grained token types, so the problem becomes one of selecting an appropriate subset of the sequential patterns. We use the Extended Budgeted Maximum Coverage (EBMC) model to optimize the pattern selection. The main idea is to maximize the number of correct token sequences matched by the selected patterns while keeping the number of mistakes within an adjustable budget. The interpretability of patterns and the adjustability of the permitted number of mistakes make PTime a very promising approach for many applications. Experimental results show that PTime achieves very competitive performance compared with existing state-of-the-art approaches.
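The selection idea described above (cover as many correct sequences as possible while keeping mistakes within a budget) can be sketched with a simple greedy heuristic. This is only an illustration under stated assumptions: the abstract does not say how the EBMC model is solved, and the `select_patterns` function, the scoring ratio, and the example pattern names are hypothetical.

```python
def select_patterns(patterns, budget):
    """Greedily pick patterns under a mistake budget.

    patterns maps a pattern name to a pair (correct, wrong), where each is a
    set of ids of the token sequences the pattern matches correctly or by mistake.
    """
    chosen, covered, mistakes = [], set(), set()
    remaining = dict(patterns)
    while remaining:
        best_name, best_score = None, 0.0
        for name, (correct, wrong) in remaining.items():
            new_correct = len(correct - covered)
            new_wrong = len(wrong - mistakes)
            # Skip patterns that add nothing or would exceed the budget.
            if new_correct == 0 or len(mistakes) + new_wrong > budget:
                continue
            # Hypothetical score: newly covered sequences per (1 + new mistakes).
            score = new_correct / (1 + new_wrong)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            break
        correct, wrong = remaining.pop(best_name)
        chosen.append(best_name)
        covered |= correct
        mistakes |= wrong
    return chosen, covered, mistakes


if __name__ == "__main__":
    # Made-up type-sequence patterns with the ids of sequences they match.
    toy = {
        "MONTH DAY": ({1, 2, 3}, {10}),
        "NUM": ({3, 4, 5, 6}, {11, 12, 13}),
        "DAY_OF_WEEK": ({7}, set()),
    }
    print(select_patterns(toy, budget=1)[0])
```

Raising `budget` lets noisier but broader patterns such as the toy `NUM` pattern into the selection, which mirrors the adjustable trade-off between coverage and mistakes that the abstract describes.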


Unsupervised Structural Graph Node Representation Learning

10.18122/td/1754/boisestate ◽

2020 ◽

Author(s):

Mikel Joaristi

Keyword(s):

Real World ◽

State Of The Art ◽

Structural Information ◽

Representation Learning ◽

Graph Representation ◽

Learning Methods ◽

Structural Graph ◽

Connectivity Information ◽

Latent Space ◽

Previous State

Unsupervised Graph Representation Learning methods learn a numerical representation of the nodes in a graph. The generated representations encode meaningful information about the nodes' properties, making them a powerful tool for tasks in many areas of study, such as social sciences, biology or communication networks. These methods are particularly interesting because they facilitate the direct use of standard Machine Learning models on graphs. Graph representation learning methods can be divided into two main categories depending on the information they encode: methods preserving the nodes' connectivity information, and methods preserving the nodes' structural information. Connectivity-based methods focus on encoding relationships between nodes, with neighboring nodes being closer together in the resulting latent space. On the other hand, structure-based methods generate a latent space where nodes serving a similar structural function in the network are encoded close to each other, regardless of whether they are connected or even close to each other in the graph. While many works focus on preserving nodes' connectivity information, only a few study the problem of encoding nodes' structure, especially in an unsupervised way. In this dissertation, we demonstrate that properly encoding nodes' structural information is fundamental for many real-world applications, as it can be leveraged to successfully solve many tasks where connectivity-based methods fail. One concrete example is presented first. In this example, the task consists of detecting malicious entities in a real-world financial network. We show that connectivity information is not enough to solve this problem, and that leveraging structural information provides considerable performance improvements. This particular example highlights the need for further research in the area of structural graph representation learning, together with the limitations of the previous state of the art.
We use the acquired knowledge as a starting point and inspiration for the research and development of three independent unsupervised structural graph representation learning methods: Structural Iterative Representation learning approach for Graph Nodes (SIR-GN), Structural Iterative Lexicographic Autoencoded Node Representation (SILA), and Sparse Structural Node Representation (SparseStruct). We show how each of our methods tackles specific limitations of the previous state of the art in structural graph representation learning, such as scalability, representation meaning, and the lack of a formal proof that guarantees the preservation of structural properties. We provide an extensive experimental section where we compare our three proposed methods to the current state of the art in both connectivity-based and structure-based representation learning methods. Finally, in this dissertation, we look at extensions of the basic structural graph representation learning problem. We study the problem of temporal structural graph representation. We also provide a method for representation explainability.


Dependency-based n-gram models for general purpose sentence realisation

Natural Language Engineering ◽

10.1017/s1351324910000288 ◽

2010 ◽

Vol 17 (4) ◽

pp. 455-483 ◽

Cited By ~ 2

Author(s):

YUQING GUO ◽

HAIFENG WANG ◽

JOSEF VAN GENABITH

Keyword(s):

State Of The Art ◽

Structural Information ◽

General Purpose ◽

Language Models ◽

Semantic Representations ◽

Linguistic Features ◽

Word Forms ◽

Grammar Rules ◽

Series Of Experiments ◽

N Gram

This paper presents a general-purpose, wide-coverage, probabilistic sentence generator based on dependency n-gram models. This is particularly interesting as many semantic or abstract syntactic input specifications for sentence realisation can be represented as labelled bi-lexical dependencies or typed predicate-argument structures. Our generation method captures the mapping between semantic representations and surface forms by linearising a set of dependencies directly, rather than via the application of grammar rules as in more traditional chart-style or unification-based generators. In contrast to conventional n-gram language models over surface word forms, we exploit structural information and various linguistic features inherent in the dependency representations to constrain the generation space and improve the generation quality. A series of experiments shows that dependency-based n-gram models generalise well to different languages (English and Chinese) and representations (LFG and CoNLL). Compared with state-of-the-art generation systems, our general-purpose sentence realiser is highly competitive with the added advantages of being simple, fast, robust and accurate.


Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00209 ◽

2013 ◽

Vol 1 ◽

pp. 49-62 ◽

Cited By ~ 51

Author(s):

Yoav Artzi ◽

Luke Zettlemoyer

Keyword(s):

Natural Language ◽

Supervised Learning ◽

State Of The Art ◽

Semantic Parsing ◽

Weak Supervision ◽

Instruction Sets ◽

Strong Signal ◽

Previous State ◽

Strong Performance ◽

Weakly Supervised

The context in which language is used provides a strong signal for learning to recover its meaning. In this paper, we show it can be used within a grounded CCG semantic parsing approach that learns a joint model of meaning and context for interpreting and executing natural language instructions, using various types of weak supervision. The joint nature provides crucial benefits by allowing situated cues, such as the set of visible objects, to directly influence learning. It also enables algorithms that learn while executing instructions, for example by trying to replicate human actions. Experiments on a benchmark navigational dataset demonstrate strong performance under differing forms of supervision, including correctly executing 60% more instruction sets relative to the previous state of the art.


A Domain Generalization Perspective on Listwise Context Modeling

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015965 ◽

2019 ◽

Vol 33 ◽

pp. 5965-5972 ◽

Cited By ~ 1

Author(s):

Lin Zhu ◽

Yihong Chen ◽

Bowen He

Keyword(s):

Data Mining ◽

Information Retrieval ◽

State Of The Art ◽

Learning To Rank ◽

Context Modeling ◽

Ranking Problem ◽

Neural Architecture ◽

Benchmark Datasets ◽

Previous State ◽

Latent Representations

As one of the most popular techniques for solving the ranking problem in information retrieval, Learning-to-rank (LETOR) has received a lot of attention in both academia and industry due to its importance in a wide variety of data mining applications. However, most existing LETOR approaches learn a single global ranking function to handle all queries, and ignore the substantial differences that exist between queries. In this paper, we propose a domain generalization strategy to tackle this problem. We propose QueryInvariant Listwise Context Modeling (QILCM), a novel neural architecture which eliminates the detrimental influence of inter-query variability by learning query-invariant latent representations, such that the ranking system can generalize better to unseen queries. We evaluate our techniques on benchmark datasets, demonstrating that QILCM outperforms previous state-of-the-art approaches by a substantial margin.


Protein complex prediction with AlphaFold-Multimer

10.1101/2021.10.04.463034 ◽

2021 ◽

Author(s):

Richard Evans ◽

Michael O'Neill ◽

Alexander Pritzel ◽

Natasha Antropova ◽

Andrew W Senior ◽

...

Keyword(s):

Protein Complex ◽

State Of The Art ◽

Protein Complexes ◽

High Accuracy ◽

Single Chain ◽

Large Dataset ◽

Protein Complex Prediction ◽

Percentage Points ◽

Single Protein ◽

Previous State

While the vast majority of well-structured single protein chains can now be predicted to high accuracy due to the recent AlphaFold [1] model, the prediction of multi-chain protein complexes remains a challenge in many cases. In this work, we demonstrate that an AlphaFold model trained specifically for multimeric inputs of known stoichiometry, which we call AlphaFold-Multimer, significantly increases accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy. On a benchmark dataset of 17 heterodimer proteins without templates (introduced in [2]) we achieve at least medium accuracy (DockQ [3]≥0.49) on 14 targets and high accuracy (DockQ≥0.8) on 6 targets, compared to 9 targets of at least medium accuracy and 4 of high accuracy for the previous state of the art system (an AlphaFold-based system from [2]). We also predict structures for a large dataset of 4,433 recent protein complexes, from which we score all non-redundant interfaces with low template identity. For heteromeric interfaces we successfully predict the interface (DockQ≥0.23) in 67% of cases, and produce high accuracy predictions (DockQ≥0.8) in 23% of cases, an improvement of +25 and +11 percentage points over the flexible linker modification of AlphaFold [4] respectively. For homomeric interfaces we successfully predict the interface in 69% of cases, and produce high accuracy predictions in 34% of cases, an improvement of +5 percentage points in both instances.


Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

10.26434/chemrxiv.8266745.v1 ◽

2019 ◽

Author(s):

Wengong Jin ◽

Regina Barzilay ◽

Tommi S Jaakkola

Keyword(s):

Drug Discovery ◽

State Of The Art ◽

Molecular Graph ◽

Biochemical Properties ◽

Large Margin ◽

Previous State ◽

Translation Methods ◽

Atom Level ◽

Precursor Molecules ◽

Prior State

Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to give them better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines by a large margin.


Using spatial-temporal ensembles of convolutional neural networks for lumen segmentation in ureteroscopy

International Journal of Computer Assisted Radiology and Surgery ◽

10.1007/s11548-021-02376-3 ◽

2021 ◽

Author(s):

Jorge F. Lazo ◽

Aldo Marzullo ◽

Sara Moccia ◽

Michele Catellani ◽

Benoit Rosa ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Automatic Segmentation ◽

Temporal Information ◽

Invasive Technique ◽

Dice Similarity Coefficient ◽

Specular Reflections ◽

Lumen Segmentation ◽

Previous State

Purpose: Ureteroscopy is an efficient, minimally invasive endoscopic technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, the automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. To obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods: The proposed method is based on an ensemble of 4 parallel CNNs that simultaneously process single-frame and multi-frame information. Of these, two architectures are taken as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former that add a stage using 3D convolutions to process temporal information. $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$. Results: The proposed method was evaluated using a custom dataset of 11 videos (2673 frames), which were collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion: The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is also effective in the presence of poor visibility, occasional bleeding, or specular reflections.
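For readers unfamiliar with the reported metric, the Dice similarity coefficient between a predicted mask $P$ and a ground-truth mask $G$ is $2|P \cap G| / (|P| + |G|)$. The sketch below computes it on flat binary masks and also shows one plausible way to combine several models' outputs. The averaging-and-thresholding step and the function names are assumptions for illustration; the abstract does not state how the four networks are fused.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then threshold.

    Hypothetical fusion rule, used only to illustrate ensembling.
    """
    n_models = len(prob_maps)
    averaged = [sum(pixel) / n_models for pixel in zip(*prob_maps)]
    return [1 if p >= threshold else 0 for p in averaged]


if __name__ == "__main__":
    # Two made-up per-pixel probability maps from two models.
    fused = ensemble_mask([[0.9, 0.2, 0.6], [0.7, 0.4, 0.1]])
    print(fused, dice_coefficient(fused, [1, 0, 1]))
```

A score of 1.0 means the predicted and annotated lumen regions coincide exactly, and 0.0 means they do not overlap at all.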
