Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016626 ◽

2019 ◽

Vol 33 ◽

pp. 6626-6633

Author(s):

Xiang Kong ◽

Qizhe Xie ◽

Zihang Dai ◽

Eduard Hovy

Keyword(s):

Machine Translation ◽

State Of The Art ◽

The State ◽

Computational Time ◽

Memory Consumption ◽

Image Captioning ◽

Vocabulary Size ◽

Language Generation ◽

Practical Applications ◽

Coding Schemes

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is held back in practice by its large memory and computational-time consumption, which stems from the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which can effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance loss. With MoS, we achieve an improvement of 1.5 BLEU on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields 29.6 BLEU for English-to-German and 42.1 BLEU for English-to-French, outperforming the single-Softmax Transformer by 0.9 and 0.4 BLEU respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
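
The cost argument above can be made concrete with a short sketch. The Python below is a minimal, stdlib-only illustration, assuming the common MoS formulation in which K context vectors h_k share one output embedding matrix and are combined with mixture weights pi_k; the function names and shapes are hypothetical and not taken from the authors' code. Each component needs a full softmax over the V-word vocabulary, so the output-layer work grows as K * V * d, and halving V halves it.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_softmaxes(context_vectors, prior_logits, output_embeddings):
    """Return P(w | c) = sum_k pi_k * softmax(h_k . W)[w] (illustrative sketch).

    context_vectors: K hidden vectors h_k of dimension d (hypothetical inputs).
    prior_logits: K logits turned into mixture weights pi_k by a softmax.
    output_embeddings: V word vectors of dimension d, shared by all components.
    """
    priors = softmax(prior_logits)
    probs = [0.0] * len(output_embeddings)
    for pi_k, h_k in zip(priors, context_vectors):
        # One full V-way softmax per component: this is the K-fold cost of MoS.
        logits = [sum(a * b for a, b in zip(h_k, w)) for w in output_embeddings]
        for j, p in enumerate(softmax(logits)):
            probs[j] += pi_k * p
    return probs


def output_layer_flops(num_components, vocab_size, hidden_dim):
    # Multiply-adds for the K logit computations; linear in the vocabulary size.
    return num_components * vocab_size * hidden_dim
```

For example, `output_layer_flops(15, 16000, 512)` is exactly half of `output_layer_flops(15, 32000, 512)`; the values of K, V and d here are arbitrary and only show the linear dependence on vocabulary size.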


Comparative Quality Estimation for Machine Translation. Observations on Machine Learning and Features

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0029 ◽

2017 ◽

Vol 108 (1) ◽

pp. 307-318 ◽

Cited By ~ 1

Author(s):

Eleftherios Avramidis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Machine Translation ◽

State Of The Art ◽

Linear Method ◽

The State ◽

Quality Estimation ◽

Art Methods ◽

Improved Performance

A deeper analysis of Comparative Quality Estimation is presented, extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, which cannot cope with the augmented features, is replaced by a boosting classifier assisted by feature selection. The resulting methods show improved performance for 6 language pairs when applied to the output of MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached by examining the contribution of the features in the models, and it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features contribute substantially, few adequacy features make some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.


Unified Vision-Language Pre-Training for Image Captioning and VQA

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.7005 ◽

2020 ◽

Vol 34 (07) ◽

pp. 13041-13049 ◽

Cited By ~ 11

Author(s):

Luowei Zhou ◽

Hamid Palangi ◽

Lei Zhang ◽

Houdong Hu ◽

Jason Corso ◽

...

Keyword(s):

Unsupervised Learning ◽

Question Answering ◽

State Of The Art ◽

Learning Objectives ◽

Image Captioning ◽

Language Generation ◽

Visual Question Answering ◽

Benchmark Datasets

This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large number of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in what context the prediction conditions on. This is controlled by utilizing specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP.
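
The abstract says the two pre-training objectives differ only in their self-attention masks. The following is a minimal sketch of the two masks, assuming the usual UniLM-style convention: in seq2seq mode the conditioning tokens (e.g., image regions) attend only among themselves, while each caption token attends to the conditioning tokens, earlier caption tokens and itself. The function name and token layout are illustrative assumptions, not the released VLP code.

```python
def attention_mask(num_context, num_target, mode):
    """Return an n x n 0/1 mask; mask[i][j] == 1 means position i may attend to j.

    Positions 0..num_context-1 hold the conditioning tokens (e.g. image regions);
    the remaining num_target positions hold the caption tokens.
    """
    n = num_context + num_target
    if mode == "bidirectional":
        # Every token may attend to every other token.
        return [[1] * n for _ in range(n)]
    if mode == "seq2seq":
        mask = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j < num_context:
                    # All positions may attend to the conditioning block.
                    mask[i][j] = 1
                elif i >= num_context and j <= i:
                    # Caption tokens see earlier caption tokens and themselves.
                    mask[i][j] = 1
        return mask
    raise ValueError(f"unknown mode: {mode}")
```

Because only the mask changes, the same transformer weights can serve both objectives, which is the sense in which the abstract calls the network shared.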


Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking

KI - Künstliche Intelligenz ◽

10.1007/s13218-020-00679-2 ◽

2020 ◽

Vol 34 (4) ◽

pp. 571-584

Author(s):

Rajarshi Biswas ◽

Michael Barz ◽

Daniel Sonntag

Keyword(s):

State Of The Art ◽

Input Image ◽

The State ◽

Beam Search ◽

Image Captioning ◽

Bottom Up ◽

Interactive Machine Learning ◽

Joint Embedding ◽

Bounding Boxes ◽

High Level

Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning. Yet captions generated by humans are still considered better, which makes image captioning an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim to improve the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object-specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance while providing explanatory features. Further, we discuss how interactive model improvement can be realized by re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state of the art in image captioning.
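
One simple way to picture re-ranking beam search candidates is to interpolate each candidate's model log-probability with an additional score. The sketch below assumes a linear interpolation with a hypothetical `weight` and a hypothetical per-caption `feature_scores` table standing in for explanatory features or user feedback; the re-ranking criterion used in the paper may differ.

```python
def rerank(candidates, feature_scores, weight=0.5):
    """Sort beam candidates by a weighted sum of log-probability and an extra score.

    candidates: list of (caption, log_prob) pairs from a beam search decoder.
    feature_scores: hypothetical dict mapping caption -> extra score (default 0.0).
    weight: hypothetical interpolation weight; 0.0 keeps the original beam order.
    """
    def combined(item):
        caption, log_prob = item
        return (1 - weight) * log_prob + weight * feature_scores.get(caption, 0.0)

    # Highest combined score first.
    return sorted(candidates, key=combined, reverse=True)
```

With `weight=0.0` the decoder's own ranking is returned unchanged; raising the weight lets the extra signal promote a lower-probability caption, which is the kind of interactive correction the abstract describes.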


The State of the Art in Machine Translation in the U.S.S.R.

New Directions in Machine Translation ◽

10.1515/9783110874204-004 ◽

1988 ◽

pp. 75-84

Author(s):

Ivan I. Oubine ◽

Boris D. Tikhomirov

Keyword(s):

Machine Translation ◽

State Of The Art ◽

The State


AsymDPOP: Complete Inference for Asymmetric Distributed Constraint Optimization Problems

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/32 ◽

2019 ◽

Author(s):

Yanchen Deng ◽

Ziyu Chen ◽

Dingding Chen ◽

Wenxin Zhang ◽

Xingqiong Jiang

Keyword(s):

Optimization Problems ◽

State Of The Art ◽

Empirical Evaluation ◽

The State ◽

Constraint Optimization ◽

Memory Consumption ◽

Distributed Constraint Optimization ◽

Constraint Optimization Problems

Asymmetric distributed constraint optimization problems (ADCOPs) are an emerging model for coordinating agents with personal preferences. However, existing inference-based complete algorithms that use local eliminations cannot be applied to ADCOPs, as they require parent agents to transfer their private functions to their children. Rather than disclosing private functions explicitly to facilitate local eliminations, we solve the problem by enforcing delayed eliminations and propose AsymDPOP, the first inference-based complete algorithm for ADCOPs. To address the severe scalability problems incurred by delayed eliminations, we propose to reduce memory consumption by propagating a set of smaller utility tables instead of a joint utility table, and to reduce computational effort by sequential optimizations instead of joint optimizations. The empirical evaluation indicates that AsymDPOP significantly outperforms the state of the art, as well as vanilla DPOP with the PEAV formulation.
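
To make the asymmetry concrete: in an ADCOP, each binary constraint carries a separate cost table for each of the two agents involved, and the cost of an assignment is the sum of both sides. The brute-force search below only illustrates that objective, using hypothetical data structures and a made-up two-agent instance; it is not AsymDPOP, which is an inference-based algorithm designed to avoid disclosing private functions explicitly.

```python
from itertools import product


def total_cost(assignment, constraints):
    """Sum both agents' private costs for every binary constraint.

    constraints: list of (i, j, cost_i, cost_j), where cost_i and cost_j are dicts
    keyed by (value_i, value_j); cost_i is private to agent i, cost_j to agent j.
    """
    total = 0
    for i, j, cost_i, cost_j in constraints:
        key = (assignment[i], assignment[j])
        total += cost_i[key] + cost_j[key]
    return total


def brute_force_optimum(domains, constraints):
    # Exhaustive search, exponential in the number of agents: illustration only.
    agents = sorted(domains)
    best = None
    for values in product(*(domains[a] for a in agents)):
        assignment = dict(zip(agents, values))
        cost = total_cost(assignment, constraints)
        if best is None or cost < best[1]:
            best = (assignment, cost)
    return best


# Hypothetical two-agent instance: each private table favors a different assignment.
DOMAINS = {0: [0, 1], 1: [0, 1]}
COST_0 = {(0, 0): 1, (0, 1): 5, (1, 0): 2, (1, 1): 0}
COST_1 = {(0, 0): 1, (0, 1): 0, (1, 0): 4, (1, 1): 3}
CONSTRAINTS = [(0, 1, COST_0, COST_1)]
```

Here agent 0's table alone favors (1, 1) and agent 1's alone favors (0, 1), but `brute_force_optimum(DOMAINS, CONSTRAINTS)` returns `({0: 0, 1: 0}, 2)`, since only the sum of both private tables determines the optimum.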


Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks

Computational Intelligence and Neuroscience ◽

10.1155/2018/6747098 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 18

Author(s):

Md Zahangir Alom ◽

Paheding Sidike ◽

Mahmudul Hasan ◽

Tarek M. Taha ◽

Vijayan K. Asari

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Convolutional Neural Networks ◽

Character Recognition ◽

State Of The Art ◽

The State ◽

Superior Performance ◽

Deep Convolutional Neural Networks ◽

Practical Applications ◽

High Degree

In spite of advances in object recognition technology, handwritten Bangla character recognition (HBCR) remains largely unsolved due to the presence of many ambiguous handwritten characters and excessively cursive Bangla handwriting. Even many advanced existing methods do not achieve satisfactory performance on HBCR in practice. In this paper, a set of state-of-the-art deep convolutional neural networks (DCNNs) is discussed and their performance on HBCR is systematically evaluated. The main advantage of DCNN approaches is that they can extract discriminative features from raw data and represent them with a high degree of invariance to object distortions. The experimental results show the superior performance of DCNN models compared with other popular object recognition approaches, which implies that DCNNs are good candidates for building an automatic HBCR system for practical applications.


Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6259 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7594-7601

Author(s):

Pierre Colombo ◽

Emile Chapuis ◽

Matteo Manica ◽

Emmanuel Vignon ◽

Giovanna Varni ◽

...

Keyword(s):

Machine Translation ◽

Random Fields ◽

Conditional Random Fields ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Accuracy Score ◽

Beam Search ◽

Conversational Agents ◽

Neural Machine Translation

The task of predicting dialog acts (DA) from conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modeling of tag sequentiality. Seq2seq models are known to learn complex global dependencies, while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification using a hierarchical encoder, a novel guided attention mechanism, and beam search applied to both training and inference. Compared to the state of the art, our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on MRDA.


Modeling Coherence for Discourse Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017338 ◽

2019 ◽

Vol 33 ◽

pp. 7338-7345 ◽

Cited By ~ 2

Author(s):

Hao Xiong ◽

Zhongjun He ◽

Hua Wu ◽

Haifeng Wang

Keyword(s):

Machine Translation ◽

State Of The Art ◽

The State ◽

Neural Machine Translation ◽

Discourse Context ◽

Translation Quality ◽

Discourse Coherence ◽

Baseline System

Discourse coherence plays an important role in the translation of a text. However, previously reported models mostly focus on improving performance on individual sentences while ignoring cross-sentence links and dependencies, which affects the coherence of the text. In this paper, we propose to use discourse context and reward to refine translation quality from the discourse perspective. In particular, we first generate translations of individual sentences. Next, we deliberate on the preliminary translations and, using a reward teacher, train the model to learn a policy that produces discourse-coherent text. Experimental results on multiple discourse test datasets indicate that our model significantly improves translation quality over the state-of-the-art baseline system by +1.23 BLEU. Moreover, our model generates more discourse-coherent text and obtains a +2.2 BLEU improvement when evaluated by discourse metrics.


On the Exact Solution of Prize-Collecting Steiner Tree Problems

INFORMS Journal on Computing ◽

10.1287/ijoc.2021.1087 ◽

2021 ◽

Author(s):

Daniel Rehfeldt ◽

Thorsten Koch

Keyword(s):

Exact Solution ◽

Steiner Tree ◽

State Of The Art ◽

The State ◽

Steiner Tree Problem ◽

New Techniques ◽

Practical Applications ◽

Computational Performance ◽

Benchmark Instances ◽

Prize Collecting

The prize-collecting Steiner tree problem (PCSTP) is a well-known generalization of the classic Steiner tree problem in graphs, with a large number of practical applications. It attracted particular interest during the 11th DIMACS Challenge in 2014, and since then, several PCSTP solvers have been introduced in the literature. Although these new solvers further, and often drastically, improved on the results of the DIMACS Challenge, many PCSTP benchmark instances have remained unsolved. The following article describes further advances in the state of the art in exact PCSTP solving. It introduces new techniques and algorithms for PCSTP, involving various new transformations (or reductions) of PCSTP instances to equivalent problems, for example, to decrease the problem size or to obtain a better integer programming formulation. Several of the new techniques and algorithms provably dominate previous approaches. Further theoretical properties of the new components, such as their complexity, are discussed. Also, new complexity results for the exact solution of PCSTP and related problems are described, which form the basis of the algorithm design. Finally, the new developments also translate into strong computational performance: the resulting exact PCSTP solver outperforms all previous approaches, both in terms of runtime and solvability. In particular, it solves several formerly intractable benchmark instances from the 11th DIMACS Challenge to optimality. Moreover, several recently introduced large-scale instances with up to 10 million edges, previously considered too large for any exact approach, can now be solved to optimality in less than two hours. Summary of Contribution: The prize-collecting Steiner tree problem (PCSTP) is a well-known generalization of the classic Steiner tree problem in graphs, with many practical applications.
The article introduces and analyzes new techniques and algorithms for PCSTP that ultimately aim for improved (practical) exact solution. The algorithmic developments are underpinned by results on theoretical aspects, such as fixed-parameter tractability of PCSTP. Computationally, we considerably push the limits of tractability, being able to solve PCSTP instances with up to 10 million edges. The new solver, which also considerably outperforms the state of the art on smaller instances, will be made publicly available as part of the SCIP Optimization Suite.
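
For reference, the standard PCSTP objective is stated below from the general literature rather than from this article's notation (the article may work with an equivalent rooted or transformed variant). The small worked instance is made up for illustration: it shows how a vertex can be left out when its prize is smaller than the cost of connecting it.

```latex
\text{Given } G = (V, E),\; c : E \to \mathbb{Q}_{\ge 0},\; p : V \to \mathbb{Q}_{\ge 0},
\text{ find a tree } S = (V_S, E_S) \subseteq G \text{ minimizing}
\quad C(S) = \sum_{e \in E_S} c(e) + \sum_{v \in V \setminus V_S} p(v).

% Worked example: V = \{a, b\},\; E = \{ab\},\; c(ab) = 3,\; p(a) = 1,\; p(b) = 5
C(\{a, b\}) = 3, \qquad C(\{a\}) = p(b) = 5, \qquad C(\{b\}) = p(a) = 1
\;\Rightarrow\; \text{the optimum is the single vertex } b \text{ with cost } 1.
```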


Randomized error removal for online spread estimation in data streaming

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447707 ◽

2021 ◽

Vol 14 (6) ◽

pp. 1040-1052

Author(s):

Haibo Wang ◽

Chaoyi Ma ◽

Olufemi O Odegbile ◽

Shigang Chen ◽

Jih-Kwon Peir

Keyword(s):

Data Stream ◽

State Of The Art ◽

The State ◽

High Rate ◽

Estimation Accuracy ◽

Data Streaming ◽

Real World Data ◽

Practical Applications ◽

Spread Estimation ◽

Error Removal

Measuring flow spread in real time from large, high-rate data streams has numerous practical applications, where a data stream is modeled as a sequence of data items from different flows and the spread of a flow is the number of distinct items in the flow. Past decades have witnessed tremendous performance improvements in single-flow spread estimation. However, when dealing with numerous flows in a data stream, it remains a significant challenge to measure per-flow spread accurately while reducing the memory footprint. The goal of this paper is to introduce new multi-flow spread estimation designs that incur much smaller processing overhead and query overhead than the state of the art, yet achieve significant accuracy improvement in spread estimation. We formally analyze the performance of these new designs. We implement them in both hardware and software, and use real-world data traces to evaluate their performance in comparison with the state of the art. The experimental results show that our best sketch significantly improves over the best existing work in terms of estimation accuracy, data item processing throughput, and online query throughput.
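
The definition of spread, and why memory becomes the bottleneck, can be illustrated with a short sketch. `exact_spread` keeps one set per flow, so its memory grows with the number of distinct items; `linear_counting` is the classic single-flow bitmap estimator (n = -m ln(zero bits / m)), shown only as a familiar baseline. Neither is the paper's multi-flow design, and the bitmap size and hashing choice are assumptions.

```python
import hashlib
import math
from collections import defaultdict


def exact_spread(stream):
    """Exact per-flow spread: the number of distinct items seen for each flow."""
    seen = defaultdict(set)
    for flow, item in stream:
        seen[flow].add(item)
    return {flow: len(items) for flow, items in seen.items()}


def _bit_index(item, num_bits):
    # Deterministic hash of the item into [0, num_bits).
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


def linear_counting(items, num_bits=256):
    """Single-flow spread estimate from an m-bit bitmap (classic linear counting)."""
    bitmap = [0] * num_bits
    for item in items:
        # Duplicates hit the same bit, so only distinct items change the bitmap.
        bitmap[_bit_index(item, num_bits)] = 1
    zeros = bitmap.count(0)
    if zeros == 0:
        return float("inf")  # bitmap saturated; num_bits is too small for this flow
    return -num_bits * math.log(zeros / num_bits)
```

The bitmap uses a fixed `num_bits` per flow regardless of how many items arrive, which is the memory trade-off that becomes hard to manage when a stream contains very many flows.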
