Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models that access only monolingual data. Specifically, we leverage pivot-private embedding, layer coordination, and parameter sharing to model commonality and diversity between source and target at the lexical, syntactic, and semantic levels. To examine the effectiveness of the proposed models, we collect monolingual corpora, 20 million each, for Mandarin and Cantonese, which are respectively the official language and the most widely used dialect in China. Experimental results reveal that our methods outperform rule-based simplified and traditional Chinese conversion and conventional unsupervised translation models by more than 12 BLEU points.

Improving Statistical Parser by Recognition of Chinese Number and Quantifier Prefix in Machine Translation

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.427-429.1841 ◽ 2013 ◽ Vol 427-429 ◽ pp. 1841-1844

Author(s): Wen Xiong ◽ Yao Hong Jin ◽ Zhi Ying Liu

Keyword(s): Experimental Data ◽ Machine Translation ◽ Word Segmentation ◽ Experimental Results ◽ Maximum Matching ◽ Recognition Method ◽ Matching Method ◽ Rule Based ◽ Processing Module ◽ Special Language

By studying the Chinese number and quantifier prefix (CNQP) as a special language phenomenon in machine translation, this paper presents a CNQP recognition method that is rule based and independent of word segmentation. The method expresses the compositions of CNQPs in Backus-Naur Form (BNF), taking the numeral as the active information and the quantifiers as the boundaries of the CNQPs. To avoid word segmentation noise, a forward maximum matching method is used to obtain the compositions of the CNQPs, which can then be fed into the statistical parser for the analysis of Chinese sentences. The experimental results, obtained on manually constructed experimental data, indicate that the proposed method, used as a pre-processing module, can effectively improve the parsing results of the statistical parser without retraining, which can further enhance translation quality.
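The forward maximum matching step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `forward_max_match` and the tiny lexicon in the usage note are hypothetical, and the paper's BNF grammar for CNQPs is not reproduced here. The sketch only shows the greedy rule itself: at each position, take the longest lexicon entry that matches, and fall back to a single character otherwise.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right matching: at each position, take the longest
    lexicon entry that starts there; fall back to one character."""
    if max_len is None:
        # Longest entry in the (hypothetical) lexicon bounds the search window.
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # single-character fallback
        # Try candidate lengths from longest to shortest (down to 2).
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens
```

With an illustrative lexicon such as `{"三十", "三十五", "个", "苹果"}`, calling `forward_max_match("三十五个苹果", lexicon)` prefers the longer numeral `三十五` over its prefix `三十`, which is the property that lets a numeral-plus-quantifier unit be picked out without a prior word segmentation pass.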

STOUT: SMILES to IUPAC names using neural machine translation

Journal of Cheminformatics ◽ 10.1186/s13321-021-00512-4 ◽ 2021 ◽ Vol 13 (1)

Author(s): Kohulan Rajan ◽ Achim Zielesny ◽ Christoph Steinbeck

Keyword(s): Machine Translation ◽ Similarity Index ◽ Chemical Compounds ◽ Human Beings ◽ International Union ◽ Rule Based ◽ Neural Machine Translation ◽ Tanimoto Similarity ◽ Reverse Translation ◽ String Representation

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this ruleset, correct chemical name assignment remains challenging for human beings, and there are only a few rule-based cheminformatics toolkits available that support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string as well as the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system is able to predict with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Even incorrect predictions show a remarkable similarity between true and predicted compounds.
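For readers unfamiliar with the Tanimoto similarity index quoted in this abstract, the following is a minimal sketch of the standard set-based definition, the size of the intersection divided by the size of the union. It assumes each molecule has already been reduced to a set of "on" fingerprint bits; how STOUT computes its fingerprints is not stated in the abstract, so the function name `tanimoto` and the toy inputs are purely illustrative.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits:
    |A intersect B| / |A union B|, ranging from 0.0 to 1.0."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / len(union)
```

Two fingerprints sharing two of four distinct bits, for example `tanimoto({1, 2, 3}, {2, 3, 4})`, give 0.5, so a reported index above 0.9 means the predicted and true structures share nearly all of their fingerprint features.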

English to Sanskrit machine translation system: a rule-based approach

International Journal of Advanced Intelligence Paradigms ◽ 10.1504/ijaip.2012.048144 ◽ 2012 ◽ Vol 4 (2) ◽ pp. 168 ◽ Cited By ~ 2

Author(s): Vimal Mishra ◽ R.B. Mishra

Keyword(s): Machine Translation ◽ Translation System ◽ Rule Based ◽ System A ◽ Machine Translation System ◽ Rule Based Approach

Methods for integrating rule-based and statistical systems for Arabic to English machine translation

Machine Translation ◽ 10.1007/s10590-011-9106-9 ◽ 2011 ◽ Vol 26 (1-2) ◽ pp. 67-83 ◽ Cited By ~ 3

Author(s): Rabih Zbib ◽ Michael Kayser ◽ Spyros Matsoukas ◽ John Makhoul ◽ Hazem Nader ◽ ...

Keyword(s): Machine Translation ◽ Rule Based ◽ Statistical Systems

Adabot: Fault-Tolerant Java Decompiler (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i10.7203 ◽ 2020 ◽ Vol 34 (10) ◽ pp. 13861-13862

Author(s): Zhiming Li ◽ Qing Wu ◽ Kun Qian

Keyword(s): Reverse Engineering ◽ Machine Translation ◽ Fault Tolerant ◽ Safety Concern ◽ Statistical Machine Translation ◽ Abstract Syntax ◽ Rule Based ◽ Abstract Syntax Tree ◽ Important Field ◽ Internal Architecture

Reverse engineering has been an extremely important field in software engineering: it helps us to better understand and analyze the internal architecture and interrelations of executables. Classical Java reverse engineering tasks include disassembly and decompilation. Traditional Abstract Syntax Tree (AST) based disassemblers and decompilers are strictly rule-defined and thus highly fault-intolerant when bytecode obfuscation is introduced for safety concerns. In this work, we view decompilation as a statistical machine translation task and propose a decompilation framework that is fully based on the self-attention mechanism. Through better adaptation to the linguistic uniqueness of bytecode, our model outperforms rule-based models and previous work based on recurrence mechanisms.

Rule Based Case Transfer in Tamil-Malayalam Machine Translation

Research in Computing Science ◽ 10.13053/rcs-84-1-4 ◽ 2014 ◽ Vol 84 (1) ◽ pp. 41-52

Author(s): S. Lakshmi ◽ Sobha Lalitha Devi

Keyword(s): Machine Translation ◽ Rule Based

Unsupervised Sub-tree Alignment for Tree-to-Tree Translation

Journal of Artificial Intelligence Research ◽ 10.1613/jair.4033 ◽ 2013 ◽ Vol 48 ◽ pp. 733-782 ◽ Cited By ~ 3

Author(s): T. Xiao ◽ J. Zhu

Keyword(s): Machine Translation ◽ State Of The Art ◽ Experimental Results ◽ Alignment Accuracy ◽ Syntactic Structures ◽ Tree Alignment ◽ Matrix Encoding ◽ Word Alignments ◽ Alignment Model ◽ Translation Systems

This article presents a probabilistic sub-tree alignment model and its application to tree-to-tree machine translation. Unlike previous work, we do not resort to surface heuristics or expensive annotated data, but instead derive an unsupervised model to infer the syntactic correspondence between two languages. More importantly, the developed model is syntactically-motivated and does not rely on word alignments. As a by-product, our model outputs a sub-tree alignment matrix encoding a large number of diverse alignments between syntactic structures, from which machine translation systems can efficiently extract translation rules that are often filtered out due to the errors in 1-best alignment. Experimental results show that the proposed approach outperforms three state-of-the-art baseline approaches in both alignment accuracy and grammar quality. When applied to machine translation, our approach yields a +1.0 BLEU improvement and a -0.9 TER reduction on the NIST machine translation evaluation corpora. With tree binarization and fuzzy decoding, it even outperforms a state-of-the-art hierarchical phrase-based system.

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Machine Translation ◽ 10.1007/s10590-021-09260-6 ◽ 2021

Author(s): Tanmai Khanna ◽ Jonathan N. Washington ◽ Francis M. Tyers ◽ Sevilay Bayatlı ◽ Daniel G. Swanson ◽ ...

Keyword(s): Open Source ◽ Machine Translation ◽ Lexical Selection ◽ Rule Based ◽ Low Resource ◽ Language Technology ◽ Language Data ◽ Recursive Structures ◽ Platform Translation ◽ Free Open Source

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.

The Beginnings of Machine Translation: The First Rule Based Systems

Machine Translation ◽ 10.7551/mitpress/11043.003.0008 ◽ 2017

Keyword(s): Machine Translation ◽ Rule Based ◽ Rule Based Systems

English to Hindi Machine Translation System in the Context of Homoeopathy Literature

International Journal of Artificial Life Research ◽ 10.4018/ijalr.2016010103 ◽ 2016 ◽ Vol 6 (1) ◽ pp. 46-62

Author(s): Pramod P. Sukhadeve

Keyword(s): Machine Translation ◽ Translation System ◽ Simple Complex ◽ Rule Based ◽ Machine Translation System ◽ Translation Accuracy

Over the years, research in machine translation (MT) systems has gained momentum due to its widespread applicability. A number of systems have emerged that perform the task successfully for different language pairs. However, to the best of the author's knowledge, no significant work has been done in the clinical and medical domain, especially in Homoeopathy. This paper describes a rule-based English-Hindi MT system for Homoeopathic sentences. It has been designed to translate a variety of sentences from Homoeopathic literature. To achieve the task, the author developed English and Hindi Homoeopathic corpora, presently containing 21096 and 23145 sentences respectively. For translation, the input sentences (in English) have been categorised into four different types, i.e. simple, complex, interrogative and ambiguous sentences. The author tested the translation accuracy using the BLEU score. At present, the overall BLEU score of the system is 0.7808 and the accuracy percentage is 82.25%.
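Since this abstract reports translation accuracy as a BLEU score, a minimal sketch of the metric may help when reading the figure. The code below is a simplified, unsmoothed, single-reference, sentence-level variant with uniform weights over 1-grams to 4-grams; it is not the evaluation script used in the paper (which the abstract does not describe), and the names `ngram_counts` and `bleu` are illustrative. Published scores are normally computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (a sketch).

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by the brevity penalty. Returns a value in [0, 1].
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU 0
        log_precision_sum += math.log(overlap / total)
    c, r = len(candidate), len(reference)
    # Brevity penalty punishes candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

Under this definition a candidate identical to its reference scores 1.0 and a candidate sharing no words with it scores 0.0. Scores are often quoted on a 0 to 100 scale in other papers, which is worth keeping in mind when comparing numbers across abstracts.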