Improving Statistical Parser by Recognition of Chinese Number and Quantifier Prefix in Machine Translation

By studying the Chinese number and quantifier prefix (CNQP) as a special language phenomenon in machine translation, this paper presents a rule-based CNQP recognition method that is independent of word segmentation. The method expresses CNQP compositions in Backus-Naur Form (BNF), taking numerals as the active information and quantifiers as the boundaries of the CNQPs. To avoid word segmentation noise, a forward maximum matching method is used to obtain the compositions of the CNQPs, which can then be fed into the statistical parser for the analysis of Chinese sentences. Experimental results on manually constructed data indicate that the proposed method, used as a pre-processing module, can effectively improve the output of the statistical parser without retraining, which can in turn improve translation quality.
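The recognition step described in this abstract can be illustrated with a minimal sketch. It assumes toy numeral and quantifier lexicons (hypothetical; the paper's actual BNF rules and word lists are not given in the abstract) and uses forward maximum matching to pick the longest quantifier that closes a run of numerals.

```python
# Hypothetical toy lexicons for illustration only; not taken from the paper.
NUMERALS = {"一", "二", "两", "三", "十", "百", "几"}
QUANTIFIERS = {"个", "本", "张", "斤", "公斤", "公里"}


def longest_match(text, start, lexicon, max_len):
    """Forward maximum matching: try the longest candidate first."""
    for length in range(min(max_len, len(text) - start), 0, -1):
        candidate = text[start:start + length]
        if candidate in lexicon:
            return candidate
    return None


def find_cnqps(text):
    """Scan left to right on raw characters (no word segmentation).

    A run of numerals triggers a match; the longest quantifier that
    follows it marks the right boundary of the prefix.
    """
    max_q = max(len(q) for q in QUANTIFIERS)
    spans = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] in NUMERALS:
            j += 1
        if j == i:
            i += 1  # no numeral here, move on
            continue
        quantifier = longest_match(text, j, QUANTIFIERS, max_q)
        if quantifier is not None:
            end = j + len(quantifier)
            spans.append((i, end, text[i:end]))
            i = end
        else:
            i = j  # numerals without a closing quantifier are skipped
    return spans


print(find_cnqps("我买了三本书和两公斤苹果"))
```

Because the scan works directly on characters, segmentation errors cannot affect it; in this toy lexicon, "公斤" is preferred over the shorter "斤" only when the longer entry actually matches at that position.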

A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

Computational Intelligence and Neuroscience ◽ 10.1155/2016/9821608 ◽ 2016 ◽ Vol 2016 ◽ pp. 1-11 ◽ Cited By ~ 6

Author(s): Phuoc Tran ◽ Dien Dinh ◽ Hien T. Nguyen

Keyword(s): Machine Translation ◽ Hybrid Approach ◽ Sparse Data ◽ Word Segmentation ◽ Experimental Results ◽ Translation System ◽ Word Level ◽ Data Problem ◽ Sparse Data Problem ◽ Language Pair

Chinese and Vietnamese are both isolating languages; that is, words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, whether words should be segmented is an open question when translating between two languages in which spaces are not used between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in translation systems for this pair, which makes the decision whether to segment more important. In this paper, we propose a new method for translating Chinese to Vietnamese that combines the advantages of character-level and word-level translation. A hybrid approach that combines statistics and rules is used at the word level, and statistical translation is used at the character level. The experimental results showed that our method improved machine translation performance over that of character-level or word-level translation alone.

Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i05.6448 ◽ 2020 ◽ Vol 34 (05) ◽ pp. 9130-9137

Author(s): Yu Wan ◽ Baosong Yang ◽ Derek F. Wong ◽ Lidia S. Chao ◽ Haihua Du ◽ ...

Keyword(s): Machine Translation ◽ Experimental Results ◽ Official Language ◽ Rule Based ◽ Training Corpus ◽ Special Machine ◽ Target Ranging ◽ Two Sides ◽ Parameter Sharing

As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects in order to build unsupervised translation models with access only to monolingual data. Specifically, we leverage pivot-private embedding, layer coordination, and parameter sharing to model commonality and diversity between source and target, ranging from the lexical, through the syntactic, to the semantic level. To examine the effectiveness of the proposed models, we collect a 20 million monolingual corpus for each of Mandarin and Cantonese, which are, respectively, the official language and the most widely used dialect in China. Experimental results reveal that our methods outperform rule-based simplified and traditional Chinese conversion and conventional unsupervised translation models by more than 12 BLEU points.

Surveying word boundary factor in Chinese - Vietnamese statistical machine translation

Science and Technology Development Journal ◽ 10.32508/stdj.v18i2.1133 ◽ 2015 ◽ Vol 18 (2) ◽ pp. 70-78

Author(s): Phuoc Thanh Tran ◽ Dien Dinh

Keyword(s): Machine Translation ◽ Statistical Machine Translation ◽ Word Segmentation ◽ Experimental Results ◽ Word Boundary ◽ Experimental Result ◽ Future Research

In isolating languages such as Chinese and Vietnamese, words are not separated by spaces, and a word can include one or more spelling words. Whether to segment words before training and translation is a problem that needs to be considered. In this paper, we survey the effect of the word boundary factor on the translation results of Chinese-Vietnamese statistical machine translation (SMT). The experimental results of this paper will be the basis for improving word segmentation in future research, with the aim of increasing machine translation performance. We ran two experiments, word segmentation (WS) and word un-segmentation (WUS), on corpora of 8,000 and 12,000 sentence pairs. Based on the experimental results, we found that both the WS corpus and the WUS corpus have their own advantages and defects. We propose integrating the advantages of these two methods in SMT.

STOUT: SMILES to IUPAC names using neural machine translation

Journal of Cheminformatics ◽ 10.1186/s13321-021-00512-4 ◽ 2021 ◽ Vol 13 (1)

Author(s): Kohulan Rajan ◽ Achim Zielesny ◽ Christoph Steinbeck

Keyword(s): Machine Translation ◽ Similarity Index ◽ Chemical Compounds ◽ Human Beings ◽ International Union ◽ Rule Based ◽ Neural Machine Translation ◽ Tanimoto Similarity ◽ Reverse Translation ◽ String Representation

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this ruleset, correct chemical name assignment remains challenging for human beings, and there are only a few rule-based cheminformatics toolkits available that support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string as well as the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system is able to predict with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Even incorrect predictions show a remarkable similarity between true and predicted compounds.
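For readers unfamiliar with the Tanimoto similarity index mentioned above, here is a minimal sketch of the standard set-based definition, |A ∩ B| / |A ∪ B|. It assumes each molecule is represented as a set of "on" fingerprint bits; the fingerprint type used in the paper is not stated in the abstract, and the bit sets below are invented for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two sets of fingerprint bits."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints of a true and a predicted structure.
true_bits = {1, 2, 3}
predicted_bits = {2, 3, 4}
print(tanimoto(true_bits, predicted_bits))
```

A value of 1.0 means the two fingerprints are identical, and 0.0 means they share no bits.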

Convective Losses From Cavity Solar Receivers—Comparisons Between Analytical Predictions and Experimental Results

Journal of Solar Energy Engineering ◽ 10.1115/1.3266342 ◽ 1983 ◽ Vol 105 (1) ◽ pp. 29-33 ◽ Cited By ~ 90

Author(s): A. M. Clausing

Keyword(s): Experimental Data ◽ Experimental Evidence ◽ Analytical Model ◽ Excellent Agreement ◽ Experimental Results ◽ Simple Analytical Model ◽ Refined Model ◽ Wall Area

Cavity solar receivers are generally believed to have higher thermal efficiencies than external receivers due to reduced losses. A simple analytical model was presented by the author which indicated that the ability to heat the air inside the cavity often controls the convective loss from cavity receivers. Thus, if the receiver contains a large amount of inactive hot wall area, it can experience a large convective loss. Excellent experimental data from a variety of cavity configurations and orientations have recently become available. These data provided a means of testing and refining the analytical model. In this manuscript, a brief description of the refined model is presented. Emphasis is placed on using available experimental evidence to substantiate the hypothesized mechanisms and assumptions. Detailed comparisons are given between analytical predictions and experimental results. Excellent agreement is obtained, and the important mechanisms are more clearly delineated.

A Technique for the Laboratory Determination of Recirculation in Single Needle Dialysis

The International Journal of Artificial Organs ◽ 10.1177/039139889301600202 ◽ 1993 ◽ Vol 16 (2) ◽ pp. 63-70 ◽ Cited By ~ 3

Author(s): N.A. Hoenich ◽ P.T. Smirthwaite ◽ C. Woffindin ◽ P. Lancaster ◽ T.H. Frost ◽ ...

Keyword(s): Experimental Data ◽ Flow Rate ◽ Experimental Results ◽ New Technique ◽ Treatment Efficiency ◽ Single Lumen ◽ Theoretical Predictions ◽ Lumen Catheter ◽ A New Technique

Recirculation is an important factor in single needle dialysis and, if high, can compromise treatment efficiency. To provide information regarding recirculation characteristics of access devices used in single needle dialysis, we have developed a new technique to characterise recirculation and have used this to measure the recirculation of a Terumo 15G fistula needle and a VasCath SC2300 single lumen catheter. The experimentally obtained results agreed well with those established clinically (8.5 ± 2.4% and 18.4 ± 3.4%). The experimental results have also demonstrated a dependence on access type, pump speeds and fistula flow rate. A comparison of experimental data with theoretical predictions showed that the latter exceeded those measured with the largest contribution being due to the experimental fistula.

Validating Numerical CFD Simulations With Experimental Data for Turbulence Phenomena in Axial Flow Gas Turbine Diffusers

Volume 7: Turbomachinery, Parts A and B ◽ 10.1115/gt2009-59086 ◽ 2009

Author(s): Farrokh Zarifi-Rad ◽ Hamid Vajihollahi ◽ James O’Brien

Keyword(s): Experimental Data ◽ Gas Turbine ◽ Turbine Blades ◽ Axial Flow ◽ Experimental Results ◽ Scale Model ◽ Data Set ◽ Pressure Profiles ◽ Scale Models ◽ Inlet Conditions

Scale models give engineers an excellent understanding of the aerodynamic behavior behind their design; nevertheless, scale models are time consuming and expensive. Computer simulations such as Computational Fluid Dynamics (CFD) are therefore an excellent alternative to scale models. One must ask the question: how close are the CFD results to the actual fluid behavior of the scale model? To answer this question, the engineering team compared the performance of a large industrial Gas Turbine (GT) exhaust diffuser scale model with the performance predicted by commercially available CFD software. The experimental results were obtained from a 1:12 scale model of a GT exhaust diffuser with a fixed row of blades to simulate the swirl generated by the last row of turbine blades (five blade configurations). This work validates the effect of the turbulent inlet conditions on an axial diffuser, both on the experimental front and in the numerical analysis approach. The object of this work is to bring forward a better understanding of velocity and static pressure profiles along gas turbine diffusers and to provide an accurate experimental data set to validate the CFD prediction. For the CFD aspect, ANSYS CFX software was chosen as the solver. Two different types of mesh (hexagonal and tetrahedral) are compared to the experimental results. It is understood that although hexagonal (HEX) meshes are more time consuming and more computationally demanding, they are less prone to mesh sensitivity and have the tendency to converge at a faster rate than the tetrahedral (TET) mesh. It was found that the HEX mesh was able to generate more consistent results and had less error than the TET mesh.

An Off-Line Stroke-Based Handwritten Word Segmentation and Recognition Method for Low-Quality Educational Videos

IEEE Sixth International Symposium on Multimedia Software Engineering ◽ 10.1109/mmse.2004.16 ◽ 2005 ◽ Cited By ~ 1

Author(s): Lijun Tang ◽ J.R. Kender

Keyword(s): Word Segmentation ◽ Recognition Method ◽ Educational Videos

Heats of mixing of binary mixtures of pyridine base with n-alkane

Canadian Journal of Chemistry ◽ 10.1139/v88-263 ◽ 1988 ◽ Vol 66 (7) ◽ pp. 1625-1627 ◽ Cited By ~ 6

Author(s): Teresa Kasprzycka-Guttman ◽ Juan H. Vera

Keyword(s): Experimental Data ◽ Binary Mixtures ◽ Experimental Results ◽ Pyridine Base ◽ Heats Of Mixing

Heats of mixing of 2,4-lutidine and 2,4,6-collidine with n-alkanes were measured at 293.15 K using an isothermal dilution calorimeter. Experimental results were fitted with a Redlich–Kister polynomial. Experimental data and coefficients for the Redlich–Kister polynomials are reported.
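As a rough illustration of the fitting function named above, the sketch below evaluates a Redlich–Kister polynomial in its commonly used form, H^E = x1·x2·Σ_k A_k·(x1 − x2)^k, where x1 and x2 = 1 − x1 are the mole fractions. The exact form and number of terms used in the paper are not given in the abstract, and the coefficients below are invented placeholders, not the reported values.

```python
def redlich_kister(x1: float, coeffs: list) -> float:
    """Excess property at mole fraction x1 for coefficients A_0, A_1, ..."""
    x2 = 1.0 - x1
    series = sum(a_k * (x1 - x2) ** k for k, a_k in enumerate(coeffs))
    return x1 * x2 * series


# Hypothetical coefficients (units of the excess property, e.g. J/mol).
example_coeffs = [400.0, 100.0]
for x1 in (0.25, 0.5, 0.75):
    print(x1, redlich_kister(x1, example_coeffs))
```

The x1·x2 factor forces the excess property to zero for both pure components, which is why this form is a natural choice for heats of mixing of binary mixtures.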