Improving Statistical Parser by Recognition of Chinese Number and Quantifier Prefix in Machine Translation

2013 ◽  
Vol 427-429 ◽  
pp. 1841-1844
Author(s):  
Wen Xiong ◽  
Yao Hong Jin ◽  
Zhi Ying Liu

By studying the Chinese number and quantifier prefix (CNQP) as a special language phenomenon in machine translation, this paper presents a CNQP recognition method that is rule-based and independent of word segmentation. The method expresses CNQP compositions in Backus-Naur Form (BNF), taking the numerals as the active information and the quantifiers as the boundaries of the CNQPs. To avoid word segmentation noise, a forward maximum matching method is used to obtain the compositions of the CNQPs, which can then be fed into the statistical parser for the analysis of Chinese sentences. Experimental results on manually constructed data indicate that the proposed method, used as a pre-processing module, can effectively improve the output of the statistical parser without retraining, which can in turn improve translation quality.
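The recognition step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the BNF rules or the lexicons, so the grammar `CNQP ::= NUMERAL+ QUANTIFIER`, the `NUMERALS` and `QUANTIFIERS` sets, and the function names are all assumptions made for the example. It works on raw characters, so no word segmenter is involved.

```python
# Hypothetical mini-lexicons; the paper's actual rule set is not given in the abstract.
NUMERALS = set("零〇一二两三四五六七八九十百千万亿0123456789")
QUANTIFIERS = {"个", "本", "张", "名", "辆", "公里", "平方米"}
MAX_Q_LEN = max(len(q) for q in QUANTIFIERS)


def match_quantifier(text, start):
    """Forward maximum matching: longest quantifier starting at `start`, or None."""
    for length in range(min(MAX_Q_LEN, len(text) - start), 0, -1):
        candidate = text[start:start + length]
        if candidate in QUANTIFIERS:
            return candidate
    return None


def find_cnqps(text):
    """Return (start, end, surface) spans matching NUMERAL+ QUANTIFIER."""
    spans = []
    i = 0
    while i < len(text):
        if text[i] in NUMERALS:
            # The numeral run is the "active" part of the candidate CNQP.
            j = i
            while j < len(text) and text[j] in NUMERALS:
                j += 1
            # The quantifier closes the CNQP and marks its right boundary.
            quantifier = match_quantifier(text, j)
            if quantifier is not None:
                end = j + len(quantifier)
                spans.append((i, end, text[i:end]))
                i = end
                continue
            i = j
        else:
            i += 1
    return spans


print(find_cnqps("他买了三本书和两张票"))
```

With these assumed lexicons, the call prints `[(3, 5, '三本'), (7, 9, '两张')]`. In a pipeline like the one the abstract describes, such spans would be passed to the parser as pre-bracketed units.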

2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Phuoc Tran ◽  
Dien Dinh ◽  
Hien T. Nguyen

Chinese and Vietnamese are both isolating languages in which words are not delimited by spaces. In machine translation, word segmentation is usually performed first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, whether words should be segmented at all is an open question when translating between two languages that do not use spaces between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in translation systems for this pair, so the decision whether or not to segment becomes more important. In this paper, we propose a new method for translating Chinese to Vietnamese that combines the advantages of character-level and word-level translation. At the word level, a hybrid approach combining statistics and rules is used; at the character level, statistical translation is used. The experimental results show that our method improves machine translation performance over both character-level and word-level translation.


2020 ◽  
Vol 34 (05) ◽  
pp. 9130-9137
Author(s):  
Yu Wan ◽  
Baosong Yang ◽  
Derek F. Wong ◽  
Lidia S. Chao ◽  
Haihua Du ◽  
...  

As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects to build unsupervised translation models that access only monolingual data. Specifically, we leverage pivot-private embedding, layer coordination, and parameter sharing to sufficiently model the commonality and diversity between source and target, ranging from the lexical, through the syntactic, to the semantic level. To examine the effectiveness of the proposed models, we collect a 20-million monolingual corpus for each of Mandarin and Cantonese, which are respectively the official language and the most widely used dialect in China. Experimental results reveal that our methods outperform rule-based simplified and traditional Chinese conversion and conventional unsupervised translation models by over 12 BLEU points.


2015 ◽  
Vol 18 (2) ◽  
pp. 70-78
Author(s):  
Phuoc Thanh Tran ◽  
Dien Dinh

In isolating languages such as Chinese and Vietnamese, words are not separated by spaces, and a word can consist of one or more spelling words. Whether or not to segment words before training and translation is a problem that needs to be considered. In this paper, we survey the effect of the word boundary factor on the translation results of Chinese-Vietnamese statistical machine translation (SMT). The experimental results of this paper will serve as the basis for improving word segmentation in future research, with the aim of increasing machine translation performance. We ran two experiments, word segmentation (WS) and word un-segmentation (WUS), on corpora of 8,000 and 12,000 sentence pairs. Based on the experimental results, we found that both the WS corpus and the WUS corpus have their own advantages and drawbacks. We propose integrating the advantages of these two methods in SMT.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Kohulan Rajan ◽  
Achim Zielesny ◽  
Christoph Steinbeck

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity of this ruleset, a correct chemical name assignment remains challenging for human beings, and there are only a few rule-based cheminformatics toolkits available that support this task in an automated manner. Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string as well as the reverse translation, i.e. predicting the SMILES string from the IUPAC name. In both cases, the system is able to predict with an average BLEU score of about 90% and a Tanimoto similarity index of more than 0.9. Incorrect predictions also show a remarkable similarity between true and predicted compounds.
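For readers unfamiliar with the Tanimoto similarity index mentioned above, it can be sketched as the ratio of shared to total "on" bits of two molecular fingerprints. The abstract does not say which fingerprint type was used, so the bit positions below are hypothetical placeholders, and the function name `tanimoto` is an assumption for this example rather than part of STOUT.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) index of two fingerprints given as collections of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical on-bit positions for a true and a predicted structure.
true_fp = {1, 2, 3, 4}
predicted_fp = {2, 3, 4, 5}
print(tanimoto(true_fp, predicted_fp))
```

Here three bits are shared out of five distinct bits, so the call prints `0.6`; an index above 0.9, as reported in the abstract, means the two fingerprints overlap almost completely.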


1983 ◽  
Vol 105 (1) ◽  
pp. 29-33 ◽  
Author(s):  
A. M. Clausing

Cavity solar receivers are generally believed to have higher thermal efficiencies than external receivers due to reduced losses. A simple analytical model was presented by the author which indicated that the ability to heat the air inside the cavity often controls the convective loss from cavity receivers. Thus, if the receiver contains a large amount of inactive hot wall area, it can experience a large convective loss. Excellent experimental data from a variety of cavity configurations and orientations have recently become available. These data provided a means of testing and refining the analytical model. In this manuscript, a brief description of the refined model is presented. Emphasis is placed on using available experimental evidence to substantiate the hypothesized mechanisms and assumptions. Detailed comparisons are given between analytical predictions and experimental results. Excellent agreement is obtained, and the important mechanisms are more clearly delineated.


1993 ◽  
Vol 16 (2) ◽  
pp. 63-70 ◽  
Author(s):  
N.A. Hoenich ◽  
P.T. Smirthwaite ◽  
C. Woffindin ◽  
P. Lancaster ◽  
T.H. Frost ◽  
...  

Recirculation is an important factor in single needle dialysis and, if high, can compromise treatment efficiency. To provide information regarding recirculation characteristics of access devices used in single needle dialysis, we have developed a new technique to characterise recirculation and have used this to measure the recirculation of a Terumo 15G fistula needle and a VasCath SC2300 single lumen catheter. The experimentally obtained results agreed well with those established clinically (8.5 ± 2.4% and 18.4 ± 3.4%). The experimental results have also demonstrated a dependence on access type, pump speeds and fistula flow rate. A comparison of experimental data with theoretical predictions showed that the latter exceeded those measured with the largest contribution being due to the experimental fistula.


Author(s):  
Farrokh Zarifi-Rad ◽  
Hamid Vajihollahi ◽  
James O’Brien

Scale models give engineers an excellent understanding of the aerodynamic behavior behind their design; nevertheless, scale models are time consuming and expensive. Computer simulations such as Computational Fluid Dynamics (CFD) are therefore an excellent alternative to scale models. One must ask, however, how close the CFD results are to the actual fluid behavior of the scale model. To answer this question, the engineering team compared the measured performance of a large industrial Gas Turbine (GT) exhaust diffuser scale model with the performance predicted by commercially available CFD software. The experimental results were obtained from a 1:12 scale model of a GT exhaust diffuser with a fixed row of blades to simulate the swirl generated by the last row of turbine blades, in five blade configurations. This work aims to validate the effect of the turbulent inlet conditions on an axial diffuser, on both the experimental front and the numerical analysis front. The objective of this work is to bring forward a better understanding of velocity and static pressure profiles along gas turbine diffusers and to provide an accurate experimental data set to validate the CFD predictions. For the CFD aspect, ANSYS CFX software was chosen as the solver. Two different types of mesh (hexahedral and tetrahedral) are compared with the experimental results. Although hexahedral (HEX) meshes are more time consuming and more computationally demanding, they are understood to be less prone to mesh sensitivity and tend to converge faster than tetrahedral (TET) meshes. It was found that the HEX mesh generated more consistent results and had less error than the TET mesh.


1988 ◽  
Vol 66 (7) ◽  
pp. 1625-1627 ◽  
Author(s):  
Teresa Kasprzycka-Guttman ◽  
Juan H. Vera

Heats of mixing of 2,4-lutidine and 2,4,6-collidine with n-alkanes were measured at 293.15 K using an isothermal dilution calorimeter. Experimental results were fitted with a Redlich–Kister polynomial. Experimental data and coefficients for the Redlich–Kister polynomials are reported.
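For reference, the Redlich–Kister polynomial for an excess property of a binary mixture is commonly written in the form below. The abstract does not state the number of terms used or the component numbering, so the order $n$ and the assignment of $x_1$ to the pyridine base and $x_2 = 1 - x_1$ to the n-alkane are assumptions made for this illustration.

```latex
% H^E : molar heat (excess enthalpy) of mixing
% x_1, x_2 : mole fractions of the two components, with x_1 + x_2 = 1
% A_k : fitted Redlich-Kister coefficients; n : polynomial order (assumed)
H^{E} = x_1 x_2 \sum_{k=0}^{n} A_k \,(x_1 - x_2)^{k}
```

The factor $x_1 x_2$ forces $H^{E}$ to vanish for either pure component, and the coefficients $A_k$ are typically obtained by least-squares fitting to the measured values.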

