Shooting at Flies in the Dark: Rule-Based Lexical Selection for a Minority Language Pair

Abstract: This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.

Rule based features selection for the performance improvement of face recognition system

2013 International Conference on Emerging Trends in Communication, Control, Signal Processing and Computing Applications (C2SPCA) ◽

10.1109/c2spca.2013.6749387 ◽

2013 ◽

Author(s):

Vasudha S ◽

Neelamma K. Patil ◽

Lokesh R. Boregowda

Keyword(s):

Face Recognition ◽

Performance Improvement ◽

Recognition System ◽

Features Selection ◽

Rule Based ◽

Face Recognition System ◽

Selection For

Instance Genetic Selection for Fuzzy Rule-based Systems Optimization to Opinion Classification

IEEE Latin America Transactions ◽

10.1109/tla.2020.9099762 ◽

2020 ◽

Vol 18 (07) ◽

pp. 1215-1221

Author(s):

Tayane Leite Cerqueira ◽

Fabiana Cristina Bertoni ◽

Matheus Giovanni Pires

Keyword(s):

Genetic Selection ◽

Fuzzy Rule ◽

Rule Based ◽

Selection For ◽

Rule Based Systems

Gene Selection for Microarray Cancer Data Classification by a Novel Rule-Based Algorithm

Information ◽

10.3390/info9010006 ◽

2018 ◽

Vol 9 (1) ◽

pp. 6 ◽

Cited By ~ 8

Author(s):

Adrian Pino Angulo

Keyword(s):

Gene Selection ◽

Data Classification ◽

Rule Based ◽

Cancer Data ◽

Selection For

Multi-objective evolutionary rule and condition selection for designing fuzzy rule-based classifiers

2012 IEEE International Conference on Fuzzy Systems ◽

10.1109/fuzz-ieee.2012.6251174 ◽

2012 ◽

Cited By ~ 4

Author(s):

Michela Antonelli ◽

Pietro Ducange ◽

Francesco Marcelloni

Keyword(s):

Fuzzy Rule ◽

Rule Based ◽

Multi Objective ◽

Selection For

Adaptive Equivalent Consumption Minimization Strategy With Rule-Based Gear Selection for the Energy Management of Hybrid Electric Vehicles Equipped With Dual Clutch Transmissions

IEEE Access ◽

10.1109/access.2020.3032044 ◽

2020 ◽

Vol 8 ◽

pp. 190017-190038

Author(s):

Guido Ricardo Guercioni ◽

Enrico Galvagno ◽

Antonio Tota ◽

Alessandro Vigliani

Keyword(s):

Energy Management ◽

Electric Vehicles ◽

Hybrid Electric Vehicles ◽

Rule Based ◽

Selection For ◽

Hybrid Electric ◽

Equivalent Consumption Minimization Strategy

Kernel Width Selection for SVM Classification

Mathematical Methods for Knowledge Discovery and Data Mining ◽

10.4018/978-1-59904-528-3.ch006 ◽

2011 ◽

pp. 101-115

Author(s):

Ali Smith ◽

Kate A. Smith

Keyword(s):

Parameter Selection ◽

Learning Approach ◽

Classification Problems ◽

Rule Based ◽

Svm Classification ◽

Rbf Kernel ◽

Meta Learning ◽

Selection For ◽

Best Parameter ◽

Multi Class Classification

The most critical component of kernel-based learning algorithms is the choice of an appropriate kernel and its optimal parameters. In this paper we propose a rule-based meta-learning approach for automatically selecting the radial basis function (RBF) kernel and its parameters for Support Vector Machine (SVM) classification. First, the best parameter selection is considered on the basis of prior information about the data, with the help of the Maximum Likelihood (ML) method and the Nelder-Mead (N-M) simplex method. Then the new rule-based meta-learning approach is constructed and tested on 112 datasets of different sizes, covering binary as well as multi-class classification problems. We observe that our rule-based methodology provides a significant improvement in computational time, as well as in accuracy in some specific cases.
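To make the role of the kernel width concrete, the following is a minimal sketch of an RBF kernel in plain Python. It assumes the common parametrization k(x, y) = exp(-gamma * ||x - y||^2), which the abstract does not specify; the function name `rbf_kernel` and the sample values are illustrative only, and the sketch does not reproduce the paper's ML, Nelder-Mead, or rule-based selection procedure.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel under the assumed form exp(-gamma * squared distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    a, b = [0.0, 0.0], [1.0, 1.0]
    # A larger gamma (narrower kernel) makes the same two points look less similar.
    for gamma in (0.01, 0.5, 10.0):
        print(gamma, rbf_kernel(a, b, gamma))
```

Under this parametrization, a very small gamma makes all points look alike and a very large one makes every point look unique, which is why the width has to be matched to the data.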

Rule-Based Machine Translation for the Italian–Sardinian Language Pair

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0022 ◽

2017 ◽

Vol 108 (1) ◽

pp. 221-232

Author(s):

Francis M. Tyers ◽

Hèctor Alòs i Font ◽

Gianfranco Fronteddu ◽

Adrià Martín-Mor

Keyword(s):

Machine Translation ◽

Translation System ◽

Rule Based ◽

Romance Language ◽

Machine Translation System ◽

The Mediterranean ◽

Language Pair

Abstract: This paper describes the process of creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation) and on some of the rules created, as well as on the orthographic model used for Sardinian.

Cohort Selection for Clinical Trials From Longitudinal Patient Records: Text Mining Approach

JMIR Medical Informatics ◽

10.2196/15980 ◽

2019 ◽

Vol 7 (4) ◽

pp. e15980 ◽

Cited By ~ 3

Author(s):

Irena Spasic ◽

Dominik Krzeminski ◽

Padraig Corcoran ◽

Alexander Balinsky

Keyword(s):

Clinical Trials ◽

Language Processing ◽

Medical Records ◽

Training Data ◽

Eligibility Criteria ◽

Rule Based ◽

Cohort Selection ◽

Selection For ◽

The Given ◽

F Measure

Background: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process.

Objective: Track 1 of the 2018 National Natural Language Processing Clinical Challenge focused on the task of cohort selection for clinical trials, aiming to answer the following question: Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials? The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. We aimed to describe a system developed to address this task.

Methods: Our system consisted of 13 classifiers, one for each eligibility criterion. All classifiers used a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern-matching approach was used to extract context-sensitive features. They were embedded back into the text as lexically distinguishable tokens, which were consequently featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances was available to learn from. A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria.

Results: The system was evaluated using microaveraged F measure. Overall, 4 machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier, and gradient tree boosting (GTB), were evaluated on the training data using 10-fold cross-validation. Overall, GTB demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. The final evaluation was performed on previously unseen test data. On average, the F measure of 89.04% was comparable to 3 of the top ranked performances in the shared task (91.11%, 90.28%, and 90.21%). With an F measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50%, and 70.81%) in identifying patients with advanced coronary artery disease.

Conclusions: The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset.
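As an illustration of the evaluation metric named in the abstract, here is a minimal sketch of a microaveraged F measure in Python. It assumes the standard definition, in which true positives, false positives, and false negatives are pooled across all criteria before precision and recall are computed; the function name `micro_f1` and the counts in the usage example are hypothetical and are not taken from the paper or the shared task.

```python
from typing import Iterable, Tuple


def micro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Microaveraged F1 from per-criterion (tp, fp, fn) counts.

    Counts are summed over all criteria first, so criteria with many
    instances weigh more than rare ones.
    """
    tp = fp = fn = 0
    for c_tp, c_fp, c_fn in counts:
        tp += c_tp
        fp += c_fp
        fn += c_fn
    if tp == 0:
        # No correct positive predictions: define the score as zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for two criteria: one frequent, one rare.
    example = [(8, 2, 1), (1, 1, 3)]
    print(round(micro_f1(example), 4))
```

Because the counts are pooled, a weak result on a rare criterion lowers the microaveraged score less than it would lower a macroaverage, which treats every criterion equally.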

Lexical Selection for Cross-Language Applications: Combining LCS with WordNet

10.21236/ada458651 ◽

1998 ◽

Cited By ~ 1

Author(s):

Bonnie J. Dorr ◽

Maria Katsova

Keyword(s):

Lexical Selection ◽

Selection For ◽

Cross Language
