Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists

Author(s):  
Gerhard Jäger ◽  
Johann-Mattis List ◽  
Pavel Sofroniev
Author(s):  
Nur Ariffin Mohd Zin ◽  
Hishammuddin Asmuni ◽  
Haza Nuzly Abdul Hamed ◽  
Razib M. Othman ◽  
Shahreen Kasim ◽  
...  

Recent studies have shown that wearing soft lenses may degrade performance by increasing the false reject rate. However, detecting the presence of a soft lens is a non-trivial task because its texture is almost indiscernible. In this work, we propose a classification method to identify the presence of a soft lens in an iris image. The proposed method starts by segmenting the lens boundary on top of the sclera region. Features are then extracted from the segmented boundary using local descriptors. These features are used to train Support Vector Machines, which perform the classification. The method was tested on the Notre Dame Cosmetic Contact Lens 2013 database. Experiments showed that the proposed method performed better than state-of-the-art methods.


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Nalindren Naicker ◽  
Timothy Adeliyi ◽  
Jeanette Wing

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners insights for developing appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers depends heavily on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as a predictor of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine which algorithm would improve the prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alpha-numeric student records. Linear support vector machines, benchmarked against ten categorical machine learning algorithms, showed superior performance in predicting student performance. The results showed that features such as race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.


2019 ◽  
Vol 2019 ◽  
pp. 1-30
Author(s):  
R. Y. Goh ◽  
L. S. Lee

Development of credit scoring models is important for financial institutions to identify defaulters and nondefaulters when making credit granting decisions. In recent years, artificial intelligence (AI) techniques have shown successful performance in credit scoring. Support Vector Machines and metaheuristic approaches have consistently received attention from researchers in establishing new credit models. In this paper, these two AI techniques are reviewed, with detailed discussion of credit scoring models built with both methods from 1997 to 2018. The discussion covers two main aspects: model type with the issues addressed, and assessment procedures. Together with a compilation of past experimental results on common datasets, the review indicates that hybrid modelling is the state-of-the-art approach for both methods. Some possible research gaps for future research are identified.


2014 ◽  
Vol 2014 ◽  
pp. 1-15
Author(s):  
Hilario Gómez-Moreno ◽  
Pedro Gil-Jiménez ◽  
Sergio Lafuente-Arroyo ◽  
Roberto López-Sastre ◽  
Saturnino Maldonado-Bascón

We present a new impulse noise removal technique based on Support Vector Machines (SVM). Both classification and regression were used to reduce the “salt and pepper” noise found in digital images. Classification enables identification of noisy pixels, while regression provides a means to determine reconstruction values. The training vectors necessary for the SVM were generated synthetically in order to maintain control over quality and complexity. A modified median filter based on a previous noise detection stage and a regression-based filter are presented and compared to other well-known state-of-the-art noise reduction algorithms. The results show that the proposed filters outperform other state-of-the-art algorithms for low and medium noise ratios and are comparable for very highly corrupted images.
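The detect-then-filter idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the detector below is a simple stand-in (a pixel counts as noisy if it equals 0 or 255) for the SVM classifier the abstract describes. Only pixels flagged as noisy are replaced, using the median of their clean 3x3 neighbours.

```python
from statistics import median_low


def is_noisy(pixel, low=0, high=255):
    # Stand-in detector (assumption): salt-and-pepper pixels sit at the extremes.
    # The abstract uses an SVM classifier for this decision instead.
    return pixel == low or pixel == high


def detect_then_median(image):
    """Replace only flagged pixels with the median of their clean 3x3 neighbours."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is left untouched
    for r in range(rows):
        for c in range(cols):
            if not is_noisy(image[r][c]):
                continue  # clean pixels pass through unchanged
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c) and not is_noisy(image[rr][cc])
            ]
            if neighbours:  # keep the pixel as-is if every neighbour is noisy too
                out[r][c] = median_low(neighbours)
    return out


noisy = [
    [10, 12, 11],
    [13, 255, 12],
    [11, 0, 10],
]
restored = detect_then_median(noisy)
```

Gating the median on a detection stage means clean pixels are never blurred, which is the usual motivation for this kind of modified median filter.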


2020 ◽  
pp. 016555152096125
Author(s):  
Wenda Qin ◽  
Randa Elanwar ◽  
Margrit Betke

Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It needs to focus on page regions with text, skipping non-text regions that include illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption, and so on, and is thus an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis, using data sets of limited size. Here we focus instead on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines that performs logical Layout Analysis of scanned Book pages in Arabic. The system detects the function of a text region based on the analysis of various image features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a data set of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher than that of the state-of-the-art method for five of the six tested classes.
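As a rough illustration of combining several classifiers through voting, the sketch below applies plain majority voting to the labels produced for one text region. The abstract does not specify LABA's voting scheme, so majority voting is an assumption, and the three rule-based callables and their feature names (`font_size`, `line_count`, `near_image`) are hypothetical stand-ins for trained support vector machines.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def classify_region(features, classifiers):
    # Each classifier casts one vote for the region's logical function.
    votes = [clf(features) for clf in classifiers]
    return majority_vote(votes)


# Hypothetical stand-ins for trained SVMs (assumption, not LABA's models).
classifiers = [
    lambda f: "title" if f["font_size"] > 18 else "paragraph",
    lambda f: "title" if f["line_count"] <= 2 else "paragraph",
    lambda f: "caption" if f["near_image"] else "paragraph",
]

region = {"font_size": 24, "line_count": 1, "near_image": False}
label = classify_region(region, classifiers)  # votes: title, title, paragraph
```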


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Karol Nowakowski ◽  
Michal Ptaszynski ◽  
Fumito Masui ◽  
Yoshio Momouchi

We describe our attempt to apply a state-of-the-art sequential tagger, SVMTool, to the task of automatic part-of-speech annotation of the Ainu language, a critically endangered language isolate spoken by the native inhabitants of northern Japan. Our experiments indicated that it performs better than the custom system proposed in previous research (POST-AL), especially when applied to out-of-domain data. The biggest advantage of the model trained using SVMTool over the POST-AL tagger is its ability to guess part-of-speech tags for out-of-vocabulary (OoV) words, with an accuracy of up to 63%.


Author(s):  
Leticia C. Cagnina ◽  
Paolo Rosso

Online opinions play an important role for customers and companies, who increasingly rely on them to make purchase and business decisions. A consequence is the growing tendency to post fake reviews in order to change purchase decisions and opinions about products and services. It is therefore important to filter out deceptive comments from the retrieved opinions. In this paper we propose character n-grams in tokens, an efficient and effective variant of the traditional character n-grams model, which we use to obtain a low-dimensionality representation of opinions. A Support Vector Machines classifier was used to evaluate our proposal on available corpora with reviews of hotels, doctors and restaurants. To study the performance of our model, we carried out experiments in intra-domain and cross-domain settings. The aim of the latter experiments is to evaluate our approach in a realistic cross-domain scenario where deceptive opinions are available in one domain but not in another. After comparing our method with state-of-the-art ones, we conclude that using character n-grams in tokens yields competitive results with a low-dimensionality representation.
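A minimal sketch of the feature extraction may help, under one stated assumption: "in tokens" is taken to mean that n-grams are extracted inside each whitespace-delimited token and never span a token boundary. The abstract does not spell out the exact definition, and the function name, the lowercasing step, and the default `n` are illustrative choices, not the authors' settings.

```python
from collections import Counter


def char_ngrams_in_tokens(text, n=3):
    """Count character n-grams taken within each token, never across tokens."""
    counts = Counter()
    # Lowercasing and whitespace tokenisation are assumptions for this sketch.
    for token in text.lower().split():
        # Tokens shorter than n contribute nothing (the range is empty).
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    return counts


features = char_ngrams_in_tokens("Great hotel, great staff", n=4)
```

Because no n-gram contains a space, the vocabulary is smaller than with n-grams taken over the raw character stream, which is consistent with the low-dimensionality representation the abstract reports.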

