Kannada morpheme segmentation using machine learning

2018 ◽  
Vol 7 (2.31) ◽  
pp. 45
Author(s):  
Sachi Angle ◽  
B Ashwath Rao ◽  
S N. Muralikrishna

This paper addresses morpheme segmentation of Kannada words using supervised classification. We used a manually annotated Kannada treebank corpus that we recently developed. Kannada resembles other Dravidian languages in morphological structure. It is an agglutinative language, so its words have a complex morphological form, with each word comprising a root and an optional set of suffixes. These suffixes carry additional meaning beyond that of the root word in a given context. This paper discusses the extraction of the morphemes of a word using Support Vector Machines for classification. Additional features representing the properties of Kannada words were extracted, and the individual letters were classified into labels that yield the morphological segmentation of the word. Various evaluation methods were considered, and an accuracy of 85.97% was achieved.
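The last step described above, turning per-letter labels into a segmentation, can be sketched as follows. The abstract does not state the label set, so the two-label scheme here (`B` for the first letter of a morpheme, `I` for a continuation) and the function name `segment_from_labels` are assumptions made purely for illustration; the classifier that would produce the labels is not shown.

```python
from typing import List


def segment_from_labels(letters: List[str], labels: List[str]) -> List[str]:
    """Rebuild morphemes from per-letter labels.

    Assumed (hypothetical) scheme: 'B' starts a new morpheme,
    'I' continues the current one.
    """
    if len(letters) != len(labels):
        raise ValueError("letters and labels must have the same length")
    morphemes: List[str] = []
    for letter, label in zip(letters, labels):
        if label not in ("B", "I"):
            raise ValueError(f"unknown label: {label!r}")
        if label == "B" or not morphemes:
            # Start a new morpheme (also covers a word that begins with 'I').
            morphemes.append(letter)
        else:
            # Extend the morpheme currently being built.
            morphemes[-1] += letter
    return morphemes


# Toy English example, not Kannada data: "walked" -> "walk" + "ed".
example = segment_from_labels(list("walked"), ["B", "I", "I", "I", "B", "I"])
print(example)  # ['walk', 'ed']
```

In a real pipeline the labels would come from the trained classifier, one prediction per letter, and accuracy could then be measured either per letter or per fully segmented word.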

Geophysics ◽  
2013 ◽  
Vol 78 (3) ◽  
pp. WB113-WB126 ◽  
Author(s):  
Matthew J. Cracknell ◽  
Anya M. Reading

Inductive machine learning algorithms attempt to recognize patterns in, and generalize from, empirical data. They provide a practical means of predicting lithology, or other spatially varying physical features, from multidimensional geophysical data sets. It is for this reason that machine learning approaches are increasing in popularity for geophysical data inference. A key motivation for their use is the ease with which uncertainty measures can be estimated for nonprobabilistic algorithms. We have compared and evaluated the abilities of two nonprobabilistic machine learning algorithms, random forests (RF) and support vector machines (SVM), to recognize ambiguous supervised classification predictions using uncertainty calculated from estimates of class membership probabilities. We formulated a method to establish optimal uncertainty threshold values to identify and isolate the maximum number of incorrect predictions while preserving most of the correct classifications. This is illustrated using a case example of the supervised classification of surface lithologies in a folded, structurally complex, metamorphic terrain. We found that (1) the use of optimal uncertainty thresholds significantly improves overall classification accuracy of RF predictions, but not those of SVM, by eliminating the maximum number of incorrectly classified samples while preserving the maximum number of correctly classified samples; (2) RF, unlike SVM, was able to exploit dependencies and structures contained within spatially varying input data; and (3) high RF prediction uncertainty is spatially coincident with transitions in lithology and associated contact zones, and regions of intense deformation. Uncertainty has its upside in the identification of areas of key geologic interest and has wide application across the geosciences, where transition zones are important classes in their own right. The techniques used in this study are of practical value in prioritizing subsequent geologic field activities, which, with the aid of this analysis, may be focused on key lithology contacts and problematic localities.
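The idea of flagging ambiguous predictions from class membership probabilities can be sketched as below. The abstract does not say which uncertainty measure or threshold-selection procedure the authors used, so this sketch assumes one common choice, uncertainty = 1 minus the largest class probability, and takes the threshold as a given input. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def prediction_uncertainty(probs: Sequence[float]) -> float:
    """Assumed measure: 1 minus the largest class membership probability."""
    return 1.0 - max(probs)


def split_by_uncertainty(
    prob_rows: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (kept, flagged) sample indices for a given uncertainty threshold."""
    kept: List[int] = []
    flagged: List[int] = []
    for i, probs in enumerate(prob_rows):
        if prediction_uncertainty(probs) > threshold:
            flagged.append(i)  # too ambiguous: set aside for follow-up
        else:
            kept.append(i)
    return kept, flagged


# Toy class-probability estimates for three samples (made-up numbers).
rows = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # ambiguous
    [0.60, 0.30, 0.10],  # moderately confident
]
kept, flagged = split_by_uncertainty(rows, threshold=0.5)
print(kept, flagged)  # [0, 2] [1]
```

Sweeping `threshold` over a validation set and comparing how many incorrect versus correct predictions fall into `flagged` is one way such a threshold could be tuned.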


2017 ◽  
Vol 51 (2) ◽  
pp. 329-341
Author(s):  
Nicolas Couellan

In this note, we investigate connections between supervised classification and (Generalized) Nash equilibrium problems (NEP & GNEP). For the specific case of support vector machines (SVM), we exploit the geometric properties of class separation in the dual space to formulate a non-cooperative game. NEP and Generalized NEP formulations are proposed for both binary and multi-class SVM problems.
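As background for the geometric view mentioned above, the following is the standard nearest-point form of the hard-margin SVM dual for linearly separable data. It is a textbook formulation, not necessarily the exact one used in the note; the index sets $I_+$ and $I_-$ (positive and negative training samples) are notation introduced here.

```latex
\min_{\alpha \ge 0} \;
\frac{1}{2} \Bigl\| \sum_{i \in I_+} \alpha_i x_i - \sum_{j \in I_-} \alpha_j x_j \Bigr\|^2
\quad \text{s.t.} \quad
\sum_{i \in I_+} \alpha_i = 1, \qquad \sum_{j \in I_-} \alpha_j = 1 .
```

Each sum is a point in the convex hull of one class, so the problem finds the closest pair of points between the two hulls; the maximum-margin hyperplane is the perpendicular bisector of the segment joining them.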


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yao Huimin

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot solve the problem of effective parallelization, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with a Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, in order to improve the training efficiency of support vector machines on large-scale data, when merging two support vector machines, the “special points” other than support vectors are considered, that is, the nonsupport vectors in one subset that violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.
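The notion of "special points" can be illustrated with a small sketch. The abstract does not define the violation test, so this assumes a linear model `(w, b)` trained on one subset and treats a point from the other subset as violating when its functional margin `y * (w·x + b)` is below 1. That criterion, the linear kernel, and the function names are assumptions for illustration; the merging and Spark steps are not shown.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violating_points(samples: List[Sample], w: Sequence[float], b: float) -> List[Sample]:
    """Samples from one subset that fall inside or on the wrong side of the
    margin of a model trained on the other subset (assumed criterion:
    functional margin y * f(x) < 1)."""
    return [(x, y) for x, y in samples if y * decision_value(w, b, x) < 1.0]


# Model assumed to come from subset B; candidate points come from subset A.
w_b, b_b = [1.0, 0.0], 0.0
subset_a = [
    ([2.0, 0.0], +1),   # margin 2.0: consistent with B's model
    ([0.5, 1.0], +1),   # margin 0.5: inside the margin
    ([-3.0, 1.0], -1),  # margin 3.0: consistent
    ([0.2, 0.0], -1),   # margin -0.2: misclassified by B's model
]
print(violating_points(subset_a, w_b, b_b))
# [([0.5, 1.0], 1), ([0.2, 0.0], -1)]
```

Under this reading, a merge step would retrain on the union of both subsets' support vectors plus these violating points rather than on all of the data.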


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0257901
Author(s):  
Yanjing Bi ◽  
Chao Li ◽  
Yannick Benezeth ◽  
Fan Yang

Phoneme pronunciation is usually considered a basic skill for learning a foreign language. Practicing pronunciation in a computer-assisted way is helpful in a self-directed or distance learning environment. Recent research indicates that machine learning is a promising method for building high-performance computer-assisted pronunciation training modalities. Many data-driven classification models, such as support vector machines, back-propagation networks, deep neural networks and convolutional neural networks, are increasingly used for this purpose. Yet the acoustic waveforms of phonemes are essentially modulated from the base vibrations of the vocal cords, which makes the predictors collinear and distorts the classification models. A commonly used solution is to suppress the collinearity of the predictors via the partial least squares regression algorithm, which yields high-quality predictor weights through predictor relationship analysis. However, as linear regressors, classifiers of this type have very simple topological structures, which constrains their universality. To address this issue, this paper presents a heterogeneous phoneme recognition framework that can further benefit phoneme pronunciation diagnostic tasks by combining partial least squares with support vector machines. A French phoneme data set containing 4830 samples is established for the evaluation experiments. The experiments demonstrate that the new method improves the accuracy of the phoneme classifiers by 0.21% to 8.47% compared with the state of the art at different training data densities.


2011 ◽  
Vol 230-232 ◽  
pp. 625-628
Author(s):  
Lei Shi ◽  
Xin Ming Ma ◽  
Xiao Hong Hu

E-business has grown rapidly in the last decade, and a massive amount of data on customer purchases, browsing patterns and preferences has been generated. Classification of electronic data plays a pivotal role in mining the valuable information and has thus become one of the most important applications of E-business. Support Vector Machines are popular and powerful machine learning techniques, and they offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.
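Rough-set feature selection is usually driven by the dependency degree of the decision attribute on a candidate attribute subset: the fraction of records whose equivalence class under those attributes maps to a single decision value. The sketch below computes that standard quantity on a made-up table. The abstract does not describe the authors' reduction algorithm, and the attribute names (`device`, `visits`, `buys`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def dependency_degree(rows: List[Row], attributes: Sequence[str], decision: str) -> float:
    """Size of the positive region divided by the number of rows.

    A row is in the positive region when every row sharing its values on
    `attributes` also shares its value on `decision`.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attributes)
        classes[key].append(row[decision])
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


# Made-up customer records for illustration only.
table = [
    {"device": "mobile", "visits": "high", "buys": "yes"},
    {"device": "mobile", "visits": "low", "buys": "no"},
    {"device": "desktop", "visits": "high", "buys": "yes"},
    {"device": "desktop", "visits": "low", "buys": "yes"},
    {"device": "mobile", "visits": "low", "buys": "no"},
    {"device": "desktop", "visits": "low", "buys": "no"},
]
print(dependency_degree(table, ["device", "visits"], "buys"))  # 4 of 6 rows
print(dependency_degree(table, ["visits"], "buys"))            # 2 of 6 rows
print(dependency_degree(table, ["device"], "buys"))            # 0.0
```

Attributes whose removal does not lower the dependency degree are candidates for dropping before the remaining features are passed to a classifier such as an SVM.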


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Nalindren Naicker ◽  
Timothy Adeliyi ◽  
Jeanette Wing

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners useful insights for developing appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers is very much dependent on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as a predictor of student performance. The aim of this study was to compare the performance of linear support vector machines with the performance of state-of-the-art classical machine learning algorithms in order to determine the algorithm that would improve prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alpha-numeric student records. Linear support vector machines benchmarked with ten categorical machine learning algorithms showed superior performance in predicting student performance. The results of this research showed that features like race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.

