Classification methods for functional data

Author(s):  
Amparo Baillo ◽  
Antonio Cuevas ◽  
Ricardo Fraiman

This article reviews the literature concerning supervised and unsupervised classification of functional data. It first explains the meaning of unsupervised classification vs. supervised classification before discussing the supervised classification problem in the infinite-dimensional case, showing that its formal statement generally coincides with that of discriminant analysis in the classical multivariate case. It then considers the optimal classifier and plug-in rules, empirical risk minimization rules, linear discrimination rules, the k nearest neighbor (k-NN) method, and kernel rules. It also describes classification based on partial least squares, classification based on reproducing kernels, and depth-based classification. Finally, it examines unsupervised classification methods, focusing on K-means for functional data, K-means for data in a Hilbert space, and impartial trimmed K-means for functional data. Some practical issues, in particular real-data examples and simulations, are reviewed and some selected proofs are given.
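One of the supervised rules reviewed here, the k-NN classifier, extends naturally to functional data once curves are discretized on a common grid. The sketch below is illustrative only (the grid, the Riemann-sum L2 distance, and the toy curves are assumptions, not the article's examples):

```python
import math

def l2_distance(f, g, dt=1.0):
    # Riemann-sum approximation of the L2 distance between two curves
    # sampled on the same equally spaced grid with spacing dt.
    return math.sqrt(dt * sum((a - b) ** 2 for a, b in zip(f, g)))

def knn_classify(train_curves, train_labels, query, k=3):
    # k-NN rule: majority vote among the k training curves closest
    # to the query curve in (approximate) L2 distance.
    order = sorted(range(len(train_curves)),
                   key=lambda i: l2_distance(train_curves[i], query))
    votes = [train_labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)
```

In practice the observed curves would be smoothed or interpolated onto the common grid first; the vote itself is identical to the multivariate k-NN rule.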

Author(s):  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
Alexander Gray ◽  
...  

Chapter 6 described techniques for estimating joint probability distributions from multivariate data sets and for identifying the inherent clustering within the properties of sources. This approach can be viewed as the unsupervised classification of data. If, however, we have labels for some of these data points (e.g., an object is tall, short, red, or blue), we can utilize this information to develop a relationship between the label and the properties of a source. We refer to this as supervised classification, which is the focus of this chapter. The motivation for supervised classification comes from the long history of classification in astronomy. Possibly the best known of these classification schemes is the one defined by Edwin Hubble for the morphological classification of galaxies based on their visual appearance. This chapter discusses generative classification, the k-nearest-neighbor classifier, discriminative classification, support vector machines, decision trees, and the evaluation of classifiers.


Jurnal INFORM ◽  
2016 ◽  
Vol 1 (2) ◽  
Author(s):  
Evy Kamilah Ratnasari

Abstract — Fruit recognition can be applied automatically in education, industry, sales, and science. In computer vision, fruit recognition relies on four basic features that describe the characteristics of a fruit: size, color, shape, and texture. Recognizing fruit from camera RGB images using the shape and size features is neither reliable nor effective, because a real image can contain several different sizes of fruit within each fruit type, so fruit size and uniformity cannot be identified morphologically, which can affect the classification results. This paper therefore proposes a recognition method for fruit classification built on color and texture features. Classification is performed with K-Nearest Neighbor using color features and co-occurrence texture features. Experimental results on a dataset of 1,882 fruit images across 12 classes show that fruit can be recognized from the combined color and texture features with a highest accuracy of 92%.
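The two feature families the paper combines can be sketched in a few lines: a coarse RGB color histogram and a contrast statistic from a gray-level co-occurrence matrix over horizontal pixel pairs. The bin counts, the single co-occurrence direction, and the contrast summary are simplifying assumptions for illustration, not the paper's exact parametrization:

```python
def color_histogram(pixels, bins=4):
    # Coarse RGB histogram: each channel quantized into `bins` buckets,
    # normalized by the number of pixels.
    hist = [0] * (bins * 3)
    for r, g, b in pixels:
        for c, v in enumerate((r, g, b)):
            hist[c * bins + min(v * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def glcm_contrast(gray, levels=8):
    # Gray-level co-occurrence over horizontal neighbor pairs,
    # summarized by the contrast statistic sum((a - b)^2) / #pairs.
    q = [[min(v * levels // 256, levels - 1) for v in row] for row in gray]
    contrast, pairs = 0, 0
    for row in q:
        for a, b in zip(row, row[1:]):
            contrast += (a - b) ** 2
            pairs += 1
    return contrast / pairs if pairs else 0.0
```

Concatenating the histogram and one or more co-occurrence statistics per image yields the feature vector that a K-Nearest Neighbor classifier then compares across the dataset.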


2019 ◽  
Vol 29 (2) ◽  
pp. 393-405 ◽  
Author(s):  
Magdalena Piotrowska ◽  
Gražina Korvel ◽  
Bożena Kostek ◽  
Tomasz Ciszewski ◽  
Andrzej Czyżewski

Abstract Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) algorithm and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words containing positionally and/or contextually conditioned allophones was created for that purpose. Each word was audio-video recorded by a group of 16 native and non-native speakers, from which the speech of seven native speakers and phonology experts was selected for analysis. For the purpose of the present study, a sub-list of 103 words containing the English alveolar lateral phoneme /l/ was compiled. The list includes ‘dark’ (velarized) allophonic realizations (which occur before a consonant or at the end of a word before silence) and 52 ‘clear’ allophonic realizations (which occur before a vowel), as well as voicing variants. The recorded signals were segmented into allophones and parametrized using a set of descriptors originating from the MPEG-7 standard, plus dedicated time-based parameters and modified MFCC features proposed by the authors. ANNs, kNN and the SOM were employed to automatically detect the two types of allophones, and various feature sets were tested to achieve the best performance of the automatic methods. In the final experiment, a selected feature set was used for automatic evaluation of the pronunciation of dark /l/ by non-native speakers.
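Of the three classifiers above, the SOM is the least standard; its core is a competitive update that pulls the best-matching node and its grid neighbors toward each input vector. The 1-D node grid, Gaussian neighborhood, and toy learning rate below are generic SOM conventions, not the study's configuration:

```python
import math

def best_matching_unit(weights, x):
    # Index of the node whose weight vector is closest to the input.
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def som_step(weights, x, lr=0.5, radius=1.0):
    # One SOM training step: move the best-matching unit (BMU) and its
    # neighbors toward x, with a Gaussian falloff over the 1-D node grid.
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

Repeating this step over the parametrized allophone vectors (with decaying lr and radius) organizes the map so that, after training, dark and clear /l/ tokens activate different regions of the grid.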


Teknik ◽  
2021 ◽  
Vol 42 (2) ◽  
pp. 137-148
Author(s):  
Vincentius Abdi Gunawan ◽  
Leonardus Sandy Ade Putra

Communication is essential in conveying information from one individual to another. However, not all individuals in the world can communicate verbally. According to the WHO, hearing loss affects 466 million people globally, 34 million of whom are children, so a non-verbal language learning method is needed for people with hearing problems. The purpose of this study is to build a system that can identify non-verbal language in real time so that it can be easily understood. A high success rate requires a proper method, such as machine learning supported by wavelet feature extraction and different classification methods in image processing. Machine learning was applied because of its ability to recognize and compare the classification results of four different methods. The four classifiers used to compare hand gesture recognition of American Sign Language are Multi-Class SVM, Backpropagation Neural Network, K-Nearest Neighbor (K-NN), and Naïve Bayes. Simulation tests of the four classification methods obtained success rates of 99.3%, 98.28%, 97.7%, and 95.98%, respectively. It can thus be concluded that the Multi-Class SVM classifier has the highest success rate in recognizing American Sign Language, reaching 99.3%. The whole system is designed and tested using MATLAB as supporting software for data processing.
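The wavelet feature extraction mentioned above can be illustrated with a single level of the Haar transform, which splits a signal (e.g., a flattened image row) into pairwise averages (approximation) and pairwise differences (detail). The paper's wavelet family and decomposition depth are not specified here, so Haar at one level is purely an illustrative assumption:

```python
def haar_step(signal):
    # One level of the Haar wavelet transform: pairwise averages
    # (approximation coefficients) and pairwise differences (detail
    # coefficients). Assumes an even-length input.
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail
```

Applying the step recursively to the approximation coefficients yields a multi-level decomposition; the resulting coefficients form the feature vector handed to the SVM, neural network, K-NN, or Naïve Bayes classifier.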


2019 ◽  
Vol 16 (2) ◽  
pp. 187
Author(s):  
Mega Luna Suliztia ◽  
Achmad Fauzan

Classification is the process of grouping data based on observed variables in order to predict new data whose class is unknown. There are several classification methods, such as Naïve Bayes, K-Nearest Neighbor and Neural Network. Naïve Bayes classifies based on the probability values of the existing attributes, K-Nearest Neighbor classifies based on the character of its k nearest neighbors, while a Neural Network classifies using a model inspired by human neural networks. This study compares the three classification methods for the Seat Load Factor, which is the percentage of aircraft load and a measure used in determining an airline's profit. The affecting factors are the number of passengers, ticket prices, flight routes, and flight times. Based on the analysis of 47 observations, the Naïve Bayes method misclassified 14 observations, an accuracy rate of 70%; the K-Nearest Neighbor method with k=5 misclassified 5 observations, an accuracy rate of 89%; and the Neural Network misclassified 10 observations, an accuracy rate of 78%. The method with the highest accuracy rate is taken as the best method, which in this case is K-Nearest Neighbor, with 42 correctly classified observations: 14 low, 10 medium, and 18 high values. With the best method, predictions can be made for new data; for example, a new observation with Bali flight route (2), afternoon flight time (2), an estimated 140 passengers, and a ticket price of Rp 700,000 is predicted by K-Nearest Neighbor to have a high Seat Load Factor, in the interval 80%-100%.
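The reported accuracy rates follow directly from the misclassification counts over the 47 observations; the arithmetic can be checked in one line per method:

```python
def accuracy(n_total, n_misclassified):
    # Accuracy = correctly classified observations / total observations.
    return (n_total - n_misclassified) / n_total

# With 47 observations: Naive Bayes misclassified 14 (-> 70%),
# K-Nearest Neighbor with k=5 misclassified 5 (-> 89%),
# and the Neural Network misclassified 10 (-> 78%).
```

Truncated to whole percentages these reproduce the study's 70%, 89% and 78% figures.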


Author(s):  
Triando Hamonangan Saragih ◽  
Diny Melsye Nurul Fajri ◽  
Alfita Rakhmandasari

Jatropha curcas is a very useful plant whose oil can be used as a biofuel for diesel engines, replacing coal. In Indonesia there are few plantations that grow Jatropha curcas, and very few farmers understand its diseases in detail, which can cause a big loss at harvest when a disease occurs and no further action is taken. An expert system can help farmers identify the plant diseases of Jatropha curcas. The objective of this research is to compare several identification and classification methods, namely Decision Tree, K-Nearest Neighbor and its modification, on the basis of accuracy. The Modified K-Nearest Neighbor method gave the best accuracy result, 67.74%.
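The abstract does not say which modification of K-Nearest Neighbor was used; one common variant is distance-weighted voting, where closer neighbors count for more. The sketch below shows that variant purely as an illustration, with toy symptom vectors and labels that are not from the study:

```python
def weighted_knn(train, labels, query, k=3):
    # Distance-weighted k-NN: each of the k nearest neighbors votes with
    # weight 1/d instead of equally. This is one common "modified k-NN";
    # the paper's exact modification is not specified in the abstract.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(zip(train, labels), key=lambda p: dist(p[0], query))[:k]
    scores = {}
    for x, lab in nearest:
        scores[lab] = scores.get(lab, 0.0) + 1.0 / (dist(x, query) + 1e-9)
    return max(scores, key=scores.get)
```

With inverse-distance weighting, a cluster of very close neighbors can outvote a more numerous but distant class, which is often where the modified rule gains accuracy over plain majority voting.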


Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

<p>Online business development through e-commerce platforms is a phenomenon which has changed the world of promoting and selling products in this 21<sup>st</sup> century. Product title classification is an important task in helping retailers and sellers list a product in a suitable category. It is a text classification problem, but the properties of product titles differ from those of general documents. This study aims to evaluate the performance of five supervised learning models on data sets consisting of e-commerce product titles, which are very short descriptions and often incomplete sentences. The supervised learning models involved in the study are Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, Support Vector Machine (SVM) and Random Forest. The results show that the KNN model is the best, with the highest accuracy and fastest computation time on the data used in the study. Hence, KNN is a good approach to classifying e-commerce products.</p>
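For short texts like product titles, a KNN classifier typically works on bag-of-words vectors compared by cosine similarity. The sketch below uses raw term counts and whitespace tokenization as simplifying assumptions (the study's actual preprocessing and weighting are not given in the abstract), and the example titles are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify_title(train_titles, train_labels, title, k=1):
    # Bag-of-words KNN: vote among the k training titles most similar
    # to the query title under cosine similarity.
    vecs = [Counter(t.lower().split()) for t in train_titles]
    q = Counter(title.lower().split())
    top = sorted(range(len(vecs)), key=lambda i: -cosine(vecs[i], q))[:k]
    votes = [train_labels[i] for i in top]
    return max(set(votes), key=votes.count)
```

Because titles share so few tokens, even a single overlapping word often dominates the similarity, which is part of why a simple KNN can do well on this task.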


2021 ◽  
pp. 179-218
Author(s):  
Magy Seif El-Nasr ◽  
Truong Huy Nguyen Dinh ◽  
Alessandro Canossa ◽  
Anders Drachen

This chapter discusses several classification and regression methods that can be used with game data. Specifically, we will discuss regression methods, including Linear Regression, and classification methods, including K-Nearest Neighbor, Naïve Bayes, Logistic Regression, Linear Discriminant Analysis, Support Vector Machines, Decision Trees, and Random Forests. We will discuss how you can set up the data to apply these algorithms, how you can interpret the results, and the pros and cons of each of the methods discussed. We will conclude the chapter with some remarks on the process of applying these methods to games and the expected outcomes. The chapter also includes practical labs to walk you through the process of applying these methods to real game data.
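The simplest member of the tree family above is a decision stump: a one-level decision tree that picks the single threshold on one feature minimizing training misclassifications. This is a generic sketch (not the chapter's lab code), with an invented 1-D feature for illustration:

```python
def fit_stump(xs, ys):
    # Try every candidate threshold and both label orientations, keeping
    # the split with the fewest training misclassifications. This is the
    # split-selection step a full decision tree applies recursively.
    best = None
    for t in sorted(set(xs)):
        for lo, hi in ((0, 1), (1, 0)):
            preds = [lo if x < t else hi for x in xs]
            err = sum(p != y for p, y in zip(preds, ys))
            if best is None or err < best[0]:
                best = (err, t, lo, hi)
    return best[1:]  # (threshold, label_if_below, label_if_at_or_above)

def predict_stump(stump, x):
    t, lo, hi = stump
    return lo if x < t else hi
```

Random Forests and boosted trees build on exactly this primitive, growing many deeper trees on resampled data and aggregating their votes.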

