Convergence Rates for Empirical Estimation of Binary Classification Bounds

Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1144
Author(s):  
Salimeh Yasaei Sekeh ◽  
Morteza Noshad ◽  
Kevin R. Moon ◽  
Alfred O. Hero

Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze–Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman–Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We obtain a concentration inequality for the Friedman–Rafsky estimator of the Henze–Penrose divergence. We validate our results experimentally and illustrate their application to real datasets.
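The MST construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the plug-in formula 1 - R(m + n) / (2mn), where R is the number of MST edges joining points from different samples, is a commonly cited form of the FR estimate that the abstract itself does not state. It is treated here as an assumption.

```python
import math


def fr_cross_edge_count(x, y):
    """Count edges of the Euclidean MST over the merged sample that join
    a point of x to a point of y (Prim's algorithm, O(N^2) time)."""
    points = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    total = len(points)
    in_tree = [False] * total
    best_dist = [math.inf] * total  # cheapest known link into the tree
    best_from = [-1] * total        # tree vertex providing that link
    best_dist[0] = 0.0
    cross_edges = 0
    for _ in range(total):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(total) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0 and labels[u] != labels[best_from[u]]:
            cross_edges += 1
        # Relax the links from u to every vertex still outside the tree.
        for v in range(total):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return cross_edges


def hp_divergence_estimate(x, y):
    """Assumed plug-in form: 1 - R * (m + n) / (2 * m * n)."""
    m, n = len(x), len(y)
    r = fr_cross_edge_count(x, y)
    return 1.0 - r * (m + n) / (2.0 * m * n)


# Two well-separated clusters share a single MST edge.
sample_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
sample_y = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(fr_cross_edge_count(sample_x, sample_y))
print(hp_divergence_estimate(sample_x, sample_y))
```

The estimate is left unclipped, so heavily overlapping samples can give a slightly negative value in finite samples. Whether to truncate at zero is a design choice the abstract does not address.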

2021 ◽  
Author(s):  
Yashodhan Rajiv Athavale

The objective of this study is to assess the performance and capability of a kernel-based machine learning method for time-series signal classification. Applying successive stages of dimension transformation, training, testing and cross-validation, we attempt to perform a binary classification using the time-series signals from each category. The study has been applied to two domains: financial and biomedical. The financial study involves identifying the possibility of collapse or survival of a company trading in the stock market. To assess the fate of each company, we collect its real stock market data, a financial time series composed of weekly closing stock prices over a common time interval. This study has been applied to various economic sectors such as Pharmaceuticals and Biotechnology, Automobiles, Oil & Gas, and Water Supply. The data has been collected using Thomson’s Datastream software. In the biomedical study we deal with knee signals collected using the vibration arthrometry technique. This study uses the severity of cartilage degeneration to assess the possibility of a subject being affected by osteoarthritis or undergoing knee replacement surgery at a later stage. This non-invasive diagnostic method can also prove to be an alternative to various invasive procedures used for detecting osteoarthritis. For this analysis we have used the vibroarthrographic signals from about 38 abnormal and 51 normal knee joint case studies. In both studies we apply Fisher kernels combined with a Gaussian Mixture Model (GMM) for dimension transformation into a feature space, rendered as a three-dimensional plot for visualization. The transformed data is then used to train and test support vector machines for binary classification. From our experiments we observe that our method performs well in both studies, with a classification error rate between 10% and 15%.


2021 ◽  
Vol 13 (9) ◽  
pp. 1623
Author(s):  
João E. Batista ◽  
Ana I. R. Cabral ◽  
Maria J. P. Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
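For context on the baseline features mentioned above, spectral indices such as NDVI and NBR are normalized differences of two bands. The sketch below is a hedged illustration using the standard textbook definitions (NDVI from near-infrared and red, NBR from near-infrared and shortwave infrared). The function names and the zero-denominator convention are assumptions, and the paper's exact band choices are not given in the abstract.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b); defined here as 0.0 when a + b == 0."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def nbr(nir, swir):
    """Normalized Burn Ratio from NIR and shortwave-infrared reflectance."""
    return normalized_difference(nir, swir)


# Hypothetical reflectance values for a single vegetated pixel.
print(ndvi(0.5, 0.1))
print(nbr(0.5, 0.2))
```

An evolved hyperfeature plays the same role as these indices, a derived column appended to the band values, except that its formula is found by the search rather than fixed in advance.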


Author(s):  
Kanae Takahashi ◽  
Kouji Yamamoto ◽  
Aya Kuchiba ◽  
Tatsuki Koyama

Abstract A binary classification problem is common in the medical field, and we often use sensitivity, specificity, accuracy, and negative and positive predictive values as measures of the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier’s performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in the evaluation of information retrieval and information extraction, since it possesses favorable characteristics, especially when the prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of F1 scores, and the statistical properties of these F1 scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
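The binary F1 definition quoted above can be illustrated with a short sketch. The function name and the convention of returning 0.0 when there are no true positives are assumptions made for this example. The confidence-interval methods proposed in the paper are not reproduced here.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from confusion-matrix counts.

    Returns 0.0 when there are no true positives (an assumed convention,
    since precision and recall are then zero or undefined).
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # positive predictive value
    recall = true_pos / (true_pos + false_neg)     # sensitivity
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.8, recall = 0.5.
print(f1_from_counts(8, 2, 8))
```

Algebraically the same quantity equals 2TP / (2TP + FP + FN), which shows that true negatives play no part in the score. This is one reason it is favored when prevalence is low.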


Author(s):  
Ana R. Redondo ◽  
Jorge Navarro ◽  
Rubén R. Fernández ◽  
Isaac Martín de Diego ◽  
Javier M. Moguerza ◽  
...  

2021 ◽  
Author(s):  
Federica Zonzini ◽  
Francesca Romano ◽  
Antonio Carbone ◽  
Matteo Zauli ◽  
Luca De Marchi

Abstract Despite the outstanding improvements achieved by artificial intelligence in the Structural Health Monitoring (SHM) field, some challenges remain to be addressed. Among them is the need to reduce model complexity and data-to-user latency, both of which still affect state-of-the-art solutions. This is due to the continuous forwarding of a huge amount of data to centralized servers, where the inference process is usually executed in bulk. Conversely, the emerging field of Tiny Machine Learning (TinyML), promoted by recent advancements in the electronic and information engineering community, has made near-sensor data inference a tangible, low-cost and computationally efficient alternative. In line with this observation, this work explores the embedding of the One Class Classifier Neural Network (OCCNN), i.e., a neural network architecture solving binary classification problems for vibration-based SHM scenarios, into a resource-constrained device. To this end, OCCNN has been ported to the Arduino Nano 33 BLE Sense platform and validated with experimental data from the Z24 bridge use case, reaching an average accuracy and precision of 95% and 94%, respectively.

