classification errors
Recently Published Documents


Total documents: 117 (five years: 29)
H-index: 17 (five years: 2)

2021
pp. 127-133
Author(s):  
Д.Н. Кобзаренко

The paper presents an analysis of time series of wind speed and wind direction on a regional scale, using neural network models in a classification task, based on data from four meteorological stations located in the Republic of Dagestan. The input data are time series for 2011-2020 with measurements taken 8 times a day. The purpose of the work is to study patterns in the time series through the results of machine learning in the classification task. To this end, the following tasks are addressed: to design neural network models that classify the meteorological station from wind speed and wind direction data (together and separately); to achieve the highest possible prediction accuracy by tuning the global parameters; and to run a series of modeling experiments and evaluate the results. The experiments yielded the dependence of classification accuracy on data block size, which makes it possible to determine the minimum block size in the time series that provides accuracy close to the maximum achievable. It was also shown that the classification errors of the neural network models clearly correlate with the geographical location of the meteorological stations. The distribution of classification errors over time shows that errors are fewest in spring and most frequent in summer. In general, the stations located on the sea coast have more classification errors, which indicates that the wind regime in these areas is less distinctive. The results also support the general conclusion that neural networks can be used not only as a tool for forecasting, recognition, or classification, but also as a tool for analytical assessment of the source data, here time series.


2021
pp. 014662162110468
Author(s):  
Irina Grabovsky ◽  
Jesse Pace ◽  
Christopher Runyon

We model pass/fail examinations aiming to provide a systematic tool to minimize classification errors. We use the method of cut-score operating functions to generate specific cut-scores on the basis of minimizing several important misclassification measures. The goal of this research is to examine the combined effects of a known distribution of examinee abilities and uncertainty in the standard setting on the optimal choice of the cut-score. In addition, we describe an online application that allows others to utilize the cut-score operating function for their own standard settings.
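The idea of choosing a cut-score by minimizing misclassification can be sketched numerically. The code below is only an illustration and not the authors' cut-score operating function: it assumes examinee ability is standard normal, the observed score equals ability plus normal measurement error with standard deviation `sem`, and the true standard is known exactly (the abstract also considers uncertainty in the standard, which is left out here). All function names, the cost weights, and the search grid are hypothetical.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_rates(cut, standard, sem, steps=2000):
    """Return (false_pass, false_fail) probabilities for a given cut-score.

    Assumption: ability ~ N(0, 1) and observed score = ability + N(0, sem^2).
    The integral over ability uses the midpoint rule on [-6, 6].
    """
    lo, hi = -6.0, 6.0
    width = (hi - lo) / steps
    false_pass = 0.0
    false_fail = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width
        weight = normal_pdf(theta) * width
        p_pass = 1.0 - normal_cdf((cut - theta) / sem)
        if theta < standard:
            false_pass += weight * p_pass          # below standard but passes
        else:
            false_fail += weight * (1.0 - p_pass)  # meets standard but fails
    return false_pass, false_fail


def best_cut(standard, sem, cost_false_pass=1.0, cost_false_fail=1.0):
    """Grid-search the cut-score that minimizes the weighted error rate."""
    candidates = [i * 0.05 - 2.0 for i in range(81)]  # -2.0 to 2.0 in steps of 0.05

    def expected_cost(cut):
        fp, ff = error_rates(cut, standard, sem)
        return cost_false_pass * fp + cost_false_fail * ff

    return min(candidates, key=expected_cost)
```

Under these assumptions, weighting false fails more heavily moves the selected cut-score down, and weighting false passes more heavily moves it up, which is the trade-off a cut-score tool of this kind has to expose.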


2021
Author(s):  
Mohamed Aziz Bhouri

Abstract: We present a simulation-based classification approach for large deployed structures with localized operational excitations. The method extends the two-level Port-Reduced Reduced-Basis Component (PR-RBC) technique to provide faster solution estimation for the hyperbolic partial differential equation of time-domain elastodynamics with a moving load. Time-domain correlation function-based features are built to train classifiers such as Artificial Neural Networks and Support-Vector Machines and to perform damage detection. The method is tested on a bridge-shaped structure with a moving vehicle (playing the role of a digital twin) in order to detect the existence of cracks. This problem has 45 parameters and shows the merits of the two-level PR-RBC approach and of the correlation function-based features in the context of operational excitations, other nuisance parameters and added noise. The quality of the classification task is enhanced by the sufficiently large synthetic training dataset and the accuracy of the numerical solutions, reaching test classification errors below 0.1% for disjoint training and test sets of size 7 × 10³ and 3 × 10³, respectively. The effects of the accuracy of the numerical solutions and of the sensor locations on the classification errors are also studied, showing the robustness of the proposed approach and the importance of constructing a rich and accurate representation of possible healthy and unhealthy states of interest.


2021
Vol 13 (9) ◽  
pp. 1742
Author(s):  
Charles Labuzzetta ◽  
Zhengyuan Zhu ◽  
Xinyue Chang ◽  
Yuyu Zhou

Global surface water classification layers, such as the European Joint Research Centre’s (JRC) Monthly Water History dataset, provide a starting point for accurate and large scale analyses of trends in waterbody extents. On the local scale, there is an opportunity to increase the accuracy and temporal frequency of these surface water maps by using locally trained classifiers and gap-filling missing values via imputation in all available satellite images. We developed the Surface Water IMputation (SWIM) classification framework using R and the Google Earth Engine computing platform to improve water classification compared to the JRC study. The novel contributions of the SWIM classification framework include (1) a cluster-based algorithm to improve classification sensitivity to a variety of surface water conditions and produce approximately unbiased estimation of surface water area, (2) a method to gap-fill every available Landsat image for a region of interest to generate submonthly classifications at the highest possible temporal frequency, (3) an outlier detection method for identifying images that contain classification errors due to failures in cloud masking. Validation and several case studies demonstrate the SWIM classification framework outperforms the JRC dataset in spatiotemporal analyses of small waterbody dynamics with previously unattainable sensitivity and temporal frequency. Most importantly, this study shows that reliable surface water classifications can be obtained for all pixels in every available Landsat image, even those containing cloud cover, after performing gap-fill imputation. By using this technique, the SWIM framework supports monitoring water extent on a submonthly basis, which is especially applicable to assessing the impact of short-term flood and drought events. 
Additionally, our results contribute to addressing the challenges of training machine learning classifiers with biased ground truth data and identifying images that contain regions of anomalous classification errors.


2021
pp. 238008442110071
Author(s):  
T.S. Alshihayb ◽  
B. Heaton

Introduction: Misclassification of clinical periodontitis can occur by partial-mouth protocols, particularly when tooth-based case definitions are applied. In these cases, the true prevalence of periodontal disease is underestimated, but specificity is perfect. In association studies of periodontal disease etiology, misclassification by this mechanism is independent of exposure status (i.e., nondifferential). Despite nondifferential mechanisms, differential misclassification may be realized by virtue of random errors. Objectives: To gauge the amount of uncertainty around the expectation of differential periodontitis outcome misclassification due to random error only, we estimated the probability of differential outcome misclassification, its magnitude, and expected impacts via simulation methods using values from the periodontitis literature. Methods: We simulated data sets with a binary exposure and outcome that varied according to sample size (200, 1,000, 5,000, 10,000), exposure effect (risk ratio; 1.5, 2), exposure prevalence (0.1, 0.3), outcome incidence (0.1, 0.4), and outcome sensitivity (0.6, 0.8). Using a Bernoulli trial, we introduced misclassification by randomly sampling individuals with the outcome in each exposure group and repeated each scenario 10,000 times. Results: The probability of differential misclassification decreased as the simulation parameter values increased and occurred at least 37% of the time across the 10,000 repetitions. Across all scenarios, the risk ratio was biased, on average, toward the null when the sensitivity was higher among the unexposed and away from the null when it was higher among the exposed. The extent of bias for absolute sensitivity differences ≥0.04 ranged from 0.05 to 0.19 regardless of simulation parameters. However, similar trends were not observed for the odds ratio where the extent and direction of bias were dependent on the outcome incidence, sensitivity of classification, and effect size. 
Conclusions: The results of this simulation provide helpful quantitative information to guide interpretation of findings in which nondifferential outcome misclassification mechanisms are known to be operational with perfect specificity. Knowledge Transfer Statement: Measurement of periodontitis can suffer from classification errors, such as when partial-mouth protocols are applied. In this case, specificity is perfect and sensitivity is expected to be nondifferential, leading to an expectation for no bias when studying periodontitis etiologies. Despite expectation, differential misclassification could occur from sources of random error, the effects of which are unknown. Proper scrutiny of research findings can occur when the probability and impact of random classification errors are known.
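The simulation design described in the Methods can be sketched as follows. This is a simplified illustration, not the authors' code: it parameterizes the outcome by the risk among the unexposed (`risk_unexposed`) rather than by overall incidence, and the function names, default values, and the rule of skipping replicates with empty cells are assumptions made for the example. Specificity is perfect, so only true cases can be misclassified.

```python
import random


def simulate_once(rng, n, exposure_prev, risk_unexposed, risk_ratio, sensitivity):
    """Simulate one cohort; return group sizes, true cases, and detected cases.

    Each true case is detected with probability `sensitivity` in both
    exposure groups (a nondifferential mechanism with perfect specificity).
    """
    size = {0: 0, 1: 0}
    cases = {0: 0, 1: 0}
    detected = {0: 0, 1: 0}
    for _ in range(n):
        exposed = 1 if rng.random() < exposure_prev else 0
        risk = risk_unexposed * (risk_ratio if exposed else 1.0)
        size[exposed] += 1
        if rng.random() < risk:
            cases[exposed] += 1
            if rng.random() < sensitivity:  # Bernoulli trial for detection
                detected[exposed] += 1
    return size, cases, detected


def run_simulation(reps=200, seed=0, n=1000, exposure_prev=0.3,
                   risk_unexposed=0.1, risk_ratio=2.0, sensitivity=0.8):
    """Return one (sensitivity_difference, observed_rr, true_rr) per replicate."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        size, cases, detected = simulate_once(
            rng, n, exposure_prev, risk_unexposed, risk_ratio, sensitivity)
        # Skip degenerate replicates where a ratio would be undefined.
        if min(size.values()) == 0 or min(cases.values()) == 0 or detected[0] == 0:
            continue
        # Realized sensitivity in each group can differ by chance alone.
        se_diff = detected[1] / cases[1] - detected[0] / cases[0]
        true_rr = (cases[1] / size[1]) / (cases[0] / size[0])
        observed_rr = (detected[1] / size[1]) / (detected[0] / size[0])
        results.append((se_diff, observed_rr, true_rr))
    return results
```

Comparing `observed_rr` with `true_rr` across replicates, grouped by the sign of `se_diff`, gives a rough picture of the direction of bias discussed in the Results; the thresholds used in the paper are not reproduced here.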


Author(s):  
Jean-Michel Nguyen ◽  
Pascal Jézéquel ◽  
Pierre Gillois ◽  
Luisa Silva ◽  
Faouda Ben Azzouz ◽  
...  

Abstract. Motivation: The principle of Breiman's random forest (RF) is to build and assemble complementary classification trees in a way that maximizes their variability. We propose a new type of random forest that disobeys Breiman’s principles and involves building trees with no classification errors in very large quantities. We used a new type of decision tree that uses a neuron at each node as well as an innovative half Christmas tree structure. With these new RFs, we developed a score, based on a family of ten new statistical information criteria, called Nguyen information criteria (NICs), to evaluate the predictive qualities of features in three dimensions. Results: The first NIC allowed the Akaike information criterion to be minimized more quickly than data obtained with the Gini index when the features were introduced in a logistic regression model. The selected features based on the NICScore showed a slight advantage compared to the support vector machine recursive feature elimination (SVM-RFE) method. We demonstrate that the inclusion of artificial neurons in tree nodes allows a large number of classifiers in the same node to be taken into account simultaneously and results in perfect trees without classification errors. Availability and implementation: The methods used to build the perfect trees in this article were implemented in the “ROP” R package, archived at https://cran.r-project.org/web/packages/ROP/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Mathematics
2021
Vol 9 (2) ◽  
pp. 156
Author(s):  
Darío Ramos-López ◽  
Ana D. Maldonado

Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often not suitable. In many of these problems, often real-world problems related to health, some classification errors may be tolerated, whereas others are to be avoided completely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. In it, a flexible validation metric (cost/loss function) encoding the impact of the different classification errors is employed. Thus, the model is learned to optimize the a priori specified cost function. The proposed approach was applied to forecasting an air quality index using current levels of air pollutants and climatic variables from a highly imbalanced dataset. For this problem, the method yielded better results than other standard validation metrics in the less frequent class states. The possibility of fine-tuning the objective validation function can improve the prediction quality in imbalanced data or when asymmetric misclassification costs have to be considered.
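A cost-based validation metric of the kind described above can be illustrated with a small sketch. The class labels, the cost values, and the function names below are hypothetical and chosen only for the example; the paper's actual cost function and Bayesian network classifier are not reproduced.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_misclassification_cost(y_true, y_pred, cost):
    """Average cost per prediction, where cost[true][pred] is the penalty."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative cost matrix (assumed values): missing a "hazardous" state
# is penalized far more than a false alarm on a "good" state.
COST = {
    "good":      {"good": 0, "moderate": 1, "hazardous": 2},
    "moderate":  {"good": 2, "moderate": 0, "hazardous": 1},
    "hazardous": {"good": 10, "moderate": 5, "hazardous": 0},
}

y_true = ["good"] * 8 + ["hazardous"] * 2
pred_a = ["good"] * 8 + ["good", "hazardous"]            # misses one hazardous case
pred_b = ["moderate"] + ["good"] * 7 + ["hazardous"] * 2  # one mild false alarm

cost_a = mean_misclassification_cost(y_true, pred_a, COST)
cost_b = mean_misclassification_cost(y_true, pred_b, COST)
```

Both sets of predictions have the same accuracy, yet their mean costs differ by a factor of ten, which is why a cost function can separate models that accuracy treats as equal on imbalanced data.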


Author(s):  
Lisia Castro Krebs ◽  
Marina Monteiro de Moraes Santos ◽  
Maria Claudia Siqueira ◽  
Brennda Paula Gonçalves de Araujo ◽  
Leonardo Gomes Oliveira ◽  
...  

Abstract: The objective of this work was to characterize the sexual dimorphism of horses of the Campolina breed by morphometric measurements, and to classify them according to sex using discriminant functions. Two hundred and fifteen horses were measured, and 39 morphometric measurements were evaluated. Analysis of covariance and discriminant analysis were performed. Males were taller and showed a wider chest, a greater scapular-humeral angle, and a larger neck, both in length and circumference. Females had a larger heart girth, wider hips, and a greater opening of the coxal-ground and femorotibial angles. Regarding classification, circumference measurements (85.58%) were more accurate in sexual differentiation than the linear (83.26%) and angular (73.02%) ones. As to classification error, of the total animals measured, 10 to 20% of the females were categorized as males. In addition, 11 to 38% of the males were categorized as females. It can be concluded that of the 39 morphometric measurements evaluated, 22 are responsible for sexual dimorphism in the Campolina horse breed. Circumference and linear measurements provide a more accurate classification of sex. Angular measurements show greater classification errors regarding the sex of the horses.

