Combining Predictors

2000 ◽  
Vol 29 (550) ◽  
Author(s):  
Jakob Vogdrup Hansen

The most important theoretical tool in connection with machine learning is the bias/variance decomposition of error functions. Together with Tom Heskes, I have found the family of error functions with a natural bias/variance decomposition that has target-independent variance. It is shown that no other group of error functions can be decomposed in the same way. An open problem in the machine learning community is thereby solved. The error functions are derived from the deviance measure on distributions in the one-parameter exponential family. The family is therefore called the deviance error family.

A bias/variance decomposition can also be viewed as an ambiguity decomposition for an ensemble method. The family of error functions with a natural bias/variance decomposition that has target-independent variance can therefore be of use in connection with ensemble methods.

The logarithmic opinion pool ensemble method has been developed together with Anders Krogh. It is based on the logarithmic opinion pool ambiguity decomposition using the Kullback-Leibler error function. It has been extended to the cross-validation logarithmic opinion pool ensemble method. The advantage of the cross-validation logarithmic opinion pool ensemble method is that it can use unlabeled data to estimate the generalization error, while it still uses the entire labeled example set for training.

The cross-validation logarithmic opinion pool ensemble method is easily reformulated for another error function, as long as the error function has an ambiguity decomposition with target-independent ambiguity. It is therefore possible to use the cross-validation ensemble method on all error functions in the deviance error family.
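The ambiguity decomposition for the logarithmic opinion pool can be checked numerically. The sketch below assumes the commonly stated form of the decomposition, in which the Kullback-Leibler error of the pooled distribution equals the weighted mean member error minus a weighted ambiguity term that does not involve the target. The function names and all numeric values are hypothetical and are not taken from the thesis.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def log_opinion_pool(members, weights):
    """Weighted geometric mean of the member distributions, renormalized."""
    unnormalized = [
        math.exp(sum(w * math.log(m[k]) for m, w in zip(members, weights)))
        for k in range(len(members[0]))
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical target distribution, ensemble members and weights (weights sum to 1).
target = [0.7, 0.2, 0.1]
members = [[0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.8, 0.1, 0.1]]
weights = [0.5, 0.3, 0.2]

pool = log_opinion_pool(members, weights)

# Error of the ensemble, weighted mean error of the members, and ambiguity.
ensemble_error = kl(target, pool)
mean_member_error = sum(w * kl(target, m) for m, w in zip(members, weights))
ambiguity = sum(w * kl(pool, m) for m, w in zip(members, weights))

# The two printed numbers should agree up to floating-point rounding.
print(ensemble_error, mean_member_error - ambiguity)
```

The ambiguity term is computed from the members and the pool alone, so under this assumed form it could be estimated on unlabeled inputs, which matches the advantage the abstract attributes to the cross-validation variant.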

2020 ◽  
Vol 70 (3) ◽  
pp. 599-604
Author(s):  
Şahsene Altinkaya

Abstract In the present investigation, we are concerned with the family of normalized analytic error functions defined by
$$E_{r}f(z)=\frac{\sqrt{\pi z}}{2}\operatorname{erf}(\sqrt{z})=z+\sum_{n=2}^{\infty}\frac{(-1)^{n-1}}{(2n-1)(n-1)!}z^{n}.$$
By making use of the trigonometric polynomials $U_{n}(p, q, e^{i\theta})$ as well as the rule of subordination, we introduce several new classes that consist of 𝔮-starlike and 𝔮-convex error functions. Afterwards, we derive some coefficient inequalities for functions in these classes.
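As a quick sanity check on the series above, the following sketch compares the closed form with a truncated power series for a few real arguments. It is restricted to real z > 0 because Python's `math.erf` accepts only real inputs; the function names and the truncation at 40 terms are illustrative choices, not part of the paper.

```python
import math


def normalized_erf_closed(z):
    """Closed form sqrt(pi * z) / 2 * erf(sqrt(z)) for real z > 0."""
    return math.sqrt(math.pi * z) / 2 * math.erf(math.sqrt(z))


def normalized_erf_series(z, terms=40):
    """Truncated series z + sum_{n>=2} (-1)^(n-1) z^n / ((2n-1) (n-1)!)."""
    total = z
    for n in range(2, terms + 1):
        coefficient = (-1) ** (n - 1) / ((2 * n - 1) * math.factorial(n - 1))
        total += coefficient * z ** n
    return total


# Compare both expressions at a few sample points in (0, 1].
for z in (0.1, 0.5, 1.0):
    print(z, normalized_erf_closed(z), normalized_erf_series(z))
```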


2021 ◽  
Author(s):  
Elisabeth Pfaehler ◽  
Daniela Euba ◽  
Andreas Rinscheid ◽  
Otto S. Hoekstra ◽  
Josee Zijlstra ◽  
...  

Abstract Background: Machine learning studies require a large number of images often obtained on different PET scanners. When merging these images, the use of harmonized images following EARL-standards is essential. However, when including retrospective images, EARL accreditation might not have been in place. The aim of this study was to develop a convolutional neural network (CNN) that can identify retrospectively if an image is EARL compliant and if it is meeting older or newer EARL-standards. Materials and Methods: 96 PET images acquired on three PET/CT systems were included in the study. All images were reconstructed with the locally clinically preferred, EARL1, and EARL2 compliant reconstruction protocols. After image pre-processing, one CNN was trained to separate clinical and EARL compliant reconstructions. A second CNN was optimized to identify EARL1 and EARL2 compliant images. The accuracy of both CNNs was assessed using 5-fold cross validation. The CNNs were validated on 24 images acquired on a PET scanner not included in the training data. To assess the impact of image noise on the CNN decision, the 24 images were reconstructed with different scan durations. Results: In the cross-validation, the first CNN classified all images correctly. When identifying EARL1 and EARL2 compliant images, the second CNN identified 100% EARL1 compliant and 85% EARL2 compliant images correctly. The accuracy in the independent dataset was comparable to the cross-validation accuracy. The scan duration had almost no impact on the results. Conclusion: The two CNNs trained in this study can be used to retrospectively include images in a multi-center setting by e.g. adding additional smoothing. This method is especially important for machine learning studies where the harmonization of images from different PET systems is essential.


Author(s):  
Rahayu Abdul Rahman ◽  
Suraya Masrom ◽  
Normah Omar ◽  
Maheran Zakaria

Corporate tax avoidance reduces government revenues, which could limit a country's development plans. Thus, the main objective of this study is to establish a rigorous and effective model to detect corporate tax avoidance and to assist the government in preventing such practice. This paper presents the fundamental knowledge on the design and implementation of a machine learning model based on five selected algorithms tested on a real dataset of 3,365 Malaysian companies listed on Bursa Malaysia from 2005 to 2015. The performance of each machine learning algorithm on the tested dataset was observed under two training approaches. The accuracy score for each algorithm is better with the cross-validation training approach. Additionally, with the cross-validation training approach, the performance of each machine learning algorithm was tested on different groups of selected features, namely industry, governance, year and firm characteristics. The findings indicate that the machine learning models are more reliable with industry, governance and firm characteristics features than with the single year determinant, mainly with the Random Forest and Logistic Regression algorithms.


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 2131
Author(s):  
Alicja Satoła ◽  
Edyta Agnieszka Bauer

The diagnosis of subclinical ketosis in dairy cows based on blood ketone bodies is a challenging and costly procedure. Scientists are searching for tools based on results of milk performance assessment that would allow monitoring the risk of subclinical ketosis. The objective of the study was (1) to design a scoring system that would allow choosing the best machine learning models for the identification of cows at risk of subclinical ketosis, (2) to select the best performing models, and (3) to validate them using a testing dataset containing unseen data. The scoring system was developed using two machine learning modeling pipelines, one for regression and one for classification. As part of the system, different feature selection, outlier detection, data scaling and oversampling methods were used. Various linear and non-linear models were fit using training datasets and evaluated on holdout testing datasets. For the assessment of suitability of individual models for predicting subclinical ketosis, three β-hydroxybutyrate concentration in blood (bBHB) thresholds were defined: 1.0, 1.2 and 1.4 mmol/L. Considering the thresholds of 1.2 and 1.4 mmol/L, the logistic regression model was found to be the best fitted model, which included independent variables such as fat-to-protein ratio, acetone and β-hydroxybutyrate concentrations in milk, lactose percentage, lactation number and days in milk. In the cross-validation, this model showed an average sensitivity of 0.74 or 0.75 and specificity of 0.76 or 0.78, at the pre-defined bBHB threshold 1.2 or 1.4 mmol/L, respectively. The values of these metrics were also similar in the external validation on the testing dataset (0.72 or 0.74 for sensitivity and 0.80 or 0.81 for specificity). For the bBHB threshold at 1.0 mmol/L, the best classification model was the model based on the SVC (Support Vector Classification) machine learning method, for which the sensitivity in the cross-validation was 0.74 and the specificity was 0.73. These metrics had lower values for the testing dataset (0.57 and 0.72, respectively). Regression models were characterized by a poor fit to the data (R² < 0.4). The study results suggest that the prediction of subclinical ketosis based on data from test-day records using classification methods and machine learning algorithms can be a useful tool for monitoring the incidence of this metabolic disorder in dairy cattle herds.
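The sensitivity and specificity figures quoted above follow from a confusion matrix once a bBHB threshold turns the blood measurement into a binary label. The sketch below illustrates that computation only; the bBHB values and model predictions are invented, and labeling a cow as positive when bBHB is greater than or equal to the threshold is an assumption about the convention, not something the abstract states.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Invented blood bBHB measurements (mmol/L) and invented model predictions.
bbhb = [0.6, 1.3, 0.9, 1.5, 1.1, 0.8, 1.25, 0.7]
predicted = [False, True, False, True, True, False, False, False]

# Assumed convention: a cow counts as positive when bBHB >= threshold.
threshold = 1.2
actual = [value >= threshold for value in bbhb]

sensitivity, specificity = sensitivity_specificity(actual, predicted)
print(sensitivity, specificity)
```

Changing `threshold` to 1.0 or 1.4 relabels the same cows, which is why a model can rank differently at each of the three thresholds.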


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 383-383
Author(s):  
Leonardo Augusto Coelho Ribeiro ◽  
Tiago Bresolin ◽  
Guilherme J M Rosa ◽  
Daniel Rume Casagrande ◽  
Marina De Arruda Camargo Danes ◽  
...  

Abstract Wearable sensors have been adopted as an alternative for real-time monitoring of cattle feeding behavior in grazing systems. However, even when machine learning (ML) techniques are used, confounding effects such as the cross-validation strategy may inflate the prediction quality. Our objective was to evaluate the effect of different cross-validation strategies on the prediction of grazing activities in cattle using wearable sensor data and ML algorithms. Six Nellore bulls (345 ± 21 kg) had their behavior visually classified as grazing or not-grazing for a period of 15 days. Generalized Linear Model (GLM), Random Forest (RF), and Artificial Neural Network (ANN) were employed to predict behavior (grazing or not-grazing) using 3-axis accelerometer data. For each analytical method, three cross-validation strategies were evaluated: holdout, leave-one-animal-out (LOAO), and leave-one-day-out (LODO). Algorithms were trained using similar dataset sizes (holdout: n = 57,862; LOAO: n = 56,786; LODO: n = 56,672). Regardless of the cross-validation strategy, GLM achieved the worst prediction accuracy (53%) compared to the ML techniques (65% for both RF and ANN). ANN performed slightly better than RF for LOAO (73%) and LODO (64%) cross-validation strategies. The holdout yielded the highest accuracy values for all three ML approaches (GLM: 59%, RF: 76%, and ANN: 74%), followed by LODO (58%) and LOAO (55%). In conclusion, the GLM approach was not adequate to predict grazing behavior, regardless of the cross-validation strategy. The greater prediction accuracy observed for holdout cross-validation may simply indicate a lack of data independence and the presence of carry-over effects from animals and grazing management. Our results suggest that generalizing predictive models to unknown (not used for training) animals or grazing management may result in poor prediction quality. The results highlight the need for using biological knowledge to define the validation strategy that is closer to the real-life situation.
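The difference between holdout and leave-one-animal-out validation comes down to how records are assigned to the training and test sets. The following is a minimal sketch of a group-wise split, assuming each record carries an animal identifier; the function name and the identifiers are hypothetical and do not reproduce the authors' pipeline.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical animal identifier for each accelerometer record.
animal_ids = ["A", "A", "B", "B", "C", "C"]

splits = list(leave_one_group_out(animal_ids))
for held_out, train, test in splits:
    # No record from the held-out animal appears in the training indices.
    print(held_out, train, test)
```

Passing a day label per record instead of an animal identifier gives the leave-one-day-out variant. A random holdout, by contrast, can place records from the same animal and day on both sides of the split, which is the lack of independence the abstract warns about.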


Author(s):  
Turan G. Bali ◽  
Amit Goyal ◽  
Dashan Huang ◽  
Fuwei Jiang ◽  
Quan Wen
