Human Induction in Machine Learning

2021 ◽  
Vol 54 (3) ◽  
pp. 1-18
Author(s):  
Petr Spelda ◽  
Vit Stritecky

As our epistemic ambitions grow, the common and scientific endeavours are becoming increasingly dependent on Machine Learning (ML). The field rests on a single experimental paradigm, which consists of splitting the available data into a training and testing set and using the latter to measure how well the trained ML model generalises to unseen samples. If the model reaches acceptable accuracy, then an a posteriori contract comes into effect between humans and the model, supposedly allowing its deployment to target environments. Yet the latter part of the contract depends on human inductive predictions or generalisations, which infer a uniformity between the trained ML model and the targets. The article asks how we justify the contract between human and machine learning. It is argued that the justification becomes a pressing issue when we use ML to reach “elsewhere” in space and time or deploy ML models in non-benign environments. The article argues that the only viable version of the contract can be based on optimality (instead of on reliability, which cannot be justified without circularity) and aligns this position with Schurz's optimality justification. It is shown that when dealing with inaccessible/unstable ground-truths (“elsewhere” and non-benign targets), the optimality justification undergoes a slight change, which should reflect critically on our epistemic ambitions. Therefore, the study of ML robustness should involve not only heuristics that lead to acceptable accuracies on testing sets. The justification of human inductive predictions or generalisations about the uniformity between ML models and targets should be included as well. Without it, the assumptions about inductive risk minimisation in ML are not addressed in full.
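As a concrete illustration of the experimental paradigm the abstract describes, here is a minimal sketch of the train/test contract, assuming a generic tabular dataset and scikit-learn; the classifier and the "acceptable accuracy" threshold are placeholders, not the authors' setup:

```python
# Minimal sketch of the train/test paradigm: accuracy on a held-out test set
# is taken as evidence that the trained model generalises to unseen samples.
# The dataset, the classifier and the 0.90 "acceptable accuracy" threshold
# are illustrative assumptions, not taken from the article.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

# The "a posteriori contract": deployment is licensed only if the test
# accuracy clears a chosen threshold -- the inductive step under scrutiny.
ACCEPTABLE = 0.90
print(f"test accuracy = {test_accuracy:.3f}; deploy = {test_accuracy >= ACCEPTABLE}")
```

Everything beyond the final line, i.e. deploying the model to a target environment because test accuracy clears the threshold, is exactly the human inductive generalisation whose justification the article examines.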

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Runzhi Zhang ◽  
Alejandro R. Walker ◽  
Susmita Datta

Abstract Background The composition of microbial communities can be location-specific, and differences in taxon abundance between locations could help us to unravel city-specific signatures and accurately predict the origin locations of samples. In this study, whole genome shotgun (WGS) metagenomics data from samples across 16 cities around the world and samples from another 8 cities were provided as the main and mystery datasets, respectively, as part of the CAMDA 2019 MetaSUB “Forensic Challenge”. Feature selection, normalization, three machine learning methods, PCoA (Principal Coordinates Analysis) and ANCOM (Analysis of Composition of Microbiomes) were applied to both the main and mystery datasets. Results Feature selection, combined with the machine learning methods, revealed that the combination of the common features was effective for predicting the origin of the samples. Average error rates of 11.93% and 30.37% across the three machine learning methods were obtained for the main and mystery datasets, respectively. Using the samples from the main dataset to predict the labels of samples from the mystery dataset, nearly 89.98% of the test samples could be correctly labeled as “mystery” samples. PCoA showed that nearly 60% of the total variability of the data could be explained by the first two PCoA axes. Although many cities overlapped, the separation of some cities was visible in the PCoA. The results of ANCOM, combined with importance scores from the Random Forest, indicated that the common “family” and “order” features of the main dataset and the common “order” features of the mystery dataset provided the most informative signals for prediction. Conclusions The results of the classification suggest that the composition of the microbiomes was distinctive across the cities, which could be used to identify the sample origins. This was also supported by the results from ANCOM and the importance scores from the RF. In addition, the accuracy of the prediction could be improved by more samples and greater sequencing depth.
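To make the pipeline concrete, here is a minimal sketch of the kind of analysis the abstract describes: a Random Forest predicting city of origin from taxon abundances, with a cross-validated error rate and feature importance scores. The synthetic table, its shape and its column names are assumptions, not the CAMDA MetaSUB data:

```python
# Sketch: predict a sample's city of origin from taxon relative abundances
# and inspect which taxa drive the prediction. Synthetic compositions stand
# in for the CAMDA MetaSUB tables; shapes and labels are assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_taxa, n_cities = 300, 50, 16
X = pd.DataFrame(rng.dirichlet(np.ones(n_taxa), size=n_samples),
                 columns=[f"taxon_{i}" for i in range(n_taxa)])
y = rng.integers(0, n_cities, size=n_samples)  # city-of-origin labels

rf = RandomForestClassifier(n_estimators=500, random_state=0)
error_rate = 1.0 - cross_val_score(rf, X, y, cv=5).mean()
print(f"cross-validated error rate: {error_rate:.2%}")

# Importance scores: which taxa are most informative for the city labels
rf.fit(X, y)
top_taxa = pd.Series(rf.feature_importances_, index=X.columns).nlargest(10)
print(top_taxa)
```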


Materials ◽  
2022 ◽  
Vol 15 (2) ◽  
pp. 643
Author(s):  
Paul Meißner ◽  
Jens Winter ◽  
Thomas Vietor

A neural network (NN)-based method is presented in this paper which allows the identification of parameters for material cards used in Finite Element simulations. In contrast to the conventional, computationally intensive material parameter identification (MPI) by numerical optimization with internal or commercial software, a machine learning (ML)-based method is time-saving when used repeatedly. Within this article, a self-developed ML-based Python framework is presented, which offers advantages especially in the development of structural components during early development phases. In this procedure, different machine learning methods are used and adapted to the specific MPI problem considered herein. Using the developed NN-based method and the common optimization-based method with LS-OPT, the material parameters of the LS-DYNA material card MAT_187_SAMP-1 and the failure model GISSMO were calibrated, as an example, for a virtually generated test dataset. Parameters for the description of elasticity, plasticity, tension–compression asymmetry, variable plastic Poisson’s ratio (VPPR), strain rate dependency and failure were taken into account. The focus of this paper is a comparative study of the two MPI methods with varying settings (algorithms, hyperparameters, etc.). Furthermore, the applicability of the NN-based procedure to the specific usage of both material cards was investigated. The studies reveal the general applicability of the approach for calibrating a complex material card, demonstrated using MAT_187_SAMP-1 as an example.
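As an illustration of the idea of NN-based inverse parameter identification, the sketch below trains a network to map simulated response curves back to the parameters that produced them; the `toy_simulator`, its parameters and the use of scikit-learn's MLPRegressor are illustrative assumptions and do not correspond to LS-DYNA, MAT_187_SAMP-1 or the authors' framework:

```python
# Sketch of NN-based inverse parameter identification: a network learns the
# mapping from simulated response curves back to the material parameters that
# produced them. The toy_simulator below is an illustrative stand-in for an
# FE simulation, NOT LS-DYNA / MAT_187_SAMP-1 or the authors' framework.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
strain = np.linspace(0.0, 0.2, 50)

def toy_simulator(E, sigma_y, n):
    """Placeholder elastic/hardening law standing in for an FE run."""
    return np.minimum(E * strain, sigma_y * (1.0 + strain) ** n)

# Sample parameter sets (E, sigma_y, n) and "simulate" their response curves
params = rng.uniform([1000.0, 20.0, 0.1], [3000.0, 80.0, 0.5], size=(2000, 3))
curves = np.array([toy_simulator(*p) for p in params])

# Inverse model: response curve -> material parameters
X_train, X_test, y_train, y_test = train_test_split(curves, params, random_state=0)
nn = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=2000, random_state=0)
nn.fit(X_train, y_train)
print("R^2 on held-out curves:", nn.score(X_test, y_test))
```

Once such an inverse model is trained, new response curves can be mapped to parameter estimates almost instantly, which is why an ML-based approach pays off when the identification has to be repeated many times.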


2020 ◽  
Author(s):  
Runzhi Zhang ◽  
Alejandro R. Walker ◽  
Susmita Datta

Abstract Background The composition of microbial communities can be location-specific, and differences in taxon abundance between locations could help us to unravel city-specific signatures and accurately predict the origin locations of samples. In this study, whole genome shotgun (WGS) metagenomics data from samples across 16 cities around the world and samples from another 8 cities were provided as the main and mystery datasets, respectively, as part of the CAMDA 2019 MetaSUB “Forensic Challenge”. Feature selection, normalization, three machine learning methods, PCoA (Principal Coordinates Analysis) and ANCOM (Analysis of Composition of Microbiomes) were applied to both the main and mystery datasets. Results Feature selection, combined with the machine learning methods, revealed that the combination of the common features was effective for predicting the origin of the samples. Average error rates of 11.6% and 30.0% across the three machine learning methods were obtained for the main and mystery datasets, respectively. Using the samples from the main dataset to predict the labels of samples from the mystery dataset, nearly 89.98% of the test samples could be correctly labeled as “mystery” samples. PCoA showed that nearly 60% of the total variability of the data could be explained by the first two PCoA axes. Although many cities overlapped, the separation of some cities was visible in the PCoA. The results of ANCOM, combined with importance scores from the Random Forest, indicated that the common “family” and “order” features of the main dataset and the common “order” features of the mystery dataset provided the most informative signals for prediction. Conclusions The results of the classification suggest that the composition of the microbiomes was distinctive across the cities, which was also supported by the results from ANCOM and the importance scores from the RF. The analysis used in this study can be of great help in the field of forensic science for efficiently predicting the origin of samples, and the accuracy of the prediction could be improved by more samples and greater sequencing depth.


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Brian Ayers ◽  
Toumas Sandhold ◽  
Igor Gosev ◽  
Sunil Prasad ◽  
Arman Kilic

Introduction: Prior risk models for predicting survival after orthotopic heart transplantation (OHT) have displayed only modest discriminatory capability. With increasing interest in the application of machine learning (ML) to predictive analytics in clinical medicine, this study aimed to evaluate whether modern ML techniques could improve risk prediction in OHT. Methods: Data from the United Network for Organ Sharing registry were collected for all adult patients who underwent OHT from 2000 through 2019. The primary outcome was one-year post-transplant mortality. Dimensionality reduction and data re-sampling were employed during training. The final ensemble model was created from 100 different models of each algorithm: deep neural network, logistic regression, adaboost, and random forest. Discriminatory capability was assessed using the area under the receiver-operating-characteristic curve (AUROC), net reclassification index (NRI), and decision curve analysis (DCA). Results: Of the 33,657 study patients, 26,926 (80%) were randomly selected for the training set and 6,731 (20%) for a separate testing set. One-year mortality was balanced between cohorts (11.0% vs 11.3%). The best performance was achieved by the final ensemble ML model, which demonstrated an improved AUROC of 0.764 (95% CI, 0.745-0.782) in the testing set compared to the other models (Figure). Additionally, the final model demonstrated an improvement of 72.9% ±3.8% (p<0.001) in predictive performance as assessed by NRI compared to logistic regression. The DCA showed that the final ensemble method improved risk prediction across the entire spectrum of predicted risk compared to all other models (p<0.001). Conclusions: An ensemble ML model achieved greater predictive performance than individual ML models as well as logistic regression for predicting survival after OHT. This analysis demonstrates the promise of ML techniques for risk prediction in OHT.
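For illustration, here is a minimal sketch of a soft-voting ensemble over the four algorithm families named in the abstract, scored by AUROC; synthetic data stands in for the UNOS registry, and the actual study combined 100 models per algorithm with dimensionality reduction and re-sampling, which this sketch omits:

```python
# Sketch of a soft-voting ensemble over the four algorithm families named in
# the abstract, scored by AUROC. Synthetic data replaces the UNOS registry;
# only the ~11% event rate and the 80/20 split mirror the text.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=30, weights=[0.89, 0.11],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("ada", AdaBoostClassifier(random_state=0)),
                ("nn", MLPClassifier(max_iter=1000, random_state=0))],
    voting="soft")  # average the predicted probabilities of the base learners
ensemble.fit(X_tr, y_tr)
print(f"ensemble test AUROC: {roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1]):.3f}")
```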


Author(s):  
Paul K. Moser

A prominent term in theory of knowledge since the seventeenth century, ‘a posteriori’ signifies a kind of knowledge or justification that depends on evidence, or warrant, from sensory experience. A posteriori truth is truth that cannot be known or justified independently of evidence from sensory experience, and a posteriori concepts are concepts that cannot be understood independently of reference to sensory experience. A posteriori knowledge contrasts with a priori knowledge, knowledge that does not require evidence from sensory experience. A posteriori knowledge is empirical, experience-based knowledge, whereas a priori knowledge is non-empirical knowledge. Standard examples of a posteriori truths are the truths of ordinary perceptual experience and the natural sciences; standard examples of a priori truths are the truths of logic and mathematics. The common understanding of the distinction between a posteriori and a priori knowledge as the distinction between empirical and non-empirical knowledge comes from Kant’s Critique of Pure Reason (1781/1787).


Cancers ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 328 ◽  
Author(s):  
Patrizia Ferroni ◽  
Fabio Zanzotto ◽  
Silvia Riondino ◽  
Noemi Scarpato ◽  
Fiorella Guadagni ◽  
...  

Machine learning (ML) has been recently introduced to develop prognostic classification models that can be used to predict outcomes in individual cancer patients. Here, we report the significance of an ML-based decision support system (DSS), combined with random optimization (RO), to extract prognostic information from routinely collected demographic, clinical and biochemical data of breast cancer (BC) patients. A DSS model was developed in a training set (n = 318), whose performance analysis in the testing set (n = 136) resulted in a C-index for progression-free survival of 0.84, with an accuracy of 86%. Furthermore, the model was capable of stratifying the testing set into two groups of patients with low- or high-risk of progression with a hazard ratio (HR) of 10.9 (p < 0.0001). Validation in multicenter prospective studies and appropriate management of privacy issues in relation to digital electronic health records (EHR) data are presently needed. Nonetheless, we may conclude that the implementation of ML algorithms and RO models into EHR data might help to achieve prognostic information, and has the potential to revolutionize the practice of personalized medicine.
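As a sketch of the two headline metrics, the snippet below computes a C-index for a continuous risk score and a hazard ratio between median-split risk groups using the lifelines library on synthetic survival data; the score, the data and the median cut-off are assumptions, not the DSS/RO model or the BC cohort:

```python
# Sketch of the two headline metrics: a C-index for a continuous risk score
# and a hazard ratio between low- and high-risk groups (median split), using
# lifelines on synthetic survival data -- not the DSS/RO model or BC cohort.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 450
risk_score = rng.normal(size=n)                    # e.g. output of a prognostic model
time = rng.exponential(scale=np.exp(-risk_score))  # higher score -> earlier progression
event = (rng.random(n) < 0.7).astype(int)          # ~70% observed events

# Concordance index: higher predicted risk should correspond to shorter survival
print("C-index:", concordance_index(time, -risk_score, event))

# Stratify at the median risk score and estimate the hazard ratio (high vs low)
df = pd.DataFrame({"time": time, "event": event,
                   "high_risk": (risk_score > np.median(risk_score)).astype(int)})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print("HR (high vs low risk):", float(np.exp(cph.params_["high_risk"])))
```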


2016 ◽  
Vol 130 (2) ◽  
pp. 146 ◽  
Author(s):  
William D. Halliday

Diet is an important aspect of the natural history of all animals, but diet can vary through space and time because of variations in prey availability. The diet of the Common Gartersnake (Thamnophis sirtalis) consists mainly of earthworms and frogs, but other prey items might be important when they are locally abundant. I report an observation of a female Eastern Gartersnake (Thamnophis sirtalis sirtalis) regurgitating 2 nestling birds in Ottawa, Ontario, Canada. Birds are seldom present in the diet of the Common Gartersnake. This rare food choice highlights the opportunistic nature of foraging by adult Common Gartersnakes and, further, demonstrates that diet depends not only on prey preference, but also on prey availability.


2020 ◽  
Author(s):  
Arnaud Adam ◽  
Isabelle Thomas

Transport geography has always been characterized by a lack of accurate data, so surveys are often based on samples that are not spatially representative. The current deluge of data collected through sensors, however, promises to overcome this scarcity. We consider one example here: since April 1st 2016, a GPS tracker has been mandatory in every truck circulating in Belgium for kilometre-based taxation. Every 30 seconds, this tracker records the position of the truck (as well as other information such as speed and direction), enabling individual taxation of trucks. This contribution uses a one-week exhaustive database containing all trucks circulating in Belgium in order to understand transport flows within the country, as well as the spatial effects of the taxation on the circulation of trucks.

Machine learning techniques are applied to over 270 million GPS points to detect truck stops, transforming the GPS sequences into a complete Origin-Destination matrix. Machine learning makes it possible to accurately classify stops of different natures (leisure stops, (un-)loading areas, or congested roads). Based on this matrix, we first propose an overview of daily traffic, as well as an evaluation of the number of stops made in every Belgian place. Secondly, GPS sequences and stops are combined to characterise the sub-trajectories of each truck (first/last miles and transit) by their fiscal debit. This individual characterisation, as well as its variation in space and time, is discussed here: is the individual taxation system always efficient in space and time?

This contribution helps to better understand the circulation of trucks in Belgium, the places where they stop, and the importance of those locations from a fiscal point of view. What modifications of truck routes could lead to a more sustainable kilometre taxation? This contribution illustrates that combining big data and machine learning opens new avenues for accurately measuring and modelling transportation.
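For illustration, the sketch below shows one simple way to turn 30-second GPS pings into stops and an Origin-Destination matrix with pandas: flag low-speed pings, merge consecutive low-speed runs per truck, and keep runs longer than a dwell threshold. Column names, thresholds and the zoning step are assumptions, not the authors' ML-based pipeline, which additionally classifies the nature of each stop:

```python
# Sketch: turn 30-second GPS pings into stops and an Origin-Destination
# matrix. Column names (truck_id, timestamp, speed, zone), the speed and
# dwell thresholds, and the zoning itself are illustrative assumptions.
import pandas as pd

def detect_stops(pings: pd.DataFrame,
                 max_speed_kmh: float = 3.0,
                 min_dwell_s: float = 300.0) -> pd.DataFrame:
    """pings: columns truck_id, timestamp (datetime), speed (km/h), zone."""
    pings = pings.sort_values(["truck_id", "timestamp"]).copy()
    pings["is_slow"] = pings["speed"] < max_speed_kmh
    # Label consecutive runs of slow / moving pings within each truck
    pings["run"] = (pings.groupby("truck_id")["is_slow"]
                          .transform(lambda s: (s != s.shift()).cumsum()))
    stops = (pings[pings["is_slow"]]
             .groupby(["truck_id", "run"])
             .agg(start=("timestamp", "min"), end=("timestamp", "max"),
                  zone=("zone", "first"))
             .reset_index())
    stops["dwell_s"] = (stops["end"] - stops["start"]).dt.total_seconds()
    # Keep only runs long enough to count as a genuine stop
    return stops[stops["dwell_s"] >= min_dwell_s]

def od_matrix(stops: pd.DataFrame) -> pd.DataFrame:
    """Chain successive stops of each truck into origin-destination counts."""
    stops = stops.sort_values(["truck_id", "start"]).copy()
    stops["next_zone"] = stops.groupby("truck_id")["zone"].shift(-1)
    trips = stops.dropna(subset=["next_zone"])
    return trips.groupby(["zone", "next_zone"]).size().unstack(fill_value=0)
```

The same stop-to-trip chaining can then be joined with per-segment fiscal debit to characterise first/last miles and transit separately, which is the kind of analysis the contribution discusses.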

