General Elements of Genomic Selection and Statistical Learning

Author(s):  
Osval Antonio Montesinos López ◽  
Abelardo Montesinos López ◽  
Jose Crossa

Abstract Nowadays, huge quantities of data are collected and analyzed to deliver deep insights into biological processes and human behavior. This chapter assesses the use of big data for prediction and estimation through statistical machine learning and its applications in agriculture and genetics in general, and specifically, for genome-based prediction and selection. First, we point out the importance of data and how the use of data is reshaping our way of living. We also provide the key elements of genomic selection and its potential for plant improvement. In addition, we analyze elements of modeling with machine learning methods applied to genomic selection and stress their importance as a predictive methodology. Two cultures of model building are analyzed and discussed: prediction and inference; by understanding model building, researchers will be able to select the best model/method for each circumstance. Within this context, we explain the differences between nonparametric models (predictors are constructed according to information derived from data) and parametric models (all the predictors take predetermined forms with the response) as well as their types of effects: fixed, random, and mixed. Basic elements of linear algebra are provided to facilitate understanding the contents of the book. This chapter also contains examples of the different types of data using supervised, unsupervised, and semi-supervised learning methods.

2018 ◽  
Vol 12 ◽  
pp. 117793221875929 ◽  
Author(s):  
Irene Sui Lan Zeng ◽  
Thomas Lumley

Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, integrating knowledge from biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than features, and using a Bayesian approach when there is prior knowledge to be integrated, are also included in the commentary. For the completeness of the review, a table of currently available software and packages for omics from 23 publications is summarized in the appendix.


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2344 ◽  
Author(s):  
Federico Pittino ◽  
Michael Puggl ◽  
Thomas Moldaschl ◽  
Christina Hirschl

Anomaly detection is becoming increasingly important to enhance reliability and resiliency in the Industry 4.0 framework. In this work, we investigate different methods for anomaly detection on in-production manufacturing machines taking into account their variability, both in operation and in wear conditions. We demonstrate how the nature of the available data, featuring any anomaly or not, is of importance for the algorithmic choice, discussing both statistical machine learning methods and control charts. We finally develop methods for automatic anomaly detection, which obtain a recall close to one on our data. Our developed methods are designed not to rely on a continuous recalibration and hand-tuning by the machine user, thereby allowing their deployment in an in-production environment robustly and efficiently.


Author(s):  
Umesh R. Rosyara ◽  
Kate Dreher ◽  
Bhoja R. Basnet ◽  
Susanne Dreisigacker

Abstract This chapter discusses developments with increasing implications for current wheat breeding methodology, such as the rapid evolution of new sequencing and genotyping technologies, the automation of phenotyping, sequencing, and genotyping methods, and the increased use of prediction and machine learning methods. Some of the strategies that will further transform wheat breeding in the next few years are also presented.


2013 ◽  
Vol 765-767 ◽  
pp. 1518-1523
Author(s):  
Fan Hui Meng ◽  
Qing Li Li

Data mining is the technique of finding latent patterns in data through machine learning and statistical learning. This paper focuses on a number of problems in current sports training, discusses the principles of applying data mining technology to sports training, and applies neural networks to forecast athletes' performance. Experimental data show that neural network prediction of athletic performance has very good approximation ability, which indicates broad scope for applying data mining technology.


Author(s):  
Joshua J. Levy ◽  
A. James O’Malley

Abstract Background: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each. Methods: We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions. Results: Preliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output. Conclusions: When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.


Author(s):  
Livier Renteria-Gutierrez ◽  
Lluis A. Belanche-Muñoz ◽  
Felix F. Gonzalez-Navarro ◽  
Margarita Stoytcheva

2018 ◽  
Vol 10 (9) ◽  
pp. 1365 ◽  
Author(s):  
Jacinta Holloway ◽  
Kerrie Mengersen

Interest in statistical analysis of remote sensing data to produce measurements of environment, agriculture, and sustainable development is established and continues to increase, and this is leading to a growing interaction between the earth science and statistical domains. With this in mind, we reviewed the literature on statistical machine learning methods commonly applied to remote sensing data. We focus particularly on applications related to the United Nations World Bank Sustainable Development Goals, including agriculture (food security), forests (life on land), and water (water quality). We provide a review of useful statistical machine learning methods, how they work in a remote sensing context, and examples of their application to these types of data in the literature. Rather than prescribing particular methods for specific applications, we provide guidance, examples, and case studies from the literature for the remote sensing practitioner and applied statistician. In the supplementary material, we also describe the necessary pre- and post-analysis steps for remote sensing data: the pre-processing and evaluation steps.


2018 ◽  
Vol 11 (2) ◽  
pp. 170104 ◽  
Author(s):  
Juan Manuel González‐Camacho ◽  
Leonardo Ornella ◽  
Paulino Pérez‐Rodríguez ◽  
Daniel Gianola ◽  
Susanne Dreisigacker ◽  
...  
