Benchmarking algorithms for genomic prediction of complex traits

2019 ◽  
Author(s):  
Christina B. Azodi ◽  
Andrew McCarren ◽  
Mark Roantree ◽  
Gustavo de los Campos ◽  
Shin-Han Shiu

Abstract The usefulness of Genomic Prediction (GP) in crop and livestock breeding programs has led to efforts to develop new and improved GP approaches, including non-linear algorithms such as artificial neural networks (ANN) (i.e. deep learning) and gradient tree boosting. However, the performance of these algorithms has not been compared in a systematic manner using a wide range of GP datasets and models. Using data of 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and five non-linear algorithms, including ANNs. First, we found that hyperparameter selection was critical for all non-linear algorithms and that feature selection prior to model training was necessary for ANNs when the markers greatly outnumbered the training lines. Across all species and trait combinations, no one algorithm performed best; however, predictions based on a combination of results from multiple GP algorithms (i.e. ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits than that of linear algorithms. Although ANNs did not perform best for any trait, we identified strategies (i.e. feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. These results, together with the fact that even small improvements in GP performance could accumulate into large genetic gains over the course of a breeding program, highlight the importance of algorithm selection for the prediction of trait values.
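The idea of an ensemble prediction can be illustrated with a minimal sketch. It assumes the ensemble is a simple unweighted mean of per-line predictions, which the abstract does not specify; the function names, model names, and toy values below are hypothetical and are not taken from the study.

```python
from statistics import mean


def ensemble_mean(predictions_by_model):
    """Average per-line predictions across models (unweighted mean, an assumption)."""
    models = list(predictions_by_model.values())
    n_lines = len(models[0])
    if any(len(p) != n_lines for p in models):
        raise ValueError("all models must predict the same set of lines")
    # zip(*models) groups the predictions that refer to the same line.
    return [mean(values) for values in zip(*models)]


def pearson_r(x, y):
    """Pearson correlation between observed and predicted trait values."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy data (hypothetical): two models whose errors partly cancel out.
observed = [1.0, 2.0, 3.0, 4.0]
preds = {
    "linear_model": [1.5, 1.5, 3.5, 3.5],
    "nonlinear_model": [0.5, 2.5, 2.5, 4.5],
}
ensemble = ensemble_mean(preds)

for name, p in preds.items():
    print(name, round(pearson_r(observed, p), 3))
print("ensemble", round(pearson_r(observed, ensemble), 3))
```

In this toy case the individual errors cancel exactly, which real data will not do; the sketch only shows why combining models can be more stable than relying on any single one.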

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Richard Zuech ◽  
Taghi M. Khoshgoftaar

Abstract Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and Catboost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Catboost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
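For readers unfamiliar with the two metrics named in the research questions, the following is a minimal sketch of how AUC and F1-score are defined for a binary classifier. It is written in plain Python for illustration and is not the authors' evaluation code; the labels, scores, and the 0.5 threshold are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg) and is
    meant for illustration only, not for datasets with millions of instances.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = attack, 0 = benign) and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5 is an assumption

auc_value = roc_auc(y_true, scores)
f1_value = f1_score(y_true, y_pred)
print(auc_value, f1_value)
```

AUC is computed from the scores and does not depend on a threshold, while F1 depends on the thresholded predictions, which is one reason the two metrics can rank classifiers differently.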


2021 ◽  
Author(s):  
Rekha G ◽  
Krishna Reddy V ◽  
chandrashekar jatoth ◽  
Ugo Fiore

Abstract Class imbalance problems have attracted the research community, but few works have focused on feature selection with imbalanced datasets. To handle class imbalance problems, we developed a novel fitness function for feature selection using the chaotic salp swarm optimization algorithm, an efficient meta-heuristic optimization algorithm that has been successfully used in a wide range of optimization problems. This paper proposes an Adaboost algorithm with chaotic salp swarm optimization. The most discriminating features are selected using salp swarm optimization, and Adaboost classifiers are thereafter trained on the selected features. Experiments show the ability of the proposed technique to find optimal features that maximize the performance of Adaboost.


2021 ◽  
Author(s):  
Agnieszka Konkolewska ◽  
Patrick Conaghan ◽  
Dan Milbourne ◽  
Michael Dineen ◽  
Susanne Barth ◽  
...  

2020 ◽  
Vol 36 (10) ◽  
pp. 3011-3017 ◽  
Author(s):  
Olga Mineeva ◽  
Mateo Rojas-Carulla ◽  
Ruth E Ley ◽  
Bernhard Schölkopf ◽  
Nicholas D Youngblut

Abstract Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 2021 (10) ◽  
Author(s):  
Kang Zhou

Abstract We generalize the unifying relations for tree amplitudes to the 1-loop Feynman integrands. By employing the 1-loop CHY formula, we construct differential operators which transmute the 1-loop gravitational Feynman integrand to Feynman integrands for a wide range of theories, including Einstein-Yang-Mills theory, Einstein-Maxwell theory, pure Yang-Mills theory, Yang-Mills-scalar theory, Born-Infeld theory, Dirac-Born-Infeld theory, bi-adjoint scalar theory, the non-linear sigma model, as well as special Galileon theory. The unified web at 1-loop level is established. Under the well-known unitarity cut, the 1-loop level operators factorize into two tree level operators. Such factorization is also discussed.


Author(s):  
Mikhail Sainov

Introduction. The main factor determining the stress-strain state (SSS) of rockfill dams with reinforced concrete faces is the deformability of the dam body material, mostly rockfill. However, the deformation properties of rockfill have not yet been sufficiently studied due to the technical complexity of the matter. Materials and methods. To determine the deformation parameters of rockfill, scientific and technical information on the results of rockfill laboratory tests in stabilometers was collected and analyzed, as well as field data on deformations in existing rockfill dams. After that, the values of the rockfill linear deformation modulus obtained in the laboratory and in the field were compared. The laboratory test results were processed and analyzed to determine the parameters of the non-linear rockfill deformation model. Results. Analysis of the field observation data demonstrates that the deformation of the rockfill in existing dams varies over a wide range: its linear deformation modulus may vary from 30 to 500 MPa. It was found that the results of most rockfill tests conducted in the laboratory, as a rule, approximately correspond to the lower limit of the rockfill deformation modulus variation range in the bodies of existing dams. This can be explained by the discrepancy in density and particle sizes between model and natural soils. Only recently were results of rockfill experimental tests obtained that are comparable with the results of the field measurements. They demonstrate that, depending on the stress state, the rockfill linear deformation modulus may reach 700 MPa. The processing of the results of those experiments made it possible to determine the parameters of the non-linear model describing the deformation of rockfill in the dam body. Conclusions. The obtained data allow for enhancement of the validity of rockfill dam SSS analyses, as well as for studying the impact of the non-linear character of the rockfill deformation on the SSS of reinforced concrete faces of rockfill dams.


Author(s):  
Satenik Harutyunyan ◽  
Davresh Hasanyan

A non-linear theoretical model including bending and longitudinal vibration effects was developed for predicting the magnetoelectric (ME) effects in a laminate bar composite structure consisting of magnetostrictive and piezoelectric multi-layers. If the magnitude of the applied field increases, the deflection rapidly increases and the difference between experimental results and linear predictions becomes large. However, the non-linear predictions based on the present model agree well with the experimental results within a wide range of applied electric field. The results of the analysis are believed to be useful for materials selection and actuator structure design in actuator fabrication. It is shown that the problem for bars of symmetrical structure is not divided into a plane problem and a bending problem. A way of simplifying the solution of the problem is found by an asymptotic method. After solving the problem for a laminated bar, formulae that enable one to change from one-dimensional required quantities to three-dimensional quantities are obtained. The derived analytical expressions for the ME coefficients depend on vibration frequency and other geometrical and physical parameters of laminated composites. Parametric studies are presented to evaluate the influences of material properties and geometries on strain distribution and the ME coefficient. Analytical expressions indicate that the vibration frequency strongly influences the strain distribution in the laminates, and that these effects strongly influence the ME coefficients. It is shown that for certain values of vibration frequency (resonance frequency), the ME coefficient becomes infinite; as a particular case, the low-frequency ME coefficient was derived as well.


2017 ◽  
Vol 121 (1238) ◽  
pp. 553-575 ◽  
Author(s):  
T. Sakthivel ◽  
C. Venkatesan

ABSTRACT The aim of the present study is to develop a relatively simple flight dynamic model which should have the ability to analyse trim, stability and response characteristics of a rotorcraft under various manoeuvring conditions. This study further addresses the influence of numerical aspects of perturbation step size in linearised model identification and integration timestep on non-linear model response. In addition, the effects of inflow models on the non-linear response are analysed. A new updated Drees inflow model is proposed in this study and the applicability of this model in rotorcraft flight dynamics is studied. It is noted that the updated Drees inflow model predicts control response characteristics fairly close to those obtained using dynamic inflow for a wide range of flight conditions such as hover, forward flight and recovery from steady level turn. A comparison is shown between flight test data, the control response obtained from the simple flight dynamic model, and the response obtained using a more detailed aeroelastic and flight dynamic model.
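The sensitivity to perturbation step size mentioned above can be illustrated with a generic central-difference sketch. This is not the authors' rotorcraft model: `toy_dynamics` is a hypothetical scalar stand-in (a sine function) chosen only because its exact derivative is known, and the step sizes are arbitrary.

```python
import math


def central_difference(f, x, h):
    """Estimate df/dx at x by perturbing x by +/- h (central difference)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def toy_dynamics(x):
    """Hypothetical stand-in for a non-linear model output; not a rotorcraft model."""
    return math.sin(x)


trim_point = 0.3                 # hypothetical point about which to linearise
exact = math.cos(trim_point)     # exact derivative of the stand-in function

# Error of the linearised slope for a large, a moderate, and a very small step.
errors = {
    h: abs(central_difference(toy_dynamics, trim_point, h) - exact)
    for h in (1e-1, 1e-5, 1e-13)
}
for h, err in errors.items():
    print(f"h = {h:g}  error = {err:.3e}")
```

A large step suffers from truncation error because the function is not linear over the perturbation, while a very small step typically suffers from floating-point round-off, so an intermediate step is usually the better choice for identifying a linearised model.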


Author(s):  
Awder Mohammed Ahmed ◽  
Adnan Mohsin Abdulazeez

Multi-label classification addresses problems in which more than one class label is assigned to each instance. Many real-world multi-label classification tasks are high-dimensional due to digital technologies, leading to reduced performance of traditional multi-label classifiers. Feature selection is a common and successful approach to tackling this problem by retaining relevant features and eliminating redundant ones to reduce dimensionality. Several feature selection methods have been successfully applied in multi-label learning. Most of those methods are wrapper methods that employ a multi-label classifier in their processes. They run a classifier in each step, which incurs a high computational cost, and thus they suffer from scalability issues. To deal with this issue, filter methods were introduced that evaluate feature subsets using information-theoretic mechanisms instead of running classifiers. Most existing research and review papers deal with feature selection in single-label data, whereas multi-label classification has recently found a wide range of real-world applications such as image classification, emotion analysis, text mining, and bioinformatics. Moreover, researchers have recently focused on applying swarm intelligence methods to select prominent features of multi-label data. To the best of our knowledge, there is no review paper that covers swarm intelligence-based methods for multi-label feature selection. Thus, in this paper, we provide a comprehensive review of different swarm intelligence and evolutionary computing methods of feature selection presented for multi-label classification tasks. To this end, we have investigated most of the well-known and state-of-the-art methods and categorized them from different perspectives. We then provide the main characteristics of the existing multi-label feature selection techniques and compare them analytically. We also introduce benchmarks, evaluation measures, and standard datasets to facilitate research in this field. Moreover, we performed some experiments to compare existing works, and at the end of this survey, some challenges, issues, and open problems of this field are introduced to be considered by researchers in the future.
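The contrast between wrapper and filter methods can be made concrete with a minimal filter-style sketch. It scores each discrete feature by its mutual information with every label and sums the results, so no classifier is ever trained. Summing per-label mutual information is one simple choice made here for illustration, not a method taken from the review; the feature names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


def filter_scores(features, labels):
    """Score each feature by summed mutual information with all label columns.

    features: dict mapping feature name -> list of discrete values per instance
    labels:   list of label columns, each a list of 0/1 values per instance
    """
    return {
        name: sum(mutual_information(column, label) for label in labels)
        for name, column in features.items()
    }


# Toy multi-label data (hypothetical): four instances, two labels.
features = {
    "f_informative": [0, 0, 1, 1],
    "f_noise": [0, 1, 0, 1],
}
labels = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
scores = filter_scores(features, labels)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

A wrapper method would instead retrain and evaluate a multi-label classifier for each candidate feature subset, which is what makes it costly on high-dimensional data.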

