Kernel methods for regression model based on variable selection

Author(s): Sei ichi Ikeda, Yoshiharu Sato

Author(s): Alain J Mbebi, Hao Tong, Zoran Nikoloski

Abstract

Motivation: Genomic selection (GS) is currently deemed the most effective approach to speed up breeding of agricultural varieties. It has been recognized that consideration of multiple traits in GS can improve accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNPs).

Results: Here, we propose an L2,1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L2,1-joint, applicable in multi-trait GS. The use of the L2,1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals, when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches. Prediction and variable selection with datasets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model.

Availability and implementation: The model is implemented in the R programming language and the code is freely available from https://github.com/alainmbebi/L21-norm-GS.

Supplementary information: Supplementary data are available at Bioinformatics online.
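To make the variable-selection effect of an L2,1 penalty concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' L2,1-joint algorithm, and it is not taken from their R code. It swaps in plain proximal gradient descent on a least-squares loss with an L2,1 penalty on the coefficient matrix, so that an entire row (one marker across all traits) is either kept or set to zero. The function names (`l21_norm`, `prox_l21`, `fit_l21`), the simulated data, and the choice of penalty weight are all illustrative assumptions.

```python
import numpy as np

def l21_norm(B):
    """L2,1 norm: sum of the Euclidean norms of the rows of B."""
    return float(np.sum(np.linalg.norm(B, axis=1)))

def prox_l21(B, t):
    """Row-wise soft thresholding, the proximal operator of t * ||B||_{2,1}."""
    row_norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(row_norms, 1e-12))
    return scale * B

def fit_l21(X, Y, lam, n_iter=500):
    """Proximal gradient descent on 0.5 * ||Y - X B||_F^2 + lam * ||B||_{2,1}.

    Illustrative stand-in only; not the L2,1-joint algorithm from the paper.
    """
    B = np.zeros((X.shape[1], Y.shape[1]))
    # Step size 1/L, where L is the squared spectral norm of X.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        B = prox_l21(B - step * grad, step * lam)
    return B

# Simulated toy data (assumption): far more markers than individuals,
# with only the first 5 markers affecting the 3 traits.
rng = np.random.default_rng(0)
n, p, q = 30, 100, 3
X = rng.standard_normal((n, p))
B_true = np.zeros((p, q))
B_true[:5] = rng.standard_normal((5, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

# At or above lam_max every row is zero; a fraction of it gives a sparse fit.
lam_max = np.max(np.linalg.norm(X.T @ Y, axis=1))
B_hat = fit_l21(X, Y, lam=0.3 * lam_max)

# Markers whose whole coefficient row is non-zero count as selected.
selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)
print(len(selected), "of", p, "markers selected")
```

Because the penalty acts on whole rows, a marker is selected or discarded jointly for all traits, which is the property the abstract relies on for multi-trait variable selection.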


Author(s): Chenyang Song, Liguo Wang, Zeshui Xu

The logistic regression model is one of the most widely used classification models. In some practical situations, small samples and large amounts of uncertain information make the traditional logistic regression harder to apply. This paper takes advantage of the hesitant fuzzy set (HFS) in depicting uncertain information and develops a logistic regression model under a hesitant fuzzy environment. Considering the complexity and uncertainty in applying this logistic regression, the concept of hesitant fuzzy information flow (HFIF) and the correlation coefficient between HFSs are introduced to determine the main factors. To better handle small samples, a new optimized method based on maximum entropy estimation is also proposed to determine the parameters. The Levenberg–Marquardt algorithm (LMA) is then developed under the hesitant fuzzy environment to solve the parameter estimation problem with few samples and uncertain information in the logistic regression model. A specific implementation process for the optimized logistic regression model based on maximum entropy estimation under the hesitant fuzzy environment is also provided. Moreover, we apply the proposed model to the prediction of Emergency Extreme Air Pollution Events (EEAPE). A comparative analysis and a sensitivity analysis are further conducted to illustrate the advantages of the optimized logistic regression model under the hesitant fuzzy environment.
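As a rough illustration of the Levenberg–Marquardt idea mentioned above, here is a minimal sketch in Python with NumPy. It is not the paper's method: it works on ordinary crisp data rather than hesitant fuzzy sets, and it minimizes a squared-residual objective rather than the maximum entropy formulation. The function names (`sigmoid`, `lm_logistic`), the damping schedule, and the twelve-point dataset are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lm_logistic(X, y, n_iter=50, damping=1.0):
    """Levenberg-Marquardt on the residuals r(w) = y - sigmoid(X @ w).

    Crisp-data stand-in; the hesitant fuzzy version is not reproduced here.
    """
    w = np.zeros(X.shape[1])
    p = sigmoid(X @ w)
    loss = 0.5 * np.sum((y - p) ** 2)
    for _ in range(n_iter):
        r = y - p
        # Jacobian of the residuals with respect to w.
        J = -(p * (1.0 - p))[:, None] * X
        # Damped Gauss-Newton system: (J^T J + damping * I) step = -J^T r.
        A = J.T @ J + damping * np.eye(X.shape[1])
        step = np.linalg.solve(A, -J.T @ r)
        w_new = w + step
        p_new = sigmoid(X @ w_new)
        loss_new = 0.5 * np.sum((y - p_new) ** 2)
        if loss_new < loss:
            # Accept the step and trust the quadratic model more.
            w, p, loss = w_new, p_new, loss_new
            damping *= 0.5
        else:
            # Reject the step and move toward gradient descent.
            damping *= 2.0
    return w, loss

# Small, non-separable toy sample (assumption): intercept plus one feature.
x = np.linspace(-2.0, 2.0, 12)
X = np.column_stack([np.ones_like(x), x])
y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

w, loss = lm_logistic(X, y)
print("weights:", w, "loss:", loss)
```

The damping term is what distinguishes this from a plain Gauss-Newton update: it keeps the linear system well conditioned when only a few samples are available, which is the small-sample setting the abstract has in mind.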

