variable selection
Recently Published Documents


Total documents: 3915 (five years: 1036)
H-index: 102 (five years: 11)

2022, Vol 10 (1), pp. 65-72
Author(s): Hongxin Su, Chenchen Zhou, Yi Cao, Shuang-Hua Yang, Zuzhen Ji

2022, Vol 23 (1)
Author(s): Katrin Madjar, Manuela Zucknick, Katja Ickstadt, Jörg Rahnenführer

Global Heart, 2022, Vol 17 (1), pp. 1
Author(s): Mohiul I. Chowdhury, Karam Turk-Adawi, Abraham Samuel Babu, Gabriela Lima De Melo Ghisi, Pamela Seron, ...

2022, Vol 12
Author(s): Neda Gilani, Reza Arabi Belaghi, Younes Aftabi, Elnaz Faramarzi, Tuba Edgünlü, ...

Aim: This study aimed to accurately identify potential miRNAs for the diagnosis of gastric cancer (GC) at the early stages of the disease.
Methods: We used the GSE106817 dataset, with 2,566 miRNAs, to train the machine learning models. We used the Boruta variable selection approach to identify the miRNAs most strongly associated with GC in the training sample. We then validated the prediction models in an independent sample, the GSE113486 dataset. Finally, an ontological analysis was performed on the identified miRNAs to elicit their relevant relationships.
Results: Of the 2,874 patients used to train the model, 115 (4%) had GC. Boruta identified 30 miRNAs as potential biomarkers for GC diagnosis, with hsa-miR-1343-3p ranked highest. All of the machine learning algorithms showed that, using hsa-miR-1343-3p as a biomarker with a cut-off point of 8.2, GC can be predicted with very high precision (AUC 100%, sensitivity 100%, specificity 100%, ROC 100%, kappa 100). In addition, the ontological analysis of the 30 identified miRNAs confirmed their strong relationship with cancer-associated genes and molecular events.
Conclusion: hsa-miR-1343-3p could be a valuable target for studies on GC diagnosis using reliable biomarkers.
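The single-marker rule in the Results can be checked with a few lines of Python. This is a minimal sketch, not the authors' code, and it does not reproduce Boruta: it assumes values at or above the cut-off are called GC (the abstract does not state the direction, so `higher_is_positive` is a parameter), computes sensitivity, specificity and Cohen's kappa for that rule, and computes AUC with the rank-based (Mann-Whitney) formulation. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def threshold_metrics(
    values: Sequence[float],
    labels: Sequence[int],
    cutoff: float,
    higher_is_positive: bool = True,
) -> Dict[str, float]:
    """Score a single-marker cut-off rule against 0/1 labels (1 = GC)."""
    tp = fp = tn = fn = 0
    for v, y in zip(values, labels):
        # Assumption: which side of the cut-off counts as "GC" is a parameter.
        pred = int(v >= cutoff) if higher_is_positive else int(v < cutoff)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from predicted and true class totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else math.nan
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": observed,
        "kappa": kappa,
    }


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random case outscores random control); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With made-up values, `threshold_metrics([9.0, 7.0, 8.5, 6.0], [1, 1, 0, 0], cutoff=8.2)` gives a sensitivity and specificity of 0.5 and a kappa of 0 (agreement no better than chance), while `pairwise_auc` on the same data gives 0.75.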


2022, Vol 22 (1)
Author(s): Matthew Sutton, Pierre-Emmanuel Sugier, Therese Truong, Benoit Liquet

Abstract
Background: Genome-wide association studies (GWAS) have identified genetic variants that are associated with more than one complex disease. This phenomenon, known as pleiotropy, can be leveraged to integrate multiple data sources in a joint analysis. Integrating additional information, such as gene pathway knowledge, can often improve statistical efficiency and biological interpretation. In this article, we propose statistical methods that incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits.
Methods: We propose novel feature selection methods for group variable selection in multi-task regression problems. We develop penalised likelihood methods that use different penalties to induce structured sparsity at the gene (or pathway) and SNP levels across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared with that of a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods.
Results: Our methods are applied to identify potential pleiotropy in a joint analysis of thyroid and breast cancers. The methods detected eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method detected more true signals than a popular competing method while retaining a similar false discovery rate.
Conclusion: We developed feature selection methods for jointly analysing multiple logistic regression tasks when prior grouping knowledge is available. Our method performed well both in simulation studies and in a real data analysis of multiple cancers.
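Penalties that induce sparsity at both the group (gene or pathway) and individual (SNP) level are typically handled inside ADMM through their proximal operator. The abstract does not give the exact penalties, so as an assumption the sketch below uses the single-task sparse group lasso, λ1‖β‖1 + λ2 Σ_g ‖β_g‖2: each coefficient is soft-thresholded, then each group is shrunk as a whole. In scaled ADMM this would serve as the z-update, applied to β + u with `step = 1/ρ`. The grouping and λ values are hypothetical, and the multi-task structure of the paper is not modelled.

```python
import math
from typing import List, Sequence


def soft_threshold(x: float, thr: float) -> float:
    """Lasso shrinkage of one coefficient: sign(x) * max(|x| - thr, 0)."""
    return math.copysign(max(abs(x) - thr, 0.0), x)


def prox_sparse_group_lasso(
    v: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam1: float,
    lam2: float,
    step: float = 1.0,
) -> List[float]:
    """Proximal operator of step * (lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2).

    `groups` is assumed to partition all coefficient indices
    (e.g. hypothetical SNPs grouped by gene).
    """
    out = [0.0] * len(v)
    for g in groups:
        # 1) element-wise soft-thresholding (SNP-level sparsity)
        u = [soft_threshold(v[i], step * lam1) for i in g]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            continue  # the whole group is already zero
        # 2) group-wise shrinkage (gene/pathway-level sparsity)
        scale = max(0.0, 1.0 - step * lam2 / norm)
        for i, x in zip(g, u):
            out[i] = scale * x
    return out
```

For instance, with `v = [3.0, -0.5, 0.2, 0.1]`, groups `[[0, 1], [2, 3]]`, `lam1 = 0.5` and `lam2 = 1.0`, the first coefficient shrinks to 1.5 while the other three are set to zero, so the second group drops out entirely.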


2022, Vol 9
Author(s): Huilong Lin, Yuting Zhao

The source park of the Yellow River (SPYR), a vital ecological shelter on the Qinghai-Tibetan Plateau, has suffered varying degrees of degradation and desertification in recent decades, resulting in soil erosion. Studying the mechanism, influencing factors and current state of soil erosion in the alpine grassland ecosystems of the SPYR is therefore important for protecting their ecological and productive functions. Using the ¹³⁷Cs tracing technique and machine learning algorithms, five machine-learning-based variable selection strategies were applied to identify the minimal optimal variable set and analyze the main factors that influence soil erosion in the SPYR. The optimal model for estimating soil erosion in the SPYR was obtained by comparing the outputs of the RUSLE model with those of machine learning algorithms combined with variable selection models. We then used the optimal model to identify the spatial distribution pattern of soil erosion in the study area. The results indicated that: (1) A comprehensive set of variables is more objective than the RUSLE model. In terms of verification accuracy, the simulated annealing-Cubist model performed best (R = 0.67, RMSD = 1,368 t·km⁻²·a⁻¹), while the RUSLE model performed worst (R = 0.49, RMSD = 1,769 t·km⁻²·a⁻¹). (2) Soil erosion is more severe in the north than in the southeast of the SPYR. The average erosion modulus is 6,460.95 t·km⁻²·a⁻¹, and roughly 99% of the survey region has an intensive erosion modulus (5,000–8,000 t·km⁻²·a⁻¹). (3) Total erosion loss in the SPYR is about 8.45 × 10⁸ t·a⁻¹, which is 12.64 times greater than the allowable soil erosion loss. The monetized economic value of the soil organic carbon (SOC) loss caused by soil erosion across the entire study area was almost $47.90 billion in 2014. These results provide scientific evidence not only for farmers and herdsmen but also for environmental managers and administrators.
In addition, a new ecological policy recommendation was proposed to balance grassland protection and the economic production of animal husbandry, based on the reclassified soil erosion values.
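The best-performing model in this study paired simulated annealing (SA) variable selection with a Cubist regressor. The sketch below shows only a generic SA search over variable subsets; it is an illustration under stated assumptions, not the authors' implementation. The `cost` callable is hypothetical: in practice it would fit a model (Cubist in the study) on the chosen covariates and return a validation error such as RMSD. The geometric cooling schedule and default parameters are arbitrary choices.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple


def anneal_subset(
    n_features: int,
    cost: Callable[[FrozenSet[int]], float],
    n_iter: int = 2000,
    t_start: float = 1.0,
    t_end: float = 1e-3,
    seed: int = 1,
) -> Tuple[FrozenSet[int], float]:
    """Simulated-annealing search over non-empty variable subsets; lower cost is better."""
    rng = random.Random(seed)
    current = frozenset(range(n_features))  # start from the full candidate set
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for step in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(n_iter - 1, 1))
        candidate = current ^ {rng.randrange(n_features)}  # toggle one variable
        if not candidate:
            continue  # skip the empty subset
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Metropolis rule: accept improvements, accept worse moves with prob exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost
```

As a toy check, a hypothetical cost that heavily penalises leaving out variables 0 and 2 and charges a small price per included variable, `lambda s: 10.0 * len({0, 2} - s) + 0.1 * len(s)`, leads `anneal_subset(6, ...)` to the subset {0, 2}. Tracking the best subset seen, rather than only the final state, keeps the result from depending on where the random walk happens to stop.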


Forests, 2022, Vol 13 (1), pp. 62
Author(s): Ying Li, Guozhong Wang, Gensheng Guo, Yaoxiang Li, Brian K. Via, ...

Wood density is a key indicator of tree functionality and end use. Appropriate chemometric methods play an important role in successfully predicting wood density from visible and near-infrared (Vis-NIR) spectroscopy. The objective of this study was to select appropriate pre-processing, variable selection and multivariate calibration techniques to improve the prediction accuracy of density in Chinese white poplar (Populus tomentosa Carrière) wood. The Vis-NIR spectra were pre-processed using four methods (lifting wavelet transform, LWT; wavelet transform, WT; multiplicative scatter correction, MSC; and standard normal variate, SNV), and four variable selection techniques (successive projections algorithm, SPA; uninformative variable elimination, UVE; competitive adaptive reweighted sampling, CARS; and iteratively retains informative variables, IRIV) were compared for reducing the dimensionality of the high-dimensional spectral matrix. Non-linear generalized regression neural network (GRNN) and support vector machine (SVM) models were then built using the selected variables. The best prediction of Chinese white poplar wood density was obtained by the GRNN model combined with LWT pre-processing and CARS variable selection (Rp² = 0.870; RMSEP = 13 kg/m³; RPDp = 2.774).
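The SNV pre-processing step and the reported figures of merit can be sketched in plain Python. This is an illustration under common chemometric definitions, not the authors' pipeline: SNV is applied to each spectrum on its own, RMSEP and Rp² are computed on the prediction set, and RPD is assumed to be the standard deviation of the reference values divided by RMSEP (some papers use SEP or a population SD instead). LWT, CARS and GRNN are not reproduced, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: centre one spectrum on its mean, scale by its SD."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1); some tools divide by n
    return [(x - mu) / sd for x in spectrum]


def rmsep(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of prediction on an independent prediction set."""
    sq = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(sq / len(reference))


def r2_pred(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot, on the prediction set."""
    mu = statistics.mean(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mu) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def rpd(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)
```

If RPD is defined this way, the reported RPDp = 2.774 and RMSEP = 13 kg/m³ imply a standard deviation of roughly 36 kg/m³ in the reference densities of the prediction set (2.774 × 13 ≈ 36.1).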

