variable screening
Recently Published Documents



2022 ◽  
Vol 16 (1) ◽  
Author(s):  
Baoying Yang ◽  
Wenbo Wu ◽  
Xiangrong Yin

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Sunwoo Han ◽  
Brian D. Williamson ◽  
Youyi Fong

Abstract Background: While random forests are one of the most successful machine learning methods, their performance must be optimized for datasets that result from a two-phase sampling design with a small number of cases, a common situation in biomedical studies, which often have rare outcomes and covariates whose measurement is resource-intensive. Methods: Using an immunologic marker dataset from a phase III HIV vaccine efficacy trial, we seek to optimize random forest prediction performance using combinations of variable screening, class balancing, weighting, and hyperparameter tuning. Results: Our experiments show that while class balancing helps improve random forest prediction performance when variable screening is not applied, class balancing has a negative impact on performance in the presence of variable screening. The impact of weighting similarly depends on whether variable screening is applied. Hyperparameter tuning is ineffective in situations with small sample sizes. We further show that random forests under-perform generalized linear models for some subsets of markers, that prediction performance on this dataset can be improved by stacking random forests and generalized linear models trained on different subsets of predictors, and that the extent of improvement depends critically on the dissimilarities between candidate learner predictions. Conclusion: In small datasets from two-phase sampling designs, variable screening and inverse sampling probability weighting are important for achieving good prediction performance of random forests. In addition, stacking random forests and simple linear models can offer improvements over random forests.
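The inverse sampling probability weighting mentioned in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes a simplified two-phase design in which every case is measured and controls are subsampled at a known rate, and the function names and the sampling rates (`p_case`, `p_control`) are hypothetical.

```python
def inverse_probability_weights(is_case, p_case=1.0, p_control=0.25):
    """Return one weight per phase-two subject: 1 / P(subject was sampled).

    p_case and p_control are assumed, illustrative sampling probabilities.
    """
    return [1.0 / (p_case if case else p_control) for case in is_case]


def weighted_case_fraction(is_case, weights):
    """Estimate the full-cohort case fraction from the weighted subsample."""
    total = sum(weights)
    return sum(w for case, w in zip(is_case, weights) if case) / total


# Hypothetical phase-two sample: 4 cases (all kept) and 6 controls (1 in 4 kept).
is_case = [True] * 4 + [False] * 6
weights = inverse_probability_weights(is_case)

naive = sum(is_case) / len(is_case)                   # ignores the sampling design
weighted = weighted_case_fraction(is_case, weights)   # each control stands for 4
print(naive, weighted)
```

In this toy sample each control carries weight 4, so the weighted case fraction is much smaller than the naive one. Weights of this kind are what a learner would receive as per-sample weights; how they enter the fitting of a random forest depends on the software used.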


2021 ◽  
pp. 1-21
Author(s):  
Stephen R. Kodish ◽  
Ben G.S. Allen ◽  
Halidou Salou ◽  
Teresa R. Schwendler ◽  
Sheila Isanaka

Abstract Objective: The Three Delays Model is a conceptual model traditionally used to understand contributing factors of maternal mortality. It posits that most barriers to health services utilization occur in relation to one of three delays: Delay 1: delayed decision to seek care; Delay 2: delayed arrival at health facility; Delay 3: delayed provision of adequate care. We applied this model to understand why community-based management of acute malnutrition (CMAM) services may have low coverage. Design: We conducted a Semi-Quantitative Evaluation of Access and Coverage (SQUEAC) over three phases using mixed methods to estimate program coverage and barriers to care. In this manuscript, we present findings from 51 semi-structured interviews with caregivers and program staff, as well as 72 structured interviews among caregivers only. Recurring themes were organized and interpreted using the Three Delays Model. Setting: Madaoua, Niger. Participants: 123 caregivers and CMAM program staff. Results: Overall, 11 barriers to CMAM services were identified in this setting. Five barriers contribute to Delay 1, including lack of knowledge around malnutrition and CMAM services, as well as limited family support, variable screening services, and alternative treatment options. High travel costs, long distances, poor roads, and competing demands were challenges associated with accessing care (Delay 2). Finally, upon arrival at health facilities, differential caregiver experiences around quality of care contributed to Delay 3. Conclusions: The Three Delays Model was a useful model to conceptualize the factors associated with CMAM uptake in this context, enabling implementing agencies to address specific barriers through targeted activities.


2021 ◽  
Author(s):  
Ke Wang ◽  
Zhu-yun Yan ◽  
Yun-tong Ma ◽  
Bo Li ◽  
Wei Wang ◽  
...  

Abstract Background: Enzyme activities play a very important role in metabolism. Carbon (C) and nitrogen (N) are the two most basic elements for plant growth and development, and their coupling makes C:N an important index for exploring plant element allocation and adaptation strategies. Although the activities of key enzymes in carbon and nitrogen metabolism and of defense enzymes are often used as indexes of the physiological and biochemical characteristics of plants, their relationship with biomass is still poorly understood. In this paper, under controlled conditions, biomass and 18 physiological and biochemical indexes were obtained from 24 groups of experiments on regenerated seedlings of Salvia miltiorrhiza grafted with 9 endophytic fungal strains. Results: The data were analyzed by descriptive statistical analysis, Lasso variable screening analysis, and MLP neural network regression analysis. The results show that many physiological and biochemical indexes are related to biomass, and that glutamine synthetase (GS), glutamate synthase (GLS), glutamate dehydrogenase (GDH), peroxidases (POD), catalase (CAT), and soluble protein are the key factors affecting the biomass synthesis of Salvia miltiorrhiza. Conclusion: This paper discusses the relationship between physiological and biochemical indexes and biomass in a comprehensive and systematic way using the "Build-Design-Calculate-Test" framework. Through a rigorous logical reasoning process, the factors affecting the growth of Salvia miltiorrhiza are selected and a mathematical model is established. The approach also provides a powerful tool for the comprehensive and systematic study of plant growth and the synthesis of effective components.


Author(s):  
Zhennan Liu ◽  
Qiongfang Li ◽  
Jingnan Zhou ◽  
Weiguo Jiao ◽  
Xiaoyu Wang

2021 ◽  
pp. 096228022110172
Author(s):  
Abhik Ghosh ◽  
Magne Thoresen

Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, Sure Independence Screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and, particularly in small samples, this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. Finally, we illustrate their use in a study on regulation of lipid metabolism.
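For readers unfamiliar with the pre-screening step, the following is a minimal sketch of the basic, non-robust idea behind SIS for a linear model: rank predictors by the absolute value of their marginal correlation with the response and keep the top few. It does not implement the DPD-SIS procedure described in the abstract; the function names and toy data are hypothetical.

```python
import math


def marginal_correlation(x, y):
    """Pearson correlation between one predictor column and the response."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, y, keep):
    """Return the sorted indices of the `keep` columns with largest |correlation|."""
    scores = [abs(marginal_correlation(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data (made up): four candidate predictors, six observations.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = [
    [2.1, 3.9, 6.2, 8.1, 9.8, 12.0],  # increases with y
    [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],   # little relation to y
    [6.0, 5.1, 3.9, 3.0, 2.2, 0.9],   # decreases with y
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # constant
]
selected = sis_screen(columns, y, keep=2)
print(selected)
```

Because the ranking relies on ordinary sample correlations, a single extreme observation can change which columns are kept, which is the vulnerability to outliers that the abstract refers to.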

