High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) Algorithm in Classifying Height Indicators Through Social-life and Well-being Factors

Author(s):  
Ziqian Zhuang ◽  
Wei Xu ◽  
Rahi Jain

Introduction: The High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) algorithm can incorporate interaction terms and be combined with existing feature selection techniques. Simulation studies have validated the ability of HDSI-BO to select true features and, consequently, to improve prediction accuracy compared with standard algorithms. Our goal is to assess the applicability of HDSI-BO in combination with different techniques and to measure its predictive performance in a real data study predicting height indicators from social-life and well-being factors. Methods: HDSI-BO was combined with logistic regression, ridge regression, LASSO, adaptive LASSO, and elastic net. Two-way interaction terms were considered. Hyperparameters used in HDSI-BO were optimized through genetic algorithms with five-fold cross-validation. To measure the performance of feature selection, we fitted final logistic regression models on the sets of selected features and used each model's AUC as the measure. Thirty trials were run to generate a range for the number of selected features and a 95% confidence interval for AUC. Results: When combined with each of the above methods, HDSI-BO achieved higher final AUC values in terms of both mean and confidence interval. In addition, HDSI-BO effectively narrowed down the sets of selected features and interaction terms compared with the standard methods. Conclusion: The HDSI-BO algorithm combines well with multiple standard methods and has comparable or better predictive performance than the standard methods. The computational and time complexity of HDSI-BO is higher but still acceptable. AUC as a single metric cannot comprehensively measure feature selection performance; more effective performance metrics should be explored in future work.
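The evaluation described above (an AUC per trial, summarized as a mean with a 95% confidence interval) can be sketched as follows. This is a minimal illustration only: the abstract does not say how the interval was computed, so the normal approximation used here, and the function names `auc` and `mean_ci95`, are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% interval (an assumed method)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width
```

In a study like the one described, `mean_ci95` would be called on the list of per-trial AUC values, one per repeated trial.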

2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 313-313
Author(s):  
Brianne Olivieri-Mui ◽  
Sandra Shi ◽  
Ellen McCarthy ◽  
Dae Kim

Frailty may differentially impact how older adult males and females perceive sexual functioning, an important part of well-being. We assessed the level of frailty (robust, pre-frail, frail) for anyone with data on 11 sexual functioning questions asked in wave 2 of the National Social Life, Health, and Aging Project, 2010-2011 (n=2060). Questions covered five domains: overall sexual function (OSF), sexual function anxiety (SFA), changes in sexual function (CSF), erectile/vaginal dysfunction (EVD), and masturbation. Logistic regression identified sex differences in frailty and reporting worse sexual functioning. Linear regression predicted the number of domains reported as worse. Among males (n=1057), pre-frailty meant higher odds of reporting SFA (OR 1.8 95%CI 1.2-6.6), CSF (OR 1.7 95%CI 1.1-2.7), and EVD (OR 1.5 95%CI 1.0-2.2). Among females (n=1003), there was no difference in reporting by frailty. Females were more likely to report worse OSF (Robust: OR 7.4, 95%CI 4.8-11.4; Pre-frail: OR 6.2, 95%CI 3.9-9.9; Frail: OR 3.4 95%CI 1.7-6.6), but less likely to report SFA (Robust OR .3, 95%CI .2-.5; Pre-frail OR .2, 95%CI .1-.3; Frail OR .2 95%CI .1-.3). Pre-frail and frail females reported fewer domains as worse (Pre-frail coefficient -0.21 SE 0.09, Frail -0.43 SE 0.14). As frailty worsened, males reported more domains as worse (Pre-frail 0.24 SE 0.07, Frail 0.29 SE 0.08). Self-reported sexual functioning differs by sex at all levels of frailty, and reporting by males, but not females, changes with frailty. Providers should be aware that sexual functioning is of importance to both sexes despite varying degrees of frailty.


Stats ◽  
2021 ◽  
Vol 4 (3) ◽  
pp. 665-681
Author(s):  
Luca Insolia ◽  
Ana Kenney ◽  
Martina Calovi ◽  
Francesca Chiaromonte

High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods to improve model interpretability and ensure the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Previous studies mainly focused on predictive performance; however, our approach produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study where our proposal outperforms other methods across most performance measures and settings.


Author(s):  
Yao Zhang ◽  
Yingcang Ma ◽  
Xiaofei Yang

Like traditional single-label learning, multi-label learning also faces the curse of dimensionality. Feature selection is an effective technique for reducing dimensionality and improving learning efficiency on high-dimensional data. In this paper, logistic regression, manifold learning and sparse regularization are combined to construct a joint framework for multi-label feature selection (LMFS). First, the sparsity of the feature weight matrix is constrained by the $L_{2,1}$-norm. Second, the feature manifold and label manifold constrain the feature weight matrix so that it fits the data information and label information better. An iterative updating algorithm is designed and its convergence is proved. Finally, the LMFS algorithm is compared with DRMFS, SCLS and other algorithms on eight classical multi-label data sets. The experimental results show the effectiveness of the LMFS algorithm.
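For reference, the $L_{2,1}$-norm named in this abstract is conventionally defined as the sum of the Euclidean norms of the rows of a matrix. The symbols $W$, $d$ and $m$ (a weight matrix with $d$ feature rows and $m$ label columns) are assumed notation and are not taken from the abstract:

$$\|W\|_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{m} W_{ij}^{2}}$$

Penalizing this quantity pushes entire rows of $W$ toward zero, which is why it is commonly used to discard a feature jointly across all labels.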


2021 ◽  
Vol 8 ◽  
Author(s):  
Yongjuan Guo ◽  
Xiaomin Chen ◽  
Tianze Zeng ◽  
Lin Wang ◽  
Lvwei Cen

Background: Valid predictors of the syncope recurrence in vasovagal syncope (VVS) patients with a positive head-up tilt test (HUTT) are currently lacking. The goal of this study was to identify the predictive performance of age for the recurrence of syncope in VVS patients with a positive HUTT. Methods: In total, 175 VVS patients with a positive HUTT were observed for 6–32 months, and the recurrence of ≥1 syncope or typical pre-syncope prodromal episodes during follow-up was considered syncope recurrence. The population was divided into 2 groups, namely, a syncope recurrence group (44 patients) and a no syncope recurrence group (131 patients). The baseline clinical data, haemodynamic parameters, and classification of VVS on the HUTT were analyzed. Logistic regression was used to analyse the effect size and confidence interval for age. A receiver operating characteristic (ROC) curve analysis was used to assess the predictive performance and investigate the predictive value of age by the area under the curve (AUC). Results: The median age of the syncope recurrence group was older than that of the no syncope recurrence group [60.0 (47.8, 66.0) years vs. 53.0 (43.0, 62.0) years], and there was a significant difference between the two groups (P < 0.05). The trend for syncope recurrence changed with advancing age, and the logistic regression model adjusted by sex showed that older patients had an increased risk of syncope recurrence in VVS with a positive HUTT [OR value: 1.03, 95% confidence interval (CI): 1.008–1.061, p < 0.05]. Age was a valid predictor for the recurrence of syncope in elderly VVS patients with a positive HUTT (AUC: 0.688; 95% CI: 0.598–0.777, p < 0.05). The cut-off value was 53.5 years, and the sensitivity and specificity were 72.7% and 52.7%, respectively. Conclusions: Age may be a valid predictor for syncope recurrence in elderly VVS patients with a positive HUTT. The rate of syncope recurrence increased with advancing age, especially in females.
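How a cut-off such as 53.5 years translates into sensitivity and specificity can be sketched as below. The function name, the "at or above the cut-off counts as predicted recurrence" rule, and the toy data in the usage note are assumptions for illustration; the abstract does not state how its cut-off was chosen or applied.

```python
def sensitivity_specificity(values, outcomes, cutoff):
    """Sensitivity and specificity when `value >= cutoff` predicts the event.

    values   -- the predictor, e.g. age in years
    outcomes -- 1 if the event (e.g. recurrence) occurred, else 0
    """
    tp = sum(1 for v, y in zip(values, outcomes) if y == 1 and v >= cutoff)
    fn = sum(1 for v, y in zip(values, outcomes) if y == 1 and v < cutoff)
    tn = sum(1 for v, y in zip(values, outcomes) if y == 0 and v < cutoff)
    fp = sum(1 for v, y in zip(values, outcomes) if y == 0 and v >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)
```

With the invented ages `[40, 50, 55, 60, 65, 70]` and outcomes `[0, 0, 1, 0, 1, 1]`, a cut-off of 53.5 flags all three events and two of the three non-events correctly. As a plain arithmetic check, the reported 72.7% and 52.7% would correspond to 32 of 44 and 69 of 131 patients if they were computed on the full groups, which is an assumption.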


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246159
Author(s):  
Rahi Jain ◽  
Wei Xu

Feature selection on high-dimensional data along with the interaction effects is a critical challenge for classical statistical learning techniques. Existing feature selection algorithms such as random LASSO leverage LASSO's capability to handle high-dimensional data. However, the technique has two main limitations, namely the inability to consider interaction terms and the lack of a statistical test for determining the significance of selected features. This study proposes a High Dimensional Selection with Interactions (HDSI) algorithm, a new feature selection method, which can handle high-dimensional data, incorporate interaction terms, provide statistical inference for the selected features and leverage the capability of existing classical statistical techniques. The method allows the application of any statistical technique, such as LASSO or subset selection, on multiple bootstrapped samples, each containing randomly selected features. Each bootstrap sample incorporates interaction terms for the randomly sampled features. The selected features from each model are pooled and their statistical significance is determined. The statistically significant features are used as the final output of the approach, and their final coefficients are estimated using appropriate statistical techniques. The performance of HDSI is evaluated using both simulated data and real studies. In general, HDSI outperforms commonly used algorithms such as LASSO, subset selection, adaptive LASSO, random LASSO and group LASSO.
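The resampling step described in this abstract (a bootstrap sample of observations, a random subset of features, and interaction terms among those features) can be sketched as follows. This is an illustration of that one step under stated assumptions, not the authors' implementation: the function name, the dict-per-row data layout, the parameter `q`, and the use of products as two-way interaction terms are all hypothetical choices.

```python
import random
from itertools import combinations


def bootstrap_with_interactions(rows, feature_names, q, rng):
    """Build one resampled design matrix with two-way interaction columns.

    rows          -- list of dicts mapping feature name to numeric value
    feature_names -- all candidate feature names
    q             -- number of features to draw at random (assumed parameter)
    rng           -- a random.Random instance, for reproducibility
    """
    chosen = sorted(rng.sample(feature_names, q))
    # Bootstrap: draw as many rows as the original data, with replacement.
    sample = [rng.choice(rows) for _ in rows]
    pairs = list(combinations(chosen, 2))
    columns = chosen + [f"{a}:{b}" for a, b in pairs]
    design = []
    for row in sample:
        main_effects = [row[name] for name in chosen]
        interactions = [row[a] * row[b] for a, b in pairs]
        design.append(main_effects + interactions)
    return columns, design
```

A model such as LASSO would then be fitted on each `design`, and the features selected across many such draws pooled for significance testing, as the abstract describes. That fitting and pooling is outside this sketch.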


2018 ◽  
Author(s):  
Michail Tsagris ◽  
Zacharias Papadovasilakis ◽  
Kleanthi Lakiotaki ◽  
Ioannis Tsamardinos

Background: Feature selection seeks to identify a minimal-size subset of features that is maximally predictive of the outcome of interest. It is particularly important for biomarker discovery from high-dimensional molecular data, where the features could correspond to gene expressions, Single Nucleotide Polymorphisms (SNPs), protein concentrations, etc. We empirically evaluate three state-of-the-art feature selection algorithms that scale to high-dimensional data: a novel generalized variant of OMP (gOMP), LASSO and FBED. All three greedily select the next feature to include; the first two employ the residuals resulting from the current selection, while the latter rebuilds a statistical model. The algorithms are compared in terms of predictive performance, number of selected features and computational efficiency, on gene expression data with either survival time (censored time-to-event) or disease status (case-control) as an outcome. This work attempts to answer a) whether gOMP is to be preferred over LASSO and b) whether residual-based algorithms, e.g. gOMP, are to be preferred over algorithms, such as FBED, that rely heavily on regression model fitting. Results: gOMP is on par with or outperforms LASSO in all metrics: predictive performance, number of features selected and computational efficiency. Comparing gOMP with FBED, both perform similarly in terms of predictive performance and number of selected features. Overall, gOMP combines the benefits of both LASSO and FBED; it is computationally efficient and produces parsimonious models of high predictive performance. Conclusions: The use of gOMP is suggested for variable selection with high-dimensional gene expression data, and the target variable need not be restricted to time-to-event or case-control, as examined in this paper.
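The idea of choosing the next feature from the current residuals can be illustrated with a single greedy step. This sketch assumes absolute Pearson correlation with the residual as the selection criterion, which is the usual choice in matching-pursuit style methods. The abstract does not spell out gOMP's criterion, and the names `pearson` and `next_feature` are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)


def next_feature(columns, residual, selected):
    """Return the unselected feature most correlated with the residual.

    columns  -- dict mapping feature name to its list of values
    residual -- current residuals of the model fitted on `selected`
    selected -- set of feature names already in the model
    """
    scores = {
        name: abs(pearson(values, residual))
        for name, values in columns.items()
        if name not in selected
    }
    return max(scores, key=scores.get)
```

A full procedure would refit the model after each pick, recompute the residuals, and stop once no remaining feature improves the fit enough. Those parts are omitted here.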


1980 ◽  
Vol 19 (01) ◽  
pp. 42-49 ◽  
Author(s):  
B. W. Brown ◽  
C. Engelhard ◽  
J. Haipern ◽  
J. F. Fries ◽  
L. S. Coles

In solving a clinical problem of diagnosis, prognosis, or treatment choice, a physician must select from among a large group of possible tests. In general, an ordering exists specifying which tests are most valuable in providing relevant information concerning the problem at hand. The computer program package to be described (MW) extracts appropriate data from the ARAMIS data banks and then analyzes the data by stepwise logistic regression. A binary outcome (diagnosis, prognostic event, or treatment response) is sequentially associated with possible tests, and the most powerful combination of tests is identified. For example, the most valuable predictor variable of early mortality in SLE is proteinuria, followed sequentially by anemia and absence of arthritis. Experience with these techniques suggests: 1. optimal certainty is usually reached after only three or four tests; 2. several different test sequences may lead to the same level of certainty; 3. diagnosis may usually be ascertained with greater certainty than prognosis; 4. many medical problems contain considerable non-reducible uncertainty; 5. a relatively small group of tests is typically found among the most powerful; 6. results are consistent across several patient populations; 7. results are largely independent of the particular statistic employed. These observations suggest strategies for maximizing information while minimizing risk and expense.


2020 ◽  
Vol 3 (3) ◽  
pp. 37-50
Author(s):  
Muhammad Suleman Nasir

Society means a group of people who are living together. People need society from birth to death. Without a collective life, man's deeds, intentions, and habits have no value. Islamic society is the name of a balanced and moderate life in which human intellect, customs, and social etiquette are determined in the light of divine revelation. This system is so comprehensive and all-encompassing that it covers all aspects and activities of life. Islam is a comprehensive, universal, complete code of conduct and an ideal way of life. It not only recognizes the collective nature of human interaction but also helps in the development of the community, gives it natural principles that strengthen it, provides good foundations for it, and eliminates the factors that spoil it or make it limited and useless. The principles of a successful social life in Islamic society seem to reflect the Islamic code of conduct and human nature. Islam is the only religion that advocates goodness and guarantees well-being. Islam gives us self-sacrifice, generosity, trust and honesty, service to the people, justice and fairness, forgiveness and kindness, good society and economy, good deeds, mutual unity, harmony, and brotherhood. Only by practicing the pure thoughts, beliefs, and unparalleled ideas of the religion of Islam can a person live a prosperous life and feel real peace and lasting contentment. A descriptive and analytical research methodology will be used in this study. It is concluded that for a prosperous social life it is necessary to abide by the injunctions of Islamic principles, which provide a sound foundation for a successful social life in this world and the hereafter.

