Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes

Author(s):  
Britta Velten ◽  
Wolfgang Huber

Summary: Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Despite being decisive, the question of the relative strength of penalization is often glossed over and only implicitly determined by the scale of individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian toolset, our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to improved prediction performance in situations where the groups have strong differences in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability, and can improve prediction performance.
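
As a minimal illustration of differential penalization by feature group (not the paper's variational Bayes procedure), the sketch below fits a ridge regression in which each group defined by an external covariate receives its own penalty strength. The data, the group assignment, and the groupwise_ridge helper are all hypothetical.

```python
import numpy as np

def groupwise_ridge(X, y, groups, lambdas):
    """Ridge regression with one penalty strength per feature group.

    Illustrates differential penalization only; the paper's method
    additionally learns the penalties and sparsity via variational Bayes.
    """
    # Diagonal penalty matrix holding the group-specific strengths.
    penalty = np.diag([lambdas[g] for g in groups])
    # Closed-form ridge solution: (X'X + P)^{-1} X'y
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Hypothetical usage: two feature groups (e.g., two assays), with the
# uninformative group penalized much more strongly than the informative one.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0])  # only group 0 is informative
y = X @ beta_true + rng.normal(scale=0.5, size=100)
groups = [0, 0, 0, 1, 1, 1]
beta_hat = groupwise_ridge(X, y, groups, lambdas={0: 0.1, 1: 10.0})
```

In practice the group-specific penalties would themselves be chosen from the data, which is exactly what the proposed variational Bayes scheme automates.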

2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Anne-Laure Boulesteix ◽  
Riccardo De Bin ◽  
Xiaoyu Jiang ◽  
Mathias Fuchs

As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and code are available on the companion website to ensure reproducibility.
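
The core idea of modality-specific penalty factors can be sketched without the ipflasso package itself: penalizing a feature with factor f is equivalent to rescaling its column by 1/f, fitting an ordinary lasso, and rescaling the coefficient back. The Python snippet below illustrates this equivalence on hypothetical two-modality data; penalty_factor_lasso is an illustrative helper, not the package's API.

```python
import numpy as np
from sklearn.linear_model import Lasso

def penalty_factor_lasso(X, y, factors, alpha=0.1):
    """Lasso with feature-specific penalty factors (sketch of the IPF-LASSO idea)."""
    factors = np.asarray(factors, dtype=float)
    X_scaled = X / factors            # down-weight heavily penalized modalities
    model = Lasso(alpha=alpha).fit(X_scaled, y)
    return model.coef_ / factors      # coefficients on the original scale

# Hypothetical design: 20 features from modality 1 (factor 1.0) and
# 30 features from modality 2 penalized twice as strongly (factor 2.0).
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 50))
y = X[:, :5].sum(axis=1) + rng.normal(size=80)
factors = [1.0] * 20 + [2.0] * 30
coefs = penalty_factor_lasso(X, y, factors)
```

In IPF-LASSO the ratio of penalty factors across modalities is itself tuned, for example by cross-validation over a grid of candidate ratios.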


1981 ◽  
Vol 46 (04) ◽  
pp. 752-756 ◽  
Author(s):  
L Zuckerman ◽  
E Cohen ◽  
J P Vagher ◽  
E Woodward ◽  
J A Caprini

Summary: Thrombelastography, although proven to be a useful research tool, has not been evaluated for its clinical utility against common coagulation laboratory tests. In this study we compare thrombelastographic measurements with six common tests (hematocrit, platelet count, fibrinogen, prothrombin time, activated partial thromboplastin time, and fibrin split products). For these comparisons, two samples of subjects were selected: 141 normal volunteers and 121 patients with cancer. The data were subjected to various statistical techniques, such as correlation, ANOVA, and canonical and discriminant analysis, to measure the extent of the correlations between the two sets of variables and their relative strength in detecting blood clotting abnormalities. The results indicate that, although there is a strong relationship between the thrombelastographic variables and these common laboratory tests, the thrombelastographic variables contain additional information on the hemostatic process.
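
For readers unfamiliar with the canonical analysis used here, the following sketch quantifies the association between two blocks of variables, analogous to thrombelastographic measurements versus routine coagulation tests, using canonical correlation analysis. The data are simulated, not the study's, and the variable counts are arbitrary.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Simulate two blocks of variables that share a low-dimensional latent signal.
rng = np.random.default_rng(2)
n = 120
common = rng.normal(size=(n, 2))                              # shared latent signal
teg = common @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(n, 4))
labs = common @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n, 6))

# Canonical correlation analysis finds paired linear combinations of the
# two blocks that are maximally correlated.
cca = CCA(n_components=2).fit(teg, labs)
u, v = cca.transform(teg, labs)
canonical_correlations = [np.corrcoef(u[:, k], v[:, k])[0, 1] for k in range(2)]
```

High canonical correlations indicate a strong relationship between the blocks, while residual variation in one block reflects information not captured by the other, which is the pattern reported for the thrombelastographic variables.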


Author(s):  
Xiaoling Luo ◽  
Adrian Cottam ◽  
Yao-Jan Wu ◽  
Yangsheng Jiang

Trip purpose information plays a significant role in transportation systems. Existing trip purpose information is traditionally collected through human observation, a manual process that requires substantial personnel and resources. Because of this high cost, automated trip purpose estimation is an attractive data-driven alternative that can improve efficiency and save time. Therefore, a hybrid-data approach using taxi operations data and point-of-interest (POI) data to estimate trip purposes was developed in this research. POI data, an emerging and openly available data source, was incorporated because it provides a wealth of additional information for trip purpose estimation and is readily accessible from online platforms. Several techniques were developed and compared for incorporating the POI data into the hybrid-data approach to achieve a high level of accuracy. To evaluate the performance of the approach, data from Chengdu, China, were used. The results show that incorporating POI information increases the average accuracy of trip purpose estimation by 28% compared with estimation that does not use POI data. These results indicate that the additional trip attributes provided by POI data can increase the accuracy of trip purpose estimation.
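
A minimal sketch of the hybrid-data idea, on entirely hypothetical data and with a generic classifier standing in for the paper's estimation techniques: POIs of each category near a trip's drop-off location are counted and appended to the taxi-derived features before training.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.ensemble import RandomForestClassifier

def poi_counts_near(dropoffs, poi_xy, poi_category, categories, radius=0.3):
    """Count POIs of each category within `radius` (hypothetical units) of
    every drop-off point; the counts become additional trip features."""
    tree = cKDTree(poi_xy)
    counts = np.zeros((len(dropoffs), len(categories)))
    for i, point in enumerate(dropoffs):
        for j in tree.query_ball_point(point, r=radius):
            counts[i, categories.index(poi_category[j])] += 1
    return counts

# Hypothetical data: taxi features (hour, trip length, ...) plus POI counts.
rng = np.random.default_rng(3)
n_trips, n_pois = 500, 2000
taxi_features = rng.normal(size=(n_trips, 3))
dropoffs = rng.uniform(0, 10, size=(n_trips, 2))
poi_xy = rng.uniform(0, 10, size=(n_pois, 2))
categories = ["restaurant", "office", "residence", "shopping"]
poi_category = list(rng.choice(categories, size=n_pois))
labels = rng.choice(["work", "leisure", "home"], size=n_trips)

X = np.hstack([taxi_features, poi_counts_near(dropoffs, poi_xy, poi_category, categories)])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
```

The gain reported in the paper comes from exactly this kind of enrichment: the POI context around a destination carries information about the likely purpose of the trip.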


2022 ◽  
Vol 41 (1) ◽  
pp. 1-17
Author(s):  
Xin Chen ◽  
Anqi Pang ◽  
Wei Yang ◽  
Peihao Wang ◽  
Lan Xu ◽  
...  

In this article, we present TightCap, a data-driven scheme to accurately capture both the human shape and dressed garments from only a single three-dimensional (3D) human scan, which enables numerous applications such as virtual try-on, biometrics, and body evaluation. To handle the severe variations in human poses and garments, we propose to model the clothing tightness field, i.e., the displacements from the garments to the underlying human shape, implicitly in the global UV texturing domain. To this end, we utilize an enhanced statistical human template and an effective multi-stage alignment scheme to map the 3D scan into a hybrid 2D geometry image. Based on this 2D representation, we propose a novel framework to predict the clothing tightness field via a dedicated tightness formulation, as well as an effective optimization scheme to further reconstruct multi-layer human shape and garments across various clothing categories and human postures. We further propose a new clothing tightness dataset of human scans with a large variety of clothing styles, poses, and corresponding ground-truth human shapes to stimulate further research. Extensive experiments demonstrate the effectiveness of TightCap in achieving high-quality reconstruction of human shape and dressed garments, as well as further applications in clothing segmentation, retargeting, and animation.
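
As a simplified illustration of what a clothing tightness field measures (nearest-vertex displacements on toy geometry, rather than TightCap's learned UV-domain formulation), consider the sketch below; the tightness_field helper and the sphere data are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def tightness_field(garment_vertices, body_vertices):
    """Per-vertex tightness as the displacement from each garment vertex to
    its nearest body vertex (a crude stand-in for the UV-space field)."""
    tree = cKDTree(body_vertices)
    distances, nearest = tree.query(garment_vertices)
    displacements = body_vertices[nearest] - garment_vertices
    return distances, displacements

# Toy geometry: a unit-sphere "body" and a slightly looser, noisy "garment".
rng = np.random.default_rng(4)
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
body = 1.00 * directions
garment = 1.05 * directions + 0.01 * rng.normal(size=(1000, 3))
dist, disp = tightness_field(garment, body)   # small dist = tight, large dist = loose
```

TightCap predicts such displacements implicitly in a 2D geometry image, which is what allows it to separate the multi-layer body and garment surfaces from a single scan.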


Diabetes is a chronic disease that leads to multiple complications. It has become a silent killer in society because it often shows no symptoms to patients until it is too late. It causes complications in other organs and systems, such as the kidneys, the cardiovascular system, and the liver, as well as blood pressure disorders [1]. This work applies multitask learning [2] to jointly model the relationships among multiple complications, where each task corresponds to modelling the risk of one complication [3]. It also uses feature selection to reduce the set of risk factors drawn from high-dimensional datasets, and then uses correlation analysis to quantify the degree of association among the various complications. The proposed method identifies possible future health hazards for a diabetes patient, which can help explain medical conditions and improve disease prediction performance in healthcare applications.
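
A hedged sketch of the general recipe, on simulated data: a multitask sparse linear model (here scikit-learn's MultiTaskLasso, standing in for the paper's multitask learner) jointly models several complication risks, its shared sparsity pattern performs the feature selection, and the correlation among complications is computed afterwards.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

# Simulated data: each column of Y is the risk score for one complication.
rng = np.random.default_rng(5)
n_patients, n_risk_factors, n_complications = 300, 40, 4
X = rng.normal(size=(n_patients, n_risk_factors))
W = np.zeros((n_risk_factors, n_complications))
W[:6] = rng.normal(size=(6, n_complications))        # only 6 shared risk factors matter
Y = X @ W + 0.3 * rng.normal(size=(n_patients, n_complications))

# Joint fit across tasks with a shared sparsity pattern.
model = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.any(model.coef_ != 0, axis=0))  # selected risk factors
# Degree of association among the complications, as in the correlation step.
complication_corr = np.corrcoef(Y, rowvar=False)
```

The shared sparsity is the multitask element: a risk factor is either kept or dropped for all complications jointly, which mirrors the idea of mapping relations among complications rather than modelling each in isolation.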

