selection of variables

2021 ◽  
pp. 106002802110592
Author(s):  
Barbara Blaylock ◽  
Xiaoli Niu ◽  
H. Edward Davidson ◽  
Stefan Gravenstein ◽  
Ronald DePue ◽  
...  

Background: Assessing chronic obstructive pulmonary disease (COPD) severity is challenging in nursing home (NH) residents because symptom assessments and exacerbation histories are often incomplete. Objective: To predict COPD severity in NH residents using the Minimum Data Set (MDS), a clinical assessment of functional capabilities and health needs. Methods: A cohort analysis of prospectively collected longitudinal data was conducted. Residents from geographically varied Medicare-certified NHs with age ≥60 years, a COPD diagnosis, and ≥6 months of NH residence at enrollment were included. Residents with severe cognitive impairment were excluded. Demographic characteristics, medical history, and MDS variables were extracted from medical records. The care provider–completed COPD Assessment Test (CAT) and COPD exacerbation history were used to categorize residents into Global Initiative for Chronic Obstructive Lung Disease (GOLD) groups A to D. Multivariate multinomial logit models mapped the MDS to GOLD groups A to D with stepwise selection of variables. Results: Nursing home residents (N = 175) were 64% women and had a mean age of 77.9 years. Among residents, GOLD B was most common (A = 13.1%; B = 44.0%; C = 5.7%; D = 37.1%). Any long-acting bronchodilator (LABD) use and any dyspnea were significant predictors of GOLD groups A to D. The predicted MDS-GOLD group (A = 6.9%; B = 52.6%; C = 4.6%; D = 36.0%) showed good model fit (correctly predicted = 60.6%). Nursing home residents may underuse group-recommended LABD treatment (no LABD: B = 53.2%; C = 80.0%; D = 40.0%). Conclusion and Relevance: The MDS, completed routinely for US NH residents, could potentially be used to estimate COPD severity. With additional validation, predicted COPD severity could provide a map to evidence-based treatment guidelines and may help to individualize treatment pathways for NH residents.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yuan Zhou ◽  
Botao Fa ◽  
Ting Wei ◽  
Jianle Sun ◽  
Zhangsheng Yu ◽  
...  

Abstract Investigation of the genetic basis of traits or clinical outcomes relies heavily on identifying relevant variables in molecular data. However, the high dimensionality and complex correlation structures of these data hinder the development of suitable methods and lead to false positives and false negatives. We developed a variable importance measure, termed the ECAR scores, that evaluates the importance of variables in a dataset. Based on these scores, ranking and selection of variables can be achieved simultaneously. Unlike most current approaches, the ECAR scores aim to rank the influential variables as high as possible while maintaining the grouping property, instead of selecting variables that are merely predictive. The performance of the ECAR scores is tested and compared with other methods on simulated, semi-synthetic, and real datasets. Results showed that the ECAR scores improve on the CAR scores in terms of variable selection accuracy and the predictive power of high-ranked variables. They also outperform other classic methods such as lasso and stability selection when there is a high degree of correlation among influential variables. As an application, we used the ECAR scores to analyze genes associated with forced expiratory volume in the first second in patients with lung cancer and reported six associated genes.
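For orientation, the CAR scores that the abstract uses as its baseline can be written compactly. The formula below is the standard definition of CAR scores as commonly stated in the literature, not the ECAR modification, which the abstract does not specify. The symbols $P$, $P_{Xy}$ and $\omega$ are notation introduced here for illustration.

```latex
% CAR scores: marginal correlations after decorrelating the predictors.
%   P      : correlation matrix among the p predictors (p x p)
%   P_{Xy} : vector of marginal correlations between each predictor and the response
\omega = P^{-1/2} \, P_{Xy},
\qquad
\text{variables are ranked by } \omega_j^{2}, \quad j = 1, \dots, p .
```

When the predictors are uncorrelated, $P$ is the identity and the CAR scores reduce to the ordinary marginal correlations.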


2021 ◽  
pp. 1-72
Author(s):  
Marjorie Lima do Vale ◽  
Luke Buckner ◽  
Claudia Gabriela Mitrofan ◽  
Claudia Raulino Tramontt ◽  
Sento Kai Kargbo ◽  
...  

Abstract Cardiovascular disease (CVD) is the most common non-communicable disease globally. Although previous literature has provided useful insights into the important role that diet plays in CVD prevention and treatment, understanding the causal role of diet is difficult given the inherent and introduced weaknesses of observational research designs (e.g., not properly addressing confounders and mediators) and experimental research designs (e.g., inappropriate or poorly designed studies). In this narrative review, we organised current evidence linking diet, as well as conventional and emerging physiological risk factors, with CVD risk, incidence and mortality in a series of diagrams. The diagrams can aid causal inference studies because they provide a visual representation of the types of studies underlying the associations between potential risk markers/factors for CVD. This may facilitate the selection of variables to be considered and the creation of analytical models. Evidence depicted in the diagrams was systematically collected from studies included in the British Nutrition Task Force report on Diet and CVD and from database searches, including Medline and Embase. Although several markers and disorders linked to conventional and emerging risk factors for CVD were identified, the causal link between many of them remains unknown. There is a need to address the multifactorial nature of CVD and the complex interplay between conventional and emerging risk factors and natural and built environments, while bringing the life course and the role of additional environmental factors into the spotlight.


2021 ◽  
Vol 13 (21) ◽  
pp. 4466
Author(s):  
Isabell Eischeid ◽  
Eeva M. Soininen ◽  
Jakob J. Assmann ◽  
Rolf A. Ims ◽  
Jesper Madsen ◽  
...  

The Arctic is under great pressure due to climate change. Drones are increasingly used as a tool in ecology and may be especially valuable in rapidly changing and remote landscapes, as can be found in the Arctic. For effective applications of drones, decisions of both ecological and technical character are needed. Here, we provide our method planning workflow for generating ground-cover maps with drones for ecological monitoring purposes. The workflow includes the selection of variables, layer resolutions, ground-cover classes and the development and validation of models. We implemented this workflow in a case study of the Arctic tundra to develop vegetation maps, including disturbed vegetation, at three study sites in Svalbard. For each site, we generated a high-resolution map of tundra vegetation using supervised random forest (RF) classifiers based on four spectral bands, the normalized difference vegetation index (NDVI) and three types of terrain variables—all derived from drone imagery. Our classifiers distinguished up to 15 different ground-cover classes, including two classes that identify vegetation state changes due to disturbance caused by herbivory (i.e., goose grubbing) and winter damage (i.e., ‘rain-on-snow’ and thaw-freeze). Areas classified as goose grubbing or winter damage had lower NDVI values than their undisturbed counterparts. The predictive ability of site-specific RF models was good (macro-F1 scores between 83% and 85%), but the area of the grubbing class was overestimated in parts of the moss tundra. A direct transfer of the models between study sites was not possible (macro-F1 scores under 50%). We show that drone image analysis can be an asset for studying future vegetation state changes on local scales in Arctic tundra ecosystems and encourage ecologists to use our tailored workflow to integrate drone mapping into long-term monitoring programs.
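Two quantities named in the abstract above, NDVI and the macro-F1 score, have standard definitions that can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' pipeline; the function names and the ground-cover labels in the example are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).

    Returns 0.0 when both band values are zero to avoid division by zero.
    """
    total = nir + red
    return (nir - red) / total if total else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes that occur."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
truth = ["moss", "moss", "grubbing", "heath"]
predicted = ["moss", "grubbing", "grubbing", "heath"]
score = macro_f1(truth, predicted)
```

Because macro-F1 weights every class equally, a rare class such as a disturbance category influences the score as much as a dominant vegetation class.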


Author(s):  
В.В. Мокшин ◽  
А.В. Спиридонова ◽  
Г.В. Спиридонов

This article discusses mathematical and informational methods for forecasting water consumption effectively. We calculated the water consumption of a typical administrative building. The material is of interest to a wide range of specialists who develop economic and mathematical models and work on improving the efficiency of water resource planning in the housing and utilities sector. Forecasts were made with two regression methods, Forward Regression and Backward Elimination, which include both linear and multiple nonlinear approaches to data analysis. Special attention was paid to comparing actual and predicted readings. We identified the most relevant algorithms, which produced a fairly accurate estimate of water consumption, considered one of the main tasks in water supply and the management of water supply networks. We found that the correctness of the predicted results depends equally on the amount of initial data used to build the models and on the number of days for which the forecast is made. For a sample of 255 baseline days and 116 forecast days, the most probable values were obtained by regression with forward and backward selection of variables. The analysis also made it possible to identify the causes of errors that arise when these methods are used. Given the reliability of the calculated readings, the studied methods appear relevant and suitable for information systems at industrial, housing and utility facilities.
An integrated approach optimizes the planning process and increases the accuracy of predicted daily water consumption within residential areas, which today is an extremely important aspect of water supply and the management of water supply networks.
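Forward selection of variables, one of the two regression methods named in this abstract, can be sketched as a greedy loop. The version below is a minimal illustration in plain Python under stated assumptions: ordinary least squares with an intercept, a stopping rule based on relative reduction in the residual sum of squares (`min_gain` is a hypothetical parameter), and synthetic data. It is not the authors' implementation, and their stopping criterion is not given in the abstract.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on columns `cols`."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def forward_selection(X, y, min_gain=0.05):
    """Greedily add the column that most reduces RSS; stop on a small relative gain."""
    remaining = list(range(len(X[0])))
    selected = []
    current = rss(X, y, selected)
    while remaining:
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        new = rss(X, y, selected + [best])
        if current - new < min_gain * current:
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected


# Synthetic data: only columns 0 and 2 influence the response.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[2] + random.gauss(0, 0.5) for r in X]
selected = forward_selection(X, y)
```

Backward elimination runs the same idea in reverse: start from all columns and repeatedly drop the one whose removal increases the residual sum of squares the least.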


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Panek Michał ◽  
Stawiski Konrad ◽  
Kuna Piotr

Abstract Background: TGF-β and its receptors play a crucial role in asthma pathogenesis, bronchial hyperreactivity, and bronchial remodeling. Expression of isoforms 1–3 of the TGF-β cytokine is influenced by tagging polymorphisms in the TGF-β1, TGF-β2 and TGF-β3 genes, and these SNPs may be associated with the risk of asthma development and severity as well as with other diseases. Polymorphic forms of the TGF-β1, TGF-β2 and TGF-β3 genes regulate the degree of bronchial inflammation, the deterioration of lung function parameters in spirometry, and elevated levels of total IgE. All of this intensifies disease symptoms. According to the current GINA 2020 guidelines, the Asthma Control Test (ACT™) should be applied to assess asthma symptoms. Methods: Polymorphisms localized in the TGF-β1, TGF-β2 and TGF-β3 genes were analyzed in 652 DNA samples with the MassARRAY® system, using the MALDI-TOF MS mass spectrometry technique. The degree of asthma control was evaluated with ACT™. Results: The occurrence of the T/C genotype in rs8109627 (p = 0.0171) in the TGF-β1 gene is significantly associated with a higher ACT result (controlled asthma) in a multivariate linear regression model after backward stepwise selection of variables. In addition, in the linear model for prediction of the ACT score, we showed that SNP rs8109627 (p = 0.0497) in the TGF-β1 gene (improvement of disease control, controlled asthma) and rs2796822 (p = 0.0454) in the TGF-β2 gene (deterioration of disease control, uncontrolled asthma) significantly modify the degree of asthma control. Discussion: We described the previously unknown clinical significance of two SNPs in two genes, TGF-β1 and TGF-β2. We showed that using both genotypes and MAC makes it possible to build a moderately accurate prognostic model, which is about 70% efficient on the entire set of analyzed SNPs in the TGF-β1, TGF-β2, and TGF-β3 genes.


2021 ◽  
Vol 19 (3) ◽  
pp. 499
Author(s):  
Milan Andrejić ◽  
Milorad Kilibarda ◽  
Vukašin Pajić

In the last decade, increasing attention has been paid to the efficiency of logistics systems, both in the literature and in practice, because of the large savings that can be achieved. In a very dynamic market with a changing environment, distribution centers have to carry out their activities and processes efficiently. Distribution centers connect producers with other participants in the supply chain, including end-users. The main objective of this paper is to develop a DEA model for measuring the change in distribution centers' efficiency over time. The paper investigates how the selection of input and output variables affects the resulting efficiency when measuring efficiency change over time. On the one hand, the selection of variables is a basic step in applying the DEA method. On the other hand, the number of basic and derived indicators monitored in real systems is increasing, while the percentage of those used in decision-making is decreasing (less than 20%). The developed model was tested on the example of a retail chain operating in Serbia. The main factors that change efficiency were identified, along with the corresponding corrective actions. The Malmquist productivity index is used to measure efficiency change over time. The developed approach could help managers in the decision-making process and also provides a good basis for further research.


2021 ◽  
Author(s):  
Reetika Sarkar ◽  
Sithija Manage ◽  
Xiaoli Gao

Abstract Background: High-dimensional genomic data often exhibit strong correlations, which lead to instability and inconsistency in the estimates obtained with commonly used regularization approaches, including the Lasso, MCP, and related methods. Result: In this paper, we perform a comparative study of regularization approaches for variable selection under different correlation structures, and propose a two-stage procedure named rPGBS to address stable variable selection in various strong correlation settings. This approach involves repeatedly running a two-stage hierarchical procedure consisting of random pseudo-group clustering and bi-level variable selection. Conclusion: Both the simulation studies and the high-dimensional genomic data analysis demonstrate the advantage of the proposed rPGBS method over the most commonly used regularization methods. In particular, rPGBS results in more stable selection of variables across a variety of correlation settings, compared with recent work addressing variable selection with strong correlations. Moreover, rPGBS is computationally efficient across various settings.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lorena Hafermann ◽  
Heiko Becher ◽  
Carolin Herrmann ◽  
Nadja Klein ◽  
Georg Heinze ◽  
...  

Abstract Background: Statistical model building requires selection of variables for a model depending on the model's aim. For descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, regardless of whether data-driven selection procedures identify them. An open question is how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies that may themselves have employed inappropriate model building strategies. Methods: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when this knowledge, derived from preceding studies, might in fact be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered a "known" predictor if a predefined number of preceding studies identified it as relevant. Results: Even if several preceding studies identified a variable as a "true" predictor, this classification is often a false positive. Moreover, variables not identified might still be truly predictive. This holds especially if the preceding studies employed inappropriate selection methods such as univariable selection. Conclusions: The source of "background knowledge" should be evaluated with care. Knowledge generated from preceding studies can cause misspecification.


Author(s):  
Alvaro Alonso ◽  
Faye L. Norby ◽  
Richard F. MacLehose ◽  
Neil A. Zakai ◽  
Rob F. Walker ◽  
...  

Background: Current scores for bleeding risk assessment in patients with venous thromboembolism (VTE) undergoing oral anticoagulation have limited predictive capacity. We developed and internally validated a bleeding prediction model using healthcare claims data. Methods and Results: We selected patients with incident VTE initiating oral anticoagulation in the 2011 to 2017 MarketScan databases. Hospitalized bleeding events in the 180 days after VTE diagnosis were identified using validated algorithms. We evaluated demographic factors, comorbidities, and medication use before oral anticoagulation initiation as potential predictors of bleeding using stepwise selection of variables in Cox models run on 1000 bootstrap samples of the patient population. Variables included in >60% of all models were selected for the final analysis. We internally validated the model using bootstrapping and correcting for optimism. We included 165 434 patients with VTE initiating oral anticoagulation, of whom 2294 had a bleeding event. After the variable selection process, the final model included 20 terms (15 main effects and 5 interactions). The c‐statistic for the final model was 0.68 (95% CI, 0.67–0.69). The internally validated c‐statistic corrected for optimism was 0.68 (95% CI, 0.67–0.69). For comparison, the c‐statistic of the Hypertension, Abnormal Renal/Liver Function, Stroke, Bleeding History or Predisposition, Labile International Normalized Ratio, Elderly (>65 Years), Drugs/Alcohol Concomitantly (HAS‐BLED) score in this population was 0.62 (95% CI, 0.61–0.63). Conclusions: We have developed a novel model for bleeding prediction in VTE using large healthcare claims databases. Performance of the model was moderately good, highlighting the urgent need to identify better predictors of bleeding to inform treatment decisions.
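The selection rule described above (run a selection procedure on many bootstrap samples, then keep variables chosen in more than 60% of them) can be sketched independently of the underlying model. The example below is a minimal illustration in plain Python under loud assumptions: the selector is a simple correlation threshold standing in for stepwise Cox regression, the data are synthetic, and all function names and parameters are hypothetical. Only the 60% cutoff comes from the abstract.

```python
import random


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def select(X, y, threshold=0.2):
    """Stand-in selector (NOT stepwise Cox): keep columns with |corr| > threshold."""
    p = len(X[0])
    return {j for j in range(p)
            if abs(correlation([r[j] for r in X], y)) > threshold}


def inclusion_frequencies(X, y, selector, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each column is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        chosen = selector([X[i] for i in idx], [y[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: only columns 0 and 3 carry signal.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(400)]
y = [2.0 * r[0] + 1.0 * r[3] + rng.gauss(0, 1) for r in X]

freq = inclusion_frequencies(X, y, select)
final = [j for j, f in enumerate(freq) if f > 0.60]  # cutoff from the abstract
```

Passing the selector as an argument keeps the resampling logic separate from the model, so a stepwise Cox routine could be substituted without changing `inclusion_frequencies`.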

