Structure-preserving integrated analysis for risk stratification with application to cancer staging

Biostatistics ◽  
2021 ◽  
Author(s):  
Tianjie Wang ◽  
Rui Chen ◽  
Wenshuo Liu ◽  
Menggang Yu

Summary To provide an appropriate and practical level of health care, it is critical to group patients into relatively few strata that have distinct prognoses. Such grouping or stratification is typically based on well-established risk factors and clinical outcomes. A well-known example is the American Joint Committee on Cancer staging for cancer, which uses tumor size, node involvement, and metastasis status. We consider a statistical method for such grouping based on individual patient data from multiple studies. The method encourages a common grouping structure as a basis for borrowing information, but acknowledges data heterogeneity, including unbalanced data structures across multiple studies. We build on the “lasso-tree” method, which is more versatile than the well-known classification and regression tree method in generating possible grouping patterns. In addition, the parametrization of the lasso-tree method makes it very natural to incorporate the underlying order information in the risk factors. In this article, we also strengthen the lasso-tree method by establishing its theoretical properties, which Lin and others (2013. Lasso tree for cancer staging with survival data. Biostatistics 14, 327–339) did not pursue. We evaluate our method in extensive simulation studies and an analysis of multiple breast cancer data sets.

2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Amélie Mugnier ◽  
Sylvie Chastant-Maillard ◽  
Hanna Mila ◽  
Faouzi Lyazrhi ◽  
Florine Guiraud ◽  
...  

Abstract Background Neonatal mortality (over the first three weeks of life) is a major concern in canine breeding facilities as an economic and welfare issue. Since low birth weight (LBW) dramatically increases the risk of neonatal death, the risk factors for its occurrence need to be identified, together with the chances and determinants of survival of at-risk newborns. Results Data from 4971 puppies from 10 breeds were analysed. Two birth weight thresholds regarding the risk of neonatal mortality were identified by breed, using the Receiver Operating Characteristics method and the Classification and Regression Tree method, respectively. Puppies were classified as LBW when their birth weight was between the two thresholds and as very low birth weight (VLBW) when it was below both thresholds. Mortality rates were 4.2, 8.8 and 55.3% in the normal, LBW and VLBW groups, which accounted for 48.7, 47.9 and 3.4% of the included puppies, respectively. A separate binary logistic regression approach identified breed, gender and litter size as determinants of LBW. Larger litter size and being female were associated with a higher risk of LBW. Survival of LBW puppies was reduced in litters with at least one stillborn, compared to litters with no stillborn, and was also reduced when the dam was more than 6 years old. For VLBW puppies, occurrence and survival were influenced by litter size. Surprisingly, smaller litter size was a risk factor for VLBW and also reduced survival of VLBW puppies. The results of this study suggest that VLBW and LBW puppies are two distinct populations. Moreover, they indicate that events and factors affecting intrauterine growth (leading to birth weight reduction) also affect the ability of puppies to adapt to extrauterine life. Conclusion These findings could help veterinarians and breeders to improve the management of their facility and more specifically of LBW puppies. Possible recommendations would be to select only dams of optimal age for reproduction and to pay particular attention to LBW puppies born in small litters. Further studies are required to understand the origin of LBW in dogs.
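
The abstract above derives one of its birth weight thresholds from a Receiver Operating Characteristics analysis but does not say which cut-off criterion was used. As an illustration only, the sketch below assumes the common choice of maximizing Youden's J (sensitivity + specificity - 1). The function name and the birth weights are made up for the example and are not taken from the study.

```python
def youden_threshold(weights, died):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A newborn is flagged as at risk when its birth weight is <= threshold.
    Assumption: Youden's J is the cut-off criterion; the study may have used another.
    """
    n_pos = sum(died)              # number of deaths
    n_neg = len(died) - n_pos      # number of survivors
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one survivor")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(weights)):
        # Deaths correctly flagged (weight at or below the candidate threshold)
        true_pos = sum(1 for w, d in zip(weights, died) if d and w <= t)
        # Survivors correctly left unflagged (weight above the candidate threshold)
        true_neg = sum(1 for w, d in zip(weights, died) if not d and w > t)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical birth weights in grams and outcomes (1 = died, 0 = survived)
weights = [180, 200, 220, 250, 300, 320, 350, 400]
died = [1, 1, 1, 0, 0, 1, 0, 0]
threshold, j = youden_threshold(weights, died)
print(threshold, j)
```

In a per-breed analysis like the one described, such a search would presumably be run separately on each breed's data.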


2020 ◽  
Vol 2 (7A) ◽  
Author(s):  
Diaa Alrahmany ◽  
Sirous Golchinheydari ◽  
Islam M. Ghazi

Background: Acinetobacter baumannii (AB) was declared an antibiotic-resistant “Priority 1 pathogen” by WHO. We sought to investigate the predisposing risk factors for this pathogen. Methods: In a retrospective study, adults who were admitted to Sohar hospital during 2016-2017 and had a positive laboratory-confirmed culture of AB were studied. We classified patients into 2 groups based on 30-day, all-cause mortality and compared their characteristics. Exploratory classification and regression tree (CART) analysis was performed to explore risk factors for mortality to include in a logistic regression model. Results: A total of 321 patients were included; age was (Mean±SD) 57.42±20.22, male gender was 180 (56.07%), mortality was 140 (44%). Survivors vs deceased had: length of stay 38.25±88.74 vs 51.31±79.19 (p=0.002), multi-drug resistant isolates 134 (51.34%) vs 127 (48.66%) (p<0.001), critical care admission 35 (38.04%) vs 57 (61.96%) (p<0.001), comorbidities 114 (47.50%) vs 126 (52.50%) (p<0.001) and history of invasive procedures 82 (59.85%) vs 55 (40.15%) (p=0.27). Logistic regression revealed that the odds of dying increase by a factor of 1.044 for every additional year of age, and are 1.844 times higher for males compared to females, 4.412 times higher for patients admitted into critical care units compared to general wards, 3.138 times higher for patients admitted with a diagnosis of infection, and 2.356 times higher for patients with hospital-acquired AB infection compared to community-acquired. Conclusion: Both modifiable and non-modifiable risk factors are associated with mortality, and overall health status may contribute to infection outcome. Stabilization of comorbidities and effective antimicrobial treatment could be the mainstay of successful prevention.
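
To make the reported odds ratios concrete, the following sketch shows how a per-year odds ratio compounds over several years and how an odds multiplier changes a probability. The two odds ratios (1.044 per year of age, 4.412 for critical care admission) come from the abstract. The function names and the 30% baseline risk are hypothetical, and multiplying odds ratios together assumes the fitted model has no interaction terms, which the abstract does not state.

```python
def odds_multiplier(odds_ratio_per_unit, delta):
    """Odds ratio implied by a change of `delta` units in a predictor.

    In logistic regression the log-odds are linear in the predictor,
    so a per-unit odds ratio compounds multiplicatively.
    """
    return odds_ratio_per_unit ** delta


def shift_probability(p_baseline, multiplier):
    """Apply an odds multiplier to a baseline probability."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * multiplier
    return new_odds / (1.0 + new_odds)


# Ten additional years of age at 1.044 per year: roughly 1.54 times the odds
ten_years = odds_multiplier(1.044, 10)

# Ten years older AND admitted to critical care (assumes no interaction terms)
combined = ten_years * 4.412

# Hypothetical baseline 30-day mortality risk of 30% (not from the study)
p_older = shift_probability(0.30, ten_years)
print(round(ten_years, 3), round(combined, 3), round(p_older, 3))
```

Note that an odds ratio of 1.54 does not mean a 54% higher probability of death; the change in probability depends on the baseline risk.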


2017 ◽  
Vol 47 (1) ◽  
pp. 31-39 ◽  
Author(s):  
Steven J. Rigatti

For the task of analyzing survival data to derive risk factors associated with mortality, physicians, researchers, and biostatisticians have typically relied on certain types of regression techniques, most notably the Cox model. With the advent of more widely distributed computing power, methods which require more complex mathematics have become increasingly common. Particularly in this era of “big data” and machine learning, survival analysis has become methodologically broader. This paper aims to explore one technique known as Random Forest. The Random Forest technique is a regression tree technique which uses bootstrap aggregation and randomization of predictors to achieve a high degree of predictive accuracy. The various input parameters of the random forest are explored. Colon cancer data (n = 66,807) from the SEER database is then used to construct both a Cox model and a random forest model to determine how well the models perform on the same data. Both models perform well, achieving a concordance error rate of approximately 18%.
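
The abstract above compares the two models by a "concordance error rate". Assuming this means one minus Harrell's concordance index (the abstract does not define it), a minimal pure-Python sketch of that index for right-censored survival data might look like the following. The function name and the toy data are illustrative, not from the paper.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) strictly before time j. The pair is concordant
    when the subject who failed earlier has the higher risk score.
    Tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time, event indicator (1 = death, 0 = censored), model risk score
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.7, 0.3, 0.1]

c_index = concordance_index(times, events, risk_scores)
error_rate = 1.0 - c_index
print(c_index, error_rate)
```

Under this reading, an error rate of about 18% corresponds to a concordance index of about 0.82 for both the Cox model and the random forest.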


Animals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 1165
Author(s):  
Abdelfattah Selim ◽  
Ameer Megahed ◽  
Sahar Kandeel ◽  
Abdullah D. Alanazi ◽  
Hamdan I. Almohammed

Classification and Regression Tree (CART) analysis is a potentially powerful tool for identifying risk factors associated with contagious caprine pleuropneumonia (CCPP) and the important interactions between them. Our objective was therefore to determine the seroprevalence and identify the risk factors associated with CCPP using CART data mining modeling in the most densely sheep- and goat-populated governorates. A cross-sectional study was conducted on 620 animals (390 sheep, 230 goats) distributed over four governorates in the Nile Delta of Egypt in 2019. The randomly selected sheep and goats from different geographical study areas were serologically tested for CCPP, and the animals’ information was obtained from flock men and farm owners. Six variables (geographic location, species, flock size, age, gender, and communal feeding and watering) were used for risk analysis. Multiple stepwise logistic regression and CART modeling were used for data analysis. A total of 124 (20%) serum samples were serologically positive for CCPP. The highest prevalence of CCPP was among aged animals (>4 y; 48.7%) raised in a flock size ≥200 (100%) having communal feeding and watering (28.2%). Based on logistic regression modeling (area under the curve, AUC = 0.89; 95% CI 0.86 to 0.91), communal feeding and watering showed the highest prevalence odds ratio (POR) of CCPP (POR = 3.7, 95% CI 1.9 to 7.3), followed by age (POR = 2.1, 95% CI 1.6 to 2.8) and flock size (POR = 1.1, 95% CI 1.0 to 1.2). However, the higher-accuracy CART modeling (AUC = 0.92, 95% CI 0.90 to 0.95) showed that a flock size >100 animals is the most important risk factor (importance score = 8.9), followed by age >4 y (5.3) and communal feeding and watering (3.1). Our results strongly suggest that CCPP is most likely to be found in animals older than 4 y that are raised in a flock of >100 animals with communal feeding and watering. Additionally, sheep seem to have an important role in CCPP epidemiology. The CART data mining modeling showed better accuracy than the traditional logistic regression.


2013 ◽  
Vol 864-867 ◽  
pp. 2782-2786
Author(s):  
Bao Hua Yang ◽  
Shuang Li

This paper deals with a decision-tree-based classification algorithm for remote sensing images. The experimental area is located in the Xiangyang district, and the data source is a 2010 fusion of SPOT and TM satellite images. The decision-tree classification method is optimized with the help of the RuleGen module and applied to a remote sensing image of the region of interest. The precision of the maximum likelihood ratio method is 95.15 percent, and that of CART is 94.82 percent. Experimental results show that the classification and regression tree method performs about as well as the traditional one.


2006 ◽  
Vol 29 (1) ◽  
pp. 153-162
Author(s):  
Pratul Kumar Saraswati ◽  
Sanjeev V Sabnis

Paleontologists use statistical methods for prediction and classification of taxa. Over the years, statistical analyses of morphometric data have been carried out under the assumption of multivariate normality. In an earlier study, three closely resembling species of the biostratigraphically important genus Nummulites were discriminated by multi-group discrimination. Two discriminant functions that used diameter and thickness of the tests and height and length of chambers in the final whorl accounted for nearly 100% discrimination. In this paper, Classification and Regression Tree (CART), a non-parametric method, is used for classification and prediction of the same data set. In all, 111 iterations of the CART methodology are performed by splitting the data set of 55 observations into training, validation and test data sets in varying proportions. In the validation data sets, 40% of the iterations are correctly classified, and only one case of misclassification is noted in 49% of the iterations. As regards the test data sets, nearly 70% contain no misclassification cases, whereas in about 25% of the test data sets only one case of misclassification is found. The results suggest that the method is highly successful in assigning an individual to a particular species. The key variables on the basis of which tree models are built are combinations of thickness of the test (T), height of the chambers in the final whorl (HL) and diameter of the test (D). Both discriminant analysis and CART thus appear to be comparable in discriminating the three species. However, CART reduces the number of requisite variables without increasing the misclassification error. The method is very useful for professional geologists for quick identification of species.
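
As a rough illustration of how a CART-style tree chooses a split on one morphometric variable, the sketch below searches for the cut point that minimizes the weighted Gini impurity of the two resulting groups. This is a simplified single-split search under the usual Gini criterion, not the authors' procedure. The thickness values and the two species labels are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the cut on one numeric variable that minimizes weighted Gini impurity.

    Returns (cut, impurity). `cut` is None if no split improves on the
    impurity of the unsplit data. Candidate cuts are midpoints between
    consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_impurity = None, gini(labels)
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot cut between identical values
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best_impurity = weighted
    return best_cut, best_impurity


# Hypothetical test thickness (T, in mm) for two made-up species labels
thickness = [1.0, 1.2, 1.4, 2.0, 2.2, 2.4]
species = ["A", "A", "A", "B", "B", "B"]
cut, impurity = best_split(thickness, species)
print(cut, impurity)
```

A full tree would repeat this search over every variable (here T, HL and D) and recurse into each resulting group, which is how CART can end up using fewer variables than a discriminant function.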

