Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Siddharth Subramaniyam ◽  
Michael A. DeJesus ◽  
Anisha Zaveri ◽  
Clare M. Smith ◽  
Richard E. Baker ◽  
...  

Abstract Background Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. Results In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform a retrospective analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. Conclusions Our results show that, not only does ZINB generally identify most of the genes found by pairwise resampling (and vastly outperforms ANOVA), but it also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
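
The separation of count magnitude from saturation that the abstract describes can be illustrated with the zero-inflated negative binomial probability mass function. The following is a minimal sketch, not the authors' implementation: the mean/dispersion parametrization (`mu`, `r`), the mixing weight `pi`, the function names, and the example counts are all assumptions chosen for illustration.

```python
import math

def nb_logpmf(y, mu, r):
    """Log-probability of count y under a negative binomial
    with mean mu > 0 and dispersion (size) r > 0."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

def zinb_logpmf(y, mu, r, pi):
    """Log-probability under a zero-inflated NB: with probability pi the
    site is a structural zero, otherwise the count is drawn from NB(mu, r)."""
    if y == 0:
        # A zero can come from either mixture component.
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, r)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, r)

def log_likelihood(counts, logpmf, *params):
    """Sum of log-probabilities over independent counts."""
    return sum(logpmf(y, *params) for y in counts)

# Hypothetical insertion counts at sites within one gene (illustrative only).
counts = [0, 0, 0, 0, 0, 0, 12, 35, 0, 8, 0, 51, 0, 0, 20]

# Parameters are fixed by hand here; a real analysis would estimate them.
ll_nb = log_likelihood(counts, nb_logpmf, 8.4, 0.3)
ll_zinb = log_likelihood(counts, zinb_logpmf, 25.2, 2.0, 0.6)
print(ll_nb, ll_zinb)
```

In this sketch `pi` plays the role of local saturation (the fraction of sites with no insertions) and `mu` the magnitude of counts at the remaining sites. Setting `pi = 0` recovers the plain negative binomial, which is what makes the two models nested and a likelihood ratio comparison meaningful once both are fitted by maximum likelihood.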

2019 ◽  
Author(s):  
Siddharth Subramaniyam ◽  
Anisha Zaveri ◽  
Michael A. DeJesus ◽  
Clare Smith ◽  
Richard E. Baker ◽  
...  

Abstract Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform a retrospective analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. Our results show that, not only does ZINB generally identify most of the genes found by pairwise resampling (and vastly outperforms ANOVA), but it also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.


Forests ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 377 ◽  
Author(s):  
Zhangwen Su ◽  
Haiqing Hu ◽  
Mulualem Tigabu ◽  
Guangyu Wang ◽  
Aicong Zeng ◽  
...  

Wildfire is a major disturbance that affects large areas globally every year. Thus, better prediction of the likelihood of wildfire occurrence is essential to develop appropriate fire prevention measures. We applied a global negative binomial (NB) model and a geographically weighted negative binomial regression (GWNBR) model to determine the relationship between wildfire occurrence and its driving factors in the boreal forests of the Great Xing’an Mountains, northeast China. Using geo-weighted techniques to consider the geospatial information of meteorological, topographic, vegetation type and human factors, we aimed to verify whether the performance of the NB model can be improved. Our results confirmed that the GWNBR model fitted and predicted better than the global NB model, produced more precise and stable parameter estimates, yielded a more realistic spatial distribution of model predictions, and detected the impact hotspots of these predictor variables. We found slope, vegetation cover, average precipitation, average temperature, and average relative humidity to be important predictors of wildfire occurrence in the Great Xing’an Mountains. Accounting for spatially varying relationships thus improves on the explanatory power of the global NB model, which does not sufficiently explain the relationship between wildfire occurrence and its drivers. The GWNBR model can therefore complement the global NB model in overcoming the issue of nonstationary variables, thereby enabling better prediction of the occurrence of wildfires in large geographical areas and improving wildfire management practices.


Author(s):  
Zoe Schroder ◽  
James B. Elsner

Abstract Environmental variables are routinely used in estimating when and where tornadoes are likely to occur, but more work is needed to understand how the tornado and casualty counts of severe weather outbreaks vary with larger-scale environmental factors. Here the authors demonstrate a method to quantify ‘outbreak’-level tornado and casualty counts with respect to variations in large-scale environmental factors. They do this by fitting negative binomial regression models to cluster-level environmental data to estimate the number of tornadoes and the number of casualties on days with at least ten tornadoes. Results show that a 1000 J kg−1 increase in CAPE corresponds to a 5% increase in the number of tornadoes and a 28% increase in the number of casualties, conditional on at least ten tornadoes, and holding the other variables constant. Further, results show that a 10 m s−1 increase in deep-layer bulk shear corresponds to a 13% increase in tornadoes and a 98% increase in casualties, conditional on at least ten tornadoes, and holding the other variables constant. The casualty-count model quantifies the decline in the number of casualties per year and indicates that outbreaks have a larger impact in the Southeast than elsewhere after controlling for population and geographic area.
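
Percent changes like those quoted above follow from the log link commonly used in negative binomial regression, under which a coefficient acts multiplicatively on the expected count. The sketch below assumes such a log link; the function names are hypothetical, and the coefficient is back-calculated from the reported 5% figure rather than taken from the fitted model.

```python
import math

def percent_change(beta, delta):
    """Percent change in the expected count when a predictor increases
    by delta, given a log-link coefficient beta (per unit of predictor)."""
    return (math.exp(beta * delta) - 1.0) * 100.0

def beta_from_percent(pct, delta):
    """Inverse of percent_change: the per-unit coefficient implied by a
    pct percent change over an increase of delta."""
    return math.log(1.0 + pct / 100.0) / delta

# Coefficient implied by "a 1000 J/kg increase in CAPE corresponds to a
# 5% increase in tornadoes" (back-calculated, not a reported estimate).
beta_cape = beta_from_percent(5.0, 1000.0)

# Effects compound multiplicatively, so doubling the increase gives
# slightly more than double the percent change.
print(percent_change(beta_cape, 1000.0))
print(percent_change(beta_cape, 2000.0))
```

Because the effect is multiplicative, a 2000 J kg−1 increase under this sketch implies a factor of 1.05 squared rather than a simple 10% increase.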


2021 ◽  
pp. jech-2020-215039 ◽  
Author(s):  
Anders Malthe Bach-Mortensen ◽  
Michelle Degli Esposti

Introduction The COVID-19 pandemic has disproportionately impacted care homes and vulnerable populations, exacerbating existing health inequalities. However, the role of area deprivation in shaping the impacts of COVID-19 in care homes is poorly understood. We examine whether area deprivation is linked to higher rates of COVID-19 outbreaks and deaths among care home residents across upper tier local authorities in England (n=149). Methods We constructed a novel dataset from publicly available data. Using negative binomial regression models, we analysed the associations between area deprivation (Income Deprivation Affecting Older People Index (IDAOPI) and Index of Multiple Deprivation (IMD) extent) as the exposure and COVID-19 outbreaks, COVID-19-related deaths and all-cause deaths among care home residents as three separate outcomes, adjusting for population characteristics (size, age composition, ethnicity). Results COVID-19 outbreaks in care homes did not vary by area deprivation. However, COVID-19-related deaths were more common in the most deprived quartiles of IDAOPI (incidence rate ratio (IRR): 1.23, 95% CI 1.04 to 1.47) and IMD extent (IRR: 1.16, 95% CI 1.00 to 1.34), compared with the least deprived quartiles. Discussion These findings suggest that area deprivation is a key risk factor in COVID-19 deaths among care home residents. Future research should look to replicate these results when more complete data become available.
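
An incidence rate ratio and its confidence interval are typically obtained by exponentiating a log-scale coefficient and its Wald interval. The sketch below shows that relationship and its inverse; it assumes a Wald-type interval with z = 1.96, the function names are hypothetical, and the log-scale values are back-calculated from the rounded figures reported above rather than taken from the study's model output.

```python
import math

def irr_with_ci(beta, se, z=1.96):
    """Incidence rate ratio and Wald confidence limits from a
    log-scale coefficient beta and its standard error se."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

def log_scale_from_irr(irr, lower, upper, z=1.96):
    """Approximate log-scale coefficient and standard error implied by
    a reported IRR and its confidence limits (Wald interval assumed)."""
    beta = math.log(irr)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

# Reported: IRR 1.23, 95% CI 1.04 to 1.47 (most vs least deprived IDAOPI quartile).
beta, se = log_scale_from_irr(1.23, 1.04, 1.47)
irr, lo, hi = irr_with_ci(beta, se)
print(beta, se)
print(irr, lo, hi)
```

The round trip reproduces the reported limits only approximately, since the published values are rounded to two decimals.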


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hai-Yang Zhang ◽  
An-Ran Zhang ◽  
Qing-Bin Lu ◽  
Xiao-Ai Zhang ◽  
Zhi-Jie Zhang ◽  
...  

Abstract Background COVID-19 has impacted populations around the world, with the fatality rate varying dramatically across countries. Selenium, an important micronutrient implicated in viral infections, has been suggested to play a role. Methods An ecological study was performed to assess the association between COVID-19-related fatality and the selenium content of both crops and topsoil in China. Results In total, 14,045 COVID-19 cases reported from 147 cities during 8 December 2019–13 December 2020 were included. Based on selenium content in crops, the case fatality rates (CFRs) gradually increased from 1.17% in non-selenium-deficient areas, to 1.28% in moderate-selenium-deficient areas, and further to 3.16% in severe-selenium-deficient areas (P = 0.002). Based on selenium content in topsoil, the CFRs gradually increased from 0.76% in non-selenium-deficient areas, to 1.70% in moderate-selenium-deficient areas, and further to 1.85% in severe-selenium-deficient areas (P < 0.001). The zero-inflated negative binomial regression model showed a significantly higher fatality risk in cities with severely deficient selenium content in crops than in non-selenium-deficient cities, with an incidence rate ratio (IRR) of 3.88 (95% CI: 1.21–12.52). This was further confirmed by regression fitting the association between the CFR of COVID-19 and selenium content in topsoil, with an IRR of 2.38 (95% CI: 1.14–4.98) for moderate-selenium-deficient cities and 3.06 (95% CI: 1.49–6.27) for severe-selenium-deficient cities. Conclusions Regional selenium deficiency might be related to an increased CFR of COVID-19. Future studies are needed to explore the associations between selenium status and disease outcome at the individual level.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ahmed Nabil Shaaban ◽  
Bárbara Peleteiro ◽  
Maria Rosario O. Martins

Abstract Background This study offers a comprehensive approach to precisely analyze the complexly distributed length of stay among HIV admissions in Portugal. Objective To provide an illustration of statistical techniques for analysing count data using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method Registered discharges in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, a total of 26,505 classified under the Major Diagnostic Category (MDC) created for patients with HIV infection, with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson regression model, the zero-inflated Poisson model, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial models provided the best fit with observed data. Admissions among males or admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease exhibited a highly significant increase in length of stay. Perfect trends were observed in which a higher number of diagnoses or procedures led to a significantly higher length of stay. The random-effects term included in our model, which refers to unexplained factors specific to each hospital, revealed obvious differences in quality among the hospitals included in our study. Conclusions This study provides a comprehensive approach to address unique problems associated with the prediction of length of stay among HIV patients in Portugal.


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Jun Heo ◽  
Won-Jun Choi ◽  
Seunghon Ham ◽  
Seong-Kyu Kang ◽  
Wanhyung Lee

Abstract Background The association between breakfast skipping and abnormal metabolic outcomes remains controversial. A comprehensive study with various stratified data is required. Objective The aim of this study was to investigate the relationship between abnormal metabolic outcomes and breakfast skipping by sex, age, and work status stratification. Methods We used data from the Korea National Health and Nutrition Examination Surveys from 2013 to 2018. A total of 21,193 (9022 men and 12,171 women) participants were included in the final analysis. The risk of metabolic outcomes linked to breakfast skipping was estimated using the negative binomial regression analysis by sex, work status, and age stratification. Results A total of 11,952 (56.4%) participants consumed breakfast regularly. The prevalence of abnormal metabolic outcomes was higher among those with irregular breakfast consumption habits. Among young male workers, negative binomial regression analysis showed that irregular breakfast eaters had a higher risk of abnormal metabolic outcomes, after adjusting for covariates (odds ratio, 1.15; 95% confidence interval, 1.03–1.27). Conclusions The risk of abnormal metabolic outcomes was significant in young men in the working population. Further studies are required to understand the association of specific working conditions (working hours or shift work) with breakfast intake status and the risk of metabolic diseases.


Author(s):  
Simo Näyhä

Abstract This paper examines whether the anomalous summer peak in deaths from coronary heart disease (CHD) in Finland could be attributed to adverse effects of the Midsummer festival and alcohol consumption during the festival. Daily deaths from CHD and alcohol poisoning in Finland, 1961–2014, that occurred during the 7 days centering on Midsummer Day were analysed in relation to deaths during 14 to 4 days before and 4 to 14 days after Midsummer Day. Daily counts of deaths from CHD among persons aged 35–64 years were regressed on days around the Midsummer period by negative binomial regression. Mortality from CHD was highest on Midsummer Day (RR 1.25; 95% confidence interval 1.12–1.31), one day after the peak in deaths from alcohol poisonings. RR for CHD on Midsummer Day was particularly high (RR = 1.43; 1.09–1.86) in the 2000s, 30% of deaths being attributable to that day. In conclusion, the anomalous and prominent summer peak in deaths from CHD in Finland is an adverse consequence of the Midsummer festival. The most likely underlying reason is heavy alcohol consumption during the festival period, especially on Midsummer Eve. In the 2000s, one third of deaths from CHD on Midsummer Day are preventable.

