A Bayesian inference tool for identifying artifactual calls from differential transcript abundance analyses

2020 ◽  
Author(s):  
Stefano Mangiola ◽  
Evan A Thomas ◽  
Martin Modrák ◽  
Anthony T Papenfuss

Abstract Relative transcript abundance has proven to be a valuable tool for inferring the phenotype of biological systems from genetic material. Several methods for the analysis of differential transcript abundance have been developed, and some of the most popular are based on negative binomial models. Although most genes are fitted reasonably well by the negative binomial distribution, outlier observations that do not fit such models can lead to artifactual identification of significant changes in transcription. Identifying the affected transcripts is essential for the correct interpretation of results. A robust and automated tool for detecting sample/transcript pairs that do not fit a negative binomial regression model is currently lacking. Here we propose ppcseq, a robust statistical framework that hierarchically models sample- and gene-wise features, such as sequencing depth bias and the association between mean transcript abundance and its over-dispersion, and provides a theoretical transcript abundance distribution against which the observed transcript abundance can be tested for outliers. Using a publicly available data set, we show that nearly 10% of differentially abundant transcripts had fold changes inflated by the presence of outliers. This method has broad utility in filtering artifactual results of differential transcript abundance analyses based on a negative binomial framework.
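The idea of testing an observed count against a theoretical negative binomial distribution can be illustrated with a minimal sketch. This is not the ppcseq implementation, which fits a hierarchical Bayesian model. Here the mean `mu` and the `size` (inverse over-dispersion) parameter are simply assumed to be known, and the function names and the `alpha` threshold are hypothetical choices made for illustration.

```python
import math

def nb_log_pmf(k, mu, size):
    """Log-probability of count k under a negative binomial
    with mean mu and size (inverse over-dispersion) parameter."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))

def nb_cdf(k, mu, size):
    """P(X <= k), computed by direct summation of the pmf."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, mu, size)) for i in range(k + 1))
    return min(1.0, total)  # guard against floating-point overshoot

def flag_outliers(counts, mu, size, alpha=0.01):
    """Flag counts that fall in either tail (alpha / 2 each) of the
    assumed negative binomial distribution."""
    flags = []
    for k in counts:
        lower_tail = nb_cdf(k, mu, size)            # P(X <= k)
        upper_tail = 1.0 - nb_cdf(k - 1, mu, size)  # P(X >= k)
        flags.append(min(lower_tail, upper_tail) < alpha / 2)
    return flags

# Hypothetical counts for one transcript across four samples.
print(flag_outliers([95, 110, 102, 5000], mu=100.0, size=10.0))
```

In a full posterior predictive check, the fixed `mu` and `size` would be replaced by draws from the fitted model's posterior, so that uncertainty in the parameters widens the interval against which each observation is tested.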

2016 ◽  
Vol 63 (1) ◽  
pp. 77-87 ◽  
Author(s):  
William H. Fisher ◽  
Stephanie W. Hartwell ◽  
Xiaogang Deng

Poisson and negative binomial regression procedures have proliferated, and now are available in virtually all statistical packages. Along with the regression procedures themselves are procedures for addressing issues related to the over-dispersion and excessive zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counted data. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma, using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.
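The two-part structure described above, a binary model for the "excess" zeros combined with a count model, can be sketched for the zero-inflated Poisson case. This is a minimal illustration, not code from the article: the function names are hypothetical, and the excess-zero probability is assumed to come from a logit model with linear predictor `z`.

```python
import math

def inv_logit(z):
    """Map a logit-scale linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the observation is an
    excess zero, otherwise it is drawn from Poisson(lam)."""
    if k == 0:
        return pi + (1.0 - pi) * poisson_pmf(0, lam)
    return (1.0 - pi) * poisson_pmf(k, lam)

# Hypothetical parameters: logit-scale predictor -0.85, Poisson mean 2.
pi = inv_logit(-0.85)
print(zip_pmf(0, 2.0, pi), poisson_pmf(0, 2.0))
```

The interpretive difficulty the abstract raises is visible in the `k == 0` branch: a zero can arise from either component, so a covariate may affect the zero probability through `pi`, through `lam`, or through both.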


2017 ◽  
Vol 65 (2) ◽  
pp. 133-137
Author(s):  
Nasrin Sultana ◽  
Wasimul Bari

The main aim of this paper is to identify the potential determinants of the number of antenatal care visits made by women in Bangladesh during pregnancy. The data set was extracted from the Bangladesh Demographic and Health Survey, 2014, and overdispersion was found to be present in it. To take the overdispersion into account, a negative binomial regression model was used. A number of socioeconomic and demographic variables were observed to have a significant impact on the number of antenatal care visits. Results are explained using incidence rate ratios. Dhaka Univ. J. Sci. 65(2): 133-137, 2017 (July)
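For readers unfamiliar with incidence rate ratios: in a negative binomial regression with the usual log link, the ratio for a covariate is the exponential of its coefficient, and a confidence interval is obtained by exponentiating the interval's endpoints. The sketch below assumes a log link and uses made-up numbers that are not taken from the paper.

```python
import math

def incidence_rate_ratio(coef, ci_low, ci_high):
    """Convert a log-scale coefficient and its confidence interval
    into an incidence rate ratio with the matching interval."""
    return math.exp(coef), math.exp(ci_low), math.exp(ci_high)

# Hypothetical coefficient and 95% CI on the log scale.
irr, low, high = incidence_rate_ratio(0.25, 0.10, 0.40)
print(f"IRR = {irr:.2f} (95% CI: {low:.2f} to {high:.2f})")
```

A ratio above 1 indicates a higher expected count per unit increase in the covariate, and a ratio below 1 indicates a lower one.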


Author(s):  
Yenew Alemu

The COVID-19 pandemic in Ethiopia is part of the global pandemic of coronavirus disease 2019, caused by severe acute respiratory syndrome coronavirus 2. Amhara Region is a regional state in northern Ethiopia and the homeland of the Amhara people. The main objective of this study was to identify factors associated with the prevalence of COVID-19 among females in the Amhara region. The data set was obtained from the Amhara Public Health Institute, 2020. A negative binomial regression model was fitted to identify the determinants of the prevalence of COVID-19 among females. Out of 5,627 confirmed cases from 138 daily reports, 96 patients died and 1,483 confirmed cases were females. The number of recovered COVID-19 cases (coefficient = -0.4566332, 95% CI: (-0.7364772, -0.1767892), P-value < 0.05), severe cases (coefficient = 1.038589, 95% CI: (0.7531619, 1.324017), P-value < 0.05), total deaths (coefficient = 0.5164175, 95% CI: (0.1438362, 0.8889987), P-value < 0.05), and the average age of patients per day (coefficient = 1.511936, 95% CI: (0.9220257, 2.101846), P-value < 0.05) were statistically significant factors for confirmed cases of COVID-19 among females in the Amhara region. An average patient age above 30 per day, fewer than 19 recovered cases, more than 12 severe cases, and more than two deaths were the optimal high-risk thresholds for confirmed cases of COVID-19 among females.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hai-Yang Zhang ◽  
An-Ran Zhang ◽  
Qing-Bin Lu ◽  
Xiao-Ai Zhang ◽  
Zhi-Jie Zhang ◽  
...  

Abstract Background: COVID-19 has impacted populations around the world, with the fatality rate varying dramatically across countries. Selenium, one of the important micronutrients implicated in viral infections, has been suggested to play a role. Methods: An ecological study was performed to assess the association between COVID-19-related fatality and the selenium content of both crops and topsoil in China. Results: In total, 14,045 COVID-19 cases reported from 147 cities during 8 December 2019–13 December 2020 were included. Based on selenium content in crops, the case fatality rates (CFRs) gradually increased from 1.17% in non-selenium-deficient areas, to 1.28% in moderate-selenium-deficient areas, and further to 3.16% in severe-selenium-deficient areas (P = 0.002). Based on selenium content in topsoil, the CFRs gradually increased from 0.76% in non-selenium-deficient areas, to 1.70% in moderate-selenium-deficient areas, and further to 1.85% in severe-selenium-deficient areas (P < 0.001). The zero-inflated negative binomial regression model showed a significantly higher fatality risk in cities with severely selenium-deficient crops than in non-selenium-deficient cities, with an incidence rate ratio (IRR) of 3.88 (95% CI: 1.21–12.52). This was further confirmed by a regression fitting the association between the CFR of COVID-19 and selenium content in topsoil, with an IRR of 2.38 (95% CI: 1.14–4.98) for moderate-selenium-deficient cities and 3.06 (95% CI: 1.49–6.27) for severe-selenium-deficient cities. Conclusions: Regional selenium deficiency might be related to an increased CFR of COVID-19. Future studies are needed to explore the associations between selenium status and disease outcome at the individual level.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ahmed Nabil Shaaban ◽  
Bárbara Peleteiro ◽  
Maria Rosario O. Martins

Abstract Background: This study offers a comprehensive approach to analyzing the complex distribution of length of stay among HIV admissions in Portugal. Objective: To illustrate statistical techniques for analyzing count data using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method: A total of 26,505 discharges registered in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, classified under the Major Diagnostic Category (MDC) created for patients with HIV infection, with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several count models were compared to select the best-fitting one: the Poisson regression model, the zero-inflated Poisson model, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial model provided the best fit to the observed data. Admissions among males and admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease exhibited a highly significant increase in length of stay. Consistent trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term in our model, which captures unexplained factors specific to each hospital, revealed clear differences in quality among the hospitals included in our study. Conclusions: This study provides a comprehensive approach to addressing the unique problems associated with the prediction of length of stay among HIV patients in Portugal.


Author(s):  
Hitesh Chawla ◽  
Megat-Usamah Megat-Johari ◽  
Peter T. Savolainen ◽  
Christopher M. Day

The objectives of this study were to assess the in-service safety performance of roadside culverts and evaluate the potential impacts of installing various safety treatments to mitigate the severity of culvert-involved crashes. Such crashes were identified using standard fields on police crash report forms, as well as through a review of pertinent keywords from the narrative section of these forms. These crashes were then linked to the nearest cross-drainage culvert, which was associated with the nearest road segment. A negative binomial regression model was then estimated to discern how the risk of culvert-involved crashes varied as a function of annual average daily traffic, speed limit, number of travel lanes, and culvert size and offset. The second stage of the analysis involved the use of the Roadside Safety Analysis Program to estimate the expected crash costs associated with various design contexts. A series of scenarios were evaluated, culminating in guidance as to the most cost-effective treatments for different combinations of roadway geometric and traffic characteristics. The results of this study provide an empirical model that can be used to predict the risk of culvert-involved crashes under various scenarios. The findings also suggest that the installation of safety grates on culvert openings provides a promising alternative for most of the cases where the culvert is located within the clear zone. In general, a guardrail is recommended when adverse conditions are present or when other treatments are not feasible at a specific location.

