Parametric modeling of quantile regression coefficient functions with count data

Author(s):  
Paolo Frumento ◽  
Nicola Salvati

Abstract Applying quantile regression to count data presents logical and practical complications, which are usually resolved by artificially smoothing the discrete response variable through jittering. In this paper, we present an alternative approach in which the quantile regression coefficients are modeled by means of (flexible) parametric functions. The proposed method avoids jittering and offers numerous advantages over standard quantile regression in terms of computation, smoothness, efficiency, and ease of interpretation. Estimation is carried out by minimizing a “simultaneous” version of the loss function of ordinary quantile regression. Simulation results show that the described estimators are similar to those obtained with jittering, but are often preferable in terms of bias and efficiency. To exemplify our approach and provide guidelines for model building, we analyze data from the US National Medical Expenditure Survey. All the necessary software is implemented in an existing R package.
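The two ingredients the abstract contrasts can be sketched in a few lines of Python (a minimal illustration under assumed toy data, not the authors' R implementation): jittering adds U[0, 1) noise to each count so the response becomes continuous yet exactly recoverable via the floor, and a "simultaneous" objective sums the quantile check loss over a grid of quantile levels instead of fitting each level separately.

```python
import random

def jitter(counts, rng):
    """Smooth a discrete count response by adding U[0, 1) noise."""
    return [y + rng.random() for y in counts]

def pinball_loss(p, residuals):
    """Quantile-regression check loss: rho_p(u) = u * (p - 1[u < 0])."""
    return sum(u * (p - (u < 0)) for u in residuals)

rng = random.Random(42)
counts = [0, 1, 1, 2, 5, 3, 0, 4]
z = jitter(counts, rng)

# The original counts stay recoverable: floor(y + u) == y for u in [0, 1).
assert [int(v) for v in z] == counts

# A "simultaneous" loss sums the check loss over a grid of quantile levels;
# beta here is a hypothetical constant fit (real models involve covariates
# and parametric coefficient functions beta(p)).
grid = [0.1, 0.25, 0.5, 0.75, 0.9]
beta = 2.0
simultaneous = sum(pinball_loss(p, [y - beta for y in z]) for p in grid)
```

Minimizing `simultaneous` over the parameters of a coefficient function β(p | θ) is the core of the approach described, whereas standard jittered quantile regression would minimize `pinball_loss` once per quantile level.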

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Arnaud Liehrmann ◽  
Guillem Rigaill ◽  
Toby Dylan Hocking

Abstract Background Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as the Poisson distribution, to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter accounts for the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect peaks more accurately than algorithms that rely on natural assumptions. Conclusion The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for the H3K36me3 and H3K4me3 histone modifications.
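The changepoint idea behind such peak detectors can be sketched with a toy single-changepoint search (a didactic stand-in, not the CROCS algorithm): each candidate split is scored by a per-segment cost, here squared error around the segment mean (a Gaussian-noise cost); Poisson or negative-binomial noise assumptions would swap in a different per-segment likelihood.

```python
def best_split(x):
    """Return the index k minimizing total within-segment squared error
    over the two segments x[:k] and x[k:] (Gaussian-noise cost)."""
    def sse(seg):
        if not seg:
            return 0.0
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    return min(range(1, len(x)), key=lambda k: sse(x[:k]) + sse(x[k:]))

# ChIP-seq-like coverage: low background followed by an enriched "peak".
coverage = [1, 2, 1, 1, 12, 11, 13, 12]
k = best_split(coverage)  # boundary between background and peak
```

Real multiple-changepoint models apply such costs recursively or via penalized dynamic programming, with the penalty parameter controlling the number of detected segments; supervised learning of that penalty is one of the contributions the abstract describes.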


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
De-Chih Lee ◽  
Hailun Liang ◽  
Leiyu Shi

Abstract Objective This study applied the vulnerability framework and examined the combined effect of race and income on health insurance coverage in the US. Data source The household component of the 2017 US Medical Expenditure Panel Survey (MEPS-HC) was used for the study. Study design Logistic regression models were used to estimate the associations between insurance coverage status and the vulnerability measure, comparing the insured with, respectively, those uninsured or insured for part of the year, those insured for part of the year only, and those uninsured only. Data collection/extraction methods We constructed a vulnerability measure that reflects the convergence of predisposing (race/ethnicity), enabling (income), and need (self-perceived health status) attributes of risk. Principal findings While income was a significant predictor of health insurance coverage (a difference of 6.1–7.2% between high- and low-income Americans), race/ethnicity was independently associated with lack of insurance. The combined effect of income and race on insurance coverage was devastating: low-income minorities in poor health had 68% lower odds of being insured than high-income Whites in good health. Conclusion Results of the study could assist policymakers in targeting limited resources toward the subpopulations most in need of assistance with insurance coverage. Policymakers should target insurance coverage for the most vulnerable subpopulation, i.e., racial/ethnic minorities with low income and poor health.
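The "68% lower odds" figure corresponds to an odds ratio of 0.32 from the logistic regression. The arithmetic can be sketched as follows; the probabilities below are invented purely to produce that odds ratio and are not the MEPS estimates.

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p_group, p_ref):
    """Odds ratio of a group relative to a reference group."""
    return odds(p_group) / odds(p_ref)

# Hypothetical coverage probabilities chosen only to illustrate the scale:
# odds of 4 vs. 12.5 give an odds ratio of 0.32, i.e. 68% lower odds.
p_vulnerable, p_reference = 0.80, 12.5 / 13.5
ratio = odds_ratio(p_vulnerable, p_reference)
pct_lower = (1 - ratio) * 100
```

Note that "68% lower odds" is not the same as a 68% lower probability of coverage; odds ratios overstate probability differences when the outcome is common, which is why the abstract's 6.1–7.2 percentage-point income gap and the 0.32 odds ratio coexist.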


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10849
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Juergen Debus ◽  
Amir Abdollahi

Abstract Background Model building is a crucial part of omics-based biomedical research, used to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing the error between model predictions and a given classification (maximizing accuracy). Human ratings/classifications, however, might be error-prone, with discordance rates between experts of 5–15%. We therefore evaluate whether a feature pre-filtering step might improve the identification of features associated with the true underlying groups. Methods Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment, and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP-negative glioblastoma tumors from the TCGA-GBM 450k methylation array data cohort, starting from a rough and error-prone separation based on a fuzzy UMAP embedding. Results V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 0.50 for 25 samples and 100 features.
For larger numbers of features and samples, median AUCs were 0.75 (range 0.59–1.00) for V1 and 0.54 (range 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest-based method. Conclusions The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
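The multiplicity-adjustment step the abstract names (Benjamini-Hochberg) can be sketched as a standalone step-up procedure (a textbook implementation, not modelBuildR's code): sort the p-values, find the largest rank k with p_(k) ≤ (k/m)·α, and reject the hypotheses for the k smallest p-values.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: reject H_(i) for ranks i <= k, where k is the
    largest rank with p_(k) <= (k / m) * alpha. Returns one reject flag
    per input p-value, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

# Features whose switched-role univariate tests survive BH would be kept
# for the downstream model-building step (illustrative p-values).
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20], alpha=0.05)
```

Note the step-up logic: 0.039 exceeds its threshold 3/5·0.05 = 0.03, so only the two smallest p-values are rejected here, even though 0.039 < α.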


Biometrics ◽  
2015 ◽  
Vol 72 (1) ◽  
pp. 74-84 ◽  
Author(s):  
Paolo Frumento ◽  
Matteo Bottai
