Parametric modeling of quantile regression coefficient functions with count data

Author(s):  
Paolo Frumento ◽  
Nicola Salvati

Abstract Applying quantile regression to count data presents logical and practical complications, which are usually resolved by artificially smoothing the discrete response variable through jittering. In this paper, we present an alternative approach in which the quantile regression coefficients are modeled by means of (flexible) parametric functions. The proposed method avoids jittering and offers numerous advantages over standard quantile regression in terms of computation, smoothness, efficiency, and ease of interpretation. Estimation is carried out by minimizing a “simultaneous” version of the loss function of ordinary quantile regression. Simulation results show that the described estimators are similar to those obtained with jittering, but are often preferable in terms of bias and efficiency. To exemplify our approach and provide guidelines for model building, we analyze data from the US National Medical Expenditure Survey. All the necessary software is implemented in an existing R package.
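The two ingredients the abstract contrasts can be sketched in a few lines of Python (a minimal illustration under assumed toy data, not the authors' R implementation): jittering adds U[0, 1) noise to each count so the response becomes continuous yet exactly recoverable via the floor, and a "simultaneous" objective sums the quantile check loss over a grid of quantile levels instead of fitting each level separately.

```python
import random

def jitter(counts, rng):
    """Smooth a discrete count response by adding U[0, 1) noise."""
    return [y + rng.random() for y in counts]

def pinball_loss(p, residuals):
    """Quantile-regression check loss: rho_p(u) = u * (p - 1[u < 0])."""
    return sum(u * (p - (u < 0)) for u in residuals)

rng = random.Random(42)
counts = [0, 1, 1, 2, 5, 3, 0, 4]
z = jitter(counts, rng)

# The original counts stay recoverable: floor(y + u) == y for u in [0, 1).
assert [int(v) for v in z] == counts

# A "simultaneous" loss sums the check loss over a grid of quantile levels;
# beta here is a hypothetical constant fit (real models involve covariates
# and parametric coefficient functions beta(p)).
grid = [0.1, 0.25, 0.5, 0.75, 0.9]
beta = 2.0
simultaneous = sum(pinball_loss(p, [y - beta for y in z]) for p in grid)
```

Minimizing `simultaneous` over the parameters of a coefficient function β(p | θ) is the core of the approach described, whereas standard jittered quantile regression would minimize `pinball_loss` once per quantile level.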

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Arnaud Liehrmann ◽  
Guillem Rigaill ◽  
Toby Dylan Hocking

Abstract Background Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as the Poisson distribution, to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter accounts for the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect peaks more accurately than algorithms that rely on natural assumptions. Conclusion The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for the H3K36me3 and H3K4me3 histone modifications.
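The changepoint idea behind such peak detectors can be sketched with a toy single-changepoint search (a didactic stand-in, not the CROCS algorithm): each candidate split is scored by a per-segment cost, here squared error around the segment mean (a Gaussian-noise cost); Poisson or negative-binomial noise assumptions would swap in a different per-segment likelihood.

```python
def best_split(x):
    """Return the index k minimizing total within-segment squared error
    over the two segments x[:k] and x[k:] (Gaussian-noise cost)."""
    def sse(seg):
        if not seg:
            return 0.0
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    return min(range(1, len(x)), key=lambda k: sse(x[:k]) + sse(x[k:]))

# ChIP-seq-like coverage: low background followed by an enriched "peak".
coverage = [1, 2, 1, 1, 12, 11, 13, 12]
k = best_split(coverage)  # boundary between background and peak
```

Real multiple-changepoint models apply such costs recursively or via penalized dynamic programming, with the penalty parameter controlling the number of detected segments; supervised learning of that penalty is one of the contributions the abstract describes.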


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
De-Chih Lee ◽  
Hailun Liang ◽  
Leiyu Shi

Abstract Objective This study applied the vulnerability framework and examined the combined effect of race and income on health insurance coverage in the US. Data source The household component of the 2017 US Medical Expenditure Panel Survey (MEPS-HC) was used for the study. Study design Logistic regression models were used to estimate the associations between insurance coverage status and the vulnerability measure, comparing the insured with, respectively, those uninsured or insured for part of the year, those insured for part of the year only, and those uninsured only. Data collection/extraction methods We constructed a vulnerability measure that reflects the convergence of predisposing (race/ethnicity), enabling (income), and need (self-perceived health status) attributes of risk. Principal findings While income was a significant predictor of health insurance coverage (a difference of 6.1–7.2% between high- and low-income Americans), race/ethnicity was independently associated with lack of insurance. The combined effect of income and race on insurance coverage was devastating: low-income minorities in poor health had 68% lower odds of being insured than high-income Whites in good health. Conclusion Results of the study could assist policymakers in targeting limited resources toward the subpopulations most in need of assistance with insurance coverage. Policymakers should target insurance coverage for the most vulnerable subpopulation, i.e., racial/ethnic minorities with low income and poor health.
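The "68% lower odds" figure corresponds to an odds ratio of 0.32 from the logistic regression. The arithmetic can be sketched as follows; the probabilities below are invented purely to produce that odds ratio and are not the MEPS estimates.

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p_group, p_ref):
    """Odds ratio of a group relative to a reference group."""
    return odds(p_group) / odds(p_ref)

# Hypothetical coverage probabilities chosen only to illustrate the scale:
# odds of 4 vs. 12.5 give an odds ratio of 0.32, i.e. 68% lower odds.
p_vulnerable, p_reference = 0.80, 12.5 / 13.5
ratio = odds_ratio(p_vulnerable, p_reference)
pct_lower = (1 - ratio) * 100
```

Note that "68% lower odds" is not the same as a 68% lower probability of coverage; odds ratios overstate probability differences when the outcome is common, which is why the abstract's 6.1–7.2 percentage-point income gap and the 0.32 odds ratio coexist.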


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10849
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Juergen Debus ◽  
Amir Abdollahi

Abstract Background Model building is a crucial part of omics-based biomedical research, used to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing the error between model predictions and a given classification (maximizing accuracy). Human ratings/classifications, however, might be error-prone, with discordance rates between experts of 5–15%. We therefore evaluate whether a feature pre-filtering step might improve the identification of features associated with the true underlying groups. Methods Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment, and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP-negative glioblastoma tumors from the TCGA-GBM 450k methylation array data cohort, starting from a rough and error-prone separation based on a fuzzy UMAP embedding. Results V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 0.50 for 25 samples and 100 features.
For larger numbers of features and samples, median AUCs were 0.75 (range 0.59–1.00) for V1 and 0.54 (range 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest-based method. Conclusions The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
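The multiplicity-adjustment step the abstract names (Benjamini-Hochberg) can be sketched as a standalone step-up procedure (a textbook implementation, not modelBuildR's code): sort the p-values, find the largest rank k with p_(k) ≤ (k/m)·α, and reject the hypotheses for the k smallest p-values.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: reject H_(i) for ranks i <= k, where k is the
    largest rank with p_(k) <= (k / m) * alpha. Returns one reject flag
    per input p-value, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

# Features whose switched-role univariate tests survive BH would be kept
# for the downstream model-building step (illustrative p-values).
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20], alpha=0.05)
```

Note the step-up logic: 0.039 exceeds its threshold 3/5·0.05 = 0.03, so only the two smallest p-values are rejected here, even though 0.039 < α.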


Biometrics ◽  
2015 ◽  
Vol 72 (1) ◽  
pp. 74-84 ◽  
Author(s):  
Paolo Frumento ◽  
Matteo Bottai
