Bayesian multiple instance regression for modeling immunogenic neoantigens

The relationship between tumor immune responses and tumor neoantigens is one of the most fundamental unsolved questions in tumor immunology, and is the key to understanding the inefficiency of immunotherapy observed in many cancer patients. However, the properties of neoantigens that can elicit immune responses remain unclear. This biological problem can be represented and solved under a multiple instance learning framework, which seeks to model multiple instances (neoantigens) within each bag (patient specimen) with the continuous response (T cell infiltration) observed for each bag. To this end, we develop a Bayesian multiple instance regression method, named BMIR, using a Gaussian distribution to address continuous responses and latent binary variables to model primary instances in bags. Through this Bayesian modeling, BMIR can learn a function for predicting the bag-level responses and for identifying the primary instances within bags, and it gives access to Bayesian statistical inference; these capabilities are elusive in existing work. We demonstrate the superiority of BMIR over previously proposed optimization-based methods for multiple instance regression through simulation and real data analyses. Our method is implemented in an R package entitled “BayesianMIR” and is available at https://github.com/inmybrain/BayesianMIR.
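
As a rough illustration of the bag and instance structure described above, the following sketch simulates one bag. It is not the BayesianMIR implementation: the function name, the parameters `beta`, `pi` and `sigma`, and in particular the choice to average the features of the primary instances before applying a linear predictor are all assumptions made for this example.

```python
import random

def simulate_bag(n_instances, beta, pi, sigma, rng):
    """Draw one bag: instance features, latent primary flags, Gaussian response."""
    d = len(beta)
    # Instance-level features (e.g. neoantigen properties), standard normal here.
    instances = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_instances)]
    # Latent binary variable per instance: 1 marks a primary instance.
    flags = [1 if rng.random() < pi else 0 for _ in range(n_instances)]
    if sum(flags) == 0:
        # Assumption for this sketch: every bag has at least one primary instance.
        flags[rng.randrange(n_instances)] = 1
    primary = [x for x, f in zip(instances, flags) if f]
    # Assumed aggregation: average the primary instances' features.
    mean_feat = [sum(x[k] for x in primary) / len(primary) for k in range(d)]
    mu = sum(b * m for b, m in zip(beta, mean_feat))
    # Bag-level continuous response (e.g. T cell infiltration) is Gaussian.
    y = rng.gauss(mu, sigma)
    return instances, flags, y

rng = random.Random(0)
instances, flags, y = simulate_bag(5, [1.0, -2.0], 0.3, 0.1, rng)
```

In a Bayesian treatment, the flags and coefficients would be inferred from observed bag responses rather than simulated; the sketch only shows the forward direction of such a model.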

Post-prediction inference

10.1101/2020.01.21.914002 ◽

2020 ◽

Author(s):

Siruo Wang ◽

Tyler H McCormick ◽

Jeffrey T Leek

Keyword(s):

Machine Learning ◽

Statistical Inference ◽

Variance Estimation ◽

R Package ◽

Outcome Data ◽

Learning Framework ◽

Autopsy Data ◽

Validation Set ◽

Low Dimensional ◽

The Relationship

Many modern problems in medicine and public health leverage machine learning methods to predict outcomes based on observable covariates. In an increasingly wide array of settings, these predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes post-prediction inference. In this paper, we develop methods for correcting statistical inference using outcomes predicted with an arbitrary machine learning method. Rather than trying to derive the correction from first principles for each machine learning tool, we make the observation that there is typically a low-dimensional and easily modeled representation of the relationship between the observed and predicted outcomes. We build an approach for post-prediction inference that naturally fits into the standard machine learning framework, where the data are divided into training, testing, and validation sets. We train the prediction model on the training set. We estimate the relationship between the observed and predicted outcomes on the testing set and use that model to correct inference on the validation set and in subsequent statistical models. We show our postpi approach can correct bias and improve variance estimation (and thus subsequent statistical inference) with predicted outcome data. To show the broad range of applicability of our approach, we show postpi can improve inference in two totally distinct fields: modeling predicted phenotypes in re-purposed gene expression data and modeling predicted causes of death in verbal autopsy data. We have made our method available through an open-source R package: https://github.com/leekgroup/postpi

multiMarker: software for modelling and prediction of continuous food intake using multiple biomarkers measurements

BMC Bioinformatics ◽

10.1186/s12859-021-04394-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Silvia D’Angelo ◽

Isobel Claire Gormley ◽

Aoife E. McNamara ◽

Lorraine Brennan

Keyword(s):

Food Intake ◽

Web Application ◽

Real Data ◽

R Package ◽

Great Sensitivity ◽

Web Based ◽

Ongoing Research ◽

Multiple Biomarkers ◽

The Relationship ◽

The Web

Abstract. Background: Metabolomic biomarkers offer potential for objective and reliable food intake assessment, and there is growing interest in using biomarkers in place of or with traditional self-reported approaches. Ongoing research suggests that multiple biomarkers are associated with single foods, offering great sensitivity and specificity. However, currently there is a dearth of methods to model the relationship between multiple biomarkers and single food intake measurements. Results: Here, we introduce multiMarker, a web-based application based on the homonymous R package, that enables one to infer the relationship between food intake and two or more metabolomic biomarkers. Furthermore, multiMarker allows prediction of food intake from biomarker data alone. multiMarker differs from previous approaches by providing distributions of predicted intakes, directly accounting for uncertainty in food intake quantification. Usage of both the R package and the web application is demonstrated using real data concerning three biomarkers for orange intake. Further, example data are pre-loaded in the web application to enable users to examine multiMarker’s functionality. Conclusion: The proposed software advances the field of Food Intake Biomarkers, providing researchers with a novel tool to perform continuous food intake quantification, and to assess its associated uncertainty, from multiple biomarkers. To facilitate widespread use of the framework, multiMarker has been implemented as an R package and a Shiny web application.

Exploring the Cooccurrence Patterns of Multiple Sets of Genomic Intervals

BioMed Research International ◽

10.1155/2013/617545 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Hao Wu ◽

Zhaohui S. Qin

Keyword(s):

Spatial Relationship ◽

Pairwise Comparison ◽

Software Tool ◽

Real Data ◽

R Package ◽

Genomic Research ◽

Model Parameters ◽

Sufficient Statistics ◽

Relationship Of ◽

The Relationship

Background. Exploring the spatial relationship of different genomic features has been of great interest since the early days of genomic research. The relationship sometimes provides useful information for understanding certain biological processes. Recent advances in high-throughput technologies such as ChIP-seq produce large amounts of data in the form of genomic intervals. Most of the existing methods for assessing spatial relationships among the intervals are designed for pairwise comparison and cannot be easily scaled up. Results. We present a statistical method and software tool to characterize the cooccurrence patterns of multiple sets of genomic intervals. The occurrences of genomic intervals are described by a simple finite mixture model, where each component represents a distinct cooccurrence pattern. The model parameters are estimated via an EM algorithm and can be viewed as sufficient statistics of the cooccurrence patterns. Simulation and real data results show that the model can accurately capture the patterns and provide biologically meaningful results. The method is implemented in a freely available R package, giClust. Conclusions. The method and the software provide a convenient way for biologists to explore the cooccurrence patterns among a relatively large number of sets of genomic intervals.
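
To make the mixture-model idea concrete, here is a minimal EM sketch for a mixture of independent Bernoulli variables. It is not the giClust code. It assumes each genomic region is summarized as a 0/1 vector recording which interval sets overlap it, and that each mixture component has its own per-set occurrence probabilities; the abstract specifies only "a simple finite mixture model", so these modeling details and all names below are illustrative.

```python
import math
import random

def em_bernoulli_mixture(data, n_components, n_iter=50, seed=0):
    """Fit a mixture of independent Bernoullis to 0/1 rows with plain EM."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / n_components] * n_components
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)]
             for _ in range(n_components)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row.
        resp = []
        for row in data:
            logp = []
            for w, p in zip(weights, probs):
                lp = math.log(w)
                for xj, pj in zip(row, p):
                    lp += math.log(pj if xj else 1.0 - pj)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and per-set occurrence probabilities.
        eps = 1e-9  # keeps weights and probabilities away from exactly 0 or 1
        for c in range(n_components):
            nc = sum(r[c] for r in resp) + eps
            weights[c] = nc / (n + n_components * eps)
            for j in range(d):
                num = sum(r[c] * row[j] for r, row in zip(resp, data))
                probs[c][j] = min(max(num / nc, 1e-6), 1.0 - 1e-6)
    return weights, probs

# Toy input: two cooccurrence patterns over three interval sets.
toy = [[1, 1, 0]] * 50 + [[0, 0, 1]] * 50
weights, probs = em_bernoulli_mixture(toy, n_components=2)
```

Each fitted row of `probs` can be read as one cooccurrence pattern, and `weights` as the share of regions following it.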

Methods for correcting inference based on outcomes predicted by machine learning

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2001238117 ◽

2020 ◽

Vol 117 (48) ◽

pp. 30266-30275

Author(s):

Siruo Wang ◽

Tyler H. McCormick ◽

Jeffrey T. Leek

Keyword(s):

Machine Learning ◽

Statistical Inference ◽

Variance Estimation ◽

Learning Algorithm ◽

R Package ◽

Neural Nets ◽

Learning Framework ◽

Validation Set ◽

Low Dimensional ◽

The Relationship

Many modern problems in medicine and public health leverage machine-learning methods to predict outcomes based on observable covariates. In a wide array of settings, predicted outcomes are used in subsequent statistical analysis, often without accounting for the distinction between observed and predicted outcomes. We call inference with predicted outcomes postprediction inference. In this paper, we develop methods for correcting statistical inference using outcomes predicted with arbitrarily complicated machine-learning models including random forests and deep neural nets. Rather than trying to derive the correction from first principles for each machine-learning algorithm, we observe that there is typically a low-dimensional and easily modeled representation of the relationship between the observed and predicted outcomes. We build an approach for postprediction inference that naturally fits into the standard machine-learning framework where the data are divided into training, testing, and validation sets. We train the prediction model in the training set, estimate the relationship between the observed and predicted outcomes in the testing set, and use that relationship to correct subsequent inference in the validation set. We show our postprediction inference (postpi) approach can correct bias and improve variance estimation and subsequent statistical inference with predicted outcomes. To show the broad range of applicability of our approach, we show postpi can improve inference in two distinct fields: modeling predicted phenotypes in repurposed gene expression data and modeling predicted causes of death in verbal autopsy data. Our method is available through an open-source R package: https://github.com/leekgroup/postpi.
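
The three-way split described above can be sketched on simulated data. This is not the postpi package or its API: the simulated truth, the deliberately miscalibrated stand-in predictor, and the choice of a straight-line relationship model are assumptions of this example, and only the bias correction is shown (variance correction is omitted).

```python
import random

def ols(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(1)

def make_data(n):
    # Simulated truth (assumption): y = 2 * x + standard normal noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

x_train, y_train = make_data(500)
x_test, y_test = make_data(500)
x_valid, _ = make_data(500)  # validation outcomes are treated as unobserved

# 1) Training set: fit the prediction model. The factor 0.5 makes this
#    stand-in predictor deliberately miscalibrated.
a0, b0 = ols(x_train, y_train)

def predict(x):
    return 0.5 * (a0 + b0 * x)

# 2) Testing set: model observed outcomes as a function of predicted ones.
g0, g1 = ols([predict(x) for x in x_test], y_test)

# 3) Validation set: compare naive and corrected downstream estimates.
pred_valid = [predict(x) for x in x_valid]
_, naive_slope = ols(x_valid, pred_valid)
_, corrected_slope = ols(x_valid, [g0 + g1 * p for p in pred_valid])
```

With this setup the naive slope comes out near half of the simulated value of 2, while the corrected slope comes out near 2, because the relationship model learned on the testing set undoes the predictor's miscalibration.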

MHC and malaria: the relationship between HLA class II alleles and immune responses to Plasmodium falciparum

International Immunology ◽

10.1093/intimm/4.9.1055 ◽

1992 ◽

Vol 4 (9) ◽

pp. 1055-1063 ◽

Cited By ~ 29

Author(s):

E. M. Riley ◽

O. Olerup ◽

S. Bennett ◽

P. Rowe ◽

S. J. Allen ◽

...

Keyword(s):

Immune Responses ◽

Class II ◽

HLA Class II ◽

The Relationship ◽

HLA Class II Alleles

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

RNA Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract. Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information: Supplementary data are available at Bioinformatics online.
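
The storage idea described above (keep row and column indices plus a pointer to the parent assay, instead of a copy) can be sketched in a few lines. This is a toy Python analogue, not the ExperimentSubset R API: the class name, method names, and the `"counts"` assay name are all hypothetical.

```python
class SubsetContainer:
    """Toy container: one parent assay plus named subsets stored as indices."""

    def __init__(self, assay):
        self.assays = {"counts": assay}  # parent matrix as a list of rows
        self.subsets = {}                # name -> (parent name, rows, cols)

    def create_subset(self, name, rows, cols, parent="counts"):
        # Store only indices and the parent's name; no data are copied.
        self.subsets[name] = (parent, list(rows), list(cols))

    def get(self, name):
        # Materialize on demand, resolving indices through the parent chain.
        if name in self.assays:
            return self.assays[name]
        parent, rows, cols = self.subsets[name]
        base = self.get(parent)
        return [[base[r][c] for c in cols] for r in rows]

    def provenance(self, name):
        # Walk back from a subset to the root assay.
        chain = [name]
        while chain[-1] in self.subsets:
            chain.append(self.subsets[chain[-1]][0])
        return chain

container = SubsetContainer([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
container.create_subset("good_cells", rows=[0, 1, 2], cols=[0, 2])
container.create_subset("top_genes", rows=[1, 2], cols=[0, 1], parent="good_cells")
```

Here `container.provenance("top_genes")` walks from the nested subset back through `good_cells` to the root assay, which is the kind of parent and child relationship the abstract refers to as data provenance.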

A Multidimensional Item Response Theory Model for Continuous and Graded Responses With Error in Persons and Items

Educational and Psychological Measurement ◽

10.1177/0013164421998412 ◽

2021 ◽

pp. 001316442199841

Author(s):

Pere J. Ferrando ◽

David Navarro-González

Keyword(s):

Item Response Theory ◽

Item Response ◽

Theory Model ◽

Response Model ◽

Response Theory ◽

Continuous Response ◽

Graded Responses ◽

Graded Response ◽

Continuous Responses ◽

Differential Measurement Error

Item response theory “dual” models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, which is based on the concept of the multidimensional location index, is first proposed and discussed. Then, the models are described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. The simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposals are submitted to be of particular interest for multidimensional questionnaires in which the number of items per scale would not be enough to arrive at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.

Ivermectin converts cold tumors hot and synergizes with immune checkpoint blockade for treatment of breast cancer

npj Breast Cancer ◽

10.1038/s41523-021-00229-5 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Dobrin Draganov ◽

Zhen Han ◽

Aamir Rana ◽

Nitasha Bennett ◽

Darrell J. Irvine ◽

...

Keyword(s):

Breast Cancer ◽

Immune Responses ◽

Immune Checkpoint Blockade ◽

Checkpoint Blockade ◽

Synergistic Activity ◽

Primary Tumors ◽

T Cell Infiltration ◽

Cancer Cell Death ◽

Bona Fide

Abstract. We show that treatment with the FDA-approved anti-parasitic drug ivermectin induces immunogenic cancer cell death (ICD) and robust T cell infiltration into breast tumors. As an allosteric modulator of the ATP/P2X4/P2X7 axis, which operates in both cancer and immune cells, ivermectin also selectively targets immunosuppressive populations including myeloid cells and Tregs, resulting in an enhanced Teff/Treg ratio. While neither agent alone showed efficacy in vivo, combination therapy with ivermectin and the checkpoint inhibitor anti-PD1 antibody achieved synergy in limiting tumor growth (p = 0.03) and promoted complete responses (p < 0.01), also leading to immunity against contralateral re-challenge with demonstrated anti-tumor immune responses. Going beyond primary tumors, this combination achieved significant reduction in relapse after neoadjuvant (p = 0.03) and adjuvant treatment (p < 0.001), and potential cures in metastatic disease (p < 0.001). Statistical modeling confirmed bona fide synergistic activity in both the adjuvant (p = 0.007) and metastatic settings (p < 0.001). Ivermectin has dual immunomodulatory and ICD-inducing effects in breast cancer, converting cold tumors hot, and thus represents a rational mechanistic partner for checkpoint blockade.

Spatial pattern and genetic diversity estimates are linked in stochastic models of population differentiation

Genetics and Molecular Biology ◽

10.1590/s1415-47572000000300007 ◽

2000 ◽

Vol 23 (3) ◽

pp. 541-544 ◽

Cited By ~ 8

Author(s):

José Alexandre Felizola Diniz-Filho ◽

Mariana Pires de Campos Telles

Keyword(s):

Genetic Diversity ◽

Spatial Pattern ◽

Stochastic Models ◽

Population Differentiation ◽

Real Data ◽

Population Heterogeneity ◽

Data Set ◽

Mantel’s Test ◽

The Relationship ◽

Diversity Estimates

In the present study, we used both simulations and real data set analyses to show that, under stochastic processes of population differentiation, the concepts of spatial heterogeneity and spatial pattern overlap. In these processes, the proportion of variation among and within populations (measured by G_ST and 1 - G_ST, respectively) is correlated with the slope and intercept of a Mantel test relating genetic and geographic distances. Beyond the conceptual interest, the inspection of the relationship between population heterogeneity and spatial pattern can be used to test departures from stochasticity in the study of population differentiation.
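
For readers unfamiliar with the Mantel test mentioned above, the following is a minimal permutation-based sketch. It is a generic textbook version, not the authors' code: the function names, the one-sided p-value, and the toy distance matrices are assumptions of this example.

```python
import math
import random

def upper(m):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mantel(gen, geo, n_perm=999, seed=0):
    """Correlate two distance matrices; permute populations for a p-value."""
    rng = random.Random(seed)
    n = len(gen)
    geo_vec = upper(geo)
    r_obs = pearson(upper(gen), geo_vec)
    hits = 1  # the observed arrangement counts as one permutation
    idx = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Permute rows and columns of the genetic matrix together.
        perm = [[gen[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(upper(perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)

# Toy example: six populations on a line, genetic distance proportional
# to geographic distance (pure isolation by distance).
geo = [[abs(i - j) for j in range(6)] for i in range(6)]
gen = [[0.1 * abs(i - j) for j in range(6)] for i in range(6)]
r_obs, p_value = mantel(gen, geo)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations; shuffling individual entries would give a misleading null distribution.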

Detection of differentially methylated CpG sites between tumor samples with uneven tumor purities

Bioinformatics ◽

10.1093/bioinformatics/btz885 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2017-2024

Author(s):

Weiwei Zhang ◽

Ziyi Li ◽

Nana Wei ◽

Hua-Jun Wu ◽

Xiaoqi Zheng

Keyword(s):

Real Data ◽

R Package ◽

Differential Methylation ◽

Least Square ◽

Epigenetic Mechanism ◽

Supplementary Information ◽

CpG Sites ◽

Tumor Purity ◽

Different Sources ◽

Normal Controls

Abstract. Motivation: Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different genotypes or phenotypes is a critical step in uncovering the epigenetic mechanism of tumorigenesis and identifying biomarkers for cancer subtyping. However, as a major confounding factor, uneven distributions of tumor purity between two groups of tumor samples will lead to biased discovery of DM sites if not properly accounted for. Results: We here propose InfiniumDM, a generalized least squares model to adjust for the effect of tumor purity in differential methylation analysis. Our method is applicable to a variety of experimental designs, including designs with or without normal controls and with different sources of normal tissue contamination. We compared our method with conventional methods including minfi, limma and limma corrected by tumor purity using simulated datasets. Our method shows significantly better performance at different levels of differential methylation thresholds, sample sizes, mean purity deviations and so on. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving a similar purpose. Availability and implementation: InfiniumDM is a part of the R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify). Supplementary information: Supplementary data are available at Bioinformatics online.
