TRONCO: an R package for the inference of cancer progression models from heterogeneous genomic data

2015 ◽  
Author(s):  
Luca De Sano ◽  
Giulio Caravagna ◽  
Daniele Ramazzotti ◽  
Alex Graudenzi ◽  
Giancarlo Mauri ◽  
...  

Abstract
Motivation: We introduce TRONCO (TRanslational ONCOlogy), an open-source R package that implements state-of-the-art algorithms for the inference of cancer progression models from (epi)genomic mutational profiles. TRONCO can be used to extract population-level models describing the trends of accumulation of alterations in a cohort of cross-sectional samples, e.g., retrieved from publicly available databases, and individual-level models that reveal the clonal evolutionary history in single cancer patients, when multiple samples, e.g., multiple biopsies or single-cell sequencing data, are available. The resulting models can provide key hints for uncovering the evolutionary trajectories of cancer, especially for precision medicine or personalized therapy.
Availability: TRONCO is released under the GPL license; it is hosted in the Software section at http://bimib.disco.unimib.it/ and archived also at [email protected]
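
A minimal sketch in R of the population-level workflow described above; the function names (import.genotypes, tronco.capri, tronco.plot) are recalled from the TRONCO documentation and should be treated as assumptions to check against the package vignette.

```r
# Hedged sketch of a population-level TRONCO analysis on a toy dataset;
# function names are assumed from the package documentation.
library(TRONCO)

# Binary alteration matrix: samples in rows, events (genes) in columns.
set.seed(1)
geno <- matrix(rbinom(200, 1, 0.3), nrow = 20,
               dimnames = list(paste0("sample", 1:20),
                               c("KRAS", "TP53", "APC", "PIK3CA", "SMAD4",
                                 "BRAF", "FBXW7", "TCF7L2", "NRAS", "ATM")))

data  <- import.genotypes(geno)   # wrap the matrix into a TRONCO object
model <- tronco.capri(data)       # infer an ensemble-level progression model
tronco.plot(model)                # display the inferred model
```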

2016 ◽  
Vol 113 (28) ◽  
pp. E4025-E4034 ◽  
Author(s):  
Giulio Caravagna ◽  
Alex Graudenzi ◽  
Daniele Ramazzotti ◽  
Rebeca Sanz-Pamplona ◽  
Luca De Sano ◽  
...  

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.
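
The "selective advantage" relation referenced above is, in the Suppes-based formulation used by CAPRI-style inference (the algorithmic core of PiCnIc), typically expressed through two conditions on a pair of driver events i and j; the notation below is a sketch of that standard formulation, not a verbatim quote of the paper.

```latex
% Selective advantage of driver event i over driver event j
% (Suppes-style conditions; notation illustrative):
\[
  P(i) > P(j) \quad \mbox{(temporal priority)}
\]
\[
  P(j \mid i) > P(j \mid \neg i) \quad \mbox{(probability raising)}
\]
```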


2021 ◽  
Vol 13 (1) ◽  
pp. 368
Author(s):  
Dillon T. Fitch ◽  
Hossain Mohiuddin ◽  
Susan L. Handy

One way cities are looking to promote bicycling is by providing publicly or privately operated bike-share services, which enable individuals to rent bicycles for one-way trips. Although many studies have examined the use of bike-share services, little is known about how these services influence individual-level travel behavior more generally. In this study, we examine the behavior of users and non-users of a dockless, electric-assisted bike-share service in the Sacramento region of California. This service, operated by Jump until suspended due to the coronavirus pandemic, was one of the largest of its kind in the U.S., and spanned three California cities: Sacramento, West Sacramento, and Davis. We combine data from a repeat cross-sectional before-and-after survey of residents and a longitudinal panel survey of bike-share users with the goal of examining how the service influenced individual-level bicycling and driving. Results from multilevel regression models suggest that the effect of bike-share on average bicycling and driving at the population level is likely small. However, our results indicate that people who have used bike-share are likely to have increased their bicycling because of bike-share.
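
As a rough illustration of the modelling approach described above, the sketch below fits a multilevel (mixed-effects) regression with random intercepts for person and city in R; the dataset and variable names are simulated and hypothetical, not the authors' survey data.

```r
# Hedged sketch of a multilevel before/after comparison of bicycling
# between bike-share users and non-users; all data are simulated.
library(lme4)

set.seed(1)
survey_data <- data.frame(
  person_id = factor(rep(1:100, each = 2)),
  city      = factor(rep(sample(c("Sacramento", "West Sacramento", "Davis"),
                                100, replace = TRUE), each = 2)),
  wave      = factor(rep(c("before", "after"), times = 100)),
  user      = factor(rep(sample(c("non-user", "user"), 100, replace = TRUE),
                         each = 2)),
  bike_min  = rpois(200, 30)   # weekly minutes of bicycling (simulated)
)

# The wave x user interaction captures the change among users relative to
# non-users; person and city enter as random intercepts.
fit <- lmer(bike_min ~ wave * user + (1 | person_id) + (1 | city),
            data = survey_data)
summary(fit)
```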


2018 ◽  
Vol 148 (12) ◽  
pp. 1946-1953 ◽  
Author(s):  
Magali Rios-Leyvraz ◽  
Pascal Bovet ◽  
René Tabin ◽  
Bernard Genin ◽  
Michel Russo ◽  
...  

ABSTRACT
Background: The gold standard to assess salt intake is the 24-h urine collection. Use of a urine spot sample can be a simpler alternative, especially when the goal is to assess sodium intake at the population level. Several equations to estimate 24-h urinary sodium excretion from urine spot samples have been tested in adults, but not in children.
Objective: The objective of this study was to assess the ability of several equations applied to urine spot samples to estimate 24-h urinary sodium excretion in children.
Methods: A cross-sectional study of children between 6 and 16 y of age was conducted. Each child collected one 24-h urine sample and 3 timed urine spot samples, i.e., evening (last void before going to bed), overnight (first void in the morning), and morning (second void in the morning). Eight equations (i.e., Kawasaki, Tanaka, Remer, Mage, Brown with and without potassium, Toft, and Meng) were used to estimate 24-h urinary sodium excretion. The estimates from the different spot samples and equations were compared with the measured excretion through the use of several statistics.
Results: Among the 101 children recruited, 86 had a complete 24-h urine collection and were included in the analysis (mean age: 10.5 y). The mean measured 24-h urinary sodium excretion was 2.5 g (range: 0.8–6.4 g). The different spot samples and equations provided highly heterogeneous estimates of the 24-h urinary sodium excretion. The overnight spot samples with the Tanaka and Brown equations provided the most accurate estimates (mean bias: −0.20 to −0.12 g; correlation: 0.48–0.53; precision: 69.7–76.5%; sensitivity: 76.9–81.6%; specificity: 66.7%; misclassification: 23.0–27.7%). The other equations, irrespective of the timing of the spot sample, provided less accurate estimates.
Conclusions: Urine spot samples, with selected equations, might provide accurate estimates of 24-h sodium excretion in children at a population level. At an individual level, they could be used to identify children with high sodium excretion. This study was registered at clinicaltrials.gov as NCT02900261.
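
For concreteness, the sketch below implements the Tanaka equation, one of the eight equations evaluated in the study, in R; the coefficients are the commonly cited form from Tanaka et al. (2002) and are reproduced from memory, so they should be verified against the original publication before any use.

```r
# Hedged sketch of the Tanaka estimate of 24-h urinary sodium excretion
# from a spot sample; coefficients reproduced from memory, verify before use.
estimate_na24h_tanaka <- function(spot_na_meq_l, spot_cr_mg_dl,
                                  age_y, weight_kg, height_cm) {
  # Predicted 24-h urinary creatinine excretion (mg/day)
  pr_cr <- -2.04 * age_y + 14.89 * weight_kg + 16.14 * height_cm - 2244.45
  # Estimated 24-h sodium excretion (mEq/day), then converted to grams
  na_meq <- 21.98 * (spot_na_meq_l / (spot_cr_mg_dl * 10) * pr_cr)^0.392
  na_meq * 23 / 1000
}

# Illustrative overnight spot sample from a 10-y-old child (values made up):
estimate_na24h_tanaka(spot_na_meq_l = 120, spot_cr_mg_dl = 90,
                      age_y = 10, weight_kg = 35, height_cm = 140)
# ~2.6 g/day, of the same order as the measured mean of 2.5 g reported above
```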


2015 ◽  
Author(s):  
Giulio Caravagna ◽  
Alex Graudenzi ◽  
Daniele Ramazzotti ◽  
Rebeca Sanz-Pamplona ◽  
Luca De Sano ◽  
...  

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing (NGS) data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications as it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel experimentally verifiable hypotheses.


2019 ◽  
Author(s):  
Anthony Federico ◽  
Stefano Monti

Abstract
Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution for performing geneset enrichment for a wide audience and range of use cases.
Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR.
Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.
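
The over-representation test at the heart of tools like hypeR can be sketched with base R's hypergeometric test; this is a generic illustration of the statistic, not the hypeR API, and the gene names are made up.

```r
# Generic over-representation (hypergeometric) test of a signature against
# a geneset; illustration of the statistic, not the hypeR interface.
overrep_p <- function(signature, geneset, background) {
  k <- length(intersect(signature, geneset))      # observed overlap
  K <- length(intersect(geneset, background))     # geneset size in background
  n <- length(signature)                          # signature size
  N <- length(background)                         # background size
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)  # P(overlap >= k)
}

set.seed(1)
background <- paste0("gene", 1:2000)
geneset    <- sample(background, 100)
signature  <- unique(c(sample(geneset, 20), sample(background, 80)))
overrep_p(signature, geneset, background)
```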


2019 ◽  
Vol 57 (1) ◽  
pp. 55-77 ◽  
Author(s):  
Ryan Dew ◽  
Asim Ansari ◽  
Yang Li

Marketing research relies on individual-level estimates to understand the rich heterogeneity of consumers, firms, and products. While much of the literature focuses on capturing static cross-sectional heterogeneity, little research has been done on modeling dynamic heterogeneity, or the heterogeneous evolution of individual-level model parameters. In this work, the authors propose a novel framework for capturing the dynamics of heterogeneity, using individual-level, latent, Bayesian nonparametric Gaussian processes. Similar to standard heterogeneity specifications, this Gaussian process dynamic heterogeneity (GPDH) specification models individual-level parameters as flexible variations around population-level trends, allowing for sharing of statistical information both across individuals and within individuals over time. This hierarchical structure provides precise individual-level insights regarding parameter dynamics. The authors show that GPDH nests existing heterogeneity specifications and that not flexibly capturing individual-level dynamics may result in biased parameter estimates. Substantively, they apply GPDH to understand preference dynamics and to model the evolution of online reviews. Across both applications, they find robust evidence of dynamic heterogeneity and illustrate GPDH’s rich managerial insights, with implications for targeting, pricing, and market structure analysis.
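
A compact way to read the hierarchical structure described above is the following sketch, in which an individual-level parameter path varies around a population-level trend and both components are Gaussian processes; the notation is ours and only approximates the authors' specification.

```latex
% Sketch of a Gaussian-process dynamic heterogeneity (GPDH) specification;
% notation illustrative, not the authors' exact parameterization.
\[
  \theta_{it} = \bar{\theta}(t) + \delta_i(t), \qquad
  \bar{\theta}(\cdot) \sim \mathcal{GP}\bigl(0, k_{\mathrm{pop}}\bigr), \qquad
  \delta_i(\cdot) \sim \mathcal{GP}\bigl(0, k_{\mathrm{ind}}\bigr)
\]
```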


2017 ◽  
Vol 21 (5) ◽  
pp. 948-956 ◽  
Author(s):  
Nicholas RV Jones ◽  
Tammy YN Tong ◽  
Pablo Monsivais

Abstract
Objective: To test whether diets achieving recommendations from the UK's Scientific Advisory Committee on Nutrition (SACN) were associated with higher monetary costs in a nationally representative sample of UK adults.
Design: A cross-sectional study linking 4 d diet diaries in the National Diet and Nutrition Survey (NDNS) to contemporaneous food price data from a market research firm. The monetary cost of diets was assessed in relation to whether or not they met eight food- and nutrient-based recommendations from SACN. Regression models adjusted for potential confounding factors. The primary outcome measure was individual dietary cost per day and per 2000 kcal (8368 kJ).
Setting: UK.
Subjects: Adults (n 2045) sampled between 2008 and 2012 in the NDNS.
Results: On an isoenergetic basis, diets that met the recommendations for fruit and vegetables, oily fish, non-milk extrinsic sugars, fat, saturated fat and salt were estimated to be between 3 and 17 % more expensive. Diets meeting the recommendation for red and processed meats were 4 % less expensive, while meeting the recommendation for fibre was cost-neutral. Meeting multiple targets was also associated with higher costs; on average, diets meeting six or more SACN recommendations were estimated to be 29 % more costly than isoenergetic diets that met no recommendations.
Conclusions: Food costs may be a population-level barrier limiting the adoption of dietary recommendations in the UK. Future research should focus on identifying systems- and individual-level strategies to enable consumers to achieve dietary recommendations without increasing food costs. Such strategies may improve the uptake of healthy eating in the population.
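
The primary outcome above is diet cost standardized to a fixed energy intake; a trivial sketch of that standardization follows, with invented numbers.

```r
# Energy-standardized diet cost: cost per 2000 kcal (8368 kJ).
# The figures below are invented for illustration only.
cost_per_day <- 6.20     # monetary cost of the recorded diet per day
energy_kcal  <- 2450     # energy recorded per day (kcal)
cost_per_2000kcal <- cost_per_day / energy_kcal * 2000
cost_per_2000kcal
```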


2017 ◽  
Author(s):  
Andrea Martinez-Vernon ◽  
Frederick Farrell ◽  
Orkun S. Soyer

Abstract
Summary: With the rapid accumulation of sequencing data from genomic and metagenomic studies, there is an acute need for better tools that facilitate their analyses against biological functions. To this end, we developed MetQy, an open-source R package designed for query-based analysis of functional units in [meta]genomes and/or sets of genes using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Furthermore, MetQy contains visualization and analysis tools and facilitates manipulation of KEGG's flat files. Thus, MetQy enables better understanding of the metabolic capabilities of known genomes or user-specified [meta]genomes by using the available information, and can help guide studies in microbial ecology, metabolic engineering and synthetic biology.
Availability and implementation: The MetQy R package is freely available and can be downloaded from our group's website (http://osslab.lifesci.warwick.ac.uk) or GitHub (https://github.com/OSS-Lab/MetQy).
Contact: [email protected]
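
As a conceptual illustration of the kind of query such a tool answers, the sketch below computes how complete a KEGG module is given the KO identifiers annotated in a (meta)genome; it is not the MetQy API, and the KO lists are examples only.

```r
# Fraction of a KEGG module's KOs present in a genome's annotation set;
# conceptual sketch only, not the MetQy interface.
module_completeness <- function(genome_kos, module_kos) {
  length(intersect(genome_kos, module_kos)) / length(module_kos)
}

module_kos <- c("K00844", "K01810", "K00850", "K01623", "K01803")  # example KOs
genome_kos <- c("K00844", "K01810", "K01623", "K02777")
module_completeness(genome_kos, module_kos)   # 0.6
```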


2021 ◽  
Author(s):  
Mason Youngblood ◽  
David Lahti

In this study, we used a longitudinal dataset of house finch (Haemorhous mexicanus) song recordings spanning four decades in the introduced eastern range to assess how individual-level cultural transmission mechanisms drive population-level changes in birdsong. First, we developed an agent-based model (available as a new R package called TransmissionBias) that simulates the cultural transmission of house finch song given different parameters related to transmission biases, or biases in social learning that modify the probability of adoption of particular cultural variants. Next, we used approximate Bayesian computation and machine learning to estimate what parameter values likely generated the temporal changes in diversity in our observed data. We found evidence that strong content bias, likely targeted towards syllable complexity, plays a central role in the cultural evolution of house finch song in western Long Island. Frequency and demonstrator biases appear to be neutral or absent. Additionally, we estimated that house finch song is transmitted with extremely high fidelity. Future studies should use our simulation framework to better understand how cultural transmission and population declines influence song diversity in wild populations.
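
A toy sketch of the content-biased adoption process described above (not the TransmissionBias package itself): each naive learner adopts a song variant with probability weighted by its frequency and by a bias toward complex syllables; parameter names and values are illustrative.

```r
# Toy content-biased transmission step; illustrative only, not the
# TransmissionBias package.
set.seed(42)
n_variants <- 50
freq       <- rep(1 / n_variants, n_variants)   # current variant frequencies
complexity <- runif(n_variants)                 # syllable complexity per variant

adopt_prob <- function(freq, complexity, f = 1, b = 2) {
  # f > 1 would add conformity (frequency bias); b > 0 favours complex variants
  w <- freq^f * exp(b * complexity)
  w / sum(w)
}

# One learning generation: 500 naive birds sample variants with biased weights
next_gen <- sample(seq_len(n_variants), size = 500, replace = TRUE,
                   prob = adopt_prob(freq, complexity))
head(sort(table(next_gen), decreasing = TRUE))
```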


2019 ◽  
Author(s):  
Shu Tadaka ◽  
Fumiki Katsuoka ◽  
Masao Ueki ◽  
Kaname Kojima ◽  
Satoshi Makino ◽  
...  

Abstract
The first step towards realizing personalized healthcare is to catalog the genetic variations in a population. Since the dissemination of individual-level genomic information is strictly controlled, it will be useful to construct population-level allele frequency panels and to provide them through easy-to-use interfaces. In the Tohoku Medical Megabank Project, we have sequenced nearly 4,000 individuals from a Japanese population and constructed an allele frequency panel of 3,552 individuals after removing related samples. The panel is called 3.5KJPNv2. It was constructed using a standard pipeline, including the 1KGP and gnomAD algorithms, to reduce technical biases and to allow comparisons to other populations. Our database is the first large-scale panel providing the frequencies of variants on the X chromosome and the mitochondrial genome in the Japanese population. All the data are available in our original database at https://jmorp.megabank.tohoku.ac.jp.
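
The core computation behind an allele frequency panel can be sketched in a few lines of R: count ALT alleles across unrelated individuals and divide by the number of chromosomes. The matrix below is simulated, and ploidy-aware handling of the X chromosome and mitochondria is omitted for brevity.

```r
# Minimal sketch of computing autosomal allele frequencies from diploid
# genotypes (0/1/2 ALT allele counts); simulated data, illustration only.
set.seed(7)
n_samples <- 3552
genotypes <- matrix(rbinom(10 * n_samples, size = 2, prob = 0.1),
                    nrow = 10,
                    dimnames = list(paste0("variant", 1:10), NULL))

# ALT allele frequency = ALT allele count / number of chromosomes (2N);
# X-chromosome and mitochondrial variants would need ploidy-aware counts.
allele_freq <- rowSums(genotypes) / (2 * n_samples)
round(allele_freq, 4)
```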

