SodaPop: A computational suite for simulating the dynamics of asexual populations

Mapping Intimacies ◽

10.1101/189142 ◽

2017 ◽

Author(s):

Louis Gauthier ◽

Rémicia Di Franco ◽

Adrian W.R. Serohijos

Keyword(s):

Protein Evolution ◽

Large Scale ◽

Fitness Landscape ◽

Viral Evolution ◽

Supplementary Information ◽

Fitness Effects ◽

A Cell ◽

Supplementary Material ◽

Asexual Populations ◽

Large Scale Simulations

Abstract

Motivation: Simulating protein evolution with realistic constraints from population genetics is essential for addressing problems in molecular evolution, from understanding the forces that shape the evolutionary landscape to the clinical challenges of antibiotic resistance, viral evolution and cancer.

Results: To address this need, we present SodaPop, a new forward-time simulator of large asexual populations aimed at studying their structure, dynamics and distribution of fitness effects under flexible assumptions about the fitness landscape. SodaPop integrates biochemical and biophysical properties in a cell-based, object-oriented framework and provides an efficient, open-source toolkit for large-scale simulations of protein evolution.

Availability and implementation: Source code and binaries are freely available at https://github.com/louisgt/SodaPop under the GNU GPLv3 license. The software is implemented in C++ and supported on Linux, Mac OS/X and Windows.

Supplementary information: Supplementary information is available on the GitHub project page.
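Forward-time simulation of an asexual population means advancing genotype frequencies one generation at a time under selection and drift. A minimal sketch follows, assuming a haploid Wright-Fisher model with constant population size and fitness-proportional resampling; the function names are hypothetical, and SodaPop's own cell-based update scheme may differ.

```python
import random

def wright_fisher_step(counts, fitness, rng):
    """One generation of haploid Wright-Fisher selection and drift.

    counts: dict genotype -> number of cells
    fitness: dict genotype -> relative fitness (>= 0)
    Returns a new dict with the same total population size.
    """
    n = sum(counts.values())
    genotypes = list(counts)
    # Parents are sampled with probability proportional to count * fitness.
    weights = [counts[g] * fitness[g] for g in genotypes]
    offspring = rng.choices(genotypes, weights=weights, k=n)
    new_counts = {}
    for g in offspring:
        new_counts[g] = new_counts.get(g, 0) + 1
    return new_counts

def simulate(counts, fitness, generations, seed=0):
    """Run up to `generations` steps; stop early once one genotype is fixed."""
    rng = random.Random(seed)
    history = [dict(counts)]
    for _ in range(generations):
        counts = wright_fisher_step(counts, fitness, rng)
        history.append(counts)
        if len(counts) == 1:
            break
    return history
```

For example, `simulate({"wt": 990, "mut": 10}, {"wt": 1.0, "mut": 1.05}, 500)` tracks a mildly beneficial mutant from 1% frequency until fixation, loss, or 500 generations.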

SIMLR: a tool for large-scale single-cell analysis by multi-kernel learning

10.1101/118901 ◽

2017 ◽

Cited By ~ 9

Author(s):

Bo Wang ◽

Daniele Ramazzotti ◽

Luca De Sano ◽

Junjie Zhu ◽

Emma Pierson ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Single Cell Analysis ◽

R Package ◽

Supplementary Information ◽

Cell Analysis ◽

Rna Seq ◽

A Cell ◽

Supplementary Material ◽

Public Datasets

Abstract

Motivation: We present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework for learning a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used for tasks such as dimension reduction, clustering and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing that it is scalable, can greatly improve clustering performance, and provides valuable insights by making the data more interpretable through better visualization.

Availability and implementation: SIMLR is available on GitHub in both R and MATLAB implementations. It is also available as an R package on Bioconductor.

Supplementary information: Supplementary data are available at Bioinformatics online.
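The core idea of multi-kernel similarity is to combine several cell-to-cell kernels, each with a different bandwidth, into one similarity matrix. The sketch below only shows that combination step with Gaussian kernels and fixed weights (uniform by default); SIMLR itself learns the kernel weights and the similarity matrix jointly, which this hypothetical helper does not attempt.

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def gaussian_kernel(cells, sigma):
    """Cell-by-cell Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    n = len(cells)
    return [[math.exp(-sq_dist(cells[i], cells[j]) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def combined_similarity(cells, sigmas, weights=None):
    """Weighted sum of Gaussian kernels with several bandwidths."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    n = len(cells)
    S = [[0.0] * n for _ in range(n)]
    for w, s in zip(weights, sigmas):
        K = gaussian_kernel(cells, s)
        for i in range(n):
            for j in range(n):
                S[i][j] += w * K[i][j]
    return S
```

Using several bandwidths lets the combined measure capture both local and broader structure without committing to a single scale in advance.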

Bacmeta: simulation for genomic evolution in bacterial metapopulations

10.1101/175257 ◽

2017 ◽

Author(s):

Aleksi Sipola ◽

Pekka Marttinen ◽

Jukka Corander

Keyword(s):

Large Scale ◽

Sequence Data ◽

Supplementary Information ◽

Neutral Evolution ◽

Bacterial Populations ◽

Fisher Model ◽

Cluster Environment ◽

Supplementary Material ◽

Stochastic Events ◽

Large Scale Simulations

Abstract

The advent of genomic data from densely sampled bacterial populations has created a need for flexible simulators with which models and hypotheses can be efficiently investigated in light of empirical observations. Bacmeta provides fast stochastic simulation of neutral evolution within a large collection of interconnected bacterial populations with a fully adjustable connectivity network. Stochastic events (mutations, recombinations, insertions/deletions, migrations and microepidemics) can be simulated in discrete, non-overlapping generations with a Wright-Fisher model that operates on explicit sequence data of any desired genome length. Each model component, including locus, bacterial strain, population and ultimately the whole metapopulation, is efficiently simulated using C++ objects, and detailed metadata can be acquired from each level of the simulation. The software can be executed in a cluster environment using simple text input files, enabling, e.g., large-scale simulations and likelihood-free inference. Bacmeta is implemented in C++ for Linux, Mac and Windows. It is available at https://bitbucket.org/aleksisipola/bacmeta under the BSD 3-clause license.

Supplementary information: Supplementary data are available online at bioRxiv.
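A neutral Wright-Fisher generation on explicit sequences across connected populations can be sketched as follows. This is a simplified illustration, not Bacmeta's implementation: it covers only point mutation and migration (recombination, indels and microepidemics are omitted), and the migration-matrix convention and function names are assumptions.

```python
import random

BASES = "ACGT"

def mutate(seq, mu, rng):
    """Each site independently mutates with probability mu to a different base."""
    out = list(seq)
    for i, b in enumerate(out):
        if rng.random() < mu:
            out[i] = rng.choice([x for x in BASES if x != b])
    return "".join(out)

def generation(pops, migration, mu, rng):
    """One neutral Wright-Fisher generation for a set of populations.

    pops: list of lists of sequences (one list per population)
    migration[i][j]: probability that the parent of an individual in
    population i is drawn from population j (each row sums to 1).
    Population sizes are kept constant.
    """
    new_pops = []
    for i, pop in enumerate(pops):
        children = []
        for _ in range(len(pop)):
            src = rng.choices(range(len(pops)), weights=migration[i])[0]
            parent = rng.choice(pops[src])
            children.append(mutate(parent, mu, rng))
        new_pops.append(children)
    return new_pops
```

Because parents are drawn uniformly within the source population, no genotype has a fitness advantage, which is what makes the model neutral.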

Large-scale structure prediction by improved contact predictions and model quality assessment

10.1101/128231 ◽

2017 ◽

Cited By ~ 2

Author(s):

Mirco Michel ◽

David Menéndez Hurtado ◽

Karolis Uziela ◽

Arne Elofsson

Keyword(s):

Structure Prediction ◽

Large Scale ◽

Supplementary Information ◽

Model Quality ◽

Contact Maps ◽

Folding Algorithm ◽

Unknown Structure ◽

Supplementary Material ◽

Direct Coupling Analysis ◽

Contact Predictions

Abstract

Motivation: Accurate contact predictions can be used to predict the structure of proteins. Until recently, these methods were limited to very large protein families, which reduced their utility. However, recent progress in combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known.

Results: We present the PconsFold2 pipeline, which uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimation to predict the structure of a protein. We show that model quality estimation significantly increases the number of models that can be reliably identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Of these, 415 have not been reported before.

Availability: Datasets as well as models of all 558 Pfam families are available at http://c3.pcons.net/. All programs used here are freely available.

Supplementary information: No supplementary data.
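To make "contact map" concrete: residues are usually considered in contact when representative atoms (often Cβ) lie within a distance cutoff, and predictions are scored by how many top-ranked pairs are true contacts. The sketch below assumes an 8 Å cutoff and a minimum sequence separation of 6, which are common conventions rather than parameters stated in the abstract; the function names are hypothetical.

```python
import math

def contact_map(coords, cutoff=8.0, min_sep=6):
    """Residue pairs (i, j), i < j, with |i - j| >= min_sep and distance < cutoff.

    coords: one (x, y, z) tuple per residue, e.g. Cβ positions in Å.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

def top_k_precision(scores, true_contacts, k):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    scores: dict (i, j) -> predicted contact score
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(pair in true_contacts for pair in ranked) / len(ranked)
```

A common choice is k = L (the sequence length) or a fraction of it.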

MODE-TASK: Large-scale protein motion tools

10.1101/217505 ◽

2017 ◽

Author(s):

Caroline Ross ◽

Bilal Nizami ◽

Michael Glenister ◽

Olivier Sheik Amamuddy ◽

Ali Rana Atilgan ◽

...

Keyword(s):

Large Scale ◽

Protein Complexes ◽

Normal Mode Analysis ◽

Md Simulations ◽

Supplementary Information ◽

Mode Analysis ◽

Analysis Tool ◽

Link Type ◽

Supplementary Material ◽

Anisotropic Network

Abstract

Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling and t-Distributed Stochastic Neighbor Embedding techniques applied to molecular dynamics (MD) trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model, providing a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive.

Availability and implementation: MODE-TASK is open source and available for download from https://github.com/RUBi-ZA/MODE-TASK; it is implemented in Python and C++.

Supplementary information: Documentation is available at http://mode-task.readthedocs.io.
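Principal Component Analysis of an MD trajectory finds the directions of largest collective atomic displacement. A dependency-free sketch follows, assuming each frame is already superposed on a reference (rigid-body motion removed) and flattened to `[x1, y1, z1, x2, ...]`; it extracts only the first component by power iteration and is not MODE-TASK's code.

```python
import random

def center_and_covariance(frames):
    """Mean-center flattened frames and return (covariance, centered frames)."""
    n, d = len(frames), len(frames[0])
    mu = [sum(f[k] for f in frames) / n for k in range(d)]
    centered = [[f[k] - mu[k] for k in range(d)] for f in frames]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered

def first_pc(cov, iters=500, seed=0):
    """Leading eigenvector of a symmetric covariance matrix by power iteration."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Coordinate of each centered frame along the principal component."""
    return [sum(c[k] * v[k] for k in range(len(v))) for c in centered]
```

Plotting the projections over time shows how the trajectory moves along its dominant mode; production tools use a full eigendecomposition to get all components at once.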

Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202 ◽

2018 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

New Technologies ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Data Sets ◽

Cell Network ◽

A Cell ◽

Supplementary Material ◽

Cell Data

Abstract

Motivation: New technologies allow the elaborate measurement of different traits of single cells. These data promise to elucidate intracellular networks in unprecedented detail and to help improve the treatment of diseases such as cancer. However, cell populations can be very heterogeneous.

Results: We developed a mixture of Nested Effects Models (M&NEM) for single-cell data that simultaneously identifies different cellular subpopulations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.

Availability: The mixture Nested Effects Model (M&NEM) is available as the R package mnem at https://github.com/cbgethz/mnem/

Supplementary information: Supplementary data are available online.
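The Expectation Maximization scheme described above alternates between soft-assigning cells to components and re-estimating each component. The sketch below shows that loop with a deliberately simple stand-in: each component is a profile of independent Bernoulli probabilities over binary per-cell features, not a nested effects network as in M&NEM. All names are hypothetical.

```python
import math
import random

def em_bernoulli_mixture(data, k, iters=50, seed=0, eps=1e-6):
    """EM for a mixture of independent-Bernoulli profiles.

    data: list of binary vectors (one per cell)
    Returns (mixing weights, per-component probabilities, responsibilities).
    """
    rng = random.Random(seed)
    d = len(data[0])
    pi = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each cell,
        # computed in log space and normalized for numerical stability.
        resp = []
        for x in data:
            logs = [math.log(pi[c]) + sum(
                math.log(theta[c][j]) if x[j] else math.log(1 - theta[c][j])
                for j in range(d)) for c in range(k)]
            m = max(logs)
            w = [math.exp(lg - m) for lg in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: re-estimate weights and profiles from the soft assignments.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            pi[c] = max(nc / len(data), eps)
            for j in range(d):
                p = sum(r[c] * x[j] for r, x in zip(resp, data)) / max(nc, eps)
                theta[c][j] = min(max(p, eps), 1 - eps)
    return pi, theta, resp
```

In M&NEM the M-step instead searches for the best network per component, but the alternation between cell probabilities and component updates has the same shape.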

Methods for Automatic Reference Trees and Multilevel Phylogenetic Placement

10.1101/299792 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lucas Czech ◽

Alexandros Stamatakis

Keyword(s):

Large Scale ◽

Sequence Data ◽

Sequence Similarity ◽

Computational Effort ◽

Supplementary Information ◽

Data Sets ◽

Metagenomic Sequencing ◽

Sequencing Studies ◽

Manual Selection ◽

Supplementary Material

Abstract

Motivation: In most metagenomic sequencing studies, the initial analysis step consists of assessing the evolutionary provenance of the sequences. Phylogenetic (or evolutionary) placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do, however, face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; and the number of taxa in the reference phylogeny should be small enough to allow the results to be inspected visually.

Results: We present algorithms to overcome these limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence data sets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.

Implementation: Freely available under GPLv3 at http://github.com/lczech/

Supplementary information: Supplementary data are available at Bioinformatics online.
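One simple way to summarize a group of aligned database sequences by a single representative is a per-column majority-rule consensus. The helper below is a hypothetical illustration of that idea only; the abstract does not specify that the authors' method uses this exact rule.

```python
from collections import Counter

def majority_consensus(alignment):
    """Per-column majority-rule consensus of equal-length aligned sequences.

    Gap characters are counted like any other symbol, so a column where gaps
    are most common yields a gap. Ties go to the symbol seen first in the column.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment)
        out.append(counts.most_common(1)[0][0])
    return "".join(out)
```

Replacing many similar sequences by a few such representatives keeps the reference tree small, which addresses both the manual-selection and tree-size limitations listed above.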

Biomedical Text Mining for Research Rigor and Integrity: Tasks, Challenges, Directions

10.1101/108480 ◽

2017 ◽

Cited By ~ 3

Author(s):

Halil Kilicoglu

Keyword(s):

Text Mining ◽

Biomedical Research ◽

Large Scale ◽

Research Output ◽

Guideline Development ◽

Research Integrity ◽

Supplementary Information ◽

Biomedical Text ◽

Biomedical Text Mining ◽

Supplementary Material

Abstract

An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems with the reproducibility of research findings and with the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activity focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the end result of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we ask whether biomedical text mining techniques can help stakeholders in the biomedical research enterprise do their part towards enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload, and accurate citation/enhanced bibliometrics. We review existing methods and tools for specific tasks, where they exist, or discuss relevant research that can guide future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can add checks and balances that promote responsible research practices and can provide significant benefits for the biomedical research enterprise.

Supplementary information: Supplementary material is available at bioRxiv.
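As one concrete example of the plagiarism-detection task named above, text reuse is often screened by comparing sets of overlapping word n-grams ("shingles") with Jaccard similarity. The sketch below assumes word trigrams and a crude alphanumeric tokenizer; it illustrates the general technique and is not a tool reviewed in the article.

```python
import re

def word_shingles(text, n=3):
    """Set of overlapping n-word sequences (shingles), lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, "the cat sat on the mat" and "the cat sat on a mat" share 2 of 6 distinct trigrams, giving a similarity of 1/3; a screening tool would flag pairs above some chosen threshold for manual review.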

Dopaminergic and Cholinergic Modulation of Large Scale Networks in silico Using Snudda

Frontiers in Neural Circuits ◽

10.3389/fncir.2021.748989 ◽

2021 ◽

Vol 15 ◽

Author(s):

Johanna Frost Nylen ◽

Jarl Jacob Johannes Hjorth ◽

Sten Grillner ◽

Jeanette Hellgren Kotaleski

Keyword(s):

Ion Channels ◽

Large Scale ◽

Single Cells ◽

Critical Role ◽

Neuron Models ◽

Bath Application ◽

A Cell ◽

Large Scale Networks ◽

Circuit Function ◽

Large Scale Simulations

Neuromodulation is present throughout the nervous system and plays a critical role in circuit function and dynamics. Computational investigations of neuromodulation in large-scale networks require supporting software platforms. Snudda is software for the creation and simulation of large-scale networks of detailed microcircuits consisting of multicompartmental neuron models. We have developed an extension to Snudda to incorporate neuromodulation in large-scale simulations. The extended Snudda framework implements neuromodulation at the level of single cells incorporated into large-scale microcircuits. We also developed Neuromodcell, a software tool for optimizing neuromodulation in detailed multicompartmental neuron models. The software adds parameters to the models that modulate the conductances of ion channels and ionotropic receptors. Bath application of neuromodulators is simulated, and models that reproduce the experimentally measured effects are selected. In Snudda, we developed an extension to accommodate large-scale simulations of neuromodulation. The simulator has two simulation modes, denoted replay and adaptive. In the replay mode, transient levels of neuromodulators can be defined as a time-varying function that modulates the receptors and ion channels within the network in a cell-type-specific manner. In the adaptive mode, spiking neuromodulatory neurons are connected via integrative modulating mechanisms to ion channels and receptors. Both modes allow simultaneous modulation by several neuromodulators that can interact dynamically with each other. Here, we used the Neuromodcell software to simulate dopaminergic and muscarinic modulation of neurons from the striatum. We also demonstrate how to simulate different neuromodulatory states with dopamine and acetylcholine using Snudda. All software is freely available on GitHub, including tutorials on Neuromodcell and Snudda-neuromodulation.
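The replay mode described above drives channel and receptor conductances with a time-varying neuromodulator level. The sketch below illustrates that idea under two assumptions not taken from Snudda: the level is normalized to [0, 1] and scales a conductance linearly toward a maximal modulation factor, and the transient is a simple exponential decay. The function names are hypothetical and are not part of Snudda's API.

```python
import math

def modulated_conductance(g_base, max_factor, level):
    """Scale a conductance by a neuromodulator level in [0, 1].

    level = 0 leaves g_base unchanged; level = 1 applies the full max_factor.
    """
    return g_base * (1.0 + (max_factor - 1.0) * level)

def transient_level(t, onset, tau):
    """Hypothetical transient: zero before onset, then exponential decay."""
    if t < onset:
        return 0.0
    return math.exp(-(t - onset) / tau)

# Example trace: a channel whose conductance is increased by up to 50%
# during a dopamine transient starting at t = 1.0 s (tau = 0.2 s).
trace = [modulated_conductance(2.0, 1.5, transient_level(t / 10, 1.0, 0.2))
         for t in range(30)]
```

Cell-type specificity would enter by giving each cell type its own `max_factor` per channel.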

Trevolver: simulating non-reversible DNA sequence evolution in trinucleotide context on a bifurcating tree

10.1101/672717 ◽

2019 ◽

Author(s):

Chase W. Nelson ◽

Yunxin Fu ◽

Wen-Hsiung Li

Keyword(s):

Dna Sequence ◽

De Novo ◽

Viral Evolution ◽

Supplementary Information ◽

Mutation Rates ◽

Sequence Evolution ◽

Mutation Model ◽

Supplementary Material ◽

Dna Sequence Evolution ◽

Reversible Mutation

Abstract

Summary: Recent de novo mutation data allow the estimation of non-reversible mutation rates for trinucleotide sequence contexts. However, existing tools for simulating DNA sequence evolution are limited to time-reversible models or do not consider trinucleotide context-dependent rates. Because this ability is critical for testing evolutionary scenarios under neutrality, we created Trevolver. Sequence evolution is simulated on a bifurcating tree using a 64 × 4 trinucleotide mutation model. Runtime is fast, and results match theoretical expectations for CpG sites. Simulations with Trevolver will enable neutral hypotheses to be tested at within-species (polymorphism), between-species (divergence), within-host (e.g., viral evolution) and somatic (e.g., cancer) levels of evolutionary change.

Availability and implementation: Trevolver is implemented in Perl and available on GitHub under the GNU General Public License (GPL) version 3 at https://github.com/chasewnelson/

Supplementary information: Further details and example data are available on GitHub.
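A 64 × 4 model gives, for each trinucleotide context, the rate at which the middle base mutates to each of the four bases. Trevolver itself is written in Perl; the Python sketch below only illustrates how such a table can drive a Gillespie-style simulation along one branch. It assumes that the first and last sites (which lack a full context) never mutate, and all names are hypothetical.

```python
import itertools
import random

BASES = "ACGT"
CONTEXTS = ["".join(t) for t in itertools.product(BASES, repeat=3)]  # 64 contexts

def site_rates(seq, rates):
    """Total mutation rate out of each internal site, given its context.

    rates[context][base]: rate at which the middle base of `context` becomes
    `base` (a 64 x 4 table; the entry for the unchanged base is ignored).
    """
    out = []
    for i in range(1, len(seq) - 1):
        ctx = seq[i - 1:i + 2]
        out.append(sum(r for b, r in rates[ctx].items() if b != ctx[1]))
    return out

def evolve_branch(seq, rates, branch_length, rng):
    """Gillespie simulation of context-dependent mutation along one branch."""
    seq = list(seq)
    t = 0.0
    while True:
        per_site = site_rates("".join(seq), rates)
        total = sum(per_site)
        if total == 0:
            break
        t += rng.expovariate(total)  # waiting time to the next mutation
        if t > branch_length:
            break
        # Pick a site in proportion to its rate, then a target base.
        i = rng.choices(range(len(per_site)), weights=per_site)[0] + 1
        ctx = "".join(seq[i - 1:i + 2])
        targets = [b for b in BASES if b != ctx[1]]
        seq[i] = rng.choices(targets, weights=[rates[ctx][b] for b in targets])[0]
    return "".join(seq)
```

Rates are recomputed after every event because a mutation changes the contexts of its neighbours (for example, destroying a CpG). At a bifurcation, the same function would be run independently on each child branch, starting from the parent's sequence.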

SodaPop: a forward simulation suite for the evolutionary dynamics of asexual populations on protein fitness landscapes

Bioinformatics ◽

10.1093/bioinformatics/btz175 ◽

2019 ◽

Vol 35 (20) ◽

pp. 4053-4062 ◽

Cited By ~ 1

Author(s):

Louis Gauthier ◽

Rémicia Di Franco ◽

Adrian W R Serohijos

Keyword(s):

Population Dynamics ◽

Population Size ◽

Protein Evolution ◽

Evolutionary Dynamics ◽

Fitness Landscapes ◽

Supplementary Information ◽

Biological Organization ◽

Physical Force ◽

Protein Biochemistry ◽

Asexual Populations

Abstract

Motivation: Protein evolution is determined by forces at multiple levels of biological organization. Random mutations have an immediate effect on the biophysical properties, structure and function of proteins. These same mutations also affect the fitness of the organism. However, the evolutionary fate of mutations, whether they proceed to fixation or are purged, also depends on population size and dynamics. There is emerging interest, both theoretical and experimental, in integrating these two factors in protein evolution. Although several tools are available for simulating protein evolution, most of them focus on either the biophysical or the population-level determinants, but not both. Hence, there is a need for a publicly available computational tool to explore the effects of both protein biophysics and population dynamics on protein evolution.

Results: To address this need, we developed SodaPop, a computational suite to simulate protein evolution in the context of the population dynamics of asexual populations. SodaPop accepts as input several fitness landscapes based on protein biochemistry, or other user-defined fitness functions. The user can also provide experimental fitness landscapes derived from deep mutational scanning approaches or theoretical landscapes derived from physical force field estimates. Here, we demonstrate the broad utility of SodaPop with different applications describing the interplay between selection for protein properties and population dynamics. SodaPop is designed so that population geneticists can explore the influence of protein biochemistry on patterns of genetic variation, and biochemists and biophysicists can explore the role of population size and demography in protein evolution.

Availability and implementation: Source code and binaries are freely available at https://github.com/louisgt/SodaPop under the GNU GPLv3 license. The software is implemented in C++ and supported on Linux, Mac OS/X and Windows.

Supplementary information: Supplementary data are available at Bioinformatics online.
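A widely used biophysical fitness landscape of the kind mentioned above ties fitness to folding stability through the Boltzmann fraction of folded protein, 1 / (1 + exp(ΔG / kT)). The sketch below assumes that form, a temperature of about 25 °C, and a cell fitness equal to the product over genes; these are illustrative assumptions, not necessarily the exact functions SodaPop implements.

```python
import math

KT = 0.593  # kcal/mol, roughly RT at 25 °C (assumed temperature)

def fraction_folded(dg):
    """Boltzmann fraction of folded protein.

    dg = G_folded - G_unfolded in kcal/mol; negative values mean the folded
    state is more stable. A mutation with stability effect ddg shifts dg to
    dg + ddg (positive ddg is destabilizing).
    """
    return 1.0 / (1.0 + math.exp(dg / KT))

def cell_fitness(dgs):
    """Hypothetical cell fitness: product of fraction folded over all genes."""
    f = 1.0
    for dg in dgs:
        f *= fraction_folded(dg)
    return f
```

Under this landscape, a destabilizing mutation in an already very stable protein (for example, ΔG = -5 kcal/mol) barely changes fitness, while the same mutation near ΔG = 0 has a large effect, which is why population size and the stability margin interact.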