Joint identification of sex and sex‐linked scaffolds in non‐model organisms using low depth sequencing data

Abstract Sarcopenia, the age-related loss of skeletal muscle mass and function, affects 5–13% of individuals aged over 60 years. While rodents are widely used model organisms, which aspects of sarcopenia are recapitulated in different animal models is unknown. Here we generated a time series of phenotypic measurements and RNA sequencing data in mouse gastrocnemius muscle and analyzed them alongside analogous data from rats and humans. We found that rodents recapitulate mitochondrial changes observed in human sarcopenia, while inflammatory responses are conserved at pathway but not gene level. Perturbations in the extracellular matrix are shared by rats, while mice recapitulate changes in RNA processing and autophagy. We inferred transcription regulators of early and late transcriptome changes, which could be targeted therapeutically. Our study demonstrates that phenotypic measurements, such as muscle mass, are better indicators of muscle health than chronological age and should be considered when analyzing aging-related molecular data.


AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics ◽

10.1093/bioinformatics/bty1008 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2654-2656 ◽

Cited By ~ 5

Author(s):

Guoli Ji ◽

Wenbin Ye ◽

Yaru Su ◽

Moliang Chen ◽

Guangzao Huang ◽

...

Keyword(s):

Machine Learning ◽

Alternative Splicing ◽

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Supplementary Information ◽

Model Organisms ◽

Sequencing Data ◽

Extensive Evaluation ◽

Reference Genomes

Abstract

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources.

Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap.

Supplementary information: Supplementary data are available at Bioinformatics online.


Genome-scale metabolic network reconstruction of model animals as a platform for translational research

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2102344118 ◽

2021 ◽

Vol 118 (30) ◽

pp. e2102344118

Author(s):

Hao Wang ◽

Jonathan L. Robinson ◽

Pinar Kocabas ◽

Johan Gustafsson ◽

Mihail Anton ◽

...

Keyword(s):

Transgenic Mice ◽

Metabolic Network ◽

Model Organisms ◽

Protein Overexpression ◽

Sequencing Data ◽

Proteomics Data ◽

Gm2 Ganglioside ◽

Species Specific ◽

Specific Reactions ◽

Genome Scale

Genome-scale metabolic models (GEMs) are used extensively for analysis of mechanisms underlying human diseases and metabolic malfunctions. However, the lack of comprehensive and high-quality GEMs for model organisms restricts translational utilization of omics data accumulating from the use of various disease models. Here we present a unified platform of GEMs that covers five major model animals, including Mouse1 (Mus musculus), Rat1 (Rattus norvegicus), Zebrafish1 (Danio rerio), Fruitfly1 (Drosophila melanogaster), and Worm1 (Caenorhabditis elegans). These GEMs represent the most comprehensive coverage of the metabolic network by considering both orthology-based pathways and species-specific reactions. All GEMs can be interactively queried via the accompanying web portal Metabolic Atlas. Specifically, through integrative analysis of Mouse1 with RNA-sequencing data from brain tissues of transgenic mice we identified a coordinated up-regulation of lysosomal GM2 ganglioside and peptide degradation pathways which appears to be a signature metabolic alteration in Alzheimer’s disease (AD) mouse models with a phenotype of amyloid precursor protein overexpression. This metabolic shift was further validated with proteomics data from transgenic mice and cerebrospinal fluid samples from human patients. The elevated lysosomal enzymes thus hold potential to be used as a biomarker for early diagnosis of AD. Taken together, we foresee that this evolving open-source platform will serve as an important resource to facilitate the development of systems medicines and translational biomedical applications.


Accurate allele frequencies from ultra-low coverage pool-seq samples in evolve-and-resequence experiments

10.1101/244004 ◽

2018 ◽

Author(s):

Susanne Tilk ◽

Alan Bergland ◽

Aaron Goodman ◽

Paul Schmidt ◽

Dmitri Petrov ◽

...

Keyword(s):

Allele Frequency ◽

Model Organism ◽

Software Tool ◽

Allele Frequencies ◽

Model Organisms ◽

Sequencing Data ◽

High Coverage ◽

Next Generation Sequencing Technology ◽

Low Coverage ◽

Pooled Samples

Abstract Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.
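
The idea of a haplotype-derived allele frequency can be illustrated with a minimal sketch. It assumes the founder haplotype frequencies in a genomic window have already been inferred, and that each founder is homozygous so its genotype at a SNP is simply 0 or 1. The function name `haplotype_derived_af` and the toy numbers are hypothetical; this is not HAF-pipe's implementation, only the weighted-sum step that the abstract implies.

```python
def haplotype_derived_af(founder_genotypes, haplotype_freqs):
    """Estimate alternate-allele frequencies in one genomic window.

    founder_genotypes: list of rows, one per founder; each row holds 0/1
        alternate-allele indicators for the SNPs in the window
        (assumes sequenced, homozygous founders).
    haplotype_freqs: inferred frequency of each founder haplotype in the
        pooled sample for this window (assumed to sum to 1).
    """
    if len(founder_genotypes) != len(haplotype_freqs):
        raise ValueError("need one haplotype frequency per founder")
    if abs(sum(haplotype_freqs) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    n_snps = len(founder_genotypes[0])
    # Each SNP's frequency is the total frequency of the founders carrying it.
    return [
        sum(h * g[j] for h, g in zip(haplotype_freqs, founder_genotypes))
        for j in range(n_snps)
    ]


# Toy window: 3 founders, 4 bi-allelic SNPs (made-up values).
founders = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
freqs = [0.5, 0.3, 0.2]
window_af = haplotype_derived_af(founders, freqs)  # approximately [0.7, 0.2, 0.8, 0.3]
```

Because every SNP in the window shares the same few haplotype frequencies, reads from the whole window contribute to each estimate, which is a plausible reading of why shallow coverage can suffice.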


Mouse Gut Microbiome-Encoded β-Glucuronidases Identified Using Metagenome Analysis Guided by Protein Structure

mSystems ◽

10.1128/msystems.00452-19 ◽

2019 ◽

Vol 4 (4) ◽

Cited By ~ 5

Author(s):

Benjamin C. Creekmore ◽

Josh H. Gray ◽

William G. Walton ◽

Kristen A. Biernat ◽

Michael S. Little ◽

...

Keyword(s):

Protein Structure ◽

Active Site ◽

Human Microbiome ◽

Drug Efficacy ◽

Human Microbiome Project ◽

Structural Features ◽

Model Organisms ◽

Mouse Strains ◽

Sequencing Data ◽

Metagenome Analysis

ABSTRACT Gut microbial β-glucuronidase (GUS) enzymes play important roles in drug efficacy and toxicity, intestinal carcinogenesis, and mammalian-microbial symbiosis. Recently, the first catalog of human gut GUS proteins was provided for the Human Microbiome Project stool sample database and revealed 279 unique GUS enzymes organized into six categories based on active-site structural features. Because mice represent a model biomedical research organism, here we provide an analogous catalog of mouse intestinal microbial GUS proteins: a mouse gut GUSome. Using metagenome analysis guided by protein structure, we examined 2.5 million unique proteins from a comprehensive mouse gut metagenome created from several mouse strains, providers, housing conditions, and diets. We identified 444 unique GUS proteins and organized them into six categories based on active-site features, similarly to the human GUSome analysis. GUS enzymes were encoded by the major gut microbial phyla, including Firmicutes (60%) and Bacteroidetes (21%), and there were nearly 20% for which taxonomy could not be assigned. No differences in gut microbial gus gene composition were observed for mice based on sex. However, mice exhibited gus differences based on active-site features associated with provider, location, strain, and diet. Furthermore, diet yielded the largest differences in gus composition. Biochemical analysis of two low-fat-associated GUS enzymes revealed that they are variable with respect to their efficacy of processing both sulfated and nonsulfated heparan nonasaccharides containing terminal glucuronides.

IMPORTANCE Mice are commonly employed as model organisms of mammalian disease; as such, our understanding of the compositions of their gut microbiomes is critical to appreciating how the mouse and human gastrointestinal tracts mirror one another. GUS enzymes, with importance in normal physiology and disease, are an attractive set of proteins to use for such analyses. Here we show that while the specific GUS enzymes differ at the sequence level, a core GUSome functionality appears conserved between mouse and human gastrointestinal bacteria. Mouse strain, provider, housing location, and diet exhibit distinct GUSomes and gus gene compositions, but sex seems not to affect the GUSome. These data provide a basis for understanding the gut microbial GUS enzymes present in commonly used laboratory mice. Further, they demonstrate the utility of metagenome analysis guided by protein structure to provide specific sets of functionally related proteins from whole-genome metagenome sequencing data.


An Improved Phenotype-Driven Tool for Rare Mendelian Variant Prioritization: Benchmarking Exomiser on Real Patient Whole-Exome Data

Genes ◽

10.3390/genes11040460 ◽

2020 ◽

Vol 11 (4) ◽

pp. 460

Author(s):

Valentina Cipriani ◽

Nikolas Pontikos ◽

Gavin Arno ◽

Panagiotis I. Sergouniotis ◽

Eva Lenassi ◽

...

Keyword(s):

Molecular Diagnosis ◽

Protein Interactions ◽

Model Organisms ◽

Protein Protein Interactions ◽

Sequencing Data ◽

Variant Prioritization ◽

Real Patient ◽

Human Phenotype ◽

Support Tool ◽

Variant Frequency

Next-generation sequencing has revolutionized rare disease diagnostics, but many patients remain without a molecular diagnosis, particularly because many candidate variants usually survive despite strict filtering. Exomiser was launched in 2014 as a Java tool that performs an integrative analysis of patients’ sequencing data and their phenotypes encoded with Human Phenotype Ontology (HPO) terms. It prioritizes variants by leveraging information on variant frequency, predicted pathogenicity, and gene-phenotype associations derived from human diseases, model organisms, and protein–protein interactions. Early published releases of Exomiser were able to prioritize disease-causative variants as top candidates in up to 97% of simulated whole-exomes. However, the real patient datasets tested and published so far are very limited in size. Here, we present the latest Exomiser version 12.0.1 with many new features. We assessed the performance using a set of 134 whole-exomes from patients with a range of rare retinal diseases and known molecular diagnosis. Using default settings, Exomiser ranked the correctly diagnosed variants as the top candidate in 74% of the dataset and within the top 5 in 94%; not using the patients’ HPO profiles (i.e., variant-only analysis) decreased the performance to 3% and 27%, respectively. In conclusion, Exomiser is an effective support tool for rare Mendelian phenotype-driven variant prioritization.
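
The "top candidate" and "top 5" figures above are top-k rates over a benchmark cohort. A minimal sketch of that metric follows, assuming each case has been reduced to the 1-based rank of its known diagnostic variant (or `None` when the variant was filtered out). The function name `top_k_rate` and the ranks are hypothetical illustrations, not Exomiser output or part of its API.

```python
def top_k_rate(ranks, k):
    """Fraction of cases whose known causal variant is ranked within the top k.

    ranks: one entry per case, the 1-based rank of the known diagnostic
        variant, or None if it did not appear in the ranked list at all.
    """
    if not ranks:
        raise ValueError("ranks must contain at least one case")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Made-up ranks for eight hypothetical cases.
example_ranks = [1, 1, 3, 7, None, 2, 1, 12]
top1 = top_k_rate(example_ranks, 1)  # 3 of 8 cases
top5 = top_k_rate(example_ranks, 5)  # 5 of 8 cases
```

Counting unranked cases as misses, rather than dropping them, keeps the denominator equal to the full cohort, which is the more conservative way to report such a benchmark.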


Microsatellite loci discovery from next-generation sequencing data and loci characterization in the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758)

PeerJ ◽

10.7717/peerj.2019 ◽

2016 ◽

Vol 4 ◽

pp. e2019 ◽

Cited By ~ 5

Author(s):

Christine Ewers-Saucedo ◽

John D. Zardus ◽

John P. Wares

Keyword(s):

Next Generation Sequencing ◽

Microsatellite Markers ◽

Microsatellite Loci ◽

Next Generation Sequencing Data ◽

Model Organisms ◽

Next Generation ◽

Sequencing Data ◽

Genetic Studies ◽

Evolutionary Features ◽

Generation Sequencing

Microsatellite markers remain an important tool for ecological and evolutionary research, but are unavailable for many non-model organisms. One such organism with rare ecological and evolutionary features is the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758). Chelonibia testudinaria appears to be a host generalist, and has an unusual sexual system, androdioecy. Genetic studies on host specificity and mating behavior are impeded by the lack of fine-scale, highly variable markers, such as microsatellite markers. In the present study, we discovered thousands of new microsatellite loci from next-generation sequencing data, and characterized 12 loci thoroughly. We conclude that 11 of these loci will be useful markers in future ecological and evolutionary studies on C. testudinaria.


SECAPR - A bioinformatics pipeline for the rapid and user-friendly alignment of hybrid enrichment sequences, from raw reads to alignments

10.7287/peerj.preprints.26477v2 ◽

2018 ◽

Author(s):

Tobias Andermann ◽

Angela Cano ◽

Alexander Zizka ◽

Christine Bacon ◽

Alexandre Antonelli

Keyword(s):

Evolutionary Biology ◽

Sequence Data ◽

Model Organisms ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Sequence Alignments ◽

Multiple Sequence ◽

Sequence Capture ◽

Sequencing Platforms ◽

User Friendly

Evolutionary biology has entered an era of unprecedented amounts of DNA sequence data, as new sequencing platforms such as Massive Parallel Sequencing (MPS) can generate billions of nucleotides within less than a day. The current bottleneck is how to efficiently handle, process, and analyze such large amounts of data in an automated and reproducible way. To tackle these challenges we introduce the Sequence Capture Processor (SECAPR) pipeline for processing raw sequencing data into multiple sequence alignments for downstream phylogenetic and phylogeographic analyses. SECAPR is user-friendly and we provide an exhaustive tutorial intended for users with no prior experience with analyzing MPS output. SECAPR is particularly useful for the processing of sequence capture (= hybrid enrichment) datasets for non-model organisms, as we demonstrate using an empirical dataset of the palm genus Geonoma (Arecaceae). Various quality control and plotting functions help the user to decide on the most suitable settings for even challenging datasets. SECAPR is an easy-to-use, free, and versatile pipeline, aimed to enable efficient and reproducible processing of MPS data for many samples in parallel.


An exploration of assembly strategies and quality metrics on the accuracy of the Knightia excelsa (rewarewa) genome.

10.22541/au.161048558.86691399/v1 ◽

2021 ◽

Author(s):

Ann McCartney ◽

Elena Hilario ◽

Seung-Sub Choi ◽

Joseph Guhlin ◽

Jessie Prebble ◽

...

Keyword(s):

New Zealand ◽

De Novo ◽

Quality Metrics ◽

Read Length ◽

Model Organisms ◽

Sequencing Data ◽

Contig Assembly ◽

High Quality ◽

Aotearoa New Zealand ◽

Long Read

We used long read sequencing data generated from Knightia excelsa R.Br., a nectar-producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can impact final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long read assemblers using data subsampled based on read lengths, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler tested performed differently depending on the coverage and the read length of the data. Assemblies that used longer read lengths (>30 kb) and lower coverage were the most contiguous, kmer and gene complete. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlighted the importance of developing assembly workflows based on the volume and type of sequencing data and establishing a set of robust quality metrics for generating high quality assemblies. Scaffolding analyses highlighted that problems found in the initial assemblies could not be resolved accurately by utilizing Hi-C data and that scaffolded assemblies were more accurate when the underlying contig assembly was of higher accuracy. These findings provide insight into what is required for future high-quality de-novo assemblies of non-model organisms.


gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa058 ◽

2020 ◽

Vol 2 (3) ◽

Author(s):

Maria Muñoz-Benavent ◽

Felix Hartkopf ◽

Tim Van Den Bossche ◽

Vitor C Piro ◽

Carlos García-Ferris ◽

...

Keyword(s):

Workflow Management ◽

Model Organism ◽

Model Organisms ◽

Omics Data ◽

Sequencing Data ◽

Data Types ◽

Expression Ratio ◽

Bioinformatic Pipeline ◽

Cockroach Blattella Germanica ◽

Microbiome Data

Abstract The study of bacterial symbioses has grown exponentially in the recent past. However, existing bioinformatic workflows for microbiome data analysis commonly do not integrate multiple meta-omics levels and are mainly geared toward human microbiomes. Microbiota are better understood when analyzed in their biological context; that is, together with their host or environment. Nevertheless, this is a limitation when studying non-model organisms, mainly due to the lack of well-annotated sequence references. Here, we present gNOMO, a bioinformatic pipeline that is specifically designed to process and analyze non-model organism samples of up to three meta-omics levels: metagenomics, metatranscriptomics and metaproteomics in an integrative manner. The pipeline has been developed using the workflow management framework Snakemake in order to obtain an automated and reproducible pipeline. Using experimental datasets of the German cockroach Blattella germanica, a non-model organism with a very complex gut microbiome, we show the capabilities of gNOMO with regard to meta-omics data integration, expression ratio comparison, taxonomic and functional analysis as well as intuitive output visualization. In conclusion, gNOMO is a bioinformatic pipeline that can easily be configured for integrating and analyzing multiple meta-omics data types and for producing output visualizations, and that is specifically designed for integrating paired-end sequencing data with mass spectrometry from non-model organisms.
