Complete sequences of Schizosaccharomyces pombe subtelomeres reveal multiple patterns of genome variation

Abstract: Genome sequences have been determined for many model organisms; however, repetitive regions such as centromeres, telomeres, and subtelomeres have not yet been sequenced completely. Here, we report the complete sequences of subtelomeric homologous (SH) regions of the fission yeast Schizosaccharomyces pombe. We overcame technical difficulties to obtain subtelomeric repetitive sequences by constructing strains that possess single SH regions of a standard laboratory strain. In addition, some natural isolates of S. pombe were analyzed using previous sequencing data. Whole sequences of SH regions revealed that each SH region consists of two distinct parts with mosaics of multiple common segments or blocks showing high variation among subtelomeres and strains. Subtelomere regions show relatively high frequency of nucleotide variations among strains compared with the other chromosomal regions. Furthermore, we identified subtelomeric RecQ-type helicase genes, tlh3 and tlh4, which add to the already known tlh1 and tlh2, and found that the tlh1–4 genes show high sequence variation with missense mutations, insertions, and deletions but no severe effects on their RNA expression. Our results indicate that SH sequences are highly polymorphic and hot spots for genome variation. These features of subtelomeres may have contributed to genome diversity and, conversely, various diseases.

Complete sequences of Schizosaccharomyces pombe subtelomeres reveal multiple patterns of genome variation

10.1101/2020.03.09.983726 ◽

2020 ◽

Author(s):

Takuto Kaji ◽

Yusuke Oizumi ◽

Sanki Tashiro ◽

Yumiko Takeshita ◽

Junko Kanoh

Keyword(s):

Schizosaccharomyces Pombe ◽

Sequence Similarity ◽

Repetitive Sequences ◽

Proximal Part ◽

Model Organisms ◽

High Sequence Similarity ◽

Genome Database ◽

Genome Variation ◽

Complete Sequences ◽

Striking Contrast

Abstract: Genome sequences have been determined for many model organisms; however, repetitive regions such as centromeres, telomeres, and subtelomeres have not yet been sequenced completely. Here, we report the complete sequences of subtelomeric homologous (SH) regions of the fission yeast Schizosaccharomyces pombe. We overcame technical difficulties to obtain subtelomeric repetitive sequences by constructing strains that possess single SH regions. Whole sequences of SH regions revealed that each SH region consists of two distinct parts: the telomere-proximal part with mosaics of multiple common segments showing high variation among subtelomeres and strains, and the telomere-distal part showing high sequence similarity among subtelomeres with some insertions and deletions. The newly sequenced SH regions showed differences in nucleotide sequences and common segment composition compared to those in the S. pombe genome database (PomBase), which is in striking contrast to the regions outside of SH, where mutations are rarely detected. Furthermore, we identified new subtelomeric RecQ-type helicase genes, tlh3 and tlh4, which add to the already known tlh1 and tlh2, and found that the tlh1–4 genes show high sequence variation. Our results indicate that SH sequences are highly polymorphic and hot spots for genome variation. These features of subtelomeres may have contributed to genome diversity and, conversely, various diseases.

Molecular and phenotypic analysis of rodent models reveals conserved and species-specific modulators of human sarcopenia

Communications Biology ◽

10.1038/s42003-021-01723-z ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Anastasiya Börsch ◽

Daniel J. Ham ◽

Nitish Mittal ◽

Lionel A. Tintignac ◽

Eugenia Migliavacca ◽

...

Keyword(s):

Muscle Mass ◽

Inflammatory Responses ◽

Molecular Data ◽

Model Organisms ◽

Sequencing Data ◽

Phenotypic Analysis ◽

Age Related ◽

Analogous Data ◽

Species Specific ◽

And Function

Abstract: Sarcopenia, the age-related loss of skeletal muscle mass and function, affects 5–13% of individuals aged over 60 years. While rodents are widely used model organisms, it is unknown which aspects of sarcopenia are recapitulated in different animal models. Here we generated a time series of phenotypic measurements and RNA sequencing data in mouse gastrocnemius muscle and analyzed them alongside analogous data from rats and humans. We found that rodents recapitulate mitochondrial changes observed in human sarcopenia, while inflammatory responses are conserved at pathway but not gene level. Perturbations in the extracellular matrix are shared by rats, while mice recapitulate changes in RNA processing and autophagy. We inferred transcription regulators of early and late transcriptome changes, which could be targeted therapeutically. Our study demonstrates that phenotypic measurements, such as muscle mass, are better indicators of muscle health than chronological age and should be considered when analyzing aging-related molecular data.

AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics ◽

10.1093/bioinformatics/bty1008 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2654-2656 ◽

Cited By ~ 5

Author(s):

Guoli Ji ◽

Wenbin Ye ◽

Yaru Su ◽

Moliang Chen ◽

Guangzao Huang ◽

...

Keyword(s):

Machine Learning ◽

Alternative Splicing ◽

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Supplementary Information ◽

Model Organisms ◽

Sequencing Data ◽

Extensive Evaluation ◽

Reference Genomes

Abstract (Summary): Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information: Supplementary data are available at Bioinformatics online.

A subset of pediatric thalamic gliomas share a distinct DNA methylation profile, H3K27me3 loss and frequent alteration of EGFR

10.1101/2020.08.28.239160 ◽

2020 ◽

Author(s):

Philipp Sievers ◽

Martin Sill ◽

Daniel Schrimpf ◽

Damian Stichel ◽

David E. Reuss ◽

...

Keyword(s):

Dna Methylation ◽

Therapeutic Strategy ◽

Malignant Gliomas ◽

Significant Proportion ◽

Methylation Pattern ◽

List Type ◽

Missense Mutations ◽

Sequencing Data ◽

Egfr Inhibition ◽

Astrocytic Gliomas

Abstract

Background: Malignant astrocytic gliomas in children show a remarkable biological and clinical diversity. Small in-frame insertions or missense mutations in the EGFR gene have recently been identified in a distinct subset of pediatric bithalamic gliomas with a unique DNA methylation pattern.

Methods: Here, we investigated an epigenetically homogeneous cohort of malignant gliomas (n=58) distinct from other subtypes and enriched for pediatric cases and thalamic location, in order to elucidate the overlap with this recently identified subtype of pediatric bithalamic gliomas.

Results: EGFR gene amplification was detected in 16/58 (27%) tumors, and missense mutations or small in-frame insertions in EGFR were found in 20/30 tumors with available sequencing data (67%; five of them co-occurring with EGFR amplification). Additionally, eight of the 30 tumors (27%) harbored an H3.1 or H3.3 K27M mutation (six of them with a concomitant EGFR alteration). All tumors tested showed loss of H3K27me3 staining, with evidence of EZHIP overexpression in the H3 wildtype cases. Although some tumors indeed showed a bithalamic growth pattern, a significant proportion of tumors occurred in the unilateral thalamus or in other (predominantly midline) locations.

Conclusions: Our findings present a distinct molecular class of pediatric malignant gliomas largely overlapping with the recently reported bithalamic gliomas characterized by EGFR alteration, but additionally showing a broader spectrum of EGFR alterations and tumor localization. Global H3K27me3 loss in this group appears to be mediated by either H3 K27 mutation or EZHIP overexpression. EGFR inhibition may represent a potential therapeutic strategy in these highly aggressive gliomas.

Key points: This study confirms a distinct new subset of pediatric diffuse midline glioma with H3K27me3 loss, with or without H3 K27 mutation. The poor outcome of these tumors is in line with the broader family of pediatric diffuse midline gliomas with H3 K27 mutation or EZHIP overexpression. Frequent EGFR alterations in these tumors may represent a therapeutic target in this subset.

Importance of the Study: Malignant astrocytic gliomas in children show a remarkable biological and clinical diversity. Here, we highlight a distinct molecular class of pediatric malignant gliomas characterized by EGFR alteration and global H3K27me3 loss that appears to be mediated by either H3 K27 mutation or EZHIP overexpression. EGFR inhibition may represent a potential therapeutic strategy in these highly aggressive gliomas.

Genome-scale metabolic network reconstruction of model animals as a platform for translational research

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2102344118 ◽

2021 ◽

Vol 118 (30) ◽

pp. e2102344118

Author(s):

Hao Wang ◽

Jonathan L. Robinson ◽

Pinar Kocabas ◽

Johan Gustafsson ◽

Mihail Anton ◽

...

Keyword(s):

Transgenic Mice ◽

Metabolic Network ◽

Model Organisms ◽

Protein Overexpression ◽

Sequencing Data ◽

Proteomics Data ◽

Gm2 Ganglioside ◽

Species Specific ◽

Specific Reactions ◽

Genome Scale

Genome-scale metabolic models (GEMs) are used extensively for analysis of mechanisms underlying human diseases and metabolic malfunctions. However, the lack of comprehensive and high-quality GEMs for model organisms restricts translational utilization of omics data accumulating from the use of various disease models. Here we present a unified platform of GEMs that covers five major model animals, including Mouse1 (Mus musculus), Rat1 (Rattus norvegicus), Zebrafish1 (Danio rerio), Fruitfly1 (Drosophila melanogaster), and Worm1 (Caenorhabditis elegans). These GEMs represent the most comprehensive coverage of the metabolic network by considering both orthology-based pathways and species-specific reactions. All GEMs can be interactively queried via the accompanying web portal Metabolic Atlas. Specifically, through integrative analysis of Mouse1 with RNA-sequencing data from brain tissues of transgenic mice we identified a coordinated up-regulation of lysosomal GM2 ganglioside and peptide degradation pathways which appears to be a signature metabolic alteration in Alzheimer’s disease (AD) mouse models with a phenotype of amyloid precursor protein overexpression. This metabolic shift was further validated with proteomics data from transgenic mice and cerebrospinal fluid samples from human patients. The elevated lysosomal enzymes thus hold potential to be used as a biomarker for early diagnosis of AD. Taken together, we foresee that this evolving open-source platform will serve as an important resource to facilitate the development of systems medicines and translational biomedical applications.

Accurate allele frequencies from ultra-low coverage pool-seq samples in evolve-and-resequence experiments

10.1101/244004 ◽

2018 ◽

Author(s):

Susanne Tilk ◽

Alan Bergland ◽

Aaron Goodman ◽

Paul Schmidt ◽

Dmitri Petrov ◽

...

Keyword(s):

Allele Frequency ◽

Model Organism ◽

Software Tool ◽

Allele Frequencies ◽

Model Organisms ◽

Sequencing Data ◽

High Coverage ◽

Next Generation Sequencing Technology ◽

Low Coverage ◽

Pooled Samples

Abstract: Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.
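
The idea of a haplotype-derived allele frequency can be illustrated with a minimal sketch. It assumes the founder haplotype frequencies in a genomic window have already been inferred (the inference step itself is not shown) and that each founder is homozygous, so it carries allele 0 or 1 at every bi-allelic SNP. The function name, founder labels, and numbers are hypothetical and are not taken from HAF-pipe.

```python
def haplotype_derived_af(founder_freqs, founder_alleles):
    """Estimate SNP allele frequencies in one genomic window.

    founder_freqs:   dict mapping founder name -> inferred haplotype
                     frequency in the window (assumed to sum to 1).
    founder_alleles: dict mapping founder name -> list of 0/1 alleles,
                     one per bi-allelic SNP in the window.
    Returns a list with the estimated alternate-allele frequency per SNP.
    """
    total = sum(founder_freqs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("founder frequencies must sum to 1")

    n_snps = len(next(iter(founder_alleles.values())))
    allele_freqs = [0.0] * n_snps
    for founder, freq in founder_freqs.items():
        # Each founder contributes its haplotype frequency to every SNP
        # at which it carries the alternate allele (coded as 1).
        for i, allele in enumerate(founder_alleles[founder]):
            allele_freqs[i] += freq * allele
    return allele_freqs


# Hypothetical window with three founders and three SNPs.
example_freqs = {"A": 0.5, "B": 0.25, "C": 0.25}
example_alleles = {"A": [1, 0, 1], "B": [0, 0, 1], "C": [1, 1, 0]}
example_af = haplotype_derived_af(example_freqs, example_alleles)
print(example_af)
```

Because every SNP in the window shares the same haplotype frequency estimate, reads from across the whole window inform each SNP, which is one way to see why shallow coverage can still yield usable frequencies.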

Abstract 2722: Large Scale Mutation Discovery Screen Identifies Functionally Variant UGDH Alleles in Patients with Atrioventricular Valve Defects

Circulation ◽

10.1161/circ.116.suppl_16.ii_604 ◽

2007 ◽

Vol 116 (suppl_16) ◽

Author(s):

Jeroen Bakkers ◽

Sonja Chocron ◽

Victor Gouriev ◽

Kelly Smith ◽

Ronald Lekanne dit Deprez ◽

...

Keyword(s):

Candidate Genes ◽

Congenital Heart Defects ◽

Congenital Heart ◽

Large Scale ◽

Heart Defects ◽

Glucose Dehydrogenase ◽

Model Organisms ◽

Missense Mutations ◽

Coding Regions

Background: Congenital heart defects (CHDs) are the most common birth defects. Although genetic dispositions are believed to cause CHDs, only a few genes have been identified that harbour mutations causing such defects. Studies in model organisms have identified many essential genes for cardiac development. UDP-glucose dehydrogenase (UGDH) enzymatic activity is required for the signal transduction of FGF and Wnt ligands, and zebrafish jekyll/ugdh mutants lack AV valves. Methods and Results: Candidate genes that are essential for AV canal, septum, and valve formation were selected from the literature. By large-scale sequencing, we analysed the coding regions of 36 candidate genes in 192 patients with reported AVSDs. As a result, we identified 457 genetic variations, of which 207 variants are in flanking non-coding regions, 156 variants are in coding regions but silent, and 94 variants are non-synonymous variants that alter the protein sequence. Comparison with available databases such as HapMap and screening of 350 control individuals resulted in the validation of 49 non-synonymous missense mutations in 23 genes that were present only in the patient group. These included novel GATA4 missense mutations (R285C and M224V) located in the highly conserved DNA binding domains, which by in vitro analysis significantly reduce transcriptional activity of the protein. Three patients with mitral valvar prolapse and mitral regurgitation were identified with novel missense mutations in the UDP-glucose dehydrogenase (UGDH) gene (R141C and E416D). In vitro experiments demonstrated a negative effect on enzyme activity and stability caused by a change in protein conformation. Furthermore, experiments in zebrafish jekyll/ugdh mutants showed that UGDH R141C and UGDH E416D could not rescue the defects in AV formation, demonstrating an inactivating effect of these missense mutations in vivo.
Conclusions: A model organism-based candidate gene screen in CHD patients resulted in the identification of novel functional missense mutations in the UGDH gene, which was not previously implicated in congenital heart defects.

Mouse Gut Microbiome-Encoded β-Glucuronidases Identified Using Metagenome Analysis Guided by Protein Structure

mSystems ◽

10.1128/msystems.00452-19 ◽

2019 ◽

Vol 4 (4) ◽

Cited By ~ 5

Author(s):

Benjamin C. Creekmore ◽

Josh H. Gray ◽

William G. Walton ◽

Kristen A. Biernat ◽

Michael S. Little ◽

...

Keyword(s):

Protein Structure ◽

Active Site ◽

Human Microbiome ◽

Drug Efficacy ◽

Human Microbiome Project ◽

Structural Features ◽

Model Organisms ◽

Mouse Strains ◽

Sequencing Data ◽

Metagenome Analysis

Abstract: Gut microbial β-glucuronidase (GUS) enzymes play important roles in drug efficacy and toxicity, intestinal carcinogenesis, and mammalian-microbial symbiosis. Recently, the first catalog of human gut GUS proteins was provided for the Human Microbiome Project stool sample database and revealed 279 unique GUS enzymes organized into six categories based on active-site structural features. Because mice represent a model biomedical research organism, here we provide an analogous catalog of mouse intestinal microbial GUS proteins: a mouse gut GUSome. Using metagenome analysis guided by protein structure, we examined 2.5 million unique proteins from a comprehensive mouse gut metagenome created from several mouse strains, providers, housing conditions, and diets. We identified 444 unique GUS proteins and organized them into six categories based on active-site features, similarly to the human GUSome analysis. GUS enzymes were encoded by the major gut microbial phyla, including Firmicutes (60%) and Bacteroidetes (21%), and there were nearly 20% for which taxonomy could not be assigned. No differences in gut microbial gus gene composition were observed for mice based on sex. However, mice exhibited gus differences based on active-site features associated with provider, location, strain, and diet. Furthermore, diet yielded the largest differences in gus composition. Biochemical analysis of two low-fat-associated GUS enzymes revealed that they are variable with respect to their efficacy of processing both sulfated and nonsulfated heparan nonasaccharides containing terminal glucuronides.

Importance: Mice are commonly employed as model organisms of mammalian disease; as such, our understanding of the compositions of their gut microbiomes is critical to appreciating how the mouse and human gastrointestinal tracts mirror one another. GUS enzymes, with importance in normal physiology and disease, are an attractive set of proteins to use for such analyses. Here we show that while the specific GUS enzymes differ at the sequence level, a core GUSome functionality appears conserved between mouse and human gastrointestinal bacteria. Mouse strain, provider, housing location, and diet exhibit distinct GUSomes and gus gene compositions, but sex seems not to affect the GUSome. These data provide a basis for understanding the gut microbial GUS enzymes present in commonly used laboratory mice. Further, they demonstrate the utility of metagenome analysis guided by protein structure to provide specific sets of functionally related proteins from whole-genome metagenome sequencing data.

Bacsnp: Using Single Nucleotide Polymorphism (SNP) Specificities and Frequencies to Identify Genotype Composition in Baculoviruses

Viruses ◽

10.3390/v12060625 ◽

2020 ◽

Vol 12 (6) ◽

pp. 625 ◽

Cited By ~ 1

Author(s):

Jörg T. Wennmann ◽

Jiangbin Fan ◽

Johannes A. Jehle

Keyword(s):

Nucleotide Polymorphisms ◽

Downstream Process ◽

Sequencing Data ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Natural Isolates ◽

Genotype Composition ◽

R Programming ◽

Dsdna Viruses ◽

Virus Isolates

Natural isolates of baculoviruses (as well as other dsDNA viruses) generally consist of homogeneous or heterogeneous populations of genotypes. The number and positions of single nucleotide polymorphisms (SNPs) from sequencing data are often used as suitable markers to study their genotypic composition. Identifying and assigning the specificities and frequencies of SNPs from high-throughput genome sequencing data can be very challenging, especially when comparing several sequenced isolates or samples. In this study, the new tool “bacsnp”, written in the R programming language, was developed as a downstream process, enabling the detection of SNP specificities across several virus isolates. The basis of this analysis is the use of a common, closely related reference to which the sequencing reads of an isolate are mapped. Thereby, the specificities of SNPs are linked and their frequencies can be used to analyze the genetic composition across the sequenced isolate. Here, the downstream process and analysis of detected SNP positions are demonstrated using three baculovirus isolates as an example, showing the fast and reliable detection of a mixed sequenced sample.
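
A rough sketch of what "SNP specificity" could mean in practice is given below. It assumes that reads from every isolate were mapped to the same reference, so SNP positions are directly comparable, and that a SNP counts as present in an isolate when its alternate-allele frequency reaches a cutoff. The function name, the 0.05 cutoff, and the example data are hypothetical; this is written in Python for illustration and does not reproduce the bacsnp R code.

```python
def snp_specificity(snp_freqs, min_freq=0.05):
    """Assign each SNP position the set of isolates that carry it.

    snp_freqs: dict mapping reference position -> dict mapping isolate
               name -> alternate-allele frequency in that isolate.
    min_freq:  assumed cutoff above which a SNP counts as present.
    Returns a dict mapping position -> tuple of isolate names (sorted).
    """
    specificity = {}
    for position, per_isolate in sorted(snp_freqs.items()):
        carriers = tuple(sorted(
            isolate for isolate, freq in per_isolate.items()
            if freq >= min_freq
        ))
        specificity[position] = carriers
    return specificity


# Hypothetical data: iso3 behaves like a mixture of iso1 and iso2.
example_snps = {
    101: {"iso1": 0.98, "iso2": 0.00, "iso3": 0.45},
    250: {"iso1": 0.00, "iso2": 1.00, "iso3": 0.55},
    512: {"iso1": 0.99, "iso2": 0.97, "iso3": 1.00},
}
example_spec = snp_specificity(example_snps)
for position, carriers in example_spec.items():
    print(position, carriers)
```

In this toy data set, SNPs specific to iso1 and SNPs specific to iso2 both appear in iso3 at intermediate frequencies, which is the kind of pattern that would suggest a mixed sample.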

Whole Genome Sequencing of the Mutamouse Model Reveals Strain- and Colony-Level Variation, and Genomic Features of the Transgene Integration Site

Scientific Reports ◽

10.1038/s41598-019-50302-0 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Matthew J. Meier ◽

Marc A. Beal ◽

Andrew Schoenrock ◽

Carole L. Yauk ◽

Francesco Marchetti

Keyword(s):

Integration Site ◽

Whole Genome Sequence ◽

Comparative Genomic ◽

Whole Genome ◽

Missense Mutations ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

High Coverage ◽

Deletion Event ◽

Transgene Integration

Abstract: The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse’s whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million SNVs per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, the colonies accumulated 47,847 (HC) and 17,677 (Covance) non-parental homozygous single nucleotide variants, respectively. We found no novel nonsense or missense mutations that impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variant (CNV) calls and identified 300 genomic regions with CNVs. We also used long-read sequence technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight into the mechanisms of structural variant formation, and contributes a framework to analyze aCGH results alongside NGS data.
