The most developmentally truncated fishes show extensive Hox gene loss and miniaturized genomes

Hox genes play a fundamental role in regulating the embryonic development of all animals. Manipulation of these transcription factors in model organisms has unraveled key aspects of evolution, like the transition from fin to limb. However, by virtue of their fundamental role and pleiotropic effects, simultaneous knockouts of several of these genes pose significant challenges. Here, we report on evolutionary simplification in two species of the dwarf minnow genus Paedocypris using whole genome sequencing. The two species feature unprecedented Hox gene loss and genome reduction in association with their massive developmental truncation. We also show how other genes involved in the development of musculature, nervous system, and skeleton have been lost in Paedocypris, mirroring its highly progenetic phenotype. Further, we identify two mechanisms responsible for genome streamlining: severe intron shortening and reduced repeat content. As a naturally simplified system closely related to zebrafish, Paedocypris provides novel insights into vertebrate development.


Challenges and Approaches to Genotyping Repetitive DNA

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400771 ◽

2019 ◽

Vol 10 (1) ◽

pp. 417-430 ◽

Cited By ~ 2

Author(s):

Elizabeth A. Morton ◽

Ashley N. Hall ◽

Elizabeth Kwan ◽

Calvin Mok ◽

Konstantin Queitsch ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Repetitive Dna ◽

Complex Traits ◽

Copy Number ◽

Model Organisms ◽

Whole Genome ◽

Copy Number Estimation ◽

Rdna Copy ◽

Rdna Copy Number

Individuals within a species can exhibit vast variation in copy number of repetitive DNA elements. This variation may contribute to complex traits such as lifespan and disease, yet it is only infrequently considered in genotype-phenotype associations. Although the possible importance of copy number variation is widely recognized, accurate copy number quantification remains challenging. Here, we assess the technical reproducibility of several major methods for copy number estimation as they apply to the large repetitive ribosomal DNA array (rDNA). rDNA encodes the ribosomal RNAs and exists as a tandem gene array in all eukaryotes. Repeat units of rDNA are kilobases in size, often with several hundred units comprising the array, making rDNA particularly intractable to common quantification techniques. We evaluate pulsed-field gel electrophoresis, droplet digital PCR, and Nextera-based whole genome sequencing as approaches to copy number estimation, comparing techniques across model organisms and spanning wide ranges of copy numbers. Nextera-based whole genome sequencing, though commonly used in recent literature, produced high error. We explore possible causes for this error and provide recommendations for best practices in rDNA copy number estimation. We present a resource of high-confidence rDNA copy number estimates for a set of S. cerevisiae and C. elegans strains for future use. We furthermore explore the possibility for FISH-based copy number estimation, an alternative that could potentially characterize copy number on a cellular level.
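
The abstract does not spell out how sequencing data are turned into a copy number, so the following is only a minimal sketch of one widely used read-depth approach: divide the typical coverage over the repeat unit by the typical coverage over single-copy regions. The function name `estimate_copy_number` and its inputs are hypothetical and are not taken from the paper.

```python
from statistics import median


def estimate_copy_number(repeat_depths, single_copy_depths):
    """Estimate repeat copies per haploid genome from per-base read depths.

    repeat_depths: depths at positions within one repeat unit (hypothetical input)
    single_copy_depths: depths at positions in single-copy regions (hypothetical input)
    """
    if not repeat_depths or not single_copy_depths:
        raise ValueError("both depth lists must be non-empty")
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    # Medians are less sensitive than means to isolated coverage spikes.
    return median(repeat_depths) / baseline


if __name__ == "__main__":
    # Toy numbers only: repeat covered about 150 times deeper than the baseline.
    print(estimate_copy_number([1500, 1480, 1520], [10, 9, 11, 10]))
```

Any bias in how the library preparation samples the repeat relative to the rest of the genome feeds directly into this ratio, which is one reason such estimates can disagree with gel-based or PCR-based measurements.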


Whole genome sequencing identifies a novel factor required for secretory granule maturation in Tetrahymena thermophila

10.1101/042085 ◽

2016 ◽

Author(s):

Cassandra Kontur ◽

Santosh Kumar ◽

Xun Lan ◽

Jonathan K Pritchard ◽

Aaron P Turkewitz

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Tetrahymena Thermophila ◽

Secretory Granules ◽

Protein A ◽

Model Organisms ◽

Whole Genome ◽

Lysosomal Sorting ◽

Granule Maturation ◽

Multiple Strains

Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded into a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies in part on ancestral lysosomal sorting machinery but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation, a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wildtype copy of MMA1, and disruption of MMA1 in an otherwise wildtype strain generated a phenocopy of UC620. The product of MMA1, characterized as a CFP-tagged copy, is a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation.


Identifying TCDD-resistance genes via murine and rat comparative genomics and transcriptomics

10.1101/602698 ◽

2019 ◽

Cited By ~ 1

Author(s):

Stephenie D. Prokopec ◽

Aileen Lu ◽

Sandy Che-Eun S. Lee ◽

Cindy Q. Yao ◽

Ren X. Sun ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Transgenic Mouse ◽

Genome Sequencing ◽

Mrna Abundance ◽

Model Organisms ◽

Whole Genome ◽

Contributing Factors ◽

Rat Models ◽

Toxic Responses ◽

Mouse Lines

The aryl hydrocarbon receptor (AHR) mediates many of the toxic effects of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). However, the AHR alone is insufficient to explain the widely different outcomes among organisms. Attempts to identify unknown factor(s) have been confounded by genetic variability of model organisms. Here, we evaluated three transgenic mouse lines, each expressing a different rat AHR isoform (rWT, DEL, and INS), as well as C57BL/6 and DBA/2 mice. We supplement these with whole-genome sequencing and transcriptomic analyses of the corresponding rat models: Long-Evans (L-E) and Han/Wistar (H/W) rats. These integrated multi-species genomic and transcriptomic data were used to identify genes associated with TCDD-response phenotypes. We identified several genes that show consistent transcriptional changes in both transgenic mice and rats. Hepatic Pxdc1 was significantly repressed by TCDD in C57BL/6, rWT mice, and in L-E rat. Three genes demonstrated different AHRE-1 (full) motif occurrences within their promoter regions: Cxxc5 had fewer occurrences in H/W, as compared with L-E; Sugp1 and Hgfac (in either L-E or H/W respectively). These genes also showed different patterns of mRNA abundance across strains. The AHR isoform explains much of the transcriptional variability: up to 50% of genes with altered mRNA abundance following TCDD exposure are associated with a single AHR isoform (30% and 10% unique to DEL and rWT respectively following 500 μg/kg TCDD). Genomic and transcriptomic evidence allowed identification of genes potentially involved in phenotypic outcomes: Pxdc1 had differential mRNA abundance by phenotype; Cxxc5 had altered AHR binding sites and differential mRNA abundance.

Author Summary: Environmental contaminants such as dioxins cause many toxic responses, anything from chloracne (common in humans) to death. These toxic responses are mostly regulated by the Ahr, a ligand-activated transcription factor with roles in drug metabolism and immune responses; however, other contributing factors remain unclear. Studies are complicated by the underlying genetic heterogeneity of model organisms. Our team evaluated a number of mouse and rat models, including two strains of mouse, two strains of rat and three transgenic mouse lines which differ only at the Ahr locus, that present widely different sensitivities to the most potent dioxin: 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). We identified a number of changes to gene expression that were associated with different toxic responses. We then contrasted these findings with results from whole-genome sequencing of the H/W and L-E rats and found some key genes, such as Cxxc5 and Mafb, which might contribute to TCDD toxicity. These transcriptomic and genomic datasets will provide a valuable resource for future studies into the mechanisms of dioxin toxicities.


Mapping challenging mutations by whole-genome sequencing

10.1101/036046 ◽

2016 ◽

Author(s):

Harold E. Smith ◽

Amy S. Fabritius ◽

Aimee Jaramillo-Lambert ◽

Andy Golden

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Single Gene ◽

Global Scale ◽

Model Organisms ◽

Whole Genome ◽

Genetic Screens ◽

Alternative Approach ◽

Mutation Identification

Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semi-dominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means.


Genomic Sequencing To Identify Potential Causative Mutation(s) of Neurospora crassa col-4

Microbiology Resource Announcements ◽

10.1128/mra.01009-19 ◽

2020 ◽

Vol 9 (2) ◽

Author(s):

Thomas A. Randall

Keyword(s):

Neurospora Crassa ◽

Whole Genome Sequencing ◽

Genetic Markers ◽

Genome Sequencing ◽

Model Organisms ◽

Causative Mutation ◽

Genomic Sequencing ◽

Whole Genome ◽

Content Type

In many cases, genes for commonly used genetic markers in model organisms have not been identified; therefore, it is of interest to identify the causative genes. Whole-genome sequencing was used to identify potential causative mutations for a col-4 allele of Neurospora crassa.


ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-03980-5 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 1

Author(s):

Elisa Pischedda ◽

Cristina Crava ◽

Martina Carlassara ◽

Susanna Zucca ◽

Leila Gasmi ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Model Organisms ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Regulation Of Expression ◽

Transfer Event ◽

A Genome ◽

Integration Sites

Background: Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into cancer. Recent genomics and metagenomics analyses have shown that viruses also integrate into the genome of non-model organisms (i.e., arthropods, fish, plants, vertebrates). However, studies of endogenous viral elements (EVEs) in non-model organisms have rarely gone beyond their characterization from reference genome assemblies. In non-model organisms, we lack a thorough understanding of the widespread occurrence of EVEs and their biological relevance, apart from sporadic cases which nevertheless point to significant roles of EVEs in immunity and regulation of expression. The concomitance of repetitive DNA, duplications and/or assembly fragmentations in a genome sequence and intrasample variability in whole-genome sequencing (WGS) data could cause misalignments when mapping data to a genome assembly. This phenomenon hinders our ability to properly identify integration sites. Results: To fill this gap, we developed ViR, a pipeline which solves the dispersion of reads due to intrasample variability in sequencing data from both single and pooled DNA samples, thus improving the detection of integration sites. We tested ViR with both in silico and real sequencing data from a non-model organism, the arboviral vector Aedes albopictus. Potential viral integrations predicted by ViR were molecularly validated, supporting the accuracy of ViR results. Conclusion: ViR will open new avenues to explore the biology of EVEs, especially in non-model organisms. Importantly, while we generated ViR with the identification of EVEs in mind, its application can be extended to detect any lateral transfer event, provided an ad hoc sequence to interrogate.


Two Rapidly Growing Mycobacterial Species Isolated from a Brain Abscess: First Whole-Genome Sequences of Mycobacterium immunogenum and Mycobacterium llatzerense

Journal of Clinical Microbiology ◽

10.1128/jcm.00402-15 ◽

2015 ◽

Vol 53 (7) ◽

pp. 2374-2377 ◽

Cited By ~ 27

Author(s):

Alexander L. Greninger ◽

Charles Langelier ◽

Gail Cunningham ◽

Chris Keh ◽

Michael Melgar ◽

...

Keyword(s):

Central Nervous System ◽

Nervous System ◽

Brain Abscess ◽

Genome Sequencing ◽

Whole Genome ◽

Mycobacterial Species ◽

Central Nervous System Infections ◽

Genome Sequences ◽

Content Type ◽

Rapidly Growing Mycobacteria

Rapidly growing mycobacteria are rarely found in central nervous system infections. We describe a case of polymicrobial infection in a brain abscess including two rapidly growing Mycobacterium species, M. immunogenum and M. llatzerense. The Mycobacterium isolates were distinguishable by molecular methods, and whole-genome sequencing showed <60% pairwise nucleotide identity.
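
The abstract does not say how the pairwise nucleotide identity was computed, and whole-genome comparisons normally rely on dedicated alignment tools. As a toy illustration of the quantity itself, the sketch below computes percent identity between two sequences that are assumed to be already aligned, ignoring columns where either sequence has a gap. The function name `pairwise_identity` and the gap convention are assumptions made for this example.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 4 matches out of 5 gap-free columns.
    print(pairwise_identity("ACGT-A", "ACCTTA"))
```

Other conventions exist, for example counting gapped columns as mismatches, and they can give noticeably different percentages for the same pair of genomes.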


Deciphering complex genome rearrangements in C. elegans using short-read whole genome sequencing

Scientific Reports ◽

10.1038/s41598-021-97764-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tatiana Maroilley ◽

Xiao Li ◽

Matthew Oldach ◽

Francesca Jean ◽

Susan J. Stasiuk ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Chromosomal Rearrangements ◽

Large Deletion ◽

Genomic Rearrangements ◽

Model Organisms ◽

Whole Genome ◽

Short Read ◽

C Elegans ◽

Long Read

Genomic rearrangements cause congenital disorders, cancer, and complex diseases in humans. Yet, they are still understudied in rare diseases because their detection is challenging, despite the advent of whole genome sequencing (WGS) technologies. Short-read (srWGS) and long-read WGS approaches are regularly compared, and the latter is commonly recommended in studies focusing on genomic rearrangements. However, srWGS is currently the most economical, accurate, and widely supported technology. In Caenorhabditis elegans (C. elegans), such variants, induced by various mutagenesis processes, have been used for decades to balance large genomic regions by preventing chromosomal crossover events and allowing the maintenance of lethal mutations. Interestingly, those chromosomal rearrangements have rarely been characterized on a molecular level. To evaluate the ability of srWGS to detect various types of complex genomic rearrangements, we sequenced three balancer strains using short-read Illumina technology. As we experimentally validated the breakpoints uncovered by srWGS, we showed that, by combining several types of analyses, srWGS enables the detection of a reciprocal translocation (eT1), a free duplication (sDp3), a large deletion (sC4), and chromoanagenesis events. Thus, applying srWGS to decipher real complex genomic rearrangements in model organisms may help in designing efficient bioinformatics pipelines with systematic detection of complex rearrangements in human genomes.


First de-novo transcriptome assembly of a South American frog, Oreobates cruralis, enables population genomic studies of Neotropical amphibians

10.7287/peerj.preprints.2980v1 ◽

2017 ◽

Author(s):

Santiago Montero-Mendieta ◽

Manfred Grabherr ◽

Henrik Lantz ◽

Ignacio De la Riva ◽

Jennifer A Leonard ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Transcriptome Assembly ◽

Cost Effective ◽

Model Organisms ◽

South American ◽

Whole Genome ◽

Rna Seq ◽

De Novo Transcriptome

Whole genome sequencing is opening the door to novel insights into the population structure and evolutionary history of poorly known species. In organisms with large genomes, which includes most amphibians, whole-genome sequencing is excessively challenging and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome to facilitate assembly and the transcriptome sequence must be assembled de-novo. We used RNA-seq to obtain the transcriptome profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a workflow to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our workflow guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/PRACTICAL-GUIDE-TO-BUILD-DE-NOVO-TRANSCRIPTOME-ASSEMBLIES-FOR-NON-MODEL-ORGANISMS/wiki


Homologous Escherichia coli Identified in Cerebrospinal Fluid and Bloodstream

Frontiers in Cellular and Infection Microbiology ◽

10.3389/fcimb.2021.674235 ◽

2021 ◽

Vol 11 ◽

Author(s):

Qingmiao Shi ◽

Jun Zhang ◽

Jinghui Wang ◽

Lijuan Du ◽

Zhaoyang Shi ◽

...

Keyword(s):

Central Nervous System ◽

Phylogenetic Analysis ◽

Cerebrospinal Fluid ◽

Nervous System ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Central Nervous System Infection ◽

Whole Genome ◽

E Coli ◽

Carbapenem Resistant

Background: Escherichia coli is an opportunistic bacterium that causes a wide range of diseases, such as bloodstream infection and central nervous system infection. The traditional culture-based method to detect E. coli usually takes more than 2 days. The objective of this study was to explore the value of metagenomic next-generation sequencing (mNGS) in identifying E. coli from human cerebrospinal fluid. In addition, we investigated the infection source of E. coli through whole genome sequencing and phylogenetic analysis. Methods: We used a clinical case to analyze the function of mNGS in pathogen detection from cerebrospinal fluid. The NextSeq 550Dx platform was used for mNGS. Next, whole genome sequencing was performed to obtain the genomic characterization of E. coli. Furthermore, we screened 20 E. coli strains from the National Center for Biotechnology Information and conducted a phylogenetic analysis. Results: A middle-aged patient who attended our hospital was diagnosed with craniopharyngioma and received surgery. The patient had recurrent fever and persistent lethargy after surgery. The first cerebrospinal fluid culture failed to grow the bacteria. The cerebrospinal fluid sample was then analyzed by mNGS, and sequence reads of E. coli were identified. Later, E. coli was reported in the second cerebrospinal fluid culture, confirming the mNGS result. Moreover, we also cultured carbapenem-resistant E. coli from the patient’s bloodstream. Through whole genome sequencing and phylogenetic analysis, we found that the E. coli isolated from cerebrospinal fluid and the bloodstream was 100% homologous, indicating that the E. coli central nervous system infection originated from the bloodstream. Conclusion: Metagenomic next-generation sequencing is a valuable tool to identify pathogens from cerebrospinal fluid, and seeking the infection source is of great significance in clinical diagnosis and treatment. Furthermore, carbapenem-resistant E. coli is a serious problem as a cause of bloodstream infection and central nervous system infection, and effective and adequate measures to prevent and control it are urgently needed.
