Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin

Author(s):  
David B Neale ◽  
Aleksey V Zimin ◽  
Sumaira Zaman ◽  
Alison D Scott ◽  
Bikash Shrestha ◽  
...  

Abstract Sequencing, assembly, and annotation of the 26.5 Gbp hexaploid genome of coast redwood (Sequoia sempervirens) were completed, leading toward discovery of genes related to climate adaptation and investigation of the origin of the hexaploid genome. Deep-coverage short-read Illumina sequencing data from haploid tissue from a single seed were combined with long-read Oxford Nanopore Technologies sequencing data from diploid needle tissue to create an initial assembly, which was then scaffolded using proximity ligation data to produce a highly contiguous final assembly, SESE 2.1, with a scaffold N50 size of 44.9 Mbp. The assembly included several scaffolds that span entire chromosome arms, confirmed by the presence of telomere and centromere sequences on the ends of the scaffolds. The structural annotation produced 118,906 genes with 113 containing introns that exceed 500 Kbp in length and one reaching 2 Mbp. Nearly 19 Gbp of the genome represented repetitive content with the vast majority characterized as long terminal repeats, with a 2.9:1 ratio of Copia to Gypsy elements that may aid in gene expression control. Comparison of coast redwood to other conifers revealed species-specific expansions for a plethora of abiotic and biotic stress response genes, including those involved in fungal disease resistance, detoxification, and physical injury/structural remodeling and others supporting flavonoid biosynthesis. Analysis of multiple genes that exist in triplicate in coast redwood but only once in its diploid relative, giant sequoia, supports a previous hypothesis that the hexaploidy is the result of autopolyploidy rather than any hybridizations with separate but closely related conifer species.

Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1826
Author(s):  
Amanda R. De La Torre ◽  
Manoj K. Sekhwal ◽  
David B. Neale

Dissecting the genomic basis of local adaptation is a major goal in evolutionary biology and conservation science. Rapid changes in the climate pose significant challenges to the survival of natural populations, and the genomic basis of long-generation plant species is still poorly understood. Here, we investigated genome-wide climate adaptation in giant sequoia and coast redwood, two iconic and ecologically important tree species. We used a combination of univariate and multivariate genotype–environment association methods and a selective sweep analysis using non-overlapping sliding windows. We identified genomic regions of potential adaptive importance, showing strong associations to moisture variables and mean annual temperature. Our results found a complex architecture of climate adaptation in the species, with genomic regions showing signatures of selective sweeps, polygenic adaptation, or a combination of both, suggesting recent or ongoing climate adaptation along moisture and temperature gradients in giant sequoia and coast redwood. The results of this study provide a first step toward identifying genomic regions of adaptive significance in the species and will provide information to guide management and conservation strategies that seek to maximize adaptive potential in the face of climate change.


Author(s):  
Eric S Tvedte ◽  
Mark Gasser ◽  
Benjamin C Sparklin ◽  
Jane Michalski ◽  
Carl E Hjelmen ◽  
...  

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Caitlin M. Singleton ◽  
Francesca Petriglieri ◽  
Jannie M. Kristensen ◽  
Rasmus H. Kirkegaard ◽  
Thomas Y. Michaelsen ◽  
...  

Abstract Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Anastasiya Börsch ◽  
Daniel J. Ham ◽  
Nitish Mittal ◽  
Lionel A. Tintignac ◽  
Eugenia Migliavacca ◽  
...  

Abstract Sarcopenia, the age-related loss of skeletal muscle mass and function, affects 5–13% of individuals aged over 60 years. While rodents are widely-used model organisms, which aspects of sarcopenia are recapitulated in different animal models is unknown. Here we generated a time series of phenotypic measurements and RNA sequencing data in mouse gastrocnemius muscle and analyzed them alongside analogous data from rats and humans. We found that rodents recapitulate mitochondrial changes observed in human sarcopenia, while inflammatory responses are conserved at pathway but not gene level. Perturbations in the extracellular matrix are shared by rats, while mice recapitulate changes in RNA processing and autophagy. We inferred transcription regulators of early and late transcriptome changes, which could be targeted therapeutically. Our study demonstrates that phenotypic measurements, such as muscle mass, are better indicators of muscle health than chronological age and should be considered when analyzing aging-related molecular data.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chong Chu ◽  
Rebeca Borges-Monroy ◽  
Vinayak V. Viswanadham ◽  
Soohyun Lee ◽  
Heng Li ◽  
...  

Abstract Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) as well as a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.


2020 ◽  
Author(s):  
Andrew J. Page ◽  
Nabil-Fareed Alikhan ◽  
Michael Strinden ◽  
Thanh Le Viet ◽  
Timofey Skvortsov

Abstract Spoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short read genome sequencing data; however, no methods exist for long read sequence data such as from Nanopore or PacBio. We present a novel software package Galru, which can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows for near real-time spoligotyping from long read data as it is being sequenced, giving rapid sample typing. We compare it to the existing state of the art software and find it performs identically to the results obtained from short read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4840 ◽  
Author(s):  
Kai Wei ◽  
Tingting Zhang ◽  
Lei Ma

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subjected to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, achieve stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structure, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged for maintaining the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.


Author(s):  
Huan Zhong ◽  
Zongwei Cai ◽  
Zhu Yang ◽  
Yiji Xia

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a strategy of chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules, TagSeek for differentiating tagged and untagged reads, and TagSeqQuant for the quantitative and further characterization analysis of genes and isoforms. In addition, the pipeline integrates some advanced functions to identify antisense or splicing, and supports the data reformation for visualization. Therefore, TagSeqTools provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.


2021 ◽  
Vol 118 (30) ◽  
pp. e2102344118
Author(s):  
Hao Wang ◽  
Jonathan L. Robinson ◽  
Pinar Kocabas ◽  
Johan Gustafsson ◽  
Mihail Anton ◽  
...  

Genome-scale metabolic models (GEMs) are used extensively for analysis of mechanisms underlying human diseases and metabolic malfunctions. However, the lack of comprehensive and high-quality GEMs for model organisms restricts translational utilization of omics data accumulating from the use of various disease models. Here we present a unified platform of GEMs that covers five major model animals, including Mouse1 (Mus musculus), Rat1 (Rattus norvegicus), Zebrafish1 (Danio rerio), Fruitfly1 (Drosophila melanogaster), and Worm1 (Caenorhabditis elegans). These GEMs represent the most comprehensive coverage of the metabolic network by considering both orthology-based pathways and species-specific reactions. All GEMs can be interactively queried via the accompanying web portal Metabolic Atlas. Specifically, through integrative analysis of Mouse1 with RNA-sequencing data from brain tissues of transgenic mice we identified a coordinated up-regulation of lysosomal GM2 ganglioside and peptide degradation pathways which appears to be a signature metabolic alteration in Alzheimer’s disease (AD) mouse models with a phenotype of amyloid precursor protein overexpression. This metabolic shift was further validated with proteomics data from transgenic mice and cerebrospinal fluid samples from human patients. The elevated lysosomal enzymes thus hold potential to be used as a biomarker for early diagnosis of AD. Taken together, we foresee that this evolving open-source platform will serve as an important resource to facilitate the development of systems medicines and translational biomedical applications.

