Subgenomic RNAs as molecular indicators of asymptomatic SARS-CoV-2 infection

2021 ◽  
Author(s):  
Chee Hong Wong ◽  
Chew Yee Ngan ◽  
Rachel L. Goldfeder ◽  
Jennifer Idol ◽  
Chris Kuhlberg ◽  
...  

Summary In Coronaviridae such as SARS-CoV-2, subgenomic RNAs (sgRNA) are replicative intermediates; therefore, their abundance and structures could indicate viral replication activity and severity of host infection. Here, we systematically characterized sgRNA expression and structural variation in 81 clinical specimens collected from symptomatic and asymptomatic individuals with the goal of assessing viral genomic signatures of disease severity. We demonstrated highly coordinated and consistent expression of sgRNAs in individuals with robust infections that result in symptoms, and found that their expression is significantly repressed in asymptomatic infections, indicating that the ratio of sgRNAs to genomic RNA (sgRNA/gRNA) is highly correlated with the severity of the disease. Using long-read sequencing technologies to characterize full-length sgRNA structures, we also observed widespread deletions in viral RNAs and identified unique sets of deletions found preferentially in symptomatic individuals, with many likely to confer changes in SARS-CoV-2 virulence and host responses. Furthermore, based on the sgRNA structures, the frequently occurring structural variants in SARS-CoV-2 genomes serve as a mechanism to further induce SARS-CoV-2 proteome complexity. Taken together, our results show that differential sgRNA expression and structural mutational burden both appear to be correlated with the clinical severity of SARS-CoV-2 infection. Longitudinally monitoring sgRNA expression and structural diversity could further guide treatment responses, testing strategies, and vaccine development.

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Chee Hong Wong ◽  
Chew Yee Ngan ◽  
Rachel L. Goldfeder ◽  
Jennifer Idol ◽  
Chris Kuhlberg ◽  
...  

Abstract Background It is estimated that up to 80% of infections caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are asymptomatic, and asymptomatic patients can still effectively transmit the virus and cause disease. While much effort has been placed on decoding single nucleotide variation in SARS-CoV-2 genomes, considerably less is known about their transcript variation and any correlation with clinical severity in human hosts, as defined here by the presence or absence of symptoms. Methods To assess viral genomic signatures of disease severity, we conducted a systematic characterization of SARS-CoV-2 transcripts and genetic variants in 81 clinical specimens collected from symptomatic and asymptomatic individuals using multi-scale transcriptomic analyses including amplicon-seq, short-read metatranscriptome and long-read Iso-seq. Results Here we show a highly coordinated and consistent pattern of sgRNA expression in individuals with robust symptomatic SARS-CoV-2 infection, and that this expression is significantly repressed in asymptomatic infections. We also observe widespread inter- and intra-patient variants in viral RNAs, known as quasispecies, which are frequently found in many RNA viruses. We identify unique sets of deletions found preferentially in symptomatic individuals, with many likely to confer changes in SARS-CoV-2 virulence and host responses. Moreover, these frequently occurring structural variants in SARS-CoV-2 genomes serve as a mechanism to further induce SARS-CoV-2 proteome complexity. Conclusions Our results indicate that differential sgRNA expression and structural mutational burden are highly correlated with the clinical severity of SARS-CoV-2 infection. Longitudinally monitoring sgRNA expression and structural diversity could further guide treatment responses, testing strategies, and vaccine development.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chong Chu ◽  
Rebeca Borges-Monroy ◽  
Vinayak V. Viswanadham ◽  
Soohyun Lee ◽  
Heng Li ◽  
...  

Abstract Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.


Genes ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 481 ◽  
Author(s):  
Chen ◽  
Lin ◽  
Xie ◽  
Zhong ◽  
Zhang ◽  
...  

The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; therefore, in this study, the transcriptome covering all developmental stages of B. odoriphaga was sequenced for the first time by Pacific Biosciences single-molecule long-read sequencing. Here, 39,129 isoforms were generated, and 35,645 were found to have annotation results when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed across 25 Clusters of Orthologous Groups categories, and 11,880 isoforms were categorized into 60 functional groups that belonged to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned to 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 out of 39,129 isoforms were predicted to have CDS, and 4319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
J. Robert Macey ◽  
Stephan Pabinger ◽  
Charles G. Barbieri ◽  
Ella S. Buring ◽  
Vanessa L. Gonzalez ◽  
...  

Abstract Animal mitochondrial genomic polymorphism occurs as low-level mitochondrial heteroplasmy and deeply divergent co-existing molecules. The latter is rare, known only in bivalvian mollusks. Here we show two deeply divergent co-existing mt-genomes in a vertebrate through genomic sequencing of the Tuatara (Sphenodon punctatus), the sole representative of an ancient reptilian Order. The two molecules, revealed using a combination of short-read and long-read sequencing technologies, differ by 10.4% nucleotide divergence. A single long-read covers an entire mt-molecule for both strands. Phylogenetic analyses suggest a 7–8 million-year divergence between genomes. Contrary to earlier reports, all 37 genes typical of animal mitochondria, with drastic gene rearrangements, are confirmed for both mt-genomes. Also unique to vertebrates, concerted evolution drives three near-identical putative Control Region non-coding blocks. Evidence of positive selection at sites linked to metabolically important transmembrane regions of encoded proteins suggests these two mt-genomes may confer an adaptive advantage for an unusually cold-tolerant reptile.


2018 ◽  
Author(s):  
Luisa Berná ◽  
Matías Rodríguez ◽  
María Laura Chiribao ◽  
Adriana Parodi-Talice ◽  
Sebastián Pita ◽  
...  

Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic genome complexity of this parasite (abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits diverse types of analyses that require a high degree of precision. Long reads generated by third-generation sequencing technologies are particularly suitable to address the challenges associated with T. cruzi's genome since they permit directly determining the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (DTU TcVI) and the non-hybrid Dm28c (DTU TcI), determined by PacBio SMRT technology. The improved assemblies obtained herein permitted us to accurately estimate gene copy numbers and the abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a "core compartment" and a "disruptive compartment" which exhibit opposite gene and GC content composition. New tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were separately assembled, allowing us to retrieve haplotypes as separate contigs instead of a unique mosaic sequence. Finally, manual annotation of the surface multigene families MUC and trans-sialidases now allows a better overview of these complex groups of genes.


2017 ◽  
Author(s):  
Mircea Cretu Stancu ◽  
Markus J. van Roosmalen ◽  
Ivo Renkens ◽  
Marleen Nieboer ◽  
Sjors Middelkamp ◽  
...  

Abstract Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline, NanoSV, to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.


2021 ◽  
Vol 10 (46) ◽  
Author(s):  
Kentaro Miyazaki ◽  
Natsuko Tokito

Complete genome resequencing was conducted for Thermus thermophilus strain TMY by hybrid assembly of Oxford Nanopore Technologies long-read and MGI short-read data. Errors in the previously reported genome sequence determined by PacBio technology alone were corrected, allowing for high-quality comparative genomic analysis of closely related T. thermophilus genomes.


Data Mining ◽  
2013 ◽  
pp. 1131-1148
Author(s):  
Patricio A. Manque ◽  
Ute Woehlbier

Vaccines represent one of the most cost-effective ways to prevent and treat diseases. The use of vaccines in the control of viral diseases represents an important milestone in the history of medicine. The genomic revolution brought us the possibility of scanning genomes in search of new and more effective vaccine candidates, and the advancement of bioinformatics provided the framework for applying strategies focused not only on antigen discovery but also on comparative genomics, pathogenic factor identification, and data mining. In addition, progress in post-genomic technologies, including gene expression technologies such as microarrays and proteomics, gave us the opportunity to explore host responses to vaccines, leading to a better understanding of immune responses to pathogens and/or vaccines and assisting in the development of new and better vaccines and adjuvants. This chapter will review how systems biology-based approaches including genomics, gene expression technologies, and bioinformatics have changed the way of thinking about antigen discovery and vaccine development. In addition, the chapter will discuss how the study of host responses in combination with “in silico” approaches could help predict immunogenicity and improve the efficacy of vaccines.


BMC Biology ◽  
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Robert M. Waterhouse ◽  
Sergey Aganezov ◽  
Yoann Anselmetti ◽  
Jiyoung Lee ◽  
Livio Ruzzante ◽  
...  

Abstract Background New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.

