Intra-host site-specific polymorphisms of SARS-CoV-2 is consistent across multiple samples and methodologies

Author(s):  
Rebecca Rose ◽  
David J. Nolan ◽  
Samual Moot ◽  
Amy Feehan ◽  
Sissy Cross ◽  
...  

ABSTRACT Despite the potential relevance to clinical outcome, intra-host dynamics of SARS-CoV-2 are unclear. Here, we quantify and characterize intra-host variation in SARS-CoV-2 raw sequence data uploaded to SRA as of 14 April 2020, and compare results between two sequencing methods (amplicon and RNA-Seq). Raw fastq files were quality filtered and trimmed using Trimmomatic, mapped to the Wuhan-Hu-1 reference genome using Bowtie2, and variants were called with bcftools mpileup. To ensure sufficient coverage, we only included samples with 10X coverage for >90% of the genome (n=406 samples), and only variants with a depth >=10. Derived (i.e. non-reference) alleles were found at 408 sites. The number of polymorphic sites (i.e. sites with multiple alleles) within samples ranged from 0-13, with 72% of samples (295/406) having at least one polymorphic site. Correlation between number of polymorphic sites and coverage was very low for both sequencing methods (R2 < 0.1, p < 0.05). Polymorphisms were observed in >1 sample at 66 sites (range: 2-38 samples). The minor allele frequency (MAF) at each shared polymorphic site was 0.03% - 48.5%. 33/66 sites occurred in ORF1a1b, and 37/66 changes were non-synonymous. At 10/66 sites, derived alleles were found in samples sequenced using both methods. Polymorphic amplicon samples were found at 10/10 positions, while polymorphic RNA-Seq samples were found at 7/10 positions. In conclusion, our results suggest that intra-host variation is prevalent among clinical samples. While mutations resulting from amplification and/or sequencing errors cannot be excluded, the observation of shared polymorphic sites with high MAF across multiple samples and sequencing methods is consistent with true underlying variation. Further investigation into intra-host evolutionary dynamics, particularly with longitudinal sampling, is critical for broader understanding of disease progression.
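
The inclusion criteria described in the abstract above (10X coverage over more than 90% of the genome, variant depth of at least 10) and the minor allele frequency can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input shapes (a list of per-position depths, a dict of per-allele read counts), and the definition of MAF as the second most common allele's share of the reads are all assumptions made for illustration.

```python
from collections import Counter

MIN_DEPTH = 10       # depth threshold stated in the abstract
MIN_BREADTH = 0.90   # fraction of the genome that must reach MIN_DEPTH


def passes_coverage(depths, min_depth=MIN_DEPTH, min_breadth=MIN_BREADTH):
    """Return True if more than min_breadth of positions have depth >= min_depth.

    `depths` is a hypothetical list of per-position read depths for one sample.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) > min_breadth


def minor_allele_frequency(allele_counts, min_depth=MIN_DEPTH):
    """Return the MAF at one site, or None if the site is skipped.

    `allele_counts` is a hypothetical dict such as {"A": 90, "G": 10}.
    MAF is assumed here to be the second most common allele's share of reads.
    Sites below min_depth, or with only one observed allele, return None.
    """
    counts = Counter({a: c for a, c in allele_counts.items() if c > 0})
    depth = sum(counts.values())
    if depth < min_depth or len(counts) < 2:
        return None
    ranked = counts.most_common()
    return ranked[1][1] / depth
```

For example, a sample with 95 of 100 positions at depth 10 or more passes the coverage filter, and a site with 90 reads supporting A and 10 supporting G has a MAF of 0.1 under this definition.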

Author(s):  
Yoshiaki Yasumizu ◽  
Atsushi Hara ◽  
Shimon Sakaguchi ◽  
Naganari Ohkura

Abstract Summary The possibility that RNA transcripts from clinical samples contain plenty of virus RNAs has not been pursued actively so far. We here developed a new tool for analyzing virus-transcribed mRNAs, not virus copy numbers, in the data of bulk and single-cell RNA-sequencing of human cells. Our pipeline, named VIRTUS (VIRal Transcript Usage Sensor), was able to detect 762 viruses including herpesviruses, retroviruses and even SARS-CoV-2 (COVID-19), and quantify their transcripts in the sequence data. This tool thus enabled simultaneously detecting infected cells, the composition of multiple viruses within the cell, and the endogenous host-gene expression profile of the cell. This bioinformatics method would be instrumental in addressing the possible effects of covertly infecting viruses on certain diseases and developing new treatments to target such viruses. Availability and implementation : VIRTUS is implemented using Common Workflow Language and Docker under a CC-NC license. VIRTUS is freely available at https://github.com/yyoshiaki/VIRTUS. Supplementary information Supplementary data are available at Bioinformatics online.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 297 ◽  
Author(s):  
Jason R. Miller ◽  
Sergey Koren ◽  
Kari A. Dilley ◽  
Derek M. Harkins ◽  
Timothy B. Stockwell ◽  
...  

Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples. Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology. Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence. Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.


2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
M Galiano ◽  
S Miah ◽  
O Akinbami ◽  
S Gonzalez Gonoggia ◽  
J Ellis ◽  
...  

Abstract For the last four influenza seasons in the UK, genetic characterization of seasonal influenza viruses has shifted from single hemagglutinin (HA) and neuraminidase (NA) genes to whole genome (WG) analysis, allowing for better insight into the evolutionary dynamics of this virus. Sequences (WG or HA/NA) were obtained from >900 A(H3N2) viruses sampled in the UK during influenza seasons 2016/7 and 2017/8 and the inter-seasonal period. Viral RNA was extracted from clinical samples and amplified using a multi-segment RT-PCR. Amplicons were sequenced using Nextera library preparation for Illumina MiSeq sequencing. Sequence data were processed using BAM-SAM tools and PHE in-house scripts. Phylogenetic analysis of the HA gene indicates that they belong to genetic group 3C.2a, which has circulated since 2014. Season 2016/7 was characterized by the emergence of cluster 3C.2a.1; further genetic heterogeneity was seen with 6 new subclusters within 3C.2a and 3C.2a.1, with predominance of those characterized by amino acid changes N121K and S144K (3C.2a) and N121K, N171K, I406K, G484E (3C.2a.1). The NA genes clustered with a similar topology to the HA. Season 2017/8 was characterized by persistence of some clades from previous season with further diversification. Three of the 3C.2a clusters continued to circulate, with predominance of clade showing T131K, R142K, and R261Q (clade 3C.2a.2). The majority of HA sequences in 3C.2a1 fall into a new subcluster which has become predominant within this subgroup, with amino acid changes E62G, K92R, and T135K (3C.2a.1b). The topology of NA and internal gene trees showed evidence of reassortment events occurring at some point between the two seasons, with group 3C.2a2 acquiring NA and some internal genes from 3C.2a1 lineage viruses. The predominance of this group during 2017–8 might be due to fitness advantage related to the new genetic constellation.
Emerging viruses from group 3C.3a also have acquired genes from lineage 3C.2a1, which could be the reason for their increased frequency to 20 per cent by the end of season 2017–8. Molecular epidemiology indicates emerging genetic diversity in A(H3N2) viruses during the period of study, leading to co-circulation of variants. The frequency of circulating HA genetic groups was quite variable, with rapidly changing patterns of predominance. Evidence of reassortment events was observed which could be responsible for the rise and predominance of some clades, and might predict the emergence of other variants.


2020 ◽  
Author(s):  
Yoshiaki Yasumizu ◽  
Atsushi Hara ◽  
Shimon Sakaguchi ◽  
Naganari Ohkura

Abstract The possibility that RNA transcripts from clinical samples contain plenty of virus RNAs has not been pursued actively so far. We here developed a new tool for analyzing virus-transcribed mRNAs, not virus copy numbers, in the data of conventional and single-cell RNA-sequencing of human cells. Our pipeline, named VIRTUS (VIRal Transcript Usage Sensor), was able to detect 763 viruses including herpesviruses, retroviruses, and even SARS-CoV-2 (COVID-19), and quantify their transcripts in the sequence data. This tool thus enabled simultaneously detecting infected cells, the composition of multiple viruses within the cell, and the endogenous host gene expression profile of the cell. This bioinformatics method would be instrumental in addressing the possible effects of covertly infecting viruses on certain diseases and developing new treatments to target such viruses. Availability and implementation VIRTUS is implemented using Common Workflow Language and Docker under a CC-NC license. VIRTUS is freely available at https://github.com/yyoshiaki/VIRTUS. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Xu Shi ◽  
Andrew F Neuwald ◽  
Xiao Wang ◽  
Tian-Li Wang ◽  
Leena Hilakivi-Clarke ◽  
...  

Abstract Motivation High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. Results We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for the IntAPT. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits a robust performance among multiple samples, resulting in notably improved identification of expressed isoforms of low abundance. Availability and implementation The IntAPT package is available at http://github.com/henryxushi/IntAPT. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Liang Cheng ◽  
Xudong Han ◽  
Zijun Zhu ◽  
Changlu Qi ◽  
Ping Wang ◽  
...  

Abstract Since the first report of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the COVID-19 pandemic has spread rapidly worldwide. Because few virus strains were available, early studies observed few of the key mutations that would be very important for the evolutionary trends of the virus genome. Here, we downloaded 1809 sequence data of SARS-CoV-2 strains from GISAID before April 2020 to identify mutations and functional alterations caused by these mutations. In total, we identified 1017 nonsynonymous and 512 synonymous mutations with alignment to reference genome NC_045512, none of which were observed in the receptor-binding domain (RBD) of the spike protein. On average, each of the strains could have about 1.75 new mutations each month. The current mutations may have few impacts on antibodies. Although the whole genome shows purifying selection, ORF3a, ORF8 and ORF10 were under positive selection. Only the 36 mutations that occurred in 1% or more of virus strains were further analyzed to reveal linkage disequilibrium (LD) variants and dominant mutations. As a result, we observed five dominant mutations involving three nonsynonymous mutations C28144T, C14408T and A23403G and two synonymous mutations T8782C and C3037T. These five mutations occurred in almost all strains in April 2020. Besides, we also observed two potential dominant nonsynonymous mutations C1059T and G25563T, which occurred in most of the strains in April 2020. Further functional analysis shows that these mutations decreased protein stability largely, which could lead to a significant reduction of virus virulence. In addition, the A23403G mutation increases the spike-ACE2 interaction and finally leads to the enhancement of its infectivity. All of these proved that the evolution of SARS-CoV-2 is toward the enhancement of infectivity and reduction of virulence.
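
The mutation labels in the abstract above (for example A23403G: reference base, genome position, derived base) and its 1% frequency cutoff can be illustrated with a short Python sketch. This is only an assumed reading of the notation and the filter, not the authors' code: the function names and the input shape (one set of mutation labels per strain) are hypothetical.

```python
import re
from collections import Counter

# Label format assumed from the abstract: reference base, 1-based position, derived base.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label):
    """Split a label such as 'A23403G' into (reference base, position, derived base)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def frequent_mutations(strain_mutations, min_fraction=0.01):
    """Return {label: fraction of strains} for labels seen in >= min_fraction of strains.

    `strain_mutations` is a hypothetical list with one collection of labels per strain.
    """
    n_strains = len(strain_mutations)
    if n_strains == 0:
        return {}
    # Count each label at most once per strain.
    counts = Counter(label for muts in strain_mutations for label in set(muts))
    return {
        label: count / n_strains
        for label, count in counts.items()
        if count / n_strains >= min_fraction
    }
```

With the default cutoff of 0.01, a label must appear in at least 1% of the supplied strains to be kept, which mirrors the threshold stated in the abstract.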


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information bringing forth a wealth of novel, endemic and medically related alleles.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuhua Zhan ◽  
Cortland Griswold ◽  
Lewis Lukens

Abstract Background Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated whether mapping bias occurred and whether chromosomal features were associated with mapping bias. Zea mays presents a model species to assess these questions, given it has genotypically distinct and well-studied genetic lines. Results In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.


2021 ◽  
Vol 8 ◽  
Author(s):  
Hanhan Yao ◽  
Zhihua Lin ◽  
Yinghui Dong ◽  
Xianghui Kong ◽  
Lin He ◽  
...  

The razor clam, Sinonovacula constricta is a commercially important bivalve in the western Pacific Ocean, yet little is known about the mechanisms of sex determination/differentiation and gametogenesis. In the present study, the comparative transcriptome analysis of adult gonads (female gonads and male gonads) was conducted to identify potential sex-related genes in S. constricta. The number of reads generated for each target library (three females and three males) ranged from 31,853,422 to 37,750,848, and 20,489,472 to 26,152,448 could be mapped to the reference genome of S. constricta (the map percentage ranging from 63.71 to 71.48%). A total of 8,497 genes were identified to be differentially expressed between the female and male gonads, of which 4,253 were female-biased (upregulated in females), and 4,244 were male-biased. Forty-five genes were identified as potential sex-related genes, including DmrtA2, Sox9, Fem-1b, and Fem-1c involved in sex determination/differentiation and Vg, CYP17A1, SOHLH2, and TSSK involved in gametogenesis. The expression profiles of 12 genes were validated by qRT-PCR, which further confirmed the reliability and accuracy of the RNA-Seq results. Our results provide basic information about the genes involved in sex determination/differentiation and gametogenesis, and pave the way for further studies on reproduction and breeding in S. constricta and other marine bivalves.

