DiscoPlot: Discordant read visualisation

Author(s):  
Mitchell J Sullivan ◽  
Scott A Beatson

Over the last decade, the emergence of high-throughput sequencing has led to an increase in both the size and scope of genome sequencing projects. Although genome sequencing and analysis have changed dramatically during this time, the way read alignments are visualised has remained largely unchanged. To address the problem of visualising growing sequencing datasets, we have developed DiscoPlot, a tool for visualising read alignments using a two-dimensional scatterplot. DiscoPlot allows the user to quickly identify genomic rearrangements, misassemblies and sequencing artefacts by providing a scalable method for visualising large sections of the genome. It reads single-end or paired read alignments in SAM, BAM or standard BLAST tab format and creates a scatter plot of opaque crosses representing the alignments to a reference. DiscoPlot is freely available (under a GPL license) for download (Mac OS X, Unix and Windows) at https://mjsull.github.io/DiscoPlot.
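The idea of placing paired read alignments on a two-dimensional scatterplot can be illustrated with a minimal sketch. This is not DiscoPlot's own code. It assumes plain-text SAM input, keeps only the first read of each pair whose mate maps to the same reference, and uses a hypothetical `max_insert` threshold to separate pairs that lie close together from pairs that lie far apart. The function names `paired_points` and `split_by_distance` are invented for illustration.

```python
def paired_points(sam_lines):
    """Yield (read position, mate position) for first-in-pair reads.

    Only pairs where both reads are mapped to the same reference are kept.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory fields
            continue
        flag = int(fields[1])
        rname, pos = fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        is_paired = bool(flag & 0x1)
        any_unmapped = bool(flag & 0x4) or bool(flag & 0x8)
        is_first = bool(flag & 0x40)
        if not is_paired or any_unmapped or not is_first:
            continue
        if rnext != "=" and rnext != rname:   # mate on another reference
            continue
        yield pos, pnext


def split_by_distance(points, max_insert=1000):
    """Split points into near-diagonal and off-diagonal groups.

    max_insert is an assumed threshold, not a value taken from DiscoPlot.
    """
    concordant, discordant = [], []
    for x, y in points:
        target = concordant if abs(x - y) <= max_insert else discordant
        target.append((x, y))
    return concordant, discordant


# Made-up example records, for illustration only.
example_sam = [
    "@SQ\tSN:ref\tLN:100000",
    "r1\t99\tref\t100\t60\t50M\t=\t350\t300\tACGT\tIIII",
    "r1\t147\tref\t350\t60\t50M\t=\t100\t-300\tACGT\tIIII",
    "r2\t97\tref\t200\t60\t50M\t=\t50000\t49850\tACGT\tIIII",
    "r3\t73\tref\t500\t60\t50M\t=\t500\t0\tACGT\tIIII",
]
concordant, discordant = split_by_distance(paired_points(example_sam))
```

On a scatterplot of read position against mate position, the points in `concordant` fall near the diagonal, while those in `discordant` fall away from it, which is where rearrangements or misassemblies would be expected to show up.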

2015 ◽  
Author(s):  
Mitchell J Sullivan ◽  
Scott A Beatson



mSphere ◽  
2020 ◽  
Vol 5 (5) ◽  
Author(s):  
Bhavna Hora ◽  
Naila Gulzar ◽  
Yue Chen ◽  
Konstantinos Karagiannis ◽  
Fangping Cai ◽  
...  

ABSTRACT High-throughput sequencing (HTS) has been widely used to characterize HIV-1 genome sequences. There are no algorithms currently that can directly determine genotype and quasispecies population using short HTS reads generated from long genome sequences without additional software. To establish a robust subpopulation, subtype, and recombination analysis workflow, we amplified the HIV-1 3′-half genome from plasma samples of 65 HIV-1-infected individuals and sequenced the entire amplicon (∼4,500 bp) by HTS. With direct analysis of raw reads using HIVE-hexahedron, we showed that 48% of samples harbored 2 to 13 subpopulations. We identified various subtypes (17 A1s, 4 Bs, 27 Cs, 6 CRF02_AGs, and 11 unique recombinant forms) and defined recombinant breakpoints of 10 recombinants. These results were validated with viral genome sequences generated by single genome sequencing (SGS) or the analysis of consensus sequence of the HTS reads. The HIVE-hexahedron workflow is more sensitive and accurate than just evaluating the consensus sequence and also more cost-effective than SGS. IMPORTANCE The highly recombinogenic nature of human immunodeficiency virus type 1 (HIV-1) leads to recombination and emergence of quasispecies. It is important to reliably identify subpopulations to understand the complexity of a viral population for drug resistance surveillance and vaccine development. High-throughput sequencing (HTS) provides improved resolution over Sanger sequencing for the analysis of heterogeneous viral subpopulations. However, current methods of analysis of HTS reads are unable to fully address accurate population reconstruction. Hence, there is a dire need for a more sensitive, accurate, user-friendly, and cost-effective method to analyze viral quasispecies. For this purpose, we have improved the HIVE-hexahedron algorithm that we previously developed with in silico short sequences to analyze raw HTS short reads. 
The significance of this study is that our standalone algorithm enables a streamlined analysis of quasispecies, subtype, and recombination patterns from long HIV-1 genome regions without the need of additional sequence analysis tools. Distinct viral populations and recombination patterns identified by HIVE-hexahedron are further validated by comparison with sequences obtained by single genome sequencing (SGS).


2020 ◽  
Author(s):  
Hai-Long Wang

ABSTRACT I performed whole-genome sequencing on SARS-CoV-2 collected from COVID-19 samples at Mayo Clinic Rochester in mid-April 2020, generated 85 consensus genome sequences, and compared them to other genome sequences collected worldwide. I proposed a novel illustration method using a 2D map to display populations of co-occurring nucleotide variants for intra- and inter-viral clades. This method is highly advantageous for the new era of “big-data” when high-throughput sequencing is becoming readily available. Using this method, I revealed the emergence of inter-clade hybrid SARS-CoV-2 lineages that are potentially caused by homologous genetic recombination.


2016 ◽  
Vol 7 ◽  
Author(s):  
Maël Bessaud ◽  
Serge A. Sadeuh-Mba ◽  
Marie-Line Joffret ◽  
Richter Razafindratsimandresy ◽  
Patsy Polston ◽  
...  

2021 ◽  
Vol 20 ◽  
pp. 117693512110492
Author(s):  
Ahmed Ibrahim Samir Khalil ◽  
Anupam Chattopadhyay ◽  
Amartya Sanyal

Background: The revolution in next-generation sequencing (NGS) technology has allowed easy access and sharing of high-throughput sequencing datasets of cancer cell lines and their integrative analyses. However, long-term passaging and culture conditions introduce high levels of genomic and phenotypic diversity in established cell lines resulting in strain differences. Thus, clonal variation in cultured cell lines with respect to the reference standard is a major barrier in systems biology data analyses. Therefore, there is a pressing need for a fast and entry-level assessment of clonal variations within cell lines using their high-throughput sequencing data. Results: We developed a Python-based software, AStra, for de novo estimation of the genome-wide segmental aneuploidy to measure and visually interpret strain-level similarities or differences of cancer cell lines from whole-genome sequencing (WGS). We demonstrated that aneuploidy spectrum can capture the genetic variations in 27 strains of MCF7 breast cancer cell line collected from different laboratories. Performance evaluation of AStra using several cancer sequencing datasets revealed that cancer cell lines exhibit distinct aneuploidy spectra which reflect their previously-reported karyotypic observations. Similarly, AStra successfully identified large-scale DNA copy number variations (CNVs) artificially introduced in simulated WGS datasets. Conclusions: AStra provides an analytical and visualization platform for rapid and easy comparison between different strains or between cell lines based on their aneuploidy spectra solely using the raw BAM files representing mapped reads. We recommend AStra for rapid first-pass quality assessment of cancer cell lines before integrating scientific datasets that employ deep sequencing. AStra is an open-source software and is available at https://github.com/AISKhalil/AStra .


2012 ◽  
Vol 22 (11) ◽  
pp. 2250-2261 ◽  
Author(s):  
A. McPherson ◽  
C. Wu ◽  
A. W. Wyatt ◽  
S. Shah ◽  
C. Collins ◽  
...  

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Dave J. Baker ◽  
Alp Aydin ◽  
Thanh Le-Viet ◽  
Gemma L. Kay ◽  
Steven Rudder ◽  
...  

Abstract We present CoronaHiT, a platform and throughput flexible method for sequencing SARS-CoV-2 genomes (≤ 96 on MinION or > 96 on Illumina NextSeq) depending on changing requirements experienced during the pandemic. CoronaHiT uses transposase-based library preparation of ARTIC PCR products. Method performance was demonstrated by sequencing 2 plates containing 95 and 59 SARS-CoV-2 genomes on nanopore and Illumina platforms and comparing to the ARTIC LoCost nanopore method. Of the 154 samples sequenced using all 3 methods, ≥ 90% genome coverage was obtained for 64.3% using ARTIC LoCost, 71.4% using CoronaHiT-ONT and 76.6% using CoronaHiT-Illumina, with almost identical clustering on a maximum likelihood tree. This protocol will aid the rapid expansion of SARS-CoV-2 genome sequencing globally.

