short read
Recently Published Documents





Francisca Rojas Ringeling ◽  
Shounak Chakraborty ◽  
Caroline Vissers ◽  
Derek Reiman ◽  
Akshay M. Patel ◽  

BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Guoshun Xu ◽  
Liwen Zhang ◽  
Xiaoqing Liu ◽  
Feifei Guan ◽  
Yuquan Xu ◽  

Abstract. Background: Advances in DNA sequencing technologies have transformed our capacity to perform life science research, decipher the dynamics of complex soil microbial communities and exploit them for plant disease management. However, soil is a complex conglomerate, which makes functional metagenomics studies very challenging. Results: Metagenomes assembled from long-read (PacBio, PB), short-read (Illumina, IL), and mixed PB and IL (PI) sequencing of soil DNA samples were compared. Ortholog analyses and functional annotation revealed that the PI approach significantly increased the contig length of the metagenomic sequences compared to IL and enlarged the gene pool compared to PB. The PI approach also offered comparable or higher species abundance than either PB or IL alone, and showed significant advantages for studying natural product biosynthetic genes in the soil microbiomes. Conclusion: Our results provide an effective strategy for combining long- and short-read DNA sequencing data to explore and distill the maximum information out of soil metagenomics.

2022 ◽  
Derek M Bickhart ◽  
Lisa M Koch ◽  
Timothy P.L. Smith ◽  
Heathcliffe Riday ◽  
Michael L Sullivan

Red clover (Trifolium pratense L.) is used as a forage crop due to a variety of favorable traits relative to other crops. Improved varieties have been developed through conventional breeding approaches, but progress could be accelerated and gene discovery facilitated using modern genomic methods. Existing short-read-based genome assemblies of the ~420 Megabase (Mb) genome are fragmented into >135,000 contigs with numerous errors in order and orientation within scaffolds, likely due to the biology of the plant, which displays gametophytic self-incompatibility, resulting in inherently high heterozygosity. A high-quality long-read-based assembly of red clover is presented that reduces the number of contigs by more than 500-fold, improves the per-base quality, and increases the contig N50 statistic by three orders of magnitude. The 413.5 Mb assembly is nearly 20% longer than the 350 Mb short-read assembly, closer to the predicted genome size. Quality measures are presented and full-length isoform sequences of RNA transcripts are reported for use in assessing accuracy and for future annotation of the genome. The assembly accurately represents the seven main linkage groups present in the genome of an allogamous (outcrossing), highly heterozygous plant species.
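The contig N50 statistic cited in this abstract can be illustrated with a minimal sketch. It assumes the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly length. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the red clover assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches at least
    half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), total = 290.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # 80 + 70 = 150 >= 145, so this prints 70
```

A more fragmented assembly of the same total length yields a smaller N50, which is why the statistic is commonly reported alongside contig counts.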

2022 ◽  
Vol 23 (1) ◽  
Bohu Pan ◽  
Luyao Ren ◽  
Vitor Onuchic ◽  
Meijian Guan ◽  
Rebecca Kusko ◽  

Abstract. Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.

2022 ◽  
Karl Johan Westrin ◽  
Warren W Kretzschmar ◽  
Olof Emanuelsson

Motivation: Transcriptome assembly from RNA sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate reconstruction ability of transcript isoforms. This impedes the study of alternative splicing, in particular for lowly expressed isoforms. Result: We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. An appreciable fraction were reconstructed to at least 95% of their length. We suggest that ClusTrast will be useful for studying alternative splicing in the absence of a reference genome. Availability and implementation: The code and usage instructions are available at

2022 ◽  
Vol 9 (1) ◽  
William S. Pearman ◽  
Sarah J. Wells ◽  
James Dale ◽  
Olin K. Silander ◽  
Nikki E. Freed

Most animal mitochondrial genomes are small, circular and structurally conserved. However, recent work indicates that diverse taxa possess unusual mitochondrial genomes. In Isopoda, species in multiple lineages have atypical and rearranged mitochondrial genomes. However, more species of this speciose taxon need to be evaluated to understand the evolutionary origins of atypical mitochondrial genomes in this group. In this study, we report the presence of an atypical mitochondrial structure in the New Zealand endemic marine isopod, Isocladus armatus. Data from long- and short-read DNA sequencing suggest that I. armatus has two mitochondrial chromosomes. The first chromosome consists of two mitochondrial genomes that have been inverted and fused together in a circular form, and the second chromosome consists of a single mitochondrial genome in a linearized form. This atypical mitochondrial structure has been detected in other isopod lineages, and our data from an additional divergent isopod lineage (Sphaeromatidae) lend support to the hypothesis that atypical structure evolved early in the evolution of Isopoda. Additionally, we find that an asymmetrical site previously observed across many species within Isopoda is absent in I. armatus, but confirm the presence of two asymmetrical sites recently reported in two other isopod species.

2021 ◽  
William Bolosky ◽  
Arun Subramaniyan ◽  
Matei Zaharia ◽  
Ravi Pandya ◽  
Taylor Sittler ◽  

Abstract. Much genomic data comes in the form of paired-end reads: two reads that represent genetic material with a small gap between them. We present a new algorithm for aligning both reads in a pair simultaneously by fuzzily intersecting the sets of candidate alignment locations for each read. This algorithm is often much faster, and it produces alignments whose variant calls have roughly the same concordance as those from the best competing aligners.
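The idea of fuzzily intersecting two sets of candidate locations can be sketched as follows. This is only an illustration of the general concept, not the authors' algorithm: it assumes candidates are plain integer positions on a single reference sequence, ignores strand and orientation, and treats two candidates as compatible when they lie within a hypothetical `max_gap` of each other. The function name `fuzzy_intersect` is likewise an assumption.

```python
from bisect import bisect_left, bisect_right


def fuzzy_intersect(cands1, cands2, max_gap):
    """Return (p1, p2) pairs with |p1 - p2| <= max_gap.

    cands1 and cands2 are candidate alignment positions for the two
    reads of a pair. Sorting cands2 once lets each p1 locate its
    compatible partners by binary search instead of a full scan.
    """
    sorted2 = sorted(cands2)
    pairs = []
    for p1 in sorted(cands1):
        lo = bisect_left(sorted2, p1 - max_gap)
        hi = bisect_right(sorted2, p1 + max_gap)
        pairs.extend((p1, p2) for p2 in sorted2[lo:hi])
    return pairs


# Hypothetical candidate positions for read 1 and read 2 of one pair.
read1_hits = [100, 5000, 9000]
read2_hits = [350, 5200, 20000]
print(fuzzy_intersect(read1_hits, read2_hits, max_gap=500))
# prints [(100, 350), (5000, 5200)]
```

Only the surviving pairs would then need full scoring, which is one plausible way that pairing candidates early can reduce alignment work.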

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261374
Oscar L. Rodriguez ◽  
Andrew J. Sharp ◽  
Corey T. Watson

Lymphoblastoid cell lines (LCLs) have been critical to establishing genetic resources for biomedical science. They have been used extensively to study human genetic diversity, genome function, and inform the development of tools and methodologies for augmenting disease genetics research. While the validity of variant callsets from LCLs has been demonstrated for most of the genome, previous work has shown that DNA extracted from LCLs is modified by V(D)J recombination within the immunoglobulin (IG) loci, regions that harbor antibody genes critical to immune system function. However, the impacts of V(D)J recombination on short-read sequencing data generated from LCLs have not been extensively investigated. In this study, we used LCL-derived short-read sequencing data from the 1000 Genomes Project (n = 2,504) to identify signatures of V(D)J recombination. Our analyses revealed sample-level impacts of V(D)J recombination that varied depending on the degree of inferred monoclonality. We showed that V(D)J-associated somatic deletions impacted genotyping accuracy, leading to adulterated population-level estimates of allele frequency and linkage disequilibrium. These findings illuminate limitations of using LCLs and short-read data for building genetic resources in the IG loci, with implications for interpreting previous disease association studies in these regions.

2021 ◽  
Vol 7 (12) ◽  
Suma Tiruvayipati ◽  
Wen Ying Tang ◽  
Timothy M. S. Barkham ◽  
Swaine L. Chen

Group B Streptococcus (GBS; Streptococcus agalactiae) is the most common cause of neonatal meningitis and a rising cause of sepsis in adults. Recently, it has also been shown to cause foodborne disease. As with many other bacteria, the polysaccharide capsule of GBS is antigenic, enabling its use for strain serotyping. Recent advances in DNA sequencing have made sequence-based typing attractive (as has been implemented for several other bacteria, including Escherichia coli, Klebsiella pneumoniae species complex, Streptococcus pyogenes, and others). For GBS, existing WGS-based serotyping systems do not provide complete coverage of all known GBS serotypes (specifically including subtypes of serotype III), and none are simultaneously compatible with the two most common data types, raw short reads and assembled sequences. Here, we create a serotyping database (GBS-SBG, GBS Serotyping by Genome Sequencing), with associated scripts and running instructions, that can be used to call all currently described GBS serotypes, including subtypes of serotype III, using both direct short-read- and assembly-based typing. We achieved higher concordance using GBS-SBG on a previously reported data set of 790 strains. We further validated GBS-SBG on a new set of 572 strains, achieving 99.8% concordance with PCR-based molecular serotyping using either short-read- or assembly-based typing. The GBS-SBG package is publicly available and will hopefully accelerate and simplify serotyping by sequencing for GBS.

2021 ◽  
Alaina Shumate ◽  
Brandon Wong ◽  
Geo Pertea ◽  
Mihaela Pertea

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are unable to span multiple exons. Long-read technology can capture full-length transcripts, but its high error rate often leads to mis-identified splice sites, and its low throughput makes quantification difficult. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at
