SAUTE: sequence assembly using target enrichment

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alexandre Souvorov ◽  
Richa Agarwala

Abstract. Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well-supported variants for assembled regions. Results: To facilitate assembly of repeat regions and to report multiple well-supported variants when a user can provide target sequences to assist the assembly, we propose the SAUTE and SAUTE_PROT assemblers. Both assemblers use a de Bruijn graph built on reads. Targets can be transcripts or proteins for RNA-seq reads, and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide sequences for SAUTE and protein sequences for SAUTE_PROT. Conclusions: For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision.
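The abstract above says both assemblers build a de Bruijn graph on reads and use target sequences to guide assembly. The following is a minimal sketch of that general idea only, not SAUTE's actual algorithm: the function names `build_de_bruijn` and `extend_from_seed`, the toy reads, and the choice of a seed taken from a hypothetical target are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_from_seed(graph, seed, max_steps=1000):
    """Extend a seed node along unambiguous edges.

    Extension stops at a branch (more than one successor), a dead end,
    or a node that was already visited (a cycle).
    """
    contig = seed
    node = seed
    visited = {seed}
    for _ in range(max_steps):
        successors = graph.get(node, set())
        if len(successors) != 1:
            break
        node = next(iter(successors))
        if node in visited:
            break
        visited.add(node)
        contig += node[-1]
    return contig


# Toy overlapping reads (hypothetical data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)

# Hypothetical seed: a (k-1)-mer shared with a user-provided target sequence.
contig = extend_from_seed(graph, "ATG")
print(contig)
```

In this sketch a branch simply ends the extension. Repeats and variants are exactly where such branches appear, which is why the abstract describes them as the hard part of assembly.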

Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 568
Author(s):  
Laura Vilanova ◽  
Claudio A. Valero-Jiménez ◽  
Jan A.L. van Kan

Brown rot is the most economically important fungal disease of stone fruits and is primarily caused by Monilinia laxa and Monilinia fructicola. Both species co-occur in European orchards, although M. fructicola is considered to cause the most severe yield losses in stone fruit. This study aimed to generate a high-quality genome of M. fructicola and to exploit it to identify genes that may contribute to pathogen virulence. PacBio sequencing technology was used to assemble the genome of M. fructicola. Manual structural curation of gene models, supported by RNA-Seq, and functional annotation of the proteome yielded 10,086 trustworthy gene models. The genome was examined for the presence of genes that encode secreted proteins and, more specifically, effector proteins. A set of 134 putative effectors was defined. Several effector genes were cloned into Agrobacterium tumefaciens for transient expression in Nicotiana benthamiana plants, and some of them triggered necrotic lesions. Studying effectors and their biological properties will help to better understand the interaction between M. fructicola and its stone fruit host plants.


2018 ◽  
Vol 93 (1) ◽  
Author(s):  
Katherine L. James ◽  
Thushan I. de Silva ◽  
Katherine Brown ◽  
Hilton Whittle ◽  
Stephen Taylor ◽  
...  

Abstract. Accurate determination of the genetic diversity present in the HIV quasispecies is critical for the development of a preventative vaccine: in particular, little is known about viral genetic diversity for the second type of HIV, HIV-2. A better understanding of HIV-2 biology is relevant to the HIV vaccine field because a substantial proportion of infected people experience long-term viral control, and prior HIV-2 infection has been associated with slower HIV-1 disease progression in coinfected subjects. The majority of traditional and next-generation sequencing methods have relied on target amplification prior to sequencing, introducing biases that may obscure the true signals of diversity in the viral population. Additionally, target enrichment through PCR requires a priori sequence knowledge, which is lacking for HIV-2. Therefore, a target-enrichment-free method of library preparation would be valuable for the field. We applied an RNA shotgun sequencing (RNA-Seq) method without PCR amplification to cultured viral stocks and patient plasma samples from HIV-2-infected individuals. Libraries generated from total plasma RNA were analyzed with a two-step pipeline: (i) de novo genome assembly, followed by (ii) read remapping. By this approach, whole-genome sequences were generated with a 28× to 67× mean depth of coverage. Assembled reads showed a low level of GC bias, and comparison of the genome diversities at the intrahost level showed low diversity in the accessory gene vpx in all patients. Our study demonstrates that RNA-Seq is a feasible full-genome de novo sequencing method for blood plasma samples collected from HIV-2-infected individuals.
Importance. An accurate picture of viral genetic diversity is critical for the development of a globally effective HIV vaccine. However, sequencing strategies are often complicated by target enrichment prior to sequencing, introducing biases that can distort variant frequencies, which are not easily corrected for in downstream analyses. Additionally, detailed a priori sequence knowledge is needed to inform robust primer design when employing PCR amplification, a factor that is often lacking when working with tropical diseases localized in developing countries. Previous work has demonstrated that direct RNA shotgun sequencing (RNA-Seq) can be used to circumvent these issues for hepatitis C virus (HCV) and norovirus. We applied RNA-Seq to total RNA extracted from HIV-2 blood plasma samples, demonstrating the applicability of this technique to HIV-2 and allowing us to generate a dynamic picture of genetic diversity over the whole genome of HIV-2 in the context of low-bias sequencing.


2019 ◽  
Author(s):  
Ru-pin Alicia Chi ◽  
Tianyuan Wang ◽  
Nyssa Adams ◽  
San-pin Wu ◽  
Steven L. Young ◽  
...  

Abstract. Context: Poor uterine receptivity is one major factor leading to pregnancy loss and infertility. Understanding the molecular events governing successful implantation is hence critical in combating infertility. Objective: To define PGR-regulated molecular mechanisms and epithelial roles in receptivity. Design: RNA-seq and PGR-ChIP-seq were conducted in parallel to identify PGR-regulated pathways during the WOI in endometrium of fertile women. Setting: Endometrial biopsies from the proliferative and mid-secretory phases were analyzed. Patients or Other Participants: Participants were fertile, reproductive-aged (18-37) women with normal cycle length and without any history of dysmenorrhea, infertility, or irregular cycles. In total, 42 endometrial biopsies obtained from 42 women were analyzed in this study. Interventions: There were no interventions during this study. Main Outcome Measures: Here we measured the alterations in gene expression and PGR occupancy in the genome during the WOI, based on the hypothesis that PGR binds uterine chromatin cycle-dependently to regulate genes involved in uterine cell differentiation and function. Results: 653 genes were identified with regulated PGR binding and differential expression during the WOI. These were involved in regulating inflammatory response, xenobiotic metabolism, EMT, cell death, interleukin/STAT signaling, estrogen response, and MTORC1 response. Transcriptome of the epithelium identified 3,052 DEGs, of which 658 were uniquely regulated. Transcription factors IRF8 and MEF2C were found to be regulated in the epithelium during the WOI at the protein level, suggesting potentially important functions that were previously unrecognized. Conclusion: PGR binds the genomic regions of genes regulating critical processes in uterine receptivity and function. Précis: Using a combination of RNA-seq and PGR ChIP-seq, novel signaling pathways and epithelial regulators were identified in the endometrium of fertile women during the window of implantation.


Author(s):  
Paul L. Auer ◽  
Rebecca W Doerge

RNA sequencing technology is providing data of unprecedented throughput, resolution, and accuracy. Although there are many different computational tools for processing these data, there are a limited number of statistical methods for analyzing them, and even fewer that acknowledge the unique nature of individual gene transcription. We introduce a simple and powerful statistical approach, based on a two-stage Poisson model, for modeling RNA sequencing data and testing for biologically important changes in gene expression. The advantages of this approach are demonstrated through simulations and real data applications.


2018 ◽  
Author(s):  
Elena Bushmanova ◽  
Dmitry Antipov ◽  
Alla Lapidus ◽  
Andrey D. Prjibelski

Abstract. Summary: The possibility of generating large RNA-seq datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to model organisms with finished and annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing and paralogous genes. In this paper we describe a novel transcriptome assembler called rnaSPAdes, which is developed on top of the SPAdes genome assembler and explores surprising computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-Seq datasets, and briefly highlight strong and weak points of different assemblers. Availability and implementation: rnaSPAdes is implemented in C++ and Python and is freely available at cab.spbu.ru/software/rnaspades/.


Author(s):  
Camille Marchet ◽  
Zamin Iqbal ◽  
Daniel Gautheret ◽  
Mikael Salson ◽  
Rayan Chikhi

Abstract. Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. Results: We used REINDEER to index the abundances of sequences within 2,585 human RNA-seq experiments in 45 hours using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of 4 billion distinct k-mers across 2,585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph (DBG) of each dataset, then conceptually merges those DBGs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances. Availability: https://github.com/kamimrcht/[email protected]
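To make concrete what "recording abundances across a collection of datasets" means, here is a small sketch that stores, for each k-mer, one count per dataset in a plain dictionary. This is an assumption-laden illustration, not REINDEER's data structure: it uses no compacted de Bruijn graph and no monotigs, and the names `count_kmers`, `build_index`, and `query` are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer occurring in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def build_index(datasets, k):
    """Return {kmer: [count in dataset 0, count in dataset 1, ...]}."""
    index = {}
    for d, sequences in enumerate(datasets):
        for kmer, count in count_kmers(sequences, k).items():
            index.setdefault(kmer, [0] * len(datasets))[d] = count
    return index


def query(index, sequence, k, n_datasets):
    """For each dataset, list the abundance of each k-mer of the query."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    absent = [0] * n_datasets
    return [[index.get(kmer, absent)[d] for kmer in kmers]
            for d in range(n_datasets)]


# Two toy "datasets" (hypothetical reads).
datasets = [["ACGTA", "ACGTT"], ["CGTAC"]]
index = build_index(datasets, k=3)

# One row per dataset, one column per k-mer of the query (ACG, CGT).
profile = query(index, "ACGT", k=3, n_datasets=2)
print(profile)
```

A dictionary like this stores one count vector per k-mer, which would not scale to billions of k-mers. Grouping k-mers of similar abundance, as the abstract describes for monotigs, is one way to avoid storing a separate vector for each.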


2016 ◽  
Author(s):  
Francisco Avila Cobos ◽  
Jasper Anckaert ◽  
Pieter-Jan Volders ◽  
Dries Rombaut ◽  
Jo Vandesompele ◽  
...  

Abstract. Summary: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. The Zipper plot is an application that enables users to interrogate putative transcription start sites (TSSs) in relation to various features that are indicative of transcriptional activity. These features are obtained from publicly available datasets including CAGE-sequencing (CAGE-seq), ChIP-sequencing (ChIP-seq) for histone marks and DNase-sequencing (DNase-seq). The Zipper plot application requires three input fields (chromosome, genomic coordinate (hg19) of the TSS and strand) and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot. Availability and Implementation: The Zipper plot is implemented using the statistical programming language R and is freely available at http://[email protected]; [email protected]; [email protected] Supplementary information: Supplementary Methods available online.


2021 ◽  
Author(s):  
Lin Di ◽  
Bo Liu ◽  
Yuzhu Lyu ◽  
Shihui Zhao ◽  
Yuhong Pang ◽  
...  

Many single-cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most of them still struggle to accurately quantify low-abundance transcripts. Based on our previous finding that Tn5 transposase can directly cut-and-tag DNA/RNA hetero-duplexes, we present SHERRY2, an optimized protocol for sequencing transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage in comparison with prevalent scRNA-seq methods. With a throughput of a few thousand cells per batch, SHERRY2 can reveal the subtle transcriptomic differences between cells and facilitate important biological discoveries.


2019 ◽  
Author(s):  
Camille Marchet ◽  
Yoann Dufresne ◽  
Antoine Limasset

Abstract. Next generation sequencing produces large volumes of short sequences with broad applications. The noise due to sequencing errors led to the development of several correction methods. The main correction paradigm expects a high (from 30-40X) and uniform coverage in order to correctly infer from the reads a reference set of subsequences that is then used for correction. In practice, the most accurate methods use k-mer spectrum techniques to obtain a set of reference k-mers. However, when correcting NGS datasets that present an uneven coverage, such as RNA-seq data, this paradigm tends to mistake rare variants for errors. It may therefore discard or alter them using highly covered sequences, which leads to information loss and may introduce bias. In this paper we present two new contributions in order to cope with this situation. First, we show that starting from non-uniform sequencing coverages, a De Bruijn graph can be cleaned of most errors while preserving biological variability. Second, we demonstrate that reads can be efficiently corrected via local alignment on the cleaned De Bruijn graph paths. We implemented the described method in a tool dubbed BCT and evaluated its results on RNA-seq and metagenomic data. We show that the graph cleaning strategy combined with the mapping strategy saves more rare k-mers, resulting in a more conservative correction than previous methods. BCT is also better able to take advantage of the signal of high-depth datasets. We suggest that BCT, being scalable to large metagenomic datasets as well as correcting shallow single-cell RNA-seq data, can be a general corrector for non-uniform data. Availability: BCT is open source and available at github.com/Malfoy/BCT under the Affero GPL License.
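The problem described in the abstract above, a k-mer spectrum threshold mistaking rare but genuine sequences for errors under uneven coverage, can be illustrated with a short sketch. This shows only the failure mode of a fixed abundance cutoff and is not BCT's graph-cleaning method; the function names, the toy reads, and the threshold of 3 are assumptions chosen for illustration.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, threshold):
    """Keep k-mers seen at least `threshold` times as the reference set."""
    return {kmer for kmer, count in counts.items() if count >= threshold}


# Uneven coverage, as in RNA-seq: one sequence is sampled 10 times,
# another genuine sequence only twice (hypothetical data, no errors).
highly_covered = ["ACGTACGA"] * 10
lowly_covered = ["TTGCAAGC"] * 2
reads = highly_covered + lowly_covered

counts = kmer_spectrum(reads, k=5)
solid = solid_kmers(counts, threshold=3)

# Genuine k-mers from the low-coverage sequence that the cutoff rejects.
lost = {kmer for kmer in kmer_spectrum(lowly_covered, 5) if kmer not in solid}
print(sorted(lost))
```

Every k-mer of the low-coverage sequence falls below the cutoff even though none contains an error. A corrector that trusts only the solid set would therefore discard or rewrite them.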


2019 ◽  
Author(s):  
Berkley E. Gryder ◽  
Marco Wachtel ◽  
Kenneth Chang ◽  
Osama El Demerdash ◽  
Nicholas G. Aboreden ◽  
...  

Abstract. Core regulatory transcription factors (CR TFs) define cell identity and lineage through an exquisitely precise and logical order during embryogenesis and development. These CR TFs regulate one another in three-dimensional space via distal enhancers that serve as logic gates embedded in their TF recognition sequences. Aberrant chromatin organization resulting in miswired circuitry of enhancer logic is a newly recognized feature in many cancers. Here, we report that PAX3-FOXO1 expression is driven by a translocated FOXO1 distal super enhancer (SE). Using 4C-seq, a technique detecting all genomic regions that interact with the translocated FOXO1 SE, we demonstrate its physical interaction with the PAX3 promoter only in the presence of the oncogenic translocation. Furthermore, RNA-seq and ChIP-seq in tumors bearing rare PAX translocations indicate that enhancer miswiring is a pervasive feature across all FP-RMS tumors. HiChIP of enhancer mark H3K27ac showed extended connectivity between the distal FOXO1 SE and additional intra-domain enhancers and the PAX3 promoter. We show by CRISPR-paired-ChIP-Rx that PAX3-FOXO1 transcription is diminished when this network of enhancers is selectively ablated. Therefore, our data reveal a mechanism of a translocated hijacked enhancer which disrupts the normal CR TF logic during skeletal muscle development (PAX3 to MYOD to MYOG), replacing it with an infinite loop logic that locks rhabdomyosarcoma cells in an undifferentiated proliferating stage.

