Enhancing De Novo Transcriptome Assembly by Incorporating Multiple Overlap Sizes

Background. The emergence of next-generation sequencing platforms has given rise to a new generation of assembly algorithms. Compared with Sanger sequencing data, next-generation sequence data have shorter reads, higher coverage depth, and different error profiles. These features pose new challenges for de novo transcriptome assembly. Methodology. To explore the influence of these features on assembly algorithms, we studied the relationship between read overlap size, coverage depth, and error rate using simulated data. Based on this relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real mouse transcriptome dataset. The simulation and evaluation tools are freely available as open source. Significance. Euler-mix is a straightforward pipeline that focuses on handling the variation in coverage depth of short-read datasets. The experimental results showed that Euler-mix improves the performance of de novo transcriptome assembly.
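The interplay between overlap size, coverage depth, and error rate can be illustrated with two standard back-of-the-envelope formulas. The sketch below is not taken from Euler-mix. It assumes independent, uniform per-base errors, and the function names are hypothetical. It computes the effective k-mer coverage C_k = C (L - k + 1) / L and the fraction (1 - e)^k of k-mers expected to contain no error.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Effective k-mer coverage for reads of a fixed length.

    Each read of length L yields L - k + 1 k-mers, so a base coverage C
    corresponds to a k-mer coverage of C * (L - k + 1) / L.
    """
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and read_length")
    return base_coverage * (read_length - k + 1) / read_length


def error_free_kmer_fraction(error_rate: float, k: int) -> float:
    """Probability that a k-mer has no sequencing error.

    Assumption (not from the source): errors are independent and occur
    at the same rate at every base.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    # Illustrative values only: 100 bp reads, 1% error rate, two depths.
    for depth in (5.0, 50.0):
        for k in (15, 21, 31):
            usable = kmer_coverage(depth, 100, k) * error_free_kmer_fraction(0.01, k)
            print(f"depth={depth:5.1f} k={k:2d} error-free k-mer coverage={usable:6.2f}")
```

Under these assumptions a larger overlap size lowers the usable coverage, which hurts weakly expressed transcripts most, while a smaller overlap size keeps more coverage but resolves fewer repeats. This is one way to see why a single overlap size may not suit a dataset with widely varying coverage depth.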

Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists

Scientific Reports ◽

10.1038/s41598-020-75270-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

D. N. U. Naranpanawa ◽

C. H. W. M. R. B. Chandrasekara ◽

P. C. G. Bandaranayake ◽

A. U. Bandaranayake

Keyword(s):

De Novo ◽

Sequence Data ◽

Transcriptome Assembly ◽

Low Cost ◽

Santalum Album ◽

Sequencing Data ◽

Illumina Hiseq ◽

Tissue Samples ◽

Downstream Analysis ◽

Bioinformatics Workflow

Abstract Recent advances in next-generation sequencing technologies have made it possible to generate large amounts of sequencing data at relatively low cost. This has revolutionized genomics and transcriptomics studies. However, handling such data with the available bioinformatics platforms now poses challenges, both in assembly and in the downstream analysis needed to infer correct biological meaning. Though a handful of commercial software tools exist for some of the procedures, their cost is prohibitive for most research laboratories. While individual open-source or free software tools are available for most bioinformatics applications, those components usually operate standalone and are not combined into a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures that start from raw sequence data too complicated and time-consuming, given the associated learning curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeat (SSR) primer design based solely on tools that are available online for free use. To validate the developed workflow, we used Illumina HiSeq reads from different tissue samples of Santalum album (sandalwood), generated in a previous transcriptomics project. A portion of the designed primers were tested in the lab with relevant samples, and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene-specific SSRs. Beginner biologists and researchers in bioinformatics can easily utilize this workflow for research purposes.
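As a rough illustration of what an SSR detection step does, the following sketch scans a sequence for perfect tandem repeats with a regular expression. It is not the tool used in the workflow above. The function name, the `SSR` record, and the thresholds (motifs of 2 to 6 bases, at least 4 repeats) are assumptions chosen for illustration.

```python
import re
from typing import Iterator, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str    # repeated unit, e.g. "AT"
    repeats: int  # number of consecutive copies of the motif


def find_ssrs(sequence: str, min_motif: int = 2, max_motif: int = 6,
              min_repeats: int = 4) -> Iterator[SSR]:
    """Yield perfect tandem repeats found in a DNA sequence.

    The lazy quantifier makes the regex try the shortest motif first,
    and the backreference requires the motif to repeat back to back.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        # Skip motifs that are repeats of a shorter unit (e.g. "AA").
        # Their true unit is shorter than min_motif, so they are out of scope.
        if motif in (motif + motif)[1:-1]:
            continue
        yield SSR(match.start(), motif, len(match.group(0)) // len(motif))


if __name__ == "__main__":
    for ssr in find_ssrs("ggATATATATccTTAGCAGCAGCAGCtt"):
        print(ssr)
```

A production workflow would also handle ambiguous bases, imperfect and compound repeats, and would keep enough flanking sequence on each side of a repeat for primer design.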

Optimization of de novo transcriptome assembly from next-generation sequencing data

Genome Research ◽

10.1101/gr.103846.109 ◽

2010 ◽

Vol 20 (10) ◽

pp. 1432-1440 ◽

Cited By ~ 259

Author(s):

Y. Surget-Groba ◽

J. I. Montoya-Burgos

Keyword(s):

Next Generation Sequencing ◽

De Novo ◽

Transcriptome Assembly ◽

Next Generation Sequencing Data ◽

De Novo Transcriptome Assembly ◽

Next Generation ◽

Sequencing Data ◽

De Novo Transcriptome ◽

Generation Sequencing

Molecular marker information from de novo assembled transcriptomes of chilli pepper (Capsicum annuum L.) varieties based on next-generation sequencing technology

Plant Genetic Resources ◽

10.1017/s147926211400032x ◽

2014 ◽

Vol 12 (S1) ◽

pp. S83-S86 ◽

Cited By ~ 1

Author(s):

Yul-Kyun Ahn ◽

Swati Tripathi ◽

Young-Il Cho ◽

Jeong-Ho Kim ◽

Hye-Eun Lee ◽

...

Keyword(s):

Molecular Markers ◽

Next Generation Sequencing ◽

De Novo ◽

Transcriptome Assembly ◽

Sequence Variant ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Chilli Pepper ◽

Next Generation Sequencing Technology ◽

Generation Sequencing

Next-generation sequencing is a useful tool for de novo transcriptome assembly, functional annotation of genes and identification of molecular markers. This study was carried out to mine molecular markers from de novo assembled transcriptomes of four chilli pepper varieties: the highly pungent ‘Saengryeg 211’, the non-pungent ‘Saengryeg 213’, and the variably pigmented ‘Mandarin’ and ‘Blackcluster’. Pyrosequencing of the complementary DNA library resulted in 361,671, 274,269, 279,221, and 316,357 raw reads, which were assembled into 23,607, 19,894, 18,340 and 20,357 contigs for the four varieties, respectively. Detailed sequence variant analysis identified numerous potential single-nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) for all the varieties, for which primers were designed. The transcriptome information and SNP/SSR markers generated in this study provide valuable resources for high-density molecular genetic mapping in chilli pepper and for quantitative trait loci analysis related to fruit qualities. These markers for pepper will be highly valuable for marker-assisted breeding and other genetic studies.

De Novo Genome Assembly of Next-Generation Sequencing Data

Compendium of Plant Genomes - The Brassica rapa Genome ◽

10.1007/978-3-662-47901-8_4 ◽

2015 ◽

pp. 41-51

Author(s):

Min Liu ◽

Dongyuan Liu ◽

Hongkun Zheng

Keyword(s):

Next Generation Sequencing ◽

Genome Assembly ◽

De Novo ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Generation Sequencing

Hi-C chromosome conformation capture sequencing of avian genomes using the BGISEQ-500 platform

GigaScience ◽

10.1093/gigascience/giaa087 ◽

2020 ◽

Vol 9 (8) ◽

Author(s):

Marcela Sandoval-Velasco ◽

Juan Antonio Rodríguez ◽

Cynthia Perez Estrada ◽

Guojie Zhang ◽

Erez Lieberman Aiden ◽

...

Keyword(s):

Next Generation Sequencing ◽

High Throughput Sequencing ◽

Data Generation ◽

Next Generation ◽

Sequencing Data ◽

Yield Data ◽

Chromosome Conformation ◽

Sequencing Platform ◽

Sequencing Platforms ◽

Generation Sequencing

Abstract Background Hi-C experiments couple DNA-DNA proximity with next-generation sequencing to yield an unbiased description of genome-wide interactions. Previous methods describing Hi-C experiments have focused on the industry-standard Illumina sequencing. With new next-generation sequencing platforms such as BGISEQ-500 becoming more widely available, protocol adaptations to fit platform-specific requirements are useful to give increased choice to researchers who routinely generate sequencing data. Results We describe an in situ Hi-C protocol adapted to be compatible with the BGISEQ-500 high-throughput sequencing platform. Using zebra finch (Taeniopygia guttata) as a biological sample, we demonstrate how Hi-C libraries can be constructed to generate informative data using the BGISEQ-500 platform, following circularization and DNA nanoball generation. Our protocol is a modification of an Illumina-compatible method, based around blunt-end ligations in library construction, using un-barcoded, distally overhanging double-stranded adapters, followed by amplification using indexed primers. The resulting libraries are ready for circularization and subsequent sequencing on the BGISEQ series of platforms and yield data similar to what can be expected using Illumina-compatible approaches. Conclusions Our straightforward modification to an Illumina-compatible in situ Hi-C protocol enables data generation on the BGISEQ series of platforms, thus expanding the options available for researchers who wish to utilize the powerful Hi-C techniques in their research.

Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies

Current Bioinformatics ◽

10.2174/1574893614666190410155603 ◽

2020 ◽

Vol 15 (1) ◽

pp. 2-16

Author(s):

Yuwen Luo ◽

Xingyu Liao ◽

Fang-Xiang Wu ◽

Jianxin Wang

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Critical Role ◽

High Sensitivity ◽

Biological Properties ◽

Sequencing Data ◽

Sequencing Technologies ◽

Long Reads ◽

Massive Sequencing ◽

Generation Sequencing

Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. As sequencing speed increases and cost decreases, massive amounts of sequencing data continue to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to efficiently perform transcriptome assembly with high sensitivity and accuracy has become a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. Examples from different species are used to illustrate that long reads produced by third-generation sequencing technologies can cover full-length transcripts without assembly. In addition, different transcriptome assemblies using the Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.

NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

GigaScience ◽

10.1093/gigascience/giaa105 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Willem de Koning ◽

Milad Miladi ◽

Saskia Hiltemann ◽

Astrid Heikema ◽

John P Hays ◽

...

Keyword(s):

Genome Assembly ◽

Bioinformatics Analysis ◽

De Novo ◽

Sequence Data ◽

Ease Of Use ◽

Easy Access ◽

Complex Data ◽

Sequencing Data ◽

Long Read ◽

Sequencing Platforms

Abstract Background Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. Results The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. Conclusions A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.

Next generation sequencing allows deeper analysis and understanding of genomes and transcriptomes including aspects to fertility

Reproduction Fertility and Development ◽

10.1071/rd10247 ◽

2011 ◽

Vol 23 (1) ◽

pp. 75 ◽

Cited By ~ 7

Author(s):

Thomas Werner

Keyword(s):

Next Generation Sequencing ◽

Transcriptional Control ◽

Target Genes ◽

De Novo ◽

Alternative Promoters ◽

Next Generation ◽

Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Generation Sequencing

Reproduction and fertility are controlled by specific events naturally linked to oocytes, testes and early embryonal tissues. A significant part of these events involves gene expression, especially transcriptional control and alternative transcription (alternative promoters and alternative splicing). While methods to analyse such events for carefully predetermined target genes are well established, until recently no methodology existed to extend such analyses into a genome-wide de novo discovery process. With the arrival of next generation sequencing (NGS) it has become possible to attempt genome-wide discovery in genomic sequences as well as whole transcriptomes at single-nucleotide resolution. This not only allows identification of the primary changes (e.g. alternative transcripts) but also helps to elucidate the regulatory context that leads to the induction of transcriptional changes. This review discusses the basics of the new technological and scientific concepts arising from NGS, prominent differences from microarray-based approaches and several aspects of its application to reproduction and fertility research. These concepts are then illustrated in an application example of NGS data analysis involving postimplantation endometrium tissue from cows.

Updated results from phase II study of guadecitabine for patients with higher risk myelodysplastic syndromes or chronic myelomonocytic leukemia.

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.7020 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. 7020-7020 ◽

Cited By ~ 4

Author(s):

Guillermo Montalban-Bravo ◽

Prithviraj Bose ◽

Yesid Alvarado ◽

Naval Guastad Daver ◽

Farhad Ravandi ◽

...

Keyword(s):

Clinical Trial ◽

Complete Response ◽

Stopping Rules ◽

Clinical Activity ◽

Current Response ◽

Complex Karyotype ◽

Next Generation ◽

Sequencing Data ◽

Tp53 Mutations ◽

Sequencing Platform

Background: Improving the current response and survival outcomes of patients with higher risk MDS and CMML is fundamental. Guadecitabine is a next generation hypomethylating agent with increased length of exposure compared to decitabine and clinical activity in patients with MDS. Methods: Single arm phase 2 clinical trial of guadecitabine at a dose of 60 mg/m² sc daily for 5 days (days 1-5) every 28 days for patients with newly diagnosed MDS or CMML classified as Intermediate-2 or High risk by IPSS. Primary endpoint is complete response (CR). Responses were evaluated following the revised 2006 International Working Group criteria. Sequencing data was obtained at the time of pre-treatment evaluation by the use of a 28-gene next generation sequencing platform. Study included stopping rules for response and toxicity. Overall survival (OS) was censored at the time of transplant. Results: A total of 53 patients have been enrolled: 50 (94%) are evaluable for toxicity and 44 (83%) for response. Median age is 67 years (49-87). A total of 43 (86%) patients have MDS and 7 (14%) have CMML. A total of 21 (42%) have complex karyotype. Sequencing data was available in 48 (96%) patients, with TP53 mutations being the most frequently detected, in 36% of patients. After a median of 6 treatment cycles (1-20), the ORR is 71% including 32% CR. Median best response occurred by 3 cycles (1-6). Seven (21%) out of 33 evaluable patients achieved a complete cytogenetic response. Ten (20%) subjects proceeded to allogeneic stem cell transplantation. Median follow up was 6.3 months (0-23). Median OS is 14.1 months (CI 13.3-14.9 months) and median EFS is 8.4 months (CI 5.6-11.2 months). Forty-five (90%) patients experienced at least one AE during therapy. Most common grade 1-2 AEs included fatigue (66%), nausea (38%) and dyspnea (26%). Dose reductions due to cytopenias were required in 17 (34%) patients. Early 8-week mortality occurred in 3 (6%) patients. Conclusions: Guadecitabine is well-tolerated and active in patients with higher-risk MDS and CMML even in the presence of adverse biological features such as high frequency of complex karyotype, therapy-related disease and TP53 mutations. Clinical trial information: NCT02131597.
