Strawberry: fast and accurate genome-guided transcript reconstruction and quantification from RNA-seq

Mapping Intimacies ◽

10.1101/043802 ◽

2016 ◽

Author(s):

Ruolin Liu ◽

Julie Dickerson

Keyword(s):

Gene Annotation ◽

Ground Truth ◽

Data Representation ◽

Data Availability ◽

Rna Seq ◽

Transcript Reconstruction ◽

Novel Method ◽

Fast Flow ◽

Reduced Data ◽

Transcript Assembly

We propose a novel method and computational tool, Strawberry, for transcript reconstruction and quantification from paired-end RNA-seq data under the guidance of genome alignment and independent of gene annotation. Strawberry achieves this by separating assembly and quantification into two sequential steps. The application of a fast flow network algorithm for assembly speeds up the construction of a parsimonious set of transcripts. The resulting reduced data representation improves the efficiency of expression-level quantification. Strawberry combines speed and accuracy in transcript assembly and quantification: processing 10 million simulated reads (after alignment) requires only 90 seconds using a single thread while achieving over 92% correlation with the ground truth, making it the state-of-the-art method. Strawberry outperforms Cufflinks and StringTie, the two other leading methods, in many aspects, including the number of correctly assembled transcripts and the correlation with the ground truth of simulated RNA-seq data. Availability: Strawberry is written in C++11, and is available as open source software at https://github.com/ruolin/Strawberry under the GPLv3 license.
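
The "correlation with the ground truth" reported above can be illustrated with a small sketch. It assumes Pearson correlation over per-transcript abundances; the abstract does not say which correlation measure was used, and the transcript IDs and values below are hypothetical, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical simulated ground truth and estimated abundances.
truth = {"tx1": 10.0, "tx2": 20.0, "tx3": 30.0}
estimate = {"tx1": 12.0, "tx2": 19.0, "tx3": 33.0}

# Transcripts missing from the estimate count as zero abundance.
ids = sorted(truth)
r = pearson([truth[t] for t in ids], [estimate.get(t, 0.0) for t in ids])
print(f"correlation with ground truth: {r:.3f}")
```

Treating unassembled transcripts as zero abundance is one possible convention; a benchmark could instead restrict the comparison to transcripts present in both sets.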

scRNAss: a single-cell RNA-seq assembler via imputing dropouts and combing junctions

Bioinformatics ◽

10.1093/bioinformatics/btz240 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4264-4271

Author(s):

Juntao Liu ◽

Xiangyu Liu ◽

Xianwen Ren ◽

Guojun Li

Keyword(s):

Single Cell ◽

Open Source Software ◽

State Of The Art ◽

Supplementary Information ◽

Rna Seq ◽

Transcript Reconstruction ◽

Full Length Transcript ◽

Free Open Source ◽

Transcript Assembly ◽

Novel Isoforms

Motivation: Full-length transcript reconstruction is essential for single-cell RNA-seq data analysis, but dropout events, which can cause transcripts to be discarded completely or broken into pieces, pose great challenges for transcript assembly. Currently available RNA-seq assemblers are generally designed for bulk RNA sequencing. To fill the gap, we introduce scRNAss (single-cell RNA-seq assembler), a method that applies explicit strategies to impute information lost to dropout events and a combing strategy to infer transcripts from scRNA-seq data. Results: Extensive evaluations on both simulated and biological datasets demonstrated its superiority over state-of-the-art RNA-seq assemblers including StringTie, Cufflinks and CLASS2. In particular, it showed a remarkable capability of recovering unknown ‘novel’ isoforms and high computational efficiency compared to other tools. Availability and implementation: scRNAss is free, open-source software available from https://sourceforge.net/projects/single-cell-rna-seq-assembly/files/. Supplementary information: Supplementary data are available at Bioinformatics online.

Precise Transcript Reconstruction with End-Guided Assembly

10.1101/2022.01.12.476004 ◽

2022 ◽

Author(s):

Michael A Schon ◽

Stefan Lutzmayer ◽

Falko Hofmann ◽

Michael D Nodine

Keyword(s):

Single Cells ◽

Embryonic Stem ◽

Full Length ◽

Rna Seq ◽

Genomics Research ◽

Long Read ◽

Transcript Reconstruction ◽

Guided Assembly ◽

Automated Methods ◽

Transcript Assembly

Accurate annotation of transcript isoforms is crucial for functional genomics research, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data are imprecise. We developed a generalized transcript assembly framework called Bookend that incorporates data from multiple modes of RNA-seq, with a focus on identifying, labeling, and deconvoluting RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correctly modeling transcript start and end sites is essential for precise transcript assembly. Furthermore, we discover that reads from full-length single-cell RNA-seq (scRNA-seq) methods are sparsely end-labeled, and that these ends are sufficient to dramatically improve precision of assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq in the model plant Arabidopsis and meta-assembly of single mouse embryonic stem cells (mESCs) are both capable of producing tissue-specific end-to-end transcript annotations of comparable or superior quality to existing reference isoforms.

COVID-19 infection map generation and detection from chest X-ray images

Health Information Science and Systems ◽

10.1007/s13755-021-00146-8 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Aysen Degerli ◽

Mete Ahishali ◽

Mehmet Yamac ◽

Serkan Kiranyaz ◽

Muhammad E. H. Chowdhury ◽

...

Keyword(s):

State Of The Art ◽

Ground Truth ◽

Clinical Use ◽

X Ray ◽

Learning Techniques ◽

Map Generation ◽

Severity Grading ◽

Chest X Ray ◽

Novel Method ◽

Aided Diagnosis

Computer-aided diagnosis has become a necessity for accurate and immediate coronavirus disease 2019 (COVID-19) detection to aid treatment and prevent the spread of the virus. Numerous studies have proposed using Deep Learning techniques for COVID-19 diagnosis. However, they have used very limited chest X-ray (CXR) image repositories for evaluation, with only a few hundred COVID-19 samples. Moreover, these methods can neither localize nor grade the severity of COVID-19 infection. For this purpose, recent studies proposed to explore the activation maps of deep networks. However, these maps remain inaccurate for localizing the actual infestation, making them unreliable for clinical use. This study proposes a novel method for the joint localization, severity grading, and detection of COVID-19 from CXR images by generating the so-called infection maps. To accomplish this, we have compiled the largest dataset with 119,316 CXR images including 2951 COVID-19 samples, where the annotation of the ground-truth segmentation masks is performed on CXRs by a novel collaborative human–machine approach. Furthermore, we publicly release the first CXR dataset with the ground-truth segmentation masks of the COVID-19 infected regions. A detailed set of experiments shows that state-of-the-art segmentation networks can learn to localize COVID-19 infection with an F1-score of 83.20%, which is significantly superior to the activation maps created by the previous methods. Finally, the proposed approach achieved a COVID-19 detection performance with 94.96% sensitivity and 99.88% specificity.
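
For readers unfamiliar with the metrics quoted above, the following minimal sketch shows how sensitivity, specificity, and F1-score follow from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper; in the study, the F1-score refers to segmentation and the sensitivity and specificity to detection, whereas this sketch applies all three to a single set of counts for brevity.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, f1) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)       # true positive rate (recall)
    specificity = tn / (tn + fp)       # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    return sensitivity, specificity, f1


# Hypothetical counts: 100 positive and 1000 negative samples.
sens, spec, f1 = detection_metrics(tp=90, fp=50, tn=950, fn=10)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} F1={f1:.2%}")
```

With strongly imbalanced classes, as in the hypothetical counts here, specificity can stay high while F1 drops, which is why the metrics are usually reported together.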

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽

10.1186/s40246-021-00336-1 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Zeeshan Ahmed ◽

Eduard Gibert Renart ◽

Saman Zeeshan ◽

XinQi Dong

Keyword(s):

Data Analysis ◽

Patient Care ◽

Expression Analysis ◽

High Throughput ◽

Gene Annotation ◽

Next Generation Sequencing Data ◽

Rna Seq ◽

Sequencing Data ◽

Complex Disorders ◽

Transcriptomics Data

Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing genes and highly and lowly expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

RcTGA1 and glucosinolate biosynthesis pathway involvement in the defence of rose against the necrotrophic fungus Botrytis cinerea

BMC Plant Biology ◽

10.1186/s12870-021-02973-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Penghua Gao ◽

Hao Zhang ◽

Huijun Yan ◽

Qigang Wang ◽

Bo Yan ◽

...

Keyword(s):

Disease Resistance ◽

Gene Annotation ◽

Infected Plant ◽

Grey Mould ◽

Cascade Reactions ◽

Postharvest Quality ◽

Rna Seq ◽

Protein Activity ◽

Phenylpropanoid Biosynthesis ◽

And Cluster Analysis

Background: Rose is an important economic crop in horticulture. However, its field growth and postharvest quality are negatively affected by grey mould disease caused by Botrytis cinerea, and it is unclear how rose plants defend themselves against this fungal pathogen. Here, we used transcriptomic, metabolomic and VIGS analyses to explore the mechanism of resistance to B. cinerea. Results: In this study, a protein activity analysis revealed a significant increase in defence enzyme activities in infected plants. RNA-Seq of plants infected for 0 h, 36 h, 60 h and 72 h produced a total of 54 GB of clean reads. Among these reads, 3990, 5995 and 8683 differentially expressed genes (DEGs) were found in CK vs. T36, CK vs. T60 and CK vs. T72, respectively. Gene annotation and cluster analysis of the DEGs revealed a variety of defence responses to B. cinerea infection, including resistance (R) proteins, MAPK cascade reactions, plant hormone signal transduction pathways, plant-pathogen interaction pathways, Ca2+ and disease resistance-related genes. qPCR verification showed the reliability of the transcriptome data. The PTRV2-RcTGA1-infected plant material showed increased susceptibility of rose to B. cinerea. A total of 635 metabolites were detected in all samples, which could be divided into 29 groups. Metabonomic data showed that a total of 59, 78 and 74 DEMs were obtained for T36, T60 and T72 (rose flowers at 36 h, 60 h and 72 h after inoculation with B. cinerea, respectively) compared to CK. A variety of secondary metabolites are related to biological disease resistance, including tannins, amino acids and derivatives, and alkaloids, among others; they were significantly increased and enriched in phenylpropanoid biosynthesis, glucosinolates and other disease resistance pathways. This study provides a theoretical basis for breeding new cultivars that are resistant to B. cinerea. Conclusion: Fifty-four GB of clean reads were generated through RNA-Seq. R proteins, ROS signalling, Ca2+ signalling, MAPK signalling, and SA signalling were activated in the Old Blush response to B. cinerea. RcTGA1 positively regulates rose resistance to B. cinerea. A total of 635 metabolites were detected in all samples. DEMs were enriched in phenylpropanoid biosynthesis, glucosinolates and other disease resistance pathways.

A mutation in LacDWARF1 results in a GA-deficient dwarf phenotype in sponge gourd (Luffa acutangula)

Theoretical and Applied Genetics ◽

10.1007/s00122-021-03938-4 ◽

2021 ◽

Author(s):

Gangjun Zhao ◽

Caixia Luo ◽

Jianning Luo ◽

Junxing Li ◽

Hao Gong ◽

...

Keyword(s):

Gene Annotation ◽

Recessive Gene ◽

Genomic Region ◽

Dwarf Mutant ◽

Rna Seq ◽

Dwarf Phenotype ◽

Sponge Gourd ◽

Response To Stress ◽

Luffa Acutangula ◽

Generation Sequencing

Key message: A dwarfism gene LacDWARF1 was mapped by combined BSA-Seq and comparative genomics analyses to a 65.4 kb physical genomic region on chromosome 05. Abstract: Dwarf architecture is one of the most important traits utilized in Cucurbitaceae breeding because it saves labor and increases the harvest index. To our knowledge, there has been no prior research about dwarfism in the sponge gourd. This study reports the first dwarf mutant, WJ209, which shows a decrease in cell size and internodes. A genetic analysis revealed that the mutant phenotype was controlled by a single recessive gene, which is designated Lacdwarf1 (Lacd1). By combining bulked segregant analysis with next-generation sequencing, we quickly mapped a 65.4 kb region on chromosome 5 using an F2 segregating population with InDel and SNP polymorphism markers. Gene annotation revealed that Lac05g019500 encodes a gibberellin 3β-hydroxylase (GA3ox) and is the most likely candidate gene for Lacd1. DNA sequence analysis showed that there is an approximately 4 kb insertion in the first intron of Lac05g019500 in WJ209. Lac05g019500 is transcribed incorrectly in the dwarf mutant owing to the presence of the insertion. Moreover, the bioactive GAs decreased significantly in WJ209, and the dwarf phenotype could be restored by exogenous GA3 treatment, indicating that WJ209 is a GA-deficient mutant. All these results support the conclusion that Lac05g019500 is the Lacd1 gene. In addition, RNA-Seq revealed that many genes, including those related to plant hormones, cellular process, cell wall, membrane and response to stress, were significantly altered in WJ209 compared with the wild type. This study will aid in the use of molecular marker-assisted breeding in the dwarf sponge gourd.

Motion estimation in vehicular environments based on Bayesian dynamic networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219255 ◽

2021 ◽

pp. 1-12

Author(s):

Lauro Reyes-Cocoletzi ◽

Ivan Olmos-Pineda ◽

J. Arturo Olvera-Lopez

Keyword(s):

Real World ◽

Dynamic Networks ◽

Ground Truth ◽

Change Of Direction ◽

Prediction Rate ◽

Different Types ◽

Novel Method ◽

Comparison Of The Results ◽

Multiple Obstacles ◽

Real Traffic

The cornerstone of autonomous ground driving with the lowest possible risk of collision in real traffic environments is the estimation of obstacle movement. Predicting trajectories of multiple obstacles in dynamic traffic scenarios is a major challenge, especially when different types of obstacles such as vehicles and pedestrians are involved. To address these issues, this work proposes a novel method based on Bayesian dynamic networks to infer the paths of objects of interest (IO). Environmental information is obtained through stereo video, the direction vectors of multiple obstacles are computed, and the trajectories with the highest probability of occurrence and the possibility of collision are highlighted. The proposed approach was evaluated using test environments considering different road layouts and multiple obstacles in real-world traffic scenarios. The results obtained are compared against the ground truth of the paths taken by each detected IO. According to experimental results, the proposed method obtains a prediction rate of 75% for the change of direction, taking into consideration the risk of collision. The importance of the proposal is that, in contrast with related work, it does not disregard the risk of collision.

Maximizing prediction of orphan genes in assembled genomes

10.1101/2019.12.17.880294 ◽

2019 ◽

Cited By ~ 2

Author(s):

Arun Seetharam ◽

Urminder Singh ◽

Jing Li ◽

Priyanka Bhandary ◽

Zeb Arendsee ◽

...

Keyword(s):

Sequence Homology ◽

Evolutionary History ◽

Direct Evidence ◽

Gene Annotation ◽

Rna Seq ◽

Orphan Genes ◽

Orphan Gene ◽

New Genes ◽

Conserved Genes ◽

Rapid Emergence

The evolutionarily rapid emergence of new genes gives rise to “orphan genes” that share no sequence homology to genes in closely related genomes. These genes provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Gene annotation pipelines that combine ab initio machine-learning with sequence homology-based searches are efficient in identifying basal genes with a long evolutionary history. However, their ability to identify orphan genes and other young genes has not been systematically evaluated. Here, we classify the phylostrata of curated Arabidopsis thaliana genes and use these to assess the ability of two of the most prevalent annotation pipelines, MAKER and BRAKER, to predict orphans and other young genes. MAKER predictions are highly dependent on the RNA-Seq evidence, predicting between 11% and 60% of the orphan genes and 95% to 98% of the basal genes in the annotated genome of Arabidopsis. In contrast, BRAKER consistently predicts 33% of orphan genes and 98% of basal genes. A less commonly used method to identify genes is to align RNA-Seq data directly to the genome sequence. We present a Findable, Accessible, Interoperable and Reusable (FAIR) approach, called BIND, that mitigates the under-prediction of orphan genes. BIND combines BRAKER predictions with direct evidence-based inference of transcripts based on RNA-Seq alignments to the genome. BIND increases the number and accuracy of orphan gene predictions, identifying 68% of Araport11-annotated orphan genes and 99% of the conserved genes.
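
The percentages above are the fraction of annotated genes that a prediction set recovers. The sketch below illustrates that calculation and why merging two prediction sources can raise it. It is only an illustration under simplifying assumptions: genes are matched by identifier, the merge is a plain set union, and all gene IDs are hypothetical. It does not reproduce how BIND actually merges or evaluates gene models.

```python
def recovered_fraction(predicted, annotated):
    """Fraction of annotated gene IDs that also appear in the predicted set."""
    if not annotated:
        raise ValueError("annotated set must not be empty")
    return len(predicted & annotated) / len(annotated)


# Hypothetical gene identifiers.
annotated_orphans = {"g1", "g2", "g3", "g4", "g5"}
ab_initio_predictions = {"g1", "g2", "x1"}      # stand-in for a BRAKER-like run
direct_inference = {"g2", "g3", "g4"}           # stand-in for RNA-Seq-based inference

combined = ab_initio_predictions | direct_inference

print(recovered_fraction(ab_initio_predictions, annotated_orphans))
print(recovered_fraction(combined, annotated_orphans))
```

A real comparison would match predicted and annotated gene models by genomic coordinates and structure rather than by shared identifiers.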

TrancriptomeReconstructoR, A Data-Driven Annotation of Complex Transcriptomes

10.21203/rs.3.rs-131404/v1 ◽

2020 ◽

Author(s):

Maxim Ivanov ◽

Albin Sandelin ◽

Sebastian Marquardt

Keyword(s):

De Novo ◽

Gene Annotation ◽

R Package ◽

Sequence Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Model ◽

Preparation Methods ◽

Downstream Analysis

Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.

Practical example for use of the supervised vicarious calibration (SVC) method on multisource hyperspectral imagery data – ValCalHyp airborne hyperspectral campaign under the EUFAR framework

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-7-51-2014 ◽

2014 ◽

Vol XL-7 ◽

pp. 51-53

Author(s):

A. Brook ◽

E. Ben Dor

Keyword(s):

Atmospheric Correction ◽

Ground Truth ◽

Critical Issue ◽

Radiometric Calibration ◽

Vicarious Calibration ◽

Flight Direction ◽

Novel Approach ◽

Novel Method ◽

General Goal ◽

Cross Calibration

A novel approach for radiometric calibration and atmospheric correction of airborne hyperspectral (HRS) data, termed supervised vicarious calibration (SVC), was proposed by Brook and Ben-Dor in 2010. The present study aimed to validate this SVC approach by using several different airborne HRS sensors that simultaneously acquired HRS data over several selected sites. The general goal of this study was to apply a cross-calibration approach to examine the capability, stability, and validity of the SVC method. This paper reports the results of the multi-sensor campaign that took place over Salon-de-Provence, France, in 2011 as part of the ValCalHyp project. The SVC method enabled the rectification of the radiometric drift of each sensor and improved their performance significantly. The flight direction over the SVC targets was found to be a critical issue for such correction, and recommendations have been set for future utilization of this novel method. The results of the SVC method were examined by comparing ground-truth spectra of several selected validation targets with the image spectra, as well as by comparing the classified water quality images generated from all sensors over selected water bodies.
