A benchmarking of pipelines for detecting ncRNAs from RNA-Seq data

2019 ◽  
Vol 21 (6) ◽  
pp. 1987-1998 ◽  
Author(s):  
Sebastiano Di Bella ◽  
Alessandro La Ferlita ◽  
Giovanni Carapezza ◽  
Salvatore Alaimo ◽  
Antonella Isacchi ◽  
...  

Next-Generation Sequencing (NGS) is a high-throughput technology widely applied to genome sequencing and transcriptome profiling. RNA-Seq uses NGS to reveal RNA identities and quantities in a given sample. However, it produces a huge amount of raw data that need to be preprocessed with fast and effective computational methods. RNA-Seq can profile different populations of RNAs, including ncRNAs. Indeed, in the last few years, several pipelines have been developed for ncRNA analysis from RNA-Seq experiments. In this paper, we analyze eight recent pipelines (iSmaRT, iSRAP, miARma-Seq, Oasis 2, SPORTS1.0, sRNAnalyzer, sRNApipe, sRNA workbench) that allow the analysis not only of a single specific class of ncRNAs but also of more than one ncRNA class. Our systematic performance evaluation aims to guide users in selecting the appropriate pipeline for processing each ncRNA class, focusing on three key points: (i) accuracy in ncRNA identification, (ii) accuracy in read count estimation and (iii) deployment and ease of use.
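
As an illustration of point (i), identification accuracy can be scored by comparing the ncRNAs a pipeline reports against a known reference set. The sketch below is a minimal example under assumptions: the abstract does not state which metrics the authors used, so precision, recall and F1 are chosen here only as common options, and the function name and the ncRNA identifiers are hypothetical.

```python
def identification_scores(detected, truth):
    """Return (precision, recall, f1) for detected ncRNA IDs versus a reference set.

    Both arguments are iterables of identifiers; duplicates are ignored.
    The choice of metrics is an assumption, not taken from the paper.
    """
    detected, truth = set(detected), set(truth)
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical identifiers, for illustration only.
reference = {"miR-21", "miR-155", "let-7a", "miR-16"}
reported = {"miR-21", "miR-155", "miR-999"}
precision, recall, f1 = identification_scores(reported, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The same function could be applied once per pipeline and per ncRNA class to build a comparison table.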

2014 ◽  
Vol 32 (11) ◽  
pp. 1166-1166 ◽  
Author(s):  
Sheng Li ◽  
Scott W Tighe ◽  
Charles M Nicolet ◽  
Deborah Grove ◽  
Shawn Levy ◽  
...  

2010 ◽  
Vol 2010 ◽  
pp. 1-19 ◽  
Author(s):  
Valerio Costa ◽  
Claudia Angelini ◽  
Italia De Feis ◽  
Alfredo Ciccodicola

In recent years, the introduction of massively parallel sequencing platforms for Next Generation Sequencing (NGS) protocols, able to simultaneously sequence hundreds of thousands of DNA fragments, has dramatically changed the landscape of genetic studies. RNA-Seq for transcriptome studies, ChIP-Seq for DNA-protein interactions, and CNV-Seq for large genome nucleotide variations are only some of the intriguing new applications supported by these innovative platforms. Among them, RNA-Seq is perhaps the most complex NGS application. Expression levels of specific genes, differential splicing, and allele-specific expression of transcripts can be accurately determined by RNA-Seq experiments to address many biological issues. All these attributes are not readily achievable with the previously widespread hybridization-based or tag sequence-based approaches. However, the unprecedented level of sensitivity and the large amount of data produced by NGS platforms bring clear advantages as well as new challenges and issues. While this technology offers great power to make new biological observations and discoveries, it also requires a considerable effort in the development of new bioinformatics tools to deal with these massive data files. The paper aims to give a survey of the RNA-Seq methodology, focusing particularly on the challenges that this application presents from both a biological and a bioinformatics point of view.


2019 ◽  
Author(s):  
Tim O. Nieuwenhuis ◽  
Stephanie Yang ◽  
Rohan X. Verma ◽  
Vamsee Pillalamarri ◽  
Dan E. Arking ◽  
...  

One of the challenges of next generation sequencing (NGS) is read contamination. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, to understand the factors that contribute to contamination. We obtained GTEx datasets and technical metadata, along with validating RNA-Seq data from other studies. Of 48 analyzed tissues in GTEx, 26 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicated contamination. Sample contamination by non-native genes was associated with a sample being sequenced on the same day as a tissue that natively expressed those genes. This was highly significant for pancreas and esophagus genes (linear model, p=9.5e-237 and p=5e-260 respectively). Nine SNPs in four genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes, validating the contamination. Low-level contamination affected 4,497 (39.6%) samples (defined as ≥10 PRSS1 TPM). It also led to eQTL assignments in inappropriate tissues among these 18 genes. We note this type of contamination occurs widely, impacting bulk and single cell data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets, impacting analyses. Awareness of this process is necessary to avoid assigning inaccurate importance to low-level gene expression in inappropriate tissues and cells.
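
The low-level contamination criterion quoted above (a PRSS1 expression of at least 10 TPM) can be sketched as a simple filter. This is a minimal illustration under assumptions, not the authors' pipeline: the record layout, field names, sample values, and the restriction to non-pancreas samples are all hypothetical choices made for the example.

```python
PRSS1_TPM_THRESHOLD = 10.0  # threshold quoted in the abstract


def flag_low_level_contamination(samples, threshold=PRSS1_TPM_THRESHOLD,
                                 native_tissue="pancreas"):
    """Return IDs of samples whose PRSS1 TPM meets the threshold outside the native tissue.

    `samples` is a list of dicts with hypothetical keys "id", "tissue" and
    "prss1_tpm". Excluding the native tissue is an assumption of this sketch.
    """
    return [
        s["id"]
        for s in samples
        if s["tissue"] != native_tissue and s["prss1_tpm"] >= threshold
    ]


# Made-up records, for illustration only.
example_samples = [
    {"id": "S1", "tissue": "pancreas", "prss1_tpm": 52000.0},
    {"id": "S2", "tissue": "lung", "prss1_tpm": 14.2},
    {"id": "S3", "tissue": "lung", "prss1_tpm": 0.3},
    {"id": "S4", "tissue": "esophagus", "prss1_tpm": 10.0},
]
flagged = flag_low_level_contamination(example_samples)
print(flagged)
print(f"{len(flagged) / len(example_samples):.1%} of samples flagged")
```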


Author(s):  
Naiyar Iqbal ◽  
Pradeep Kumar

Disease classification based on biological data is an important area in bioinformatics and biomedical research. It helps doctors and medical practitioners detect disease early and supports them as a computer-aided diagnostic tool for accurate diagnosis, prognosis, and treatment. Previously, microarray gene expression data were widely applied to disease classification, but next-generation sequencing (NGS) has now replaced microarray technology. Over the last few years, RNA sequencing (RNA-Seq) data have been widely used for transcriptomic analysis. Hence, RNA-Seq based classification of disease is still in its infancy. In this article, we present a general framework for disease classification built on RNA-Seq data. This framework will guide researchers in processing RNA-Seq data, extracting relevant features, and applying an appropriate classifier to classify any kind of disease.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2851 ◽  
Author(s):  
Panu Artimo ◽  
Séverine Duvaud ◽  
Mikhail Pachkov ◽  
Vassilios Ioannidis ◽  
Erik van Nimwegen ◽  
...  

ISMARA (ismara.unibas.ch) automatically infers the key regulators and regulatory interactions from high-throughput gene expression or chromatin state data. However, given the large sizes of current next generation sequencing (NGS) datasets, data uploading times are a major bottleneck. Additionally, for proprietary data, users may be uncomfortable with uploading entire raw datasets to an external server. Both these problems could be alleviated by providing a means by which users could pre-process their raw data locally, transferring only a small summary file to the ISMARA server. We developed a stand-alone client application that pre-processes large input files (RNA-seq or ChIP-seq data) on the user's computer for performing ISMARA analysis in a completely automated manner, including uploading of small processed summary files to the ISMARA server. This reduces file sizes by up to a factor of 1000, and upload times from many hours to mere seconds. The client application is available from ismara.unibas.ch/ISMARA/client.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 888
Author(s):  
Elizabeth Baskin ◽  
Peter DeFord ◽  
Allison F. Dennis ◽  
Ian Misner ◽  
Frederick J. Tan ◽  
...  

The rapid rise of high-throughput, data-intensive experimental techniques has thrust many biologists into the role of data analyst – a role many biologists feel ill equipped to fill. Novices often struggle to find the resources and expertise they need to analyze their experimental results in a wet-lab environment. To fill this need, we developed an educational resource as part of a National Center for Biotechnology Information (NCBI) hackathon. Using RNA-seq as a model, our tutorial guides new users through the steps of data analysis, while placing an emphasis on understanding the motivation behind choices made in the process. To advance the goal of providing a deeper understanding of the analysis process, we developed a new tool, bamDiff. bamDiff lets users compare the performance of multiple RNA-seq aligners and select the most appropriate aligner for the data in question and the experimental end-goal. Our tutorial is accessible via a GitHub wiki, with associated data and software provided on an Amazon Machine Image (AMI), and can be completed at no cost to the user through the Amazon Educate Program. Following the hackathon, our tutorial was integrated into the October 2015 offering of NCBI NOW (Next Generation Sequencing (NGS) Online Workshop), a free online experience targeting individuals new to NGS analysis.


2014 ◽  
Vol 32 (9) ◽  
pp. 915-925 ◽  
Author(s):  
Sheng Li ◽  
Scott W Tighe ◽  
Charles M Nicolet ◽  
Deborah Grove ◽  
Shawn Levy ◽  
...  

2019 ◽  
Author(s):  
Elisha Krieg ◽  
Krishna Gupta ◽  
Andreas Dahl ◽  
Mathias Lesche ◽  
Susanne Boye ◽  
...  

Selective isolation of DNA is crucial for applications in biology, bionanotechnology, clinical diagnostics and forensics. We herein report a smart methanol-responsive polymer (MeRPy) that can be programmed to bind and separate single- as well as double-stranded DNA targets. Captured targets are quickly isolated and released back into solution by denaturation (sequence-agnostic) or toehold-mediated strand displacement (sequence-selective). The latter mode allows 99.8% efficient removal of unwanted sequences and 79% recovery of highly pure target sequences. We applied MeRPy for the depletion of insulin, glucagon, and transthyretin cDNA from clinical next-generation sequencing (NGS) libraries. This step improved data quality for low-abundance transcripts in expression profiles of pancreatic tissues. Its low cost, scalability, high stability and ease of use make MeRPy suitable for diverse applications in research and clinical laboratories, including enhancement of NGS libraries, extraction of DNA from biological samples, preparative-scale DNA isolations, and sorting of DNA-labeled non-nucleic acid targets.


2021 ◽  
Vol 12 ◽  
Author(s):  
Samuel Daniel Lup ◽  
David Wilson-Sánchez ◽  
Sergio Andreu-Sánchez ◽  
José Luis Micol

Mapping-by-sequencing strategies combine next-generation sequencing (NGS) with classical linkage analysis, allowing rapid identification of the causal mutations of the phenotypes exhibited by mutants isolated in a genetic screen. Computer programs are available that analyze NGS data obtained from a mapping population of individuals derived from a mutant of interest to identify a causal mutation; however, the installation and usage of such programs require bioinformatic skills, modifying or combining pieces of existing software, or purchasing licenses. To ease this process, we developed Easymap, an open-source program that simplifies the data analysis workflows from raw NGS reads to candidate mutations. Easymap can perform bulked segregant mapping of point mutations induced by ethyl methanesulfonate (EMS) with DNA-seq or RNA-seq datasets, as well as tagged-sequence mapping for large insertions, such as transposons or T-DNAs. The mapping analyses implemented in Easymap have been validated with experimental and simulated datasets from different plant and animal model species. Easymap was designed to be accessible to all users regardless of their bioinformatics skills by implementing a user-friendly graphical interface, a simple universal installation script, and detailed mapping reports, including informative images and complementary data for assessment of the mapping results. Easymap is available at http://genetics.edu.umh.es/resources/easymap; its Quickstart Installation Guide details the recommended procedure for installation.


2020 ◽  
Vol 21 (19) ◽  
pp. 7069 ◽  
Author(s):  
Elisa Regalbuto ◽  
Anna Anselmo ◽  
Stefania De Sanctis ◽  
Valeria Franchini ◽  
Florigio Lista ◽  
...  

The increasing exposure to radiofrequency electromagnetic fields (RF-EMF), especially from wireless communication devices, raises questions about their possible adverse health effects. So far, several in vitro studies evaluating RF-EMF genotoxic and cytotoxic non-thermal effects have reported contradictory results that could be mainly due to inadequate experimental design and lack of well-characterized exposure systems and conditions. Moreover, a topic poorly investigated is related to signal modulation induced by electromagnetic fields. The aim of this study was to perform an analysis of the potential non-thermal biological effects induced by 2.45 GHz exposures through a characterized exposure system and a multimethodological approach. Human fibroblasts were exposed to continuous (CW) and pulsed (PW) signals for 2 h in a wire patch cell-based exposure system at the specific absorption rate (SAR) of 0.7 W/kg. The evaluation of the potential biological effects was carried out through a multimethodological approach, including classical biological markers (genotoxic, cell cycle, and ultrastructural) and the evaluation of gene expression profile through the powerful high-throughput next generation sequencing (NGS) RNA sequencing (RNA-seq) approach. Our results suggest that 2.45 GHz radiofrequency fields did not induce significant biological effects at a cellular or molecular level for the evaluated exposure parameters and conditions.

