RP-REP Ribosomal Profiling Reports: an open-source cloud-enabled framework for reproducible ribosomal profiling data processing, analysis, and result reporting

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 143
Author(s):  
Travis L. Jensen ◽  
William F. Hooper ◽  
Sami R. Cherikh ◽  
Johannes B. Goll

Ribosomal profiling is an emerging experimental technology to measure protein synthesis by sequencing short mRNA fragments undergoing translation in ribosomes. Applied on a genome-wide scale, this is a powerful tool to profile global protein synthesis within cell populations of interest. Such information can be utilized for biomarker discovery and detection of treatment-responsive genes. However, analysis of ribosomal profiling data requires careful preprocessing to reduce the impact of artifacts and dedicated statistical methods for visualizing and modeling the high-dimensional discrete read count data. Here we present Ribosomal Profiling Reports (RP-REP), a new open-source cloud-enabled software framework that allows users to execute start-to-end gene-level ribosomal profiling and RNA-Seq analysis on a pre-configured Amazon Virtual Machine Image (AMI) hosted on AWS or on the user’s own Ubuntu Linux server. The software works with FASTQ files stored locally, on AWS S3, or at the Sequence Read Archive (SRA). RP-REP automatically executes a series of customizable steps including filtering of contaminant RNA, enrichment of true ribosomal footprints, reference alignment and gene translation quantification, gene body coverage, CRAM compression, reference alignment QC, data normalization, multivariate data visualization, identification of differentially translated genes, and generation of heatmaps, co-translated gene clusters, enriched pathways, and other custom visualizations. RP-REP provides functionality to contrast RNA-Seq and ribosomal profiling results, and calculates translational efficiency per gene. The software outputs a PDF report and publication-ready table and figure files. As a use case, we provide RP-REP results for a dengue virus study that tested cytosol and endoplasmic reticulum cellular fractions of human Huh7 cells pre-infection and at 6 h, 12 h, 24 h, and 40 h post-infection.
Case study results, Ubuntu installation scripts, and the most recent RP-REP source code are accessible at GitHub. The cloud-ready AMI is available at AWS (AMI ID: RPREP RSEQREP (Ribosome Profiling and RNA-Seq Reports) v2.1 (ami-00b92f52d763145d3)).
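
The abstract does not say how RP-REP computes translational efficiency. A common definition is the ratio of a gene's ribosome footprint abundance to its mRNA abundance, usually reported on a log2 scale. The following is a minimal sketch under that assumption; the function name, the counts-per-million normalization, and the pseudocount are illustrative choices and are not taken from the RP-REP source code.

```python
import math


def translational_efficiency(ribo_counts, rna_counts, pseudocount=1.0):
    """Return log2(ribosome-footprint CPM / RNA-Seq CPM) for shared genes.

    ribo_counts and rna_counts map gene IDs to raw read counts for one sample.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    ribo_total = sum(ribo_counts.values())
    rna_total = sum(rna_counts.values())
    if ribo_total <= 0 or rna_total <= 0:
        raise ValueError("both libraries need at least one read")

    te = {}
    for gene in ribo_counts.keys() & rna_counts.keys():
        # Scale each library to counts per million so that differences in
        # sequencing depth do not masquerade as translational changes.
        ribo_cpm = (ribo_counts[gene] + pseudocount) / ribo_total * 1e6
        rna_cpm = (rna_counts[gene] + pseudocount) / rna_total * 1e6
        te[gene] = math.log2(ribo_cpm / rna_cpm)
    return te


if __name__ == "__main__":
    ribo = {"geneA": 300, "geneB": 100}
    rna = {"geneA": 100, "geneB": 300}
    for gene, value in sorted(translational_efficiency(ribo, rna).items()):
        print(gene, round(value, 3))
```

A positive value indicates that a gene carries more ribosome footprints than its mRNA level alone would predict; a negative value indicates the opposite. Real analyses typically model replicates and dispersion rather than relying on a per-sample ratio.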

2017 ◽  
Author(s):  
Luca Venturini ◽  
Shabhonam Caim ◽  
Gemy G Kaithakottil ◽  
Daniel L Mapleson ◽  
David Swarbreck

Abstract The performance of RNA-Seq aligners and assemblers varies greatly across different organisms and experiments, and often the optimal approach is not known beforehand. Here we show that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-Seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics, while solving common artefacts such as erroneous transcript chimerisms. We have implemented this method in an open-source Python3 and Cython program, Mikado, available at https://github.com/lucventurini/Mikado.


2019 ◽  
Author(s):  
Ayman Yousif ◽  
Nizar Drou ◽  
Jillian Rowe ◽  
Mohammed Khalfan ◽  
Kristin C Gunsalus

Abstract Background As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource). Results NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. Open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology. Conclusions NASQAR empowers non-programming experts with a versatile and intuitive toolbox to easily and efficiently explore, analyze, and visualize their transcriptomics data interactively. Popular tools for a variety of applications are currently available, including Transcriptome Data Preprocessing, RNA-seq Analysis (including Single-cell RNA-seq), Metagenomics, and Gene Enrichment.


2017 ◽  
Author(s):  
Luke Zappia ◽  
Belinda Phipson ◽  
Alicia Oshlack

Abstract As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records of the growth of the field over time. Author summary In recent years single-cell RNA-sequencing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities, it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets, but the field is moving at such a rapid pace that it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website (www.scRNA-tools.org).
Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focused on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.


2017 ◽  
Vol 3 ◽  
pp. e121 ◽  
Author(s):  
Bahar Sateli ◽  
Felicitas Löffler ◽  
Birgitta König-Ries ◽  
René Witte

Motivation Scientists increasingly rely on intelligent information systems to help them in their daily tasks, in particular for managing research objects, like publications or datasets. The relatively young research field of Semantic Publishing has been addressing the question of how scientific applications can be improved through semantically rich representations of research objects, in order to facilitate their discovery and re-use. To complement the efforts in this area, we propose an automatic workflow to construct semantic user profiles of scholars, so that scholarly applications, like digital libraries or data repositories, can better understand their users’ interests, tasks, and competences, by incorporating these user profiles in their design. To make the user profiles sharable across applications, we propose to build them based on standard semantic web technologies, in particular the Resource Description Framework (RDF) for representing user profiles and Linked Open Data (LOD) sources for representing competence topics. To avoid the cold start problem, we suggest automatically populating these profiles by analyzing the publications (co-)authored by users, which we hypothesize reflect their research competences. Results We developed a novel approach, ScholarLens, which can automatically generate semantic user profiles for authors of scholarly literature. For modeling the competences of scholarly users and groups, we surveyed a number of existing linked open data vocabularies. In accordance with the LOD best practices, we propose an RDF Schema (RDFS) based model for competence records that reuses existing vocabularies where appropriate. To automate the creation of semantic user profiles, we developed a complete, automated workflow that can generate semantic user profiles by analyzing full-text research articles through various natural language processing (NLP) techniques. In our method, we start by processing a set of research articles for a given user.
Competences are derived by text mining the articles, including syntactic, semantic, and LOD entity linking steps. We then populate a knowledge base in RDF format with user profiles containing the extracted competences. We implemented our approach as an open source library and evaluated our system through two user studies, resulting in mean average precision (MAP) of up to 95%. As part of the evaluation, we also analyze the impact of semantic zoning of research articles on the accuracy of the resulting profiles. Finally, we demonstrate how these semantic user profiles can be applied in a number of use cases, including article ranking for personalized search and finding scientists competent in a topic, e.g., to find reviewers for a paper. Availability All software and datasets presented in this paper are available under open source licenses in the supplements and documented at http://www.semanticsoftware.info/semantic-user-profiling-peerj-2016-supplements. Additionally, development releases of ScholarLens are available on our GitHub page: https://github.com/SemanticSoftwareLab/ScholarLens.
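
For readers unfamiliar with the metric quoted above, mean average precision can be sketched as follows. This is a generic illustration rather than the ScholarLens evaluation code: it assumes binary relevance judgements for each ranked list and normalises by the number of relevant items that appear in that list, which is one of several conventions in use. The function names are hypothetical.

```python
def average_precision(relevance):
    """Average precision for one ranked list of binary relevance flags.

    relevance[i] is truthy when the item at rank i + 1 was judged relevant.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-list average precision over all ranked lists."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical users, each with a ranked list of judged competences.
    judgements = [[1, 0, 1], [0, 1]]
    print(round(mean_average_precision(judgements), 4))
```

Because precision is accumulated only at ranks holding a relevant item, lists that place relevant competences near the top score higher than lists with the same items further down.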


Trials ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Jackie Bonilla ◽  
Alia Alhomsi ◽  
Jasmine Santoyo-Olsson ◽  
Anita L. Stewart ◽  
Carmen Ortiz ◽  
...  

Abstract Background An often heard and justifiable concern of ethnic minorities is related to researchers’ lack of attention to sharing the results of a study with participants after the study has concluded. Few studies have examined the effects of returning overall study results on participants’ attitudes, especially among populations underrepresented in research. Among Latina research participants, providing a summary of study results could enhance participation in research. We assess Latina breast cancer survivors’ reactions to receiving study results and their attitudes about participating in future studies. Methods For this cross-sectional survey study, all women who had participated in two behavioral randomized controlled trials (RCTs) were mailed a letter summarizing the study results (using written and graphic formats) and a questionnaire assessing problems and understanding the results, importance of sharing results, willingness to participate in future studies, and format preferences for receiving the results. A postage-paid envelope for returning the completed questionnaire was included. Logistic regression examined the associations of age, education, and rural/urban residence on format preferences and willingness to participate. The survey sample consisted of 304 low-income, predominantly Spanish-speaking Latina breast cancer survivors (151 from urban and 153 from rural communities) who had participated in two RCTs testing a stress management program designed for Latina breast cancer survivors. Results Ninety-two women returned the questionnaires (30.3%). Most of the women (91.1%) indicated that they had no trouble understanding the results of the study, and 97% agreed that it is very/extremely important for researchers to share the study result with the participants. The majority (60.2%) reported that receiving the results increased their willingness to participate in future studies. 
About half (51.7%) did not have a format preference, 37.4% preferred written summaries, and 10.9% preferred graphs. Conclusions This study is an important first step to understanding the impact of returning study results among a population that is underrepresented in research. Returning the results of studies and understanding the impact of doing so is consistent with maintaining community involvement in all phases of research. The findings suggest that sharing aggregate research results in simple language yields few problems in participants’ understanding of the results and is viewed as important by participants. Trial registration ClinicalTrials.gov NCT02931552 Date registered: October 13, 2016 and NCT01383174 Date registered: June 28, 2011.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2162 ◽  
Author(s):  
Travis L. Jensen ◽  
Michael Frasketi ◽  
Kevin Conway ◽  
Leigh Villarroel ◽  
Heather Hill ◽  
...  

RNA-Seq is increasingly being used to measure human RNA expression on a genome-wide scale. Expression profiles can be interrogated to identify and functionally characterize treatment-responsive genes. Ultimately, such controlled studies promise to reveal insights into molecular mechanisms of treatment effects, identify biomarkers, and realize personalized medicine. RNA-Seq Reports (RSEQREP) is a new open-source cloud-enabled framework that allows users to execute start-to-end gene-level RNA-Seq analysis on a preconfigured RSEQREP Amazon Virtual Machine Image (AMI) hosted by AWS or on their own Ubuntu Linux machine. The framework works with unstranded, stranded, and paired-end sequence FASTQ files stored locally, on Amazon Simple Storage Service (S3), or at the Sequence Read Archive (SRA). RSEQREP automatically executes a series of customizable steps including reference alignment, CRAM compression, reference alignment QC, data normalization, multivariate data visualization, identification of differentially expressed genes, heatmaps, co-expressed gene clusters, enriched pathways, and a series of custom visualizations. The framework outputs a file collection that includes a dynamically generated PDF report using R, knitr, and LaTeX, as well as publication-ready table and figure files. A user-friendly configuration file handles sample metadata entry, processing, analysis, and reporting options. The configuration supports time series RNA-Seq experimental designs with at least one pre- and one post-treatment sample for each subject, as well as multiple treatment groups and specimen types. All RSEQREP analysis components are built using open-source R code and R/Bioconductor packages, allowing for further customization. As a use case, we provide RSEQREP results for a trivalent influenza vaccine (TIV) RNA-Seq study that collected 1 pre-TIV and 10 post-TIV vaccination samples (days 1-10) for 5 subjects and two specimen types (peripheral blood mononuclear cells and B-cells).


F1000Research ◽  
2018 ◽  
Vol 6 ◽  
pp. 2162 ◽  
Author(s):  
Travis L. Jensen ◽  
Michael Frasketi ◽  
Kevin Conway ◽  
Leigh Villarroel ◽  
Heather Hill ◽  
...  

RNA-Seq is increasingly being used to measure human RNA expression on a genome-wide scale. Expression profiles can be interrogated to identify and functionally characterize treatment-responsive genes. Ultimately, such controlled studies promise to reveal insights into molecular mechanisms of treatment effects, identify biomarkers, and realize personalized medicine. RNA-Seq Reports (RSEQREP) is a new open-source cloud-enabled framework that allows users to execute start-to-end gene-level RNA-Seq analysis on a preconfigured RSEQREP Amazon Virtual Machine Image (AMI) hosted by AWS or on their own Ubuntu Linux machine via a Docker container or installation script. The framework works with unstranded, stranded, and paired-end sequence FASTQ files stored locally, on Amazon Simple Storage Service (S3), or at the Sequence Read Archive (SRA). RSEQREP automatically executes a series of customizable steps including reference alignment, CRAM compression, reference alignment QC, data normalization, multivariate data visualization, identification of differentially expressed genes, heatmaps, co-expressed gene clusters, enriched pathways, and a series of custom visualizations. The framework outputs a file collection that includes a dynamically generated PDF report using R, knitr, and LaTeX, as well as publication-ready table and figure files. A user-friendly configuration file handles sample metadata entry, processing, analysis, and reporting options. The configuration supports time series RNA-Seq experimental designs with at least one pre- and one post-treatment sample for each subject, as well as multiple treatment groups and specimen types. All RSEQREP analysis components are built using open-source R code and R/Bioconductor packages, allowing for further customization.
As a use case, we provide RSEQREP results for a trivalent influenza vaccine (TIV) RNA-Seq study that collected 1 pre-TIV and 10 post-TIV vaccination samples (days 1-10) for 5 subjects and two specimen types (peripheral blood mononuclear cells and B-cells).


2019 ◽  
Author(s):  
Dylan Sheerin ◽  
Daniel O’Connor ◽  
Andrew J Pollard ◽  
Irina Mohorianu

Abstract Motivation Inconsistent, analytical noise introduced either by the sequencing technology or by the choice of read-processing tools can bias bulk RNA-seq analyses by shifting the focus to the variation in expression of low-abundance transcripts; as a consequence, these highly variable genes are often included in the differential expression (DE) call and impact the interpretation of results. Results To illustrate the effects of “noise”, we present simulated datasets following closely the characteristics of an H. sapiens and an M. musculus dataset, respectively, highlighting the extent of technical noise in both a high inter-individual variability (H. sapiens) and reduced variability (M. musculus) setup. The sequencing-induced noise is assessed using correlations of distributions of expression across transcripts; analytical noise is evaluated through side-by-side comparisons of several standard choices. The proportion of genes in the noise-range differs for each tool combination. Data-driven, sample-specific noise-thresholds were applied to reduce the impact of low-level variation. Noise-adjustment reduced the number of significantly DE genes and gave rise to convergent calls across tool combinations. Availability The code for determining the sequence-derived noise is available for download from: https://github.com/yry/noiseAnalysis/tree/master/noiseDetection_mRNA; the code for running the analysis is available for download from: https://github.com/sheerind/noise_detection.


2020 ◽  
Author(s):  
Davide Risso ◽  
Stefano M. Pagnotta

Abstract Motivation Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformations on the outcome of unsupervised clustering procedures is still unclear. Results Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications. Availability The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.
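
As background on the term in the method's name, winsorization replaces extreme values with a chosen boundary value instead of discarding them, and it is asymmetric when the lower and upper boundaries are set differently. The sketch below illustrates only that general idea on a single sample. It is not the AWST transformation, whose details the abstract does not give; the nearest-rank quantile rule, the default cut-offs, and the function names are assumptions of this example.

```python
import math


def nearest_rank(sorted_values, q):
    """Nearest-rank quantile of an ascending list, for 0 < q <= 1."""
    index = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[index]


def winsorize_asymmetric(values, lower_q=0.0, upper_q=0.95):
    """Clip one sample's values to [lower quantile, upper quantile].

    With lower_q = 0 the lower tail is left untouched, so only unusually
    large values are pulled in; this is what makes the clipping asymmetric.
    """
    if not values:
        return []
    ordered = sorted(values)
    low = ordered[0] if lower_q <= 0 else nearest_rank(ordered, lower_q)
    high = nearest_rank(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


if __name__ == "__main__":
    expression = [5, 1, 50, 3]  # hypothetical expression values, one sample
    print(winsorize_asymmetric(expression, upper_q=0.75))
```

Applying the clipping per sample, rather than per gene across samples, keeps a handful of very highly expressed genes from dominating the distances that drive sample clustering.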


BMJ Open ◽  
2019 ◽  
Vol 9 (6) ◽  
pp. e027443 ◽  
Author(s):  
Marie Gérardin ◽  
Morgane Rousselet ◽  
Pascal Caillet ◽  
Marie Grall-Bronnec ◽  
Pierre Loué ◽  
...  

Introduction In recent years, data collected by the French Addictovigilance Network have shown the potential for abuse and addiction associated with zolpidem (the most sold hypnotic drug in France). Since 10 April 2017, new regulations have come into force that require zolpidem to be prescribed on special secure prescription pads, in order to reduce the risk of abuse or misuse. This measure has far-reaching repercussions that are not limited to the consumption of zolpidem but also extend to the usage of sedative medication as a whole. The objective of the ZOlpidem and the Reinforcement of the Regulation of prescription Orders (ZORRO) study is to evaluate the overall impact of the new regulatory framework requiring zolpidem to be prescribed on special secure prescription pads. Three axes will be evaluated: the number of consumers, the type of consumption (chronic use versus occasional use, problematic consumption versus non-problematic use) and the consumption of other sedative molecules. The study has been registered in the Protocol Registration and Results System under the number NCT03584542 at stage "Pre-results". Methods and analysis The ZORRO study is an epidemiological, observational, national multicentre, non-controlled, prospective research project supported by the French National Agency for Medicines and Health Products Safety. The evaluation of the impact of the regulatory framework change relative to zolpidem will be done according to two axes: via an epidemiological study of the French National Health Insurance database and by the implementation of field studies of prescribers and consumers of zolpidem. Ethics and dissemination The Nantes Research Ethics Committee (Groupe Nantais d’Ethique dans le Domaine de la Santé), the Committee for the Protection of the Population and the Committee of Expertise in Research, Studies and Evaluations in the Field of Health approved this study.
Results will be presented in national and international conferences and submitted to peer-reviewed journals. Trial registration number NCT03584542; Pre-results.

