ORFhunteR: an accurate approach for the automatic identification and annotation of open reading frames in human mRNA molecules

2021 ◽  
Author(s):  
Vasily V. Grinev ◽  
Mikalai M. Yatskou ◽  
Victor V. Skakun ◽  
Maryna K. Chepeleva ◽  
Petr V. Nazarov

Abstract Motivation: Modern methods of whole transcriptome sequencing accurately recover nucleotide sequences of RNA molecules present in cells and allow their quantitative abundances to be determined. The coding potential of such molecules can be estimated using open reading frame (ORF) finding algorithms, implemented in a number of software packages. However, these algorithms show limited accuracy, are intended for single-molecule analysis and do not allow selecting the proper ORF in long mRNAs containing multiple ORF candidates. Results: We developed a computational approach, a corresponding machine learning model and a package dedicated to automatic identification of ORFs in large sets of human mRNA molecules. It is based on vectorization of nucleotide sequences into features, followed by classification using a random forest. The predictive model was validated on sets of human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. The developed methods and pre-trained classification model were implemented in the ORFhunteR computational tool, which automatically identifies true ORFs among large sets of human mRNA molecules. Availability and implementation: The open-source R package ORFhunteR is available from its GitHub repository (https://github.com/rfctbio-bsu/ORFhunteR), from Bioconductor (https://bioconductor.org/packages/devel/bioc/html/ORFhunteR.html) and as a web application (http://orfhunter.bsu.by).
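The two steps named in the abstract, enumerating ORF candidates and vectorizing each one into features, can be sketched as follows. This is a minimal illustration, not ORFhunteR's code: the ORF definition (ATG to the first in-frame stop), the function names and the feature set (length, GC content, codon frequencies) are all assumptions, and the random forest classification step is not shown.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# All 64 codons in a fixed order, so every feature vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def find_orf_candidates(mrna):
    """Return (start, end) for every ATG...stop candidate in the forward frames.

    Coordinates are 0-based and half-open; the stop codon is included.
    Several ATGs may share one stop codon, which yields several candidates.
    """
    seq = mrna.upper().replace("U", "T")
    candidates = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                candidates.append((start, pos + 3))
                break
    return candidates


def vectorize_orf(orf):
    """Turn one ORF sequence into a fixed-length numeric feature vector.

    Hypothetical features: length, GC content, then 64 codon frequencies.
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    counts = Counter(codons)
    gc_content = (orf.count("G") + orf.count("C")) / len(orf)
    codon_freqs = [counts[c] / len(codons) for c in CODONS]
    return [float(len(orf)), gc_content] + codon_freqs


if __name__ == "__main__":
    mrna = "GGATGAAAATGCCCTAGGG"
    for start, end in find_orf_candidates(mrna):
        features = vectorize_orf(mrna[start:end])
        print(start, end, features[:2])
```

Each candidate becomes one row of a feature matrix; a classifier trained on annotated transcripts would then score the rows to pick the most plausible ORF per molecule.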

2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1405.1-1406
Author(s):  
F. Morton ◽  
J. Nijjar ◽  
C. Goodyear ◽  
D. Porter

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) have, individually and collaboratively, produced or recommended diagnostic classification, response and functional status criteria for a range of rheumatic diseases. While a number of resources are available for performing these calculations for individual patients, we are not aware of any tool that easily calculates these values for whole patient cohorts. Objectives: To develop a new software tool that enables data analysts, as well as researchers and clinicians without programming skills, to calculate ACR/EULAR related measures for a number of different rheumatic diseases. Methods: Criteria developed by ACR and/or EULAR and approved for diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis (RA) were identified. Methods were created in the R programming language to calculate these criteria and were incorporated into an R package. Additionally, an R/Shiny web application was developed so that the calculations can be performed via a web browser using data supplied as CSV or Microsoft Excel files. Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures such as the ACR/EULAR (2010) RA classification criteria can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing "raw" data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information.
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared
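To make the "precalculated components" route concrete, the sketch below scores the 2010 ACR/EULAR RA classification criteria from the four components listed in the abstract. It is written in Python for illustration and does not reproduce acreular's R functions; the function and argument names are hypothetical. The point values and the threshold of 6 out of 10 follow the published 2010 criteria, and the 42-day cutoff is simply 6 weeks expressed in days.

```python
# Points for the serology component (RF / anti-CCP classification).
SEROLOGY_POINTS = {"negative": 0, "low": 2, "high": 3}


def joint_points(small_joints, large_joints):
    """Joint involvement component (0 to 5 points)."""
    if small_joints + large_joints > 10 and small_joints >= 1:
        return 5  # more than 10 joints, at least one of them small
    if small_joints >= 4:
        return 3  # 4-10 small joints
    if small_joints >= 1:
        return 2  # 1-3 small joints
    if large_joints >= 2:
        return 1  # 2-10 large joints
    return 0      # at most one large joint


def ra_classification_score(small_joints, large_joints, serology,
                            abnormal_apr, duration_days):
    """Total score out of 10 from precalculated component values."""
    score = joint_points(small_joints, large_joints)
    score += SEROLOGY_POINTS[serology]
    score += 1 if abnormal_apr else 0          # abnormal CRP or ESR
    score += 1 if duration_days >= 42 else 0   # symptoms for 6 weeks or more
    return score


def is_definite_ra(score):
    """A score of 6 or more classifies a patient as having definite RA."""
    return score >= 6


if __name__ == "__main__":
    # Hypothetical cohort rows: (small, large, serology, abnormal APR, days).
    cohort = [(4, 0, "high", True, 60), (0, 1, "negative", False, 10)]
    for row in cohort:
        score = ra_classification_score(*row)
        print(row, score, is_definite_ra(score))
```

Applying the function row by row, as in the loop above, is the cohort-level calculation the abstract describes; deriving the components from raw laboratory values and dates would need additional, assay-specific thresholds that are not shown here.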


2019 ◽  
Vol 4 ◽  
pp. 113 ◽  
Author(s):  
Venexia M Walker ◽  
Neil M Davies ◽  
Gibran Hemani ◽  
Jie Zheng ◽  
Philip C Haycock ◽  
...  

Mendelian randomization (MR) estimates the causal effect of exposures on outcomes by exploiting genetic variation to address confounding and reverse causation. This method has a broad range of applications, including investigating risk factors and appraising potential targets for intervention. MR-Base has become established as a freely accessible, online platform, which combines a database of complete genome-wide association study results with an interface for performing Mendelian randomization and sensitivity analyses. This allows the user to explore millions of potentially causal associations. MR-Base is available as a web application or as an R package. The technical aspects of the tool have previously been documented in the literature. The present article is complementary to this as it focuses on the applied aspects. Specifically, we describe how MR-Base can be used in several ways, including to perform novel causal analyses, replicate results and enable transparency, amongst others. We also present three use cases, which demonstrate important applications of Mendelian randomization and highlight the benefits of using MR-Base for these types of analyses.
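As a rough illustration of how an MR estimate is formed from summary statistics, the sketch below computes the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. It is a generic textbook calculation in Python, not the MR-Base interface or its R package; the function names and the toy numbers are made up for the example.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across variants.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    Returns (estimate, standard_error).
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, (1.0 / denominator) ** 0.5


if __name__ == "__main__":
    # Toy summary statistics for two variants (illustrative values only).
    bx = [0.1, 0.2]
    by = [0.05, 0.10]
    se = [0.01, 0.02]
    print([wald_ratio(x, y) for x, y in zip(bx, by)])
    print(ivw_estimate(bx, by, se))
```

Sensitivity analyses of the kind the abstract mentions compare such an estimate against methods that relax the assumption that every variant is a valid instrument.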


2020 ◽  
Author(s):  
Kumari Sonal Choudhary ◽  
Eoin Fahy ◽  
Kevin Coakley ◽  
Manish Sud ◽  
Mano R Maurya ◽  
...  

Abstract: With the advent of high-throughput mass spectrometric methods, metabolomics has emerged as an essential area of research in biomedicine with the potential to provide deep biological insights into normal and diseased functions in physiology. However, to achieve the potential offered by metabolomics measures, there is a need for biologist-friendly integrative analysis tools that can transform data into mechanisms that relate to phenotypes. Here, we describe MetENP, an R package, and a user-friendly web application deployed at the Metabolomics Workbench site, which extend metabolomics enrichment analysis to include species-specific pathway analysis, pathway enrichment scores, gene-enzyme information, and enzymatic activities of the significantly altered metabolites. MetENP provides a highly customizable workflow through various user-specified options and includes support for all metabolite species with available KEGG pathways. MetENPweb is a web application for performing metabolite and pathway enrichment analysis. Availability and Implementation: The MetENP package is freely available from the Metabolomics Workbench GitHub (https://github.com/metabolomicsworkbench/MetENP); the web application is freely available at https://www.metabolomicsworkbench.org/data/analyze.php.
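The abstract does not say how MetENP computes its pathway enrichment scores, so the following is only a generic sketch of one common approach: an over-representation test based on the hypergeometric distribution. The function name and arguments are hypothetical and the counts in the example are invented.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_significant, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background  : metabolites measured in total
    n_pathway     : measured metabolites that belong to the pathway
    n_significant : metabolites called significantly altered
    n_overlap     : significant metabolites that belong to the pathway
    """
    upper = min(n_pathway, n_significant)
    total = comb(n_background, n_significant)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Invented counts: 200 measured, 12 in the pathway, 30 significant, 6 shared.
    print(enrichment_p_value(200, 12, 30, 6))
```

A small p-value indicates that the pathway contains more altered metabolites than expected by chance; with many pathways tested, a multiple-testing correction would normally follow.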


2019 ◽  
Vol 32 (9) ◽  
pp. 1067-1076 ◽  
Author(s):  
Javier F. Tabima ◽  
Niklaus J. Grünwald

Effectors are small, secreted proteins that facilitate infection of host plants by all major groups of plant pathogens. Effector protein identification in oomycetes relies on identification of open reading frames with certain amino acid motifs, among additional minor criteria. To date, identification of effectors has relied on custom scripts to identify motifs in candidate open reading frames. Here, we developed the R package effectR, which provides a convenient tool for rapid prediction of effectors in oomycete genomes, or with custom scripts for any genome, in a reproducible way. The effectR package relies on a combination of regular expression statements and hidden Markov model approaches to predict candidate RxLR and crinkler effectors. Other custom motifs for novel effectors can easily be implemented and added to package updates. The effectR package has been validated with published oomycete genomes. This package provides a convenient tool for wet lab researchers interested in reproducible identification of candidate effectors in oomycete genomes.
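The regular expression half of this approach can be illustrated with a short motif search. The sketch below is in Python and is not effectR's code: the pattern, the function name and the residue window of 30 to 60 for the motif position are assumptions chosen for the example, and the hidden Markov model step is not shown.

```python
import re

# Lookahead so that overlapping R-x-L-R occurrences are all inspected.
RXLR_PATTERN = re.compile(r"(?=(R[A-Z]LR))")


def find_rxlr_candidates(proteins, window=(30, 60)):
    """Return {name: (position, motif)} for sequences with an RxLR motif.

    `proteins` maps sequence names to amino acid strings. Positions are
    1-based; only the first motif starting inside `window` is reported.
    """
    hits = {}
    for name, seq in proteins.items():
        for match in RXLR_PATTERN.finditer(seq.upper()):
            position = match.start() + 1
            if window[0] <= position <= window[1]:
                hits[name] = (position, match.group(1))
                break
    return hits


if __name__ == "__main__":
    # Synthetic sequences, for illustration only.
    proteins = {
        "candidate": "M" + "A" * 34 + "RSLR" + "A" * 20,
        "too_early": "MRALR" + "A" * 50,
        "no_motif": "M" + "A" * 60,
    }
    print(find_rxlr_candidates(proteins))
```

Keeping the pattern and window as parameters mirrors the abstract's point that custom motifs for other effector families can be swapped in.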


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Alexander Davis ◽  
Ruli Gao ◽  
Nicholas E. Navin

Abstract Background In single cell DNA and RNA sequencing experiments, the number of cells to sequence must be decided before running an experiment, and afterwards, it is necessary to decide whether sufficient cells were sampled. These questions can be addressed by calculating the probability of sampling at least a defined number of cells from each subpopulation (cell type or cancer clone). Results We developed an interactive web application called SCOPIT (Single-Cell One-sided Probability Interactive Tool), which calculates the required probabilities using a multinomial distribution (www.navinlab.com/SCOPIT). In addition, we created an R package called pmultinom for scripting these calculations. Conclusions Our tool for fast multinomial calculations provides a simple and intuitive procedure for prospectively planning single-cell experiments or retrospectively evaluating whether sufficient numbers of cells have been sequenced. The web application can be accessed at navinlab.com/SCOPIT.
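The quantity described here, the probability that every subpopulation of interest contributes at least a given number of cells, can be computed exactly for moderate sample sizes. The sketch below is one straightforward way to do it in Python and is not the algorithm used by SCOPIT or pmultinom: it factors the multinomial into a chain of conditional binomials and sums over the allowed counts. The function name and arguments are hypothetical.

```python
from math import comb


def prob_at_least(n_cells, min_cells, frequencies):
    """P(each listed subpopulation yields >= min_cells among n_cells sampled).

    `frequencies` are the population fractions of the subpopulations of
    interest; they may sum to less than 1, the rest being other cells.
    Intended for up to roughly a thousand cells, since binomial coefficients
    are converted to floating point.
    """
    if any(f <= 0 for f in frequencies) or sum(frequencies) > 1 + 1e-9:
        raise ValueError("frequencies must be positive and sum to at most 1")

    # state[m]: probability that m cells are still unassigned and every
    # subpopulation processed so far has reached min_cells.
    state = {n_cells: 1.0}
    remaining_mass = 1.0
    for freq in frequencies:
        # Conditional on earlier counts, this count is binomial with
        # success probability freq / remaining_mass.
        q = min(1.0, freq / remaining_mass)
        new_state = {}
        for m, prob in state.items():
            for x in range(min_cells, m + 1):
                pmf = comb(m, x) * q ** x * (1.0 - q) ** (m - x)
                new_state[m - x] = new_state.get(m - x, 0.0) + prob * pmf
        state = new_state
        remaining_mass -= freq
    return sum(state.values())


if __name__ == "__main__":
    # Example: three clones at 5%, 10% and 20%; at least 5 cells from each.
    for n in (50, 100, 200):
        print(n, prob_at_least(n, 5, [0.05, 0.10, 0.20]))
```

Scanning over `n_cells` as in the loop gives the prospective use described in the abstract: pick the smallest experiment size whose probability exceeds the desired confidence.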


Genetics ◽  
1993 ◽  
Vol 133 (4) ◽  
pp. 933-942
Author(s):  
J L Azevedo ◽  
B C Hyman

Abstract Complete nucleotide sequences, precise endpoints and coding potential of several 3.0-kilobase mitochondrial DNA (mtDNA) repeating units derived from two isofemale lineages of the mermithid nematode Romanomermis culicivorax have been determined. Endpoint analysis has allowed us to infer deletion and inversion events that most likely generated the present day repeat configuration. Each amplified unit contains the genes for NADH dehydrogenase subunits 3 and 6 (ND3 and ND6), an open reading frame (ORF 1) that represents a cytochrome P450-like gene, and three additional unidentified open reading frames. The primary nucleotide sequences of the R. culicivorax mt-repeat copies within individual haplotypes are highly conserved; three nearly complete copies of the repeat unit vary by 0.01% at the nucleotide level. These observations suggest that concerted evolution mechanisms may be active, resulting in sequence homogenization of these lengthy duplications.


2018 ◽  
Author(s):  
Javier F. Tabima ◽  
Niklaus J. Grünwald

Abstract: Effectors are, by one definition, small, secreted proteins that facilitate infection of host plants by all major groups of plant pathogens. Effector protein identification in oomycetes relies on identification of open reading frames with certain amino acid motifs, among additional minor criteria. To date, identification of effectors has relied on custom scripts to identify motifs in candidate open reading frames. Here, we developed the R package effectR, which provides a convenient tool for rapid prediction of effectors in oomycete genomes, or with custom scripts for any genome, in a reproducible way. The effectR package relies on a combination of regular expression statements and hidden Markov model approaches to predict candidate RxLR and CRN effectors. Other custom motifs for novel effectors can easily be implemented and added to package updates. The effectR package has been validated with published oomycete genomes. This package provides a convenient tool for reproducible identification of candidate effectors in oomycete genomes.


2016 ◽  
Author(s):  
Nan Xiao ◽  
Qing-Song Xu ◽  
Miao-Zhu Li

Abstract Summary: We developed hdnom, an R package for survival modeling with high-dimensional data. The package is the first free and open-source software package that streamlines the workflow of penalized Cox model building, validation, calibration, comparison, and nomogram visualization, with nine types of penalized Cox regression methods fully supported. A web application and an online prediction tool maker are offered to enhance interactivity and flexibility in high-dimensional survival analysis. Availability: The hdnom R package is available from CRAN (https://cran.r-project.org/package=hdnom) under GPL. The hdnom web application can be accessed at http://hdnom.io. The web application maker is available from http://hdnom.org/appmaker.


2020 ◽  
Author(s):  
Urminder Singh ◽  
Eve Syrkin Wurtele

Summary: Searching for ORFs in transcripts is a critical step prior to annotating coding regions in newly sequenced genomes and when searching for alternative reading frames within known genes. With the tremendous increase in RNA-Seq data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in FASTA sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats. Availability and implementation: orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy). Supplementary information: Supplementary data are available at https://github.com/urmi-21/orfipy
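To show what an ORF search with BED output involves, here is a small stand-alone sketch. It does not use or reproduce orfipy's code or command-line options; the ORF definition (first ATG in a frame up to the next in-frame stop), the `min_len` parameter and the function names are assumptions made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq, min_len):
    """Yield (start, end) of ATG...stop ORFs in the three frames of one strand.

    Coordinates are 0-based and half-open, the stop codon is included, and
    only the first ATG before each stop is used.
    """
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon == "ATG" and start is None:
                start = pos
            elif codon in STOPS:
                if start is not None and pos + 3 - start >= min_len:
                    yield start, pos + 3
                start = None


def orfs_to_bed(name, seq, min_len=30):
    """Return BED lines (chrom, start, end, name, score, strand) for all ORFs."""
    seq = seq.upper()
    n = len(seq)
    lines = [f"{name}\t{s}\t{e}\tORF\t0\t+" for s, e in scan_strand(seq, min_len)]
    for s, e in scan_strand(reverse_complement(seq), min_len):
        # Map reverse-strand coordinates back onto the forward strand.
        lines.append(f"{name}\t{n - e}\t{n - s}\tORF\t0\t-")
    return lines


if __name__ == "__main__":
    for line in orfs_to_bed("transcript1", "GGATGAAAATGCCCTAGGG", min_len=9):
        print(line)
```

BED uses 0-based, half-open intervals, which is why reverse-strand hits are reported as `n - end` to `n - start` rather than by simply reversing the indices.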


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Henry E. Miller ◽  
Alexander J. R. Bishop

Abstract Background Co-expression correlations provide the ability to predict gene functionality within specific biological contexts, such as different tissue and disease conditions. However, current gene co-expression databases generally do not consider biological context. In addition, these tools often implement a limited range of unsophisticated analysis approaches, diminishing their utility for exploring gene functionality and gene relationships. Furthermore, they typically do not provide the summary visualizations necessary to communicate these results, posing a significant barrier to their utilization by biologists without computational skills. Results We present Correlation AnalyzeR, a user-friendly web interface for exploring co-expression correlations and predicting gene functions, gene–gene relationships, and gene set topology. Correlation AnalyzeR provides flexible access to its database of tissue and disease-specific (cancer vs normal) genome-wide co-expression correlations, and it also implements a suite of sophisticated computational tools for generating functional predictions with user-friendly visualizations. In the usage example provided here, we explore the role of BRCA1-NRF2 interplay in the context of bone cancer, demonstrating how Correlation AnalyzeR can be effectively implemented to generate and support novel hypotheses. Conclusions Correlation AnalyzeR facilitates the exploration of poorly characterized genes and gene relationships to reveal novel biological insights. The database and all analysis methods can be accessed as a web application at https://gccri.bishop-lab.uthscsa.edu/correlation-analyzer/ and as a standalone R package at https://github.com/Bishop-Laboratory/correlationAnalyzeR.

