phylostratr: A framework for phylostratigraphy

2018 ◽  
Author(s):  
Zebulun Arendsee ◽  
Jing Li ◽  
Urminder Singh ◽  
Arun Seetharam ◽  
Karin Dorman ◽  
...  

Abstract. Motivation: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. Currently, there are no general pipelines for this task. We present an R package, phylostratr, to fill this gap, making high-quality phylostratigraphic analysis accessible to non-specialists. Results: Phylostratigraphic analysis entails searching for homologs within increasingly broad clades. The highest clade that contains all homologs of a gene is that gene’s phylostratum. We have created a general R-based framework, phylostratr, for estimating the phylostratum of every gene in a species. The program can fully automate an analysis: it selects species for a balanced representation of each stratum, retrieves the sequences from UniProt, builds BLAST databases, runs BLAST, infers homologs for each gene against each subject species, determines phylostrata, and returns summaries and diagnostics. phylostratr allows extensive customization. A user may modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. phylostratr also offers proteome quality assessments, false-positive diagnostics, and checks for missing organelle genomes. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae. Availability: phylostratr source code and vignettes are available on GitHub at https://github.com/arendsee/phylostratr.
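The phylostratum rule stated above can be sketched in a few lines of Python. This is a minimal illustration, not phylostratr's R code: it assumes homology has already been inferred (for example from BLAST hits) and that each subject species has been mapped to the index of the smallest clade it shares with the focal species. The names `assign_phylostrata`, `strata`, `species_stratum`, and `hits`, the simplified five-level clade list, and the gene IDs are all hypothetical.

```python
# Minimal sketch of phylostratum assignment (hypothetical names; not phylostratr's API).
# Assumes homologs are already inferred and each subject species is mapped to the
# index of the smallest clade it shares with the focal species.

def assign_phylostrata(hits, species_stratum, strata):
    """Return the phylostratum name for each gene.

    hits: gene -> set of species with an inferred homolog (including the focal species).
    species_stratum: species -> index into `strata` (0 = oldest clade).
    strata: clade names ordered from oldest to youngest; the last is the focal species.
    """
    youngest = len(strata) - 1
    result = {}
    for gene, species in hits.items():
        levels = [species_stratum[s] for s in species if s in species_stratum]
        # The oldest clade reached by any homolog contains all homologs,
        # so the smallest index gives the gene's phylostratum.
        # Genes with no mapped homologs default to the youngest stratum.
        result[gene] = strata[min(levels, default=youngest)]
    return result


# Toy example with Arabidopsis thaliana as the focal species (simplified clade list).
strata = ["cellular organisms", "Eukaryota", "Viridiplantae", "Brassicaceae", "Arabidopsis thaliana"]
species_stratum = {
    "Escherichia coli": 0,
    "Saccharomyces cerevisiae": 1,
    "Chlamydomonas reinhardtii": 2,
    "Brassica rapa": 3,
    "Arabidopsis thaliana": 4,
}
hits = {
    "gene_a": {"Arabidopsis thaliana", "Brassica rapa"},
    "gene_b": {"Arabidopsis thaliana", "Brassica rapa", "Chlamydomonas reinhardtii"},
    "gene_c": {"Arabidopsis thaliana"},
}
# gene_a -> Brassicaceae, gene_b -> Viridiplantae, gene_c -> Arabidopsis thaliana
print(assign_phylostrata(hits, species_stratum, strata))
```

Taking the minimum stratum index is what makes the assigned clade the one that contains all homologs: any younger clade would exclude the most distant homolog-bearing species.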

2019 ◽  
Vol 35 (19) ◽  
pp. 3617-3627 ◽  
Author(s):  
Zebulun Arendsee ◽  
Jing Li ◽  
Urminder Singh ◽  
Arun Seetharam ◽  
Karin Dorman ◽  
...  

Abstract. Motivation: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene’s phylostratum. Results: We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates the analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include detection of genes with inferred homologs in old clades but not in intermediate ones, proteome quality assessments, false-positive diagnostics, and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparison of the influence of analysis parameters or genomes on phylostratum inference. A user may modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae. Availability and implementation: Source code is available at https://github.com/arendsee/phylostratr. Supplementary information: Supplementary data are available at Bioinformatics online.
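The diagnostic for genes with inferred homologs in old clades but not in intermediate ones can be illustrated with a short sketch. It assumes the same kind of inputs as a basic phylostratum assignment (a species-to-stratum index and the set of homolog-bearing species for one gene); the helper `skipped_strata`, the simplified clade list, and the data are hypothetical and do not reproduce phylostratr's implementation. A gap of this kind may point to a false-positive hit, gene loss, or an incomplete proteome in the intermediate clades.

```python
# Sketch of a "skipped strata" diagnostic (hypothetical helper, not phylostratr's API).

def skipped_strata(hit_species, species_stratum, strata):
    """List strata younger than the gene's phylostratum that contain no homolog.

    hit_species: species with an inferred homolog of the gene (including the focal species).
    species_stratum: species -> index into `strata` (0 = oldest clade).
    strata: clade names ordered from oldest to youngest.
    """
    levels = {species_stratum[s] for s in hit_species if s in species_stratum}
    if not levels:
        return []
    oldest = min(levels)  # index of the gene's phylostratum
    # Every stratum between the phylostratum and the focal species that has no homolog.
    return [strata[i] for i in range(oldest + 1, len(strata)) if i not in levels]


strata = ["cellular organisms", "Eukaryota", "Viridiplantae", "Brassicaceae", "Arabidopsis thaliana"]
species_stratum = {
    "Saccharomyces cerevisiae": 1,
    "Chlamydomonas reinhardtii": 2,
    "Brassica rapa": 3,
    "Arabidopsis thaliana": 4,
}

# Homolog in yeast and in A. thaliana itself, but in none of the intermediate clades.
print(skipped_strata({"Saccharomyces cerevisiae", "Arabidopsis thaliana"}, species_stratum, strata))
```

In practice each stratum usually contains several species, so a real check would likely weigh how many species per stratum lack a homolog rather than treat a single miss as a gap.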


2021 ◽  
Vol 12 ◽  
Author(s):  
Chun-Hui Gao ◽  
Guangchuang Yu ◽  
Peng Cai

Venn diagrams are widely used to show set relationships in biomedical studies. In this study, we developed ggVennDiagram, an R package that automatically generates high-quality Venn diagrams with two to seven sets. ggVennDiagram is built on ggplot2 and integrates the advantages of existing packages such as venn, RVenn, VennDiagram, and sf. Satisfactory results can be obtained with minimal configuration. Furthermore, we designed comprehensive objects that store all of the data behind a Venn diagram, allowing free access to both intersection values and Venn plot sub-elements, such as set labels/edges and region labels/fills. Therefore, every sub-element of a Venn plot can be extensively customized without increasing the learning cost for users who are already familiar with ggplot2 methods. To date, ggVennDiagram has been cited in more than 10 publications, and its source code repository has been starred by more than 140 GitHub users, suggesting great potential for applications. The package is open-source software released under the GPL-3 license and is freely available through CRAN (https://cran.r-project.org/package=ggVennDiagram).
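The intersection values behind a Venn diagram are the sizes of its exclusive regions: with n sets there are 2^n - 1 regions (127 for seven sets), each holding the elements that belong to exactly one combination of sets and to no other set. The Python sketch below computes them under that definition. It is a language-neutral illustration, not ggVennDiagram's implementation (which is written in R); `venn_regions` and the example sets are hypothetical.

```python
from itertools import combinations


def venn_regions(sets):
    """Size of each exclusive Venn region.

    sets: dict mapping set name -> set of elements.
    Returns a dict mapping each non-empty combination of set names (a tuple in
    input order) to the number of elements in exactly those sets and no others.
    """
    names = list(sets)
    regions = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Elements shared by every set in the combination...
            inside = set.intersection(*(sets[n] for n in combo))
            # ...minus anything that also appears in a set outside it.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            regions[combo] = len(inside - outside)
    return regions


example = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
for region, size in venn_regions(example).items():
    print(region, size)
```

Enumerating all combinations is exponential in the number of sets, which is acceptable at the two-to-seven-set scale a Venn diagram can display legibly. The region sizes always sum to the size of the union of all sets.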


Data Science ◽  
2021 ◽  
pp. 1-21
Author(s):  
Caspar J. Van Lissa ◽  
Andreas M. Brandmaier ◽  
Loek Brinkman ◽  
Anna-Lena Lamprecht ◽  
Aaron Peikert ◽  
...  

Adopting open science principles can be challenging, requiring conceptual education and training in the use of new tools. This paper introduces the Workflow for Open Reproducible Code in Science (WORCS): a step-by-step procedure that researchers can follow to make a research project open and reproducible. This workflow is intended to lower the threshold for adoption of open science principles. It is based on established best practices and can be used either in parallel to, or in the absence of, top-down requirements by journals, institutions, and funding bodies. To facilitate widespread adoption, the WORCS principles have been implemented in the R package worcs, which offers an RStudio project template and utility functions for specific workflow steps. This paper introduces the conceptual workflow, discusses how it meets different standards for open science, and addresses the functionality provided by the R implementation, worcs. This paper is primarily targeted at scholars who conduct research projects in R that involve academic prose, analysis code, and tabular data. However, the workflow is flexible enough to accommodate other scenarios, and offers a starting point for customized solutions. The source code for the R package and manuscript, and a list of examples of WORCS projects, are available at https://github.com/cjvanlissa/worcs.


2021 ◽  
Author(s):  
Jason Hunter ◽  
Mark Thyer ◽  
Dmitri Kavetski ◽  
David McInerney

Probabilistic predictions provide crucial information about the uncertainty of hydrological predictions and are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of the results often depends on the objective function used to calibrate the hydrological model.

We present an open-source R package and an online web application that achieve two aims. First, these resources are easy to use and accessible, so users do not need specialised knowledge of probabilistic modelling to apply them. Second, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions. It can do so only because it includes an innovation that resolves a long-standing issue with model assumptions that previously prevented such broad application.

We demonstrate our methods by comparing the new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). When used with most of the study objective functions, the existing reference error model introduces additional flow dependencies into the residual error structure, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study.

The new probabilistic error model, together with the open-source software and web application, aims to facilitate the adoption of probabilistic predictions in the hydrological modelling community and to improve the quality of predictions and of the decisions made using them. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models calibrated with a wide range of common objective functions.
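The abstract names volumetric bias and precision among its performance metrics without defining them. Assuming commonly used definitions (volumetric bias as the absolute difference between summed predictive-mean flow and summed observed flow, relative to the observed total; precision as summed predictive standard deviation relative to the observed total), the two metrics can be sketched as below. This is not the authors' R package: the function names, the formulas, and the representation of the predictive distribution as a set of sampled replicate series are assumptions made for illustration.

```python
import statistics


def volumetric_bias(obs, replicates):
    """Assumed definition: |sum of predictive means - sum of observed| / sum of observed.

    obs: observed flows, one value per time step.
    replicates: simulated flow series sampled from the predictive distribution,
                each the same length as obs.
    """
    means = [statistics.fmean(rep[t] for rep in replicates) for t in range(len(obs))]
    return abs(sum(means) - sum(obs)) / sum(obs)


def precision(obs, replicates):
    """Assumed definition: summed predictive standard deviation / total observed flow.

    Lower values indicate a sharper (more precise) predictive distribution.
    """
    sds = [statistics.pstdev([rep[t] for rep in replicates]) for t in range(len(obs))]
    return sum(sds) / sum(obs)


# Toy data: three time steps and two predictive replicates.
obs = [1.0, 2.0, 3.0]
replicates = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(volumetric_bias(obs, replicates), precision(obs, replicates))
```

Reliability and flow-duration-curve errors need additional machinery (for example, a predictive quantile-quantile comparison) and are omitted from this sketch.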


2018 ◽  
Author(s):  
Jianfeng Li ◽  
Bowen Cui ◽  
Yuting Dai ◽  
Ling Bai ◽  
Jinyan Huang

The number of bioinformatics resources, such as tools, scripts, and databases, is growing exponentially. This poses a great challenge for users who need to access, manage, and integrate these resources. To address this, we propose a comprehensive R package, BioInstaller, which includes R functions, a Shiny application, and HTTP representational state transfer (REST) application programming interfaces (APIs). We also established a community-based configuration pool to collect, access, and share bioinformatics resources. The source code of BioInstaller is freely available at our lab website http://bioinfo.rjh.com.cn/labs/jhuang/tools/bioinstaller or on GitHub at https://github.com/JhuangLab/BioInstaller. A Docker image can also be downloaded from DockerHub (https://hub.docker.com/r/bioinstaller).


2020 ◽  
Vol 84 (3) ◽  
Author(s):  
S. M. Taipakova ◽  
A. K. Kuanbay ◽  
D. Manatkyzy ◽  
I. T. Smekenov ◽  
S. D. Alybayev ◽  
...  

2020 ◽  
Author(s):  
Rodrigo Gazaffi ◽  
Rodrigo R. Amadeu ◽  
Marcelo Mollinari ◽  
João R. B. F. Rosa ◽  
Cristiane H. Taniguti ◽  
...  

ABSTRACT Accurate QTL mapping in outcrossing species requires software that accounts for the genetic features of these populations, such as markers with different segregation patterns and different levels of information. Although the mapping procedures available to date allow inferring QTL position and effects, most are not based on multilocus genetic maps. Basing QTL analysis on such maps is crucial because they allow informative markers to propagate their information to less informative intervals of the map. We developed fullsibQTL, a novel, freely available R package to perform composite interval QTL mapping in outcrossing populations with markers of different segregation patterns. It estimates QTL position, effects, segregation patterns, and linkage phase with flanking markers. Additionally, several statistical and graphical tools are implemented for straightforward analysis and interpretation. fullsibQTL is an open-source R package with C and R source code (GPLv3). It is multiplatform and can be installed from https://github.com/augusto-garcia/fullsibQTL.
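The different segregation patterns of markers in a full-sib family can be made concrete with a chi-square goodness-of-fit test of observed genotype counts against the expected ratio for a marker class. The ratios listed below (1:1:1:1, 1:2:1, 3:1, 1:1) are the classic patterns for markers segregating in outcrossing full-sib families and are included here as an illustrative assumption; the sketch is not fullsibQTL's code, and `segregation_chi2` and `EXPECTED_RATIOS` are hypothetical names. It returns the statistic and degrees of freedom only; a p-value would need a chi-square distribution function such as SciPy's `scipy.stats.chi2.sf`, omitted to keep the example dependency-free.

```python
# Expected genotype ratios for common marker classes in a full-sib (outcrossing) family.
EXPECTED_RATIOS = {
    "1:1:1:1": (1, 1, 1, 1),  # e.g. ab x cd: four distinguishable genotype classes
    "1:2:1": (1, 2, 1),       # e.g. codominant ab x ab
    "3:1": (3, 1),            # e.g. dominant marker, both parents heterozygous
    "1:1": (1, 1),            # e.g. ab x aa: segregating in one parent only
}


def segregation_chi2(counts, ratio):
    """Chi-square goodness-of-fit statistic and degrees of freedom for genotype counts."""
    if len(counts) != len(ratio):
        raise ValueError("need one observed count per genotype class")
    n = sum(counts)
    total = sum(ratio)
    expected = [n * r / total for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return stat, len(counts) - 1


# Hypothetical marker scored in 100 progeny, tested against a 1:1 pattern.
print(segregation_chi2([60, 40], EXPECTED_RATIOS["1:1"]))
```

Marker classes with more distinguishable genotypes (such as 1:1:1:1) carry more information per individual, which is one reason map-based methods let informative markers support less informative intervals.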


2021 ◽  
Vol 118 (47) ◽  
pp. e2107543118
Author(s):  
Xiang Li ◽  
Jun Zhang ◽  
Jiyue Huang ◽  
Jing Xu ◽  
Zhiyu Chen ◽  
...  

During meiosis, crossovers (COs) are typically required to ensure faithful chromosomal segregation. Despite the requirement for at least one CO between each pair of chromosomes, closely spaced double COs are usually underrepresented due to a phenomenon called CO interference. Like Mus musculus and Saccharomyces cerevisiae, Arabidopsis thaliana has both interference-sensitive (Class I) and interference-insensitive (Class II) COs. However, the mechanism controlling CO distribution remains largely elusive. Both AtMUS81 and AtFANCD2 promote the formation of Class II COs. Using immunostaining of AtHEI10 and AtMLH1, two markers of Class I COs, we show that AtFANCD2 but not AtMUS81 is required for normal distribution of Class I COs among chromosomes. Depleting AtFANCD2 leads to a CO distribution pattern that is intermediate between that of the wild type and a Poisson distribution. Moreover, in Atfancm, Atfigl1, and Atrmi1 mutants, in which increased Class II CO frequency has been reported previously, we observe Class I CO distribution patterns that are strikingly similar to that of Atfancd2. Surprisingly, we found that AtFANCD2 plays opposite roles in regulating CO frequency in Atfancm compared with Atfigl1 or Atrmi1. Together, these results reveal that although AtFANCD2, AtFANCM, AtFIGL1, and AtRMI1 regulate Class II CO frequency by distinct mechanisms, they have similar roles in controlling the distribution of Class I COs among chromosomes.
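The comparison with a Poisson distribution can be made concrete. When COs are placed independently, per-chromosome counts follow a Poisson distribution whose variance equals its mean; interference and the obligatory CO typically make counts less variable (variance-to-mean ratio below 1, with fewer zero-CO chromosomes than Poisson predicts). The sketch below is a hypothetical illustration with made-up toy counts, not the authors' analysis; it also pools all chromosomes into a single mean, a simplifying assumption that ignores differences in chromosome length.

```python
import math
from collections import Counter


def poisson_comparison(co_counts):
    """Compare per-chromosome crossover counts with a Poisson expectation.

    co_counts: number of COs scored on each chromosome (pair).
    Returns ({k: (observed, expected)}, variance-to-mean ratio).
    """
    n = len(co_counts)
    mean = sum(co_counts) / n
    if mean == 0:
        raise ValueError("no crossovers scored")
    observed = Counter(co_counts)
    # Expected number of chromosomes with k COs under a Poisson with the same mean.
    table = {
        k: (observed.get(k, 0), n * math.exp(-mean) * mean ** k / math.factorial(k))
        for k in range(max(co_counts) + 1)
    }
    variance = sum((c - mean) ** 2 for c in co_counts) / n
    return table, variance / mean


# Hypothetical toy counts for 10 chromosomes: every chromosome has at least one CO.
table, ratio = poisson_comparison([1, 1, 2, 1, 1, 2, 1, 1, 1, 1])
for k, (obs, exp) in table.items():
    print(k, obs, round(exp, 2))
print("variance/mean:", round(ratio, 2))
```

A distribution "intermediate between wild type and Poisson" would, under this view, show a variance-to-mean ratio between the wild-type value and 1.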

