scholarly journals Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data

2019 ◽  
Author(s):  
Lucas Czech ◽  
Pierre Barbera ◽  
Alexandros Stamatakis

SummaryWe present GENESIS, a library for working with phylogenetic data, and GAPPA, an accompanying command line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies, and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested, and field-proven.Availability and ImplementationBoth GENESIS and GAPPA are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/[email protected] and [email protected].

2020 ◽  
Vol 36 (10) ◽  
pp. 3263-3265 ◽  
Author(s):  
Lucas Czech ◽  
Pierre Barbera ◽  
Alexandros Stamatakis

Abstract Summary We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Julian Liber ◽  
Gregory Bonito ◽  
Gian Maria Niccolò Benucci

SummaryCONSTAX - the CONSensus TAXonomy classifier - was developed for accurate and reproducible taxonomic annotation of fungal rDNA amplicons and is based upon a consensus approach of RDP, SINTAX and UTAX algorithms. CONSTAX2 can be used to classify prokaryotes and incorporates BLAST-based classifiers to reduce classification errors. Additionally, CONSTAX2 implements a conda-installable, command line tool with improved classification metrics, faster training, multithreading support, capacity to incorporate external taxonomic databases, new isolate matching and high-level taxonomy tools, replete with documentation and example tutorials.Availability and ImplementationCONSTAX2 is available at https://github.com/liberjul/CONSTAXv2, and is packaged for Linux and MacOS from Bioconda. A tutorial and documentation are available at https://constax.readthedocs.io/en/latest/.


2015 ◽  
Author(s):  
Timothy G Stephens ◽  
Debashish Bhattacharya ◽  
Mark A Ragan ◽  
Cheong Xin Chan

A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, such as robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for sorting phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa, with customisable options to assess clades within the context of the whole tree. PhySortR is a command-line tool that is freely available, highly scalable, and easily automatable.


Author(s):  
Jonas Förster ◽  
Frank T Bergmann ◽  
Jürgen Pahle

Abstract Motivation COPASI is a biochemical simulator and model analyzer which has found widespread use in academic research, teaching and beyond. One of COPASI’s strengths is its graphical user interface, and this is what most users work with. COPASI also provides a command-line tool. So far, an intuitive scripting interface that allows the creation and documentation of systems biology workflows was missing though. Results We have developed CoRC, the COPASI R Connector, an R package which provides a high-level scripting interface for COPASI. It closely mirrors the thought process of a (graphical interface) user and should therefore be very easy to use. This allows for complex workflows to be reproducibly scripted, utilizing COPASI’s powerful analytic toolset in combination with R’s extensive analysis and package ecosystem. Availability and implementation CoRC is a free and open-source R package, available via GitHub at https://jpahle.github.io/CoRC/ under the Artistic-2.0 license.   Supplementary information: We provide tutorial articles as well as several example scripts on the project’s website.


2020 ◽  
pp. 115-125
Author(s):  
K.A. Zhereb ◽  

Python is a popular programming language used in many areas, but its performance is significantly lower than many compiled languages. We propose an approach to increasing performance of Python code by transforming fragments of code to more efficient languages such as Cython and C++. We use high-level algebraic models and rewriting rules technique for semi-automated code transformation. Performance-critical fragments of code are transformed into a low-level syntax model using Python parser. Then this low-level model is further transformed into a high-level algebraic model that is language-independent and easier to work with. The transformation is automated using rewriting rules implemented in Termware system. We also improve the constructed high-level model by deducing additional information such as data types and constraints. From this enhanced high-level model of code we generate equivalent fragments of code using code generators for Cython and C++ languages. Cython code is seamlessly integrated with Python code, and for C++ code we generate a small utility file in Cython that also integrates this code with Python. This way, the bulk of program code can stay in Python and benefit from its facilities, but performance-critical fragments of code are transformed into more efficient equivalents, improving the performance of resulting program. Comparison of execution times between initial version of Python code, different versions of transformed code and using automatic tools such as Cython compiler and PyPy demonstrates the benefits of our approach – we have achieved performance gains of over 50x compared to the initial version written in Python, and over 2x compared to the best automatic tool we have tested.


2020 ◽  
Author(s):  
Christopher Wilks ◽  
Omar Ahmed ◽  
Daniel N. Baker ◽  
David Zhang ◽  
Leonardo Collado-Torres ◽  
...  

AbstractMotivationA common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types.ResultsMegadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19,000 GTExV8 BigWig files in approximately one hour using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package.Availabilityhttps://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/[email protected]


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12450
Author(s):  
Cristian Román Palacios ◽  
April Wright ◽  
Josef Uyeda

The number of terminals in phylogenetic trees has significantly increased over the last decade. This trend reflects recent advances in next-generation sequencing, accessibility of public data repositories, and the increased use of phylogenies in many fields. Despite R being central to the analysis of phylogenetic data, manipulation of phylogenetic comparative datasets remains slow, complex, and poorly reproducible. Here, we describe the first R package extending the functionality and syntax of data.table to explicitly deal with phylogenetic comparative datasets. treedata.table significantly increases speed and reproducibility during the data manipulation steps involved in the phylogenetic comparative workflow in R. The latest release of treedata.table is currently available through CRAN (https://cran.r-project.org/web/packages/treedata.table/). Additional documentation can be accessed through rOpenSci (https://ropensci.github.io/treedata.table/).


2020 ◽  
Author(s):  
George G. Vega Yon ◽  
Duncan C. Thomas ◽  
John Morrison ◽  
Huaiyu Mi ◽  
Paul D. Thomas ◽  
...  

AbstractMotivationGene function annotation is important for a variety of downstream analyses of genetic data. Yet experimental characterization of function remains costly and slow, making computational prediction an important endeavor. In this paper we use a probabilistic evolutionary model built upon phylogenetic trees and experimental Gene Ontology functional annotations that allows automated prediction of function for unannotated genes.ResultsWe have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out validation, and we further validated some of the predictions in the experimental scientific literature.AvailabilityOur method has been implemented as an R package and it is available online at https://github.com/USCBiostats/aphylo. Code needed to reproduce the tables and figures can be found in https://github.com/USCbiostats/aphylo-simulations.Author summaryUnderstanding the individual role that genes play in life is a key issue in biomedical-sciences. While information regarding gene functions is continuously growing, the number of genes with unknown biological purpose is yet greater. Because of this, scientists have dedicated much of their time to build and design tools that automatically infer gene functions. In this paper, we present yet another attempt to do such. While very simple, our model of gene-function evolution has some key features that have the potential to generate an impact in the field: (a) compared to other methods, ours is highly-scalable, which means that it is possible to simultaneously analyze hundreds of what are known as gene-families, compromising thousands of genes, (b) supports our biological intuition as our model’s data-driven results coherently agree with what theory dictates regarding how gene-functions evolved, (c) notwithstanding its simplicity, the model’s prediction accuracy is comparable to other more complex alternatives, and (d) perhaps most importantly, it can be used to both support new annotations and to suggest areas in which existing annotations show inconsistencies that may indicate errors or controversies.


2018 ◽  
Author(s):  
Nicholas R. Waters ◽  
Florence Abram ◽  
Fiona Brennan ◽  
Ashleigh Holmes ◽  
Leighton Pritchard

SummaryThe Clermont PCR method of phylotyping Escherichia coli has remained a useful classification scheme despite the proliferation of higher-resolution sequence typing schemes. We have implemented an in silico Clermont PCR method as both a web app and as a command-line tool to allow researchers to easily apply this phylotyping scheme to genome assemblies easily.Availability and ImplementationEzClermont is available as a web app at http://www.ezclermont.org. For local use, EzClermont can be installed with pip or installed from the source code at https://github.com/nickp60/ezclermont. All analysis was done with version [email protected], [email protected] informationTable S1: test dataset; S2: validation dataset; S3: results.


2020 ◽  
Vol 2 (9) ◽  
Author(s):  
Nicholas R. Waters ◽  
Florence Abram ◽  
Fiona Brennan ◽  
Ashleigh Holmes ◽  
Leighton Pritchard

The Clermont PCR method for phylotyping Escherichia coli remains a useful classification scheme even though genome sequencing is now routine, and higher-resolution sequence typing schemes are now available. Relating present-day whole-genome E. coli classifications to legacy phylotyping is essential for harmonizing the historical literature and understanding of this important organism. Therefore, we present EzClermont – a novel in silico Clermont PCR phylotyping tool to enable ready application of this phylotyping scheme to whole-genome assemblies. We evaluate this tool against phylogenomic classifications, and an alternative software implementation of Clermont typing. EzClermont is available as a web app at www.ezclermont.org, and as a command-line tool at https://nickp60.github.io/EzClermont/.


Sign in / Sign up

Export Citation Format

Share Document