PyIOmica: Longitudinal Omics Analysis and Classification

Abstract. Summary: PyIOmica is an open-source Python package for integrating longitudinal multiple omics datasets and for characterizing and classifying temporal trends. The package provides multiple bioinformatics tools, including data normalization, annotation, classification, visualization, and enrichment analysis for gene ontology terms and pathways. Additionally, the package includes an implementation of visibility graphs to visualize time series as networks. Availability and implementation: PyIOmica is implemented as a Python package (pyiomica), available for download and installation through the Python Package Index (PyPI) (https://pypi.python.org/pypi/pyiomica), and can be deployed using the Python import function following installation. PyIOmica has been tested on Mac OS X, Unix/Linux and Microsoft Windows. The application is distributed under an MIT license. Source code for each release is also available for download on Zenodo (https://doi.org/10.5281/zenodo.3342612).

PyIOmica: longitudinal omics analysis and trend identification

Bioinformatics ◽

10.1093/bioinformatics/btz896 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2306-2307 ◽

Cited By ~ 3

Author(s):

Sergii Domanskyi ◽

Carlo Piermarocchi ◽

George I Mias

Keyword(s):

Source Code ◽

Temporal Trends ◽

Enrichment Analysis ◽

Supplementary Information ◽

Data Normalization ◽

Supplementary Data ◽

Visibility Graphs ◽

Trend Identification ◽

Microsoft Windows ◽

Python Package

Abstract. Summary: PyIOmica is an open-source Python package for integrating longitudinal multiple omics datasets and for characterizing and categorizing temporal trends. The package provides multiple bioinformatics tools, including data normalization, annotation, categorization, visualization and enrichment analysis for gene ontology terms and pathways. Additionally, the package includes an implementation of visibility graphs to visualize time series as networks. Availability and implementation: PyIOmica is implemented as a Python package (pyiomica), available for download and installation through the Python Package Index (https://pypi.python.org/pypi/pyiomica), and can be deployed using the Python import function following installation. PyIOmica has been tested on Mac OS X, Unix/Linux and Microsoft Windows. The application is distributed under an MIT license. Source code for each release is also available for download on Zenodo (https://doi.org/10.5281/zenodo.3548040). Supplementary information: Supplementary data are available at Bioinformatics online.
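
To illustrate how a time series can be turned into a network, the following is a minimal sketch of a natural visibility graph, written from the general definition rather than taken from PyIOmica. The function name `natural_visibility_graph` and the toy data are assumptions made for illustration; PyIOmica's own API may differ. Two time points are linked when the straight line between them passes strictly above every intermediate point.

```python
from itertools import combinations


def natural_visibility_graph(times, values):
    """Return the edges (i, j), i < j, of the natural visibility graph.

    Illustrative helper, not part of any published package: points a and b
    are connected if every intermediate point c lies strictly below the
    straight line joining (t_a, y_a) and (t_b, y_b).
    """
    edges = set()
    for a, b in combinations(range(len(values)), 2):
        ta, ya, tb, yb = times[a], values[a], times[b], values[b]
        visible = all(
            values[c] < yb + (ya - yb) * (tb - times[c]) / (tb - ta)
            for c in range(a + 1, b)
        )
        if visible:
            edges.add((a, b))
    return edges


# Toy series (made-up values): five equally spaced time points.
times = [0, 1, 2, 3, 4]
values = [3.0, 1.0, 2.5, 0.5, 4.0]
edges = natural_visibility_graph(times, values)
print(sorted(edges))
```

Neighbouring time points are always mutually visible, so the graph is connected; peaks in the series tend to become highly connected nodes.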

clinker & clustermap.js: Automatic generation of gene cluster comparison figures

10.1101/2020.11.08.370650 ◽

2020 ◽

Author(s):

Cameron L.M. Gilchrist ◽

Yit-Heng Chooi

Keyword(s):

Gene Cluster ◽

Evolutionary History ◽

Source Code ◽

Gene Clusters ◽

Automatic Generation ◽

Biological Pathways ◽

Link Type ◽

E Mail ◽

Python Package ◽

Publication Quality

Abstract. Summary: Genes involved in biological pathways are often colocalised in gene clusters, the comparison of which can give valuable insights into their function and evolutionary history. However, comparison and visualisation of gene cluster homology is a tedious process, particularly when many clusters are being compared. Here, we present clinker, a Python-based tool, and clustermap.js, a companion JavaScript visualisation library, which used together can automatically generate accurate, interactive, publication-quality gene cluster comparison figures directly from sequence files. Availability and Implementation: Source code and documentation for clinker and clustermap.js are available on GitHub (github.com/gamcil/clinker and github.com/gamcil/clustermap.js, respectively) under the MIT license. clinker can be installed directly from the Python Package Index via pip.

PyLiger: Scalable single-cell multi-omic data integration in Python

10.1101/2021.12.24.474131 ◽

2021 ◽

Author(s):

Lu Lu ◽

Joshua D Welch

Keyword(s):

Gene Ontology ◽

Data Integration ◽

Single Cell ◽

Scientific Computing ◽

Enrichment Analysis ◽

R Package ◽

Gene Ontology Enrichment Analysis ◽

Omic Data Integration ◽

Omic Data ◽

Python Package

Motivation: LIGER is a widely used R package for single-cell multi-omic data integration. However, many users prefer to analyze their single-cell datasets in Python, which offers an attractive syntax and highly optimized scientific computing libraries for increased efficiency. Results: We developed PyLiger, a Python package for integrating single-cell multi-omic datasets. PyLiger offers faster performance than the previous R implementation (2-5× speedup), interoperability with the AnnData format, flexible on-disk or in-memory analysis capability, and new functionality for gene ontology enrichment analysis. The on-disk capability enables analysis of arbitrarily large single-cell datasets using fixed memory.

GenFam: A web application and database for gene family-based classification and functional enrichment analysis

10.1101/272187 ◽

2018 ◽

Cited By ~ 1

Author(s):

Renesh Bedre ◽

Kranthi Mandadi

Keyword(s):

Gene Family ◽

Web Application ◽

High Throughput Sequencing ◽

Source Code ◽

Enrichment Analysis ◽

Gene Families ◽

Functional Enrichment ◽

Multiple Testing Correction ◽

Web Based ◽

Link Type

Abstract: Genome-scale studies using high-throughput sequencing (HTS) technologies generate substantial lists of differentially expressed genes under different experimental conditions. These gene lists need to be further mined to narrow down biologically relevant genes and associated functions in order to guide downstream functional genetic analyses. A popular approach is to determine statistically overrepresented genes in a user-defined list through enrichment analysis tools, which rely on functional annotations of genes based on Gene Ontology (GO) terms. Here, we propose a new approach, GenFam, which allows classification and enrichment of genes based on their gene family, thus simplifying identification of candidate gene families and associated genes that may be relevant to the query. GenFam and its integrated database comprise 384 unique gene families and support gene family classification and enrichment analyses for 60 plant genomes. Four comparative case studies with plant species belonging to different clades and families were performed using GenFam, which demonstrated its robustness and comprehensiveness over preexisting functional enrichment tools. To make it readily accessible to plant biologists, GenFam is available as a web-based application where users can input gene IDs and export enrichment results in both tabular and graphical formats. Users can also customize analysis parameters by choosing from the various statistical enrichment tests and multiple testing correction methods. Additionally, the web-based application, source code and database are freely available to use and download. Website: http://mandadilab.webfactional.com/home/. Source code and database: http://mandadilab.webfactional.com/home/dload/.
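
The enrichment step described above can be sketched generically. The following is a minimal illustration, assuming a one-sided hypergeometric test followed by Benjamini-Hochberg correction; the abstract does not state which tests GenFam offers, so both choices, the function names, and the gene family counts are assumptions made for this example and are not taken from GenFam.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K family members, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs: 1000 background genes, 40 query genes.
# Each entry is (family size in the background, family members in the query).
families = {"family_A": (50, 8), "family_B": (30, 1), "family_C": (120, 6)}
names = list(families)
raw = [hypergeom_sf(families[f][1], 1000, families[f][0], 40) for f in names]
for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
    print(f"{name}: p={p:.3g}, adjusted={q:.3g}")
```

A family is reported as enriched when its adjusted p-value falls below the chosen significance threshold.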

ClassificaIO: machine learning for classification graphical user interface

10.1101/240184 ◽

2017 ◽

Cited By ~ 2

Author(s):

Raeuf Roushangar ◽

George I. Mias

Keyword(s):

Machine Learning ◽

User Interface ◽

Graphical User Interface ◽

Machine Learning Algorithms ◽

Machine Learning Classification ◽

Link Type ◽

Research Areas ◽

Testing Data ◽

Microsoft Windows ◽

Python Package

Abstract: Machine learning methods are being used routinely by scientists in many research areas, typically requiring significant statistical and programming knowledge. Here we present ClassificaIO, an open-source Python graphical user interface for machine learning classification for the scikit-learn Python library. ClassificaIO provides an interactive way to train, validate, and test data on a range of classification algorithms. The software enables fast comparisons within and across classifiers, and facilitates uploading and exporting of trained models, and both validation and testing data results. ClassificaIO aims to provide not only a research utility, but also an educational tool that can enable biomedical and other researchers with minimal machine learning background to apply machine learning algorithms to their research in an interactive point-and-click way. The ClassificaIO package is available for download and installation through the Python Package Index (PyPI) (http://pypi.python.org/pypi/ClassificaIO) and it can be deployed using the “import” function in Python once the package is installed. The application is distributed under an MIT license and the source code is publicly available for download (for Mac OS X, Linux and Microsoft Windows) through PyPI and GitHub (http://github.com/gmiaslab/ClassificaIO, and https://doi.org/10.5281/zenodo.1320465).

pyrpipe: a python package for RNA-Seq workflows

10.1101/2020.03.04.925818 ◽

2020 ◽

Author(s):

Urminder Singh ◽

Jing Li ◽

Arun Seetharam ◽

Eve Syrkin Wurtele

Keyword(s):

Detailed Analysis ◽

Source Code ◽

Workflow Management ◽

Object Oriented ◽

Third Party ◽

RNA-Seq ◽

Link Type ◽

Computing Environments ◽

High Level ◽

Python Package

Implementing RNA-Seq analysis pipelines is challenging as data gets bigger and more complex. With the availability of terabytes of RNA-Seq data and continuous development of analysis tools, there is a pressing requirement for frameworks that allow for fast and efficient development, modification, sharing and reuse of workflows. Scripting is often used, but it has many challenges and drawbacks. We have developed a Python package, python RNA-Seq Pipeliner (pyrpipe), that enables straightforward development of flexible, reproducible and easy-to-debug computational pipelines purely in Python, in an object-oriented manner. pyrpipe provides high-level APIs to popular RNA-Seq tools. Pipelines can be customized by integrating new Python code, third-party programs, or Python libraries. Researchers can create checkpoints in the pipeline or integrate pyrpipe into a workflow management system, thus allowing execution on multiple computing environments. pyrpipe produces detailed analysis and benchmark reports, which can be shared or included in publications. pyrpipe is implemented in Python and is compatible with Python versions 3.6 and higher. All source code is available at https://github.com/urmi-21/pyrpipe; the package can be installed from the source or from PyPI (https://pypi.org/project/pyrpipe). Documentation is available on Read the Docs (http://pyrpipe.rtfd.io).

pyABC: distributed, likelihood-free inference

10.1101/162552 ◽

2017 ◽

Cited By ~ 1

Author(s):

Emmanuel Klinger ◽

Dennis Rickert ◽

Jan Hasenauer

Keyword(s):

Sequential Monte Carlo ◽

Source Code ◽

Distance Functions ◽

Web Interface ◽

Practical Application ◽

Acceptance Threshold ◽

Link Type ◽

Data Querying ◽

Approximate Bayesian ◽

Python Package

Summary: Likelihood-free methods are often required for inference in systems biology. While Approximate Bayesian Computation (ABC) provides a theoretical solution, its practical application has often been challenging due to its high computational demands. To scale likelihood-free inference to computationally demanding stochastic models, we developed pyABC: a distributed and scalable ABC-Sequential Monte Carlo (ABC-SMC) framework. It implements computation-minimizing and scalable, runtime-minimizing parallelization strategies for multi-core and distributed environments, scaling to thousands of cores. The framework is accessible to non-expert users and also enables advanced users to experiment with and to custom implement many options of ABC-SMC schemes, such as acceptance threshold schedules, transition kernels and distance functions, without alteration of pyABC’s source code. pyABC includes a web interface to visualize ongoing and finished ABC-SMC runs and exposes an API for data querying and post-processing. Availability and Implementation: pyABC is written in Python 3 and is released under the GPLv3 license. The source code is hosted on https://github.com/neuralyzer/pyabc and the documentation on http://pyabc.readthedocs.io. It can be installed from the Python Package Index (PyPI).
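
The core idea behind likelihood-free inference can be shown with the simplest ABC variant, plain rejection sampling. The sketch below is not ABC-SMC and does not use pyABC's API: it draws parameters from a prior, simulates data, and keeps the draws whose simulated summary lies within an acceptance threshold of the observed one. The model (a Gaussian with unknown mean), the function names, and the threshold value are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed_summary, prior_sample, simulate_summary,
                  epsilon, n_accept, rng):
    """Plain ABC rejection sampling (illustrative, not ABC-SMC).

    Keeps parameter draws whose simulated summary statistic is within
    `epsilon` (absolute difference) of the observed summary.
    """
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)
        if abs(simulate_summary(theta, rng) - observed_summary) <= epsilon:
            accepted.append(theta)
    return accepted


def prior_sample(rng):
    # Assumed prior: uniform over [-5, 5].
    return rng.uniform(-5.0, 5.0)


def simulate_summary(theta, rng):
    # Assumed model: 20 Gaussian draws with mean theta, unit variance;
    # the summary statistic is the sample mean.
    return statistics.mean([rng.gauss(theta, 1.0) for _ in range(20)])


rng = random.Random(0)
observed_summary = statistics.mean([rng.gauss(2.0, 1.0) for _ in range(20)])
posterior = abc_rejection(observed_summary, prior_sample, simulate_summary,
                          epsilon=0.2, n_accept=200, rng=rng)
print("approximate posterior mean:", statistics.mean(posterior))
```

ABC-SMC schemes refine this idea by running several generations with a decreasing acceptance threshold and proposing new parameters from the previously accepted population, which wastes far fewer simulations than sampling from the prior every time.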

BOFdat: generating biomass objective function stoichiometric coefficients from experimental data

10.1101/243881 ◽

2018 ◽

Cited By ~ 4

Author(s):

Jean-Christophe Lachance ◽

Jonathan M. Monk ◽

Colton J. Lloyd ◽

Yara Seif ◽

Bernhard O. Palsson ◽

...

Keyword(s):

Experimental Data ◽

Objective Function ◽

Source Code ◽

Cell Composition ◽

Link Type ◽

Scale Models ◽

Relative Abundances ◽

Genome Scale ◽

Python Package

Abstract: Genome-scale models (GEMs) rely on a biomass objective function (BOF) to predict phenotype from genotype. Here we present BOFdat, a Python package that offers functions to generate biomass objective function stoichiometric coefficients (BOFsc) from macromolecular cell composition and relative abundances of macromolecules obtained from omic datasets. Growth-associated and non-growth-associated maintenance (GAM and NGAM) costs can also be calculated by BOFdat. BOFdat is freely available on the Python Package Index (pip install BOFdat). The source code and an example usage (Jupyter Notebook and example files) are available on GitHub (https://github.com/jclachance/BOFdat). The documentation and API are available through ReadTheDocs (https://bofdat.readthedocs.io).

DiNGO: standalone application for Gene Ontology and Human Phenotype Ontology term enrichment analysis

Bioinformatics ◽

10.1093/bioinformatics/btz836 ◽

2019 ◽

Author(s):

Radoslav Davidović ◽

Vladimir Perovic ◽

Branislava Gemovic ◽

Nevena Veljkovic

Keyword(s):

Gene Ontology ◽

Source Code ◽

Enrichment Analysis ◽

Human Phenotype Ontology ◽

Supplementary Information ◽

Phenotype Ontology ◽

Ontology Term ◽

Human Phenotype ◽

Term Enrichment Analysis ◽

Term Enrichment

Abstract. Summary: Although various tools for Gene Ontology (GO) term enrichment analysis are available, there is still room for improvement. Hence, we present DiNGO, a standalone application based on open source code from BiNGO, a widely used application to assess the overrepresentation of GO categories. Besides facilitating GO term enrichment analyses, DiNGO has been developed to allow for convenient Human Phenotype Ontology (HPO) term overrepresentation investigation. This is an important contribution considering the increasing interest in HPO in scientific research and its potential in clinical settings. DiNGO supports gene/protein identifier conversion and automatic updating of GO and HPO annotation resources. Finally, DiNGO can rapidly process a large amount of data due to its multithreaded design. Availability and Implementation: DiNGO is implemented in Java, and its source code, example datasets and instructions are available on GitHub: https://github.com/radoslav180/DiNGO. A pre-compiled jar file is available at: https://www.vin.bg.ac.rs/180/tools/DiNGO.php. Supplementary information: Supplementary data are available at Bioinformatics online.

dRep: A tool for fast and accurate genome de-replication that enables tracking of microbial genotypes and improved genome recovery from metagenomes

10.1101/108142 ◽

2017 ◽

Cited By ~ 3

Author(s):

Matthew R. Olm ◽

Christopher T. Brown ◽

Brandon Brooks ◽

Jillian F. Banfield

Keyword(s):

Time Series ◽

Source Code ◽

Computational Time ◽

Average Nucleotide Identity ◽

Microbial Genomes ◽

Large Genome ◽

Link Type ◽

Assembly Method ◽

Inaccurate Estimation ◽

Genome Distance

The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that sequentially applies a fast, inaccurate estimation of genome distance and a slow but accurate measure of average nucleotide identity to reduce the computational time for pair-wise genome set comparisons by orders of magnitude. We demonstrate its use in a study where we separately assembled each metagenome from time series datasets. Groups of essentially identical genomes were identified with dRep, and the best genome from each set was selected. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using the typical co-assembly method. Documentation is available at http://drep.readthedocs.io/en/master/ and source code is available at https://github.com/MrOlm/drep.
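
The two-stage strategy described above can be sketched as follows. This is a toy illustration, not dRep's code: the cheap estimate is stood in for by a Jaccard distance on k-mer sets, the accurate measure by a position-wise identity, and the sequences, function names, and distance cutoff are all hypothetical. The point is that the expensive comparison runs only on pairs that the cheap estimate has already grouped together.

```python
from itertools import combinations


def kmer_set(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fast_distance(a, b):
    """Cheap stand-in for a fast, rough estimate: Jaccard distance of k-mer sets."""
    ka, kb = kmer_set(a), kmer_set(b)
    return 1.0 - len(ka & kb) / len(ka | kb)


def accurate_identity(a, b):
    """Stand-in for a slow, accurate comparison: fraction of matching positions."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def primary_clusters(genomes, max_distance):
    """Single-linkage grouping that uses only the cheap distance (union-find)."""
    parent = {name: name for name in genomes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if fast_distance(genomes[a], genomes[b]) <= max_distance:
            parent[find(a)] = find(b)
    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def two_stage_compare(genomes, max_distance=0.5):
    """Run the accurate comparison only within each primary cluster."""
    results = {}
    for cluster in primary_clusters(genomes, max_distance):
        for a, b in combinations(cluster, 2):
            results[(a, b)] = accurate_identity(genomes[a], genomes[b])
    return results


# Made-up toy sequences: two pairs of near-identical "genomes".
genomes = {
    "g1": "ACGTACGTTTGACCA",
    "g2": "ACGTACGTTTGACCT",
    "g3": "GGGCCCGGGCCCAAA",
    "g4": "GGGCCCGGGCCCAAT",
}
results = two_stage_compare(genomes)
print(results)
```

With four genomes there are six possible pairs, but only the two within-cluster pairs reach the accurate comparison; the saving grows quickly as the number of genomes and clusters increases.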