NeuroPycon: An open-source Python toolbox for fast multi-modal and reproducible brain connectivity pipelines

Abstract. Recent years have witnessed a massive push towards reproducible research in neuroscience. Unfortunately, this endeavor is often challenged by the large diversity of tools used, project-specific custom code and the difficulty of tracking all user-defined parameters. NeuroPycon is an open-source multi-modal brain data analysis toolkit which provides Python-based template pipelines for advanced multi-processing of MEG, EEG, functional and anatomical MRI data, with a focus on connectivity and graph theoretical analyses. Importantly, it provides shareable parameter files to facilitate replication of all analysis steps. NeuroPycon is based on the NiPype framework, which facilitates data analyses by wrapping many commonly used neuroimaging software tools into a common Python environment. In other words, rather than being brain imaging software with its own implementation of standard algorithms for brain signal processing, NeuroPycon seamlessly integrates existing packages (coded in Python, Matlab or other languages) into a unified Python framework. Thanks to the multi-threaded processing and computational efficiency afforded by NiPype, NeuroPycon also provides an easy option for fast parallel processing, which is critical when handling large sets of multi-dimensional brain data. Moreover, its flexible design allows users to easily configure analysis pipelines by connecting distinct nodes to each other. Each node can be a Python-wrapped module, a user-defined function or a well-established tool (e.g. MNE-Python for MEG analysis, Radatools for graph theoretical metrics, etc.). Last but not least, the ability to fully describe any pipeline with NeuroPycon parameter files is an important feature for reproducibility, as these files can be shared and used for easy replication by others.
The current implementation of NeuroPycon contains two complementary packages. The first, called ephypype, includes pipelines for electrophysiology analysis and a command-line interface for on-the-fly pipeline creation. Current implementations allow for MEG/EEG data import, pre-processing and cleaning by automatic removal of ocular and cardiac artefacts, in addition to sensor- or source-level connectivity analyses. The second package, called graphpype, is designed to investigate functional connectivity via a wide range of graph-theoretical metrics, including modular partitions. The present article describes the philosophy, architecture and functionalities of the toolkit and provides illustrative examples through interactive notebooks. NeuroPycon is available for download via GitHub (https://github.com/neuropycon), and the two principal packages are documented online (https://neuropycon.github.io/ephypype/index.html and https://neuropycon.github.io/graphpype/index.html). Future developments include fusion of multi-modal data (e.g. MEG and fMRI, or intracranial EEG and fMRI). We hope that the release of NeuroPycon will attract many users and new contributors, and facilitate the efforts of our community towards open-source tool sharing and development, as well as scientific reproducibility.
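
The idea that a shareable parameter file can fully describe a pipeline of connected nodes can be illustrated with a minimal sketch. It is plain Python and deliberately does not use NiPype or NeuroPycon: the toy nodes `keep_in_range` and `scaled_mean`, the JSON layout and `run_pipeline` are all hypothetical stand-ins for wrapped tools such as MNE-Python or Radatools.

```python
import json

# Toy "nodes" (hypothetical). In NeuroPycon these would be NiPype-wrapped tools.
def keep_in_range(data, low, high):
    """Keep only samples whose value lies in [low, high]."""
    return [x for x in data if low <= x <= high]

def scaled_mean(data, scale):
    """Return the mean of the samples multiplied by `scale`."""
    return scale * sum(data) / len(data)

# Registry mapping node names used in parameter files to functions.
NODES = {"keep_in_range": keep_in_range, "scaled_mean": scaled_mean}

def run_pipeline(params, data):
    """Run the nodes listed in `params` in order, feeding each output to the next node."""
    result = data
    for step in params["pipeline"]:
        node = NODES[step["node"]]
        result = node(result, **step.get("params", {}))
    return result

# A shareable parameter file: node order plus every user-defined parameter.
PARAMS_JSON = """
{
  "pipeline": [
    {"node": "keep_in_range", "params": {"low": 1.0, "high": 40.0}},
    {"node": "scaled_mean", "params": {"scale": 0.5}}
  ]
}
"""

params = json.loads(PARAMS_JSON)
print(run_pipeline(params, [0.5, 2.0, 10.0, 50.0]))  # prints 3.0
```

Because a run is determined entirely by the JSON text and the input data, sharing that text is enough for someone else to repeat the same sequence of steps with the same parameters.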

Dashing: fast and accurate genomic distances with HyperLogLog

Genome Biology ◽

10.1186/s13059-019-1875-0 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 9

Author(s):

Daniel N. Baker ◽

Ben Langmead

Keyword(s):

Open Source ◽

Software Tool ◽

Estimation Methods ◽

Cardinality Estimation ◽

Link Type ◽

Wide Range

Abstract. Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
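
To make the HyperLogLog idea concrete, here is a minimal sketch in plain Python. It uses the classic HyperLogLog estimator (with linear counting for small sets), merges sketches by register-wise maximum to get the union, and estimates the intersection by inclusion-exclusion. Dashing's own union/intersection estimators are more refined, so this is a simpler stand-in, not its implementation; the SHA-1-based hash, the defaults k=21 and p=12, and the names `HyperLogLog`, `sketch` and `jaccard` are assumptions.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers over a 64-bit hash."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # First 8 bytes of SHA-1 as a 64-bit integer (any well-mixed hash would do).
        return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        q = 64 - self.p
        idx = h >> q                      # top p bits select a register
        rest = h & ((1 << q) - 1)         # remaining q bits
        rank = q - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other):
        # The sketch of A ∪ B is the register-wise maximum of the two sketches.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small cardinalities
        return raw

def sketch(sequence, k=21, p=12):
    """Sketch all k-mers of a sequence (no reverse-complement handling)."""
    hll = HyperLogLog(p)
    for i in range(len(sequence) - k + 1):
        hll.add(sequence[i:i + k])
    return hll

def jaccard(a, b):
    """Estimate |A ∩ B| / |A ∪ B| via inclusion-exclusion on estimated cardinalities."""
    union = a.union(b).cardinality()
    intersection = max(0.0, a.cardinality() + b.cardinality() - union)
    return intersection / union
```

For two sets that share a third of their union, `jaccard` should return a value near 1/3; the relative error of each cardinality estimate shrinks roughly as 1.04/sqrt(2**p), at a memory cost of 2**p small registers per genome.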

GemPy 1.0: open-source stochastic geological modeling and inversion

Geoscientific Model Development ◽

10.5194/gmd-12-1-2019 ◽

2019 ◽

Vol 12 (1) ◽

pp. 1-32 ◽

Cited By ~ 11

Author(s):

Miguel de la Varga ◽

Alexander Schaaf ◽

Florian Wellmann

Keyword(s):

Open Source ◽

Code Generation ◽

Raw Material ◽

Reproducible Research ◽

Geological Modeling ◽

Fault Surface ◽

Wide Range ◽

Geological Models ◽

Density Values ◽

Efficient Code

Abstract. The representation of subsurface structures is an essential aspect of a wide variety of geoscientific investigations and applications, ranging from geofluid reservoir studies through raw material investigations to geosequestration, as well as many branches of geoscientific research and applications in geological surveys. A wide range of methods exist to generate geological models. However, the powerful methods are behind a paywall in expensive commercial packages. We present here a full open-source geomodeling method, based on an implicit potential-field interpolation approach. The interpolation algorithm is comparable to implementations in commercial packages and capable of constructing complex full 3-D geological models, including fault networks, fault–surface interactions, unconformities and dome structures. This algorithm is implemented in the programming language Python, making use of a highly efficient underlying library for code generation (Theano) that enables direct execution on GPUs. The functionality can be separated into the core aspects required to generate 3-D geological models and additional assets for advanced scientific investigations. These assets provide the full power behind our approach, as they enable the link to machine-learning and Bayesian inference frameworks and thus a path to stochastic geological modeling and inversions. In addition, we provide methods to analyze model topology and to compute gravity fields on the basis of the geological models and assigned density values. In summary, we provide a basis for open scientific research using geological models, with the aim of fostering reproducible research in the field of geomodeling.
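
Implicit modeling represents geological surfaces as iso-values of an interpolated scalar field. As a much simplified illustration, the sketch below interpolates a 2-D scalar field from interface points with prescribed values using Gaussian radial basis functions plus a linear drift, then assigns layers by thresholding the field. GemPy's method is a cokriging formulation that also honours orientation (gradient) data and does not need the scalar values to be prescribed, so this is not its algorithm; the kernel, the length scale, the point coordinates and all function names are assumptions for illustration.

```python
import math

def kernel(p, q, length=2.0):
    """Gaussian radial basis function of the distance between 2-D points p and q."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / length ** 2)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_scalar_field(points, values, length=2.0):
    """Fit s(x, z) = sum_i w_i k((x, z), p_i) + c0 + c1*x + c2*z through the data."""
    n = len(points)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            a[i][j] = kernel(p, q, length)
        for j, basis in enumerate((1.0, p[0], p[1])):
            a[i][n + j] = basis  # linear drift terms
            a[n + j][i] = basis  # side conditions on the RBF weights
    coeffs = solve(a, list(values) + [0.0, 0.0, 0.0])
    w, c = coeffs[:n], coeffs[n:]

    def field(x):
        s = sum(wi * kernel(x, p, length) for wi, p in zip(w, points))
        return s + c[0] + c[1] * x[0] + c[2] * x[1]
    return field

def layer_id(value, interface_values=(1.0, 2.0)):
    """Layer index = number of interface iso-values at or below the field value."""
    return sum(value >= v for v in interface_values)

# Two parallel dipping interfaces, z = 0.5x + 1 (value 1) and z = 0.5x + 3 (value 2).
points = [(0.0, 1.0), (2.0, 2.0), (4.0, 3.0), (0.0, 3.0), (2.0, 4.0), (4.0, 5.0)]
values = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
field = fit_scalar_field(points, values)
print([layer_id(field((2.0, z))) for z in (0.0, 3.0, 6.0)])  # prints [0, 1, 2]
```

The linear drift keeps the field from decaying to zero away from the data, which is why a trend term appears in kriging-type geomodeling formulations; the RBF part then adds local curvature where the interfaces bend.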

FastTrack: An open-source software for tracking varying numbers of deformable objects

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008697 ◽

2021 ◽

Vol 17 (2) ◽

pp. e1008697

Author(s):

Benjamin Gallois ◽

Raphaël Candelier

Keyword(s):

Open Source ◽

Cell Tracking ◽

Ad Hoc ◽

Ground Truth ◽

General Purpose ◽

Two Dimensions ◽

Deformable Objects ◽

Tracking Accuracy ◽

Link Type ◽

Wide Range

Analyzing the dynamical properties of mobile objects requires extracting trajectories from recordings, which is often done by tracking movies. We compiled a database of two-dimensional movies for very different biological and physical systems spanning a wide range of length scales and developed a general-purpose, optimized, open-source, cross-platform, easy to install and use, self-updating software called FastTrack. It can handle a changing number of deformable objects in a region of interest, and is particularly suitable for animal and cell tracking in two dimensions. Furthermore, we introduce the probability of incursions as a new measure of a movie's trackability that doesn't require knowledge of ground truth trajectories, since it is resilient to small amounts of error and can be computed on the basis of an ad hoc tracking. We also leveraged the versatility and speed of FastTrack to implement an iterative algorithm that determines a set of nearly optimized tracking parameters, further reducing the amount of human intervention, and demonstrate that FastTrack can be used to explore the space of tracking parameters to optimize the number of swaps for a batch of similar movies. A benchmark shows that FastTrack is orders of magnitude faster than state-of-the-art tracking algorithms, with comparable tracking accuracy. The source code is available under the GNU GPLv3 at https://github.com/FastTrackOrg/FastTrack and pre-compiled binaries for Windows, Mac and Linux are available at http://www.fasttrack.sh.
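
Frame-to-frame linking with a changing number of objects can be sketched as follows, assuming each frame has already been segmented into object centroids. This sketch uses a plain Euclidean cost and greedy nearest-neighbour matching with a distance cut-off; FastTrack's own cost function and assignment method are described in the paper and are not reproduced here. The functions `match_frames` and `track` and the `max_dist` parameter are hypothetical.

```python
import math

def match_frames(tracks, detections, max_dist):
    """Greedily link last known track positions to detections in the new frame.

    tracks: dict track_id -> (x, y); detections: list of (x, y).
    Returns (assignment dict track_id -> detection index, unmatched detection indices).
    """
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in tracks.items()
        for j, det in enumerate(detections)
    )
    used_tracks, used_dets, assignment = set(), set(), {}
    for d, tid, j in pairs:
        if d > max_dist:
            break  # all remaining pairs are even farther apart
        if tid in used_tracks or j in used_dets:
            continue
        assignment[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used_dets]
    return assignment, unmatched

def track(frames, max_dist=5.0):
    """Build trajectories {track_id: [(frame, (x, y)), ...]} over a list of frames."""
    trajectories, active, next_id = {}, {}, 0
    for t, detections in enumerate(frames):
        assignment, unmatched = match_frames(active, detections, max_dist)
        new_active = {}
        for tid, j in assignment.items():
            trajectories[tid].append((t, detections[j]))
            new_active[tid] = detections[j]
        for j in unmatched:  # an unmatched detection starts a new trajectory
            trajectories[next_id] = [(t, detections[j])]
            new_active[next_id] = detections[j]
            next_id += 1
        active = new_active  # tracks without a match in this frame are closed
    return trajectories
```

Opening a trajectory for every unmatched detection and closing every unmatched track is how this toy version copes with objects entering and leaving the region of interest; a greedy matcher can produce identity swaps that an optimal assignment over the whole cost matrix would avoid.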

GemPy 1.0: open-source stochastic geological modeling and inversion

10.5194/gmd-2018-61 ◽

2018 ◽

Cited By ~ 4

Author(s):

Miguel de la Varga ◽

Alexander Schaaf ◽

Florian Wellmann

Keyword(s):

Open Source ◽

Code Generation ◽

Raw Material ◽

Reproducible Research ◽

Geological Modeling ◽

Fault Surface ◽

Wide Range ◽

Geological Models ◽

Density Values ◽

Efficient Code

Abstract. The representation of subsurface structures is an essential aspect of a wide variety of geoscientific investigations and applications, ranging from geofluid reservoir studies through raw material investigations to geosequestration, as well as many branches of geoscientific research studies and applications in geological surveys. A wide range of methods exists to generate geological models. However, the powerful methods in particular are behind a paywall in expensive commercial packages. We present here a full open-source geomodeling method, based on an implicit potential-field interpolation approach. The interpolation algorithm is comparable to implementations in commercial packages and capable of constructing complex full 3-D geological models, including fault networks, fault-surface interactions, unconformities, and dome structures. This algorithm is implemented in the programming language Python, making use of a highly efficient underlying library for code generation (Theano) that enables direct execution on GPUs. The functionality can be separated into the core aspects required to generate 3-D geological models and additional assets for advanced scientific investigations. These assets provide the full power behind our approach, as they enable the link to machine learning and Bayesian inference frameworks and thus a path to stochastic geological modeling and inversions. In addition, we provide methods to analyse model topology and to compute gravity fields on the basis of the geological models and assigned density values. In summary, we provide a basis for open scientific research using geological models, with the aim of fostering reproducible research in the field of geomodeling.

The metaRbolomics Toolbox in Bioconductor and beyond

Metabolites ◽

10.3390/metabo9100200 ◽

2019 ◽

Vol 9 (10) ◽

pp. 200 ◽

Cited By ~ 18

Author(s):

Jan Stanstrup ◽

Corey Broeckling ◽

Rick Helmus ◽

Nils Hoffmann ◽

Ewy Mathé ◽

...

Keyword(s):

Experimental Data ◽

Data Processing ◽

Open Source ◽

User Interfaces ◽

Workflow Management ◽

Biochemical Network ◽

Reproducible Research ◽

Resonance Spectroscopy ◽

Major Interest ◽

Wide Range

Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages primarily available on CRAN, Bioconductor and GitHub.

PhyD3: a phylogenetic tree viewer with extended phyloXML support for functional genomics data visualization

10.1101/107276 ◽

2017 ◽

Author(s):

Łukasz Kreft ◽

Alexander Botzki ◽

Frederik Coppens ◽

Klaas Vandepoele ◽

Michiel Van Bel

Keyword(s):

Phylogenetic Tree ◽

Functional Genomics ◽

Open Source ◽

Web Sites ◽

Phylogenetic Trees ◽

Biological Data ◽

Web Technologies ◽

Link Type ◽

Current Implementation ◽

Tree Viewer

Abstract. Motivation: Comparative and evolutionary studies utilise phylogenetic trees to analyse and visualise biological data. Recently, several web-based tools for the display, manipulation, and annotation of phylogenetic trees, such as iTOL and Evolview, have released updates to be compatible with the latest web technologies. While those web tools operate an open server access model with a multitude of registered users, a feature-rich open source solution using current web technologies is not available. Results: Here, we present an extension of the widely used PhyloXML standard with several new options to accommodate functional genomics or annotation datasets for advanced visualization. Furthermore, PhyD3 has been developed as a lightweight tool using the JavaScript library D3.js to achieve a state-of-the-art phylogenetic tree visualisation in the web browser, with support for advanced annotations. The current implementation is open source, easily adaptable and easy to integrate into third-party websites. Availability: More information about PhyD3 itself, installation procedures, and implementation links are available at http://phyd3.bits.vib.be and at http://github.com/vibbits/phyd3/.
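
PhyD3 builds on phyloXML. As an illustration of the standard format that PhyD3 extends, the sketch below writes a small rooted tree using only standard phyloXML elements (`phyloxml`, `phylogeny`, `clade`, `name`, `branch_length`) with Python's `xml.etree.ElementTree`. PhyD3's own extensions for functional genomics annotations are not reproduced here; the nested-tuple tree representation and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

NS = "http://www.phyloxml.org"

def _tag(name):
    # ElementTree's "{namespace}local" notation for namespaced tags.
    return f"{{{NS}}}{name}"

def add_clade(parent, node):
    """Append a (name, branch_length, children) tuple as a phyloXML <clade>."""
    name, branch_length, children = node
    clade = ET.SubElement(parent, _tag("clade"))
    if name is not None:
        ET.SubElement(clade, _tag("name")).text = name
    if branch_length is not None:
        ET.SubElement(clade, _tag("branch_length")).text = str(branch_length)
    for child in children:  # nested <clade> elements come after name/branch_length
        add_clade(clade, child)
    return clade

def to_phyloxml(tree):
    """Serialise a nested-tuple tree as a rooted phyloXML document string."""
    ET.register_namespace("", NS)  # write phyloXML as the default namespace
    root = ET.Element(_tag("phyloxml"))
    phylogeny = ET.SubElement(root, _tag("phylogeny"), rooted="true")
    add_clade(phylogeny, tree)
    return ET.tostring(root, encoding="unicode")

# Newick equivalent: (A:0.1,(B:0.3,C:0.4):0.2);
tree = (None, None, [("A", 0.1, []), (None, 0.2, [("B", 0.3, []), ("C", 0.4, [])])])
print(to_phyloxml(tree))
```

Annotation-specific extensions such as those described in the abstract would be added as further child elements of each `clade`; their exact element names are defined in the PhyD3 documentation rather than here.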

Dashing: Fast and Accurate Genomic Distances with HyperLogLog

10.1101/501726 ◽

2018 ◽

Cited By ~ 8

Author(s):

Daniel N Baker ◽

Ben Langmead

Keyword(s):

Open Source ◽

Software Tool ◽

Estimation Methods ◽

Cardinality Estimation ◽

Link Type ◽

Wide Range

SemEHR: A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research

10.1101/235622 ◽

2017 ◽

Cited By ~ 1

Author(s):

Honghan Wu ◽

Giulia Toti ◽

Katherine I. Morley ◽

Zina M. Ibrahim ◽

Amos Folarin ◽

...

Keyword(s):

Information Extraction ◽

Open Source ◽

Language Processing ◽

Semantic Search ◽

Trial Recruitment ◽

College Hospital ◽

Link Type ◽

Semantic Data ◽

Wide Range ◽

The UK

Abstract. Objective: Unlocking the data contained within both structured and unstructured components of Electronic Health Records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management and trial recruitment. To achieve this, we implemented SemEHR, an open-source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualised mentions of a wide range of biomedical concepts within EHRs. Natural Language Processing (NLP) annotations are further assembled at patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data is served via ontology-based search and analytics interfaces. Results: SemEHR has been deployed to a number of UK hospitals, including the Clinical Record Interactive Search (CRIS), an anonymised replica of the EHR of the UK South London and Maudsley (SLaM) NHS Foundation Trust, one of Europe's largest providers of mental health services. In two CRIS-based studies, SemEHR achieved F-measures of 93% (Hepatitis C case) and 99% (HIV case) in identifying true positive patients. At King's College Hospital in London, as part of the CogStack programme (github.com/cogstack), SemEHR is being used to recruit patients into the UK Dept of Health 100k Genome Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast in searching phenotypes: the time needed to check recruitment criteria was reduced from days to minutes. Validated on an open intensive care EHR dataset, MIMIC-III, the vital signs extracted by SemEHR achieve around 97% accuracy. Conclusion: Results from the multiple case studies demonstrate SemEHR's efficiency: in some cases, weeks or months of work can be done within hours or minutes. SemEHR provides a more comprehensive view of a patient, bringing in more and unexpected insight compared to study-oriented bespoke information extraction systems. SemEHR is open source and available at https://github.com/CogStack/SemEHR.
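
The patient-level assembly step can be sketched as grouping concept mentions by patient and ordering them in time. The annotation schema (`patient_id`, `concept`, `doc_date`, `negated`), the toy records and the helper names are hypothetical, not SemEHR's data model; treating negation as one example of the "contextualised mentions" the abstract describes is also an assumption. The F-measure quoted above is the harmonic mean of precision and recall, included as a one-line helper.

```python
from collections import defaultdict
from datetime import date

def build_timelines(annotations):
    """Group non-negated concept mentions by patient, ordered by document date."""
    timelines = defaultdict(list)
    for ann in annotations:
        if ann["negated"]:  # e.g. "no evidence of HIV" is not a positive mention
            continue
        timelines[ann["patient_id"]].append((ann["doc_date"], ann["concept"]))
    return {pid: sorted(events) for pid, events in timelines.items()}

def patients_with(timelines, concept):
    """Patient-level search: ids of patients whose timeline mentions `concept`."""
    return sorted(pid for pid, events in timelines.items()
                  if any(c == concept for _, c in events))

def f_measure(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

# Toy annotations (hypothetical).
annotations = [
    {"patient_id": "p1", "concept": "hepatitis C", "doc_date": date(2016, 5, 1), "negated": False},
    {"patient_id": "p1", "concept": "HIV", "doc_date": date(2015, 1, 3), "negated": True},
    {"patient_id": "p2", "concept": "HIV", "doc_date": date(2014, 2, 2), "negated": False},
    {"patient_id": "p1", "concept": "cirrhosis", "doc_date": date(2015, 7, 9), "negated": False},
]
timelines = build_timelines(annotations)
print(patients_with(timelines, "HIV"))  # prints ['p2']
```

Searching at the patient level rather than the document level is what lets a negated mention in one note be ignored while a positive mention in another note still counts.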

Human mitochondrial variant annotation with HmtNote

10.1101/600619 ◽

2019 ◽

Cited By ~ 3

Author(s):

R. Preste ◽

R. Clima ◽

M. Attimonelli

Keyword(s):

Open Source ◽

Online Resources ◽

Annotation Database ◽

Variant Annotation ◽

Internet Connection ◽

Link Type ◽

Wide Range ◽

Using Data ◽

Cross Reference ◽

Python Package

Abstract. HmtNote is a Python package to annotate human mitochondrial variants from VCF files. Variants are annotated using a wide range of information, which is grouped into basic, cross-reference, variability and prediction subsets, so that users can either select specific annotations of interest or use them all together. Annotations are performed using data from HmtVar, a recently published database of human mitochondrial variations, which collects information from several online resources and also offers in-house pathogenicity predictions. HmtNote also allows users to download a local annotation database that can be used to annotate variants offline, without relying on an internet connection. HmtNote is a free and open-source package and can be downloaded and installed from PyPI (https://pypi.org/project/hmtnote) or GitHub (https://github.com/robertopreste/HmtNote).
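
Offline annotation from a local database can be pictured as a keyed lookup that appends selected annotation subsets to each VCF record. The code below is not HmtNote's API or database layout: the dictionary keyed by (POS, REF, ALT), the subset names used as keys, the toy values and `annotate_vcf` are hypothetical, and it assumes one ALT allele per VCF record.

```python
def annotate_vcf(lines, local_db, subsets=("basic", "variability")):
    """Append selected annotation subsets to the INFO column of VCF records.

    local_db maps (POS, REF, ALT) -> {subset_name: {key: value}} (hypothetical layout).
    Header lines (starting with '#') are passed through unchanged.
    """
    annotated = []
    for line in lines:
        if line.startswith("#"):
            annotated.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        record = local_db.get((pos, ref, alt), {})
        extra = [f"{key}={value}"
                 for subset in subsets
                 for key, value in record.get(subset, {}).items()]
        if extra:
            info = fields[7]  # INFO is the 8th VCF column
            base = [] if info in (".", "") else [info]
            fields[7] = ";".join(base + extra)
        annotated.append("\t".join(fields))
    return annotated

# Toy local database and VCF lines (hypothetical values).
db = {(100, "A", "G"): {"basic": {"locus": "LOCUS1"},
                        "variability": {"freq": 0.01},
                        "predictions": {"score": 0.9}}}
vcf = ["##fileformat=VCFv4.2",
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
       "chrM\t100\t.\tA\tG\t.\tPASS\t.",
       "chrM\t200\t.\tC\tT\t.\tPASS\tDP=10"]
for row in annotate_vcf(vcf, db):
    print(row)
```

Passing a different `subsets` tuple mirrors the abstract's point that users can select only the annotation groups they need; variants missing from the local database are written back unchanged.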

BonMOLière: Small-Sized Libraries of Readily Purchasable Compounds, Optimized to Produce Genuine Hits in Biological Screens across the Protein Space

International Journal of Molecular Sciences ◽

10.3390/ijms22157773 ◽

2021 ◽

Vol 22 (15) ◽

pp. 7773

Author(s):

Neann Mathai ◽

Conrad Stork ◽

Johannes Kirchmair

Keyword(s):

Large Scale ◽

Computational Approach ◽

Large Sets ◽

Compound Libraries ◽

Wide Range ◽

Protein Space ◽

High Chance ◽

Large Scale Screening ◽

Early Drug ◽

Selection Of

Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the “fitness” of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle (“BonMOLière”).
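
The library-optimisation step can be pictured as a coverage problem over predicted targets. The sketch below is a generic greedy maximum-coverage heuristic, with an optional per-target weight as a crude stand-in for target novelty; it is not the authors' optimisation procedure, and drug-likeness filtering and bioactivity prediction are assumed to have happened upstream. All names and the toy data are hypothetical.

```python
def greedy_library(predicted_targets, size, weights=None):
    """Greedily pick up to `size` compounds maximising the weight of newly covered targets.

    predicted_targets: dict compound -> set of predicted protein targets
    weights: optional dict target -> weight (default 1.0), e.g. to favour novel targets
    """
    weights = weights or {}
    remaining = dict(predicted_targets)
    covered, library = set(), []

    def gain(compound):
        # Total weight of the targets this compound would add to the coverage.
        return sum(weights.get(t, 1.0) for t in remaining[compound] - covered)

    while remaining and len(library) < size:
        best = max(sorted(remaining), key=gain)  # sorted() gives a deterministic tie-break
        library.append(best)
        covered |= remaining.pop(best)
    return library, covered

# Toy predictions (hypothetical compounds and targets).
preds = {"c1": {"T1", "T2"}, "c2": {"T2", "T3", "T4"}, "c3": {"T1"}, "c4": {"T5"}}
print(greedy_library(preds, 2)[0])                     # prints ['c2', 'c1']
print(greedy_library(preds, 2, weights={"T5": 5.0})[0])  # prints ['c4', 'c2']
```

Greedy selection is a standard heuristic for maximum coverage and is guaranteed to reach at least a (1 - 1/e) fraction of the optimal covered weight; a "fitness" score such as the one reported in the abstract could combine several such terms.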

Download Full-text