Dashing: fast and accurate genomic distances with HyperLogLog

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Daniel N. Baker ◽  
Ben Langmead

Abstract Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
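
The idea of estimating genome similarity from HyperLogLog sketches can be illustrated with a minimal Python sketch. It assumes the classic HyperLogLog estimator with a linear-counting correction for small counts, and plain inclusion-exclusion for the intersection; Dashing's own estimators are specialized for unions and intersections and are not reproduced here. The names `HyperLogLog`, `sketch_kmers` and `jaccard` are hypothetical and are not part of Dashing's interface.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch with 2**p registers (illustrative, not Dashing's code)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash: the top p bits pick a register, the rest feed the rank.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        assert self.p == other.p
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = m * math.log(m / zeros)
        return estimate


def sketch_kmers(sequence, k, p=14):
    """Sketch the set of k-mers of a sequence (no canonicalization, an assumption)."""
    hll = HyperLogLog(p)
    for i in range(len(sequence) - k + 1):
        hll.add(sequence[i:i + k])
    return hll


def jaccard(a, b):
    """Estimate |A ∩ B| / |A ∪ B| by inclusion-exclusion on sketch cardinalities."""
    union = a.union(b).cardinality()
    intersection = a.cardinality() + b.cardinality() - union
    return max(0.0, intersection) / union if union else 0.0
```

Two genomes would be compared with `jaccard(sketch_kmers(seq_a, k), sketch_kmers(seq_b, k))`. Because the union sketch is just a register-wise maximum, pairwise comparison never revisits the original sequences, which is what makes all-against-all comparison of many genomes cheap.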


2018 ◽  
Vol 18 ◽  
pp. 15 ◽  
Author(s):  
Václav Rada ◽  
Tomáš Fíla ◽  
Petr Zlámal ◽  
Daniel Kytýř ◽  
Petr Koudelka

In recent years, open-source applications have replaced proprietary software in many fields. Open-source software tools based on the Linux operating system are especially widely used. For CNC solutions, the open-source system LinuxCNC can be used. However, the LinuxCNC control software and the graphical user interface (GUI) could be developed only on top of the Hardware Abstraction Layer. The LinuxCNC community has since provided a Python Interface, which allows a CNC machine to be controlled from the Python programming language, so the whole control software can be developed in Python. The paper focuses on the development of multi-process control software, mainly for in-house developed loading devices operated at our institute. The software tool is based on the LinuxCNC Python Interface and the Qt framework, which makes the software modular and easy to adapt to various devices.


2021 ◽  
Author(s):  
Richard Beare ◽  
Bonnie Alexander ◽  
Aaron Warren ◽  
Michael Kean ◽  
Marc Seal ◽  
...  

Abstract Submitted to Magnetic Resonance in Medicine. Purpose: To introduce a tool allowing neurosurgeons to evaluate the results of research tractography workflows for presurgical planning and intraoperative image-guidance, using standard neurosurgical navigation platforms. Theory and Methods: Improving communication between neurosurgeons and researchers developing new image acquisition and processing methods is critical for rapid translation of research to surgical practice. Presenting research outputs within existing clinical workflows is one approach that can assist such interdisciplinary communication. Neurosurgical navigation platforms can display and manipulate a wide range of medical image data and associated delineations and thus allow clinicians to evaluate the impact of new imaging research on their work. Currently, it is extremely difficult to integrate research-based image processing outputs into standard neurosurgical navigation platforms. Results: In this note we introduce Karawun, an open-source software tool for converting outputs from research imaging pipelines, especially diffusion MRI tractography reconstructions using advanced methodologies currently unavailable on commercial navigation platforms, into forms that can be imported into the Brainlab neurosurgical navigation platform (Brainlab AG, Munich, Germany). The externally created tractography images and delineations can be viewed and manipulated as if they were created by Brainlab. We illustrate how two surgical workups, created using open-source tools and different processing choices, can be presented to the neurosurgeon, who can evaluate the impact of the differences between the two workups on surgical decisions. Conclusion: Karawun allows researchers developing novel imaging methodologies to display their results in environments that are familiar to clinical end-users, especially neurosurgeons, thus assisting translation of research into clinical practice.


2019 ◽  
Author(s):  
David Meunier ◽  
Annalisa Pascarella ◽  
Dmitrii Altukhov ◽  
Mainak Jas ◽  
Etienne Combrisson ◽  
...  

Abstract Recent years have witnessed a massive push towards reproducible research in neuroscience. Unfortunately, this endeavor is often challenged by the large diversity of tools used, project-specific custom code and the difficulty of tracking all user-defined parameters. NeuroPycon is an open-source multi-modal brain data analysis toolkit which provides Python-based template pipelines for advanced multi-processing of MEG, EEG, functional and anatomical MRI data, with a focus on connectivity and graph theoretical analyses. Importantly, it provides shareable parameter files to facilitate replication of all analysis steps. NeuroPycon is based on the NiPype framework, which facilitates data analyses by wrapping many commonly-used neuroimaging software tools into a common Python environment. In other words, rather than being brain imaging software with its own implementation of standard algorithms for brain signal processing, NeuroPycon seamlessly integrates existing packages (coded in Python, Matlab or other languages) into a unified Python framework. Importantly, thanks to the multi-threaded processing and computational efficiency afforded by NiPype, NeuroPycon provides an easy option for fast parallel processing, which is critical when handling large sets of multi-dimensional brain data. Moreover, its flexible design allows users to easily configure analysis pipelines by connecting distinct nodes to each other. Each node can be a Python-wrapped module, a user-defined function or a well-established tool (e.g. MNE-Python for MEG analysis, Radatools for graph theoretical metrics, etc.). Last but not least, the ability to use NeuroPycon parameter files to fully describe any pipeline is an important feature for reproducibility, as they can be shared and used for easy replication by others.
The current implementation of NeuroPycon contains two complementary packages. The first, called ephypype, includes pipelines for electrophysiology analysis and a command-line interface for on-the-fly pipeline creation. Current implementations allow for MEG/EEG data import, pre-processing and cleaning by automatic removal of ocular and cardiac artefacts, in addition to sensor or source-level connectivity analyses. The second package, called graphpype, is designed to investigate functional connectivity via a wide range of graph-theoretical metrics, including modular partitions. The present article describes the philosophy, architecture, and functionalities of the toolkit and provides illustrative examples through interactive notebooks. NeuroPycon is available for download via GitHub (https://github.com/neuropycon) and the two principal packages are documented online (https://neuropycon.github.io/ephypype/index.html and https://neuropycon.github.io/graphpype/index.html). Future developments include fusion of multi-modal data (e.g. MEG and fMRI or intracranial EEG and fMRI). We hope that the release of NeuroPycon will attract many users and new contributors, and facilitate the efforts of our community towards open source tool sharing and development, as well as scientific reproducibility.


2021 ◽  
Vol 17 (2) ◽  
pp. e1008697
Author(s):  
Benjamin Gallois ◽  
Raphaël Candelier

Analyzing the dynamical properties of mobile objects requires extracting trajectories from recordings, which is often done by tracking movies. We compiled a database of two-dimensional movies for very different biological and physical systems spanning a wide range of length scales and developed a general-purpose, optimized, open-source, cross-platform, easy to install and use, self-updating software called FastTrack. It can handle a changing number of deformable objects in a region of interest, and is particularly suitable for animal and cell tracking in two dimensions. Furthermore, we introduce the probability of incursions as a new measure of a movie’s trackability that doesn’t require the knowledge of ground truth trajectories, since it is resilient to small amounts of errors and can be computed on the basis of an ad hoc tracking. We also leveraged the versatility and speed of FastTrack to implement an iterative algorithm determining a set of nearly-optimized tracking parameters, further reducing the amount of human intervention, and demonstrate that FastTrack can be used to explore the space of tracking parameters to optimize the number of swaps for a batch of similar movies. A benchmark shows that FastTrack is orders of magnitude faster than state-of-the-art tracking algorithms, with a comparable tracking accuracy. The source code is available under the GNU GPLv3 at https://github.com/FastTrackOrg/FastTrack and pre-compiled binaries for Windows, Mac and Linux are available at http://www.fasttrack.sh.


2016 ◽  
Author(s):  
Jacob Pritt ◽  
Ben Langmead

Abstract We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler.
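
The notion of a genomic coverage vector can be made concrete with a small Python sketch. It assumes alignments are given as half-open `(start, end)` intervals on a single reference and uses simple run-length encoding to show why such a vector compresses well; this is an illustration only and does not reflect Boiler's actual file format or algorithms. The function names `coverage_vector` and `run_length_encode` are hypothetical.

```python
from itertools import groupby


def coverage_vector(alignments, genome_length):
    """Return per-base read depth for half-open (start, end) alignment intervals."""
    # Difference array: +1 where an alignment starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    # A running sum of the differences gives the depth at each position.
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(coverage):
    """Collapse runs of equal depth into (depth, run_length) pairs."""
    return [(depth, len(list(run))) for depth, run in groupby(coverage)]
```

For example, `coverage_vector([(0, 4), (2, 6), (5, 8)], 10)` gives `[1, 1, 2, 2, 1, 2, 1, 1, 0, 0]`. The per-read boundaries are no longer stored, which is why a tool that keeps only coverage needs additional summary distributions to recover plausible reads.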


2017 ◽  
Author(s):  
Honghan Wu ◽  
Giulia Toti ◽  
Katherine I. Morley ◽  
Zina M. Ibrahim ◽  
Amos Folarin ◽  
...  

Abstract Objective: Unlocking the data contained within both structured and unstructured components of Electronic Health Records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management and trial recruitment. To achieve this, we implemented SemEHR, an open-source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualised mentions of a wide range of biomedical concepts within EHRs. Natural Language Processing (NLP) annotations are further assembled at patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data is serviced via ontology-based search and analytics interfaces. Results: SemEHR has been deployed to a number of UK hospitals including the Clinical Record Interactive Search (CRIS), an anonymised replica of the EHR of the UK South London and Maudsley (SLaM) NHS Foundation Trust, one of Europe’s largest providers of mental health services. In two CRIS-based studies, SemEHR achieved 93% (Hepatitis C case) and 99% (HIV case) F-Measure results in identifying true positive patients. At King’s College Hospital in London, as part of the CogStack programme (github.com/cogstack), SemEHR is being used to recruit patients into the UK Dept of Health 100k Genome Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast in searching phenotypes: the time for recruitment criteria checking was reduced from days to minutes. Validated on an open intensive care EHR dataset, MIMIC-III, the vital signs extracted by SemEHR can achieve around 97% accuracy. Conclusion: Results from the multiple case studies demonstrate SemEHR’s efficiency: weeks or months of work can be done within hours or minutes in some cases.
SemEHR provides a more comprehensive view of a patient, bringing in more and unexpected insight compared to study-oriented bespoke information extraction systems. SemEHR is open source and available at https://github.com/CogStack/SemEHR.


2019 ◽  
Author(s):  
R. Preste ◽  
R. Clima ◽  
M. Attimonelli

Abstract HmtNote is a Python package to annotate human mitochondrial variants from VCF files. Variants are annotated using a wide range of information, which is grouped into basic, cross-reference, variability and prediction subsets so that users can either select specific annotations of interest or use them all together. Annotations are performed using data from HmtVar, a recently published database of human mitochondrial variations, which collects information from several online resources as well as offering in-house pathogenicity predictions. HmtNote also allows users to download a local annotation database that can be used to annotate variants offline, without having to rely on an internet connection. HmtNote is a free and open source package, and can be downloaded and installed from PyPI (https://pypi.org/project/hmtnote) or GitHub (https://github.com/robertopreste/HmtNote).


Author(s):  
Anne Krogh Nøhr ◽  
Kristian Hanghøj ◽  
Genis Garcia Erill ◽  
Zilong Li ◽  
Ida Moltke ◽  
...  

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.


Molecules ◽  
2021 ◽  
Vol 26 (15) ◽  
pp. 4504
Author(s):  
Muhanna Al-shaibani ◽  
Radin Maya Saphira Radin Mohamed ◽  
Nik Sidik ◽  
Hesham Enshasy ◽  
Adel Al-Gheethi ◽  
...  

The current review aims to summarise the biodiversity and biosynthesis of novel secondary metabolite compounds of the phylum Actinobacteria, and the diverse range of secondary metabolites produced, which vary depending on the ecological environments they inhabit. Actinobacteria create a wide range of bioactive substances that can be of great value to public health and the pharmaceutical industry. The literature analysis process for this review was conducted using the VOSviewer software tool to visualise the bibliometric networks of the most relevant databases from the Scopus database in the period between 2010 and 22 March 2021. Screening and exploring the available literature relating to the extreme environments and ecosystems that Actinobacteria inhabit aims to identify new strains of this major microorganism class, producing unique novel bioactive compounds. The knowledge gained from these studies is intended to encourage scientists in the natural product discovery field to identify and characterise novel strains containing various bioactive gene clusters with potential clinical applications. It is evident that Actinobacteria adapted to survive in extreme environments represent an important source of a wide range of bioactive compounds. Actinobacteria have a large number of secondary metabolite biosynthetic gene clusters. They can synthesise thousands of subordinate metabolites with different biological actions such as anti-bacterial, anti-parasitic, anti-fungal, anti-viral, anti-cancer and growth-promoting compounds. These are highly significant economically due to their potential applications in the food, nutrition and health industries and thus support our communities’ well-being.

