GUNC: Detection of Chimerism and Contamination in Prokaryotic Genomes

Abstract: Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genomes remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15–30% of pre-filtered ‘high quality’ metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality. Source code (GPLv3+): https://github.com/grp-bork/gunc

GUNC: detection of chimerism and contamination in prokaryotic genomes

Genome Biology ◽

10.1186/s13059-021-02393-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Askarbek Orakov ◽

Anthony Fullam ◽

Luis Pedro Coelho ◽

Supriya Khedkar ◽

Damian Szklarczyk ◽

...

Keyword(s):

Prokaryotic Genome ◽

High Quality ◽

Formidable Challenge ◽

Prokaryotic Genomes ◽

Full Complement ◽

Genome Assemblies

Abstract: Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genome assemblies remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome’s full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15–30% of pre-filtered “high-quality” metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality.
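
The core idea is that a clean genome should consist of contigs whose genes agree on a lineage. A toy sketch can illustrate this. It assumes every gene has already been assigned a lineage label. The `(contig, lineage)` input format and the `min_share` threshold are hypothetical. It is a simplification for illustration, not GUNC's own scoring or implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

GeneCall = Tuple[str, str]  # hypothetical input: (contig_id, lineage assigned to one gene)


def contig_majority_lineages(gene_calls: List[GeneCall]) -> Dict[str, Tuple[str, float]]:
    """Majority lineage of each contig and the fraction of its genes that agree with it."""
    per_contig: Dict[str, Counter] = {}
    for contig, lineage in gene_calls:
        per_contig.setdefault(contig, Counter())[lineage] += 1
    result = {}
    for contig, counts in per_contig.items():
        lineage, n = counts.most_common(1)[0]
        result[contig] = (lineage, n / sum(counts.values()))
    return result


def lineage_shares(gene_calls: List[GeneCall]) -> Dict[str, float]:
    """Share of all genes per lineage, crediting each contig's genes to its majority lineage."""
    majority = contig_majority_lineages(gene_calls)
    genes_per_contig = Counter(contig for contig, _ in gene_calls)
    shares: Counter = Counter()
    for contig, (lineage, _) in majority.items():
        shares[lineage] += genes_per_contig[contig]
    total = sum(shares.values())
    return {lineage: n / total for lineage, n in shares.items()}


def is_putative_chimera(gene_calls: List[GeneCall], min_share: float = 0.1) -> bool:
    """Flag a genome whose contigs split into lineages that each hold >= min_share of genes."""
    shares = sorted(lineage_shares(gene_calls).values(), reverse=True)
    return len(shares) > 1 and shares[1] >= min_share


# Hypothetical genome: contig c1 is mostly Escherichia, contig c2 is entirely Bacillus.
genes = [("c1", "Escherichia")] * 8 + [("c1", "Salmonella")] + [("c2", "Bacillus")] * 3
print(is_putative_chimera(genes))  # True
```

Each whole contig is credited to its majority lineage. As a result, a single stray gene (the Salmonella call on c1) does not trigger a flag, whereas an entire contig from another lineage does. This mirrors the contig-level homogeneity idea described in the abstract.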

ZGA: a flexible pipeline for read processing, de novo assembly and annotation of prokaryotic genomes

10.1101/2021.04.27.441618 ◽

2021 ◽

Author(s):

A.A. Korzhenkov

Keyword(s):

Genome Sequencing ◽

De Novo ◽

Wide Spectrum ◽

Source Code ◽

Routine Method ◽

Genome Sequences ◽

Bioinformatic Pipeline ◽

Internet Connection ◽

Link Type ◽

Prokaryotic Genomes

Abstract: Whole genome sequencing (WGS) has become a routine method and may be applied to a wide spectrum of scientific problems. Despite the increasing availability of genome sequencing itself, genome assembly and annotation can be a challenge for an inexperienced researcher. To solve this problem, a bioinformatic pipeline was developed that guides a user from raw sequencing reads to an annotated bacterial or archaeal genome ready for deposition in any INSDC database, such as NCBI, ENA or DDBJ. The pipeline is fully automated and does not require an internet connection after installation, which prevents data leakage and premature publication of genome sequences. The source code of the pipeline is freely available at https://github.com/laxeye/zga/. The software may be installed from popular repositories: Anaconda Cloud (https://anaconda.org/bioconda/zga/) and PyPI (https://pypi.org/project/zga/).

idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract: idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 isolates from raw sequencing data, based on a selected list of clade-defining markers. Using a public dataset, we show that idCOV, working directly from user-uploaded FastQ files, makes calls equivalent to those annotated by Nextstrain.org under all three common clade systems. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC clusters and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
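
Clade calling from a marker list can be sketched as picking the clade whose defining mutations are best represented among a sample's called variants. This is a hypothetical illustration, not idCOV's algorithm, which the abstract describes as phylogenetic. The clade names, mutation labels, cumulative marker sets and the `min_fraction` cutoff are all invented for the example.

```python
from typing import Dict, Optional, Set


def assign_clade(
    observed: Set[str],
    clade_markers: Dict[str, Set[str]],
    min_fraction: float = 0.8,
) -> Optional[str]:
    """Return the clade whose defining mutations are best covered by `observed`.

    Ties in coverage go to the clade with more markers (the more specific clade).
    Returns None if no clade reaches `min_fraction` coverage.
    """
    best: Optional[str] = None
    best_key = (0.0, 0)
    for clade, markers in clade_markers.items():
        if not markers:
            continue
        fraction = len(markers & observed) / len(markers)
        key = (fraction, len(markers))
        if fraction >= min_fraction and key > best_key:
            best, best_key = clade, key
    return best


# Hypothetical marker list; each clade lists its full (cumulative) set of markers.
MARKERS = {
    "A": {"C100T", "A200G"},
    "A.1": {"C100T", "A200G", "G300A"},
}

print(assign_clade({"C100T", "A200G", "G300A", "T999C"}, MARKERS))  # A.1
print(assign_clade({"C100T", "A200G"}, MARKERS))  # A
```

Breaking ties toward the larger marker set is one simple way to prefer a nested subclade over its parent when both are fully supported.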

GalaxyCloudRunner: enhancing scalable computing for Galaxy

10.1101/2020.05.28.121772 ◽

2020 ◽

Author(s):

N Goonasekera ◽

A Mahmoud ◽

J Chilton ◽

E Afgan

Keyword(s):

Source Code ◽

Supplementary Information ◽

Scalable Computing ◽

Link Type ◽

Cloud Providers ◽

Galaxy Server ◽

Cloud Resources

Abstract Summary: The existence of more than 100 public Galaxy servers with service quotas indicates a need for more compute resources for Galaxy to use. GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules, and the resources can be acquired dynamically and automatically from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack). Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan. Supplementary information: None.
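
Rule-based routing of the kind described can be sketched as an ordered list of predicates, where the first matching rule picks the destination. This is a generic illustration, not GalaxyCloudRunner's configuration format or API. The `Job` fields, destination names and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    tool_id: str
    input_size_gb: float
    user: str


# A rule pairs a predicate over a job with the name of a compute destination.
Rule = Tuple[Callable[[Job], bool], str]


def route(job: Job, rules: List[Rule], default: str = "local") -> str:
    """Return the destination of the first rule that matches, else `default`."""
    for predicate, destination in rules:
        if predicate(job):
            return destination
    return default


# Hypothetical rules: large inputs go to a high-memory cloud pool,
# aligner jobs burst to the cloud, everything else stays local.
RULES: List[Rule] = [
    (lambda j: j.input_size_gb > 50, "cloud-highmem"),
    (lambda j: j.tool_id.startswith("bwa"), "cloud-burst"),
]

print(route(Job("bwa_mem", 10.0, "alice"), RULES))  # cloud-burst
print(route(Job("fastqc", 1.0, "bob"), RULES))  # local
```

Rule order matters in this design. For example, a 100 GB `bwa_mem` job matches the first rule and is sent to `cloud-highmem`, not `cloud-burst`.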

OpenBioLink: a benchmarking framework for large-scale biomedical link prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa274 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4097-4098 ◽

Cited By ~ 3

Author(s):

Anna Breit ◽

Simon Ott ◽

Asan Agibetov ◽

Matthias Samwald

Keyword(s):

Link Prediction ◽

Large Scale ◽

Source Code ◽

Machine Learning Algorithms ◽

Knowledge Networks ◽

Supplementary Information ◽

Supplementary Data ◽

Biomedical Knowledge ◽

High Quality ◽

Baseline Evaluation

Abstract Summary: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results. Availability and implementation: Source code and data are openly available at https://github.com/OpenBioLink/OpenBioLink. Supplementary information: Supplementary data are available at Bioinformatics online.
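
Link prediction benchmarks commonly score a model by ranking the true entity of each held-out link against corrupted candidates. The sketch computes two widely used ranking metrics, hits@k and mean reciprocal rank (MRR). The abstract does not name OpenBioLink's metrics or tie handling. The optimistic tie rule (only strictly higher-scoring candidates push the true link down) and the example scores are therefore assumptions.

```python
from typing import List, Sequence


def rank_of_true(true_score: float, candidate_scores: Sequence[float]) -> int:
    """1-based rank of the true link; ties are resolved optimistically (assumption)."""
    return 1 + sum(1 for s in candidate_scores if s > true_score)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test links ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank over all test links."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical model scores for three held-out links: (true score, corrupted-candidate scores).
examples = [
    (0.9, [0.1, 0.5, 0.95]),
    (0.7, [0.2, 0.3]),
    (0.1, [0.4, 0.6, 0.8, 0.2]),
]
ranks = [rank_of_true(t, c) for t, c in examples]
print(ranks)  # [2, 1, 5]
print(hits_at_k(ranks, 3))  # ≈ 0.667
print(mean_reciprocal_rank(ranks))  # (1/2 + 1 + 1/5) / 3 ≈ 0.567
```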

An Automated Approach for Constructing Framework Instantiation Documentation

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194020500205 ◽

2020 ◽

Vol 30 (04) ◽

pp. 575-601

Author(s):

Raquel Fialho de Queiroz Lafetá ◽

Thiago Fialho de Queiroz Lafetá ◽

Marcelo de Almeida Maia

Keyword(s):

Empirical Study ◽

Human Subjects ◽

Source Code ◽

High Quality ◽

Source Code Analysis ◽

Code Analysis ◽

Empirical Assessment ◽

Significant Difference ◽

Api Documentation ◽

Substantial Effort

Understanding the APIs of application frameworks generally requires substantial effort. High-quality API documentation may alleviate this effort, but producing such documentation remains a major challenge for modern frameworks. To facilitate the production of framework instantiation documentation, we hypothesize that the framework code itself and the code of existing instantiations provide useful information. However, given the size and complexity of existing code, automated approaches are required to assist documentation production. Our goal is to assess an automated approach for constructing relevant documentation for framework instantiation based on source code analysis of the framework itself and of existing instantiations. Relevance is judged by comparing the generated documentation with traditional framework documentation in terms of time spent and correctness during instantiation activities, information usefulness, complexity of the activity, navigation, satisfaction, information localization and clarity. The proposed approach generates documentation in a cookbook style, where the recipes are programming activities that use the necessary API elements, driven by the framework features. We performed an empirical study consisting of three experiments with 44 human subjects executing real framework instantiations, comparing the proposed cookbooks with traditional manual framework documentation (baseline). Our empirical assessment shows that the generated cookbooks performed better than, or at least not significantly differently from, the traditional documentation, evidencing the effectiveness of the approach.
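
One ingredient of such an approach is mining existing instantiations for the framework extension points they use. This step can be sketched with Python's standard `ast` module, under several assumptions. The framework base classes `Plugin` and `View` are hypothetical. Counting overridden methods is only one possible signal. Ranking methods by how often instantiations override them is an assumed simplification of how recipe steps might be prioritized, not the authors' technique.

```python
import ast
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Hypothetical framework base classes that instantiations extend.
FRAMEWORK_CLASSES = {"Plugin", "View"}


def mine_instantiation(source: str) -> Dict[str, Counter]:
    """Count methods overridden in subclasses of framework classes in one source file."""
    usage: Dict[str, Counter] = defaultdict(Counter)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        bases = {b.id for b in node.bases if isinstance(b, ast.Name)}
        for base in bases & FRAMEWORK_CLASSES:
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    usage[base][item.name] += 1
    return usage


def build_recipes(sources: Iterable[str]) -> Dict[str, List[str]]:
    """For each framework class, list overridden methods from most to least common."""
    totals: Dict[str, Counter] = defaultdict(Counter)
    for source in sources:
        for base, methods in mine_instantiation(source).items():
            totals[base].update(methods)
    return {base: [name for name, _ in counts.most_common()] for base, counts in totals.items()}


# Two hypothetical instantiations; Helper does not extend a framework class and is ignored.
EXAMPLES = [
    "class MyPlugin(Plugin):\n    def setup(self): pass\n    def run(self): pass\n",
    "class Other(Plugin):\n    def run(self): pass\n\nclass Helper:\n    def run(self): pass\n",
]
print(build_recipes(EXAMPLES))  # {'Plugin': ['run', 'setup']}
```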

GraphAligner: rapid and versatile sequence-to-graph alignment

Genome Biology ◽

10.1186/s13059-020-02157-2 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

The State ◽

Graph Alignment ◽

Link Type ◽

Long Reads

Abstract: Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
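
The underlying problem is finding the graph path that best matches a sequence. A textbook dynamic program over a small acyclic graph, with one character per node, can illustrate it. This is a teaching sketch only. GraphAligner's own, much faster algorithms are not described in the abstract and are not reproduced here, and the node labels and edges are made up.

```python
from typing import List


def align_to_dag(query: str, labels: List[str], preds: List[List[int]]) -> int:
    """Minimum edit distance between `query` and the label string of any path in a DAG.

    labels[v] is the character at node v; nodes must be in topological order,
    so every predecessor in preds[v] has a smaller index than v.
    """
    m = len(query)
    dp: List[List[int]] = []  # dp[v][i]: cost of aligning query[:i] to a path ending at v
    for v, ch in enumerate(labels):
        # Best cost just before v: start a new path here (i unaligned query
        # characters cost i) or continue a path ending at a predecessor.
        before = [min([i] + [dp[u][i] for u in preds[v]]) for i in range(m + 1)]
        row = [before[0] + 1]  # v's character deleted, no query consumed
        for i in range(1, m + 1):
            row.append(min(
                before[i - 1] + (ch != query[i - 1]),  # match or mismatch
                before[i] + 1,                         # delete v's character
                row[i - 1] + 1,                        # insert query[i - 1]
            ))
        dp.append(row)
    # An empty path is also allowed: every query character is then inserted.
    return min([m] + [r[m] for r in dp])


# Hypothetical bubble graph A-C-(G|T)-A, spelling ACGA or ACTA.
LABELS = ["A", "C", "G", "T", "A"]
PREDS = [[], [0], [1], [1], [2, 3]]
print(align_to_dag("ACTA", LABELS, PREDS))  # 0
print(align_to_dag("ACCA", LABELS, PREDS))  # 1
```

Because a path may start at any node, the sketch handles reads that cover only part of the graph, such as "CGA" here.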

The Blood And Clot Thrombectomy Registry And Collaboration (BACTRAC) protocol: novel method for evaluating human stroke

Journal of NeuroInterventional Surgery ◽

10.1136/neurintsurg-2018-014118 ◽

2018 ◽

Vol 11 (3) ◽

pp. 265-270 ◽

Cited By ~ 11

Author(s):

Justin F Fraser ◽

Lisa A Collier ◽

Amy A Gorman ◽

Sarah R Martha ◽

Kathleen E Salmeron ◽

...

Keyword(s):

Ischemic Stroke ◽

Animal Models ◽

Mechanical Thrombectomy ◽

Tissue Banking ◽

Human Condition ◽

Molecular Response ◽

High Quality ◽

Link Type ◽

Arterial Blood ◽

First Time

Background: Ischemic stroke research faces difficulties in translating pathology between animal models and human patients to develop treatments. Mechanical thrombectomy, for the first time, offers a momentary window into the changes occurring in ischemia. We developed a tissue banking protocol to capture intracranial thrombi and the blood immediately proximal and distal to them. Objective: To develop and share a reproducible protocol to bank these specimens for future analysis. Methods: We established a protocol approved by the institutional review board for tissue processing during thrombectomy (www.clinicaltrials.gov, NCT03153683). The protocol was a joint clinical/basic science effort among multiple laboratories and the NeuroInterventional Radiology service line. We constructed a workspace in the angiography suite and developed a step-by-step process for specimen retrieval and processing. Results: Our protocol successfully yielded samples for analysis in all but one case. In our preliminary dataset, the process produced adequate amounts of tissue from distal blood, proximal blood, and thrombi for gene expression and proteomics analyses. We describe the tissue banking protocol and highlight training protocols and the mechanics of on-call research staffing. In addition, preliminary integrity analyses demonstrated high-quality yields of RNA and protein. Conclusions: We have developed a novel tissue banking protocol using mechanical thrombectomy to capture thrombus along with arterial blood proximal and distal to it. The protocol provides high-quality specimens, facilitating analysis of the initial molecular response to ischemic stroke in humans for the first time. This approach will permit reverse translation to animal models for treatment development.

Efficacy and Safety of Fuzi Formulae on the Treatment of Heart Failure as Complementary Therapy: A Systematic Review and Meta-Analysis of High-Quality Randomized Controlled Trials

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2019/9728957 ◽

2019 ◽

Vol 2019 ◽

pp. 1-21

Author(s):

Meng-Qi Yang ◽

Yong-Mei Song ◽

Huan-Yu Gao ◽

Yi-Tao Xue

Keyword(s):

Heart Failure ◽

Randomized Controlled Trials ◽

Meta Analysis ◽

Risk Of Bias ◽

Controlled Trials ◽

Efficacy And Safety ◽

High Quality ◽

Randomized Controlled ◽

Link Type

Objective. Heart failure is a major public health problem worldwide. However, its morbidity, mortality, and awareness remain unsatisfactory, as does the status of current treatments. In addition to the standard treatment for chronic heart failure (CHFST), Fuzi (the seminal root of Aconitum carmichaelii Debx.) formulae have long been widely used in clinical practice as a complementary treatment for heart failure. We aimed to assess the efficacy and safety of Fuzi formulae (FZF) in the treatment of heart failure on the basis of high-quality randomized controlled trials (RCTs). Methods. RCTs in PubMed, Cochrane Library, China National Knowledge Infrastructure (CNKI), Chinese Scientific Journals Database (VIP), and Wanfang Database were searched from their inception until June 2019. The U.S. National Library of Medicine registry (clinicaltrials.gov) and the Chinese Clinical Trial Registry (http://www.chictr.org.cn) were also searched. We included RCTs that tested the efficacy and safety of FZF for the treatment of heart failure compared with placebo, CHFST, or placebo plus CHFST. The methodological quality of the included studies was evaluated with the Cochrane Collaboration’s tool for assessing risk of bias, and only RCTs with a Cochrane risk of bias (RoB) score ≥4 were included in the analysis. The meta-analysis was conducted with RevMan 5.2 software, and the GRADE approach was used to assess the quality of the evidence. Results. Twelve RCTs with 1490 participants were identified. The studies compared FZF plus CHFST vs placebo plus CHFST (n = 4), FZF plus CHFST vs CHFST (n = 6), FZF plus digoxin tablets (DT) plus CHFST vs placebo plus DT plus CHFST (n = 1), and FZF plus placebo plus CHFST vs placebo plus DT plus CHFST (n = 1).
Meta-analysis indicated that, on top of CHFST, FZF provided additional benefits in reducing plasma NT-proBNP level, MLHFQ scores, Lee’s heart failure scores (LHFs), and composite cardiac events (CCEs). FZF also improved efficacy on TCM symptoms (TCMs), NYHA functional classification (NYHAfc), 6MWD, and LVEF. Adverse events were reported in 6 of the 12 studies, with no statistically significant difference between groups. However, when the strength of evidence was assessed, only the evidence for CCEs was of high quality; the evidence for the other outcomes was of moderate, low, or very low quality. Therefore, we could not draw firm conclusions about additional benefits other than for CCEs. Further clinical trials should be well designed to avoid the issues identified in this study. Conclusion. The efficacy and additional benefits of FZF for CCEs were supported by high-quality evidence according to GRADE. However, the efficacy and additional benefits for the other outcomes remain uncertain on the basis of current studies. In addition, the safety assessment has considerable room for improvement. Thus, further research is needed to provide more convincing evidence.
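
The pooling step of a meta-analysis can be illustrated with the fixed-effect inverse-variance method. In this method each trial's effect estimate is weighted by the reciprocal of its squared standard error. This is a generic sketch. The abstract does not state which pooling model or effect measures were used in RevMan 5.2. Random-effects models add a between-study variance term that is not shown here. The input numbers are hypothetical, not data from the review.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_pool(
    effects: Sequence[float], std_errors: Sequence[float], z: float = 1.96
) -> Tuple[float, float, Tuple[float, float]]:
    """Fixed-effect inverse-variance pooled estimate, its standard error, and a ~95% CI."""
    weights = [1.0 / se**2 for se in std_errors]  # precision weights
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical mean differences from two trials; the more precise trial dominates.
est, se, ci = inverse_variance_pool([0.0, 4.0], [1.0, 2.0])
print(round(est, 3), round(se, 3))  # 0.8 0.894
```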

Bringing data from curated pathway resources to Cytoscape with OmniPath

Bioinformatics ◽

10.1093/bioinformatics/btz968 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2632-2633 ◽

Cited By ~ 6

Author(s):

Francesco Ceccarelli ◽

Denes Turei ◽

Attila Gabor ◽

Julio Saez-Rodriguez

Keyword(s):

Source Code ◽

Large Body ◽

Supplementary Information ◽

Supplementary Data ◽

Network Resources ◽

High Quality ◽

Comprehensive Collection ◽

Intuitive Interface ◽

Growing Network

Abstract Summary: Multiple databases provide valuable information about curated pathways and other resources that can be used to build and analyze networks. OmniPath combines 61 network resources (a number that continues to grow) into a comprehensive collection with over 120 000 interactions. We present here the OmniPath App, a Cytoscape plugin to flexibly import data from OmniPath via a simple and intuitive interface. It thus makes it possible to directly access the large body of high-quality knowledge provided by OmniPath within Cytoscape, for inspection and further use with other tools. Availability and implementation: The OmniPath App has been developed for Cytoscape 3 in the Java programming language. The latest source code and the plugin can be found at https://github.com/saezlab/Omnipath_Cytoscape and http://apps.cytoscape.org/apps/omnipath, respectively. Supplementary information: Supplementary data are available at Bioinformatics online.
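
Importing curated interactions into a network view essentially means turning a table of interaction records into a graph. A minimal sketch using the Python standard library follows. The tab-separated layout with `source`, `target` and `is_directed` columns is an assumption made for illustration. It is not a specification of OmniPath's export format or of the OmniPath App, which is written in Java.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set


def build_network(tsv_text: str) -> Dict[str, Set[str]]:
    """Build an adjacency map from interaction records (assumed columns: source, target, is_directed)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for record in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        source, target = record["source"], record["target"]
        graph[source].add(target)
        # A missing is_directed column is treated as directed (assumption).
        if record.get("is_directed", "1") != "1":
            graph[target].add(source)  # undirected interaction: add the reverse edge too
    return dict(graph)


# Hypothetical records: one directed and one undirected interaction.
TSV = "source\ttarget\tis_directed\nP1\tP2\t1\nP2\tP3\t0\n"
print(build_network(TSV))  # {'P1': {'P2'}, 'P2': {'P3'}, 'P3': {'P2'}}
```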
