query genome Latest Research Papers

LINflow: a computational pipeline that combines an alignment-free with an alignment-based method to accelerate generation of similarity matrices for prokaryotic genomes

PeerJ ◽ 10.7717/peerj.10906 ◽ 2021 ◽ Vol 9 ◽ pp. e10906

Author(s): Long Tian ◽ Reza Mazloom ◽ Lenwood S. Heath ◽ Boris A. Vinatzer

Keyword(s): Large Set ◽ Computational Pipeline ◽ Genome Sequences ◽ Alignment Free ◽ Prokaryotic Genomes ◽ Highly Correlated ◽ Query Genome ◽ Genomic Similarity ◽ Memory Efficient ◽ Similarity Matrices

Background: Computing genomic similarity between strains is a prerequisite for genome-based prokaryotic classification and identification. Genomic similarity was first computed as Average Nucleotide Identity (ANI) values based on the alignment of genomic fragments. Since this is computationally expensive, faster and computationally cheaper alignment-free methods have been developed to estimate ANI. However, these methods do not reach the level of accuracy of alignment-based methods.

Methods: Here we introduce LINflow, a computational pipeline that infers pairwise genomic similarity in a set of genomes. LINflow uses the speed of the alignment-free sourmash tool to identify the genome in a dataset that is most similar to a query genome, and the precision of the alignment-based pyani software to compute ANI between the query genome and that most similar genome. This is repeated for each new genome that is added to a dataset. The sequentially computed ANI values are stored as Life Identification Numbers (LINs), which are then used to infer all other pairwise ANI values in the set. We tested LINflow on four sets, 484 genomes in total, and compared the time needed and the generated similarity matrices with those of other tools.

Results: LINflow is up to 150 times faster than pyani, and pairwise ANI values generated by LINflow are highly correlated with those computed by pyani. However, because LINflow infers most pairwise ANI values instead of computing them directly, its ANI values occasionally depart from those computed by pyani. In conclusion, LINflow is a fast and memory-efficient pipeline to infer similarity among a large set of prokaryotic genomes. Its ability to quickly add new genome sequences to an already computed similarity matrix makes LINflow particularly useful for projects in which new genome sequences need to be added regularly to an existing dataset.
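The incremental workflow described in this abstract can be illustrated with a small Python sketch. It is not LINflow's code and calls neither sourmash nor pyani: `fast_similarity` (Jaccard index of k-mer sets) and `precise_identity` (fraction of matching positions) are toy stand-ins for the alignment-free and alignment-based steps, the `THRESHOLDS` values are hypothetical, and the prefix-based LIN assignment is an assumed simplification of how LINs encode ANI.

```python
from typing import Dict, List, Tuple

# Hypothetical ANI thresholds (percent), one per LIN position. Illustrative only.
THRESHOLDS = [70.0, 80.0, 90.0, 95.0, 99.0]

Database = Dict[str, Tuple[str, List[int]]]  # name -> (sequence, LIN)


def kmers(seq: str, k: int = 4) -> set:
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fast_similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard index of k-mer sets: toy stand-in for the alignment-free step."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def precise_identity(a: str, b: str) -> float:
    """Percent of matching positions: toy stand-in for alignment-based ANI."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def add_genome(db: Database, name: str, seq: str) -> List[int]:
    """Assign a LIN to a new genome using one precise comparison, then store it."""
    if not db:
        lin = [0] * len(THRESHOLDS)
    else:
        # Step 1: cheap search for the most similar genome already stored.
        best = max(db, key=lambda n: fast_similarity(seq, db[n][0]))
        # Step 2: a single precise comparison against that genome only.
        ani = precise_identity(seq, db[best][0])
        ref_lin = db[best][1]
        # Step 3: keep the reference LIN up to the last threshold that ANI reaches.
        shared = sum(1 for t in THRESHOLDS if ani >= t)
        if shared == len(THRESHOLDS):
            lin = list(ref_lin)
        else:
            prefix = ref_lin[:shared]
            used = [l[shared] for _, l in db.values() if l[:shared] == prefix]
            lin = prefix + [max(used) + 1] + [0] * (len(THRESHOLDS) - shared - 1)
    db[name] = (seq, lin)
    return lin


def inferred_similarity(lin_a: List[int], lin_b: List[int]) -> float:
    """Infer a lower bound on ANI from the length of the shared LIN prefix."""
    shared = 0
    for x, y in zip(lin_a, lin_b):
        if x != y:
            break
        shared += 1
    return THRESHOLDS[shared - 1] if shared else 0.0
```

In this sketch each added genome costs one precise comparison, and every other pairwise value is read off the stored LINs. That also shows why inferred values can occasionally differ from directly computed ones.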

SEA version 3.0: a comprehensive extension and update of the Super-Enhancer archive

Nucleic Acids Research ◽ 10.1093/nar/gkz1028 ◽ 2019 ◽ Cited By ~ 1

Author(s): Chuangeng Chen ◽ Dianshuang Zhou ◽ Yue Gu ◽ Cong Wang ◽ Mengyan Zhang ◽ ...

Keyword(s): Regulation Of Gene Expression ◽ Enrichment Analysis ◽ Cell Types ◽ Cell Type Specificity ◽ Chromatin Interactions ◽ Super Enhancer ◽ Genome Visualization ◽ Multiple Species ◽ New Criteria ◽ Query Genome

Super-enhancers (SEs) are critical for the transcriptional regulation of gene expression. We developed the super-enhancer archive version 3.0 (SEA v. 3.0, http://sea.edbc.org) to extend SE research. SEA v. 3.0 provides the most comprehensive archive to date, consisting of 164 545 super-enhancers. Of these, 80 549 are newly identified from 266 cell types/tissues/diseases using an optimized computational strategy, and 52 have been experimentally confirmed with manually curated references. We now support super-enhancers in 11 species including 7 new species (zebrafish, chicken, chimp, rhesus, sheep, Xenopus tropicalis and stickleback). To facilitate super-enhancer functional analysis, we added several new regulatory datasets including 3 361 785 typical enhancers, chromatin interactions, SNPs, transcription factor binding sites and SpCas9 target sites. We also updated or developed new criteria query, genome visualization and analysis tools for the archive. This includes a tool based on Shannon Entropy to evaluate SE cell type specificity, a new genome browser that enables the visualization of SE spatial interactions based on Hi-C data, and an enhanced enrichment analysis interface that provides online enrichment analyses of SE related genes. SEA v. 3.0 provides a comprehensive database of all available SE information across multiple species, and will facilitate super-enhancer research, especially as related to development and disease.
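The abstract does not give the formula behind the Shannon Entropy tool, so the following Python sketch only illustrates the general idea under an assumption: the signal of one SE across cell types is normalized to a probability distribution, and its entropy in bits is taken as a specificity measure. The function name and the input format are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(signals: Sequence[float]) -> float:
    """Entropy (bits) of one element's signal distribution across cell types.

    Low entropy means the signal is concentrated in few cell types
    (more cell-type specific); high entropy means it is spread evenly.
    """
    total = sum(signals)
    if total <= 0:
        raise ValueError("signals must contain at least one positive value")
    entropy = 0.0
    for s in signals:
        if s > 0:  # terms with zero signal contribute nothing
            p = s / total
            entropy -= p * math.log2(p)
    return entropy
```

Under this assumption, an SE active in only one of four cell types has entropy 0, and one that is equally active in all four has entropy 2 bits, the maximum for four cell types.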

A Computational Framework for Tracing the Origins of Genomic Islands in Prokaryotes

International Scholarly Research Notices ◽ 10.1155/2014/732857 ◽ 2014 ◽ Vol 2014 ◽ pp. 1-9 ◽ Cited By ~ 2

Author(s): Peng Wan ◽ Dongsheng Che

Keyword(s): Operating System ◽ Main Idea ◽ Computational Approach ◽ Genomic Islands ◽ Computational Framework ◽ Blast Search ◽ Gene Level ◽ Genome Search ◽ Genomic Regions ◽ Query Genome

Genomic islands (GIs) are genomic fragments acquired from nongenealogical organisms through horizontal gene transfer (HGT). Current research on donor-recipient relationships for HGT is limited to the gene level. As more GIs are identified and verified, donor-recipient relationships can be modeled better with GIs than with individual genes. In this paper, we report the development of a computational framework for detecting the origins of GIs. The main idea of our framework is to identify GIs in a query genome, use BLAST to search for candidate genomes that contain genomic regions similar to those GIs, and then filter out candidate genomes in which the similar genomic regions are also alien (as detected by GI detection tools). We applied our framework to find the GI origins for Mycobacterium tuberculosis H37Rv, Herminiimonas arsenicoxydans, and three Thermoanaerobacter species. The predicted results were used to establish donor-recipient networks, which were visualized with Gephi. Our studies showed that the donor genomes detected by our computational approach were mainly consistent with previous studies. Our framework was implemented in Perl and executed on the Windows operating system.
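The filtering step of this framework can be sketched as follows. The original framework was written in Perl, so this Python version is only an illustration under assumptions: the `Hit` record, the `alien_regions` mapping, and the function names are hypothetical, and the BLAST search and GI detection are assumed to have been run beforehand.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One similarity-search hit: a GI from the query matched in a candidate genome."""
    gi_id: str
    genome: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def candidate_donors(
    hits: Iterable[Hit],
    alien_regions: Dict[str, List[Tuple[int, int]]],
) -> Dict[str, Set[str]]:
    """Map each GI to candidate donor genomes whose matching region is not alien."""
    donors: Dict[str, Set[str]] = {}
    for hit in hits:
        regions = alien_regions.get(hit.genome, [])
        # Skip the hit if the matched region is itself a predicted GI in the
        # candidate genome: that genome likely acquired the region as well.
        if any(overlaps(hit.start, hit.end, s, e) for s, e in regions):
            continue
        donors.setdefault(hit.gi_id, set()).add(hit.genome)
    return donors
```

The resulting GI-to-donor mapping is the kind of edge list that could then be loaded into a network tool for visualization.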
