Seeker: alignment-free identification of bacteriophage genomes by deep learning

2020 ◽  
Vol 48 (21) ◽  
pp. e121-e121
Author(s):  
Noam Auslander ◽  
Ayal B Gussow ◽  
Sean Benler ◽  
Yuri I Wolf ◽  
Eugene V Koonin

Abstract Recent advances in metagenomic sequencing have enabled discovery of diverse, distinct microbes and viruses. Bacteriophages, the most abundant biological entity on Earth, evolve rapidly, and therefore, detection of unknown bacteriophages in sequence datasets is a challenge. Most of the existing detection methods rely on sequence similarity to known bacteriophage sequences, impeding the identification and characterization of distinct, highly divergent bacteriophage families. Here we present Seeker, a deep-learning tool for alignment-free identification of phage sequences. Seeker allows rapid detection of phages in sequence datasets and differentiation of phage sequences from bacterial ones, even when those phages exhibit little sequence similarity to established phage families. We comprehensively validate Seeker's ability to identify previously unidentified phages, and employ this method to detect unknown phages, some of which are highly divergent from the known phage families. We provide a web portal (seeker.pythonanywhere.com) and a user-friendly Python package (github.com/gussow/seeker) allowing researchers to easily apply Seeker in metagenomic studies, for the detection of diverse unknown bacteriophages.

Author(s):  
Noam Auslander ◽  
Ayal B. Gussow ◽  
Sean Benler ◽  
Yuri I. Wolf ◽  
Eugene V. Koonin

Summary Advances in metagenomics enable massive discovery of diverse, distinct microbes and viruses. Bacteriophages, the most abundant biological entity on Earth, evolve rapidly, and therefore, detection of unknown bacteriophages in sequence datasets is a challenge. The existing methods rely on sequence similarity to known bacteriophage sequences, impeding the identification and characterization of distinct bacteriophage families. We present Seeker, a deep-learning tool for reference-free identification of phage sequences. Seeker allows rapid detection of phages in sequence datasets and clean differentiation of phage sequences from bacterial ones, even for phages with little sequence similarity to established phage families. We comprehensively validate Seeker’s ability to identify unknown phages and employ Seeker to detect unknown phages, some of which are highly divergent from known phage families. We provide a web portal (seeker.pythonanywhere.com) and a user-friendly Python package (https://github.com/gussow/seeker) allowing researchers to easily apply Seeker in metagenomic studies, for the detection of diverse unknown bacteriophages.


2002 ◽  
Vol 365 (1) ◽  
pp. 13-18 ◽  
Author(s):  
Suren AGHAJANIAN ◽  
D. Margaret WORRALL

The final two enzymes in the CoA biosynthetic pathway, phosphopantetheine adenylyltransferase (PPAT; EC 2.7.7.3) and dephospho-CoA kinase (DPCK; EC 2.7.1.24), are separate proteins in prokaryotes, but exist as a bifunctional enzyme in pig liver. In the present study we have obtained sequence information from purified pig-liver enzyme, and identified the corresponding cDNA in a number of species. The human gene localizes to chromosome 17q12-21 and contains regions with sequence similarity to the monofunctional Escherichia coli DPCK and PPAT. The recombinant 564-amino-acid human protein confirmed the associated transferase and kinase activities, and gave similar kinetic properties to the wild-type pig enzyme.


2020 ◽  
Vol 32 (29) ◽  
pp. 2000953 ◽  
Author(s):  
Bingnan Han ◽  
Yuxuan Lin ◽  
Yafang Yang ◽  
Nannan Mao ◽  
Wenyue Li ◽  
...  

2016 ◽  
Vol 15 (8) ◽  
pp. 2697-2705 ◽  
Author(s):  
Damon H. May ◽  
Emma Timmins-Schiffman ◽  
Molly P. Mikan ◽  
H. Rodger Harvey ◽  
Elhanan Borenstein ◽  
...  

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lianyu Lin ◽  
Anupma Sharma ◽  
Qingyi Yu

Abstract Background Miniature inverted-repeat transposable elements (MITEs) are non-autonomous DNA transposable elements that play important roles in genome organization and evolution. Genome-wide identification and characterization of MITEs provide essential information for understanding genome structure and evolution. Results We performed genome-wide identification and characterization of MITEs in the pineapple genome. The top two MITE families, accounting for 29.39% of the total MITEs and 3.86% of the pineapple genome, have insertion preference in (TA)n dinucleotide microsatellite regions. We therefore named these MITEs A. comosus microsatellite-associated MITEs (Ac-mMITEs). The two Ac-mMITE families, Ac-mMITE-1 and Ac-mMITE-2, shared sequence similarity in the terminal inverted repeat (TIR) regions, suggesting that these two Ac-mMITE families might be derived from common or closely related autonomous elements. The Ac-mMITEs are frequently clustered via adjacent insertions. Among the 21,994 full-length Ac-mMITEs, 46.1% were present in clusters. By analyzing the Ac-mMITEs without (TA)n microsatellite flanking sequences, we found that Ac-mMITEs were likely derived from Mutator-like DNA transposons. Ac-mMITEs showed highly polymorphic insertion sites between cultivated pineapples and their wild relatives. To better understand the evolutionary history of Ac-mMITEs, we filtered and performed comparative analysis on the two distinct groups of Ac-mMITEs, microsatellite-targeting MITEs (mt-MITEs) that are flanked by dinucleotide microsatellites on both sides and mutator-like MITEs (ml-MITEs) that contain 9/10 bp TSDs. Epigenetic analysis revealed a lower level of host-induced silencing on the mt-MITEs in comparison to the ml-MITEs, which partially explained the significantly higher abundance of mt-MITEs in the pineapple genome.
The mt-MITEs and ml-MITEs exhibited differential insertion preferences for gene-related regions, and RNA-seq analysis revealed their differential influences on expression regulation of nearby genes. Conclusions Ac-mMITEs are the most abundant MITEs in the pineapple genome and they were likely derived from Mutator-like DNA transposons. Preferential insertion in (TA)n microsatellite regions of Ac-mMITEs occurred recently and is likely the result of a damage-limiting strategy adopted by Ac-mMITEs during co-evolution with their host. Insertion in (TA)n microsatellite regions might also have promoted the amplification of mt-MITEs. In addition, mt-MITEs showed no or negligible impact on nearby gene expression, which may help them escape genome control and lead to their amplification.
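The terminal inverted repeats (TIRs) mentioned in this abstract can be illustrated with a small check: an element has a TIR when its leading bases match the reverse complement of its trailing bases. The following is a minimal sketch of that idea only; the function names and the example sequence are hypothetical, exact matching is an assumption, and this is not the pipeline the authors used.

```python
# Hypothetical helpers illustrating what a terminal inverted repeat (TIR) is.
# Exact matching only; real MITE finders typically tolerate mismatches.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tir_length(element: str) -> int:
    """Count leading bases that pair exactly with the element's 3' end.

    Position i of the element is compared with the complement of position
    -1 - i, which is position i of the reverse complement. Counting stops
    at the first mismatch or at the midpoint of the element.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    n = 0
    for a, b in zip(seq[: len(seq) // 2], rc):
        if a != b:
            break
        n += 1
    return n


# Made-up element: an 8 bp left arm, an internal region, and a right arm
# that is the reverse complement of the left arm.
left = "GAGTCCAT"
example = left + "TTTTGGGG" + reverse_complement(left)
print(tir_length(example))
```

A practical tool would also look for target site duplications (TSDs) flanking the element, which this sketch does not attempt.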


2021 ◽  
Author(s):  
Kelly A. Mulholland ◽  
Calvin L. Keeler

Abstract Background The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals. Many microbiome studies characterize only the bacterial component, for which there are several well-developed sequencing methods, bioinformatics tools and databases available. The lack of comprehensive bioinformatics workflows and databases has limited efforts to characterize the other components existing in a microbiome. BiomeSeq is a tool for the analysis of the complete animal microbiome using metagenomic sequencing data. With its comprehensive workflow and customizable parameters and microbial databases, BiomeSeq can rapidly quantify the viral, fungal, bacteriophage and bacterial components of a sample and produce informative tables for analysis. Results Simulated datasets were constructed, which contained known abundances of microbial sequences, and several performance metrics were analyzed, including correlation of predicted abundance with known abundance, root mean square error and rate of speed. BiomeSeq demonstrated high precision (average of 99.52%) and sensitivity (average of 93.01%). BiomeSeq was employed in detecting and quantifying the respiratory microbiome of a commercial poultry broiler flock throughout its grow-out cycle from hatching to processing and successfully processed 780 million reads. For each microbial species detected, BiomeSeq calculated the normalized abundance, percent relative abundance, and coverage as well as the diversity for each sample. Rate of speed for each step in the pipeline, precision and accuracy were calculated to examine BiomeSeq’s performance using in silico sequencing datasets. When compared to bacterial results generated by the commonly used 16S rRNA sequencing method, BiomeSeq detected the same most abundant bacteria, including Gallibacterium, Corynebacterium and Staphylococcus, as well as several additional species.
Conclusions BiomeSeq provides for the detection and quantification of the microbiome from next-generation metagenomic sequencing data. This tool is implemented in a user-friendly container that requires one command and generates a table containing taxonomical information for each microbe detected. It also determines normalized abundance, percent relative abundance, genome coverage and sample diversity calculations for each sample.
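The performance metrics named in this abstract (precision, sensitivity, root mean square error, percent relative abundance) have standard textbook definitions, sketched below. The abstract does not state exactly how BiomeSeq computes them, so these formulas are an assumption, the function names are hypothetical, and the read counts are invented purely for illustration.

```python
from math import sqrt


def precision(tp: int, fp: int) -> float:
    """Fraction of reported detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true members that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def rmse(predicted: list[float], known: list[float]) -> float:
    """Root mean square error between predicted and known abundances."""
    if len(predicted) != len(known) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - k) ** 2 for p, k in zip(predicted, known)]
    return sqrt(sum(squared) / len(squared))


def percent_relative_abundance(counts: dict[str, float]) -> dict[str, float]:
    """Express each taxon's count as a percentage of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * c / total for name, c in counts.items()}


# Invented read counts for three genera named in the abstract.
sample = {"Gallibacterium": 600, "Corynebacterium": 300, "Staphylococcus": 100}
print(percent_relative_abundance(sample))
```

Normalized abundance and genome coverage depend on genome length and sequencing depth, which the abstract does not specify, so they are left out of this sketch.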


Author(s):  
Yan Lin ◽  
Bei Zhou ◽  
Weiyun Zhu

Post-weaning diarrhoea in pigs is mainly caused by pathogenic Escherichia coli and is a major source of revenue loss to the livestock industry. Bacteriophages dominate the gut virome and have the potential to regulate bacterial communities and thus influence the intestinal physiology. To determine the biological characterization of intestinal coliphages, we isolated and identified the faecal coliphages of healthy pre-weaned and post-weaned piglets from Nanjing and Chuzhou pig farms. First, ahead of coliphage isolation, 87 E. coli strains were isolated from healthy or diarrheal faecal samples from three pig farms, of which 8 were pathogenic strains including ETEC and EPEC. 87.3% of E. coli strains possessed drug resistance against three antibiotics. Using these 87 E. coli strains as indicator hosts, we isolated 45 coliphages and found a higher presence in the post-weaning stage than in the pre-weaning stage (24 vs 17 in Nanjing farm, 13 vs 4 in Chuzhou farm). Furthermore, each farm had one most prevalent coliphage strain. Pathogenic E. coli-specific bacteriophages were commonly detected (9/10 samples in Nanjing farm, 7/10 in Chuzhou farm) in the guts of sampled piglets and most had significant bacteriostatic effects (P < 0.05) on pathogenic E. coli strains. Three polyvalent bacteriophages (N24, N30, and C5) were identified. The N30 and C5 strains showed a genetic identity of 89.67% with mild differences in infection characteristics. Our findings suggest that pathogenic E. coli-specific bacteriophages as well as polyvalent bacteriophages are commonly present in the piglet gut and that weaning is an important event that affects coliphage numbers. IMPORTANCE Previous studies based on metagenomic sequencing reported that gut bacteriophages profoundly influence gut physiology but did not provide information regarding the host range and biological significance.
Here, we screened coliphages from pre-weaned and post-weaned piglet gut against indicator hosts, which allowed us to identify the pathogenic E. coli-specific bacteriophages and polyvalent bacteriophages in pig farms and quantify their presence. Our approach complements sequencing methods and provides new insights into the biological characterization of bacteriophages in the gut along with the ecological effects of intestinal bacteriophages.


Development ◽  
2020 ◽  
Vol 147 (24) ◽  
pp. dev194589
Author(s):  
Benoit Aigouy ◽  
Claudio Cortes ◽  
Shanda Liu ◽  
Benjamin Prud'Homme

ABSTRACT Epithelia are dynamic tissues that self-remodel during their development. During morphogenesis, the tissue-scale organization of epithelia is obtained through a sum of individual contributions of the cells constituting the tissue. Therefore, understanding any morphogenetic event first requires a thorough segmentation of its constituent cells. This task, however, usually involves extensive manual correction, even with semi-automated tools. Here, we present EPySeg, an open-source, coding-free software that uses deep learning to segment membrane-stained epithelial tissues automatically and very efficiently. EPySeg, which comes with a straightforward graphical user interface, can be used as a Python package on a local computer, or on the cloud via Google Colab for users not equipped with deep-learning compatible hardware. By substantially reducing human input in image segmentation, EPySeg accelerates and improves the characterization of epithelial tissues for all developmental biologists.


2021 ◽  
Vol 17 (10) ◽  
pp. e1009428
Author(s):  
Ryota Sugimoto ◽  
Luca Nishimura ◽  
Phuong Thanh Nguyen ◽  
Jumpei Ito ◽  
Nicholas F. Parrish ◽  
...  

Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.

