IslandHunter – A Java-based GI detection software

2014 ◽  
Author(s):  
Shakuntala Baichoo ◽  
Haswanee Goodur ◽  
Vyasanand Ramtohul

Over the past decade, researchers have discovered that, apart from essential genes, bacterial genomes also contain a variable number of accessory genes acquired by horizontal gene transfer (HGT) that are categorized as genomic islands (GIs). GIs encode adaptive traits, which might be beneficial for the species under certain growth or environmental conditions. Identifying GIs within a bacterial genome has always been a challenge for biologists because they evolve very rapidly. This paper proposes a standalone software tool, IslandHunter, developed using Java and BioJava, that extracts GI regions using GC content, codon usage bias, dinucleotide frequency bias, tetranucleotide frequency bias, k-mer signature analysis (2-mer, 3-mer, 4-mer, 5-mer, and 6-mer), and the presence of mobility genes. IslandHunter provides a simple graphical user interface in which detected GIs are displayed in a tree view and a circular graph. Users can save the GI regions as blocks of DNA sequences in FASTA format and later use these predicted regions for further analysis. IslandHunter accepts input files in GenBank, EMBL, or FASTA format and provides flexible display and save options. The software has been evaluated against existing tools and performed well. It is available for evaluation at https://github.com/ShakunBaichoo/IslandHunter.
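One of the signals listed above, atypical GC content, can be illustrated with a minimal sketch. This is not IslandHunter's implementation (which is written in Java and combines several signals); the function names, window size, step, and z-score cutoff below are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome: str, window: int = 1000, step: int = 500,
                        z_cutoff: float = 2.0):
    """Return (start, end, gc) for windows whose GC content deviates from the
    mean over all windows by more than z_cutoff standard deviations.

    The window, step, and cutoff defaults are hypothetical, not values taken
    from the paper.
    """
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    windows = [
        (s, min(s + window, len(genome)), gc_content(genome[s:s + window]))
        for s in starts
    ]
    values = [gc for _, _, gc in windows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Every window has the same GC content, so nothing stands out.
        return []
    return [w for w in windows if abs(w[2] - mu) / sigma > z_cutoff]
```

A window flagged this way is only a candidate; the abstract indicates that the tool also weighs codon usage, k-mer signatures, and mobility genes before reporting a GI.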


2017 ◽  
Author(s):  
Lena M. Joesch-Cohen ◽  
Max Robinson ◽  
Neda Jabbari ◽  
Christopher Lausted ◽  
Gustavo Glusman

Abstract Background: Bacterial genomes have characteristic compositional skews, which are differences in nucleotide frequency between the leading and lagging DNA strands across a segment of a genome. It is thought that these strand asymmetries arise as a result of mutational biases and selective constraints, particularly for energy efficiency. Analysis of compositional skews in a diverse set of bacteria provides a comparative context in which mutational and selective environmental constraints can be studied. These analyses typically require finished and well-annotated genomic sequences. Results: We present three novel metrics for examining genome composition skews; all three metrics can be computed for unfinished or partially annotated genomes. The first two metrics (dot-skew and cross-skew) depend on sequence and gene annotation of a single genome, while the third metric (residual skew) highlights unusual genomes by subtracting a GC content-based model of a library of genome sequences. We applied these metrics to all 7738 available bacterial genomes, including partial drafts, and identified outlier species. A number of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns despite only distant phylogenetic relationship. While unrelated, some of the outlier bacterial species share lifestyle characteristics, in particular intracellularity and biosynthetic dependence on their hosts. Conclusions: Our novel metrics appear to reflect the effects of biosynthetic constraints and adaptations to life within one or more hosts on genome composition. We provide results for each analyzed genome, software and interactive visualizations at http://db.systemsbiology.net/gestalt/skew_metrics.
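For readers unfamiliar with compositional skew, the sketch below computes the classic single-strand GC skew, (G - C) / (G + C), per window and as a running sum. It is not the dot-skew, cross-skew, or residual skew introduced in this paper, which the abstract does not define; the function names and window size are assumptions.

```python
def gc_skew(seq: str) -> float:
    """Classic GC skew (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def cumulative_gc_skew(genome: str, window: int = 1000) -> list:
    """Running sum of per-window GC skew along the sequence.

    In many bacterial chromosomes the extremes of this curve are commonly
    used as rough indicators of the replication origin and terminus.
    """
    total = 0.0
    curve = []
    for start in range(0, len(genome), window):
        total += gc_skew(genome[start:start + window])
        curve.append(total)
    return curve
```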


DNA Research ◽  
2019 ◽  
Vol 26 (5) ◽  
pp. 391-398 ◽  
Author(s):  
Mitsuhiko P Sato ◽  
Yoshitoshi Ogura ◽  
Keiji Nakamura ◽  
Ruriko Nishida ◽  
Yasuhiro Gotoh ◽  
...  

Abstract In bacterial genome and metagenome sequencing, Illumina sequencers are most frequently used due to their high-throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that a strong sequencing bias is introduced in low-GC regions by the Nextera XT kit. The level of bias depends on the GC content: the bias becomes stronger as the GC content decreases. Other analysed kits did not introduce this strong sequencing bias. The GC content-associated sequencing bias introduced by Nextera XT was more pronounced in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. The results of our analyses highlight the importance of selecting proper library preparation kits according to the purposes and targets of sequencing, particularly in metagenome sequencing, where a wide range of microbial species with various degrees of GC content is present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.
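A GC-associated coverage bias of the kind described above is often summarized by binning genome windows by GC content and comparing the mean read depth in each bin with the overall mean. The sketch below shows that idea only; it is not the authors' analysis pipeline, and the function name, the input layout (GC percent and mean depth per window), and the bin width are assumptions.

```python
from collections import defaultdict


def relative_coverage_by_gc(windows, bin_width: int = 10) -> dict:
    """Map each GC bin (lower bound, in percent) to mean depth / overall mean depth.

    windows: iterable of (gc_percent, mean_depth) pairs, one per genome window.
    A value well below 1.0 in the low-GC bins would be consistent with
    under-representation of low-GC regions.
    """
    windows = list(windows)
    if not windows:
        return {}
    overall = sum(depth for _, depth in windows) / len(windows)
    if overall == 0:
        return {}
    bins = defaultdict(list)
    for gc_percent, depth in windows:
        bins[int(gc_percent // bin_width) * bin_width].append(depth)
    return {
        lower: (sum(depths) / len(depths)) / overall
        for lower, depths in sorted(bins.items())
    }
```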


2018 ◽  
Vol 373 (1748) ◽  
pp. 20170078 ◽  
Author(s):  
Preeti Rathi ◽  
Sara Maurer ◽  
Daniel Summerer

The epigenetic DNA nucleobases 5-methylcytosine (5mC) and N4-methylcytosine (4mC) coexist in bacterial genomes and have important functions in host defence and transcription regulation. To better understand the individual biological roles of both methylated nucleobases, analytical strategies for distinguishing unmodified cytosine (C) from 4mC and 5mC are required. Transcription-activator-like effectors (TALEs) are programmable DNA-binding repeat proteins, which can be re-engineered for the direct detection of epigenetic nucleobases in user-defined DNA sequences. Here we report that the natural cytosine-binding TALE repeat does not strongly differentiate between 5mC and 4mC. To engineer repeats with selectivity in the context of C, 5mC and 4mC, we developed a homogeneous fluorescence assay and screened a library of size-reduced TALE repeats for binding to all three nucleobases. This provided insights into the requirements of size-reduced TALE repeats for 4mC binding and revealed a single mutant repeat as a selective binder of 4mC. Using a TALE with this repeat in affinity enrichment enabled the isolation of a user-defined DNA sequence containing a single 4mC, but not C or 5mC, from the background of a bacterial genome. Comparative enrichments with TALEs bearing this or the natural C-binding repeat provide an approach for the complete, programmable decoding of all cytosine nucleobases found in bacterial genomes. This article is part of a discussion meeting issue ‘Frontiers in epigenetic chemical biology’.


Toxins ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 467
Author(s):  
Aina Ichihara ◽  
Hinako Ojima ◽  
Kazuyoshi Gotoh ◽  
Osamu Matsushita ◽  
Susumu Take ◽  
...  

Infection with Helicobacter pylori is associated with several diseases, including gastric cancer. Several methods for the diagnosis of H. pylori infection exist, including endoscopy, the urea breath test, the fecal antigen test, and the serum antibody titer test, which is often used because it is simple and highly sensitive. In this context, this study aims to find the association between different antibody reactivities and the organization of bacterial genomes. Next-generation sequencing was performed to determine the genome sequences of four antigen strains with different reactivity. Common genes were searched for, and homology was analyzed using genome ring and dot plot analyses. The two antigens from the highly reactive strains showed high gene homology, and Western blots for CagA and VacA also showed high protein expression levels. In the poorly reactive antigen strains, an inversion was found around the vacA gene in the genome. The structure of bacterial genomes might contribute to the poor reactivity with the antibodies of patients. In the future, accurate serodiagnosis could be performed by using a strain with few gene mutations as the antigen for the H. pylori antibody titer test.


Author(s):  
Melisa B Bonica ◽  
Dario E Balcazar ◽  
Ailen Chuchuy ◽  
Jorge A Barneche ◽  
Carolina Torres ◽  
...  

Abstract Diseases caused by flaviviruses are a major public health burden across the world. In the past decades, South America has suffered dengue epidemics, the re-emergence of yellow fever and St. Louis encephalitis viruses, and the introduction of West Nile and Zika viruses. Many insect-specific flaviviruses (ISFs) that cannot replicate in vertebrate cells have recently been described. In this study, we analyzed field-collected mosquito samples from six different ecoregions of Argentina to detect flaviviruses. We did not find any RNA belonging to pathogenic flaviviruses or ISFs in adults or immature stages. However, flaviviral-like DNA similar to the flavivirus NS5 region was detected in 83–100% of Aedes aegypti (L.). Although it has previously been described as an ancient element in the Ae. aegypti genome, the flaviviral-like DNA sequence was not detected in all Ae. aegypti samples, and the sequences obtained did not form a monophyletic group, possibly reflecting the genetic diversity of mosquito populations in Argentina.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-22
Author(s):  
Rayan Chikhi ◽  
Jan Holub ◽  
Paul Medvedev

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the past 10 years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.
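The simplest baseline for storing and querying a k-mer set is a plain hash set of canonical k-mers, sketched below. It is not one of the specialized structures the survey compares, and the function names are assumptions; a k-mer and its reverse complement are treated as the same element, which is a common convention for double-stranded DNA.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA = set("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def build_kmer_set(sequences, k: int) -> set:
    """Collect the canonical k-mers of all sequences, skipping k-mers that
    contain characters other than A, C, G, or T (for example N)."""
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _DNA:
                kmers.add(canonical(kmer))
    return kmers


def contains(kmers: set, query: str) -> bool:
    """Membership query that ignores strand and letter case."""
    return canonical(query.upper()) in kmers
```

A hash set answers membership queries in expected constant time but stores every k-mer explicitly, which is the memory cost that more compact representations aim to reduce.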


mSystems ◽  
2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Matthew R. Olm ◽  
Alexander Crits-Christoph ◽  
Spencer Diamond ◽  
Adi Lavy ◽  
Paula B. Matheus Carnevali ◽  
...  

ABSTRACT Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination.

IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.


2014 ◽  
Vol 2014 ◽  
pp. 1-8
Author(s):  
Momchilo Vuyisich ◽  
Ayesha Arefin ◽  
Karen Davenport ◽  
Shihai Feng ◽  
Cheryl Gleasner ◽  
...  

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.


2021 ◽  
Author(s):  
Amit Kumar ◽  
Malyaj R Prajapati ◽  
Surendra Upadhyay ◽  
Anamika Bhordia ◽  
Vinod Kumar Singh ◽  
...  

Abstract The present report communicates the first complete genome sequence of a Brucella abortus 2308 strain isolated from an abortion storm on a dairy farm located at Kanpur, Uttar Pradesh, India. The outbreak caused last-trimester abortions in 32 of 100 cows on the farm over a period of 60 days. The bacteria were isolated in pure culture from the placenta of aborted cows. The genome of the isolate is 3,285,606 bp long, with a GC content of 57.25%, an N50 value of 296,426, and an L50 value of 4, and contains 3,119 coding DNA sequences (CDSs), 49 tRNAs, 1 transfer-messenger RNA (tmRNA), and 3 rRNA genes. It is the first report of Brucella abortus 2308 isolation and complete genome sequencing from the Indian subcontinent.
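The N50 and L50 statistics quoted above follow a standard definition: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the assembly size, and L50 is the number of contigs needed to get there. A minimal sketch, with a hypothetical function name and made-up contig lengths in the usage note:

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # This contig pushes the running total past half the assembly.
            return length, count
```

For illustrative contig lengths of 80, 70, 50, 40, 30, and 20 (total 290), the two longest contigs already cover 150, which exceeds half the total, so the function returns an N50 of 70 and an L50 of 2.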

