Simultaneous gene finding in multiple genomes

10.7287/peerj.preprints.1296 ◽

2015 ◽

Cited By ~ 1

Author(s):

Stefanie König ◽

Lars Romoth ◽

Lizzy Gerischer ◽

Mario Stanke

Keyword(s):

Genome Alignment ◽

Whole Genome ◽

Gene Finding ◽

Dual Decomposition ◽

Closely Related Species ◽

Protein Coding ◽

Decomposition Approach ◽

New Approach ◽

Gene Structures ◽

Gains And Losses

As whole genome sequencing is taking on ever-increasing dimensions, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other or, where they disagree, whose exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on a whole-genome alignment of 12 Drosophila species and its accuracy evaluated on D. melanogaster. The method is being implemented as an extension to the gene finder AUGUSTUS.
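
To make the idea of subgradient-based dual decomposition concrete, here is a minimal toy sketch. It is not the AUGUSTUS implementation: the scores, the two "genomes", the overlap constraint, and the agreement bonus are all hypothetical stand-ins, and both subproblems are solved by brute-force enumeration, which is only feasible because the toy has four binary labels. Subproblem A scores labels within each genome, subproblem B rewards agreement between aligned candidates, and Lagrange multipliers push the two solutions toward each other.

```python
from itertools import product

# Hypothetical toy instance: 2 genomes x 2 exon candidates = 4 binary labels.
UNARY = [2.0, 1.0, -0.5, 1.5]   # assumed per-candidate scores
GENOMES = [(0, 1), (2, 3)]      # candidates that overlap within one genome
ORTHOLOGS = [(0, 2), (1, 3)]    # aligned candidates across the two genomes
BONUS = 1.0                     # assumed reward when aligned labels agree

LABELINGS = list(product((0, 1), repeat=len(UNARY)))


def feasible(x):
    """At most one of the overlapping candidates per genome may be chosen."""
    return all(sum(x[i] for i in group) <= 1 for group in GENOMES)


def agreement(x):
    """Bonus for every aligned pair that carries the same label."""
    return BONUS * sum(x[i] == x[j] for i, j in ORTHOLOGS)


def total_score(x):
    """Objective of the full (coupled) problem."""
    return sum(u * xi for u, xi in zip(UNARY, x)) + agreement(x)


def dual_decomposition(iterations=50, step0=0.5):
    lam = [0.0] * len(UNARY)  # one multiplier per label
    best_x, best_primal, best_dual = None, float("-inf"), float("inf")
    for t in range(1, iterations + 1):
        def score_a(x):
            # Subproblem A: unary scores shifted by the multipliers.
            return sum((u + l) * xi for u, l, xi in zip(UNARY, lam, x))

        def score_b(y):
            # Subproblem B: agreement bonus minus the multiplier terms.
            return agreement(y) - sum(l * yi for l, yi in zip(lam, y))

        x = max((c for c in LABELINGS if feasible(c)), key=score_a)
        y = max(LABELINGS, key=score_b)

        # The sum of both subproblem optima is an upper bound on the optimum.
        best_dual = min(best_dual, score_a(x) + score_b(y))
        # x satisfies all hard constraints, so it is a valid primal candidate.
        if total_score(x) > best_primal:
            best_x, best_primal = x, total_score(x)
        if x == y:
            break  # subproblems agree: the bound is tight
        # Subgradient step on the dual with a diminishing step size.
        lam = [l - (step0 / t) * (xi - yi) for l, xi, yi in zip(lam, x, y)]
    return best_x, best_primal, best_dual


if __name__ == "__main__":
    labeling, primal, dual = dual_decomposition()
    print("best labeling:", labeling)
    print("primal score:", primal, "dual bound:", dual)
```

The gap between the primal score and the dual bound indicates how far the approximation may be from the optimum; when both subproblems return the same labeling, the gap closes and the labeling is optimal for the toy instance.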

AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions

Bioinformatics ◽

10.1093/bioinformatics/btz714 ◽

2019 ◽

Author(s):

Deepank R Korandla ◽

Jacob M Wozniak ◽

Anaamika Campeau ◽

David J Gonzalez ◽

Erik S Wright

Keyword(s):

R Package ◽

Evolutionary Conservation ◽

Supplementary Information ◽

Bioconductor Package ◽

Gene Finding ◽

Proteomics Data ◽

Protein Coding ◽

New Approach ◽

Protein Coding Genes ◽

Clear Winner

Abstract. Motivation: A core task of genomics is to identify the boundaries of protein coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. Results: Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88–95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. Availability and implementation: AssessORF is available as an R package via the Bioconductor package repository. Supplementary information: Supplementary data are available at Bioinformatics online.

Discrimination of hospital isolates of Acinetobacter baumannii using repeated sequences and whole genome alignment differential analysis

Journal of Applied Genetics ◽

10.1007/s13353-021-00640-5 ◽

2021 ◽

Author(s):

Roman Kotłowski ◽

Alicja Nowak-Zaleska ◽

Grzegorz Węgrzyn

Keyword(s):

Acinetobacter Baumannii ◽

Time Frame ◽

Repeated Sequences ◽

Hospital Environment ◽

Genome Alignment ◽

Whole Genome ◽

Differential Analysis ◽

Gene Encoding ◽

Resistance Patterns ◽

Whole Genome Alignment

Abstract. An optimized method for bacterial strain differentiation, based on a combination of Repeated Sequences and Whole Genome Alignment Differential Analysis (RS&WGADA), is presented in this report. In this analysis, 51 multidrug-resistant Acinetobacter baumannii strains from one hospital environment and patients from 14 hospital wards were classified on the basis of polymorphisms of repeated sequences located in the CRISPR region, variation in the gene encoding the EmrA-homologue of E. coli, and antibiotic resistance patterns, in combination with three newly identified polymorphic regions in the genomes of A. baumannii clinical isolates. Differential analysis of two similarity matrices between different genotypes and resistance patterns allowed us to distinguish three significant correlations (p < 0.05) between a 172 bp DNA insertion combined with resistance to chloramphenicol and gentamycin. Interestingly, 45 and 55 bp DNA insertions within the CRISPR region were identified, and combined during analyses with resistance/susceptibility to trimethoprim/sulfamethoxazole. Moreover, 184 or 1374 bp DNA length polymorphisms in the genomic region located upstream of the GTP cyclohydrolase I gene, associated mainly with imipenem susceptibility, were identified. In addition, considerable nucleotide polymorphism of the gene encoding the gamma/tau subunit of DNA polymerase III, an enzyme crucial for bacterial DNA replication, was discovered. The differentiation analysis performed using the approach described above allowed us to monitor the distribution of A. baumannii isolates in different wards of the hospital over a time frame of several years, indicating that the optimized method may be useful in hospital epidemiological studies, particularly in identifying the source of primary infections.

Genome Analysis of Rhodococcus Sp. DSSKP-R-001: A Highly Effective β-Estradiol-Degrading Bacterium

International Journal of Genomics ◽

10.1155/2018/3505428 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Hongyan Zhao ◽

Kejian Tian ◽

Qing Qiu ◽

Yu Wang ◽

Hongyan Zhang ◽

...

Keyword(s):

Sole Source ◽

Rhodococcus Opacus ◽

Whole Genome ◽

Metabolic Potential ◽

Protein Coding ◽

Degrading Enzymes ◽

Degrading Bacterium ◽

Rhodococcus Jostii Rha1 ◽

Multiple Strains ◽

Rhodococcus Sp

We screened bacteria that use E2 as their sole source of carbon and energy for growth, identified the isolate as Rhodococcus, and named it DSSKP-R-001. For a better understanding of the metabolic potential of the strain, whole genome sequencing of Rhodococcus DSSKP-R-001 and annotation of the functional genes were performed. The draft genome was approximately 5.4 Mbp, with a G + C content of 68.72% and 5180 predicted protein-coding genes. The genome of Rhodococcus strain DSSKP-R-001 consists of three replicons: one chromosome and two plasmids of 5.2, 0.09, and 0.09 Mbp, respectively. The results showed that there were ten steroid-degrading enzymes distributed in the whole genome of the strain. The existence and expression of estradiol-degrading enzymes were verified by PCR and RT-PCR. Finally, comparative genomics was used to compare multiple strains of Rhodococcus. It was found that Rhodococcus DSSKP-R-001 had the highest similarity to Rhodococcus sp. P14 and there were 2070 core genes shared with Rhodococcus sp. P14, Rhodococcus jostii RHA1, Rhodococcus opacus B4, and Rhodococcus equi 103S, showing evolutionary homology. In summary, this study provides a comprehensive understanding of the role of Rhodococcus DSSKP-R-001 in the efficient degradation of estradiol, which is important for its use in bioremediation and for understanding evolution within Rhodococcus.

Species Identification of Pinus Pollen Found in Belukha Glacier, Russian Altai Mountains, Using a Whole-Genome Amplification Method

Forests ◽

10.3390/f9080444 ◽

2018 ◽

Vol 9 (8) ◽

pp. 444

Author(s):

Fumio Nakazawa ◽

Yoshihisa Suyama ◽

Satoshi Imura ◽

Hideaki Motoyama

Keyword(s):

Species Identification ◽

Whole Genome Amplification ◽

Pollen Grains ◽

Pollen Grain ◽

Pinus Sibirica ◽

Whole Genome ◽

Accurate Identification ◽

Closely Related Species ◽

Genome Amplification ◽

Amplification Method

Pollen taxa in sediment samples can be identified based on morphology. However, closely related species do not differ substantially in pollen morphology, and accurate identification is generally limited to genera or families. Because many pollen grains in glaciers contain protoplasm, genetic information obtained from pollen grains should enable the identification of plant taxa at the species level. In the present study, species identification of Pinus pollen grains was attempted using whole-genome amplification (WGA). We used pollen grains extracted from surface snow (depth, 1.8–1.9 m) from the Belukha glacier in the summer of 2003. WGA was performed using a single pollen grain. Some regions of the chloroplast genome were amplified by PCR, and the DNA products were sequenced to identify the pollen grain. Pinus includes approximately 111 recognized species in two subgenera, four sections, and 11 subsections. The tree species Pinus sibirica and P. sylvestris are currently found at the periphery of the glacier. We identified the pollen grains from the Belukha glacier to the level of section or subsection to which P. sibirica and P. sylvestris belong. Moreover, we specifically identified two pollen grains as P. sibirica or P. cembra. Fifteen species, including P. sibirica, were candidates for the remaining pollen grain.

MUM&Co: accurate detection of all SV types through whole-genome alignment

Bioinformatics ◽

10.1093/bioinformatics/btaa115 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3242-3243 ◽

Cited By ~ 2

Author(s):

Samuel O’Donnell ◽

Gilles Fischer

Keyword(s):

De Novo ◽

Supplementary Information ◽

Genome Alignment ◽

Whole Genome ◽

Structural Variations ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Human Genomes ◽

Whole Genome Alignment ◽

Primary Output

Abstract. Summary: MUM&Co is a single bash script to detect structural variations (SVs) utilizing whole-genome alignment (WGA). Using MUMmer’s nucmer alignment, MUM&Co can detect insertions, deletions, tandem duplications, inversions and translocations greater than 50 bp. Its versatility depends upon the WGA and therefore benefits from contiguous de novo assemblies generated by third-generation sequencing technologies. Benchmarked against five WGA SV-calling tools, MUM&Co outperforms all tools on simulated SVs in yeast, plant and human genomes and performs similarly in two real human datasets. Additionally, MUM&Co is unique in its ability to find inversions in both simulated and real datasets. Lastly, MUM&Co’s primary output is an intuitive tabulated file containing a list of SVs with only necessary genomic details. Availability and implementation: https://github.com/SAMtoBAM/MUMandCo. Supplementary information: Supplementary data are available at Bioinformatics online.

Nonlinear distributed model predictive control for complex systems: application for hydraulic system

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331218808340 ◽

2018 ◽

Vol 41 (10) ◽

pp. 2751-2763 ◽

Cited By ~ 2

Author(s):

Nadia Hajji ◽

Saber Maraoui ◽

Larbi Chrifi-Alaoui ◽

Kais Bouzrara

Keyword(s):

Model Predictive Control ◽

Predictive Control ◽

Hydraulic System ◽

Water Levels ◽

Distributed Model ◽

Dual Decomposition ◽

Decomposition Approach ◽

Distributed Model Predictive Control ◽

Nonlinear Predictive Control ◽

Tank System

In this paper, a nonlinear distributed model predictive control based on a dual decomposition approach is proposed for complex systems. The global system can be decomposed into several subsystems, each managed by its own controller. To design the nonlinear predictive control in a distributed fashion, an analytical solution is proposed. The latter is based on approximating the error by its Taylor series expansion. The proposed approach is implemented on a three-tank system to control the water levels. Simulation results demonstrate the effectiveness of the proposed approach.

Whole Genome Alignment with BLAST on Grid Environment

The Sixth IEEE International Conference on Computer and Information Technology (CIT'06) ◽

10.1109/cit.2006.196 ◽

2006 ◽

Author(s):

Min-sung Kim ◽

Choong-hyun Sun ◽

Jin-ki Kim ◽

Gwan-su Yi

Keyword(s):

Genome Alignment ◽

Whole Genome ◽

Grid Environment ◽

Whole Genome Alignment

A toolkit enabling efficient, scalable and reproducible gene tagging in trypanosomatids

Open Biology ◽

10.1098/rsob.140197 ◽

2015 ◽

Vol 5 (1) ◽

pp. 140197 ◽

Cited By ~ 97

Author(s):

Samuel Dean ◽

Jack Sunter ◽

Richard J. Wheeler ◽

Ian Hodkinson ◽

Eva Gluenz ◽

...

Keyword(s):

Trypanosoma Brucei ◽

Large Datasets ◽

Gene Tagging ◽

Deletion Mutants ◽

Whole Genome ◽

Leishmania Mexicana ◽

Protein Coding ◽

Achievable Goal ◽

Endogenous Loci ◽

Pcr Conditions

One of the first steps in understanding a protein's function is to determine its localization; however, the methods for localizing proteins in some systems have not kept pace with the developments in other fields, creating a bottleneck in the analysis of the large datasets that are generated in the post-genomic era. To address this, we developed tools for tagging proteins in trypanosomatids. We made a plasmid that, when coupled with long primer PCR, can be used to produce transgenes at their endogenous loci encoding proteins tagged at either terminus or within the protein coding sequence. This system can also be used to generate deletion mutants to investigate the function of different protein domains. We show that the length of homology required for successful integration precluded long primer PCR tagging in Leishmania mexicana. Hence, we developed plasmids and a fusion PCR approach to create gene tagging amplicons with sufficiently long homologous regions for targeted integration, suitable for use in trypanosomatids with less efficient homologous recombination than Trypanosoma brucei. Importantly, we have automated the primer design, developed universal PCR conditions and optimized the workflow to make this system reliable, efficient and scalable such that whole genome tagging is now an achievable goal.

Genomic Analysis of Sarcomyxa edulis Reveals the Basis of Its Medicinal Properties and Evolutionary Relationships

Frontiers in Microbiology ◽

10.3389/fmicb.2021.652324 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fenghua Tian ◽

Changtian Li ◽

Yu Li

Keyword(s):

Single Molecule ◽

De Novo ◽

Genomic Analysis ◽

Single Copy ◽

Whole Genome Sequence ◽

Type I ◽

Whole Genome ◽

Uridine Diphosphate ◽

Protein Coding ◽

Medicinal Value

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.
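
The N50 statistic quoted above can be illustrated with a short sketch. N50 is the largest contig length L such that contigs of length at least L together cover at least half of the total assembly. The contig lengths below are hypothetical toy values, not the S. edulis contigs, and the function name `n50` is chosen here for illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
    print(n50(toy_contigs))  # 80 + 70 = 150 covers half of 300, so N50 is 70
```

A larger N50 for the same total length means the assembly is split into fewer, longer pieces, which is why it is commonly reported alongside contig count and genome size.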
