Simultaneous gene finding in multiple genomes

Author(s):  
Stefanie König ◽  
Lars Romoth ◽  
Lizzy Gerischer ◽  
Mario Stanke

As whole-genome sequencing continues to scale up, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that agree with each other or, where they do not, whose exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on a whole-genome alignment of 12 Drosophila species, and its accuracy was evaluated on D. melanogaster. The method is being implemented as an extension to the gene finder AUGUSTUS.
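To illustrate the general idea of subgradient-based dual decomposition on a binary labeling problem, here is a minimal toy sketch. It is not the authors' model or the AUGUSTUS implementation: the two cost functions `f1` and `f2`, the four-variable problem size, the brute-force subproblem solver and the `step0 / t` step size are all assumptions chosen for illustration. The labeling is duplicated into two copies, each copy is optimized against its own cost, and Lagrange multipliers are adjusted until the copies agree.

```python
from itertools import product


def dot(lam, x):
    """Inner product of the multipliers with a binary labeling."""
    return sum(l * xi for l, xi in zip(lam, x))


def solve_dual_decomposition(f1, f2, n, iterations=50, step0=1.0):
    """Approximately minimize f1(x) + f2(x) over binary vectors x of length n.

    The labeling is split into copies x1 and x2 tied by x1 == x2; the
    constraint is relaxed with multipliers lam and the dual is maximized
    by subgradient ascent.
    """
    labelings = list(product((0, 1), repeat=n))
    lam = [0.0] * n
    best_dual = float("-inf")
    best_primal, best_x = float("inf"), None
    for t in range(1, iterations + 1):
        # Solve each subproblem exactly and independently (brute force here).
        x1 = min(labelings, key=lambda x: f1(x) + dot(lam, x))
        x2 = min(labelings, key=lambda x: f2(x) - dot(lam, x))
        # The dual value is a lower bound on the true optimum.
        dual = f1(x1) + dot(lam, x1) + f2(x2) - dot(lam, x2)
        best_dual = max(best_dual, dual)
        # Either copy is a feasible labeling, so it gives an upper bound.
        for cand in (x1, x2):
            value = f1(cand) + f2(cand)
            if value < best_primal:
                best_primal, best_x = value, cand
        if x1 == x2:  # copies agree: the relaxation is tight
            break
        # Subgradient step with a diminishing step size.
        lam = [l + (step0 / t) * (a - b) for l, a, b in zip(lam, x1, x2)]
    return best_x, best_primal, best_dual


# Hypothetical costs: unary terms plus a penalty for disagreeing neighbours.
def f1(x):
    unary = (-2.0, 1.0, -1.0, 0.5)
    return sum(u * xi for u, xi in zip(unary, x)) + 1.5 * (x[0] != x[1])


def f2(x):
    unary = (1.0, -1.5, 0.5, -1.0)
    return sum(u * xi for u, xi in zip(unary, x)) + 1.5 * (x[2] != x[3])


best_x, best_primal, best_dual = solve_dual_decomposition(f1, f2, n=4)
```

By weak duality, `best_dual` never exceeds the true minimum and `best_primal` is never below it, so the gap between the two indicates how close the approximation is. In a realistic setting each subproblem would be solved by a specialized exact algorithm rather than by enumeration.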



2019 ◽  
Author(s):  
Deepank R Korandla ◽  
Jacob M Wozniak ◽  
Anaamika Campeau ◽  
David J Gonzalez ◽  
Erik S Wright

Motivation: A core task of genomics is to identify the boundaries of protein coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. Results: Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88–95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. Availability and implementation: AssessORF is available as an R package via the Bioconductor package repository. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Roman Kotłowski ◽  
Alicja Nowak-Zaleska ◽  
Grzegorz Węgrzyn

An optimized method for bacterial strain differentiation, based on a combination of Repeated Sequences and Whole Genome Alignment Differential Analysis (RS&WGADA), is presented in this report. In this analysis, 51 multidrug-resistant Acinetobacter baumannii strains from one hospital environment and from patients in 14 hospital wards were classified on the basis of polymorphisms of repeated sequences located in the CRISPR region, variation in the gene encoding the EmrA-homologue of E. coli, and antibiotic resistance patterns, in combination with three newly identified polymorphic regions in the genomes of A. baumannii clinical isolates. Differential analysis of two similarity matrices between different genotypes and resistance patterns allowed us to distinguish three significant correlations (p < 0.05) between a 172 bp DNA insertion combined with resistance to chloramphenicol and gentamycin. Interestingly, 45 and 55 bp DNA insertions within the CRISPR region were identified, and combined during analyses with resistance/susceptibility to trimethoprim/sulfamethoxazole. Moreover, 184 or 1374 bp DNA length polymorphisms in the genomic region located upstream of the GTP cyclohydrolase I gene, associated mainly with imipenem susceptibility, were identified. In addition, considerable nucleotide polymorphism of the gene encoding the gamma/tau subunit of DNA polymerase III, an enzyme crucial for bacterial DNA replication, was discovered. The differentiation analysis performed using the approach described above allowed us to monitor the distribution of A. baumannii isolates in different wards of the hospital over a time frame of several years, indicating that the optimized method may be useful in hospital epidemiological studies, particularly in identifying the source of primary infections.


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Hongyan Zhao ◽  
Kejian Tian ◽  
Qing Qiu ◽  
Yu Wang ◽  
Hongyan Zhang ◽  
...  

We screened bacteria that use estradiol (E2) as their sole source of carbon and energy for growth, identified them as Rhodococcus, and named the strain DSSKP-R-001. For a better understanding of the metabolic potential of the strain, whole genome sequencing of Rhodococcus DSSKP-R-001 and annotation of the functional genes were performed. The draft genome was approximately 5.4 Mbp, with a G + C content of 68.72% and 5180 predicted protein-coding genes. The genome of Rhodococcus strain DSSKP-R-001 consists of three replicons: one chromosome and two plasmids of 5.2, 0.09, and 0.09 Mbp, respectively. The results showed that there were ten steroid-degrading enzymes distributed across the whole genome of the strain. The existence and expression of estradiol-degrading enzymes were verified by PCR and RT-PCR. Finally, comparative genomics was used to compare multiple strains of Rhodococcus. It was found that Rhodococcus DSSKP-R-001 had the highest similarity to Rhodococcus sp. P14 and that there were 2070 core genes shared with Rhodococcus sp. P14, Rhodococcus jostii RHA1, Rhodococcus opacus B4, and Rhodococcus equi 103S, showing evolutionary homology. In summary, this study provides a comprehensive understanding of the role of Rhodococcus DSSKP-R-001 in the efficient degradation of estradiol, which is relevant both to its use in bioremediation and to evolution within Rhodococcus.


Forests ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 444
Author(s):  
Fumio Nakazawa ◽  
Yoshihisa Suyama ◽  
Satoshi Imura ◽  
Hideaki Motoyama

Pollen taxa in sediment samples can be identified based on morphology. However, closely related species do not differ substantially in pollen morphology, and accurate identification is generally limited to genera or families. Because many pollen grains in glaciers contain protoplasm, genetic information obtained from pollen grains should enable the identification of plant taxa at the species level. In the present study, species identification of Pinus pollen grains was attempted using whole-genome amplification (WGA). We used pollen grains extracted from surface snow (depth, 1.8–1.9 m) from the Belukha glacier in the summer of 2003. WGA was performed using a single pollen grain. Some regions of the chloroplast genome were amplified by PCR, and the DNA products were sequenced to identify the pollen grain. Pinus includes approximately 111 recognized species in two subgenera, four sections, and 11 subsections. The tree species Pinus sibirica and P. sylvestris are currently found at the periphery of the glacier. We identified the pollen grains from the Belukha glacier to the level of section or subsection to which P. sibirica and P. sylvestris belong. Moreover, we specifically identified two pollen grains as P. sibirica or P. cembra. Fifteen species, including P. sibirica, were candidates for the remaining pollen grain.


2020 ◽  
Vol 36 (10) ◽  
pp. 3242-3243 ◽  
Author(s):  
Samuel O’Donnell ◽  
Gilles Fischer

Summary: MUM&Co is a single bash script to detect structural variations (SVs) utilizing whole-genome alignment (WGA). Using MUMmer’s nucmer alignment, MUM&Co can detect insertions, deletions, tandem duplications, inversions and translocations greater than 50 bp. Its versatility depends upon the WGA and therefore benefits from contiguous de-novo assemblies generated by third generation sequencing technologies. Benchmarked against five WGA SV-calling tools, MUM&Co outperforms all tools on simulated SVs in yeast, plant and human genomes and performs similarly in two real human datasets. Additionally, MUM&Co is particularly unique in its ability to find inversions in both simulated and real datasets. Lastly, MUM&Co’s primary output is an intuitive tabulated file containing a list of SVs with only necessary genomic details. Availability and implementation: https://github.com/SAMtoBAM/MUMandCo. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 41 (10) ◽  
pp. 2751-2763 ◽  
Author(s):  
Nadia Hajji ◽  
Saber Maraoui ◽  
Larbi Chrifi-Alaoui ◽  
Kais Bouzrara

In this paper, a nonlinear distributed model predictive control scheme based on a dual decomposition approach is proposed for complex systems. The global system is decomposed into several subsystems, each managed by its own controller. To design the nonlinear predictive control in a distributed fashion, an analytical solution is proposed, based on approximating the error by its Taylor series expansion. The proposed approach is implemented on a three-tank system to control the water levels. Simulation results demonstrate the effectiveness of the proposed approach.


Open Biology ◽  
2015 ◽  
Vol 5 (1) ◽  
pp. 140197 ◽  
Author(s):  
Samuel Dean ◽  
Jack Sunter ◽  
Richard J. Wheeler ◽  
Ian Hodkinson ◽  
Eva Gluenz ◽  
...  

One of the first steps in understanding a protein's function is to determine its localization; however, the methods for localizing proteins in some systems have not kept pace with the developments in other fields, creating a bottleneck in the analysis of the large datasets that are generated in the post-genomic era. To address this, we developed tools for tagging proteins in trypanosomatids. We made a plasmid that, when coupled with long primer PCR, can be used to produce transgenes at their endogenous loci encoding proteins tagged at either terminus or within the protein coding sequence. This system can also be used to generate deletion mutants to investigate the function of different protein domains. We show that the length of homology required for successful integration precluded long primer PCR tagging in Leishmania mexicana. Hence, we developed plasmids and a fusion PCR approach to create gene tagging amplicons with sufficiently long homologous regions for targeted integration, suitable for use in trypanosomatids with less efficient homologous recombination than Trypanosoma brucei. Importantly, we have automated the primer design, developed universal PCR conditions and optimized the workflow to make this system reliable, efficient and scalable such that whole genome tagging is now an achievable goal.


2021 ◽  
Vol 12 ◽  
Author(s):  
Fenghua Tian ◽  
Changtian Li ◽  
Yu Li

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.
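For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name `n50` and the example lengths are illustrative assumptions, not taken from the study or its assembly pipeline. N50 here is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths (bp) for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

A larger N50 relative to genome size generally indicates a more contiguous assembly, although the statistic says nothing about correctness.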

