average sequence identity
Recently Published Documents


Total documents: 7 (five years: 6)

H-index: 3 (five years: 3)

Author(s):  
Alaina Shumate ◽  
Steven L Salzberg

Abstract. Motivation: Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well annotated. Results: One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and Implementation: Liftoff can be installed via bioconda and PyPI. Additionally, the source code for Liftoff is available at https://github.com/agshumate/Liftoff. Supplementary information: Supplementary data are available at Bioinformatics online.


Mobile DNA ◽  
2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Bo Gao ◽  
Wencheng Zong ◽  
Csaba Miskey ◽  
Numan Ullah ◽  
Mohamed Diaby ◽  
...  

Abstract. Background: A family of Tc1/mariner transposons with a characteristic DD38E triad of catalytic amino acid residues, named Intruder (IT), was previously discovered in sturgeon genomes, but their evolutionary landscapes remain largely unknown. Results: Here, we comprehensively investigated the evolutionary profiles of ITs, and evaluated their cut-and-paste activities in cells. ITs exhibited a narrow taxonomic distribution pattern in the animal kingdom, with invasions into two invertebrate phyla (Arthropoda and Cnidaria) and three vertebrate lineages (Actinopterygii, Agnatha, and Anura), a pattern very similar to that of the DD36E/IC family. Some animal orders and species seem to be more hospitable to Tc1/mariner transposons: one order of Amphibia and seven Actinopterygian orders are the most common orders with horizontal transfer events and have been invaded by all four families (DD38E/IT, DD35E/TR, DD36E/IC and DD37E/TRT) of Tc1/mariner transposons, and eight Actinopterygii species were identified as the major hosts of these families. Intact ITs have a total length of 1.5–1.7 kb and contain a transposase gene flanked by terminal inverted repeats (TIRs). The phylogenetic tree and sequence identity showed that IT transposases were most closely related to DD34E/Tc1. ITs have been involved in multiple events of horizontal transfer in vertebrates and have invaded most lineages recently (< 5 million years ago) based on insertion age analysis. Accordingly, ITs presented high average sequence identity (86–95%) across most vertebrate species, suggesting that some are putatively active. ITs can transpose in human HeLa cells, and the transposition efficiency of consensus TIRs was higher than that of the TIRs of natural isolates. Conclusions: We conclude that DD38E/IT originated from DD34E/Tc1 and can be detected in two invertebrate phyla (Arthropoda and Cnidaria), and in three vertebrate lineages (Actinopterygii, Agnatha and Anura). IT has experienced multiple horizontal transfer (HT) events in animals, is dominated by recent amplifications in most species, and has high identity among vertebrate taxa. Our reconstructed IT transposon vector designed according to the sequence from the “cat” genome showed high cut-and-paste activity. The data suggest that IT has been acquired recently and is active in many species. This study is meaningful for understanding the evolution of the Tc1/mariner superfamily members and their hosts.


2020 ◽  
Vol 8 (10) ◽  
pp. 1502
Author(s):  
Edit Eszterbauer ◽  
Dóra Sipos ◽  
Győző L. Kaján ◽  
Dóra Szegő ◽  
Ivan Fiala ◽  
...  

We studied the genetic variability of serine protease inhibitors (serpins) of Myxozoa, microscopic endoparasites of fish. Myxozoans affect the health of both farmed and wild fish populations, causing diseases and mortalities. Despite their global impact, no effective protection exists against these parasites. Serpins were reported as important factors for host invasion and immune evasion, and as promising targets for the development of antiparasitic therapies. For the first time, we identified and aligned serpin sequences from high-throughput sequencing datasets of ten myxozoan species, and analyzed 146 serpins from this parasite group phylogenetically, together with those of other taxa, to explore their relationships and origins. High intra- and interspecific variability was detected among the examined serpins. The average sequence identity was only 25–30%. The conserved domains (i.e., motif and signature) showed taxon-level differences. Serpins clustered according to taxonomy rather than serpin type, and myxozoan serpins seemed to be highly divergent from those of other taxa. None of them clustered with those of their closest relatives, the free-living cnidarians. The genetic distinction of myxozoan serpins further strengthens the idea of an independent origin of Myxozoa, and may indicate novel protein functions potentially related to parasitism in this animal group.
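
As an illustration of the quantity reported above, an average sequence identity over a set of aligned sequences can be computed as the mean of all pairwise identities. The sketch below is not the authors' pipeline. It assumes the sequences are already aligned to equal length, that `-` marks a gap, and that columns with a gap in either sequence are skipped (other conventions divide by the full alignment length). The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def average_identity(aligned: list[str]) -> float:
    """Mean pairwise identity over every unordered pair of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment, invented purely for illustration.
toy = ["MKT-LLV", "MKTALIV", "MRTAL-V"]
print(f"average identity: {average_identity(toy):.1%}")
```

The gap convention matters for divergent families such as the serpins described here, so any comparison with published percentages should first confirm which denominator was used.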


Viruses ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 694 ◽  
Author(s):  
Neil T. Parkin ◽  
Santiago Avila-Rios ◽  
David F. Bibby ◽  
Chanson J. Brumme ◽  
Susan H. Eshleman ◽  
...  

Next-generation sequencing (NGS) is increasingly used for HIV-1 drug resistance genotyping. NGS methods have the potential for a more sensitive detection of low-abundance variants (LAV) compared to standard Sanger sequencing (SS) methods. A standardized threshold for reporting LAV that generates data comparable to those derived from SS is needed to allow for the comparability of data from laboratories using NGS and SS. Ten HIV-1 specimens were tested in ten laboratories using Illumina MiSeq-based methods. The consensus sequences for each specimen using LAV thresholds of 5%, 10%, 15%, and 20% were compared to each other and to the consensus of the SS sequences (protease 4–99; reverse transcriptase 38–247). The concordance among laboratories’ sequences at different thresholds was evaluated by pairwise sequence comparisons. NGS sequences generated using the 20% threshold were the most similar to the SS consensus (average 99.6% identity, range 96.1–100%), compared to 15% (99.4%, 88.5–100%), 10% (99.2%, 87.4–100%), or 5% (98.5%, 86.4–100%). The average sequence identity between laboratories using thresholds of 20%, 15%, 10%, and 5% was 99.1%, 98.7%, 98.3%, and 97.3%, respectively. Using the 20% threshold, we observed an excellent agreement between NGS and SS, but significant differences at lower thresholds. Understanding how variation in NGS methods influences sequence quality is essential for NGS-based HIV-1 drug resistance genotyping.
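
To make the role of the reporting threshold concrete, the sketch below calls a consensus from per-position base counts and emits an IUPAC ambiguity code whenever more than one base reaches the threshold. It is a minimal illustration, not the pipeline used by the participating laboratories: the function name, the `>=` comparison, the handling of zero-depth positions, and the toy counts are all assumptions.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def threshold_consensus(counts: list[dict[str, int]], threshold: float) -> str:
    """Build a consensus, keeping every base whose frequency is >= threshold."""
    out = []
    for column in counts:
        depth = sum(column.values())
        if depth == 0:
            out.append("N")  # assumption: no coverage is reported as N
            continue
        kept = frozenset(b for b, n in column.items() if n / depth >= threshold)
        out.append(IUPAC.get(kept, "N"))
    return "".join(out)


# Toy read counts at three positions, invented for illustration.
toy_counts = [
    {"A": 100},
    {"A": 88, "G": 12},  # 12% minority variant
    {"C": 93, "T": 7},   # 7% minority variant
]
for t in (0.20, 0.15, 0.10, 0.05):
    print(f"{t:.0%}: {threshold_consensus(toy_counts, t)}")
```

Lowering the threshold introduces more ambiguity codes into the consensus, which is one way lower thresholds could reduce agreement with a Sanger-derived consensus.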


Author(s):  
Alaina Shumate ◽  
Steven L. Salzberg

Abstract. Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well annotated. One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.4% of human protein-coding genes to a chimpanzee genome assembly with 98.7% sequence identity. Availability: The source code for Liftoff is available at https://github.com/agshumate/Liftoff


2020 ◽  
Vol 117 (11) ◽  
pp. 5907-5912 ◽  
Author(s):  
Lukas Bartonek ◽  
Daniel Braun ◽  
Bojan Zagrovic

Frameshifts in protein coding sequences are widely perceived as resulting in either nonfunctional or even deleterious protein products. Indeed, frameshifts typically lead to markedly altered protein sequences and premature stop codons. By analyzing complete proteomes from all three domains of life, we demonstrate that, in contrast, several key physicochemical properties of protein sequences exhibit significant robustness against +1 and −1 frameshifts. In particular, we show that hydrophobicity profiles of many protein sequences remain largely invariant upon frameshifting. For example, over 2,900 human proteins exhibit a Pearson’s correlation coefficient R between the hydrophobicity profiles of the original and the +1-frameshifted variants greater than 0.7, despite an average sequence identity between the two of only 6.5% in this group. We observe a similar effect for protein sequence profiles of affinity for certain nucleobases as well as protein sequence profiles of intrinsic disorder. Finally, analysis of significance and optimality demonstrates that frameshift stability is embedded in the structure of the universal genetic code and may have contributed to shaping it. Our results suggest that frameshifting may be a powerful evolutionary mechanism for creating new proteins with vastly different sequences, yet similar physicochemical properties to the proteins from which they originate.
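
The comparison described above can be sketched in a few lines: translate a coding sequence in its original frame and with a one-nucleotide offset, map both products to per-residue hydrophobicity values, and compute Pearson's R alongside the sequence identity. This is an illustration under stated assumptions, not the authors' method: the abstract does not name a hydrophobicity scale, so the Kyte-Doolittle scale is used here; stop codons are scored as 0.0; no window smoothing is applied; and the DNA sequence is an arbitrary toy example.

```python
from math import sqrt

# Standard genetic code in the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy values (assumed scale, see lead-in).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def hydrophobicity(protein: str) -> list[float]:
    """Per-residue profile; stops get 0.0 (an assumption of this sketch)."""
    return [KD.get(aa, 0.0) for aa in protein]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation over the shared length of two profiles."""
    n = min(len(x), len(y))
    if n < 2:
        raise ValueError("need at least two positions")
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(vx * vy)


def sequence_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


toy_cds = "ATGGCTCTGATCGTTAAAGGCCTGCTGATTGCGGAAGTG"  # arbitrary toy sequence
original = translate(toy_cds, 0)
shifted = translate(toy_cds, 1)
print(original, shifted)
print("R =", pearson(hydrophobicity(original), hydrophobicity(shifted)))
print("identity =", sequence_identity(original, shifted))
```

A sequence this short says nothing about the proteome-wide result; the sketch only shows how the two quantities contrasted in the abstract are defined.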


1998 ◽  
Vol 180 (7) ◽  
pp. 1951-1954 ◽  
Author(s):  
Mervyn L. de Souza ◽  
Jennifer Seffernick ◽  
Betsy Martinez ◽  
Michael J. Sadowsky ◽  
Lawrence P. Wackett

ABSTRACT Pseudomonas strain ADP metabolizes the herbicide atrazine via three enzymatic steps, encoded by the genes atzABC, to yield cyanuric acid, a nitrogen source for many bacteria. Here, we show that five geographically distinct atrazine-degrading bacteria contain genes homologous to atzA, -B, and -C. The sequence identities of the atz genes from different atrazine-degrading bacteria were greater than 99% in all pairwise comparisons. This differs from bacterial genes involved in the catabolism of other chlorinated compounds, for which the average sequence identity in pairwise comparisons of the known members of a class ranged from 25 to 56%. Our results indicate that globally distributed atrazine-catabolic genes are highly conserved in diverse genera of bacteria.

