average sequence identity Latest Research Papers

Liftoff: accurate mapping of gene annotations

Abstract. Motivation: Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. Results: One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously-annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely-related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and Implementation: Liftoff can be installed via bioconda and PyPI. Additionally, the source code for Liftoff is available at https://github.com/agshumate/Liftoff. Supplementary information: Supplementary data are available at Bioinformatics online.

Intruder (DD38E), a recently evolved sibling family of DD34E/Tc1 transposons in animals

Mobile DNA ◽

10.1186/s13100-020-00227-7 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Bo Gao ◽

Wencheng Zong ◽

Csaba Miskey ◽

Numan Ullah ◽

Mohamed Diaby ◽

...

Keyword(s):

Horizontal Transfer ◽

Amino Acid Residues ◽

Animal Kingdom ◽

Transposase Gene ◽

High Identity ◽

Sequence Identity ◽

Natural Isolates ◽

Average Sequence Identity ◽

Catalytic Amino Acid ◽

Sibling Family

Abstract. Background: A family of Tc1/mariner transposons with a characteristic DD38E triad of catalytic amino acid residues, named Intruder (IT), was previously discovered in sturgeon genomes, but their evolutionary landscapes remain largely unknown. Results: Here, we comprehensively investigated the evolutionary profiles of ITs and evaluated their cut-and-paste activities in cells. ITs exhibited a narrow taxonomic distribution pattern in the animal kingdom, with invasions into two invertebrate phyla (Arthropoda and Cnidaria) and three vertebrate lineages (Actinopterygii, Agnatha, and Anura), very similar to that of the DD36E/IC family. Some animal orders and species seem to be more hospitable to Tc1/mariner transposons: one order of Amphibia and seven Actinopterygian orders are the most common orders with horizontal transfer events and have been invaded by all four families (DD38E/IT, DD35E/TR, DD36E/IC and DD37E/TRT) of Tc1/mariner transposons, and eight Actinopterygii species were identified as the major hosts of these families. Intact ITs have a total length of 1.5–1.7 kb and contain a transposase gene flanked by terminal inverted repeats (TIRs). The phylogenetic tree and sequence identity showed that IT transposases were most closely related to DD34E/Tc1. ITs have been involved in multiple horizontal transfer (HT) events in vertebrates and, based on insertion age analysis, invaded most lineages recently (< 5 million years ago). Accordingly, ITs presented high average sequence identity (86–95%) across most vertebrate species, suggesting that some are putatively active. ITs can transpose in human HeLa cells, and the transposition efficiency of consensus TIRs was higher than that of the TIRs of natural isolates. Conclusions: We conclude that DD38E/IT originated from DD34E/Tc1 and can be detected in two invertebrate phyla (Arthropoda and Cnidaria) and in three vertebrate lineages (Actinopterygii, Agnatha and Anura). IT has experienced multiple HT events in animals, is dominated by recent amplifications in most species, and has high identity among vertebrate taxa. Our reconstructed IT transposon vector, designed according to the sequence from the “cat” genome, showed high cut-and-paste activity. The data suggest that IT has been acquired recently and is active in many species. This study is meaningful for understanding the evolution of the Tc1/mariner superfamily members and their hosts.

Genetic Diversity of Serine Protease Inhibitors in Myxozoan (Cnidaria, Myxozoa) Fish Parasites

Microorganisms ◽

10.3390/microorganisms8101502 ◽

2020 ◽

Vol 8 (10) ◽

pp. 1502

Author(s):

Edit Eszterbauer ◽

Dóra Sipos ◽

Győző L. Kaján ◽

Dóra Szegő ◽

Ivan Fiala ◽

...

Keyword(s):

Serine Protease ◽

Protease Inhibitors ◽

High Throughput Sequencing ◽

Fish Parasites ◽

Serine Protease Inhibitors ◽

Animal Group ◽

Protein Functions ◽

First Time ◽

Average Sequence Identity ◽

Novel Protein

We studied the genetic variability of serine protease inhibitors (serpins) of Myxozoa, microscopic endoparasites of fish. Myxozoans affect the health of both farmed and wild fish populations, causing diseases and mortalities. Despite their global impact, no effective protection exists against these parasites. Serpins were reported as important factors for host invasion and immune evasion, and as promising targets for the development of antiparasitic therapies. For the first time, we identified and aligned serpin sequences from high throughput sequencing datasets of ten myxozoan species, and phylogenetically analyzed 146 serpins from this parasite group together with those of other taxa to explore their relationships and origins. High intra- and interspecific variability was detected among the examined serpins. The average sequence identity was only 25–30%. The conserved domains (i.e., motif and signature) showed taxon-level differences. Serpins clustered according to taxonomy rather than serpin type, and myxozoan serpins seemed to be highly divergent from those of other taxa. None of them clustered with those of their closest relatives, the free-living cnidarians. The genetic distinction of myxozoan serpins further strengthens the idea of an independent origin of Myxozoa, and may indicate novel protein functions potentially related to parasitism in this animal group.
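
To make the "average sequence identity" figure concrete, the following is a minimal sketch of one common way to compute it from a multiple sequence alignment: take the percent identity of every pair of aligned sequences and average over all pairs. The abstract does not state how gaps were handled, so the gap convention here (skip columns where both sequences have a gap, count a gap against a residue as a mismatch) is an assumption, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumed convention: columns where both sequences have a gap are
    skipped; a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_identity(aligned: list[str]) -> float:
    """Mean percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not data from the study).
toy_alignment = ["MKV-LA", "MKVGLA", "MRV-LS"]
print(f"average identity: {average_identity(toy_alignment):.1f}%")
```

Other conventions (for example, dividing by the length of the shorter sequence, or ignoring all gapped columns) give different numbers, which is worth keeping in mind when comparing identity values across studies.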

Multi-Laboratory Comparison of Next-Generation to Sanger-Based Sequencing for HIV-1 Drug Resistance Genotyping

Viruses ◽

10.3390/v12070694 ◽

2020 ◽

Vol 12 (7) ◽

pp. 694 ◽

Cited By ~ 3

Author(s):

Neil T. Parkin ◽

Santiago Avila-Rios ◽

David F. Bibby ◽

Chanson J. Brumme ◽

Susan H. Eshleman ◽

...

Keyword(s):

Drug Resistance ◽

Illumina Miseq ◽

Next Generation ◽

Sequence Comparisons ◽

Consensus Sequences ◽

Drug Resistance Genotyping ◽

Next Generation Sequencing Ngs ◽

Sequence Quality ◽

Average Sequence Identity ◽

Hiv 1

Next-generation sequencing (NGS) is increasingly used for HIV-1 drug resistance genotyping. NGS methods have the potential for a more sensitive detection of low-abundance variants (LAV) compared to standard Sanger sequencing (SS) methods. A standardized threshold for reporting LAV that generates data comparable to those derived from SS is needed to allow for the comparability of data from laboratories using NGS and SS. Ten HIV-1 specimens were tested in ten laboratories using Illumina MiSeq-based methods. The consensus sequences for each specimen using LAV thresholds of 5%, 10%, 15%, and 20% were compared to each other and to the consensus of the SS sequences (protease 4–99; reverse transcriptase 38–247). The concordance among laboratories’ sequences at different thresholds was evaluated by pairwise sequence comparisons. NGS sequences generated using the 20% threshold were the most similar to the SS consensus (average 99.6% identity, range 96.1–100%), compared to 15% (99.4%, 88.5–100%), 10% (99.2%, 87.4–100%), or 5% (98.5%, 86.4–100%). The average sequence identity between laboratories using thresholds of 20%, 15%, 10%, and 5% was 99.1%, 98.7%, 98.3%, and 97.3%, respectively. Using the 20% threshold, we observed an excellent agreement between NGS and SS, but significant differences at lower thresholds. Understanding how variation in NGS methods influences sequence quality is essential for NGS-based HIV-1 drug resistance genotyping.
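
The effect of the reporting threshold on the consensus can be illustrated with a small sketch. It assumes that a consensus is called per alignment column, that every base at or above the threshold frequency is reported, and that mixtures are written as IUPAC ambiguity codes; it also assumes an ambiguity code counts as a mismatch against an unambiguous base. None of these details are given in the abstract, and the function names and toy reads are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(reads: list[str], threshold: float) -> str:
    """Call a consensus from aligned, equal-length reads.

    Every base whose frequency in a column is >= threshold is reported;
    two or more reported bases are merged into an IUPAC ambiguity code.
    """
    out = []
    for column in zip(*reads):
        counts = Counter(base for base in column if base in "ACGT")
        depth = sum(counts.values())
        if depth == 0:
            out.append("N")  # no usable base calls in this column
            continue
        kept = frozenset(b for b, n in counts.items() if n / depth >= threshold)
        out.append(IUPAC.get(kept, "N"))  # "N" if no base reaches the threshold
    return "".join(out)


def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical characters (assumed convention:
    an ambiguity code opposite an unambiguous base is a mismatch)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy data: one variant at 15% and another at 7% frequency.
reads = ["ACGT"] * 78 + ["ATGT"] * 15 + ["ACGA"] * 7
sanger_like = "ACGT"  # stand-in for a Sanger consensus without minor variants

for threshold in (0.05, 0.10, 0.20):
    seq = consensus(reads, threshold)
    print(threshold, seq, percent_identity(seq, sanger_like))
```

In this toy case, lowering the threshold pulls more minor variants into the consensus as ambiguity codes, which lowers identity to the variant-free reference. That is the same direction as the trend reported above, though the real comparison depends on how each laboratory's pipeline scores mixtures.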

Liftoff: an accurate gene annotation mapping tool

10.1101/2020.06.24.169680 ◽

2020 ◽

Cited By ~ 10

Author(s):

Alaina Shumate ◽

Steven L. Salzberg

Keyword(s):

Reference Genome ◽

Gene Annotation ◽

Closely Related Species ◽

Protein Coding ◽

Human Reference Genome ◽

Sequence Identity ◽

Mapping Tool ◽

Genome Assemblies ◽

Average Sequence Identity ◽

High Quality Genome

Abstract. Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously-annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely-related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.4% of human protein-coding genes to a chimpanzee genome assembly with 98.7% sequence identity. Availability: The source code for Liftoff is available at https://github.com/agshumate/Liftoff

Frameshifting preserves key physicochemical properties of proteins

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1911203117 ◽

2020 ◽

Vol 117 (11) ◽

pp. 5907-5912 ◽

Cited By ~ 8

Author(s):

Lukas Bartonek ◽

Daniel Braun ◽

Bojan Zagrovic

Keyword(s):

Physicochemical Properties ◽

Protein Sequence ◽

Protein Sequences ◽

Protein Coding ◽

Universal Genetic Code ◽

Altered Protein ◽

Human Proteins ◽

Domains Of Life ◽

Sequence Profiles ◽

Average Sequence Identity

Frameshifts in protein coding sequences are widely perceived as resulting in either nonfunctional or even deleterious protein products. Indeed, frameshifts typically lead to markedly altered protein sequences and premature stop codons. By analyzing complete proteomes from all three domains of life, we demonstrate that, in contrast, several key physicochemical properties of protein sequences exhibit significant robustness against +1 and −1 frameshifts. In particular, we show that hydrophobicity profiles of many protein sequences remain largely invariant upon frameshifting. For example, over 2,900 human proteins exhibit a Pearson’s correlation coefficient R between the hydrophobicity profiles of the original and the +1-frameshifted variants greater than 0.7, despite an average sequence identity between the two of only 6.5% in this group. We observe a similar effect for protein sequence profiles of affinity for certain nucleobases as well as protein sequence profiles of intrinsic disorder. Finally, analysis of significance and optimality demonstrates that frameshift stability is embedded in the structure of the universal genetic code and may have contributed to shaping it. Our results suggest that frameshifting may be a powerful evolutionary mechanism for creating new proteins with vastly different sequences, yet similar physicochemical properties to the proteins from which they originate.
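
The kind of comparison described above can be sketched as follows: translate a coding sequence in its original frame and in a shifted frame, map each residue to a hydrophobicity value, and compute Pearson's R between the two profiles alongside their sequence identity. Several choices here are assumptions rather than the authors' method: the +1 frameshift is modeled by starting translation one nucleotide downstream, the Kyte-Doolittle scale is used for hydrophobicity, profiles are left unsmoothed, and positions where either frame has a stop codon are skipped. The function names and the toy sequence are hypothetical.

```python
from itertools import product
from math import sqrt

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; stops appear as '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / sqrt(vx * vy)


def frameshift_comparison(dna: str) -> tuple[float, float]:
    """Return (Pearson R of hydrophobicity profiles, percent identity)
    between the original frame and the frame shifted by one nucleotide."""
    original = translate(dna, 0)
    shifted = translate(dna, 1)
    # Compare position by position, skipping any position with a stop codon.
    pairs = [(a, b) for a, b in zip(original, shifted) if a != "*" and b != "*"]
    r = pearson([KD[a] for a, _ in pairs], [KD[b] for _, b in pairs])
    identity = 100.0 * sum(a == b for a, b in pairs) / len(pairs)
    return r, identity


# Hypothetical toy coding sequence (not from the study).
toy_cds = "ATGGCTCTGATTGTTGAAAAAGATCGTCTGCTGTTCGGTAAACCGGAA"
r, identity = frameshift_comparison(toy_cds)
print(f"R = {r:.2f}, identity = {identity:.1f}%")
```

A single short sequence says little on its own; the claim above concerns distributions over complete proteomes, so a realistic analysis would apply this per-protein comparison to thousands of coding sequences.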

The Atrazine Catabolism Genes atzABC Are Widespread and Highly Conserved

Journal of Bacteriology ◽

10.1128/jb.180.7.1951-1954.1998 ◽

1998 ◽

Vol 180 (7) ◽

pp. 1951-1954 ◽

Cited By ~ 172

Author(s):

Mervyn L. de Souza ◽

Jennifer Seffernick ◽

Betsy Martinez ◽

Michael J. Sadowsky ◽

Lawrence P. Wackett

Keyword(s):

Cyanuric Acid ◽

Pairwise Comparisons ◽

Sequence Identity ◽

Catabolic Genes ◽

Degrading Bacteria ◽

Herbicide Atrazine ◽

Bacterial Genes ◽

Globally Distributed ◽

Average Sequence Identity ◽

Sequence Identities

ABSTRACT Pseudomonas strain ADP metabolizes the herbicide atrazine via three enzymatic steps, encoded by the genes atzABC, to yield cyanuric acid, a nitrogen source for many bacteria. Here, we show that five geographically distinct atrazine-degrading bacteria contain genes homologous to atzA, -B, and -C. The sequence identities of the atz genes from different atrazine-degrading bacteria were greater than 99% in all pairwise comparisons. This differs from bacterial genes involved in the catabolism of other chlorinated compounds, for which the average sequence identity in pairwise comparisons of the known members of a class ranged from 25 to 56%. Our results indicate that globally distributed atrazine-catabolic genes are highly conserved in diverse genera of bacteria.

