Vertebrate Alpha2,8-Sialyltransferases (ST8Sia): A Teleost Perspective

We identified and analyzed α2,8-sialyltransferase sequences from 71 ray-finned fish species to provide the first comprehensive view of the Teleost ST8Sia repertoire. This repertoire expanded over the course of Vertebrate evolution and was shaped primarily by the whole-genome duplication events R1 and R2, but not by the Teleost-specific R3. We showed that duplicated st8sia genes such as st8sia7, st8sia8, and st8sia9 have disappeared from Tetrapods, whereas their orthologues were maintained in Teleosts. Furthermore, several species-specific genome duplications in fish account for the presence of multiple poly-α2,8-sialyltransferases in the Salmonidae (ST8Sia II-r1 and ST8Sia II-r2) and in Cyprinus carpio (ST8Sia IV-r1 and ST8Sia IV-r2). Paralogy and synteny analyses provided robust information that enabled us to reconstruct the evolutionary history of st8sia genes in fish genomes. Our data also indicate that, whereas the mammalian ST8Sia family comprises six subfamilies forming di-, oligo-, or polymers of α2,8-linked sialic acids, the fish ST8Sia family, with a total of 10 genes, appears to be much more diverse and shows a patchy distribution among fish species. A focus on Salmonidae showed that (i) the two st8sia2 gene copies have contrasting tissue-specific expression patterns, with noticeable differences from their human co-orthologue, and (ii) st8sia4 is weakly expressed. Multiple sequence alignments revealed changes in the conserved polysialyltransferase domain (PSTD) of the fish sequences that could account for differences in enzymatic activity. These data provide a basis for further functional studies using recombinant enzymes.

TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution

10.1101/2020.11.30.405589

2020

Author(s):

Chao Zhang

Yiming Zhao

Edward L Braun

Siavash Mirarab

Keyword(s):

Error Detection

Detection Methods

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Rates Of Evolution

Tree Inference

Species Specific

Automatic Error

Automatic Algorithm

Erroneous data can creep into sequence datasets for reasons ranging from contamination to annotation and alignment mistakes. These errors can reduce the accuracy of downstream analyses such as tree inference and diminish the community's confidence in the results even when they do not affect the analysis. As datasets keep growing, visually checking for errors has become impractical, so automatic error detection methods are needed more than ever. Widely used alignment masking methods remove entire aligned sites; therefore, they may reduce signal as much as, or more than, they reduce noise. An alternative is to design targeted methods that look for errors in small, species-specific stretches of the alignment by detecting outliers. Crucially, such a method should attempt to distinguish real heterogeneity, which includes signal, from errors. This type of error filtering is surprisingly under-explored. In this paper, we introduce TAPER, an automatic algorithm that looks for small stretches of error in sequence alignments. Our results show that TAPER removes very little data yet finds much of the error and cleans up the alignments.
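The abstract does not describe how TAPER itself works, so the sketch below is only a hypothetical illustration of the general idea it motivates: flag a short stretch in one sequence whose disagreement with the column consensus is far above that sequence's own background rate. Estimating the rate per sequence is what keeps a uniformly fast-evolving sequence (real heterogeneity) from being flagged everywhere. The function names, the consensus comparison, the window size, and the binomial z-score cutoff are all assumptions, not TAPER's method.

```python
from collections import Counter
import math


def consensus(alignment):
    """Most common non-gap residue in each column ('-' if the column is all gaps)."""
    cons = []
    for column in zip(*alignment.values()):
        counts = Counter(c for c in column if c != "-")
        cons.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(cons)


def flag_outlier_windows(alignment, window=8, z_cut=3.0):
    """Hypothetical outlier scan: per sequence, return start columns of windows
    whose mismatches to the consensus far exceed that sequence's own rate."""
    cons = consensus(alignment)
    flagged = {}
    for name, seq in alignment.items():
        # 1 = differs from consensus, 0 = agrees, None = gap (no information)
        calls = [None if a == "-" or c == "-" else int(a != c)
                 for a, c in zip(seq, cons)]
        informative = [x for x in calls if x is not None]
        # Sequence-wide mismatch rate, a crude stand-in for its rate of evolution
        rate = sum(informative) / len(informative) if informative else 0.0
        rate = min(max(rate, 0.01), 0.99)  # clamp so the binomial SD is never zero
        hits = []
        for start in range(len(calls) - window + 1):
            win = [x for x in calls[start:start + window] if x is not None]
            if not win:
                continue
            n = len(win)
            expected = n * rate
            sd = math.sqrt(n * rate * (1 - rate))
            if (sum(win) - expected) / sd > z_cut:
                hits.append(start)
        flagged[name] = hits
    return flagged
```

One simplification worth noting: the erroneous stretch inflates the very rate it is compared against, which a real tool would need to handle more carefully.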

SAliBASE: A Database of Simulated Protein Alignments

Evolutionary Bioinformatics

10.1177/1176934318821080

2019

Vol 15

pp. 117693431882108

Cited By ~ 2

Author(s):

Muhammad Tariq Pervez

Hayat Ali Shah

Masroor Ellahi Babar

Nasir Naveed

Muhammad Shoaib

Keyword(s):

Phylogenetic Trees

Evolutionary History

Sequence Length

End User

Major Focus

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Deletion Rate

Protein Alignments

Simulated alignments are alternatives to manually constructed multiple sequence alignments for evaluating the performance of multiple sequence alignment tools. Simulated sequences are valuable because their true evolutionary history is known, which provides a reference for judging the accuracy of reconstructed phylogenetic trees and alignments. However, generating simulated alignments requires expertise with bioinformatics tools and can take several hours, even for a few hundred simulated sequences. This becomes tedious for an end user who needs only a few datasets of varied simulated sequences. Currently, no databank is available from which researchers can download simulated sequences or alignments for their studies. The major focus of our study was to develop a database of simulated protein sequences (SAliBASE) generated under varying parameters such as insertion rate, deletion rate, sequence length, number of sequences, and indel size. Each dataset also has a corresponding alignment. This repository is very useful for evaluating multiple alignment methods.
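To make the idea of "simulated sequences with a known true alignment" concrete, here is a minimal sketch of a toy simulator driven by the same kinds of parameters the abstract lists (insertion rate, deletion rate, sequence length, number of sequences, indel size). It is not SAliBASE's generator, which the abstract does not describe; the star-shaped tree, uniform residue resampling, and uniform indel lengths are assumptions. Each residue carries a column key inherited from the root (or created at insertion), so the true alignment falls out of sorting those keys.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simulate_alignment(n_seqs=4, length=30, sub_rate=0.1, ins_rate=0.05,
                       del_rate=0.05, max_indel=3, seed=0):
    """Toy star-tree simulator (hypothetical). Returns (raw_seqs, true_alignment)."""
    rng = random.Random(seed)
    root = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    descendants = []
    for s in range(n_seqs):
        cells = {}  # column key -> residue for this descendant
        skip = 0    # remaining root positions to delete
        for i, aa in enumerate(root):
            if skip:
                skip -= 1
            elif rng.random() < del_rate:
                # delete this position plus up to max_indel - 1 following ones
                skip = rng.randint(1, max_indel) - 1
            else:
                # with probability sub_rate, resample the residue uniformly
                cells[(i,)] = rng.choice(AMINO_ACIDS) if rng.random() < sub_rate else aa
            if rng.random() < ins_rate:
                # inserted residues get keys that sort after root column i
                # and before root column i + 1
                for k in range(rng.randint(1, max_indel)):
                    cells[(i, s, k)] = rng.choice(AMINO_ACIDS)
        descendants.append(cells)
    columns = sorted({key for cells in descendants for key in cells})
    aligned = ["".join(cells.get(key, "-") for key in columns) for cells in descendants]
    raw = [row.replace("-", "") for row in aligned]
    return raw, aligned
```

Because column keys are tuples, Python's lexicographic ordering places `(i,)` before every `(i, s, k)` insertion, which in turn precede `(i + 1,)`; no column in the output is gaps-only, since every key comes from some sequence.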

phylogatR: Phylogeographic data aggregation and repurposing

10.1101/2021.10.11.461680

2021

Author(s):

Tara A Pelletier

Danielle Parsons

Sydney Decker

Stephanie Crouch

Eric Franz

...

Keyword(s):

Evolutionary Biology

Genetic Data

Sequence Alignments

Multiple Sequence

Web Based

Multiple Sequence Alignments

History Of

Data Points

Meta Analyses

Existing Data

Patterns of genetic diversity within a species contain information about its history, including how it has responded to historical climate change and how easily it can disperse across its habitat. More than 40,000 phylogeographic and population genetic studies have been published to date, each collecting genetic data from hundreds of samples. Despite these millions of data points, meta-analyses are challenging because synthesizing results across hundreds of studies, each using different methods and forms of analysis, is a daunting and time-consuming task. It is more efficient to repurpose existing data and analyze them automatically. To facilitate data repurposing, we created a database (phylogatR) that aggregates data from different sources and performs automated multiple sequence alignment and data curation, providing users with nearly ready-to-analyze datasets for thousands of species. phylogatR will make two types of research easier: large meta-analyses of thousands of species that can address classic questions in evolutionary biology and ecology, and student- or citizen-science investigations that will introduce a broad range of people to the analysis of genetic data. By providing software and web-based tools that allow these data to be recycled and reanalyzed, phylogatR enhances the value of existing data and makes big data more accessible to research labs and classroom instructors with limited computational expertise and resources.

Comparison of Arctic and Antarctic teleost haemoglobins: primary structure, function and phylogeny

Antarctic Science

10.1017/s0954102004001828

2004

Vol 16 (1)

pp. 59-69

Cited By ~ 6

Author(s):

Cinzia Verde

Elio Parisi

Guido di Prisco

Keyword(s):

Structure Function

Cold Adaptation

Evolutionary History

The Arctic

Sequence Alignments

Multiple Sequence

Root Effect

Evolutionary Pathway

History Of

The Antarctic

Organisms living in the Arctic and Antarctic are exposed to strong environmental constraints, especially temperature. Consequently, haemoglobin evolution has included adaptations with implications at the biochemical, physiological and molecular levels. The northern and southern polar oceans have very different oceanographic characteristics. As part of a study of the molecular bases of cold adaptation in fish inhabiting polar habitats, and taking advantage of the information available on haemoglobin structure and function, we analysed the evolutionary history of the α and β globins of Antarctic and Arctic haemoglobins, under the assumption of the molecular-clock hypothesis, as a basis for reconstructing the phylogenetic relationships between species. Temperate fish, including two non-Antarctic notothenioids of special evolutionary interest, were also considered. Phylogenetic analysis was performed on multiple sequence alignments constructed with the programme Clustal X. Tree topologies indicate that the chains of Antarctic major and minor haemoglobins cluster in two well-separated groups and diverged prior to cold adaptation, forming a monophyletic group. In Arctic haemoglobins, the structure/function relationship reveals important differences from Antarctic ones, indicating a distinct evolutionary pathway. Unlike the Antarctic ichthyofauna, which is dominated by one taxonomically uniform group, the Arctic ichthyofauna is characterized by high diversity, which is reflected in the phylogeny of a given trait. The constant physico-chemical conditions of Antarctic waters are matched by a clear grouping of fish globin sequences, whereas the variability typical of the Arctic Ocean corresponds to high sequence variation, reflected in the trees by scattered intermediate positions between the Antarctic and non-Antarctic clades. The evolutionary history of the Root effect, an important physiological feature of fish haemoglobin, was also investigated. Analysis of the fate of the β-chain residues suggested to be correlated with the Root effect indicates that they should rather be regarded as ancestral characters, inherited by some species but not by others.
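The abstract relies on the molecular-clock hypothesis but does not name the tree-building method used after the Clustal X alignment. As a hedged illustration of what a clock assumption means in practice, the sketch below uses UPGMA, one classic method that assumes a constant rate and therefore yields an ultrametric tree, on p-distances computed from aligned sequences. The choice of UPGMA and of p-distance, and the function names, are assumptions for illustration only.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("sequences share no ungapped sites")
    return sum(x != y for x, y in sites) / len(sites)


def upgma(alignment):
    """Build an ultrametric (clock-like) tree in Newick format with UPGMA."""
    size = {name: 1 for name in alignment}      # cluster -> number of leaves
    height = {name: 0.0 for name in alignment}  # cluster -> height of its root node
    dist = {frozenset(p): p_distance(alignment[p[0]], alignment[p[1]])
            for p in combinations(alignment, 2)}
    while len(size) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        a, b = sorted(pair)
        h = dist.pop(pair) / 2                  # clock: both lineages reach the node at d/2
        node = f"({a}:{h - height[a]:.3f},{b}:{h - height[b]:.3f})"
        for c in list(size):
            if c in (a, b):
                continue
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # size-weighted average distance from the merged cluster to c
            dist[frozenset((node, c))] = (size[a] * d_ac + size[b] * d_bc) / (size[a] + size[b])
        size[node] = size.pop(a) + size.pop(b)
        height[node] = h
    (root,) = size
    return root + ";"
```

Because every merge places the new node at half the inter-cluster distance, all leaves end up equidistant from the root, which is exactly the constant-rate expectation of a strict molecular clock.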

Faculty Opinions recommendation of Evolutionary profiles from the QR factorization of multiple sequence alignments.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.1024515.296730

2005

Author(s):

Anne-Catherine Dock-Bregeon

Keyword(s):

QR Factorization

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Faculty Opinions recommendation of Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.732011981.793542976

2018

Author(s):

Chandra Verma

Suryani Lukman

Keyword(s):

Machine Learning

Sequence Alignments

Multiple Sequence

Contact Prediction

Multiple Sequence Alignments

Positive natural selection in primate genes of the type I interferon response

BMC Ecology and Evolution

10.1186/s12862-021-01783-z

2021

Vol 21 (1)

Author(s):

Elena N. Judd

Alison R. Gilchrist

Nicholas R. Meyerson

Sara L. Sawyer

Keyword(s):

Natural Selection

Positive Selection

Type I Interferon

Interferon Response

Type I

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Interferon Stimulated Genes

Interferon Induction

Background: The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize many proteins in interferon pathways (e.g., by degrading or mislocalizing them). Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection favoring new allelic forms that better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of the selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing interferon production would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have a greater effect if they acted upstream of interferon production because, once interferon is produced, hundreds of interferon-stimulated proteins are activated and the virus would need to counteract them one by one.

Results: We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, "induction" genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for signatures of recurrent positive selection. Counter to our hypothesis, we found that the interferon-stimulated genes, and not the interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes bear signatures of positive selection, respectively). In contrast, interferon-stimulated genes evolve differently: 33% of them evolve under positive selection, and they contain a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid.

Conclusion: Viruses may antagonize individual products of the interferon response more often than they try to neutralize the system altogether.

SNN-SB: Combining Partial Alignment Using Modified SNN Algorithm with Segment-Based for Multiple Sequence Alignments

Journal of Physics: Conference Series

10.1088/1742-6596/1962/1/012048

2021

Vol 1962 (1)

pp. 012048

Author(s):

Aziz Nasser Boraik Ali

Hassan Pyar Ali Hassan

Hesham Bahamish

Keyword(s):

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Partial Alignment

DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

Scientific Reports

10.1038/s41598-021-91827-7

2021

Vol 11 (1)

Author(s):

Farhan Quadir

Raj S. Roy

Randal Halfmann

Jianlin Cheng

Keyword(s):

Deep Learning

Tertiary Structure

Protein Complexes

Complex Structure

Great Success

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Residue Contacts

Evolutionary Features

Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because few known protein complexes are available to generate MSAs of sufficient depth for a pair of proteins. Recognizing that MSAs of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using the MSA and other co-evolutionary features of a single monomer, and then distinguished interchain from intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the top L/10 predictions (L being the sequence length): 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
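The evaluation metric in this abstract, top L/10 precision with a tolerance of two residue shifts, can be made concrete with a short sketch. It assumes a reading of "within two residue shifts" in which a predicted pair (i, j) counts as correct if some true contact lies within two residues of it in both positions; the function name and input format are hypothetical and not taken from the DNCON2_Inter code. For homodimers, if contacts are symmetric, both orientations of each true pair would need to be listed.

```python
def top_l_precision(predictions, true_contacts, seq_len, fraction=10, shift=2):
    """Precision of the top seq_len // fraction predicted contacts (hypothetical helper).

    predictions: iterable of (i, j, score); true_contacts: iterable of (i, j).
    A prediction is a hit if some true contact lies within `shift` residues
    of it in both positions (an assumed reading of "two residue shifts").
    """
    k = max(1, seq_len // fraction)
    top = sorted(predictions, key=lambda p: p[2], reverse=True)[:k]
    truth = list(true_contacts)

    def is_hit(i, j):
        return any(abs(i - ti) <= shift and abs(j - tj) <= shift for ti, tj in truth)

    if not top:
        return 0.0
    return sum(is_hit(i, j) for i, j, _ in top) / len(top)
```

For a 50-residue chain the function scores only the five highest-ranked predictions, so lowering `shift` to 0 turns the same ranking into a strict exact-match precision.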

Exploratory analysis of multiple sequence alignments using phylogenies

Bioinformatics

10.1093/bioinformatics/10.3.243

1994

Vol 10 (3)

pp. 243-247

Author(s):

Brian Golding

Keyword(s):

Exploratory Analysis

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments
