protein alignments
Recently Published Documents


2021 ◽  
Vol 17 (10) ◽  
pp. e1009541
Author(s):  
Petar I. Penev ◽  
Claudia Alvarez-Carreño ◽  
Eric Smith ◽  
Anton S. Petrov ◽  
Loren Dean Williams

We have developed the program TwinCons, to detect noisy signals of deep ancestry of proteins or nucleic acids. As input, the program uses a composite alignment containing pre-defined groups, and mathematically determines a ‘cost’ of transforming one group to the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons provides a convenient representation of conserved, variable and signature positions as a single score, enabling the structural mapping and visualization of these characteristics. Structure is more conserved than sequence. TwinCons highlights alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance for the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. The ability to evaluate both nucleic acid and protein alignments allows TwinCons to be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProtein buried within the deepest branching points in the tree of life.
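
The distinction between conserved, variable and signature positions can be illustrated with a toy per-column score. This is only a sketch: the abstract does not give TwinCons's actual cost function, so the rule below (product of the two groups' consensus frequencies, signed by whether the consensus residues agree) and the function names are assumptions made for illustration.

```python
from collections import Counter

def consensus(column):
    """Return (most common residue, its frequency) for one column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)

def toy_position_score(col_a, col_b):
    """Hypothetical score: near +1 conserved, near -1 signature, near 0 variable."""
    res_a, freq_a = consensus(col_a)
    res_b, freq_b = consensus(col_b)
    weight = freq_a * freq_b  # high only if both groups are internally conserved
    return weight if res_a == res_b else -weight

def score_alignment(group_a, group_b):
    """Score every column of two pre-defined groups from one composite alignment."""
    return [toy_position_score(ca, cb)
            for ca, cb in zip(zip(*group_a), zip(*group_b))]

# Toy groups: column 1 is conserved, column 2 is a signature, column 3 is variable.
group_a = ["KAG", "KAS", "KAT"]
group_b = ["KDG", "KDL", "KDP"]
scores = score_alignment(group_a, group_b)
```

A single signed number per position is what makes it possible to color a structure by the score, as the abstract describes.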


2021 ◽  
Author(s):  
Marie Morel ◽  
Frederic Lemoine ◽  
Olivier Gascuel

Evolutionary convergences are observed at all levels, from phenotype to DNA and protein sequences, and the changes observed at these different levels tend to be strongly correlated. Here we propose a simulation-based method to detect positions under convergent evolution in large protein alignments, without prior knowledge of the phenotype and environmental constraints. A phylogeny is inferred from the data and used in simulations to estimate the expected number of amino-acid changes under stable evolutionary constraints (null model) for each position. Similarly, we count the number of mutations towards the same amino acid in the data and test whether they occur more often than expected. We applied our method to two real datasets, HIV reverse transcriptase and fish rhodopsin, and to HIV-like simulated data. On the latter, with known convergent events and substitution model, we detected on average two thirds of these events, with a low fraction of false positives. With HIV data, drug resistance mutations (DRMs) are known to be convergent. Even without any knowledge of patient treatment status, we retrieved more than 70% of positions corresponding to known DRMs. On the rhodopsin dataset, four substitutions are thought to be convergent, as they change the maximum wavelength absorption of the photoreceptor and occurred several times independently during evolution. We detected three of them. These results demonstrate the potential of the method to target specific mutations to be further studied experimentally or, for example, using a nonsynonymous/synonymous rate ratio approach. Our software, named ConDor, is available at http://condor.pasteur.cloud.
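
The comparison of an observed count against simulated counts can be sketched as an empirical p-value. This is a generic Monte Carlo test, not ConDor's documented statistic; the function name and the toy counts are assumptions, and in practice the simulated counts would come from simulations along the inferred phylogeny.

```python
def empirical_p_value(observed, simulated):
    """One-sided Monte Carlo p-value: how often the null model yields
    at least as many same-amino-acid changes as seen in the real data.
    The +1 terms avoid reporting a p-value of exactly zero."""
    at_least = sum(1 for count in simulated if count >= observed)
    return (1 + at_least) / (1 + len(simulated))

# Toy numbers (hypothetical): one alignment position, ten null simulations.
observed_changes = 5
simulated_changes = [0, 1, 1, 2, 0, 1, 3, 1, 0, 2]
p = empirical_p_value(observed_changes, simulated_changes)
```

With many positions tested at once, a multiple-testing correction would normally follow; the abstract does not say which one is used.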


2021 ◽  
Vol 18 (4) ◽  
pp. 366-368
Author(s):  
Benjamin Buchfink ◽  
Klaus Reuter ◽  
Hajk-Georg Drost

We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.


Mobile DNA ◽  
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jessica Storer ◽  
Robert Hubley ◽  
Jeb Rosen ◽  
Travis J. Wheeler ◽  
Arian F. Smit

Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0–3.3 releases of Dfam (https://dfam.org) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam’s new features and provides insight into the long-term challenges ahead for improving de novo generated transposable element datasets.


2020 ◽  
Author(s):  
Jessica Storer ◽  
Robert Hubley ◽  
Jeb Rosen ◽  
Travis Wheeler ◽  
Arian F.A. Smit

The 3.0–3.2 releases of Dfam (https://dfam.org) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam’s new features and provides insight into the long-term challenges ahead for improving de novo generated transposable element datasets.


2020 ◽  
Vol 101 (7) ◽  
pp. 735-745 ◽  
Author(s):  
Elizabeth C. Scherbatskoy ◽  
Kuttichantran Subramaniam ◽  
Lowia Al-Hussinee ◽  
Kamonchai Imnoi ◽  
Patrick M. Thompson ◽  
...  

Over the last decade, a number of USA aquaculture facilities have experienced periodic mortality events of unknown aetiology in their clownfish (Amphiprion ocellaris). Clinical signs of affected individuals included lethargy, altered body coloration, reduced body condition, tachypnea, and abnormal positioning in the water column. Samples from outbreaks were processed for routine parasitological, bacteriological, and virological diagnostic testing, but no consistent parasitic or bacterial infections were observed. Histopathological evaluation revealed individual cell necrosis and mononuclear cell inflammation in the branchial cavity, pharynx, oesophagus and/or stomach of four examined clownfish, and large basophilic inclusions within the pharyngeal mucosal epithelium of one fish. Homogenates from pooled external and internal tissues from these outbreaks were inoculated onto striped snakehead (SSN-1) cells for virus isolation and cytopathic effects were observed, resulting in monolayer lysis in the initial inoculation and upon repassage. Transmission electron microscopy of infected SSN-1 cells revealed small round particles (mean diameter=20.0–21.7 nm) within the cytoplasm, consistent with the ultrastructure of a picornavirus. Full-genome sequencing of the purified virus revealed a novel picornavirus most closely related to the bluegill picornavirus and other members of the genus Limnipivirus. Additionally, pairwise protein alignments between the clownfish picornavirus (CFPV) and other known members of the genus Limnipivirus yielded results in accordance with the current International Committee on Taxonomy of Viruses criteria for members of the same genus. Thus, CFPV represents a proposed new limnipivirus species. Future experimental challenge studies are needed to determine the role of CFPV in disease.
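
Pairwise protein comparisons of this kind are usually summarized as percent identity over an alignment. The sketch below shows one common way to compute it from two already-aligned sequences; skipping columns that contain a gap is an assumption, the sequences are invented, and the genus demarcation thresholds themselves are not reproduced here because the abstract does not state them.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Invented toy sequences: 4 identical residues out of 5 compared columns.
identity = percent_identity("MKT-LV", "MKSALV")
```

Other denominators (alignment length, shorter sequence length) give different values, so the convention should always be reported alongside the number.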


Author(s):  
Bui Quang Minh ◽  
Cuong Cao Dang ◽  
Le Sy Vinh ◽  
Robert Lanfear

Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this paper, we propose QMaker, a new ML method to estimate a general time-reversible Q matrix from a large protein dataset consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies.
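
For readers unfamiliar with the term, a general time-reversible Q matrix is built from symmetric exchangeabilities and equilibrium frequencies. The sketch below shows that standard construction on a toy three-state alphabet; it does not show how QMaker estimates the parameters, and the numbers and function name are made up for illustration.

```python
def build_reversible_q(exchange, freqs):
    """Build a time-reversible rate matrix with Q[i][j] = r[i][j] * pi[j] for i != j.
    Rows sum to zero, and the matrix is scaled to one expected substitution
    per unit time at equilibrium."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchange[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[value / rate for value in row] for row in q]

# Toy parameters (hypothetical): symmetric exchangeabilities, frequencies summing to 1.
exchange = [[0, 1, 2],
            [1, 0, 3],
            [2, 3, 0]]
freqs = [0.5, 0.3, 0.2]
q = build_reversible_q(exchange, freqs)
```

Because the exchangeabilities are symmetric, the result satisfies detailed balance (pi[i] * Q[i][j] equals pi[j] * Q[j][i]), which is what "time-reversible" means. A real amino acid model has 20 states, hence 190 exchangeabilities.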


Author(s):  
David Cavanaugh ◽  
Krishnan Chittur

The identification of proteins of similar structure using sequence alignment is an important problem in bioinformatics. We describe TMATCH, a basic dynamic programming alignment algorithm which can rapidly identify proteins of similar structure from a database. TMATCH was developed to utilize an optimal hydrophobicity metric for alignments traceable to fundamental properties of amino acids. Whereas standard alignment algorithms use affine gap penalties, TMATCH adapts local alignment scoring by reinforcing favorable diagonal paths (transitions) and penalizing unfavorable transitions, paired with fixed gap opening penalties. The TMATCH algorithm is especially designed to take advantage of the extra information available within the hydrophobicity scale to detect homologies, as opposed to the probabilities derived from raw percent identities.
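
The general idea of dynamic programming alignment scored by hydrophobicity can be sketched as follows. This is not the TMATCH algorithm: it substitutes the widely used Kyte-Doolittle scale for the authors' own metric, uses a plain Smith-Waterman recurrence with a fixed per-gap penalty, and omits the diagonal reinforcement scheme. The scoring rule and parameter values are assumptions.

```python
# Kyte-Doolittle hydropathy values (stand-in for the paper's own scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def pair_score(a, b):
    """Assumed rule: reward residues with similar hydropathy, penalize the rest."""
    return 2.0 - abs(KD[a] - KD[b])

def local_score(seq_a, seq_b, gap=4.0):
    """Best local alignment score (Smith-Waterman, fixed gap penalty, no traceback)."""
    prev = [0.0] * (len(seq_b) + 1)
    best = 0.0
    for a in seq_a:
        curr = [0.0]
        for j, b in enumerate(seq_b, start=1):
            cell = max(0.0,
                       prev[j - 1] + pair_score(a, b),  # extend along the diagonal
                       prev[j] - gap,                   # gap in seq_b
                       curr[j - 1] - gap)               # gap in seq_a
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Isoleucine and leucine differ in identity but score well on hydropathy.
similar = local_score("ILVAG", "LIVAG")
```

Scoring on a physical property rather than residue identity lets hydrophobic substitutions such as I to L still contribute positively, which is the kind of extra information the abstract refers to.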

