AlignmentViewer: Sequence Analysis of Large Protein Families

AlignmentViewer is a web-based tool to view and analyze multiple sequence alignments of protein families. The particular strengths of AlignmentViewer include flexible visualization at different scales as well as analysis of conservation patterns and of the distribution of proteins in sequence space. The tool is directly accessible in web browsers without the need for software installation. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, e.g. via EVcouplings.org.

10.1101/269720 ◽

2018 ◽

Cited By ~ 1

Author(s):

Roc Reguant ◽

Yevgeniy Antipin ◽

Rob Sheridan ◽

Augustin Luna ◽

Chris Sander

Keyword(s):

Open Source Software ◽

Source Code ◽

Web Browsers ◽

Protein Families ◽

Large Protein ◽

Multiple Sequence ◽

Internet Connection ◽

Visualization Analysis ◽

Link Type ◽

Evolutionary Coupling

Summary: AlignmentViewer is a multiple sequence alignment viewer for protein families with flexible visualization, analysis tools and links to protein family databases. It is directly accessible in web browsers without the need for software installation, as it is implemented in JavaScript, and does not require an internet connection to function. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, facilitating the computation of protein 3D structures and the detection of functionally constrained interactions. Availability and Implementation: AlignmentViewer is open source software under the MIT license. The viewer is at http://alignmentviewer.org and the source code, documentation and issue tracking, for co-development, are at https://github.com/dfci/[email protected], reaches all authors

AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

Algorithms for Molecular Biology ◽

10.1186/1748-7188-5-24 ◽

2010 ◽

Vol 5 (1) ◽

pp. 24 ◽

Cited By ~ 4

Author(s):

Darío Guerrero ◽

Rocío Bautista ◽

David P Villalobos ◽

Francisco R Cantón ◽

M Gonzalo Claros

Keyword(s):

Sequence Alignments ◽

Multiple Sequence ◽

Web Based ◽

Conserved Sequences ◽

Multiple Sequence Alignments

Neural Potts Model

10.1101/2021.04.08.439084 ◽

2021 ◽

Author(s):

Tom Sercu ◽

Robert Verkuil ◽

Joshua Meier ◽

Brandon Amos ◽

Zeming Lin ◽

...

Keyword(s):

Potts Model ◽

Optimization Problem ◽

Energy Landscapes ◽

Protein Families ◽

Single Model ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Multiple Protein ◽

Ablation Experiment

We propose the Neural Potts Model objective as an amortized optimization problem. The objective enables training a single model with shared parameters to explicitly model energy landscapes across multiple protein families. Given a protein sequence as input, the model is trained to predict a pairwise coupling matrix for a Potts model energy function describing the local evolutionary landscape of the sequence. Couplings can be predicted for novel sequences. A controlled ablation experiment assessing unsupervised contact prediction on sets of related protein families finds a gain from amortization for low-depth multiple sequence alignments; the result is then confirmed on a database with broad coverage of protein sequences.
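
To make the "Potts model energy function" in this abstract concrete, the following minimal sketch evaluates the energy of one sequence from given site and pair terms. It does not implement the neural model or its training. The function name `potts_energy`, the data layout, the inclusion of site terms and the sign convention (lower energy for more favourable sequences) are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def potts_energy(seq, fields, couplings, alphabet="ACDEFGHIKLMNPQRSTVWY-"):
    """Energy of `seq` under a Potts model (assumed convention).

    E(x) = - sum_i h_i(x_i) - sum_{i<j} J_ij(x_i, x_j)

    fields[i][a]            : site term h_i(a)
    couplings[(i, j)][a][b] : pair term J_ij(a, b), stored for i < j
    """
    # Map each residue to its index in the alphabet.
    idx = [alphabet.index(residue) for residue in seq]

    energy = 0.0
    # Single-site contributions.
    for i, a in enumerate(idx):
        energy -= fields[i][a]
    # Pairwise contributions over all position pairs i < j.
    for i, j in combinations(range(len(idx)), 2):
        energy -= couplings[(i, j)][idx[i]][idx[j]]
    return energy
```

In this layout, a predicted coupling matrix for a sequence of length L would fill `couplings` with one table per position pair, and contact prediction would typically rank pairs by the magnitude of their coupling tables.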

JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures

Bioinformatics ◽

10.1093/bioinformatics/btr688 ◽

2011 ◽

Vol 28 (4) ◽

pp. 584-586 ◽

Cited By ~ 10

Author(s):

T. Muth ◽

J. A. Garcia-Martin ◽

A. Rausell ◽

D. Juan ◽

A. Valencia ◽

...

Keyword(s):

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Conservation Patterns

NX4: a web-based visualization of large multiple sequence alignments

Bioinformatics ◽

10.1093/bioinformatics/btz457 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4800-4802

Author(s):

A Solano-Roman ◽

C Cruz-Castillo ◽

D Offenhuber ◽

A Colubri

Keyword(s):

Large Scale ◽

Supplementary Information ◽

Sequence Alignments ◽

Multiple Sequence ◽

High Genetic Diversity ◽

Web Based ◽

Multiple Sequence Alignments ◽

Line Chart ◽

Sequence Logos ◽

Scalable Analysis

Summary: Multiple Sequence Alignments (MSAs) are a fundamental operation in genome analysis. However, MSA visualizations such as sequence logos and matrix representations have changed little since the nineties and are not well suited for displaying large-scale alignments. We propose a novel, web-based MSA visualization tool called NX4, which can handle genome alignments comprising thousands of sequences. NX4 calculates the frequency of each nucleotide along the alignment and visually summarizes the results using a color-blind friendly palette that helps identify regions of high genetic diversity. NX4 also provides the user with additional assistance in finding these regions with a ‘focus + context’ mechanism that uses a line chart of the Shannon entropy across the alignment. The tool offers geneticists an easy-to-use and scalable analysis for large MSA studies. Availability and implementation: NX4 is freely available at https://www.nx4.io, and its source code at https://github.com/NX4/nx4. Supplementary information: Supplementary data are available at Bioinformatics online.
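
The per-column Shannon entropy mentioned in this abstract can be sketched as follows. This is an illustration of the general technique, not NX4's own code: the function names are hypothetical, entropy is measured in bits, and gap characters are counted as ordinary symbols, which is an assumption.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # H = -sum_a p_a * log2(p_a), where p_a is the frequency of symbol a.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy of every column of an alignment given as equal-length strings."""
    # zip(*sequences) walks the alignment column by column.
    return [column_entropy(column) for column in zip(*sequences)]
```

A fully conserved column has entropy 0, while a column in which all four nucleotides are equally frequent reaches the maximum of 2 bits, so peaks in the resulting profile mark the regions of high diversity that a line chart would highlight.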

phylogatR: Phylogeographic data aggregation and repurposing

10.1101/2021.10.11.461680 ◽

2021 ◽

Author(s):

Tara A Pelletier ◽

Danielle Parsons ◽

Sydney Decker ◽

Stephanie Crouch ◽

Eric Franz ◽

...

Keyword(s):

Evolutionary Biology ◽

Genetic Data ◽

Sequence Alignments ◽

Multiple Sequence ◽

Web Based ◽

Multiple Sequence Alignments ◽

History Of ◽

Data Points ◽

Meta Analyses ◽

Existing Data

Patterns of genetic diversity within species contain information about the history of that species, including how they have responded to historical climate change and how easily the organism is able to disperse across its habitat. More than 40,000 phylogeographic and population genetic investigations have been published to date, each collecting genetic data from hundreds of samples. Despite these millions of data points, meta-analyses are challenging because the synthesis of results across hundreds of studies, each using different methods and forms of analysis, is a daunting and time-consuming task. It is more efficient to proceed by repurposing existing data and using automated data analysis. To facilitate data repurposing, we created a database (phylogatR) that aggregates data from different sources and conducts automated multiple sequence alignments and data curation to provide users with nearly ready-to-analyze sets of data for thousands of species. Two types of scientific research will be made easier by phylogatR: large meta-analyses of thousands of species that can address classic questions in evolutionary biology and ecology, and student- or citizen-science-based investigations that will introduce a broad range of people to the analysis of genetic data. phylogatR enhances the value of existing data via the creation of software and web-based tools that enable these data to be recycled and reanalyzed and increase accessibility to big data for research labs and classroom instructors with limited computational expertise and resources.

Faculty Opinions recommendation of Evolutionary profiles from the QR factorization of multiple sequence alignments.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1024515.296730 ◽

2005 ◽

Author(s):

Anne-Catherine Dock-Bregeon

Keyword(s):

Qr Factorization ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Faculty Opinions recommendation of Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.732011981.793542976 ◽

2018 ◽

Author(s):

Chandra Verma ◽

Suryani Lukman

Keyword(s):

Machine Learning ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Multiple Sequence Alignments

Positive natural selection in primate genes of the type I interferon response

BMC Ecology and Evolution ◽

10.1186/s12862-021-01783-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Elena N. Judd ◽

Alison R. Gilchrist ◽

Nicholas R. Meyerson ◽

Sara L. Sawyer

Keyword(s):

Natural Selection ◽

Positive Selection ◽

Type I Interferon ◽

Interferon Response ◽

Type I ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Interferon Stimulated Genes ◽

Interferon Induction

Background: The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize (i.e., degrade, mis-localize, etc.) many proteins in interferon pathways. Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection in favor of new allelic forms that can better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have greater effect if they worked upstream of the production of interferon molecules because, once interferon is produced, hundreds of interferon-stimulated proteins would activate and the virus would need to counteract them one-by-one. Results: We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, “induction” genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for the signatures of recurrent positive selection. Counter to our hypothesis, we found the interferon-stimulated genes, and not interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes bear signatures of positive selection, respectively). In contrast, interferon-stimulated genes evolve differently, with 33% of genes evolving under positive selection and containing a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid. Conclusion: Viruses may antagonize individual products of the interferon response more often than trying to neutralize the system altogether.