AlignmentViewer: Sequence Analysis of Large Protein Families

2018 ◽  
Author(s):  
Roc Reguant ◽  
Yevgeniy Antipin ◽  
Rob Sheridan ◽  
Augustin Luna ◽  
Chris Sander

Summary: AlignmentViewer is a multiple sequence alignment viewer for protein families with flexible visualization, analysis tools and links to protein family databases. It is directly accessible in web browsers without the need for software installation, as it is implemented in JavaScript, and does not require an internet connection to function. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, facilitating the computation of protein 3D structures and the detection of functionally constrained interactions.
Availability and Implementation: AlignmentViewer is open-source software under the MIT license. The viewer is at http://alignmentviewer.org, and the source code, documentation and issue tracking, for co-development, are at https://github.com/dfci/. Contact: [email protected] (reaches all authors).
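AlignmentViewer is described as particularly suited to evolutionary coupling analysis. As a hedged illustration of what a statistical coupling between two alignment columns looks like, the sketch below computes plain mutual information (MI) between two columns. This is a simplification chosen for brevity, not AlignmentViewer's code: dedicated coupling methods usually fit maximum-entropy (Potts) models with sequence reweighting, which this sketch omits. The function name and toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of an aligned set of sequences.

    Hypothetical helper: raw MI with no sequence weighting, pseudocounts or
    correction for phylogeny, so it only illustrates the idea of covariation.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy alignment: columns 0 and 1 co-vary (A<->K, G<->R); column 3 is invariant.
msa = ["AKLE", "AKLE", "GRLE", "GRIE"]
print(round(column_mutual_information(msa, 0, 1), 3))  # 1.0
print(round(column_mutual_information(msa, 0, 3), 3))  # 0.0
```

Raw MI mixes direct and indirect (transitive) correlations, which is the main motivation for the maximum-entropy approaches used in contact prediction.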

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 213 ◽  
Author(s):  
Roc Reguant ◽  
Yevgeniy Antipin ◽  
Rob Sheridan ◽  
Christian Dallago ◽  
Drew Diamantoukos ◽  
...  

AlignmentViewer is a web-based tool to view and analyze multiple sequence alignments of protein families. The particular strengths of AlignmentViewer include flexible visualization at different scales as well as analysis of conservation patterns and of the distribution of proteins in sequence space. The tool is directly accessible in web browsers without the need for software installation. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, e.g. via EVcouplings.org.
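One common way to quantify the conservation patterns mentioned above is per-column Shannon entropy, where 0 bits means a fully conserved column. The sketch below is a generic illustration, not AlignmentViewer's own conservation measure; it assumes gaps ("-" or ".") are simply excluded from each column, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_entropy(alignment, col, ignore="-."):
    """Shannon entropy (bits) of the residues in one column; gap characters are skipped."""
    residues = [seq[col] for seq in alignment if seq[col] not in ignore]
    if not residues:
        return 0.0
    n = len(residues)
    # sum of p * log2(1/p) over observed residues
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())

def conservation_profile(alignment):
    """Entropy for every column of an alignment of equal-length sequences."""
    return [column_entropy(alignment, col) for col in range(len(alignment[0]))]

msa = ["MKV-L", "MKIAL", "MRVAL", "MKLAF"]
print([round(h, 2) for h in conservation_profile(msa)])  # [0.0, 0.81, 1.5, 0.0, 0.81]
```

Excluding gaps keeps a mostly-gapped but otherwise invariant column scored as conserved; a viewer might instead report gap fraction separately or treat the gap as a 21st symbol.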



2019 ◽  
Author(s):  
Ammar Tareen ◽  
Justin B. Kinney

Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA, and protein sequences, yet it is currently difficult to generate such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from any matrix-like array of numbers. Logos are rendered as vector graphics that are easy to stylize using standard matplotlib functions. Methods for creating logos from multiple-sequence alignments are also included.
Availability and Implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Source code is available at http://github.com/jbkinney/logomaker.
Supplemental Information: Documentation is provided at http://[email protected].
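Logomaker draws logos from a matrix-like array of numbers, one row per position and one column per character. As a hedged sketch of where such a matrix can come from, the code below turns a DNA alignment into the letter heights of a classic information-content logo, height = p(c) × (log2|alphabet| − H). It assumes a fixed alphabet, ignores other characters and applies no small-sample correction; the function name and toy sites are hypothetical and this is not Logomaker's implementation.

```python
import math
from collections import Counter

def information_matrix(sequences, alphabet="ACGT"):
    """Letter heights (bits) for an information-content sequence logo.

    Hypothetical helper: returns one dict per position mapping each letter to
    p(letter) * (log2(len(alphabet)) - H), where H is the position's Shannon
    entropy. Characters outside `alphabet` (e.g. gaps) are ignored.
    """
    max_bits = math.log2(len(alphabet))
    rows = []
    for i in range(len(sequences[0])):
        counts = Counter(seq[i] for seq in sequences if seq[i] in alphabet)
        total = sum(counts.values())
        if total == 0:
            # Column holds only ignored characters: draw nothing.
            rows.append({c: 0.0 for c in alphabet})
            continue
        probs = {c: counts[c] / total for c in alphabet}
        entropy = sum(p * math.log2(1 / p) for p in probs.values() if p > 0)
        info = max_bits - entropy
        rows.append({c: probs[c] * info for c in alphabet})
    return rows

# Hypothetical toy alignment of four DNA sites.
sites = ["ACGT", "ACGA", "ACTT", "ACGT"]
for pos, row in enumerate(information_matrix(sites)):
    print(pos, {c: round(h, 2) for c, h in row.items() if h > 0})
```

With pandas installed, `pandas.DataFrame(information_matrix(sites))` yields a positions-by-letters table, the shape of matrix that `logomaker.Logo` draws. Logomaker's own helpers (for example `alignment_to_matrix` and `transform_matrix`) cover this ground with more options; its documentation is the authoritative reference.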


2021 ◽  
Author(s):  
Christopher Pockrandt ◽  
Martin Steinegger ◽  
Steven L. Salzberg

Summary: PhyloCSF++ is an efficient and parallelized C++ implementation of the popular PhyloCSF method to distinguish protein-coding and non-coding regions in a genome based on multiple sequence alignments. It can score alignments or produce browser tracks for entire genomes in the WIG file format. Additionally, PhyloCSF++ annotates coding sequences in GFF/GTF files using precomputed tracks, or computes and scores multiple sequence alignments on the fly with MMseqs.
Availability: PhyloCSF++ is released under the AGPLv3 license. Binaries and source code are available at https://github.com/cpockrandt/PhyloCSFpp. The software can be installed through bioconda. A variety of tracks can be accessed through ftp://ftp.ccb.jhu.edu/pub/software/phylocsf++/.
Contact: [email protected], [email protected]
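PhyloCSF++ can write genome-wide browser tracks in WIG format. For readers unfamiliar with the format, the sketch below renders a run of per-position scores as a fixedStep WIG track (1-based start coordinate). It is a generic illustration of the format, not PhyloCSF++'s exact output layout; the function name, track name and scores are hypothetical.

```python
def to_fixed_step_wig(chrom, start, scores, step=1, name="scores"):
    """Render consecutive per-position scores as a fixedStep WIG track.

    Hypothetical helper: `start` is 1-based, and each score applies to one
    position, `step` bases after the previous one.
    """
    lines = [
        f"track type=wiggle_0 name={name}",
        f"fixedStep chrom={chrom} start={start} step={step}",
    ]
    lines.extend(f"{s:.3f}" for s in scores)
    return "\n".join(lines) + "\n"

print(to_fixed_step_wig("chr1", 1001, [0.5, -1.25, 2.0]), end="")
```

fixedStep suits contiguous runs of scores; where positions are missing (for example, alignment gaps), a new fixedStep header line or a variableStep section would be needed.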


2021 ◽  
Author(s):  
A.A. Korzhenkov

Whole genome sequencing (WGS) has become a routine method and may be applied to a wide spectrum of scientific problems. Although genome sequencing itself is increasingly available, genome assembly and annotation can be a challenge for an inexperienced researcher. To solve this problem, a bioinformatic pipeline was developed to guide the user from raw sequencing reads to an annotated bacterial or archaeal genome ready for deposition in any INSDC database, such as NCBI, ENA or DDBJ. The pipeline is fully automated and does not require an internet connection after installation, which prevents data leakage and premature publication of genome sequences. The source code of the pipeline is freely available at https://github.com/laxeye/zga/. The software may be installed from popular repositories: Anaconda Cloud (https://anaconda.org/bioconda/zga/) and PyPI (https://pypi.org/project/zga/).


2017 ◽  
Author(s):  
Daniele Granata ◽  
Luca Ponzoni ◽  
Cristian Micheletti ◽  
Vincenzo Carnevale

Amino acid interactions within protein families are so optimized that the analysis of evolutionary co-mutations alone can identify pairs of contacting residues. It is also known that evolution conserves functional dynamics, i.e., the concerted motion or displacement of large protein regions or domains. Is it, therefore, possible to use a purely sequence-based analysis to identify these dynamical domains? To address this question, we introduce here a general co-evolutionary coupling analysis strategy and apply it to a curated sequence database of hundreds of protein families. For most families, the sequence-based method partitions amino acids into a few clusters. When viewed in the context of the native structure, these clusters have the signature characteristics of viable protein domains: they are spatially separated but individually compact. They also have direct functional bearing, as shown for various reference cases. We conclude that even large-scale structural and functionally related properties can be recovered from inference methods applied to evolutionarily related sequences. The method introduced here is available as a software package and web server (http://spectrus.sissa.it/spectrus-evo_webserver).
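The abstract describes partitioning residues into clusters using only co-evolutionary couplings. As a hedged illustration of how a coupling matrix can be split into groups, the sketch below performs a generic spectral bisection: it builds the graph Laplacian of a symmetric, non-negative residue coupling matrix and splits residues by the sign of the Fiedler vector. This is not the authors' algorithm (which yields a few clusters, not necessarily two); the function name and the synthetic two-block coupling matrix are assumptions for illustration.

```python
import numpy as np

def spectral_bisection(coupling):
    """Split residues into two groups via the Fiedler vector of the graph Laplacian.

    Hypothetical helper. `coupling` is an (n x n) array of non-negative
    coupling strengths; it is symmetrized and its diagonal is ignored.
    Returns an array of 0/1 labels (which group gets 0 is arbitrary).
    """
    w = np.asarray(coupling, dtype=float)
    w = (w + w.T) / 2.0           # enforce symmetry
    np.fill_diagonal(w, 0.0)      # drop self-couplings
    laplacian = np.diag(w.sum(axis=1)) - w
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]       # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Synthetic couplings: residues 0-2 and 3-5 are strongly coupled within each block.
strong, weak = 1.0, 0.05
c = np.full((6, 6), weak)
c[:3, :3] = strong
c[3:, 3:] = strong
labels = spectral_bisection(c)
print(labels)
```

Recursive bisection, or clustering several low-lying Laplacian eigenvectors at once, is the usual way to obtain more than two groups from the same construction.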


2016 ◽  
Author(s):  
Marcin J. Skwark ◽  
Mirco Michel ◽  
David Menéndez Hurtado ◽  
Magnus Ekeberg ◽  
Arne Elofsson

For decades, protein structure prediction was one of the grand unsolved challenges in bioinformatics. A few years ago, it was shown that using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment significantly increases the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences, the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, no structure is known for about half of all Pfam domain families, but unfortunately most of these families have at most a few hundred members, i.e. they are too small for existing contact prediction methods. To extend accurate contact predictions to the thousands of smaller protein families, we present PconsC3, an improved method for protein contact prediction that can be used for families with as few as 100 effective sequence members. We estimate that PconsC3 provides accurate contact predictions for up to 4646 Pfam domain families. In addition, PconsC3 significantly outperforms previous methods independent of family size, secondary structure content, contact range, or the number of selected contacts. This improvement translates into improved de novo prediction of three-dimensional structures. PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2.
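Family size in the abstract is stated as a number of effective sequences rather than raw members. A common definition down-weights near-duplicates: each sequence gets weight 1/(number of sequences, itself included, at or above an identity threshold), and the effective count Neff is the sum of the weights. The sketch below assumes an 80% identity threshold, a frequent choice in coevolution work, but the exact threshold and gap handling used by PconsC3 may differ; function names and the toy alignment are hypothetical.

```python
def sequence_identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def effective_sequences(alignment, threshold=0.8):
    """Neff = sum over sequences of 1 / (number of sequences with identity >= threshold).

    Hypothetical helper. Each sequence counts itself, so every weight is at
    most 1. Gap characters are compared like residues (a simplification).
    Runs in O(N^2 * L) time for N sequences of length L.
    """
    neff = 0.0
    for a in alignment:
        neighbours = sum(sequence_identity(a, b) >= threshold for b in alignment)
        neff += 1.0 / neighbours
    return neff

msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "WYWYWYWYWY"]
print(round(effective_sequences(msa), 2))  # 2.0: three near-duplicates count as one
```

Because redundant sequences share a single unit of weight, a family with thousands of members can still have a small Neff, which is why the abstract's thresholds are stated in effective sequences.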


Solid Earth ◽  
2011 ◽  
Vol 2 (1) ◽  
pp. 53-63 ◽  
Author(s):  
S. Tavani ◽  
P. Arbues ◽  
M. Snidero ◽  
N. Carrera ◽  
J. A. Muñoz

In this work we present the Open Plot Project, an open-source software package for structural data analysis that includes a 3-D environment. The software includes many classical functionalities of structural data analysis tools, such as stereoplots, contouring, tensorial regression, scatterplots, histograms and transect analysis. In addition, efficient filtering tools allow the selection of data according to their attributes, including spatial distribution and orientation. This first alpha release represents a stand-alone toolkit for structural data analysis. The 3-D environment with digitising tools allows the integration of structural data with information extracted from georeferenced images to produce structurally validated dip domains. This, coupled with many import/export facilities, allows easy incorporation of structural analyses into workflows for 3-D geological modelling. Accordingly, Open Plot Project is also a candidate structural add-on for 3-D geological modelling software. The software (for both Windows and Linux), the User Manual, a set of example movies (complementary to the User Manual), and the source code are provided as a Supplement. We intend the publication of the source code to lay the foundation for free, public software that, hopefully, the structural geologists' community will use, modify, and extend. The creation of additional public controls/tools is strongly encouraged.
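Stereoplots are listed among the classical functionalities. As a hedged, textbook-style illustration (not Open Plot Project's code), the sketch below converts a plane given as dip direction/dip into the trend and plunge of its pole and projects that pole onto a lower-hemisphere equal-area (Schmidt) net of unit radius, with x pointing east and y north. The dip-direction/dip convention and the function names are assumptions.

```python
import math

def pole_to_plane(dip_direction, dip):
    """Trend and plunge (degrees) of the pole to a plane given as dip direction/dip."""
    trend = (dip_direction + 180.0) % 360.0
    plunge = 90.0 - dip
    return trend, plunge

def equal_area_xy(trend, plunge, radius=1.0):
    """Lower-hemisphere equal-area (Schmidt) projection of a line; x = east, y = north."""
    t = math.radians(trend)
    # Distance from the centre grows with the angle from vertical (90 - plunge).
    r = radius * math.sqrt(2.0) * math.sin(math.radians(90.0 - plunge) / 2.0)
    return r * math.sin(t), r * math.cos(t)

# Hypothetical measurements as (dip direction, dip) pairs.
for dd, dip in [(90.0, 30.0), (0.0, 0.0), (45.0, 90.0)]:
    x, y = equal_area_xy(*pole_to_plane(dd, dip))
    print(f"{dd:5.1f}/{dip:4.1f} -> x={x:.3f}, y={y:.3f}")
```

A horizontal plane plots its pole at the centre of the net and a vertical plane plots its pole on the primitive circle; contouring of pole densities is then typically done on these projected points.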


2020 ◽  
Author(s):  
Xun Zhu ◽  
Ti-Cheng Chang ◽  
Richard Webby ◽  
Gang Wu

idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, using user-uploaded FastQ files directly. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC clusters and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
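The abstract says clade calls are based on a clade-defining marker list. As a hedged sketch of that idea only (idCOV's actual calling logic is described in its repository), the code below scores each clade by the fraction of its marker alleles observed in an isolate's variant calls and returns the best-scoring clade. The marker positions, alleles and clade names are hypothetical placeholders, not real SARS-CoV-2 markers.

```python
def assign_clade(called_alleles, clade_markers):
    """Return (best_clade, fraction_of_markers_matched).

    Hypothetical helper.
    called_alleles: {genome_position: allele} observed in one isolate.
    clade_markers: {clade_name: {genome_position: allele}}, each non-empty.
    Positions with no call count as mismatches; ties keep the first clade listed.
    """
    best, best_score = None, -1.0
    for clade, markers in clade_markers.items():
        hits = sum(called_alleles.get(pos) == allele for pos, allele in markers.items())
        score = hits / len(markers)
        if score > best_score:
            best, best_score = clade, score
    return best, best_score

# Placeholder marker list and calls, for illustration only.
markers = {
    "clade_X": {100: "T", 250: "G"},
    "clade_Y": {100: "C", 400: "A", 500: "T"},
}
calls = {100: "C", 400: "A", 500: "G"}
print(assign_clade(calls, markers))
```

Real clade systems are hierarchical, so nested clades share ancestral markers; a production caller would also need to handle low coverage and ambiguous base calls rather than treating them as simple mismatches.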


2020 ◽  
Author(s):  
N Goonasekera ◽  
A Mahmoud ◽  
J Chilton ◽  
E Afgan

Summary: The existence of more than 100 public Galaxy servers with service quotas indicates the need for increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules, and the resources can be dynamically acquired in an automated fashion from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack).
Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/.
Contact: Enis Afgan ([email protected])
Supplementary information: None
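GalaxyCloudRunner routes user jobs according to a set of configurable rules. The sketch below illustrates rule-based routing in general: an ordered list of predicates where the first match picks the destination, with a fallback default. It is not GalaxyCloudRunner's configuration syntax or API (in practice, routing is set up through Galaxy's job configuration); the `Job` fields, rule thresholds and destination names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    tool_id: str
    cores: int
    memory_gb: float

# Hypothetical ordered rules: the first predicate that matches decides the destination.
RULES = [
    (lambda job: job.memory_gb > 64, "cloud_highmem"),
    (lambda job: job.cores > 8, "cloud_compute"),
    (lambda job: job.tool_id.startswith("upload"), "local"),
]
DEFAULT_DESTINATION = "local"

def route(job, rules=RULES, default=DEFAULT_DESTINATION):
    """Return the destination name for a job using first-match-wins rules."""
    for predicate, destination in rules:
        if predicate(job):
            return destination
    return default

for job in [Job("bwa_mem", 16, 32.0), Job("spades", 4, 128.0), Job("fastqc", 2, 4.0)]:
    print(job.tool_id, "->", route(job))
```

Keeping rules as ordered data rather than nested conditionals makes the policy easy to reconfigure, at the cost of requiring care with rule order (here the memory rule deliberately takes precedence over the core-count rule).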

