AlignmentViewer: Sequence Analysis of Large Protein Families

F1000Research ◽

10.12688/f1000research.22242.2 ◽

2020 ◽

Vol 9 ◽

pp. 213 ◽

Cited By ~ 1

Author(s):

Roc Reguant ◽

Yevgeniy Antipin ◽

Rob Sheridan ◽

Christian Dallago ◽

Drew Diamantoukos ◽

...

Keyword(s):

Web Browsers ◽

Protein Families ◽

Large Protein ◽

Sequence Alignments ◽

Coupling Analysis ◽

Multiple Sequence ◽

Web Based ◽

Multiple Sequence Alignments ◽

Conservation Patterns ◽

Evolutionary Coupling

AlignmentViewer is a web-based tool to view and analyze multiple sequence alignments of protein families. The particular strengths of AlignmentViewer include flexible visualization at different scales as well as analysis of conservation patterns and of the distribution of proteins in sequence space. The tool is directly accessible in web browsers without the need for software installation. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, e.g. via EVcouplings.org.
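As an illustration of what "analysis of conservation patterns" can mean in practice, the following minimal sketch scores each alignment column by its Shannon entropy. It is not AlignmentViewer's implementation: the function names, the entropy-based score, the gap handling, and the toy alignment are all assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        # Assumption: an all-gap column is treated as having zero entropy.
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Per-column conservation in [0, 1]; 1.0 means a single residue type."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    # zip(*alignment) iterates over columns of the alignment.
    return [1.0 - column_entropy(col) / max_entropy for col in zip(*alignment)]


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKV-L", "MRVAL", "MKIAL", "MKVGL"]
profile = conservation_profile(toy_alignment)
```

Columns with a single residue type score 1.0, and the score falls as the residue distribution becomes more even. For families with tens of thousands of sequences, a vectorized implementation would likely be preferable to this pure-Python loop.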

AlignmentViewer: Sequence Analysis of Large Protein Families

F1000Research ◽

10.12688/f1000research.22242.1 ◽

2020 ◽

Vol 9 ◽

pp. 213

Author(s):

Roc Reguant ◽

Yevgeniy Antipin ◽

Rob Sheridan ◽

Christian Dallago ◽

Drew Diamantoukos ◽

...

Keyword(s):

Web Browsers ◽

Protein Families ◽

Large Protein ◽

Sequence Alignments ◽

Coupling Analysis ◽

Multiple Sequence ◽

Web Based ◽

Multiple Sequence Alignments ◽

Conservation Patterns ◽

Evolutionary Coupling

AlignmentViewer is a web-based tool to view and analyze multiple sequence alignments of protein families. The particular strengths of AlignmentViewer include flexible visualization at different scales as well as analysis of conservation patterns and of the distribution of proteins in sequence space. The tool is directly accessible in web browsers without the need for software installation. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, e.g. via EVcouplings.org.

Logomaker: Beautiful sequence logos in python

10.1101/635029 ◽

2019 ◽

Cited By ~ 10

Author(s):

Ammar Tareen ◽

Justin B. Kinney

Keyword(s):

Source Code ◽

Biological Properties ◽

Programming Environment ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Link Type ◽

Sequence Logos ◽

Python Programming ◽

Publication Quality

Abstract: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA, and protein sequences, yet it is currently difficult to generate such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from any matrix-like array of numbers. Logos are rendered as vector graphics that are easy to stylize using standard matplotlib functions. Methods for creating logos from multiple-sequence alignments are also included. Availability and Implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Source code is available at http://github.com/jbkinney/logomaker. Supplemental Information: Documentation is provided at http://[email protected].
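To make the "matrix-like array of numbers" concrete, the sketch below builds a per-position information-content matrix from a small alignment, which is the kind of matrix a standard sequence logo displays (letter height = frequency × information content). It deliberately does not call Logomaker. The function name, the DNA alphabet, the toy binding sites, and the absence of any small-sample correction are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA alphabet, chosen for brevity


def information_matrix(sequences, alphabet=ALPHABET):
    """Return one dict per position mapping each letter to its logo height in bits."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(len(alphabet))
    matrix = []
    for pos in range(length):
        # Characters outside the alphabet (e.g. gaps) are skipped.
        column = [s[pos] for s in sequences if s[pos] in alphabet]
        total = len(column)
        freqs = {a: (column.count(a) / total if total else 0.0) for a in alphabet}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = max_bits - entropy if total else 0.0
        matrix.append({a: freqs[a] * info for a in alphabet})
    return matrix


# Hypothetical aligned binding sites, for illustration only.
sites = ["ACGT", "ACGA", "ACTA", "AGTA"]
heights = information_matrix(sites)
```

A list of per-position dictionaries like `heights` converts directly into a table with one row per position and one column per letter, which could then be handed to a logo-drawing library.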

PhyloCSF++: A fast and user-friendly implementation of PhyloCSF with annotation tools

10.1101/2021.03.10.434297 ◽

2021 ◽

Author(s):

Christopher Pockrandt ◽

Martin Steinegger ◽

Steven L. Salzberg

Keyword(s):

Source Code ◽

File Format ◽

Sequence Alignments ◽

Multiple Sequence ◽

Protein Coding ◽

Multiple Sequence Alignments ◽

Coding Regions ◽

Link Type ◽

A Genome ◽

User Friendly

Summary: PhyloCSF++ is an efficient and parallelized C++ implementation of the popular PhyloCSF method to distinguish protein-coding and non-coding regions in a genome based on multiple sequence alignments. It can score alignments or produce browser tracks for entire genomes in the wig file format. Additionally, PhyloCSF++ annotates coding sequences in GFF/GTF files using precomputed tracks or computes and scores multiple sequence alignments on the fly with MMseqs. Availability: PhyloCSF++ is released under the AGPLv3 license. Binaries and source code are available at https://github.com/cpockrandt/PhyloCSFpp. The software can be installed through bioconda. A variety of tracks can be accessed through ftp://ftp.ccb.jhu.edu/pub/software/phylocsf++/. Contact: [email protected], [email protected]

ZGA: a flexible pipeline for read processing, de novo assembly and annotation of prokaryotic genomes

10.1101/2021.04.27.441618 ◽

2021 ◽

Author(s):

A.A. Korzhenkov

Keyword(s):

Genome Sequencing ◽

De Novo ◽

Wide Spectrum ◽

Source Code ◽

Routine Method ◽

Genome Sequences ◽

Bioinformatic Pipeline ◽

Internet Connection ◽

Link Type ◽

Prokaryotic Genomes

Abstract: Whole genome sequencing (WGS) has become a routine method and may be applied to study a wide spectrum of scientific problems. Despite the increasing availability of genome sequencing itself, genome assembly and annotation can be a challenge for an inexperienced researcher. To solve this problem, a bioinformatic pipeline was developed to guide a user from raw sequencing reads to an annotated bacterial or archaeal genome ready for deposition to any INSDC database, such as NCBI, ENA or DDBJ. The pipeline is fully automated and does not require an internet connection after installation, which prevents data leakage and premature publication of genome sequences. The source code of the pipeline is freely available at https://github.com/laxeye/zga/. The software may be installed from popular repositories: Anaconda Cloud (https://anaconda.org/bioconda/zga/) and PyPI (https://pypi.org/project/zga/).

From Sequence to Function: Coevolving Amino Acids Encode Structural and Functional Domains

10.1101/109397 ◽

2017 ◽

Author(s):

Daniele Granata ◽

Luca Ponzoni ◽

Cristian Micheletti ◽

Vincenzo Carnevale

Keyword(s):

Amino Acids ◽

Large Scale ◽

Sequence Database ◽

Native Structure ◽

Protein Families ◽

Large Protein ◽

Functional Dynamics ◽

Analysis Strategy ◽

Inference Methods ◽

Evolutionary Coupling

Amino acid interactions within protein families are so optimized that the analysis of evolutionary co-mutations alone can identify pairs of contacting residues. It is also known that evolution conserves functional dynamics, i.e., the concerted motion or displacement of large protein regions or domains. Is it, therefore, possible to use a purely sequence-based analysis to identify these dynamical domains? To address this question, we introduce here a general co-evolutionary coupling analysis strategy and apply it to a curated sequence database of hundreds of protein families. For most families, the sequence-based method partitions amino acids into a few clusters. When viewed in the context of the native structure, these clusters have the signature characteristics of viable protein domains: they are spatially separated but individually compact. They also have a direct functional bearing, as shown for various reference cases. We conclude that even large-scale structural and functionally related properties can be recovered from inference methods applied to evolutionarily related sequences. The method introduced here is available as a software package and web server (http://spectrus.sissa.it/spectrus-evo_webserver).

Accurate contact predictions for thousands of protein families using PconsC3

10.1101/079673 ◽

2016 ◽

Cited By ~ 1

Author(s):

Marcin J. Skwark ◽

Mirco Michel ◽

David Menéndez Hurtado ◽

Magnus Ekeberg ◽

Arne Elofsson

Keyword(s):

Structure Prediction ◽

De Novo ◽

Pfam Domain ◽

Three Dimensional ◽

Improved Method ◽

Protein Families ◽

Large Protein ◽

Multiple Sequence ◽

Residue Contact ◽

Contact Predictions

Protein structure prediction was for decades one of the grand unsolved challenges in bioinformatics. A few years ago it was shown that using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment could significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences, the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. they are too small for existing contact prediction methods. To extend accurate contact predictions to the thousands of smaller protein families, we present PconsC3, an improved method for protein contact prediction that can be used for families with as few as 100 effective sequence members. We estimate that PconsC3 provides accurate contact predictions for up to 4646 Pfam domain families. In addition, PconsC3 significantly outperforms previous methods, independent of family size, secondary structure content, contact range, or the number of selected contacts. This improvement translates into improved de novo prediction of three-dimensional structures. PconsC3 is available as a web server and as a downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2.
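The abstract above does not define "effective sequences". One commonly used convention down-weights each sequence by the number of alignment members that are near-identical to it, then sums the weights. The sketch below follows that convention; the 80% identity threshold, the function names, and the toy alignment are assumptions for illustration and are not taken from PconsC3.

```python
def fractional_identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    if not a:
        raise ValueError("sequences must not be empty")
    # Note: two gaps at the same position count as a match in this sketch.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_sequences(alignment, threshold=0.8):
    """Sum of per-sequence weights 1 / (number of neighbours at >= threshold identity)."""
    total = 0.0
    for seq in alignment:
        # Each sequence is its own neighbour, so the count is at least 1.
        neighbours = sum(
            1 for other in alignment
            if fractional_identity(seq, other) >= threshold
        )
        total += 1.0 / neighbours
    return total


# Hypothetical toy alignment: two near-identical sequences and one distinct one.
toy_alignment = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
n_eff = effective_sequences(toy_alignment)
```

Here the first two sequences share 90% identity and each receive weight 0.5, while the third receives weight 1, giving an effective count of 2 for three raw sequences. The all-against-all comparison is quadratic in the number of sequences, so large families would need a faster approach.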

Open Plot Project: an open-source toolkit for 3-D structural data analysis

Solid Earth ◽

10.5194/se-2-53-2011 ◽

2011 ◽

Vol 2 (1) ◽

pp. 53-63 ◽

Cited By ~ 18

Author(s):

S. Tavani ◽

P. Arbues ◽

M. Snidero ◽

N. Carrera ◽

J. A. Muñoz

Keyword(s):

Spatial Distribution ◽

Data Analysis ◽

Open Source ◽

Open Source Software ◽

Source Code ◽

Structural Data ◽

Geological Modelling ◽

Analysis Tools ◽

Transect Analysis ◽

Selection Of

Abstract. In this work we present the Open Plot Project, open-source software for structural data analysis, including a 3-D environment. The software includes many classical functionalities of structural data analysis tools, such as stereoplots, contouring, tensorial regression, scatterplots, histograms and transect analysis. In addition, efficient filtering tools allow the selection of data according to their attributes, including spatial distribution and orientation. This first alpha release represents a stand-alone toolkit for structural data analysis. The presence of a 3-D environment with digitising tools allows the integration of structural data with information extracted from georeferenced images to produce structurally validated dip domains. This, coupled with many import/export facilities, allows easy incorporation of structural analyses in workflows for 3-D geological modelling. Accordingly, Open Plot Project is also a candidate structural add-on for 3-D geological modelling software. The software (for both Windows and Linux O.S.), the User Manual, a set of example movies (complementary to the User Manual), and the source code are provided as a Supplement. We intend the publication of the source code to set the foundation for free, public software that, hopefully, the structural geologists' community will use, modify, and implement. The creation of additional public controls/tools is strongly encouraged.

idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract: idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV can make calls equivalent to those annotated by Nextstrain.org on all three common clade systems, using user-uploaded FASTQ files directly. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.

GalaxyCloudRunner: enhancing scalable computing for Galaxy

10.1101/2020.05.28.121772 ◽

2020 ◽

Author(s):

N Goonasekera ◽

A Mahmoud ◽

J Chilton ◽

E Afgan

Keyword(s):

Source Code ◽

Supplementary Information ◽

Scalable Computing ◽

Link Type ◽

Cloud Providers ◽

Galaxy Server ◽

Cloud Resources

Summary: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules, and the resources can be dynamically acquired from any of four popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan ([email protected]). Supplementary information: None.