AlphaFold at CASP13

Mohammed AlQuraishi

doi:10.1093/bioinformatics/btz422

AlphaFold at CASP13

Bioinformatics ◽

10.1093/bioinformatics/btz422 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4862-4865 ◽

Cited By ~ 48

Author(s):

Mohammed AlQuraishi

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Computational Prediction ◽

Data Bank ◽

Academic Community ◽

Physical Contact ◽

Evolutionary Analysis ◽

History Of ◽

First Time

Abstract Summary: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
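As a rough illustration of idea (i) above, residue co-variation can be scored by the mutual information between pairs of alignment columns. This is a minimal sketch under stated assumptions, not AlphaFold's method: the function names and the toy alignment are hypothetical, and practical pipelines typically use more elaborate statistical models and corrections than plain mutual information.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi


def covariation_map(msa):
    """Symmetric matrix of pairwise column scores; the diagonal is set to 0."""
    length = len(msa[0])
    return [
        [column_mutual_information(msa, i, j) if i != j else 0.0
         for j in range(length)]
        for i in range(length)
    ]


# Hypothetical toy alignment: columns 0 and 2 change together, column 1 is conserved.
toy_msa = ["ACDA", "ACDA", "GCEA", "GCEA"]
scores = covariation_map(toy_msa)
```

In this toy alignment the pair (0, 2) receives the highest score, while any pair involving a fully conserved column scores zero. High-scoring pairs are the candidates that a contact predictor would then interpret as possible physical contacts.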

Genetic characterization of a novel picornavirus in Algerian bats: co-evolution analysis of bat-related picornaviruses

Scientific Reports ◽

10.1038/s41598-019-52209-2 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Safia Zeghbib ◽

Róbert Herczeg ◽

Gábor Kemenesi ◽

Brigitta Zana ◽

Kornélia Kurucz ◽

...

Keyword(s):

Geographical Location ◽

Data Sets ◽

Evolutionary Analysis ◽

Evolutionary Patterns ◽

Host Genus ◽

Evolution Analysis ◽

Zoonotic Viruses ◽

History Of ◽

First Time

Abstract Bats are reservoirs of numerous zoonotic viruses. The Picornaviridae family comprises important pathogens which may infect both humans and animals. In this study, a bat-related picornavirus was detected in Algerian Miniopterus schreibersii bats for the first time in the country. Molecular analyses revealed that the new virus belongs to the Mischivirus genus. Using the acquired sequence and all available data on bat picornaviruses, we performed a co-evolutionary analysis of mischiviruses and their hosts to reveal evolutionary patterns within this genus. Based on this analysis, we enlarged the dataset and examined the co-evolutionary history of all bat-related picornaviruses and their hosts, to compile all possible species-jumping events during their evolution. Furthermore, we explored the association of phylogeny with geographical location, host genus and host species in both data sets.

LZerD Protein-Protein Docking Webserver Enhanced With de novo Structure Prediction

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.724947 ◽

2021 ◽

Vol 8 ◽

Author(s):

Charles Christoffer ◽

Vijay Bharadwaj ◽

Ryan Luu ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Protein Complexes ◽

Protein Sequences ◽

Data Bank ◽

Protein Docking ◽

Functional Mechanisms ◽

Established Technique

Protein-protein docking is a useful tool for modeling the structures of protein complexes that have yet to be experimentally determined. Understanding the structures of protein complexes is a key component for formulating hypotheses in biophysics regarding the functional mechanisms of complexes. Protein-protein docking is an established technique for cases where the structures of the subunits have been determined. While the number of known structures deposited in the Protein Data Bank is increasing, there are still many cases where the structures of individual proteins that users want to dock are not determined yet. Here, we have integrated the AttentiveDist method for protein structure prediction into our LZerD webserver for protein-protein docking, which enables users to simply submit protein sequences and obtain full-complex atomic models, without having to supply any structure themselves. We have further extended the LZerD docking interface with a symmetrical homodimer mode. The LZerD server is available at https://lzerd.kiharalab.org/.

Motif Discovery in Protein Structure Databases

Pattern Discovery in Biomolecular Data ◽

10.1093/oso/9780195119404.003.0011 ◽

1999 ◽

Author(s):

Janice Glasgow ◽

Evan Steeg

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Sequence Motifs ◽

Computational Molecular Biology ◽

X Ray ◽

X Ray Crystallography ◽

Efficiency And Effectiveness ◽

Structure Prediction Program ◽

Automated Discovery

The field of knowledge discovery is concerned with the theory and processes involved in the representation and extraction of patterns or motifs from large databases. Discovered patterns can be used to group data into meaningful classes, to summarize data, or to reveal deviant entries. Motifs stored in a database can be brought to bear on difficult instances of structure prediction or determination from X-ray crystallography or nuclear magnetic resonance (NMR) experiments. Automated discovery techniques are central to understanding and analyzing the rapidly expanding repositories of protein sequence and structure data. This chapter deals with the discovery of protein structure motifs. A motif is an abstraction over a set of recurring patterns observed in a dataset; it captures the essential features shared by a set of similar or related objects. In many domains, such as computer vision and speech recognition, there exist special regularities that permit such motif abstraction. In the protein science domain, the regularities derive from evolutionary and biophysical constraints on amino acid sequences and structures. The identification of a known pattern in a new protein sequence or structure permits the immediate retrieval and application of knowledge obtained from the analysis of other proteins. The discovery and manipulation of motifs—in DNA, RNA, and protein sequences and structures—is thus an important component of computational molecular biology and genome informatics. In particular, identifying protein structure classifications at varying levels of abstraction allows us to organize and increase our understanding of the rapidly growing protein structure datasets. Discovered motifs are also useful for improving the efficiency and effectiveness of X-ray crystallographic studies of proteins, for drug design, for understanding protein evolution, and ultimately for predicting the structure of proteins from sequence data. 
Motifs may be designed by hand, based on expert knowledge. For example, the Chou-Fasman protein secondary structure prediction program (Chou and Fasman, 1978), which dominated the field for many years, depended on the recognition of predefined, user-encoded sequence motifs for α-helices and β-sheets. Several hundred sequence motifs have been cataloged in PROSITE (Bairoch, 1992); the identification of one of these motifs in a novel protein often allows for immediate function interpretation.

Improved computational methods of protein sequence alignment, model selection and tertiary structure prediction

10.32469/10355/46126 ◽

2013 ◽

Author(s):

Xin Deng

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Model Selection ◽

Sequence Alignment ◽

Protein Sequence ◽

Structure Prediction ◽

Tertiary Structure ◽

Solvent Accessibility ◽

Relative Solvent Accessibility ◽

Tertiary Structure Prediction

Protein sequence and profile alignment is essential to most bioinformatics tasks, such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods that do not use structural information, and performed comparably to the method using structural features and additional homologous sequences, with only slightly lower scores. We also developed HHpacom, a new profile-profile pairwise alignment method that integrates secondary structure, solvent accessibility, torsion angle and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility and torsion angle information significantly improved the alignment accuracy in comparison with the state-of-the-art methods HHsearch and HHsuite. The evolutionary constraint information did help in some cases, especially in alignments of short proteins, typically 100 to 500 residues. Protein model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods taking query-template alignment as input. The assessment results illustrated that this could help improve model selection, protein structure prediction and many other bioinformatics problems. Moreover, we also developed a protein tertiary structure prediction pipeline, many components of which were built into our group's MULTICOM system. MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.

A Constant Proportion in Protein Structure

10.1101/500025 ◽

2018 ◽

Author(s):

Francisco Javier Lobo-Cabrera

Keyword(s):

Protein Structure ◽

Spatial Clustering ◽

A Priori ◽

Data Bank ◽

X Ray ◽

Protein Size ◽

Protein Prediction ◽

Quality Check ◽

First Time ◽

Secondary Structure Composition

The principles governing protein structure are largely unknown. Here, a structural proportion that is universal among proteins (R² = 0.978) is reported. The model variance is shown to be independent of protein size, secondary structure composition, compactness or relative surface area. The structural characteristic under study, named here QUILLO, quantifies residue-type spatial clustering. In this way, polar, hydrophobic, acidic and basic residues are evaluated individually and their values added up. For the analysis, all X-ray structures currently deposited in the Protein Data Bank were studied. The QUILLO proportion offers for the first time an a priori quality check for protein predictions. Indeed, predictions with unexpected proportion values correspond to low ranks in the CASP12 experiment. The reason behind a specific, constant rule for protein folding remains unknown.

Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling?

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2014-0024 ◽

2015 ◽

Vol 11 (1) ◽

pp. 1-7 ◽

Cited By ~ 4

Author(s):

Michal Brylinski

Keyword(s):

Protein Structure ◽

Protein Data Bank ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Structural Information ◽

Three Dimensional ◽

Data Bank ◽

Prediction Problem ◽

Three Dimensional Models ◽

Protein Structure Prediction Problem

Abstract The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.

Population Analysis and Evolution of Saccharomyces cerevisiae Mitogenomes

Microorganisms ◽

10.3390/microorganisms8071001 ◽

2020 ◽

Vol 8 (7) ◽

pp. 1001

Author(s):

Daniel Vieira ◽

Soraia Esteves ◽

Carolina Santiago ◽

Eduardo Conde-Sousa ◽

Ticiana Fernandes ◽

...

Keyword(s):

Saccharomyces Cerevisiae ◽

Yeast Species ◽

Phenotypic Diversity ◽

Population Analysis ◽

Nuclear Genome ◽

Reference Sequence ◽

Evolutionary Analysis ◽

Novel Approach ◽

History Of ◽

First Time

The study of mitogenomes allows the unraveling of some paths of yeast evolution that are often not exposed when analyzing the nuclear genome. Although both nuclear and mitochondrial genomes are known to determine phenotypic diversity and fitness, no concordance has yet been established between the two, mainly regarding strains’ technological uses and/or geographical distribution. In the current work, we propose a new method to align and analyze yeast mitogenomes, overcoming current difficulties that make it impossible to obtain comparable mitogenomes for a large number of isolates. To this end, 12,016 mitogenomes were considered, and we developed a novel approach consisting of the design of a reference sequence intended to be comparable between all mitogenomes. Subsequently, the population structure of 6646 Saccharomyces cerevisiae mitogenomes was assessed. Results revealed the existence of particular clusters associated with the technological use of the strains, in particular regarding clinical isolates, laboratory strains, and yeasts used for wine-associated activities. As far as we know, this is the first time that a positive concordance between nuclear and mitochondrial genomes has been reported for S. cerevisiae, in terms of strains’ technological applications. The results obtained highlight the importance of including the mtDNA genome in evolutionary analysis, in order to clarify the origin and history of yeast species.

State-of-the-art web services for de novo protein structure prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbaa139 ◽

2020 ◽

Cited By ~ 1

Author(s):

Luciano A Abriata ◽

Matteo Dal Peraro

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

De Novo ◽

State Of The Art ◽

Data Bank ◽

End Users ◽

Model Quality ◽

Uncharacterized Protein

Abstract Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This was evident in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How, then, can end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad range of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that must be considered by users yet still require further developments, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.

CONFOLD2: Improved contact-driven ab initio protein structure modeling

10.1101/228460 ◽

2017 ◽

Cited By ~ 1

Author(s):

Badri Adhikari ◽

Jianlin Cheng

Keyword(s):

Protein Structure ◽

Ab Initio ◽

Protein Structure Prediction ◽

Protein Sequence ◽

Structure Prediction ◽

Structural Models ◽

Prediction Methods ◽

Structure Modeling ◽

Residue Contact ◽

Protein Structure Modeling

Abstract Background: Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. Results: We develop an improved contact-driven protein modeling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 is benchmarked on various datasets, including the CASP11 and CASP12 datasets with publicly available predicted contacts, and yields better performance than the popular CONFOLD method. Conclusion: CONFOLD2 can quickly generate the top five structural models for a protein sequence when its predicted secondary structure and contacts are at hand. CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/.
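The idea of building models from various subsets of the input contacts can be sketched as follows. This is a hypothetical illustration, not CONFOLD2's code: the function name, the `(i, j, confidence)` input format, and the choice to size each subset as a fraction of the sequence length are all assumptions made for the example.

```python
def contact_subsets(contacts, seq_len, fractions=(0.5, 1.0, 2.0)):
    """Return the top-ranked contacts for each fraction of the sequence length.

    contacts: iterable of (i, j, confidence) tuples (assumed input format).
    seq_len:  number of residues in the target sequence.
    """
    # Rank contacts from most to least confident.
    ranked = sorted(contacts, key=lambda c: c[2], reverse=True)
    # For each fraction f, keep the int(f * seq_len) most confident contacts.
    return {f: ranked[: int(f * seq_len)] for f in fractions}


# Hypothetical predicted contacts for a 10-residue toy sequence.
predicted = [(1, 9, 0.95), (2, 8, 0.90), (3, 7, 0.40), (1, 5, 0.75), (4, 10, 0.10)]
subsets = contact_subsets(predicted, seq_len=10, fractions=(0.2, 0.4))
```

Each subset would then be handed to a model-building step, and the resulting models clustered; those stages are outside the scope of this sketch.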

Protein structure prediction and biomolecular recognition: From protein sequence to peptidomimetic design with the human β 3 integrin

SAR and QSAR in Environmental Research ◽

10.1080/10629360290015961 ◽

2002 ◽

Vol 13 (3-4) ◽

pp. 473-486 ◽

Cited By ~ 1

Author(s):

R. Casadio ◽

M. Compiani ◽

A. Facchiano ◽

P. Fariselli ◽

P. Martelli ◽

...

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Protein Sequence ◽

Structure Prediction ◽

Biomolecular Recognition
