protein sequence
Recently Published Documents


TOTAL DOCUMENTS

2276
(FIVE YEARS 422)

H-INDEX

108
(FIVE YEARS 12)

2022 ◽  
Author(s):  
Edward P Harvey ◽  
Jung-Eun Shin ◽  
Meredith A Skiba ◽  
Genevieve R Nemeth ◽  
Joseph D Hurley ◽  
...  

Antibodies are essential biological research tools and important therapeutic agents, but some exhibit non-specific binding to off-target proteins and other biomolecules. Such polyreactive antibodies compromise screening pipelines, lead to incorrect and irreproducible experimental results, and are generally intractable for clinical development. We designed a set of experiments using a diverse naive synthetic camelid antibody fragment ('nanobody') library to enable machine learning models to accurately assess polyreactivity from protein sequence (AUC > 0.8). Moreover, our models provide quantitative scoring metrics that predict the effect of amino acid substitutions on polyreactivity. We experimentally tested our model's performance on three independent nanobody scaffolds, where over 90% of predicted substitutions successfully reduced polyreactivity. Importantly, the model allowed us to diminish the polyreactivity of an angiotensin II type I receptor antagonist nanobody, without compromising its pharmacological properties. We offer a companion web server that provides a straightforward means of predicting polyreactivity and polyreactivity-reducing mutations for any given nanobody sequence.


2022 ◽  
Author(s):  
Xinhao Shao ◽  
Christopher Grams ◽  
Yu Gao

Protein structure is connected with its function and interaction and plays an extremely important role in protein characterization. As one of the most important analytical methods for protein characterization, proteomics is widely used to determine protein composition, quantitation, interaction, and even structures. However, due to the gap between identified proteins by proteomics and available 3D structures, it was very challenging, if not impossible, to visualize proteomics results in 3D and further explore the structural aspects of proteomics experiments. Recently, two groups of researchers from DeepMind and Baker lab have independently published protein structure prediction tools that can help us obtain predicted protein structures for the whole human proteome. Although there is still debate on the validity of some of the predicted structures, there is no doubt that these represent the most accurate predictions to date. More importantly, this enabled us to visualize the majority of human proteins for the first time. To help other researchers best utilize these protein structure predictions, we present the Sequence Coverage Visualizer (SCV), http://scv.lab.gy, a web application for protein sequence coverage 3D visualization. Here we showed a few possible usages of the SCV, including the labeling of post-translational modifications and isotope labeling experiments. These results highlight the usefulness of such 3D visualization for proteomics experiments and how SCV can turn a regular result list into structural insights. Furthermore, when used together with limited proteolysis, we demonstrated that SCV can help validate and compare different protein structures, including predicted ones and existing PDB entries. By performing limited proteolysis on native proteins at various time points, SCV can visualize the progress of the digestion. This time-series data further allowed us to compare the predicted structure and existing PDB entries. Although not deterministic, these comparisons could be used to refine current predictions further and represent an important step towards a complete and correct protein structure database. Overall, SCV is a convenient and powerful tool for visualizing proteomics results.


2022 ◽  
Author(s):  
Lev I. Levitsky ◽  
Ksenia Kuznetsova ◽  
Anna A. Kliuchnikova ◽  
Irina Y. Ilina ◽  
Anton O. Goncharov ◽  
...  

Mass spectrometry-based proteome analysis usually implies matching mass spectra of proteolytic peptides to amino acid sequences predicted from nucleic acid sequences. At the same time, due to the stochastic nature of the method when it comes to proteome-wide analysis, in which only a fraction of peptides are selected for sequencing, the completeness of protein sequence identification is undermined. Likewise, the reliability of peptide variant identification in proteogenomic studies suffers. We propose a way to interpret shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, just as it is done in the field of nucleic acid sequencing for the calling of single nucleotide variants. Multiple reads for each position in a sequence could be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a much lower false discovery rate than the conventional 1%. The sources of overlapping distinct peptides are, first, miscleaved tryptic peptides in combination with their properly cleaved counterparts, and, second, peptides generated by several proteases with different specificities after the same specimen is subjected to parallel digestion and analyzed separately. We illustrate this approach using publicly available multiprotease proteomic datasets and our own data generated for HEK-293 cell line digests obtained using trypsin, LysC and GluC proteases. From 5000 to 8000 protein groups are identified for each digest, corresponding to up to 30% of the whole proteome coverage. Most of this coverage was provided by a single read, while up to 7% of the observed protein sequences were covered two-fold and more. The proteogenomic analysis of the HEK-293 cell line revealed 36 peptide variants associated with SNP, seven of which were supported by multiple reads. The efficiency of the multiple reads approach depends strongly on the depth of proteome analysis and on digestion features such as the level of miscleavages, and will increase with the number of different proteases used in parallel proteome digestion.
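
The "multiple reads" idea in the abstract above can be illustrated with a minimal sketch, assuming peptides are matched to the protein by exact substring search. The function names, the toy protein, and the toy peptides are hypothetical and are not taken from the authors' software or data.

```python
from typing import List


def coverage_depth(protein: str, peptides: List[str]) -> List[int]:
    """Count, for each residue, how many distinct peptides cover it."""
    depth = [0] * len(protein)
    for pep in set(peptides):  # count each distinct peptide once
        if not pep:
            continue  # skip empty strings, which match everywhere
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                depth[i] += 1
            start = protein.find(pep, start + 1)
    return depth


def fraction_covered(depth: List[int], min_reads: int = 1) -> float:
    """Fraction of residues covered by at least min_reads peptides."""
    if not depth:
        return 0.0
    return sum(d >= min_reads for d in depth) / len(depth)


# Toy example: two overlapping peptides (e.g. from different proteases)
# plus one non-overlapping peptide.
protein = "MKTAYIAKQRQISFVK"
peptides = ["MKTAYIAK", "TAYIAKQR", "QISFVK"]
depth = coverage_depth(protein, peptides)
single = fraction_covered(depth, min_reads=1)
double = fraction_covered(depth, min_reads=2)
```

Here `single` is the ordinary sequence coverage and `double` is the share of residues confirmed by two or more distinct peptides, the quantity the abstract reports as "covered two-fold and more".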


Author(s):  
S. Dinesh

Abstract: Homology detection plays a major role in bioinformatics. Different types of methods are used for homology detection. Here we extract information from protein sequences and then use various algorithms to predict the similarity between protein families. SVM is the most commonly used algorithm in homology detection. Classification techniques are not well suited to homology detection because they are not suitable for high-dimensional datasets. Reducing the dimensionality is therefore very important, because it makes the similarity of protein families easier to predict. Keywords: Homology detection, Protein, Sequence, Reducing dimensionality, BLAST, SCOP.


2021 ◽  
Vol 27 (4) ◽  
pp. 231-238
Author(s):  
Sang Moon Kang ◽  
Yong-Seung Joun ◽  
Kee-Young Lee ◽  
Hyun Kang ◽  
Sung-Gyu Lee

Author(s):  
Roma Chandra

Protein structure prediction is one of the important goals in bioinformatics and biotechnology. Prediction methods cover both the secondary and tertiary structures of proteins. Protein secondary structure prediction infers the presence of helices, sheets, and coils in a polypeptide chain, whereas protein tertiary structure prediction infers the three-dimensional structure of a protein. Protein secondary structures represent the possible motifs or regular expressions, represented as patterns, that are predicted from the primary protein sequence in the form of alpha helices, beta strands, and coils. Secondary structure prediction is useful because it yields information about the structure and function of an unknown protein sequence. Various secondary structure prediction methods are used to predict helices, sheets, and coils, and various prediction tools based on these methods are under study. This study includes prediction of hemoglobin using various tools. The results indicate the percentage of amino acids that participate in forming helices, sheets, and coils. Of all the tools used, PHD and DSC produced the best results.
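
The percentages of residues in helices, sheets, and coils mentioned above can be computed from any per-residue prediction string. The following is a minimal sketch, assuming a three-state alphabet (`H` for helix, `E` for strand, `C` for coil); the function name and the example string are hypothetical and do not reproduce the output of PHD, DSC, or any other tool.

```python
from collections import Counter
from typing import Dict


def ss_composition(ss: str) -> Dict[str, float]:
    """Return the percentage of residues in each of the states H, E and C."""
    total = len(ss)
    if total == 0:
        return {"H": 0.0, "E": 0.0, "C": 0.0}
    counts = Counter(ss)
    return {state: 100.0 * counts.get(state, 0) / total for state in "HEC"}


# Hypothetical 20-residue prediction: 8 helix, 4 strand, 8 coil residues.
example = "CCHHHHHHHHCCEEEECCCC"
composition = ss_composition(example)
```

Comparing such compositions across tools for the same sequence is one simple way to see where their predictions disagree.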


Research ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-4
Author(s):  
Qiangzhen Yang ◽  
Ali Alamdar Shah Syed ◽  
Aamir Fahira ◽  
Yongyong Shi

The spread of the latest SARS-CoV-2 variant Omicron is particularly concerning because of the large number of mutations present in its genome and lack of knowledge about how these mutations would affect the current SARS-CoV-2 vaccines and treatments. Here, by performing phylogenetic analysis using the Omicron spike (S) protein sequence, we found that the Omicron S protein presented the longest evolutionary distance in relation to the other SARS-CoV-2 variants. We predicted the structures of S, M, and N proteins of the Omicron variant using AlphaFold2 and investigated how the mutations have affected the S protein and its parts, S1 NTD and RBD, in detail. We found many amino acids on RBD were mutated, which may influence the interactions between the RBD and ACE2, while also showing the S309 antibody could still be capable of neutralizing Omicron RBD. The Omicron S1 NTD structures display significant differences from the original strain, which could lead to reduced recognition by antibodies resulting in potential immune escape and decreased effectiveness of the existing vaccines. However, this study of the Omicron variant was mainly limited to structural predictions, and these findings should be explored and verified by subsequent experiments. This study provided basic data of the Omicron protein structures that lay the groundwork for future studies related to the SARS-CoV-2 Omicron variant.


2021 ◽  
Author(s):  
Bahrad A Sokhansanj ◽  
Zhengqiao Zhao ◽  
Gail L Rosen

As the COVID-19 pandemic continues, the SARS-CoV-2 virus continues to rapidly mutate and change in ways that impact virulence, transmissibility, and immune evasion. Genome sequencing is a critical tool, as other biological techniques can be more costly, time-consuming, and difficult. However, the rapid and complex evolution of SARS-CoV-2 challenges conventional sequence analysis methods like phylogenetic analysis. The virus picks up and loses mutations independently in multiple subclades, often in novel or unexpected combinations, and, as for the newly emerged Omicron variant, sometimes with long unexplained branches. We propose interpretable deep sequence models trained by machine learning to complement conventional methods. We apply Transformer-based neural network models developed for natural language processing to analyze protein sequences. We add network layers to generate sample embeddings and sequence-wide attention to interpret models and visualize multiscale patterns. We demonstrate and validate our framework by modeling SARS-CoV-2 and coronavirus taxonomy. We then develop an interpretable predictive model of disease severity that integrates SARS-CoV-2 spike protein sequence and patient demographic variables, using publicly available data from the GISAID database. We also apply our model to Omicron. Based on knowledge prior to the availability of empirical data for Omicron, we predict: 1) reduced neutralization antibody activity (15-50 fold) greater than any previously characterized variant, varying between Omicron sublineages, and 2) reduced risk of severe disease (by 35-40%) relative to Delta. Both predictions are in accord with recent epidemiological and experimental data.


Plants ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 52
Author(s):  
Jie Luo ◽  
Junhao Chen ◽  
Wenlei Guo ◽  
Zhengfu Yang ◽  
Kean-Jin Lim ◽  
...  

Due to its peculiar morphological characteristics, there is dispute as to whether the genus of Annamocarya sinensis, a species of Juglandaceae, is Annamocarya or Carya. Most morphologists believe it should be distinguished from the Carya genus, while genomicists suggest that A. sinensis belongs to the Carya genus. To explore the taxonomic status of A. sinensis using chloroplast genes, we collected chloroplast genomes of 16 plant species and assembled chloroplast genomes of 10 unpublished Carya species. We analyzed all 26 species’ chloroplast genomes through two analytical approaches (concatenation and coalescence), using the entire and unique chloroplast coding sequences (CDS) and the entire and unique protein sequences. Our analysis of the CDS and protein sequences, or of the unique CDS and unique protein sequences, of chloroplast genomes shows that A. sinensis indeed belongs to the Carya genus. In addition, our analysis shows that, compared to single chloroplast genes, the phylogenetic trees constructed using numerous genes showed higher consistency. Moreover, the phylogenetic analysis calculated with the coalescence method and unique gene sequences was more robust than that done with the concatenation method, particularly for analyzing phylogenetically controversial species. Based on this analysis, we conclude that A. sinensis should be called C. sinensis.


2021 ◽  
Vol 22 (24) ◽  
pp. 13555
Author(s):  
Mohammad Madani ◽  
Kaixiang Lin ◽  
Anna Tarakanova

Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein’s function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward the soluble proteins, and limited applicability for various classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. This model captures the frequently occurring amino acid k-mers and their local and global interactions and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve improved accuracy, using only protein sequence as input. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in terms of accuracy when evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein’s solubility to guide experimental design.
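
The amino acids, dipeptides, and tripeptides mentioned above are k-mers with k = 1, 2, and 3. The following is a minimal sketch of overlapping k-mer counting, assuming a plain one-letter amino acid string as input; it is not the DSResSol model, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count all overlapping k-mers in a protein sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # range() is empty when the sequence is shorter than k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence rich in serine (S) and glutamic acid (E).
seq = "MSESEK"
residues = kmer_counts(seq, 1)     # single amino acids
dipeptides = kmer_counts(seq, 2)   # overlapping pairs
tripeptides = kmer_counts(seq, 3)  # overlapping triples
```

Counts like these are a common starting point for sequence-based features; a learned model such as the one described above extracts k-mer patterns through its convolutional layers rather than from an explicit count table.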

