sequence profiles
Recently Published Documents



2021 ◽  
Author(s):  
Jaspreet Singh ◽  
Kuldip Paliwal ◽  
Jaswinder Singh ◽  
Yaoqi Zhou

Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks such as biophysical, structural, and functional properties. Here we show that a combination of traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) allows a leap in accuracy over single-sequence-based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers. This large improvement leads to an accuracy comparable to or better than the current state-of-the-art techniques for predicting these 1D structural properties based on sequence profiles generated from multiple sequence alignments. The high-accuracy prediction in both secondary and tertiary structural properties indicates that it is possible to make highly accurate predictions of protein structures without homologous sequences, the remaining obstacle in the post-AlphaFold2 era.
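The feature combination described above can be illustrated with a minimal sketch. It assumes that each language model yields one vector per residue and that the features are simply concatenated per residue; the function names and the tiny placeholder embeddings are hypothetical, and the abstract does not state how the authors actually merge the inputs.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(sequence):
    """Return one 20-dimensional one-hot vector per residue (all zeros if unknown)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1.0
        rows.append(row)
    return rows

def combine_features(sequence, *embeddings):
    """Concatenate one-hot vectors with per-residue embeddings from any number of models."""
    features = one_hot(sequence)
    for emb in embeddings:
        if len(emb) != len(sequence):
            raise ValueError("each embedding needs exactly one vector per residue")
        features = [f + list(e) for f, e in zip(features, emb)]
    return features

# Placeholder vectors standing in for real language-model embeddings (assumption).
seq = "MKV"
emb_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
emb_b = [[1.0], [2.0], [3.0]]
features = combine_features(seq, emb_a, emb_b)
```

Each residue ends up with a single feature vector whose length is 20 plus the widths of the embeddings, which is the shape a per-residue predictor would typically consume.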


2021 ◽  
Author(s):  
Negin Sadat Babaiha ◽  
Rosa Aghdam ◽  
Changiz Eslahchi

Localization of messenger RNAs (mRNAs) is a widely observed phenomenon that is considered an efficient way to target proteins to a specific region of a cell and is also known as a strategy for gene regulation. The importance of correct intracellular RNA placement in the development of embryonic and neural dendrites has long been demonstrated in earlier studies. Improper localization of RNA in the cell, which has been shown to occur for a variety of reasons, including mutations in trans-regulatory elements, is also associated with some neuromuscular diseases as well as cancer. We propose NN-RNALoc, a neural network-based model to predict the cellular location of mRNAs. The features extracted from mRNA sequences, along with the information gathered from their proteins, are fed to this prediction model. We introduce a novel distance-based sub-sequence profile for the representation of RNA sequences which is more memory and time efficient and which, compared to k-mer frequencies, can possibly better encode sequences when the distance k increases. The performance of NN-RNALoc on the benchmark datasets CeFra-seq and RNALocate is compared to the results achieved by two powerful prediction models proposed in earlier studies, mRNALoc and RNATracker. The results reveal that the use of protein-protein interaction information, which plays a crucial role in many biological functions, together with the novel distance-based sub-sequence profiles of mRNA sequences, leads to a more accurate prediction model. In addition, NN-RNALoc significantly reduces the required computing time compared to previous studies. Source code and data used in this study are available at: https://github.com/NeginBabaiha/NN-RNALoc
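To make the memory argument concrete, the sketch below contrasts plain k-mer frequencies with one possible distance-based encoding: frequencies of nucleotide pairs separated by a gap of d positions. This is an illustrative assumption, not the profile defined by the NN-RNALoc authors; the function names are hypothetical. The point is only that a k-mer table grows as 4^k, while a pair-at-distance table grows as 16 per distance.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"

def kmer_frequencies(seq, k):
    """Frequency of every possible k-mer: 4**k features."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return {"".join(p): counts["".join(p)] / total
            for p in product(ALPHABET, repeat=k)}

def gapped_pair_frequencies(seq, max_distance):
    """Frequency of nucleotide pairs (a, b) found d positions apart,
    for d = 1..max_distance: 16 * max_distance features."""
    profile = {}
    for d in range(1, max_distance + 1):
        counts = Counter(seq[i] + seq[i + d] for i in range(len(seq) - d))
        total = max(len(seq) - d, 1)
        for a, b in product(ALPHABET, repeat=2):
            profile[(a, b, d)] = counts[a + b] / total
    return profile

rna = "ACGUACGU"
kmers = kmer_frequencies(rna, 3)            # 64 features
pairs = gapped_pair_frequencies(rna, 3)     # 48 features
```

With k = 6 the k-mer table already has 4096 entries, whereas pairs up to distance 6 need only 96, which is one way a distance-based profile could stay compact as the span grows.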


Biomolecules ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 1337
Author(s):  
Ruiyang Song ◽  
Baixin Cao ◽  
Zhenling Peng ◽  
Christopher J. Oldfield ◽  
Lukasz Kurgan ◽  
...  

Non-synonymous single nucleotide polymorphisms (nsSNPs) may result in pathogenic changes that are associated with human diseases. Accurate prediction of these deleterious nsSNPs is in high demand. The existing predictors of deleterious nsSNPs secure modest levels of predictive performance, leaving room for improvement. We propose a new sequence-based predictor, DMBS, which addresses the need to improve the predictive quality. The design of DMBS relies on the observation that deleterious mutations are likely to occur at highly conserved and functionally important positions in the protein sequence. Correspondingly, we introduce two innovative components. First, we improve the estimates of the conservation computed from the multiple sequence profiles based on two complementary databases and two complementary alignment algorithms. Second, we utilize putative annotations of functional/binding residues produced by two state-of-the-art sequence-based methods. These inputs are processed by a random forest model that provides favorable predictive performance when empirically compared against five other machine-learning algorithms. Empirical results on four benchmark datasets reveal that DMBS achieves AUC > 0.94, outperforming current methods, including protein structure-based approaches. In particular, DMBS secures AUC = 0.97 for the SNPdbe and ExoVar datasets, compared to AUC = 0.70 and 0.88, respectively, obtained by the best available methods. Further tests on the independent HumVar dataset show that our method significantly outperforms the state-of-the-art method SNPdryad. We conclude that DMBS provides accurate predictions that can effectively guide wet-lab experiments in a high-throughput manner.
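As a rough illustration of what a per-position conservation estimate from an alignment can look like, the sketch below scores each alignment column with a normalized Shannon entropy. This is a generic, commonly used score chosen for the example; the abstract does not specify how DMBS computes conservation, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_conservation(column, alphabet_size=20):
    """Return 1 - H/log2(alphabet_size) for one alignment column.
    Gaps ('-') are ignored; 1.0 means fully conserved."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)

def conservation_profile(alignment):
    """Conservation score for every column of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]

# Toy alignment (assumption): four aligned sequences of length four.
alignment = ["MKVL", "MKIL", "MRVL", "M-AL"]
profile = conservation_profile(alignment)
```

In this toy alignment the first and last columns are fully conserved, so a substitution there would be flagged as more likely deleterious than one in the variable third column.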


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Gang Hu ◽  
Akila Katuwawala ◽  
Kui Wang ◽  
Zhonghua Wu ◽  
Sina Ghadermarzi ◽  
...  

Identification of intrinsic disorder in proteins relies in large part on computational predictors, which demands that their accuracy be high. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn’s webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Javier Lopez-Ibañez ◽  
Florencio Pazos ◽  
Monica Chagoyen

Abstract. Background: Assignment of chemical compounds to biological pathways is a crucial step to understand the relationship between the chemical repertory of an organism and its biology. Protein sequence profiles are very successful in capturing the main structural and functional features of a protein family, and can be used to assign new members to it based on matching of their sequences against these profiles. In this work, we extend this idea to chemical compounds, constructing a profile-inspired model for a set of related metabolites (those in the same biological pathway), based on a fragment-based vectorial representation of their chemical structures. Results: We use this representation to predict the biological pathway of a chemical compound with good overall accuracy (AUC 0.74–0.90 depending on the database tested), and analyze some factors that affect performance. The approach, which is compared with equivalent methods, can in addition detect those molecular fragments characteristic of a pathway. Conclusions: The method is available as a graphical interactive web server at http://csbg.cnb.csic.es/iFragMent.
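The profile idea carried over to compounds can be sketched as follows, under strong simplifying assumptions: each compound is reduced to a set of fragment labels, a pathway profile is the fraction of its member compounds containing each fragment, and a query is scored by the mean profile frequency of its fragments. The fragment labels, pathway names and scoring rule are all hypothetical; the abstract does not describe the actual iFragMent representation or scoring.

```python
from collections import Counter

def build_profile(compounds):
    """Fraction of the pathway's compounds that contain each fragment."""
    counts = Counter(frag for fragments in compounds for frag in set(fragments))
    return {frag: c / len(compounds) for frag, c in counts.items()}

def score(query_fragments, profile):
    """Mean profile frequency over the query's fragments (0.0 for unseen fragments)."""
    frags = set(query_fragments)
    if not frags:
        return 0.0
    return sum(profile.get(f, 0.0) for f in frags) / len(frags)

def predict_pathway(query_fragments, profiles):
    """Name of the pathway whose profile best matches the query."""
    return max(profiles, key=lambda name: score(query_fragments, profiles[name]))

# Toy data (assumption): fragment labels are illustrative placeholders.
pathways = {
    "pathway_A": [{"ring", "OH"}, {"ring", "COOH"}],
    "pathway_B": [{"PO4", "OH"}, {"PO4", "NH2"}],
}
profiles = {name: build_profile(members) for name, members in pathways.items()}
query = {"ring", "COOH"}
best = predict_pathway(query, profiles)
```

Fragments with a high frequency in one profile and a low frequency elsewhere, such as "ring" here, are the kind of pathway-characteristic fragments the abstract says the approach can detect.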


2021 ◽  
Vol 9 (4) ◽  
pp. 855
Author(s):  
Tesfaye Rufael Chibssa ◽  
Yang Liu ◽  
Melaku Sombo ◽  
Jacqueline Kasiiti Lichoti ◽  
Janchivdorj Erdenebaatar ◽  
...  

Goatpox virus (GTPV) belongs to the genus Capripoxvirus, together with sheeppox virus (SPPV) and lumpy skin disease virus (LSDV). GTPV primarily affects sheep, goats and some wild ruminants. Although GTPV is only present in Africa and Asia, the recent spread of LSDV in Europe and Asia shows capripoxviruses could escape their traditional geographical regions to cause severe outbreaks in new areas. Therefore, it is crucial to develop effective source tracing of capripoxvirus infections. Earlier, conventional phylogenetic methods, based on limited samples, identified three different nucleotide sequence profiles in the G-protein-coupled chemokine receptor (GPCR) gene of GTPVs. However, this method did not differentiate GTPV strains by their geographical origins. We have sequenced the GPCR gene of additional GTPVs and analyzed them with publicly available sequences, using conventional alignment-based methods and an alignment-free approach exploiting k-mer frequencies. Using the alignment-free method, we can now classify GTPVs based on their geographical origin: African GTPVs and Asian GTPVs, which further split into Western and Central Asian (WCA) GTPVs and Eastern and Southern Asian (ESA) GTPVs. This approach will help determine the source of introduction in GTPV emergence in disease-free regions and detect the importation of additional strains in disease-endemic areas.
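An alignment-free comparison based on k-mer frequencies can be sketched as below. The choice of k = 3, the Euclidean distance and the nearest-reference assignment are assumptions made for illustration, and the sequences are toy strings rather than GPCR gene data; the abstract does not give the authors' parameters or classifier.

```python
import math
from collections import Counter

def kmer_profile(seq, k=3):
    """Relative frequency of each k-mer observed in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))

def nearest_label(query, references, k=3):
    """Label of the reference sequence whose k-mer profile is closest to the query."""
    qp = kmer_profile(query, k)
    return min(references,
               key=lambda label: profile_distance(qp, kmer_profile(references[label], k)))

# Toy reference sequences (assumption), one per hypothetical group.
references = {"group_1": "ATATATATATAT", "group_2": "GCGCGCGCGCGC"}
label = nearest_label("ATATATGTATAT", references)
```

Because no alignment is computed, the comparison depends only on k-mer composition, which is what lets such methods separate groups that alignment-based phylogenies may not resolve.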


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lijun Mei ◽  
Jiaoli Zhou ◽  
Yimo Su ◽  
Kunhong Mao ◽  
Jing Wu ◽  
...  

Abstract. Background: Irritable bowel syndrome (IBS) is common and difficult to treat, and its pathogenesis is closely related to gut microbiota. However, differences in the gut microbiota of patients in different regions make it more difficult to elucidate the mechanism of IBS. We performed an analysis of gut microbiota composition and functional prediction in Chinese patients with diarrhea-predominant IBS (IBS-D). Methods: Fecal samples were obtained from 30 IBS-D patients and 30 healthy controls (HCs) in Nanchang, China. Using 16S gene sequence profiles, we analyzed the abundance of dominant microbiota at different taxonomy levels. Based on 16S information, Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was used to predict the function of gut microbiota. Results: Compared to HCs, gut microbiota richness but not diversity was decreased in IBS-D patients. The abundant phyla Firmicutes, Fusobacteria and Actinobacteria decreased significantly, and Proteobacteria increased significantly in IBS-D patients. PICRUSt indicated that function expression of gut microbiota in IBS-D patients was up-regulated in metabolism of cofactors and vitamins, xenobiotics biodegradation and metabolism, and down-regulated in environmental adaptation, cell growth and death. Conclusions: Compared with the normal population in China, IBS-D patients are characterized by complex and unstable gut microbiota, which may influence inflammation and metabolism of the host.
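The distinction between richness and diversity reported in the results can be illustrated with a short sketch. It assumes richness means the number of observed taxa and diversity means the Shannon index; the abstract does not say which indices the authors used, and the counts below are invented toy values, not study data.

```python
import math

def richness(counts):
    """Number of taxa with at least one observed read."""
    return sum(1 for c in counts.values() if c > 0)

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over observed taxa."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

# Toy read counts per taxon (assumption).
sample_a = {"taxon_1": 50, "taxon_2": 50}
sample_b = {"taxon_1": 97, "taxon_2": 1, "taxon_3": 1, "taxon_4": 1}
```

Here sample_b has more taxa than sample_a but a lower Shannon index because one taxon dominates, which shows how one measure can change between groups while the other does not.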


2021 ◽  
Author(s):  
Javier Lopez-Ibañez ◽  
Florencio Pazos ◽  
Monica Chagoyen

Assignment of chemical compounds to biological pathways is a crucial step to understand the relationship between the chemical repertory of an organism and its biology. Protein sequence profiles are very successful in capturing the main structural and functional features of a protein family, and can be used to assign new members to it based on matching of their sequences against these profiles. In this work, we extend this idea to chemical compounds, constructing a profile-inspired model for a set of related metabolites (those in the same biological pathway), based on a fragment-based vectorial representation of their chemical structures. We use this representation to predict the biological pathway of a chemical compound with good overall accuracy (AUC 0.74-0.90 depending on the database tested), and analyze some factors that affect performance. The approach, which is compared with equivalent methods, can in addition detect those molecular fragments characteristic of a pathway. The method is available as a graphical interactive web server at http://csbg.cnb.csic.es/iFragMent


Author(s):  
Danilo Gullotto

Abstract In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could sew relationships among folds that diverged beyond the point where homology is discernable. In view of the aforementioned, this paper provides a rationale where a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from pre-biotic to domain world, has been used to explore relationships between ancient folds. Statistics of detected signals have been reported. As a result, an example of an emergent sub-network that makes sense from an evolutionary perspective, where conserved signals retrieved from the assessed protein space have been co-opted, has been discussed.

