CATH functional families predict functional sites in proteins

Author(s):  
Sayoni Das ◽  
Harry M Scholes ◽  
Neeladri Sen ◽  
Christine Orengo

Abstract Motivation Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation https://github.com/UCL/cath-funsite-predictor. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Author(s):  
Sayoni Das ◽  
Harry M. Scholes ◽  
Christine A. Orengo

Abstract Motivation Identification of functional sites in proteins is essential for functional characterisation, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed all publicly-available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. Availability The datasets and prediction models are available on [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Joshua M Toth ◽  
Paul J DePietro ◽  
Juergen Haas ◽  
William A McLaughlin

Abstract Motivation Methods to assess the quality of protein structure models are needed for user applications. To aid with the selection of structure models and further inform the development of structure prediction techniques, we describe the ResiRole method for the assessment of the quality of structure models. Results Structure prediction techniques are ranked according to the results of round-robin, head-to-head comparisons using difference scores. Each difference score was defined as the absolute value of the cumulative probability for a functional site prediction made with the FEATURE program for the reference structure minus that for the structure model. Overall, the difference scores correlate well with other model quality metrics; and based on benchmarking studies with NaïveBLAST, they are found to detect additional local structural similarities between the structure models and reference structures. Availability and implementation Automated analyses of models addressed in CAMEO are available via the ResiRole server, URL http://protein.som.geisinger.edu/ResiRole/. Interactive analyses with user-provided models and reference structures are also enabled. Code is available at github.com/wamclaughlin/ResiRole. Supplementary information Supplementary data are available at Bioinformatics online.
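
The difference score described in the abstract above can be illustrated with a minimal sketch. It assumes that each functional site prediction has already been reduced to a cumulative probability in [0, 1] for both the reference structure and the structure model. The function names and the averaging step are hypothetical illustrations, not taken from the ResiRole code.

```python
def difference_score(p_reference: float, p_model: float) -> float:
    """Absolute difference between the cumulative probability of a
    functional site prediction on the reference structure and on the model."""
    for p in (p_reference, p_model):
        if not 0.0 <= p <= 1.0:
            raise ValueError("cumulative probabilities must lie in [0, 1]")
    return abs(p_reference - p_model)


def mean_difference_score(pairs):
    """Average difference score over (reference, model) probability pairs.

    Averaging is an assumption made here for illustration; the abstract
    does not state how scores are aggregated per prediction technique.
    """
    scores = [difference_score(ref, mod) for ref, mod in pairs]
    if not scores:
        raise ValueError("at least one (reference, model) pair is required")
    return sum(scores) / len(scores)
```

Under this sketch a lower score means the model reproduces the functional site prediction of the reference structure more closely.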


2020 ◽  
Vol 36 (9) ◽  
pp. 2690-2696
Author(s):  
Jarkko Toivonen ◽  
Pratyush K Das ◽  
Jussi Taipale ◽  
Esko Ukkonen

Abstract Motivation Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. Results We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. Availability and implementation Software implementation is available from https://github.com/jttoivon/moder2. Supplementary information Supplementary data are available at Bioinformatics online.
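
The difference between a PPM and an ADM noted in the abstract above can be sketched as two log-likelihood functions. This is a toy illustration only: the data layout, the function names and the probabilities are assumptions made for the example and are not taken from MODER2. All toy probabilities are positive, so the logarithms are defined.

```python
import math

BASES = "ACGT"


def ppm_log_likelihood(seq, ppm):
    """PPM: positions are independent; ppm[i][base] = P(base at position i)."""
    if len(seq) != len(ppm):
        raise ValueError("sequence and PPM must have the same length")
    return sum(math.log(ppm[i][base]) for i, base in enumerate(seq))


def adm_log_likelihood(seq, initial, transitions):
    """ADM (first-order Markov model).

    initial[base] = P(base at position 0)
    transitions[i][(prev, cur)] = P(cur at position i + 1 | prev at position i)
    """
    if len(seq) != len(transitions) + 1:
        raise ValueError("need one transition table per adjacent position pair")
    log_lik = math.log(initial[seq[0]])
    for i in range(1, len(seq)):
        log_lik += math.log(transitions[i - 1][(seq[i - 1], seq[i])])
    return log_lik


# Toy two-position motif (hypothetical numbers): C is usually followed by G.
uniform = {b: 0.25 for b in BASES}
toy_ppm = [uniform, uniform]
toy_transitions = [{
    (prev, cur): (0.25 if prev != "C" else (0.7 if cur == "G" else 0.1))
    for prev in BASES
    for cur in BASES
}]
```

In this toy motif both positions are uniform when viewed independently, so the PPM cannot distinguish `CG` from any other dinucleotide, while the ADM assigns `CG` a higher likelihood through the adjacent-position dependency.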


Author(s):  
Lekha Patel ◽  
David Williamson ◽  
Dylan M Owen ◽  
Edward A K Cohen

Abstract Motivation Many recent advancements in single-molecule localization microscopy exploit the stochastic photoswitching of fluorophores to reveal complex cellular structures beyond the classical diffraction limit. However, this same stochasticity makes counting the number of molecules to high precision extremely challenging, preventing key insight into the cellular structures and processes under observation. Results Modelling the photoswitching behaviour of a fluorophore as an unobserved continuous time Markov process transitioning between a single fluorescent and multiple dark states, and fully mitigating for missed blinks and false positives, we present a method for computing the exact probability distribution for the number of observed localizations from a single photoswitching fluorophore. This is then extended to provide the probability distribution for the number of localizations in a direct stochastic optical reconstruction microscopy experiment involving an arbitrary number of molecules. We demonstrate that when training data are available to estimate photoswitching rates, the unknown number of molecules can be accurately recovered from the posterior mode of the number of molecules given the number of localizations. Finally, we demonstrate the method on experimental data by quantifying the number of adapter protein linker for activation of T cells on the cell surface of the T-cell immunological synapse. Availability and implementation Software and data available at https://github.com/lp1611/mol_count_dstorm. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
C Thomas ◽  
J Westwood ◽  
G F Butt

Abstract Background YouTube is increasingly used as a source of healthcare information. This study evaluated the quality of videos on YouTube about cochlear implants. Methods YouTube was searched using the phrase ‘cochlear implant’. The first 60 results were screened by two independent reviewers. A modified Discern tool was used to evaluate the quality of each video. Results Forty-seven videos were analysed. The mean overall Discern score was 2.0 out of 5.0. Videos scored higher for describing positive elements such as the benefits of a cochlear implant (mean score of 3.4) and scored lower for negative elements such as the risks of cochlear implant surgery (mean score of 1.3). Conclusion The quality of information regarding cochlear implant surgery on YouTube is highly variable. These results demonstrated a bias towards the positive attributes of cochlear implants, with little mention of the risks or uncertainty involved. Although videos may be useful as supplementary information, critical elements required to make an informed decision are lacking. This is of particular importance when patients are considering surgery.


AMB Express ◽  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chunmiao Jiang ◽  
Gongbo Lv ◽  
Jinxin Ge ◽  
Bin He ◽  
Zhe Zhang ◽  
...  

Abstract GATA transcription factors (TFs) are involved in the regulation of growth processes and responses to various environmental stresses. Although GATA TFs involved in abiotic stress in plants and some fungi have been analyzed, information regarding GATA TFs in Aspergillus oryzae is extremely limited. In this study, we identified and functionally characterized seven GATA proteins from the A. oryzae 3.042 genome, including a novel AoSnf5 GATA TF with 20 residues between the Cys-X2-Cys motifs, which was found in Aspergillus GATA TFs for the first time. Phylogenetic analysis indicated that these seven A. oryzae GATA TFs could be classified into six subgroups. Analysis of conserved motifs demonstrated that Aspergillus GATA TFs with similar motif compositions clustered in one subgroup, suggesting that they might possess similar genetic functions, further confirming the accuracy of the phylogenetic relationship. Furthermore, the expression patterns of the seven A. oryzae GATA TFs under temperature and salt stresses indicated that A. oryzae GATA TFs were mainly responsive to high temperature and high salt stress. The protein–protein interaction network of A. oryzae GATA TFs revealed certain potentially interacting proteins. The comprehensive analysis of A. oryzae GATA TFs will be beneficial for understanding their biological function and evolutionary features and provide an important starting point to further understand the role of GATA TFs in the regulation of distinct environmental conditions in A. oryzae.


2020 ◽  
Vol 36 (12) ◽  
pp. 3913-3915
Author(s):  
Hemi Luan ◽  
Xingen Jiang ◽  
Fenfen Ji ◽  
Zhangzhang Lan ◽  
Zongwei Cai ◽  
...  

Abstract Motivation Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous number of metabolite signals in complex biological samples. However, false-positive peaks in the datasets are commonly detected as metabolite signals by many popular software packages, resulting in unreliable measurements. Results To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify peak background noise and contaminants, resulting in a decrease in false-positive or redundant peak calling, thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information Supplementary data are available at Bioinformatics online.


2006 ◽  
Vol 11 (7) ◽  
pp. 854-863 ◽  
Author(s):  
Maxwell D. Cummings ◽  
Michael A. Farnum ◽  
Marina I. Nelen

The genomics revolution has unveiled a wealth of poorly characterized proteins. Scientists are often able to produce milligram quantities of proteins for which function is unknown or hypothetical, based only on very distant sequence homology. Broadly applicable tools for functional characterization are essential to the illumination of these orphan proteins. An additional challenge is the direct detection of inhibitors of protein-protein interactions (and allosteric effectors). Both of these research problems are relevant to, among other things, the challenge of finding and validating new protein targets for drug action. Screening collections of small molecules has long been used in the pharmaceutical industry as 1 method of discovering drug leads. Screening in this context typically involves a function-based assay. Given a sufficient quantity of a protein of interest, significant effort may still be required for functional characterization, assay development, and assay configuration for screening. Increasingly, techniques are being reported that facilitate screening for specific ligands for a protein of unknown function. Such techniques also allow for function-independent screening with better characterized proteins. ThermoFluor®, a screening instrument based on monitoring ligand effects on temperature-dependent protein unfolding, can be applied when protein function is unknown. This technology has proven useful in the decryption of an essential bacterial enzyme and in the discovery of a series of inhibitors of a cancer-related, protein-protein interaction. The authors review some of the tools relevant to these research problems in drug discovery, and describe their experiences with 2 different proteins.


2021 ◽  
Vol 69 (4) ◽  
pp. 297-306
Author(s):  
Julius Krause ◽  
Maurice Günder ◽  
Daniel Schulz ◽  
Robin Gruna

Abstract The selection of training data determines the quality of a chemometric calibration model. In order to cover the entire parameter space of known influencing parameters, an experimental design is usually created. Nevertheless, even with a carefully prepared Design of Experiment (DoE), redundant reference analyses are often performed during the analysis of agricultural products. Because the number of possible reference analyses is usually very limited, the presented active learning approaches are intended to provide a tool for better selection of training samples.

