CATH functional families predict functional sites in proteins

Bioinformatics ◽

10.1093/bioinformatics/btaa937 ◽

2020 ◽

Author(s):

Sayoni Das ◽

Harry M Scholes ◽

Neeladri Sen ◽

Christine Orengo

Keyword(s):

Functional Characterization ◽

Functional Site ◽

Training Data ◽

Supplementary Information ◽

Conserved Residues ◽

Functional Sites ◽

Protein Protein Interaction ◽

Evolutionary Features ◽

Functional Families

Abstract Motivation Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation https://github.com/UCL/cath-funsite-predictor. Contact [email protected] or [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.

CATH functional families predict protein functional sites

10.1101/2020.03.23.003012 ◽

2020 ◽

Author(s):

Sayoni Das ◽

Harry M. Scholes ◽

Christine A. Orengo

Keyword(s):

Prediction Models ◽

Functional Site ◽

Training Data ◽

Supplementary Information ◽

Functional Sites ◽

Protein Protein Interaction ◽

Functional Characterisation ◽

Evolutionary Features ◽

Functional Families

Abstract Motivation Identification of functional sites in proteins is essential for functional characterisation, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed all publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. Availability The datasets and prediction models are available on [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

ResiRole: residue-level functional site predictions to gauge the accuracies of protein structure prediction techniques

Bioinformatics ◽

10.1093/bioinformatics/btaa712 ◽

2020 ◽

Author(s):

Joshua M Toth ◽

Paul J DePietro ◽

Juergen Haas ◽

William A McLaughlin

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Functional Site ◽

Supplementary Information ◽

Cumulative Probability ◽

Reference Structure ◽

Model Quality ◽

Difference Scores ◽

Prediction Techniques

Abstract Motivation Methods to assess the quality of protein structure models are needed for user applications. To aid with the selection of structure models and further inform the development of structure prediction techniques, we describe the ResiRole method for the assessment of the quality of structure models. Results Structure prediction techniques are ranked according to the results of round-robin, head-to-head comparisons using difference scores. Each difference score was defined as the absolute value of the cumulative probability for a functional site prediction made with the FEATURE program for the reference structure minus that for the structure model. Overall, the difference scores correlate well with other model quality metrics; and based on benchmarking studies with NaïveBLAST, they are found to detect additional local structural similarities between the structure models and reference structures. Availability and implementation Automated analyses of models addressed in CAMEO are available via the ResiRole server, URL http://protein.som.geisinger.edu/ResiRole/. Interactive analyses with user-provided models and reference structures are also enabled. Code is available at github.com/wamclaughlin/ResiRole. Supplementary information Supplementary data are available at Bioinformatics online.
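The difference score defined in this abstract can be illustrated with a short sketch. It is not the ResiRole code: the function names, the head-to-head rule (the technique with the lower score on a shared target wins) and the input layout are assumptions made for illustration, and the cumulative probabilities are taken as already computed.

```python
from typing import Dict, Tuple


def difference_score(p_reference: float, p_model: float) -> float:
    """Absolute difference between the cumulative probability of a functional
    site prediction on the reference structure and on the structure model."""
    for p in (p_reference, p_model):
        if not 0.0 <= p <= 1.0:
            raise ValueError("cumulative probabilities must lie in [0, 1]")
    return abs(p_reference - p_model)


def head_to_head(
    scores_a: Dict[str, float], scores_b: Dict[str, float]
) -> Tuple[int, int]:
    """Count wins for techniques A and B over the sites both have scored.

    Assumption: a lower difference score means the model reproduces the
    reference prediction more closely, so the lower score wins; ties count
    for neither technique.
    """
    wins_a = wins_b = 0
    for site in scores_a.keys() & scores_b.keys():
        if scores_a[site] < scores_b[site]:
            wins_a += 1
        elif scores_b[site] < scores_a[site]:
            wins_b += 1
    return wins_a, wins_b


# Hypothetical values: reference probability 0.75, model probability 0.25.
example = difference_score(0.75, 0.25)  # 0.5
```

A round-robin ranking would repeat `head_to_head` for every pair of techniques; how ResiRole aggregates those comparisons is not stated in the abstract, so it is left out here.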

MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

Bioinformatics ◽

10.1093/bioinformatics/btaa045 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2690-2696

Author(s):

Jarkko Toivonen ◽

Pratyush K Das ◽

Jussi Taipale ◽

Esko Ukkonen

Keyword(s):

Markov Models ◽

Expectation Maximization Algorithm ◽

Software Tool ◽

Specific Weight ◽

Training Data ◽

Supplementary Information ◽

Markov Modeling ◽

Binding Motifs ◽

The Difference ◽

Probability Matrices

Abstract Motivation Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. Results We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. Availability and implementation Software implementation is available from https://github.com/jttoivon/moder2. Supplementary information Supplementary data are available at Bioinformatics online.
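The contrast between PPMs and first-order Markov models (ADMs) drawn in this abstract can be made concrete with a toy sketch. It is not MODER2: there is no mixture model, no dimer handling and no expectation maximization, only maximum-likelihood estimates from aligned sites of equal length, and all function names are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def fit_ppm(sites):
    """Position-specific probability matrix: one independent distribution per column."""
    n = len(sites)
    return [
        {b: Counter(s[i] for s in sites)[b] / n for b in ALPHABET}
        for i in range(len(sites[0]))
    ]


def ppm_prob(ppm, seq):
    """Probability of seq under the PPM (product of per-position probabilities)."""
    p = 1.0
    for i, base in enumerate(seq):
        p *= ppm[i][base]
    return p


def fit_adm(sites):
    """First-order Markov model: initial distribution plus, for each later
    position, the probability of a base given the base before it."""
    n = len(sites)
    first = Counter(s[0] for s in sites)
    initial = {b: first[b] / n for b in ALPHABET}
    transitions = []
    for i in range(1, len(sites[0])):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        prev = Counter(s[i - 1] for s in sites)
        transitions.append({
            (a, b): (pairs[(a, b)] / prev[a] if prev[a] else 0.0)
            for a in ALPHABET for b in ALPHABET
        })
    return initial, transitions


def adm_prob(adm, seq):
    """Probability of seq under the first-order model."""
    initial, transitions = adm
    p = initial[seq[0]]
    for i in range(1, len(seq)):
        p *= transitions[i - 1][(seq[i - 1], seq[i])]
    return p


# Toy training set in which the second base depends entirely on the first.
sites = ["AC", "GT"]
ppm, adm = fit_ppm(sites), fit_adm(sites)
# ppm_prob(ppm, "AT") is 0.25 although "AT" was never observed;
# adm_prob(adm, "AT") is 0.0 and adm_prob(adm, "AC") is 0.5.
```

The PPM spreads probability over unseen combinations because it treats columns as independent, while the first-order model keeps the adjacent-nucleotide dependency. A practical estimator would add pseudocounts to avoid zero probabilities.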

Blinking statistics and molecular counting in direct stochastic reconstruction microscopy (dSTORM)

Bioinformatics ◽

10.1093/bioinformatics/btab136 ◽

2021 ◽

Author(s):

Lekha Patel ◽

David Williamson ◽

Dylan M Owen ◽

Edward A K Cohen

Keyword(s):

Probability Distribution ◽

Single Molecule ◽

Immunological Synapse ◽

Training Data ◽

Supplementary Information ◽

Cellular Structures ◽

Exact Probability ◽

Stochastic Optical Reconstruction Microscopy ◽

Exact Probability Distribution ◽

Optical Reconstruction

Abstract Motivation Many recent advancements in single-molecule localization microscopy exploit the stochastic photoswitching of fluorophores to reveal complex cellular structures beyond the classical diffraction limit. However, this same stochasticity makes counting the number of molecules to high precision extremely challenging, preventing key insight into the cellular structures and processes under observation. Results Modelling the photoswitching behaviour of a fluorophore as an unobserved continuous time Markov process transitioning between a single fluorescent and multiple dark states, and fully mitigating for missed blinks and false positives, we present a method for computing the exact probability distribution for the number of observed localizations from a single photoswitching fluorophore. This is then extended to provide the probability distribution for the number of localizations in a direct stochastic optical reconstruction microscopy experiment involving an arbitrary number of molecules. We demonstrate that when training data are available to estimate photoswitching rates, the unknown number of molecules can be accurately recovered from the posterior mode of the number of molecules given the number of localizations. Finally, we demonstrate the method on experimental data by quantifying the number of adapter protein linker for activation of T cells on the cell surface of the T-cell immunological synapse. Availability and implementation Software and data available at https://github.com/lp1611/mol_count_dstorm. Supplementary information Supplementary data are available at Bioinformatics online.

Qualitative assessment of YouTube videos as a source of patient information for cochlear implant surgery

The Journal of Laryngology & Otology ◽

10.1017/s0022215121001390 ◽

2021 ◽

pp. 1-4

Author(s):

C Thomas ◽

J Westwood ◽

G F Butt

Keyword(s):

Cochlear Implant ◽

Cochlear Implants ◽

Qualitative Assessment ◽

Supplementary Information ◽

Implant Surgery ◽

Critical Elements ◽

Positive Elements ◽

Cochlear Implant Surgery ◽

Youtube Videos

Abstract Background YouTube is increasingly used as a source of healthcare information. This study evaluated the quality of videos on YouTube about cochlear implants. Methods YouTube was searched using the phrase ‘cochlear implant’. The first 60 results were screened by two independent reviewers. A modified Discern tool was used to evaluate the quality of each video. Results Forty-seven videos were analysed. The mean overall Discern score was 2.0 out of 5.0. Videos scored higher for describing positive elements such as the benefits of a cochlear implant (mean score of 3.4) and scored lower for negative elements such as the risks of cochlear implant surgery (mean score of 1.3). Conclusion The quality of information regarding cochlear implant surgery on YouTube is highly variable. These results demonstrated a bias towards the positive attributes of cochlear implants, with little mention of the risks or uncertainty involved. Although videos may be useful as supplementary information, critical elements required to make an informed decision are lacking. This is of particular importance when patients are considering surgery.

Genome-wide identification of the GATA transcription factor family and their expression patterns under temperature and salt stress in Aspergillus oryzae

AMB Express ◽

10.1186/s13568-021-01212-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chunmiao Jiang ◽

Gongbo Lv ◽

Jinxin Ge ◽

Bin He ◽

Zhe Zhang ◽

...

Keyword(s):

Salt Stress ◽

Expression Patterns ◽

Interaction Network ◽

Protein Protein Interaction ◽

Genome Wide ◽

Gata Transcription Factor ◽

Starting Point ◽

Evolutionary Features ◽

First Time

Abstract GATA transcription factors (TFs) are involved in the regulation of growth processes and various environmental stresses. Although GATA TFs involved in abiotic stress in plants and some fungi have been analyzed, information regarding GATA TFs in Aspergillus oryzae is extremely poor. In this study, we identified and functionally characterized seven GATA proteins from the A. oryzae 3.042 genome, including a novel AoSnf5 GATA TF with 20 residues between the Cys-X2-Cys motifs, which was found in Aspergillus GATA TFs for the first time. Phylogenetic analysis indicated that these seven A. oryzae GATA TFs could be classified into six subgroups. Analysis of conserved motifs demonstrated that Aspergillus GATA TFs with similar motif compositions clustered in one subgroup, suggesting that they might possess similar genetic functions, further confirming the accuracy of the phylogenetic relationship. Furthermore, the expression patterns of seven A. oryzae GATA TFs under temperature and salt stresses indicated that A. oryzae GATA TFs were mainly responsive to high temperature and high salt stress. The protein–protein interaction network of A. oryzae GATA TFs revealed certain potentially interacting proteins. The comprehensive analysis of A. oryzae GATA TFs will be beneficial for understanding their biological function and evolutionary features and provide an important starting point to further understand the role of GATA TFs in the regulation of distinct environmental conditions in A. oryzae.

CPVA: a web-based metabolomic tool for chromatographic peak visualization and annotation

Bioinformatics ◽

10.1093/bioinformatics/btaa200 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3913-3915

Author(s):

Hemi Luan ◽

Xingen Jiang ◽

Fenfen Ji ◽

Zhangzhang Lan ◽

Zongwei Cai ◽

...

Keyword(s):

False Positive ◽

Supplementary Information ◽

Liquid Chromatography Mass Spectrometry ◽

Targeted Metabolomics ◽

Metabolomics Data ◽

Web Based ◽

Tremendous Amount ◽

Chromatographic Peaks ◽

User Friendly

Abstract Motivation Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous amount of metabolite signals in complex biological samples. However, false-positive peaks in the datasets are commonly detected as metabolite signals by many popular software packages, resulting in unreliable measurements. Results To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify peak background noise and contaminants, resulting in a decrease in false-positive or redundant peak calling, thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information Supplementary data are available at Bioinformatics online.

Universal Screening Methods and Applications of ThermoFluor®

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057106292746 ◽

2006 ◽

Vol 11 (7) ◽

pp. 854-863 ◽

Cited By ~ 124

Author(s):

Maxwell D. Cummings ◽

Michael A. Farnum ◽

Marina I. Nelen

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Protein Unfolding ◽

Direct Detection ◽

Functional Characterization ◽

Screening Methods ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Bacterial Enzyme ◽

Research Problems

The genomics revolution has unveiled a wealth of poorly characterized proteins. Scientists are often able to produce milligram quantities of proteins for which function is unknown or hypothetical, based only on very distant sequence homology. Broadly applicable tools for functional characterization are essential to the illumination of these orphan proteins. An additional challenge is the direct detection of inhibitors of protein-protein interactions (and allosteric effectors). Both of these research problems are relevant to, among other things, the challenge of finding and validating new protein targets for drug action. Screening collections of small molecules has long been used in the pharmaceutical industry as 1 method of discovering drug leads. Screening in this context typically involves a function-based assay. Given a sufficient quantity of a protein of interest, significant effort may still be required for functional characterization, assay development, and assay configuration for screening. Increasingly, techniques are being reported that facilitate screening for specific ligands for a protein of unknown function. Such techniques also allow for function-independent screening with better characterized proteins. ThermoFluor®, a screening instrument based on monitoring ligand effects on temperature-dependent protein unfolding, can be applied when protein function is unknown. This technology has proven useful in the decryption of an essential bacterial enzyme and in the discovery of a series of inhibitors of a cancer-related, protein-protein interaction. The authors review some of the tools relevant to these research problems in drug discovery, and describe our experiences with 2 different proteins.

How the quantity and quality of training data impacts re-identification of smart meter users?

2015 IEEE International Conference on Smart Grid Communications (SmartGridComm) ◽

10.1109/smartgridcomm.2015.7436272 ◽

2015 ◽

Cited By ~ 1

Author(s):

Mustafa Faisal ◽

Alvaro A. Cardenas ◽

Daisuke Mashima

Keyword(s):

Training Data ◽

Smart Meter

New active learning algorithms for near-infrared spectroscopy in agricultural applications

at - Automatisierungstechnik ◽

10.1515/auto-2020-0143 ◽

2021 ◽

Vol 69 (4) ◽

pp. 297-306

Author(s):

Julius Krause ◽

Maurice Günder ◽

Daniel Schulz ◽

Robin Gruna

Keyword(s):

Active Learning ◽

Near Infrared ◽

Agricultural Products ◽

Training Data ◽

Calibration Model ◽

Learning Approaches ◽

Training Samples ◽

Agricultural Applications ◽

Selection Of

Abstract The selection of training data determines the quality of a chemometric calibration model. In order to cover the entire parameter space of known influencing parameters, an experimental design is usually created. Nevertheless, even with a carefully prepared Design of Experiment (DoE), redundant reference analyses are often performed during the analysis of agricultural products. Because the number of possible reference analyses is usually very limited, the presented active learning approaches are intended to provide a tool for better selection of training samples.
