New analysis pipeline for high-throughput domain–peptide affinity experiments improves SH2 interaction data

2020 ◽  
Vol 295 (32) ◽  
pp. 11346-11363
Author(s):  
Tom Ronan ◽  
Roman Garnett ◽  
Kristen M. Naegle

Protein domain interactions with short linear peptides, such as those of the Src homology 2 (SH2) domain with phosphotyrosine-containing peptide motifs (pTyr), are ubiquitous and important to many biochemical processes of the cell. The desire to map and quantify these interactions has resulted in the development of high-throughput (HTP) quantitative measurement techniques, such as microarray or fluorescence polarization assays. For example, in the last 15 years, experiments have progressed from measuring single interactions to covering 500,000 of the 5.5 million possible SH2–pTyr interactions in the human proteome. However, high variability in affinity measurements and disagreements about positive interactions between published data sets led us here to reevaluate the analysis methods and raw data of published SH2–pTyr HTP experiments. We identified several opportunities for improving the identification of positive and negative interactions and the accuracy of affinity measurements. We implemented model-fitting techniques that are more statistically appropriate for the nonlinear SH2–pTyr interaction data. We also developed a method to account for protein concentration errors due to impurities and degradation or protein inactivity and aggregation. Our revised analysis increases the reported affinity accuracy, reduces the false-negative rate, and increases the amount of useful data by adding reliable true-negative results. We demonstrate improvement in classification of binding versus nonbinding when using machine-learning techniques, suggesting improved coherence in the reanalyzed data sets. We present revised SH2–pTyr affinity results and propose a new analysis pipeline for future HTP measurements of domain–peptide interactions.

2020 ◽  
Author(s):  
Tom Ronan ◽  
Roman Garnett ◽  
Kristen Naegle

Abstract Protein domain interactions with short linear peptides, such as Src homology 2 (SH2) domain interactions with phosphotyrosine-containing peptide motifs (pTyr), are ubiquitous and important to many biochemical processes of the cell. The desire to map and quantify these interactions has resulted in the development of high-throughput (HTP) quantitative measurement techniques, such as microarray or fluorescence polarization assays. For example, in the last 15 years, experiments have progressed from measuring single interactions to covering 500,000 of the 5.5 million possible SH2-pTyr interactions in the human proteome. However, high variability in affinity measurements and disagreements about positive interactions between published datasets led us to re-evaluate the analysis methods and raw data of published SH2-pTyr HTP experiments. We identified several opportunities for improving the identification of positive and negative interactions, and the accuracy of affinity measurements. We implemented model fitting techniques that are more statistically appropriate for the non-linear SH2-pTyr interaction data. We developed a novel method to account for protein concentration errors due to impurities and degradation, as well as addressing protein inactivity and aggregation. Our revised analysis increases reported affinity accuracy, reduces the false negative rate, and results in an increase in useful data due to the addition of reliable true negative results. We demonstrate improvement in classification of binding vs non-binding when using machine learning techniques, suggesting improved coherence in the reanalyzed datasets. We present revised SH2-pTyr affinity results, and propose a new analysis pipeline for future HTP measurements of domain-peptide interactions.


2019 ◽  
Author(s):  
Simon Artzet ◽  
Tsu-Wei Chen ◽  
Jérôme Chopard ◽  
Nicolas Brichet ◽  
Michael Mielewczik ◽  
...  

Abstract In the era of high-throughput visual plant phenotyping, it is crucial to design fully automated and flexible workflows able to derive quantitative traits from plant images. Over the last few years, several software packages have supported the extraction of architectural features of shoot systems. Yet currently no end-to-end system is able to extract both 3D shoot topology and geometry of plants automatically from images on large datasets and across a large range of species. In particular, these packages essentially deal with dicotyledons, whose architecture is comparatively easier to analyze than that of monocotyledons. To tackle these challenges, we designed the Phenomenal software, which features: (i) a completely automatic workflow system including data import, reconstruction of 3D plant architecture for a range of species, and quantitative measurements on the reconstructed plants; (ii) an open-source library for the development and comparison of new algorithms to perform 3D shoot reconstruction; and (iii) an integration framework to couple workflow outputs with existing models towards model-assisted phenotyping. Phenomenal analyzes a large variety of data sets and species, from images of high-throughput phenotyping platform experiments to published data obtained in different conditions and provided in a different format. Phenomenal has been validated both on manual measurements and on synthetic data simulated by 3D models. It has also been tested on other published datasets to reproduce a published semi-automatic reconstruction workflow in an automatic way. Phenomenal is available as open-source software on a public repository.


2002 ◽  
Vol 7 (4) ◽  
pp. 341-351 ◽  
Author(s):  
Michael F.M. Engels ◽  
Luc Wouters ◽  
Rudi Verbeeck ◽  
Greet Vanhoof

A data mining procedure for the rapid scoring of high-throughput screening (HTS) compounds is presented. The method is particularly useful for monitoring the quality of HTS data and tracking outliers in automated pharmaceutical or agrochemical screening, thus providing more complete and thorough structure-activity relationship (SAR) information. The method is based on the utilization of the assumed relationship between the structure of the screened compounds and the biological activity on a given screen expressed on a binary scale. By means of a data mining method, a SAR description of the data is developed that assigns probabilities of being a hit to each compound of the screen. Then, an inconsistency score expressing the degree of deviation between the adequacy of the SAR description and the actual biological activity is computed. The inconsistency score enables the identification of potential outliers that can be primed for validation experiments. The approach is particularly useful for detecting false-negative outliers and for identifying SAR-compliant hit/nonhit borderline compounds, both of which are classes of compounds that can contribute substantially to the development and understanding of robust SARs. In a first implementation of the method, one- and two-dimensional descriptors are used for encoding molecular structure information and logistic regression for calculating hits/nonhits probability scores. The approach was validated on three data sets, the first one from a publicly available screening data set and the second and third from in-house HTS screening campaigns. Because of its simplicity, robustness, and accuracy, the procedure is suitable for automation.


2011 ◽  
Vol 10 (12) ◽  
pp. M111.012500 ◽  
Author(s):  
Xueping Yu ◽  
Joseph Ivanic ◽  
Vesna Memišević ◽  
Anders Wallqvist ◽  
Jaques Reifman

2021 ◽  
Author(s):  
Michael Piechotta ◽  
Qi Wang ◽  
Janine Altmueller ◽  
Christoph Dieterich

A whole series of high-throughput antibody-free methods for RNA modification detection from sequencing data has emerged recently. We present JACUSA2 as a versatile software solution and comprehensive analysis framework for RNA modification detection assays that are based on either the Illumina or Nanopore platform. Importantly, JACUSA2 can integrate information from multiple experiments (e.g. replicates and different conditions) and different library types (e.g. first- or second-strand libraries). We demonstrate its utility by example, showing three analysis workflows for m6A detection on published data sets: 1) MazF m6A-sensitive RNA digestion (FTO+ vs FTO-), 2) DART-seq (YTHwt vs YTHmut) and 3) Nanopore profiling (METTL3 +/+ vs -/-). All assays have been conducted in HEK293 cells and complement one another.


2014 ◽  
Vol 19 (9) ◽  
pp. 1314-1320 ◽  
Author(s):  
Chand S. Mangat ◽  
Amrita Bharat ◽  
Sebastian S. Gehrke ◽  
Eric D. Brown

High-throughput screening (HTS) of chemical and microbial strain collections is an indispensable tool for modern chemical and systems biology; however, HTS data sets have inherent systematic and random error, which may lead to false-positive or false-negative results. Several methods of normalization of data exist; nevertheless, due to the limitations of each, no single method has been universally adopted. Here, we present a method of data visualization and normalization that is effective, intuitive, and easy to implement in a spreadsheet program. For each plate, the data are ordered by ascending values and a plot thereof yields a curve that is a signature of the plate data. Curve shape characteristics provide intuitive visualization of the frequency and strength of inhibitors, activators, and noise on the plate, allowing potentially problematic plates to be flagged. To reduce plate-to-plate variation, the data can be normalized by the mean of the middle 50% of ordered values, also called the interquartile mean (IQM) or the 50% trimmed mean of the plate. Positional effects due to bias in columns, rows, or wells can be corrected using the interquartile mean of each well position across all plates (IQMW) as a second level of normalization. We illustrate the utility of this method using data sets from biochemical and phenotypic screens.
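The two normalization levels described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' spreadsheet procedure: the function names `interquartile_mean` and `normalize_plates`, the dictionary layout of a plate, the choice to normalize by division, and the choice to compute IQMW on the already plate-normalized values are all assumptions. The IQM is taken here as the mean left after dropping the lowest and highest quarter of the ordered values, which is exact when the number of wells is divisible by four (as for 96- or 384-well plates).

```python
from statistics import mean


def interquartile_mean(values):
    """Mean of the middle 50% of the ordered values.

    Assumption: a quarter of the values (rounded down) is dropped
    from each end of the sorted list.
    """
    ordered = sorted(values)
    k = len(ordered) // 4  # values dropped from each tail
    return mean(ordered[k:len(ordered) - k])


def normalize_plates(plates):
    """Two-level normalization of a list of plates.

    plates: list of dicts mapping a well position (e.g. "A1") to a raw
    value; every plate is assumed to contain the same well positions.
    """
    # Level 1: divide every well by the IQM of its own plate.
    by_plate = []
    for plate in plates:
        iqm = interquartile_mean(plate.values())
        by_plate.append({well: value / iqm for well, value in plate.items()})

    # Level 2: divide by the IQM of each well position across all plates
    # (IQMW). Assumption: computed on the level-1 normalized values.
    wells = list(by_plate[0].keys())
    iqmw = {w: interquartile_mean([p[w] for p in by_plate]) for w in wells}
    return [{w: p[w] / iqmw[w] for w in wells} for p in by_plate]
```

Because only the middle half of the ordered values enters the mean, strong inhibitors or activators on a plate have little influence on the normalization constant, which is the property the abstract relies on.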


Author(s):  
William A Freyman ◽  
Kimberly F McManus ◽  
Suyash S Shringarpure ◽  
Ethan M Jewett ◽  
Katarzyna Bryc ◽  
...  

Abstract Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale datasets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computations against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1855-1861 ◽  
Author(s):  
Montgomery Slatkin ◽  
Bruce Rannala

Abstract A theory is developed that provides the sampling distribution of low frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The number of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from the theory of birth-death processes show that the distribution of numbers of copies of each allele is logarithmic and that the joint distribution of numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.
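For reference, the two distributions named in this abstract can be written in their standard textbook forms. The notation is an assumption, since the abstract defines no symbols: \(\theta\) is the scaled mutation parameter, \(a_j\) is the number of alleles represented exactly \(j\) times in a sample of size \(n\), and \(p\) is the parameter of the logarithmic distribution. The paper's own parameterization for low-frequency alleles may differ.

```latex
% Logarithmic (log-series) distribution for the number of copies X of one allele
\[
  P(X = i) \;=\; \frac{-1}{\ln(1 - p)} \, \frac{p^{\,i}}{i},
  \qquad i = 1, 2, \dots, \quad 0 < p < 1 .
\]

% Ewens sampling formula for the allelic configuration (a_1, ..., a_n),
% subject to \sum_{j=1}^{n} j\, a_j = n, with k = \sum_{j=1}^{n} a_j distinct alleles
\[
  P(a_1, \dots, a_n)
  \;=\;
  \frac{n!}{\theta(\theta + 1)\cdots(\theta + n - 1)}
  \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!} .
\]
```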


2011 ◽  
Vol 61 (2) ◽  
pp. 225-238 ◽  
Author(s):  
Wen Bo Liao ◽  
Zhi Ping Mi ◽  
Cai Quan Zhou ◽  
Ling Jin ◽  
Xian Han ◽  
...  

Abstract Comparative studies of relative testes size in animals show that promiscuous species have relatively larger testes than monogamous species. Sperm competition favours the evolution of larger ejaculates in many animals, which in turn favours larger testes. In this view, we present data on relative testis mass for 17 Chinese species, including 3 polyandrous species. We analyzed relative testis mass within the Chinese data set and in combination with published data sets on Japanese and African frogs. We found that polyandrous foam-nesting species have relatively large testes, suggesting that sperm competition was an important factor affecting the evolution of relative testes size. For 4 polyandrous species, testes mass is positively correlated with the intensity (males/mating) but not with the risk (frequency of polyandrous matings) of sperm competition.

