Analysing protein post-translational modform regions by linear programming

2018 ◽  
Author(s):  
Deepesh Agarwal ◽  
Ryan T. Fellers ◽  
Bryan P. Early ◽  
Dan Lu ◽  
Caroline J. DeHart ◽  
...  

Post-translational modifications (PTMs) at multiple sites can collectively influence protein function but the scope of such PTM coding has been challenging to determine. The number of potential combinatorial patterns of PTMs on a single molecule increases exponentially with the number of modification sites and a population of molecules exhibits a distribution of such “modforms”. Estimating these “modform distributions” is central to understanding how PTMs influence protein function. Although mass spectrometry (MS) has made modforms more accessible, we have previously shown that current MS technology cannot recover the modform distribution of heavily modified proteins. However, MS data yield linear equations for modform amounts, which constrain the distribution within a high-dimensional, polyhedral “modform region”. Here, we show that linear programming (LP) can efficiently determine a range within which each modform value must lie, thereby approximating the modform region. We use this method on simulated data for mitogen-activated protein kinase 1 with the 7 phosphorylations reported on UniProt, giving a modform region in a 128-dimensional space. The exact dimension of the region is determined by the number of linearly independent equations but its size and shape depend on the data. The average modform range, which is a measure of size, decreases when data from bottom-up (BU) MS, in which proteins are first digested into peptides, are combined with data from top-down (TD) MS, in which whole proteins are analysed. Furthermore, when the modform distribution is structured, as might be expected of real distributions, the modform region for BU and TD combined has a more intricate polyhedral shape and is substantially more constrained than that of a random distribution. These results give the first insights into high-dimensional modform regions and confirm that fast LP methods can be used to analyse them. We discuss the problems of using modform regions with real data, when the actual modform distribution will not be known.
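
The range computation described in this abstract can be written as a pair of linear programs per modform. The formulation below is a sketch under assumptions, not the authors' exact setup: the symbols A, b, l_i and u_i are introduced here for illustration, and modform amounts are assumed to be non-negative fractions summing to one. The two-site example assumes that BU data report only the occupancy of each site.

```latex
% Modform vector x in R^n, with n = 2^7 = 128 for 7 phosphorylation sites.
% MS data are assumed to give linear equations A x = b.
\[
  l_i = \min_{x}\; x_i , \qquad u_i = \max_{x}\; x_i
  \qquad \text{subject to} \qquad
  A x = b, \quad \sum_{j=1}^{n} x_j = 1, \quad x \ge 0 .
\]
% The box [l_1,u_1] x ... x [l_n,u_n] contains the modform region, and
\[
  \bar{r} = \frac{1}{n} \sum_{i=1}^{n} (u_i - l_i)
\]
% is the average modform range.
%
% Toy case with two sites (modforms x_{00}, x_{01}, x_{10}, x_{11}) and
% site occupancies p_1 = x_{10} + x_{11} = 0.6, p_2 = x_{01} + x_{11} = 0.5:
\[
  \max(0,\, p_1 + p_2 - 1) \le x_{11} \le \min(p_1, p_2)
  \;\Longrightarrow\; 0.1 \le x_{11} \le 0.5 ,
\]
\[
  x_{10} = p_1 - x_{11} \in [0.1,\, 0.5], \quad
  x_{01} = p_2 - x_{11} \in [0,\, 0.4], \quad
  x_{00} = 1 - p_1 - p_2 + x_{11} \in [0,\, 0.4] .
\]
```

In this toy case the region is a line segment, because four unknowns are tied by three independent equations, and every modform range has width 0.4.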

2020 ◽  
Vol 21 (S9) ◽  
Author(s):  
Qingyang Zhang ◽  
Thy Dao

Background: Compositional data refer to the data that lie on a simplex, which are common in many scientific domains such as genomics, geology and economics. As the components in a composition must sum to one, traditional tests based on unconstrained data become inappropriate, and new statistical methods are needed to analyze this special type of data. Results: In this paper, we consider a general problem of testing for the compositional difference between K populations. Motivated by microbiome and metagenomics studies, where the data are often over-dispersed and high-dimensional, we formulate a well-posed hypothesis from a Bayesian point of view and suggest a nonparametric test based on inter-point distance to evaluate statistical significance. Unlike most existing tests for compositional data, our method does not rely on any data transformation, sparsity assumption or regularity conditions on the covariance matrix, but directly analyzes the compositions. Simulated data and two real data sets on the human microbiome are used to illustrate the promise of our method. Conclusions: Our simulation studies and real data applications demonstrate that the proposed test is more sensitive to the compositional difference than the mean-based method, especially when the data are over-dispersed or zero-inflated. The proposed test is easy to implement and computationally efficient, facilitating its application to large-scale datasets.
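
To make the idea of a nonparametric test based on inter-point distance concrete, here is a minimal permutation-test sketch. It is not the authors' statistic, which the abstract does not specify. The Euclidean distance, the "mean between-group minus mean within-group distance" statistic, the function names and the toy compositions are all assumptions made for illustration.

```python
import random
from itertools import combinations
from math import dist


def between_minus_within(samples, labels):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, j in combinations(range(len(samples)), 2):
        d = dist(samples[i], samples[j])  # Euclidean distance on raw compositions
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(samples, labels, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffle the group labels and count how
    often the shuffled statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = between_minus_within(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_minus_within(samples, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical three-part compositions (each row sums to one), two populations.
samples = [
    (0.70, 0.20, 0.10), (0.68, 0.22, 0.10), (0.72, 0.18, 0.10), (0.69, 0.21, 0.10),
    (0.20, 0.30, 0.50), (0.22, 0.28, 0.50), (0.18, 0.32, 0.50), (0.21, 0.29, 0.50),
]
labels = ["A"] * 4 + ["B"] * 4

p = permutation_p_value(samples, labels)
print(f"permutation p-value: {p:.3f}")
```

The sketch works on the compositions directly, with no log-ratio transformation, in line with the abstract's description. With K populations the same statistic applies unchanged, because it only asks whether two samples share a label.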


2003 ◽  
Vol 01 (01) ◽  
pp. 41-69 ◽  
Author(s):  
JING LI ◽  
TAO JIANG

We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large scale haplotype inference projects.
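
The zero-recombinant step described above reduces to solving a system of linear equations over Z2 by Gaussian elimination. The sketch below shows that generic technique only. It is not the PedPhase code, and the function names and the three-equation example are hypothetical. How Mendelian constraints are turned into equations is not covered here.

```python
from itertools import product


def solve_gf2(rows, rhs):
    """Solve A x = b over Z2 by Gauss-Jordan elimination.

    Returns (particular_solution, null_space_basis), or None if the
    system is inconsistent. Entries are 0/1 and addition is XOR.
    """
    n = len(rows[0])
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]  # augmented matrix
    pivots = []  # pivot column of each reduced row
    r = 0
    for c in range(n):
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if p is None:
            continue  # column c is a free variable
        aug[r], aug[p] = aug[p], aug[r]
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [u ^ v for u, v in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    # A zero row with right-hand side 1 means there is no solution.
    if any(row[n] for row in aug[r:]):
        return None
    particular = [0] * n
    for i, c in enumerate(pivots):
        particular[c] = aug[i][n]
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [0] * n
        v[f] = 1
        for i, c in enumerate(pivots):
            v[c] = aug[i][f]  # over Z2, minus equals plus
        basis.append(v)
    return particular, basis


def all_solutions(particular, basis):
    """Enumerate every solution: particular + any Z2 combination of the basis."""
    for coeffs in product([0, 1], repeat=len(basis)):
        x = list(particular)
        for k, v in zip(coeffs, basis):
            if k:
                x = [u ^ w for u, w in zip(x, v)]
        yield x


# Hypothetical system: x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (mod 2).
particular, basis = solve_gf2([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 0, 1])
solutions = sorted(all_solutions(particular, basis))
print(solutions)
```

Enumerating all combinations of the null-space basis corresponds to the abstract's "all possible feasible haplotype configurations". The enumeration grows as 2 to the number of free variables, so for large systems one would keep the basis rather than list every solution.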


2007 ◽  
Vol 35 (4) ◽  
pp. 807-810 ◽  
Author(s):  
S.A. Moschos ◽  
A.E. Williams ◽  
M.A. Lindsay

The therapeutic application of siRNA (short interfering RNA) shows promise as an alternative approach to small-molecule inhibitors for the treatment of human disease. However, the major obstacle to its use has been the difficulty in delivering these large anionic molecules in vivo. A potential approach to solving this problem is the chemical conjugation of siRNA to the cationic CPPs (cell-penetrating peptides), Tat-(48–60) (transactivator of transcription) and penetratin, which have been shown previously to mediate protein and peptide delivery in a host of animal models. In this transaction, we review recent studies on the utility of siRNA for the investigation of protein function in the airways/lung. We show that, despite previous studies showing the utility of cationic CPPs in vitro, conjugation of siRNA to Tat-(48–60) and penetratin failed to increase residual siRNA-mediated knockdown of p38 MAPK (mitogen-activated protein kinase) (MAPK14) mRNA in mouse lung in vivo. Significantly, we will also discuss potential non-specific actions and the induction of immunological responses by CPPs and their conjugates and how this might limit their application for siRNA-mediated delivery in vivo.


Author(s):  
V. A. Sizov ◽  
A. A. Drozhkin

In information security, maintaining a required level of protection can incur serious financial costs: the more information that must be protected, the more money must be spent on it. An information security expert must both develop methods that improve the condition of the information system and save funds. Different mathematical models can be used for these tasks, including methods of linear programming. Linear programming is a mathematical discipline concerned with the theory and methods of solving extremum problems on sets in n-dimensional space defined by systems of linear equations and inequalities. These models should be applied after assessing the possible damage from an information leak. In this article the authors study the task of company cost optimization. In most cases total costs can be written as a linear equation. At the same time, some of the equation's variables are subject to restrictions. The simplex method makes it possible to find the solution without considerable computing resources.
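
A small cost-minimisation problem of the kind described here can be sketched as follows. This is not the simplex method and not the authors' model. It enumerates the vertices of a two-variable feasible region, relying on the same fact the simplex method exploits: a linear objective that attains a minimum over such a region attains it at a vertex. The function name, cost coefficients and constraints are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_2d_lp(cost, constraints):
    """Minimise cost[0]*x + cost[1]*y subject to a*x + b*y >= c
    for every (a, b, c) in constraints, by checking each vertex.

    Returns (optimal_value, x, y), or None if no vertex is feasible.
    """
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y >= c for a, b, c in constraints):
            value = cost[0] * x + cost[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best


# Hypothetical example: x and y are units of two protective measures that
# cost 2 and 3 per unit. Two protection requirements must be met.
constraints = [
    (1, 1, 4),  # x + y  >= 4
    (1, 3, 6),  # x + 3y >= 6
    (1, 0, 0),  # x >= 0
    (0, 1, 0),  # y >= 0
]
result = solve_2d_lp((2, 3), constraints)
print(result)
```

Vertex enumeration is practical only for very small problems, since the number of candidate vertices grows combinatorially with the number of constraints. The simplex method instead moves from one vertex to a better neighbouring one.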


2018 ◽  
Vol 115 (29) ◽  
pp. 7533-7538 ◽  
Author(s):  
Brian Munsky ◽  
Guoliang Li ◽  
Zachary R. Fox ◽  
Douglas P. Shepherd ◽  
Gregor Neuert

Despite substantial experimental and computational efforts, mechanistic modeling remains more predictive in engineering than in systems biology. The reason for this discrepancy is not fully understood. One might argue that the randomness and complexity of biological systems are the main barriers to predictive understanding, but these issues are not unique to biology. Instead, we hypothesize that the specific shapes of rare single-molecule event distributions produce substantial yet overlooked challenges for biological models. We demonstrate why modern statistical tools to disentangle complexity and stochasticity, which assume normally distributed fluctuations or enormous datasets, do not apply to the discrete, positive, and nonsymmetric distributions that characterize mRNA fluctuations in single cells. As an example, we integrate single-molecule measurements and advanced computational analyses to explore mitogen-activated protein kinase induction of multiple stress response genes. Through systematic analyses of different metrics to compare the same model to the same data, we elucidate why standard modeling approaches yield nonpredictive models for single-cell gene regulation. We further explain how advanced tools recover precise, reproducible, and predictive understanding of transcription regulation mechanisms, including gene activation, polymerase initiation, elongation, mRNA accumulation, spatial transport, and decay.


2008 ◽  
Vol 29 (5) ◽  
pp. 1306-1320 ◽  
Author(s):  
Alexey E. Granovsky ◽  
Matthew C. Clark ◽  
Dan McElheny ◽  
Gary Heil ◽  
Jia Hong ◽  
...  

ABSTRACT Raf kinase inhibitory protein (RKIP/PEBP1), a member of the phosphatidylethanolamine binding protein family that possesses a conserved ligand-binding pocket, negatively regulates the mammalian mitogen-activated protein kinase (MAPK) signaling cascade. Mutation of a conserved site (P74L) within the pocket leads to a loss or switch in the function of yeast or plant RKIP homologues. However, the mechanism by which the pocket influences RKIP function is unknown. Here we show that the pocket integrates two regulatory signals, phosphorylation and ligand binding, to control RKIP inhibition of Raf-1. RKIP association with Raf-1 is prevented by RKIP phosphorylation at S153. The P74L mutation increases kinase interaction and RKIP phosphorylation, enhancing Raf-1/MAPK signaling. Conversely, ligand binding to the RKIP pocket inhibits kinase interaction and RKIP phosphorylation by a noncompetitive mechanism. Additionally, ligand binding blocks RKIP association with Raf-1. Nuclear magnetic resonance studies reveal that the pocket is highly dynamic, rationalizing its capacity to interact with distinct partners and be involved in allosteric regulation. Our results show that RKIP uses a flexible pocket to integrate ligand binding- and phosphorylation-dependent interactions and to modulate the MAPK signaling pathway. This mechanism is an example of an emerging theme involving the regulation of signaling proteins and their interaction with effectors at the level of protein dynamics.


2020 ◽  
Author(s):  
Bo Kang ◽  
Darío García García ◽  
Jefrey Lijffijt ◽  
Raúl Santos-Rodríguez ◽  
Tijl De Bie

Abstract Dimensionality reduction and manifold learning methods such as t-distributed stochastic neighbor embedding (t-SNE) are frequently used to map high-dimensional data into a two-dimensional space to visualize and explore that data. Going beyond the specifics of t-SNE, there are two substantial limitations of any such approach: (1) not all information can be captured in a single two-dimensional embedding, and (2) to well-informed users, the salient structure of such an embedding is often already known, which prevents any real new insights from being obtained. Currently, it is not known how to extract the remaining information in a similarly effective manner. We introduce conditional t-SNE (ct-SNE), a generalization of t-SNE that discounts prior information in the form of labels. This enables obtaining more informative and more relevant embeddings. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining an elegant method with a single integrated objective. We show how to efficiently optimize the objective and study the effects of the extra parameter that ct-SNE has over t-SNE. Qualitative and quantitative empirical results on synthetic and real data show ct-SNE is scalable, effective, and achieves its goal: it allows complementary structure to be captured in the embedding and provides new insights into real data.


2021 ◽  
Author(s):  
Kehinde Olobatuyi

Abstract Similar to many machine learning models, both the accuracy and speed of cluster weighted models (CWMs) can be hampered by high-dimensional data, which has led to previous work on a parsimonious technique to reduce the effect of the “curse of dimensionality” on mixture models. In this work, we review the background of CWMs. We further show that the parsimonious technique is not sufficient for mixture models to thrive in the presence of very high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of location parameters using the default values in the “FlexCWM” R package. We introduce a dimensionality reduction technique called t-distributed stochastic neighbor embedding (TSNE) to enhance the parsimonious CWMs in high-dimensional space. Originally, CWMs are suited for regression, but for classification purposes all multi-class variables are transformed logarithmically with some noise. The parameters of the model are obtained via the expectation maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.


Energies ◽  
2019 ◽  
Vol 12 (7) ◽  
pp. 1360
Author(s):  
Fan Yang ◽  
Robert C. Qiu ◽  
Zenan Ling ◽  
Xing He ◽  
Haosen Yang

Multiple event detection and analysis in real time is a challenge for a modern grid as its features are usually non-identifiable. This paper, based on high-dimensional factor models, proposes a data-driven approach to gain insight into the constituent components of a multiple event via the high-resolution phasor measurement unit (PMU) data, such that proper actions can be taken before any sporadic fault escalates to cascading blackouts. Under the framework of random matrix theory, the proposed approach maps the raw data into a high-dimensional space with two parts: (1) factors (spikes, mapping faults); (2) residuals (a bulk, mapping white/non-Gaussian noises or normal fluctuations). As for the factors, we employ their number as a spatial indicator to estimate the number of constituent components in a multiple event. Simultaneously, the autoregressive rate of the noises is utilized to measure the variation of the temporal correlation of the residuals for tracking the system movement. Taking the spatial-temporal correlation into account, this approach allows for detection, decomposition and temporal localization of multiple events. Case studies based on simulated data and real 34-PMU data verify the effectiveness of the proposed approach.

