RobNorm: Model-Based Robust Normalization Method for Labeled Quantitative Mass Spectrometry Proteomics Data

2019 ◽  
Author(s):  
Meng Wang ◽  
Lihua Jiang ◽  
Ruiqi Jian ◽  
Joanne Y. Chan ◽  
Qing Liu ◽  
...  

Abstract Motivation: Data normalization is an important step in processing proteomics data generated in mass spectrometry (MS) experiments, which aims to reduce sample-level variation and facilitate comparisons of samples. Previously published methods for normalization primarily depend on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data are generated from heterogeneous samples, such as from various tissue types. This led us to develop a novel data-driven method for improved normalization that corrects the systematic bias while maintaining the underlying biological heterogeneity. Methods: To robustly correct the systematic bias, we used the density-power-weight method to down-weight outliers and extended the one-dimensional robust fitting method described in previous work (Windham, 1995; Fujisawa and Eguchi, 2008) to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. Results: In simulation studies and analysis of real data from the genotype-tissue expression (GTEx) project, we compared and evaluated the performance of RobNorm against other normalization methods. We found that the RobNorm approach exhibits the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples. Availability: https://github.com/mwgrassgreen/

Author(s):  
Meng Wang ◽  
Lihua Jiang ◽  
Ruiqi Jian ◽  
Joanne Y Chan ◽  
Qing Liu ◽  
...  

Abstract Motivation: Data normalization is an important step in processing proteomics data generated in mass spectrometry experiments, which aims to reduce sample-level variation and facilitate comparisons of samples. Previously published methods for normalization primarily depend on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data are generated from heterogeneous samples, such as from various tissue types. This led us to develop a novel data-driven method for improved normalization that corrects the systematic bias while maintaining the underlying biological heterogeneity. Results: To robustly correct the systematic bias, we used the density-power-weight method to down-weight outliers and extended the one-dimensional robust fitting method described in previous work to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. In simulation studies and analysis of real data from the genotype-tissue expression project, we compared and evaluated the performance of RobNorm against other normalization methods. We found that the RobNorm approach exhibits the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples. Availability and implementation: https://github.com/mwgrassgreen/RobNorm. Supplementary information: Supplementary data are available at Bioinformatics online.
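
The one-dimensional robust fit that the abstract builds on can be illustrated with a minimal Python sketch. It assumes a single Gaussian sample and uses density-power weights, so points that are unlikely under the current fit contribute almost nothing to the next estimate. The function name `robust_gaussian_fit`, the tuning parameter `gamma`, and the median/MAD starting values are assumptions made for illustration; this is not the RobNorm algorithm itself, which extends the idea to structured protein-by-sample data.

```python
import math
import statistics


def robust_gaussian_fit(xs, gamma=0.5, n_iter=100, tol=1e-9):
    """Estimate a Gaussian mean and variance with density-power weights.

    Illustrative sketch only: each point is weighted by the fitted
    density raised to the power gamma, so outliers are down-weighted.
    """
    # Robust starting values: median and (scaled) median absolute deviation.
    mu = statistics.median(xs)
    mad = statistics.median(abs(x - mu) for x in xs)
    var = (1.4826 * mad) ** 2 or 1.0  # fall back to 1.0 if the MAD is zero

    for _ in range(n_iter):
        # Weight is proportional to density**gamma; the normalizing
        # constant of the Gaussian cancels in the weighted averages.
        w = [math.exp(-gamma * (x - mu) ** 2 / (2.0 * var)) for x in xs]
        sw = sum(w)
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / sw
        # Weighting a Gaussian by its own density**gamma shrinks the
        # variance by a factor 1 / (1 + gamma), so scale it back up.
        new_var = (1.0 + gamma) * sum(
            wi * (x - new_mu) ** 2 for wi, x in zip(w, xs)
        ) / sw
        converged = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if converged:
            break
    return mu, var
```

With `gamma = 0` every weight equals one and the update reduces to the ordinary mean and variance; larger values trade statistical efficiency for robustness to outliers.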


Molbank ◽  
10.3390/m1140 ◽  
2020 ◽  
Vol 2020 (2) ◽  
pp. M1140
Author(s):  
Jack Bennett ◽  
Paul Murphy

(2S,3R,6R)-2-[(R)-1-Hydroxyallyl]-4,4-dimethoxy-6-methyltetrahydro-2H-pyran-3-ol was isolated in 18% yield after treating the glucose-derived (5R,6S,7R)-5,6,7-tris[(triethylsilyl)oxy]nona-1,8-dien-4-one with (1S)-(+)-10-camphorsulfonic acid (CSA). The one-pot formation of the title compound involved triethylsilyl (TES) removal, alkene isomerization, intramolecular conjugate addition and ketal formation. The compound was characterized by 1H and 13C NMR spectroscopy, ESI mass spectrometry and IR spectroscopy. NMR spectroscopy was used to establish the product structure, including the conformation of its tetrahydropyran ring.


2021 ◽  
pp. 1-11
Author(s):  
Velichka Traneva ◽  
Stoyan Tranev

Analysis of variance (ANOVA) is an important method in data analysis, which was developed by Fisher. There are situations in which the data are imprecise. To analyze such data, this paper introduces, for the first time, an intuitionistic fuzzy two-factor ANOVA (2-D IFANOVA) without replication as an extension of the classical ANOVA and the one-way IFANOVA for the case where the data are intuitionistic fuzzy rather than real numbers. The proposed approach employs the apparatus of intuitionistic fuzzy sets (IFSs) and index matrices (IMs). The paper also analyzes a unique set of data on daily ticket sales for a year in a multiplex of Cinema City Bulgaria, part of Cineworld PLC Group, applying the two-factor ANOVA and the proposed 2-D IFANOVA to study the influence of the “season” and “ticket price” factors. A comparative analysis of the results obtained by applying ANOVA and 2-D IFANOVA to the real data set is also presented.
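
For orientation, the classical (crisp) two-factor ANOVA without replication that the abstract takes as its starting point can be sketched in a few lines of Python. The function name and the dictionary keys are assumptions chosen for illustration, and the intuitionistic fuzzy extension proposed in the paper is not attempted here.

```python
def two_way_anova_no_replication(table):
    """Classical two-factor ANOVA with one observation per cell.

    `table[i][j]` is the observation for row level i and column level j.
    Returns sums of squares and the two F statistics.
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    # Between-level sums of squares for each factor.
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Residual sum of squares after removing both additive effects.
    ss_error = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )

    # Degrees of freedom: r - 1, c - 1, and (r - 1)(c - 1) for the error.
    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "f_rows": (ss_rows / (r - 1)) / ms_error,
        "f_cols": (ss_cols / (c - 1)) / ms_error,
    }
```

Each F statistic would then be compared with the critical value of the F distribution for the corresponding degrees of freedom. A perfectly additive table has zero residual variation, so the sketch would divide by zero in that degenerate case.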


Genetics ◽  
2003 ◽  
Vol 165 (4) ◽  
pp. 2269-2282
Author(s):  
D Mester ◽  
Y Ronin ◽  
D Minkov ◽  
E Nevo ◽  
A Korol

Abstract This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
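
The ordering criterion described above (the true order minimizes the total length of the linkage group) can be made concrete with a small Python sketch. It swaps in an exhaustive search over all n!/2 orders in place of the evolution-strategy optimizer used in the article, so it is only practical for a handful of loci; the function names and the matrix argument `rf` of pairwise distances are assumptions for illustration.

```python
from itertools import permutations


def map_length(order, rf):
    """Total map length: sum of pairwise distances between adjacent loci."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def best_order(rf):
    """Exhaustively find the locus order with the shortest total length.

    `rf[i][j]` is the pairwise distance (for example, recombination
    frequency) between loci i and j. An order and its mirror image have
    the same length, so only one of each pair is evaluated, which gives
    n!/2 candidates.
    """
    n = len(rf)
    best, best_len = None, float("inf")
    for perm in permutations(range(n)):
        if perm[0] > perm[-1]:
            continue  # skip the mirror image of an order already seen
        length = map_length(perm, rf)
        if length < best_len:
            best, best_len = perm, length
    return best, best_len
```

The factorial growth of the candidate set is what makes heuristic optimizers necessary once maps contain dozens or hundreds of markers.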


2005 ◽  
Vol 11 (5) ◽  
pp. 535-546 ◽  
Author(s):  
Anna Kondakov ◽  
Buko Lindner

Bacterial glycolipids are complex amphiphilic molecules which are, on the one hand, of utmost importance for the organization and function of bacterial membranes and which, on the other hand, play a major role in the activation of cells of the innate and adaptive immune system of the host. Even small alterations to their chemical structure may influence the biological activity tremendously. Due to their intrinsic biological heterogeneity [number and type of fatty acids, saccharide structures and substitution with, for example, phosphate (P), 2-aminoethyl-(pyro)phosphate groups (P-Etn) or 4-amino-4-deoxyarabinose (Ara4N)], separation of the different components is a prerequisite for unequivocal chemical and nuclear magnetic resonance structural analyses. In this contribution, the structural information which can be obtained from heterogeneous samples of glycolipids by Fourier transform (FT) ion cyclotron resonance mass spectrometric methods is described. By means of recently analysed complex biological samples, the possibilities of high-resolution electrospray ionization FT-MS are demonstrated. Capillary skimmer dissociation, as well as tandem mass spectrometry (MS/MS) analysis utilizing collision-induced dissociation and infrared multiphoton dissociation, are compared and their advantages in providing structural information of diagnostic importance are discussed.


2018 ◽  
Vol 90 (21) ◽  
pp. 13112-13117 ◽  
Author(s):  
Lindsay K. Pino ◽  
Brian C. Searle ◽  
Eric L. Huang ◽  
William Stafford Noble ◽  
Andrew N. Hoofnagle ◽  
...  

Proteomes ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 3 ◽  
Author(s):  
Zhujia Ye ◽  
Sasikiran Reddy Sangireddy ◽  
Chih-Li Yu ◽  
Dafeng Hui ◽  
Kevin Howe ◽  
...  

Switchgrass plants were grown in a Sandwich tube system to induce gradual drought stress by withholding watering. After 29 days, the leaf photosynthetic rate had decreased significantly compared to the control plants, which were watered regularly. The drought-treated plants recovered to the same leaf water content after three days of re-watering. The root tip (1 cm basal fragment, designated as RT1 hereafter) and the elongation/maturation zone (the next upper 1 cm of tissue, designated as RT2 hereafter) tissues were collected at the 29th day of drought stress treatment (named SDT, for severe drought treated) and after one (D1W) and three (D3W) days of re-watering. Tandem mass tag mass spectrometry-based quantitative proteomics analysis was performed to identify the proteomes and the drought-induced differentially accumulated proteins (DAPs). From RT1 tissues, 6156, 7687, and 7699 proteins were quantified, and 296, 535, and 384 DAPs were identified in the SDT, D1W, and D3W samples, respectively. From RT2 tissues, 7382, 7255, and 6883 proteins were quantified, and 393, 587, and 321 DAPs were identified in the SDT, D1W, and D3W samples, respectively. Between RT1 and RT2 tissues, very few DAPs overlapped at SDT, but the number of such proteins increased during the recovery phase. A large number of hydrophilic proteins and stress-responsive proteins were induced during SDT and remained at a higher level during the recovery stages. A large number of DAPs in RT1 tissues maintained the same expression pattern throughout the drought treatment and recovery phases. The DAPs in RT1 tissues were classified in cell proliferation, mitotic cell division, and chromatin modification, and those in RT2 were placed in cell wall remodeling and cell expansion processes. This study provided information pertaining to root zone-specific proteome changes during the drought and recovery phases, which will allow us to select proteins (genes) as better-defined targets for developing drought-tolerant plants.
The mass spectrometry proteomics data are available via ProteomeXchange with identifier PXD017441.


2015 ◽  
Author(s):  
Lisa M. Breckels ◽  
Sean Holden ◽  
David Wojnar ◽  
Claire M. Mulvey ◽  
Andy Christoforou ◽  
...  

Abstract Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis. Abbreviations: LOPIT, Localisation of Organelle Proteins by Isotope Tagging; PCP, Protein Correlation Profiling; ML, Machine learning; TL, Transfer learning; SVM, Support vector machine; PCA, Principal component analysis; GO, Gene Ontology; CC, Cellular compartment; iTRAQ, Isobaric tags for relative and absolute quantitation; TMT, Tandem mass tags; MS, Mass spectrometry
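
One plausible way to integrate a primary and an auxiliary data source in a nearest-neighbour classifier is to blend the neighbour votes from each source, as in the Python sketch below. The function names, the single mixing weight `theta`, and the Euclidean distance are assumptions for illustration only; the abstract does not specify the weighting scheme, and this is not the pRoloc implementation.

```python
import math
from collections import Counter


def knn_votes(query, points, labels, k):
    """Count class labels among the k nearest points (Euclidean distance)."""
    ranked = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return Counter(labels[i] for i in ranked[:k])


def combined_knn_classify(q_primary, q_aux, primary, aux, labels, k=3, theta=0.5):
    """Classify one protein from two feature spaces.

    `primary[i]` and `aux[i]` describe the same labelled protein i in the
    primary (e.g. MS profile) and auxiliary feature spaces. `theta` in
    [0, 1] is a hypothetical mixing weight: 1 trusts only the primary
    votes, 0 only the auxiliary votes.
    """
    votes_primary = knn_votes(q_primary, primary, labels, k)
    votes_aux = knn_votes(q_aux, aux, labels, k)
    classes = sorted(set(labels))
    # Counter returns 0 for classes that received no votes.
    score = {c: theta * votes_primary[c] + (1 - theta) * votes_aux[c] for c in classes}
    # Ties are broken by alphabetical class order because max keeps the first maximum.
    return max(classes, key=score.get)
```

In practice the mixing weight would be tuned on labelled proteins, for example by cross-validation, rather than fixed in advance.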

