Bayesian Mass Spectra Peak Alignment from Mass Charge Ratios

Proteomics studies based on mass spectrometry (MS) are increasingly used in biomedical research for protein identification and quantification and for biomarker discovery, especially for potential early diagnosis and prognosis of severe disease before symptoms occur. However, MS data collected using current technologies are very noisy, and appropriate data preprocessing is critical for successful applications of MS-based approaches. Among the preprocessing steps, peak alignment across multiple spectra based on detected peak locations presents special statistical challenges when effective experimental calibration is not feasible because of relatively large peak location variation. To avoid intensive tuning-parameter optimization, we propose a simple, novel Bayesian algorithm, “random grafting-pruning Markov chain Monte Carlo (RGPMCMC)”, that can be applied to global MS peak alignment and that follows a model-based sample classification criterion for using aligned peaks to classify spectrum samples. The usefulness of our approach is demonstrated in a simulation study through extensive comparison with other algorithms in the literature. Applied to an ovarian cancer MALDI-MS data set, it achieves a smaller 10-fold cross-validation error rate than other current large-scale methodologies.

Shotgun Proteomics and Biomarker Discovery

Disease Markers, 2002, Vol 18 (2), pp. 99-105. DOI: 10.1155/2002/505397. Cited By ~ 194

Author(s): W. Hayes McDonald, John R. Yates

Keyword(s): Large Scale, Protein Identification, Biomarker Discovery, Dynamic Range, Shotgun Proteomics, Sequence Information, Protein Biomarkers, Post Translational Modifications, Multidimensional Protein Identification Technology, Shotgun Approach

Coupling large-scale sequencing projects with the amino acid sequence information that can be gleaned from tandem mass spectrometry (MS/MS) has made it much easier to analyze complex mixtures of proteins. The limits of this “shotgun” approach, in which the protein mixture is proteolytically digested before separation, can be further expanded by separating the resulting mixture of peptides prior to MS/MS analysis. Both single-dimensional high pressure liquid chromatography (LC) and multidimensional LC (LC/LC) can be directly interfaced with the mass spectrometer to allow for automated collection of tremendous quantities of data. While no single technique addresses all proteomic challenges, the shotgun approaches, especially LC/LC-MS/MS-based techniques such as MudPIT (multidimensional protein identification technology), show advantages over gel-based techniques in speed, sensitivity, scope of analysis, and dynamic range. Advances in the ability to quantitate differences between samples and to detect an array of post-translational modifications allow for the discovery of classes of protein biomarkers that were previously unassailable.

Design and Comparative Analysis of New Personalized Recommender Algorithms with Specific Features for Large Scale Datasets

Mathematics, 2020, Vol 8 (7), pp. 1106. DOI: 10.3390/math8071106

Author(s): S. Bhaskaran, Raja Marappan, B. Santhi

Keyword(s): Large Scale, Real Life, Optimization Methods, Tuning Parameter, Computational Time, Data Set, Significant Difference, Minimum Number, Tremendous Amount, The Given

Because of the tremendous amount of information that humans and machines produce every day, it has become increasingly hard to choose the most relevant content from a broad range of choices. This research focuses on the design of two intelligent optimization methods, based on artificial intelligence and machine learning, that improve the generation of recommendations in real-life applications. The first method applies a modified cluster-based intelligent collaborative filtering with sequential clustering that operates on the values of the dataset, the user's neighborhood set, and the size of the recommendation list. This strategy splits the given data set into subsets, or clusters, and extracts a recommendation list from each cluster to construct a better overall recommendation list. The second method is a specific-features-based customized recommender that works in training and recommendation steps by applying a split-and-conquer strategy to the problem datasets: the data are clustered into a minimum number of clusters, and the better recommendation list is created from among all the clusters. This strategy automatically tunes the tuning parameter λ, which serves the role of supervised learning in generating the better recommendation list for large datasets. For some large-scale datasets, the quality of the proposed recommenders improves on that of some well-known existing methods. The proposed methods work well when λ = 0.5, with recommendation list size |L| = 30 and neighborhood size |S| < 30. For large values of |S|, the significant difference in root mean square error becomes smaller in the proposed methods. For large-scale datasets, simulations with varying user sizes show that when the user size exceeds 500, better metric values are obtained and proposed method 2 performs better than proposed method 1. The significant differences between the methods arise because their computational structure depends on the number of user attributes, λ, the number of bipartite graph edges, and |L|. With a size of 3000 on the large-scale Book-Crossing dataset, the better (Precision, Recall) values obtained by the proposed methods are (0.0004, 0.0042) and (0.0004, 0.0046), respectively. The average computational time of the proposed methods is under 10 seconds for the large-scale datasets, and they yield better performance than the well-known existing methods.
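The metrics named in this abstract (precision, recall, and root mean square error) can be illustrated with a minimal sketch. The definitions below are the common textbook ones and are an assumption here; the abstract does not state exactly how the authors compute them, and the function names are hypothetical.

```python
import math


def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommended items.

    recommended: ranked list of item ids (hypothetical input format)
    relevant: set of item ids the user actually found relevant
    k: size of the recommendation list (|L| in the abstract)
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

With very large item catalogues and short recommendation lists, both precision and recall are typically small numbers, which is in line with the order of magnitude reported for the Book-Crossing dataset.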

Development of a Comprehensive Antibody Staining Database using a Standardized Analytics Pipeline

2019. DOI: 10.1101/563742. Cited By ~ 1

Author(s): El-ad David Amir, Brian Lee, Paul Badoual, Martin Gordon, Xinzheng V. Guo, ...

Keyword(s): Large Scale, Biomarker Discovery, Staining Intensity, Immune Monitoring, Mass Cytometry, Data Set, Mait Cells, Antibody Staining, Online Resource, Analysis Platform

Large-scale immune monitoring experiments (such as clinical trials) are a promising direction for biomarker discovery and responder stratification in immunotherapy. Mass cytometry is one of the tools in the immune monitoring arsenal. We propose a standardized workflow for the acquisition and analysis of large-scale mass cytometry experiments. The workflow includes two-tiered barcoding, a broad lyophilized panel, and the incorporation of a fully automated, cloud-based analysis platform. We applied the workflow to a large antibody staining screen using the LEGENDScreen kit, resulting in single-cell data for 350 antibodies over 71 profiling subsets. The screen recapitulates many known trends in the immune system and reveals potential markers for delineating MAIT cells. Additionally, we examine the effect of fixation on staining intensity and identify several markers where fixation leads to either gain or loss of signal. The standardized workflow can be seamlessly integrated into existing trials. Finally, the antibody staining data set is available as an online resource for researchers who are designing mass cytometry experiments in suspension and tissue.

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications, 2009, Vol 28 (11), pp. 2737-2740. DOI: 10.3724/sp.j.1087.2008.02737

Author(s): Xiao ZHANG, Shan WANG, Na LIAN

Keyword(s): Large Scale, Data Set, Large Scale Data, Scale Data

Integrative Data Analysis from a Unifying Research Synthesis Perspective

2018. DOI: 10.1093/oso/9780190676001.003.0020

Author(s): Eun-Young Mun, Anne E. Ray

Keyword(s): Data Analysis, Large Scale, Research Synthesis, Alcohol Intervention, Data Set, Integrative Data Analysis, Level Data, Model Complex, Wide Range, Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.

Financial distress determinants among SMEs: empirical evidence from Sweden

Journal of Economic Studies, 2020, Vol 47 (3), pp. 547-560. DOI: 10.1108/jes-01-2019-0030. Cited By ~ 1

Author(s): Darush Yazdanfar, Peter Öhman

Keyword(s): Financial Crisis, Financial Distress, Large Scale, Global Financial Crisis, Binary Logistic Regression, Data Availability, Cross Sectional, Data Set, Content Type, The Global Financial Crisis

Purpose: The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods. Design/methodology/approach: Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period. Findings: The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in previous year). However, firm size and industry affiliation have no significant relationship with financial distress. Research limitations: Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating other firms operating in other industries and other countries. Originality/value: This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.

Relationship between large-scale ionospheric field-aligned currents and electron/ion precipitations: DMSP observations

Earth Planets and Space, 2020, Vol 72 (1). DOI: 10.1186/s40623-020-01286-z

Author(s): Chao Xiong, Claudia Stolle, Patrick Alken, Jan Rauberg

Keyword(s): Time Distribution, Large Scale, Particle Flux, Particle Energy, Lower Latitude, Particle Precipitation, Data Set, The Mean, Two Parameters, Meteorological Satellite

In this study, we have derived field-aligned currents (FACs) from magnetometers onboard the Defense Meteorological Satellite Program (DMSP) satellites. The magnetic latitude versus local time distribution of FACs from DMSP shows dependences on the intensity and orientation of the interplanetary magnetic field (IMF) By and Bz components that are comparable with previous findings, which confirms the reliability of the DMSP FAC data set. With simultaneous measurements of precipitating particles from DMSP, we further investigate the relation between large-scale FACs and precipitating particles. Our results show that precipitating electron and ion fluxes both increase in magnitude and extend to lower latitude for enhanced southward IMF Bz, which is similar to the behavior of FACs. Under weak northward and southward Bz conditions, the locations of the R2 current maxima, at both dusk and dawn sides and in both hemispheres, are found to be close to the maxima of the particle energy fluxes, while for the same IMF conditions, R1 currents are displaced farther from the respective particle flux peaks. The largest displacement (about 3.5°) is found between the downward R1 current and the ion flux peak at the dawn side. Our results suggest that there exist systematic differences in the locations of electron/ion precipitation and large-scale upward/downward FACs. As outlined by the statistical mean of these two parameters, the FAC peaks enclose the particle energy flux peaks in an auroral band at both dusk and dawn sides. Our comparisons also found that particle precipitation at dawn and dusk and in both hemispheres maximizes near the mean R2 current peaks. The particle precipitation flux maxima closer to the R1 current peaks are lower in magnitude. This is opposite to the known feature that R1 currents are on average stronger than R2 currents.

The combined detection of Amphiregulin, Cyclin A1 and DDX20/Gemin3 expression predicts aggressive forms of oral squamous cell carcinoma

British Journal of Cancer, 2021. DOI: 10.1038/s41416-021-01491-x

Author(s): Ekaterina Bourova-Flin, Samira Derakhshan, Afsaneh Goudarzi, Tao Wang, Anne-Laure Vitte, ...

Keyword(s): Gene Expression, Squamous Cell, Large Scale, Biomarker Discovery, Gene Expression Signature, Predictive Biomarkers, Specific Gene, Specific Expression, Intrinsic Nature, Independent Cohort

Background: Large-scale genetic and epigenetic deregulations enable cancer cells to ectopically activate tissue-specific expression programmes. A specifically designed strategy was applied to oral squamous cell carcinomas (OSCC) in order to detect ectopic gene activations and develop a prognostic stratification test. Methods: A dedicated original prognosis biomarker discovery approach was implemented using genome-wide transcriptomic data of OSCC, including training and validation cohorts. Abnormal expressions of silent genes were systematically detected, correlated with survival probabilities and evaluated as predictive biomarkers. The resulting stratification test was confirmed in an independent cohort using immunohistochemistry. Results: A specific gene expression signature, including a combination of three genes, AREG, CCNA1 and DDX20, was found associated with high-risk OSCC in univariate and multivariate analyses. It was translated into an immunohistochemistry-based test, which successfully stratified patients of our own independent cohort. Discussion: The exploration of the whole gene expression profile characterising aggressive OSCC tumours highlights their enhanced proliferative and poorly differentiated intrinsic nature. Experimental targeting of CCNA1 in OSCC cells is associated with a shift of transcriptomic signature towards the less aggressive form of OSCC, suggesting that CCNA1 could be a good target for therapeutic approaches.

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia, 2020, Vol 37. DOI: 10.1017/pasa.2020.46

Author(s): Lior Shamir

Keyword(s): Large Scale, Spiral Galaxies, Hubble Space Telescope, Gravitational Interaction, Large Data, Sloan Digital Sky Survey, Data Sets, Dipole Axis, Data Set, The Asymmetry

Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\circ},\delta=47^{\circ})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^{\circ},\delta=61^{\circ})$.
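As a rough illustration of what fitting to cosine dependence can mean, the sketch below fits a single amplitude A in the model d ≈ A·cos(φ) by least squares, where φ is each galaxy's angular distance from a candidate dipole axis and d is its spin label (+1 or -1). This is a simplified stand-in under those assumptions, not the procedure used in the paper; the function name and the input format are hypothetical.

```python
import math


def fit_cosine_amplitude(angles_rad, spins):
    """Least-squares amplitude A for the model spin ~ A * cos(angle).

    angles_rad: angular distance of each galaxy from a candidate axis, in radians
    spins: spin label of each galaxy, e.g. +1 or -1 (assumed encoding)
    """
    if len(angles_rad) != len(spins) or not spins:
        raise ValueError("inputs must be non-empty and of equal length")
    cosines = [math.cos(phi) for phi in angles_rad]
    denominator = sum(c * c for c in cosines)
    if denominator == 0.0:
        raise ValueError("all angles are perpendicular to the axis; A is undetermined")
    # Closed-form minimiser of sum((s - A*cos(phi))^2) over A.
    return sum(s * c for s, c in zip(spins, cosines)) / denominator
```

Repeating such a fit over a grid of candidate axes and comparing the quality of fit is one way a most likely axis could be located; the details of that search are not given in the abstract.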

COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis

IEEE Transactions on Computational Social Systems, 2021, pp. 1-13. DOI: 10.1109/tcss.2021.3051189

Author(s): Usman Naseem, Imran Razzak, Matloob Khushi, Peter W. Eklund, Jinman Kim

Keyword(s): Sentiment Analysis, Large Scale, Data Set, Twitter Data
