Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models

Fitting Cox models in a big data context (on a massive scale in terms of volume, intensity, and complexity, exceeding the capacity of usual analytic tools) is often challenging, and even more so when some data are missing. We previously proposed algorithms that fit Cox models in high-dimensional settings using extensions of partial least squares (PLS) regression to the Cox model; some of them can cope with missing data. We recently extended our latest algorithms to big data, making it possible to fit Cox models to big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial log-likelihood, computed with either a naive scheme or the van Houwelingen scheme, which makes efficient use of the death times of the left-out data in relation to the death times of all the data. Surprisingly, we show, using an extensive simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, whether straightforward or more involved, of PLS regression to the Cox model. This result is interesting for at least two reasons. First, several attractive features of PLS-based models (regularization, interpretability of the components, missing data support, data visualization through biplots of individuals and variables, and even parsimony or group parsimony for sparse partial least squares (SPLS) or sparse group SPLS based models) explain why these extensions are commonly used by statisticians, who usually select their hyperparameters by cross-validation. Second, these extensions are almost always included in benchmarking studies that assess the performance of a new estimation technique in a high-dimensional or big data context, where they often show poor statistical properties. We carried out a large simulation study to evaluate more than a dozen candidate cross-validation criteria, based on either AUC or prediction error. Several of them lead to the selection of a reasonable number of components. Using these newly identified cross-validation criteria to fit extensions of PLS regression to the Cox model, we performed a benchmark reanalysis that showed improved performance of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R package used in this article is available on CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on CRAN and, until then, is available on GitHub, https://github.com/fbertran/bigPLS.
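
As a rough illustration of the two cross-validation schemes named above, the sketch below computes, for one fold and a coefficient vector already estimated on the training data, the naive criterion (partial log-likelihood of the left-out observations alone) and the van Houwelingen criterion (partial log-likelihood of all the data minus that of the training data). The function names and the toy data are hypothetical and are not taken from plsRcox or bigPLS; the coefficient vector is assumed to come from whatever fitting routine is in use.

```python
import math


def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood; tied times share a risk set (Breslow form)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored rows contribute only through the risk sets
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        loglik += eta[i] - math.log(risk)
    return loglik


def _subset(X, time, event, idx):
    return [X[i] for i in idx], [time[i] for i in idx], [event[i] for i in idx]


def cv_naive(beta_train, X, time, event, test_idx):
    """Naive scheme: partial log-likelihood of the left-out fold alone."""
    return cox_partial_loglik(beta_train, *_subset(X, time, event, test_idx))


def cv_van_houwelingen(beta_train, X, time, event, test_idx):
    """van Houwelingen scheme: l(all data) - l(training data), same beta."""
    held_out = set(test_idx)
    train_idx = [i for i in range(len(time)) if i not in held_out]
    full = cox_partial_loglik(beta_train, X, time, event)
    train = cox_partial_loglik(beta_train, *_subset(X, time, event, train_idx))
    return full - train


if __name__ == "__main__":
    # Hypothetical toy data: one covariate, four subjects, no censoring.
    X = [[0.0], [1.0], [2.0], [3.0]]
    time = [1, 2, 3, 4]
    event = [1, 1, 1, 1]
    beta = [0.3]  # assumed to have been estimated on subjects 2 and 3
    print(cv_naive(beta, X, time, event, [0, 1]))
    print(cv_van_houwelingen(beta, X, time, event, [0, 1]))
```

Summing either quantity over all folds gives the corresponding cross-validated criterion. The second scheme keeps the left-out subjects inside the risk sets of the full data, which is how it uses their death times relative to those of all the data.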


Missing Values Estimation for Time Course Gene Expression Data Using the Sequential Partial Least Squares Regression Fitting

Korean Journal of Applied Statistics ◽ 10.5351/kjas.2008.21.2.275 ◽ 2008 ◽ Vol 21 (2) ◽ pp. 275-290

Keyword(s): Gene Expression ◽ Least Squares ◽ Partial Least Squares ◽ Gene Expression Data ◽ Time Course ◽ Partial Least Squares Regression ◽ Missing Values ◽ Expression Data ◽ Least Squares Regression


Quantitation of Oxcarbazepine Clinically in Plasma Using Surfaced-Enhanced Raman Spectroscopy (SERS) Coupled with Chemometrics

Applied Spectroscopy ◽ 10.1177/0003702819845389 ◽ 2019 ◽ Vol 73 (7) ◽ pp. 801-809

Author(s): Manhua Liu ◽ Yangyang Wang ◽ Yueping Jiang ◽ Haitao Liu ◽ Jingjing Chen ◽ ...

Keyword(s): Raman Spectroscopy ◽ Least Squares ◽ Spectral Data ◽ Partial Least Squares ◽ Partial Least Squares Regression ◽ Cross Validation ◽ Human Blood Plasma ◽ Least Squares Regression ◽ Alternative Scheme ◽ Analytical Instruments

Nondestructive, sensitive, near-real-time quantitative analysis approaches are gaining popularity and attention, especially in clinical diagnosis and detection. An alternative scheme based on surface-enhanced Raman spectroscopy (SERS) assisted by chemometrics is needed to overcome shortcomings of other analytical instruments and to meet clinical demands. In this study, the clinical drug oxcarbazepine (OXC) was detected and quantified in human blood plasma using this method. Partial least squares regression (PLSR) modeling was employed to assess the relationship between the full SERS spectral data and OXC concentration. The correlation coefficient of the model on the calibration set is > 0.9, which suggests that the method is feasible. Furthermore, other multivariate calibration techniques, such as Monte Carlo cross-validation (MCCV), sample set partitioning based on joint XY distances (SPXY), adaptive iteratively reweighted penalized least squares (AIR–PLS), moving window partial least squares regression (MWPLS), and leave-one-out cross-validation, were used to process the spectral data and obtain an accurate predictive model. The results indicate that SERS combined with chemometrics could be applied to diagnostic detection.
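
A minimal sketch of how a PLSR calibration can be assessed by leave-one-out cross-validation, assuming a single response (PLS1) fitted with NIPALS-style deflation. It is not the authors' pipeline: the function names and the toy data are hypothetical, and real SERS spectra would have far more variables than this example.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns means and per-component terms."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in X that covaries with the residual of y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * lj for ej, lj in zip(e, load)]
        y_hat += q * t
    return y_hat


def loo_rmse(X, y, n_components):
    """Leave-one-out RMSE: refit without sample i, then predict sample i."""
    sq_err = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        sq_err += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq_err / len(X))


if __name__ == "__main__":
    # Hypothetical data: two "spectral" variables, exactly linear response.
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
    y = [1.0 + 2.0 * a - b for a, b in X]
    for k in (1, 2):
        print(k, loo_rmse(X, y, k))
```

Comparing the leave-one-out RMSE across candidate numbers of components is one common way to choose the model size; Monte Carlo cross-validation replaces the single left-out sample with repeated random splits.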


Missing Values Estimation in Microarray Data with Partial Least Squares Regression

Computational Science – ICCS 2006 - Lecture Notes in Computer Science ◽ 10.1007/11758525_90 ◽ 2006 ◽ pp. 662-669 ◽ Cited By ~ 9

Author(s): Kun Yang ◽ Jianzhong Li ◽ Chaokun Wang

Keyword(s): Least Squares ◽ Partial Least Squares ◽ Microarray Data ◽ Partial Least Squares Regression ◽ Missing Values ◽ Least Squares Regression


Performances of full cross-validation partial least squares regression models developed using Raman spectral data for the prediction of bull beef sensory attributes

Data in Brief ◽ 10.1016/j.dib.2018.04.056 ◽ 2018 ◽ Vol 19 ◽ pp. 1355-1360 ◽ Cited By ~ 3

Author(s): Ming Zhao ◽ Yingqun Nian ◽ Paul Allen ◽ Gerard Downey ◽ Joseph P. Kerry ◽ ...

Keyword(s): Least Squares ◽ Spectral Data ◽ Partial Least Squares ◽ Regression Models ◽ Partial Least Squares Regression ◽ Cross Validation ◽ Sensory Attributes ◽ Least Squares Regression ◽ Raman Spectral Data ◽ Raman Spectral


Particles Counting in Intracellular Images by Partial Least Squares Regression and HLAC Feature between Multiple Features

IEEJ Transactions on Electronics Information and Systems ◽ 10.1541/ieejeiss.135.236 ◽ 2015 ◽ Vol 135 (2) ◽ pp. 236-243

Author(s): Shohei Kumagai ◽ Kazuhiro Hotta

Keyword(s): Least Squares ◽ Partial Least Squares ◽ Partial Least Squares Regression ◽ Least Squares Regression ◽ Multiple Features


Use of reflectance spectroscopy to estimate the organic carbon and CaCO3 contents of soils

Agrokémia és Talajtan ◽ 10.1556/agrokem.60.2012.2.5 ◽ 2012 ◽ Vol 61 (2) ◽ pp. 277-290 ◽ Cited By ~ 1

Author(s): Ádám Csorba ◽ Vince Láng ◽ László Fenyvesi ◽ Erika Michéli

Keyword(s): Organic Carbon ◽ Least Squares ◽ Partial Least Squares ◽ Partial Least Squares Regression ◽ Mean Squared Error ◽ Reflectance Spectroscopy ◽ Least Squares Regression ◽ Root Mean Squared Error ◽ Squared Error

There is a growing demand today for the development and application of technologies and methods that enable fast, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum, meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the principles of reflectance spectroscopy, aimed at determining the composition of soils. We built and tested predictive models based on multivariate mathematical-statistical methods (partial least squares regression, PLSR) for estimating the organic carbon and CaCO3 contents of soils. Testing showed that the procedure gave high R2 values for both soil parameters [R2 (organic carbon) = 0.815; R2 (CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE (organic carbon) = 0.467; RMSE (CaCO3) = 3.508], and could be improved significantly by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that combining reflectance spectroscopy with multivariate chemometric procedures can yield a fast and cost-effective method of data collection and evaluation.
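
For reference, the two accuracy measures reported in this abstract, R2 and RMSE, can be computed as in the sketch below. R2 is taken here as the coefficient of determination 1 - SS_res / SS_tot, which is an assumption: some chemometrics software reports the squared correlation instead. The function names and sample numbers are illustrative and are not the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measured vs. predicted values, in the units of the response.
    measured = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 4.0]
    print(rmse(measured, predicted), r_squared(measured, predicted))
```

RMSE is expressed in the units of the response, so values for two different soil parameters are not directly comparable with each other, whereas R2 is unitless.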


Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽ 10.2478/aoa-2013-0055 ◽ 2013 ◽ Vol 38 (4) ◽ pp. 465-470 ◽ Cited By ~ 11

Author(s): Jingjie Yan ◽ Xiaolan Wang ◽ Weiyi Gu ◽ LiLi Ma

Keyword(s): Dimensionality Reduction ◽ Emotion Recognition ◽ Least Squares ◽ Partial Least Squares ◽ Partial Least Squares Regression ◽ Speech Emotion Recognition ◽ Least Squares Regression ◽ Computer Science Pedagogy ◽ Reduction Methods ◽ Analysis Computer

Speech emotion recognition is considered a meaningful yet challenging problem in a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use the SPLSR method to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With SPLSR, the coefficients of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed on to the subsequent classification step. Experiments on the Berlin database show that the recognition rate of the SPLSR method reaches up to 79.23% and is superior to that of the other dimensionality reduction methods compared.
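
One common way for sparse PLS variants to shrink the weights of uninformative features to exactly zero is soft-thresholding of the weight vector. The sketch below illustrates that operation only; whether this paper uses this exact penalty is an assumption, and the function names, the threshold, and the weights are hypothetical.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights within lam become 0."""
    out = []
    for w in weights:
        shrunk = max(abs(w) - lam, 0.0)
        out.append(shrunk if w >= 0 else -shrunk)
    return out


def selected_features(weights, lam):
    """Indices of features whose weight survives the thresholding."""
    return [i for i, w in enumerate(soft_threshold(weights, lam)) if w != 0.0]


if __name__ == "__main__":
    # Hypothetical PLS weights for three speech features.
    weights = [0.5, -0.2, 0.05]
    print(soft_threshold(weights, 0.1))
    print(selected_features(weights, 0.1))
```

Features whose thresholded weight is zero drop out of the component, so feature selection and dimensionality reduction happen in the same step.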
