HB-PLS: An algorithm for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression

Gene expression data feature high dimensionality, multicollinearity, and outliers or non-Gaussian noise, which make it difficult to identify the true regulatory genes controlling a biological process or pathway. In this study, we embedded Huber-Berhu (HB) regression into the partial least squares (PLS) framework and created a new method called HB-PLS for predicting biological process or pathway regulators through construction of regulatory networks. PLS is an alternative to ordinary least squares (OLS) for handling multicollinearity in high-dimensional data. The Huber loss is more robust to outliers than the squared loss, and the Berhu penalty strikes a better balance between the ℓ2 penalty and the ℓ1 penalty. HB-PLS therefore inherits the advantages of the Huber loss, the Berhu penalty, and PLS. To solve the Huber-Berhu regression, a fast proximal gradient descent method was developed; it runs much faster than CVX, a Matlab-based modeling system for convex optimization. Applying HB-PLS to real transcriptomic data from Arabidopsis and maize led to the identification of many pathway regulators that had previously been identified experimentally. In terms of its efficiency in identifying positive biological process or pathway regulators, HB-PLS is comparable to sparse partial least squares (SPLS), a very efficient method developed for variable selection and dimension reduction in handling multicollinearity in high-dimensional genomic data. However, HB-PLS is able to identify some distinct regulators and, in one case, identifies more positive regulators at the top of the output list, which can reduce the burden of experimentally testing the identified candidate targets. Our study suggests that HB-PLS is instrumental for identifying biological process and pathway genes.
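
To make the two ingredients named in the abstract concrete, here is a minimal sketch of the Huber loss and the Berhu (reverse Huber) penalty in plain Python. The function names and the threshold parameter `delta` are illustrative assumptions, and the scaling conventions (a factor of 1/2 in the Huber loss, Owen-style Berhu) are common textbook choices that may differ from those used in the paper.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones.

    `delta` is an assumed name for the switch-over threshold.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def berhu_penalty(coef, delta=1.0):
    """Berhu penalty: L1-like near zero, quadratic (L2-like) for large values.

    This is the reverse of the Huber shape; both pieces meet at |coef| == delta.
    """
    a = abs(coef)
    if a <= delta:
        return a
    return (a * a + delta * delta) / (2.0 * delta)


if __name__ == "__main__":
    for value in (0.5, 1.0, 3.0):
        print(value, huber_loss(value), berhu_penalty(value))
```

The linear tail of the Huber loss limits the influence of outlying residuals, while the Berhu penalty behaves like ℓ1 for small coefficients and like ℓ2 for large ones.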

A Calibration Tutorial for Spectral Data. Part 1: Data Pretreatment and Principal Component Regression Using Matlab

Journal of Near Infrared Spectroscopy ◽

10.1255/jnirs.93 ◽

1996 ◽

Vol 4 (1) ◽

pp. 225-242 ◽

Cited By ~ 6

Author(s):

Paul Geladi ◽

Harald Martens

Keyword(s):

Least Squares ◽

Spectral Data ◽

Partial Least Squares ◽

Regression Models ◽

Principal Component Regression ◽

Principal Component ◽

Ordinary Least Squares ◽

Least Squares Regression ◽

Statistical Parameters ◽

Calibration Samples

Regression and calibration play an important role in analytical chemistry. All analytical instrumentation depends on a calibration that uses some regression model for a set of calibration samples. The ordinary least squares (OLS) method of building a multivariate linear regression (MLR) model has strict limitations. Therefore, biased or regularised regression models have been introduced. Some selected ones are ridge regression (RR), principal component regression (PCR) and partial least squares regression (PLS or PLSR). Artificial neural networks (ANN) based on back-propagation can also be used as regression models. Understanding regression models requires more than a set of statistical parameters; a deeper understanding of the underlying chemistry and physics is equally important. For spectral data this means that a basic understanding of spectra and their errors is useful and that spectral representation should be included in judging the usefulness of the data treatment. A “constructed” spectrometric example is introduced. It consists of real spectrometric measurements in the range 408–1176 nm for 26 calibration samples and 10 test samples. The main response variable is litmus concentration, but other constituents such as bromocresol green and ZnO are added as interferents, and the pH is also changed. The example is introduced as a tutorial. All calculations are shown in detail in Matlab, which makes them easy for the reader to follow and completely traceable. The raw data are available as a file. In Part 1, the emphasis is on pretreatment of the data and on visualisation at different stages of the calculations. Part 1 ends with principal component regression calculations. Partial least squares calculations and some ANN results are presented in Part 2.

Multivariate Methods Based Soft Measurement for Wine Quality Evaluation

Abstract and Applied Analysis ◽

10.1155/2014/740754 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Shen Yin ◽

Lei Liu ◽

Xin Gao ◽

Hamid Reza Karimi

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Principal Component ◽

Ordinary Least Squares ◽

Multivariate Methods ◽

Least Squares Regression ◽

Wine Quality ◽

Soft Measurement

Soft measurement is a new, developing, and promising industrial technology that is now widely used in industry. It plays a significant role especially where key variables are difficult to measure by traditional methods. In this paper, wine quality is evaluated from the wine's physicochemical indexes using soft measurement based on multivariate methods. The multivariate methods used include ordinary least squares regression (OLSR), principal component regression (PCR), partial least squares regression (PLSR), and modified partial least squares regression (MPLSR). Comparing the performance of the four methods, the MPLSR prediction model gives better results than the others. In general, to determine the quality of a wine, experienced wine tasters are hired to taste it and make a decision. However, since the physicochemical indexes of wine can to some extent reflect its quality, soft measurement based on multivariate statistical methods can help the oenologist in wine evaluation.

Algorithm for Inference of Gene Regulatory Networks Using Partial Least Squares Regression and Mutual Information

Emerging Technologies in Data Mining and Information Security - Lecture Notes in Networks and Systems ◽

10.1007/978-981-15-9774-9_81 ◽

2021 ◽

pp. 891-897

Author(s):

Nimrita Koul ◽

Sunilkumar S. Manvi

Keyword(s):

Mutual Information ◽

Least Squares ◽

Partial Least Squares ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

Gene Regulatory

Robust Nonlinear Partial Least Squares Regression Using the BACON Algorithm

Journal of Applied Mathematics ◽

10.1155/2018/7696302 ◽

2018 ◽

Vol 2018 ◽

pp. 1-5

Author(s):

Abdelmounaim Kerkri ◽

Jelloul Allal ◽

Zoubir Zarrouk

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Latent Variables ◽

Partial Least Squares Regression ◽

Chemical Engineering ◽

Simulated Data ◽

Ordinary Least Squares ◽

Least Squares Regression ◽

Computationally Efficient ◽

The Difference

Partial least squares regression (PLS regression) is used as an alternative to ordinary least squares regression in the presence of multicollinearity, which is common in chemical engineering problems. In addition to the linear form of PLS, there are other versions based on a nonlinear approach, such as quadratic PLS (QPLS2). The difference between QPLS2 and the regular PLS algorithm is the use of quadratic regression instead of OLS regression in the calculation of the latent variables. In this paper we propose a robust version of QPLS2 that overcomes sensitivity to outliers by using the Blocked Adaptive Computationally Efficient Outlier Nominators (BACON) algorithm. Our hybrid method is tested on both real and simulated data.
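
As a rough illustration of the latent-variable step mentioned above, the following sketch computes the first component of linear PLS with a single response (PLS1) in plain Python. The function name `first_pls_component` and the toy data are hypothetical. The inner regression of the response on the score `t` is ordinary least squares here, which is the step the abstract says QPLS2 replaces with a quadratic fit; this is not the authors' code and it includes no robustness to outliers.

```python
def first_pls_component(X, y):
    """Return (w, t, q) for the first PLS1 latent variable.

    X is a list of rows, y a list of responses. Both are mean-centred first.
    Assumes at least one predictor has non-zero covariance with y.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: proportional to the covariance of each predictor with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]

    # Scores: projection of each centred sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]

    # Inner relation: OLS slope of the centred response on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return w, t, q


if __name__ == "__main__":
    # Two perfectly collinear predictors: OLS on X would be ill-posed,
    # but a single latent variable still fits the response.
    X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    y = [2.0, 4.0, 6.0, 8.0]
    w, t, q = first_pls_component(X, y)
    print(w, q)
```

Because the regression is carried out on the single score vector rather than on the collinear columns of X, the inner fit stays well defined even when the predictors are linearly dependent.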

HB-PLS: A statistical method for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression

Forestry Research ◽

10.48130/fr-2021-0006 ◽

2021 ◽

Vol 1 (0) ◽

pp. 1-13

Author(s):

Wenping Deng ◽

Kui Zhang ◽

Cheng He ◽

Sanzhen Liu ◽

...

Keyword(s):

Least Squares ◽

Statistical Method ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Biological Process ◽

Least Squares Regression

Predicting predictive accuracy: performance of fixed-weight decision models compared to ordinary least squares regression

10.32469/10355/44715 ◽

2013 ◽

Author(s):

Nicholas Robert Brown

Keyword(s):

Least Squares ◽

Predictive Accuracy ◽

Ordinary Least Squares ◽

Decision Models ◽

Least Squares Regression ◽

Ordinary Least Squares Regression ◽

Fixed Weight ◽

Accuracy Performance

Particles Counting in Intracellular Images by Partial Least Squares Regression and HLAC Feature between Multiple Features

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.135.236 ◽

2015 ◽

Vol 135 (2) ◽

pp. 236-243

Author(s):

Shohei Kumagai ◽

Kazuhiro Hotta

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

Multiple Features

Use of reflectance spectroscopy to estimate the organic carbon and CaCO3 contents of soils

Agrokémia és Talajtan ◽

10.1556/agrokem.60.2012.2.5 ◽

2012 ◽

Vol 61 (2) ◽

pp. 277-290 ◽

Cited By ~ 1

Author(s):

Ádám Csorba ◽

Vince Láng ◽

László Fenyvesi ◽

Erika Michéli

Keyword(s):

Organic Carbon ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Mean Squared Error ◽

Reflectance Spectroscopy ◽

Least Squares Regression ◽

Root Mean Squared Error ◽

Squared Error

There is a growing demand for the development and application of technologies and methods that enable rapid, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy meets these demands; it is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral “fingerprint” in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the fundamentals of reflectance spectroscopy, aimed at determining soil composition. We built and tested predictive models, based on multivariate mathematical-statistical methods (partial least squares regression, PLSR), for estimating the organic carbon and CaCO3 contents of soils. Testing the models showed that the procedure gave high R² values for both soil parameters [R²(organic carbon) = 0.815; R²(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that combining reflectance spectroscopy with multivariate chemometric procedures yields a rapid and cost-effective method for data collection and evaluation.
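
For readers unfamiliar with the two model-quality figures quoted above, here is a small sketch of how R² and RMSE are commonly computed from observed and predicted values. The function names and the toy numbers are illustrative assumptions; the abstract does not say which definition of R² the authors used (coefficient of determination, as here, or squared correlation).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]   # made-up values, not soil data
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE is expressed in the units of the measured quantity, which is why the values for organic carbon and CaCO3 are not directly comparable with each other.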

Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽

10.2478/aoa-2013-0055 ◽

2013 ◽

Vol 38 (4) ◽

pp. 465-470 ◽

Cited By ~ 11

Author(s):

Jingjie Yan ◽

Xiaolan Wang ◽

Weiyi Gu ◽

LiLi Ma

Keyword(s):

Dimensionality Reduction ◽

Emotion Recognition ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Speech Emotion Recognition ◽

Least Squares Regression ◽

Computer Science Pedagogy ◽

Reduction Methods ◽

Analysis Computer

Speech emotion recognition is regarded as a meaningful and challenging problem in a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use the SPLSR method to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With SPLSR, the components of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed on to the subsequent classification step. A number of tests on the Berlin database show that the recognition rate of the SPLSR method reaches up to 79.23% and is superior to that of the other dimensionality reduction methods compared.
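
A common way to drive weights exactly to zero in sparse PLS variants is to soft-threshold the direction vector. The sketch below assumes that mechanism for illustration only; the abstract does not state the exact SPLSR formulation used, and `soft_threshold`, `lam`, and the example weights are hypothetical.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by `lam`; weights with |w| <= lam become 0."""
    out = []
    for w in weights:
        magnitude = abs(w) - lam
        if magnitude <= 0.0:
            out.append(0.0)          # feature is dropped
        elif w > 0:
            out.append(magnitude)    # positive weight, reduced
        else:
            out.append(-magnitude)   # negative weight, reduced in size
    return out


if __name__ == "__main__":
    direction = [0.9, -0.05, 0.3, -0.6]   # made-up feature weights
    sparse = soft_threshold(direction, lam=0.25)
    selected = [i for i, w in enumerate(sparse) if w != 0.0]
    print(sparse, selected)
```

Features whose weight ends up at zero contribute nothing to the latent component, which is how selection and dimensionality reduction can happen in one step.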

Algorithm and BASIC program for ordinary least-squares regression in two and three dimensions

Open-File Report ◽

10.3133/ofr78876 ◽

1978 ◽

Author(s):

G.R. Olhoeft

Keyword(s):

Least Squares ◽

Ordinary Least Squares ◽

Basic Program ◽

Three Dimensions ◽

Least Squares Regression ◽

Ordinary Least Squares Regression
