Sparse Logistic Regression: Comparison of Regularization and Bayesian Implementations

Algorithms, 2020, Vol. 13(6), p. 137
Author(s): Mattia Zanon, Giuliano Zambonin, Gian Antonio Susto, Seán McLoone

In knowledge-based systems, besides obtaining good output prediction accuracy, it is crucial to understand the subset of input variables that have the most influence on the output, with the goal of gaining deeper insight into the underlying process. These requirements call for logistic model estimation techniques that provide a sparse solution, i.e., where coefficients associated with non-important variables are set to zero. In this work we compare the performance of two methods: the first is based on the well-known Least Absolute Shrinkage and Selection Operator (LASSO), which involves regularization with an ℓ1 norm; the second is the Relevance Vector Machine (RVM), which is based on a Bayesian implementation of the linear logistic model. The two methods are extensively compared in this paper on real and simulated datasets. Results show that, in general, the two approaches are comparable in terms of prediction performance. RVM outperforms the LASSO both in terms of structure recovery (estimation of the correct non-zero model coefficients) and prediction accuracy as the dimensionality of the data increases. However, LASSO shows performance comparable to RVM when the dimensionality of the data is much higher than the number of samples, i.e., p ≫ n.
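A minimal sketch of the LASSO side of this comparison, using scikit-learn on synthetic data with a known sparse support (an illustrative assumption, not the paper's datasets); the RVM counterpart is not part of scikit-learn and would require a third-party implementation:

```python
# Sketch: L1-penalized (LASSO-style) sparse logistic regression on synthetic
# data whose true support is known, illustrating the "structure recovery" and
# prediction-accuracy comparison described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 50 features, only the first 5 informative (true non-zero coefficients)
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_logit.fit(X_tr, y_tr)

selected = np.flatnonzero(lasso_logit.coef_[0])
print("non-zero coefficients:", selected)                 # structure recovery
print("test accuracy:", lasso_logit.score(X_te, y_te))    # prediction accuracy
```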

Author(s): J. Jagan, Prabhakar Gundlapalli, Pijush Samui

The determination of liquefaction susceptibility of soil is a task of paramount importance in geotechnical earthquake engineering. This chapter adopts the Support Vector Machine (SVM), Relevance Vector Machine (RVM) and Least Square Support Vector Machine (LSSVM) for the determination of liquefaction susceptibility based on Cone Penetration Test (CPT) data from the Chi-Chi earthquake. The input variables of SVM, RVM and LSSVM are Cone Resistance (qc) and Peak Ground Acceleration (amax/g). SVM, RVM and LSSVM have been used as classification tools. The developed SVM, RVM and LSSVM models give equations for the determination of liquefaction susceptibility of soil. A comparison between the developed models has been carried out. The results show that SVM, RVM and LSSVM are robust models for the determination of liquefaction susceptibility of soil.
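A hedged illustration of the SVM branch of this setup with the two stated inputs, qc and amax/g; the CPT records below are invented placeholders rather than the Chi-Chi data, and RVM/LSSVM would need separate libraries:

```python
# Illustrative sketch only: an SVM classifier with the two inputs named in the
# chapter, cone resistance qc and peak ground acceleration amax/g.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# columns: [qc (MPa), amax/g]; labels: 1 = liquefied, 0 = not liquefied (made up)
X = np.array([[2.1, 0.40], [3.5, 0.35], [9.8, 0.20], [12.4, 0.25],
              [1.8, 0.45], [7.6, 0.18], [4.2, 0.38], [11.0, 0.30]])
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
model.fit(X, y)
print(model.predict([[5.0, 0.33]]))  # susceptibility of a new CPT record
```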


Author(s): Pascalis Kadaro Matthew, Abubakar Yahaya

A few decades ago, penalized regression techniques for linear regression were developed specifically to reduce the flaws inherent in the prediction accuracy of the classical ordinary least squares (OLS) regression technique. In this paper, we used a diabetes data set obtained from previous literature to compare three of these well-known techniques, namely: the Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net and Correlation Adjusted Elastic Net (CAEN). After thorough analysis, it was observed that CAEN generated a less complex model.
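A rough sketch of how such a LASSO vs. Elastic Net comparison might be run in scikit-learn, using its bundled diabetes data as a stand-in for the paper's dataset (CAEN has no scikit-learn implementation and is omitted):

```python
# Sketch: compare model complexity (number of retained predictors) of LASSO
# and Elastic Net fitted with cross-validated regularization strengths.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV, ElasticNetCV

X, y = load_diabetes(return_X_y=True)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5, random_state=0).fit(X, y)

print("LASSO non-zero coefficients:      ", np.count_nonzero(lasso.coef_))
print("Elastic Net non-zero coefficients:", np.count_nonzero(enet.coef_))
```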


Water, 2019, Vol. 11(6), p. 1226
Author(s): Mohammed Falah Allawi, Faridah Binti Othman, Haitham Abdulmohsin Afan, Ali Najah Ahmed, Md. Shabbir Hossain, ...

The current study explored the impact of climatic conditions on predicting evaporation from a reservoir. Several models have been developed for evaporation prediction under different scenarios, with artificial intelligence (AI) methods being the most popular. However, the existing models rely on several climatic parameters as inputs to achieve an acceptable accuracy level, some of which have been unavailable in certain case studies. In addition, the existing AI-based models for evaporation prediction have paid little attention to the influence of the time-increment rate on the prediction accuracy level. This study investigated the ability of the radial basis function neural network (RBF-NN) and support vector regression (SVR) methods to develop an evaporation rate prediction model for a tropical area at the Layang Reservoir, Johor River, Malaysia. Two scenarios for the input architecture were explored in order to examine the effectiveness of different input variable patterns on the model prediction accuracy. In the first scenario, the input architecture considered only the historical evaporation rate time series, while the mean temperature and evaporation rate were used as input variables in the second scenario. For both scenarios, three time-increment series (daily, weekly, and monthly) were considered.
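A hedged sketch of the first input scenario, predicting evaporation from lagged values of its own series with SVR; the series, lag count, and hyperparameters are illustrative assumptions rather than the paper's configuration:

```python
# Sketch: build lagged inputs from an evaporation time series and fit an SVR
# model for one-step-ahead prediction.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def make_lagged(series, n_lags=3):
    """Build (X, y) pairs where X holds the previous n_lags observations."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

evaporation = np.sin(np.linspace(0, 20, 300)) + 3.0   # placeholder daily series
X, y = make_lagged(evaporation, n_lags=3)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
svr.fit(X[:-30], y[:-30])                # train on all but the last 30 steps
print(svr.predict(X[-30:])[:5])          # one-step-ahead predictions
```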


2020
Author(s): Xuan Liu, Sara J.C. Gosline, Lance T. Pflieger, Pierre Wallet, Archana Iyer, ...

Single-cell RNA sequencing (scRNA-Seq) is an emerging strategy for characterizing the immune cell population in diverse environments including blood, tumor or healthy tissues. While this has traditionally been done with flow or mass cytometry targeting protein expression, scRNA-Seq has several established and potential advantages: it can profile immune cells and non-immune cells (e.g. cancer cells) in the same sample, identify cell types that lack precise markers for flow cytometry, and identify a potentially larger number of immune cell types and activation states than is achievable in a single flow assay. However, scRNA-Seq is currently limited by the need to identify the type of each immune cell from its transcriptional profile, which is not only time-consuming but also requires significant knowledge of immunology. While recently developed algorithms accurately annotate coarse cell types (e.g. T cells vs. macrophages), making fine distinctions has proven to be a difficult challenge. To address this, we developed a machine learning classifier called ImmClassifier that leverages a hierarchical ontology of cell types. We demonstrate that ImmClassifier outperforms other tools (+20% recall, +14% precision) in distinguishing fine-grained cell types (e.g. CD8+ effector memory T cells) with comparable performance on coarse ones. Thus, ImmClassifier can be used to explore more deeply the heterogeneity of the immune system in scRNA-Seq experiments.
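A toy sketch, not ImmClassifier's actual algorithm, of the general idea of reconciling fine-grained predictions with a cell-type hierarchy: fine-label probabilities are rolled up to their coarse parents, so a confident coarse call survives an ambiguous fine call (the hierarchy and scores below are invented):

```python
# Roll per-cell probabilities over fine-grained labels up to coarse parents
# defined by a small cell-type ontology.
import numpy as np

hierarchy = {                       # fine label -> coarse parent (illustrative)
    "CD8+ effector memory T": "T cell",
    "CD8+ naive T": "T cell",
    "CD4+ naive T": "T cell",
    "classical monocyte": "macrophage/monocyte",
}
fine_labels = list(hierarchy)
fine_probs = np.array([0.35, 0.30, 0.20, 0.15])   # one cell's fine-grained scores

coarse_scores = {}
for label, p in zip(fine_labels, fine_probs):
    parent = hierarchy[label]
    coarse_scores[parent] = coarse_scores.get(parent, 0.0) + p

print(max(coarse_scores, key=coarse_scores.get), coarse_scores)
```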


Energies, 2019, Vol. 12(15), p. 2860
Author(s): Jee-Heon Kim, Nam-Chul Seong, Wonchang Choi

This study was conducted to develop an energy consumption model of a chiller in a heating, ventilation, and air conditioning system using a machine learning algorithm based on artificial neural networks. The proposed chiller energy consumption model was evaluated for accuracy with respect to the number of input variables, the amount (proportion) of training data, and the number of neurons. A standardized reference building was also modeled to generate operational data for the chiller system during extended cooling periods (warm weather months). The prediction accuracy of the chiller's energy consumption was improved by increasing the number of input variables and adjusting the proportion of training data. By contrast, the effect of the number of neurons on the prediction accuracy was insignificant. The developed chiller model was able to predict energy consumption with 99.07% accuracy based on eight input variables, 60% training data, and 12 neurons.
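A hedged sketch of the reported configuration (eight input variables, 60% of the data for training, 12 neurons) using scikit-learn's MLPRegressor on synthetic operational data in place of the reference-building simulation:

```python
# Sketch: small feed-forward neural network with one 12-neuron hidden layer,
# trained on a 60/40 split of synthetic stand-in data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))                              # 8 input variables
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=2000)    # stand-in energy use

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(12,), max_iter=2000, random_state=0),
)
model.fit(X_tr, y_tr)
print("R^2 on held-out data:", model.score(X_te, y_te))
```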


2019, Vol. 44(4), pp. 473-503
Author(s): Peida Zhan, Hong Jiao, Kaiwen Man, Lijun Wang

In this article, we systematically introduce the Just Another Gibbs Sampler (JAGS) software program to fit common Bayesian cognitive diagnosis models (CDMs), including the deterministic inputs, noisy "and" gate (DINA) model; the deterministic inputs, noisy "or" gate (DINO) model; the linear logistic model; the reduced reparameterized unified model; and the log-linear CDM (LCDM). Further, we introduce the unstructured latent structural model and the higher-order latent structural model. We also show how to extend these models to consider polytomous attributes, the testlet effect, and longitudinal diagnosis. Finally, we present an empirical example as a tutorial to illustrate how to use JAGS code in R.


2016, Vol. 25(10), pp. 1825-1833
Author(s): Ji-Yong An, Fan-Rong Meng, Zhu-Hong You, Xing Chen, Gui-Ying Yan, ...

2019, Vol. 06(03), pp. 363-376
Author(s): Gharbi Alshammari, Stelios Kapetanakis, Abdullah Alshammari, Nikolaos Polatidis, Miltos Petridis

Recommender systems help users find relevant items efficiently based on their interests and historical interactions with other users. They are beneficial to businesses by promoting the sale of products and to users by reducing the search burden. Recommender systems can be developed using different approaches, including collaborative filtering (CF), demographic filtering (DF), content-based filtering (CBF) and knowledge-based filtering (KBF). However, large amounts of data can produce recommendations that are limited in accuracy because of diversity and sparsity issues. In this paper, we propose a novel hybrid method that combines user–user CF with the attributes of DF to identify the nearest users, and we compare four classifiers against each other. This method was developed through an investigation of ways to reduce the errors in rating predictions based on users' past interactions, which leads to improved prediction accuracy in all four classification algorithms. We applied a feature combination method that improves prediction accuracy. To test our approach, we ran an offline evaluation using the 1M MovieLens dataset, well-known evaluation metrics and comparisons between methods, with the results validating our proposed method.
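A simplified sketch of the hybrid idea, blending rating-based user–user similarity with a demographic similarity to select the nearest users; the ratings, demographic codes, and blending weight are illustrative assumptions, not the paper's method:

```python
# Blend CF (rating) similarity with demographic similarity, pick the nearest
# neighbours, and predict a missing rating from their ratings.
import numpy as np

ratings = np.array([               # rows = users, cols = items, 0 = unrated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
demographics = np.array([          # e.g. [age bucket, gender code, occupation code]
    [2, 0, 1],
    [2, 0, 1],
    [5, 1, 3],
    [4, 1, 3],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def hybrid_similarity(u, v, alpha=0.7):
    """Weighted blend of rating-based CF similarity and demographic similarity."""
    return (alpha * cosine(ratings[u], ratings[v])
            + (1 - alpha) * cosine(demographics[u], demographics[v]))

target, item = 1, 1
neighbours = sorted((v for v in range(len(ratings)) if v != target),
                    key=lambda v: hybrid_similarity(target, v), reverse=True)[:2]
rated = [v for v in neighbours if ratings[v, item] > 0]
print(np.mean([ratings[v, item] for v in rated]) if rated else "no neighbour rated item")
```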

