Iterative Collaborative Filtering for Sparse Matrix Estimation

2021 ◽  
Author(s):  
Christian Borgs ◽  
Jennifer T. Chayes ◽  
Devavrat Shah ◽  
Christina Lee Yu

Matrix estimation or completion has served as a canonical mathematical model for recommendation systems. More recently, it has emerged as a fundamental building block for data analysis, as a first step to denoise observations and predict missing values. Since the dawn of e-commerce, similarity-based collaborative filtering has been used as a heuristic for matrix estimation. At its core, it encodes typical human behavior: you ask your friends to recommend what you may like or dislike. Algorithmically, friends are similar “rows” or “columns” of the underlying matrix. The traditional heuristic for computing similarities between rows has costly requirements on the density of observed entries. In “Iterative Collaborative Filtering for Sparse Matrix Estimation” by Christian Borgs, Jennifer T. Chayes, Devavrat Shah, and Christina Lee Yu, the authors introduce an algorithm that computes similarities in sparse datasets by comparing expanded local neighborhoods in the associated data graph: in effect, you ask friends of your friends to recommend what you may like or dislike. This work provides bounds on the maximum entry-wise error of the estimate for low-rank and approximately low-rank matrices, which is stronger than the aggregate mean squared error bounds found in classical works. The algorithm is also interpretable, scalable, and amenable to distributed implementation.
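The neighborhood-expansion idea can be illustrated with a minimal sketch (toy ratings and assumed names, not the authors' exact algorithm): two users are compared not on their few directly shared items, but on the rating profiles reachable in two hops through the bipartite user-item graph, i.e., via friends of friends.

```python
# Hedged sketch: compare expanded 2-hop neighborhoods in the user-item data
# graph instead of directly overlapping entries. Toy data for illustration.
from collections import defaultdict

ratings = {  # (user, item) -> rating; deliberately sparse
    ("u1", "a"): 5, ("u1", "b"): 3,
    ("u2", "b"): 3, ("u2", "c"): 1,
    ("u3", "a"): 5, ("u3", "c"): 4,
}

user_items = defaultdict(dict)
item_users = defaultdict(dict)
for (u, i), r in ratings.items():
    user_items[u][i] = r
    item_users[i][u] = r

def two_hop_profile(user):
    """Average ratings reachable in two hops: user -> item -> other user -> item."""
    profile = defaultdict(list)
    for item in user_items[user]:
        for other in item_users[item]:
            if other == user:
                continue
            for i2, r2 in user_items[other].items():
                profile[i2].append(r2)
    return {i: sum(rs) / len(rs) for i, rs in profile.items()}

def profile_distance(u, v):
    """Mean squared difference over items covered by both expanded profiles."""
    pu, pv = two_hop_profile(u), two_hop_profile(v)
    common = set(pu) & set(pv)
    return sum((pu[i] - pv[i]) ** 2 for i in common) / len(common)
```

Even when two rows share few or no directly observed columns, their expanded neighborhoods usually overlap, which is what makes the distance computable in the sparse regime.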

2021 ◽  
pp. 202-208
Author(s):  
Daniel Theodorus ◽  
Sarjon Defit ◽  
Gunadi Widi Nurcahyo

Industry 4.0 is driving many companies to transform to digital systems. Machine learning is one solution for data analysis, and data analysis has become a key point in delivering the best service (user experience) to customers. The site studied in this research is PT. Sentral Tukang Indonesia, a company selling building materials and carpentry tools such as paint, plywood, aluminium, ceramics, and HPL. With the large amount of data available, the company has difficulty giving product recommendations to customers. Recommendation systems have emerged as a solution, giving product recommendations based on the interactions between customers recorded in the sales history data. The aims of this research are to help the company give product recommendations so as to increase sales, to make it easier for customers to find the products they need, and to improve service to customers. The data used are the sales history for one period (Q1 2021), customer data, and product data from PT. Sentral Tukang Indonesia. The sales history data are split into 80% for the training dataset and 20% for the testing dataset. The item-based collaborative filtering method in this research uses the cosine similarity algorithm to compute the degree of similarity between products. Score prediction uses the weighted sum formula, and the error rate is computed with the root mean squared error formula. The results of this research show the top 10 product recommendations per customer, where the products shown are those with the highest scores for that customer. This research can serve as a reference for companies in recommending the products customers need.
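The pipeline described above, cosine similarity between item rating vectors, weighted-sum score prediction, and RMSE evaluation, can be sketched as follows (toy ratings, not the company's data):

```python
# Hedged sketch of item-based collaborative filtering: cosine similarity
# between item columns, weighted-sum prediction, RMSE for evaluation.
import math

# rows = customers, columns = items; 0 means "no purchase recorded"
R = [
    [5, 3, 0],
    [4, 0, 4],
    [0, 2, 5],
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def item_column(j):
    return [row[j] for row in R]

def predict(u, j):
    """Weighted sum: user u's known ratings, weighted by item-item similarity."""
    num = den = 0.0
    for k, r in enumerate(R[u]):
        if k == j or r == 0:
            continue
        s = cosine(item_column(j), item_column(k))
        num += s * r
        den += abs(s)
    return num / den if den else 0.0

def rmse(pairs):
    """Root mean squared error over (predicted, true) pairs."""
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))
```

Ranking each customer's unpurchased items by `predict` and keeping the ten highest scores yields the top-10 recommendation list the abstract describes.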


Methodology ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. 189-204
Author(s):  
Cailey E. Fitzgerald ◽  
Ryne Estabrook ◽  
Daniel P. Martin ◽  
Andreas M. Brandmaier ◽  
Timo von Oertzen

Missing data are ubiquitous in psychological research. They may come about as an unwanted result of coding or computer error, participants' non-response or absence, or missing values may be intentional, as in planned missing designs. We discuss the effects of missing data on χ²-based goodness-of-fit indices in Structural Equation Modeling (SEM), specifically on the Root Mean Squared Error of Approximation (RMSEA). We use simulations to show that naive implementations of the RMSEA have a downward bias in the presence of missing data and, thus, overestimate model goodness-of-fit. Unfortunately, many state-of-the-art software packages report the biased form of RMSEA. As a consequence, the scientific community may have been accepting a much larger fraction of models with non-acceptable model fit. We propose a bias-correction for the RMSEA based on information-theoretic considerations that take into account the expected misfit of a person with fully observed data. The corrected RMSEA is asymptotically independent of the proportion of missing data for misspecified models. Importantly, results of the corrected RMSEA computation are identical to naive RMSEA if there are no missing data.
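For reference, the naive RMSEA point estimate that the simulations start from has the standard closed form RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))). A minimal sketch of that formula (the authors' bias correction itself is not reproduced here):

```python
# Hedged sketch of the standard (naive) sample RMSEA formula; this is the
# quantity the abstract says is downward-biased under missing data.
import math

def naive_rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

When χ² does not exceed its degrees of freedom, the estimate is truncated at zero, which is why well-fitting models report RMSEA = 0.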


Author(s):  
Yi Tay ◽  
Shuai Zhang ◽  
Anh Tuan Luu ◽  
Siu Cheung Hui ◽  
Lina Yao ◽  
...  

Factorization Machines (FMs) are a class of popular algorithms that have been widely adopted for collaborative filtering and recommendation tasks. FMs are characterized by their use of the inner product of factorized parameters to model pairwise feature interactions, making them highly expressive and powerful. This paper proposes Holographic Factorization Machines (HFM), a novel method for enhancing the representation capability of FMs without increasing their parameter size. Our approach replaces the inner product in FMs with holographic reduced representations (HRRs), which are theoretically motivated by associative retrieval and compressed outer products. Empirically, we found that this leads to consistent improvements over vanilla FMs of up to 4% in mean squared error, with gains larger at smaller parameterizations. Additionally, we propose a neural adaptation of HFM which enhances its capability to handle nonlinear structures. We conduct extensive experiments on nine publicly available datasets for collaborative filtering with explicit feedback. HFM achieves state-of-the-art performance on all nine, outperforming strong competitors such as Attentional Factorization Machines (AFM) and Neural Matrix Factorization (NeuMF).
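The two core HRR operations, circular convolution (binding) and circular correlation (the compressed-outer-product interaction HFM substitutes for the FM inner product), can be sketched in a few lines. This is a plain O(d²) version for clarity; FFT-based computation is the usual efficient route, and this is not the paper's implementation.

```python
# Hedged sketch of holographic reduced representation (HRR) operations.
def circular_convolution(a, b):
    """Binding: [a * b]_i = sum_k a_k * b_{(i - k) mod d}."""
    d = len(a)
    return [sum(a[k] * b[(i - k) % d] for k in range(d)) for i in range(d)]

def circular_correlation(a, b):
    """Associative retrieval: [a # b]_i = sum_k a_k * b_{(k + i) mod d}."""
    d = len(a)
    return [sum(a[k] * b[(k + i) % d] for k in range(d)) for i in range(d)]
```

Unlike the inner product, which collapses two d-dimensional factor vectors to a scalar, circular correlation returns a d-dimensional vector, so each pairwise interaction carries more information at the same parameter count.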


Author(s):  
Chisimkwuo John ◽  
Emmanuel J. Ekpenyong ◽  
Charles C. Nworu

This study assessed five PCA-based approaches for imputing missing values: Singular Value Decomposition imputation (svdPCA), Bayesian imputation (bPCA), probabilistic imputation (pPCA), Non-linear Iterative Partial Least Squares imputation (nipalsPCA), and Local Least Squares imputation (llsPCA). Missing data at rates of 5%, 10%, 15%, and 20% were created under a missing completely at random (MCAR) assumption in five variables (Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1), and Private Sector Demand Deposits (PSDD)) from Nigeria's quarterly monetary aggregates dataset for 1981 to 2019, using R software. The data were collected from the Central Bank of Nigeria statistical bulletin. The five imputation methods were used to estimate the artificially generated missing values, and the performance of the PCA imputation approaches was evaluated with the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE), and Normalized Root Mean Squared Error (NRMSE) criteria. The results suggest that the bPCA, llsPCA, and pPCA methods performed better than the other imputation methods, with bPCA being the more appropriate method and llsPCA the best overall, as it appears more stable than the others across the proportions of missingness.
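The RMSE and NRMSE criteria used for evaluation can be sketched as follows. Normalizing by the range of the true values is one common NRMSE convention; the paper may use a different normalizer (e.g., the standard deviation or mean).

```python
# Hedged sketch of the error criteria: RMSE between true and imputed values,
# and a range-normalized NRMSE (one of several normalization conventions).
import math

def rmse(true, imputed):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, imputed)) / len(true))

def nrmse(true, imputed):
    return rmse(true, imputed) / (max(true) - min(true))
```

Normalization makes the error comparable across variables measured on very different scales, which matters for monetary aggregates of different magnitudes.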


2020 ◽  
Vol 10 (10) ◽  
pp. 3395 ◽  
Author(s):  
Yeonbin Son ◽  
Yerim Choi

As language editing has become an essential step in enhancing the quality of a research manuscript, several companies now provide manuscript editing services. In such companies, a manuscript submitted for proofreading is matched with an editing expert through a manual process, which is costly and often subjective. The major drawback of the manual process is that it is almost impossible to consider the inherent characteristics of a manuscript, such as writing style and paragraph composition. To this end, we propose an expert recommendation method for manuscript editing services based on matrix factorization, a well-known collaborative filtering approach for learning latent information in ordinal ratings given by users. Specifically, binary ratings are substituted for ordinal ratings when users express negative opinions, since negative opinions are expressed more accurately by binary ratings than by ordinal ratings. In experiments on a real-world dataset, the proposed method outperformed the compared methods with an RMSE (root mean squared error) of 0.1. Moreover, the effectiveness of substituting binary ratings for ordinal ratings was validated by conducting sentiment analysis on the text reviews.
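The underlying machinery, matrix factorization fit by stochastic gradient descent, can be sketched on toy binary-coded ratings (assumed data and hyperparameters, not the paper's model or dataset):

```python
# Hedged sketch: rank-2 matrix factorization trained by SGD on toy
# (user, expert, rating) triples, with ratings already coded as 0/1.
import random

random.seed(0)
ratings = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0),
           (1, 2, 1.0), (2, 1, 0.0), (2, 2, 1.0)]
n_users, n_items, k = 3, 3, 2

P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    return sum(P[u][f] * Q[i][f] for f in range(k))

def sse():
    return sum((r - predict(u, i)) ** 2 for u, i, r in ratings)

before = sse()
lr, reg = 0.1, 0.01
for _ in range(200):
    for u, i, r in ratings:
        e = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (e * qi - reg * pu)  # gradient step on user factor
            Q[i][f] += lr * (e * pu - reg * qi)  # gradient step on item factor
after = sse()
```

Predictions for unobserved (manuscript, expert) pairs then come from the learned latent factors, which is what allows matching without any manual review.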


2020 ◽  
Vol 10 (16) ◽  
pp. 5510 ◽  
Author(s):  
Diana Ferreira ◽  
Sofia Silva ◽  
António Abelha ◽  
José Machado

The magnitude of the daily explosion of high volumes of data has led to the emergence of the Big Data paradigm. The ever-increasing amount of information available on the Internet makes it increasingly difficult for individuals to find what they need quickly and easily. Recommendation systems have appeared as a solution to this problem. Collaborative filtering is widely used in such systems, but high dimensionality and data sparsity remain major problems. With deep learning gaining importance, several works have emerged to improve this type of filtering. In this article, a product recommendation system is proposed in which an autoencoder-based collaborative filtering method is employed. A comparison of this model with Singular Value Decomposition is made and presented in the results section. Our experiments show very low Root Mean Squared Error (RMSE) values, indicating that the recommendations presented to the users are in line with their interests and are not affected by the data sparsity problem, even though the datasets are very sparse (sparsity of 0.996). The results are quite promising, achieving an RMSE of 0.029 on the first dataset and 0.010 on the second.


2018 ◽  
Vol 28 (5) ◽  
pp. 1311-1327 ◽  
Author(s):  
Faisal M Zahid ◽  
Christian Heumann

Missing data are a common issue that can cause problems in estimation and inference in biomedical, epidemiological and social research. Multiple imputation is an increasingly popular approach for handling missing data. In the case of a large number of covariates with missing data, existing multiple imputation software packages may not work properly and often produce errors. We propose a multiple imputation algorithm called mispr based on sequential penalized regression models. Each variable with missing values is assumed to have a different distributional form and is imputed with its own imputation model using the ridge penalty. When the number of predictors is large relative to the sample size, the use of a quadratic penalty guarantees unique estimates for the parameters and leads to better predictions than the usual Maximum Likelihood Estimation (MLE), with a good compromise between bias and variance. As a result, the proposed algorithm performs well and provides better imputed values even for a large number of covariates with small samples. The results are compared with the existing software packages mice, VIM and Amelia in simulation studies. The missing at random mechanism was the main assumption in the simulation study. The imputation performance of the proposed algorithm is evaluated with mean squared imputation error and mean absolute imputation error. The mean squared error ([Formula: see text]), parameter estimates with their standard errors and confidence intervals are also computed to compare performance in the regression context. The proposed algorithm is observed to be a good competitor to the existing algorithms, with smaller mean squared imputation error, mean absolute imputation error and mean squared error. The algorithm's performance becomes considerably better than that of the existing algorithms with an increasing number of covariates, especially when the number of predictors is close to or even greater than the sample size.
Two real-life datasets are also used to examine the performance of the proposed algorithm using simulations.
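The core idea, a quadratic (ridge) penalty stabilizing the imputation regression, can be illustrated in the one-predictor case, where the ridge slope has a simple closed form. This is a hedged one-variable sketch, not the mispr algorithm itself.

```python
# Hedged sketch: impute a missing value via ridge-penalized regression on one
# observed covariate. With one predictor, the ridge slope is Sxy / (Sxx + lam).
def ridge_impute(x_obs, y_obs, x_missing, lam=1.0):
    xm = sum(x_obs) / len(x_obs)
    ym = sum(y_obs) / len(y_obs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(x_obs, y_obs))
    sxx = sum((x - xm) ** 2 for x in x_obs)
    b = sxy / (sxx + lam)   # quadratic penalty shrinks the slope toward 0
    a = ym - b * xm
    return a + b * x_missing
```

With lam = 0 this reduces to ordinary least squares; a positive lam keeps the estimate unique and finite even when `sxx` is tiny, which is the same stabilizing role the penalty plays when predictors outnumber observations.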


2018 ◽  
Vol 291 ◽  
pp. 71-83 ◽  
Author(s):  
Xixi Jia ◽  
Xiangchu Feng ◽  
Weiwei Wang ◽  
Chen Xu ◽  
Lei Zhang

2017 ◽  
Vol 10 (04) ◽  
pp. 773-779
Author(s):  
V.B. Kamble ◽  
S.N. Deshmukh

The presence of missing values in a dataset makes data analysis in data mining tasks difficult. In this research work, a student dataset containing marks in four subjects at an engineering college is used. Mean, mode, and median imputation were applied to deal with the challenges of incomplete data. The Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were computed on the dataset for the proposed method and for the simple imputation methods (mean, mode, and median imputation), and accuracy was also measured for the proposed method combined with the imputation techniques. Experimental observation showed that MSE and RMSE gradually decrease as the size of the database increases when the proposed method is used, whereas MSE and RMSE gradually increase with database size under the simple imputation techniques. Accuracy also increases with the size of the database.
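The mean, mode, and median imputation baselines discussed above can be sketched as follows (toy marks data and assumed names, not the study's dataset):

```python
# Hedged sketch of simple single-value imputation: fill missing marks with the
# mean, median, or mode of the observed values in the same column.
from statistics import mean, median, mode

def impute(values, method):
    """Replace None entries with the chosen statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[method](observed)
    return [fill if v is None else v for v in values]
```

Because every gap in a column receives the same fill value, these baselines flatten the variability of the data, which is consistent with their error growing on larger databases in the study's observations.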

