An enhanced framework for solving cold start problem in movie recommendation systems

Author(s):  
Salma Adel Elzeheiry ◽  
N. E. Mekky ◽  
A. Atwan ◽  
Noha A. Hikal

Recommendation systems (RSs) are used to obtain advice regarding decision-making. RSs have the shortcoming that a system cannot draw inferences for users or items about which it has not yet gathered sufficient information; this is known as the cold start problem. To alleviate the user cold start problem, the proposed recommendation algorithm combines tag data with logistic regression classification to predict the probability that a new user will like a movie. First, alternating least squares is used to extract movie features; the feature vectors are then reduced by combining principal component analysis with logistic regression to predict the probability of each movie genre. Finally, the most relevant tags, selected by similarity score, are combined with these probabilities to return the top-N highest-scoring movies to the user. The proposed model is assessed using the root mean square error (RMSE), the mean absolute error (MAE), recall@N and precision@N on the 1M, 10M and 20M MovieLens datasets, achieving accuracies of 0.8806, 0.8791 and 0.8739, respectively.
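As a rough illustration of the pipeline this abstract describes (ALS to extract latent movie features, then PCA plus logistic regression over those features to score genres), here is a minimal sketch on synthetic data. The ratings matrix, genre labels and all hyperparameters are invented for illustration and are not the authors'; the tag-similarity re-ranking step of the full framework is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic ratings matrix (users x movies); 0 marks an unobserved entry.
R = rng.integers(0, 6, size=(50, 40)).astype(float)
mask = R > 0

def als(R, mask, k=8, n_iters=10, lam=0.1):
    """Alternating least squares: factor R ~ U @ V.T on observed entries."""
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(n_iters):
        for u in range(n_users):       # solve for each user's factors
            obs = mask[u]
            U[u] = np.linalg.solve(V[obs].T @ V[obs] + reg, V[obs].T @ R[u, obs])
        for i in range(n_items):       # solve for each movie's factors
            obs = mask[:, i]
            V[i] = np.linalg.solve(U[obs].T @ U[obs] + reg, U[obs].T @ R[obs, i])
    return U, V

U, V = als(R, mask)

# Reduce the movie feature vectors, then predict (hypothetical) genre labels.
genres = np.arange(V.shape[0]) % 3            # illustrative genre ids only
feats = PCA(n_components=4).fit_transform(V)
clf = LogisticRegression(max_iter=1000).fit(feats, genres)
genre_probs = clf.predict_proba(feats)        # P(genre | movie)
```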

Author(s):  
Ali M. Ahmed Al-Sabaawi ◽  
Hacer Karacan ◽  
Yusuf Erkan Yenice

Recommendation systems (RSs) are tools for interacting with large and complex information spaces. They provide a personalized view of such spaces, prioritizing items likely to be of interest to the user. The main objective of RSs is to supply users with items that match their preferences. A major problem in RSs is the "cold start" problem, which arises in computer-based information systems that involve a degree of automated data modeling: the system cannot draw inferences about users or items for which it has not yet gathered sufficient information. Since RS performance is substantially limited by cold-start users and cold-start items, this study aims to attenuate the user cold-start problem. Several studies have tackled this issue by using clustering techniques to group users according to their social relations, their ratings, or both. However, clustering alone disregards the variety of a user's tastes. This study therefore adopts an overlapping technique to address that defect: overlapping allows users to belong to multiple clusters at the same time, according to both their behavior in the social network and their rating feedback. A novel overlapping method is accordingly presented, in which the partitioning around medoids (PAM) algorithm performs the clustering by exploiting social relations and confidence values. After the user clusters are obtained, the average distance within each cluster is computed, and each user's distance is compared against these cluster averages. If the comparison result is less than or equal to the average distance of a cluster, the user is added to that cluster. The singular value decomposition plus (SVD++) method is then applied to every cluster to compute prediction values, and the outcome is calculated as the average mean absolute error (MAE) and root mean square error (RMSE) over the clusters. The model is tested on two real-world datasets, Ciao and FilmTrust. The findings show that the proposed model outperforms a number of state-of-the-art approaches in terms of prediction accuracy.
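The overlapping assignment rule described above (a user joins every cluster whose average distance is at least the user's distance to that cluster's medoid) can be sketched as follows. The clustering here is a simple Voronoi-style k-medoids approximation of PAM on synthetic distances, not the authors' confidence-weighted implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pairwise user distances, hypothetically derived from social ties + ratings.
X = rng.normal(size=(30, 5))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

def k_medoids(D, k=3, n_iters=20):
    """Voronoi-style k-medoids (a simple approximation of PAM)."""
    medoids = list(range(k))
    for _ in range(n_iters):
        labels = np.argmin(D[:, medoids], axis=1)
        for c in range(k):
            members = np.where(labels == c)[0]
            # Pick the member minimising total distance within the cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            medoids[c] = members[np.argmin(costs)]
    return medoids

medoids = k_medoids(D)
labels = np.argmin(D[:, medoids], axis=1)

# Average distance-to-medoid inside each cluster.
avg_dist = np.array([D[labels == c, medoids[c]].mean() for c in range(3)])

# Overlapping assignment: a user joins EVERY cluster whose medoid is
# no farther away than that cluster's average distance.
membership = D[:, medoids] <= avg_dist      # (n_users, k) boolean matrix
```

Each cluster would then get its own SVD++ predictor, with MAE/RMSE averaged across clusters.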


2015 ◽  
Vol 76 (13) ◽  
Author(s):  
Siraj Muhammed Pandhiani ◽  
Ani Shabri

In this study, a new hybrid model is developed by integrating two models, the discrete wavelet transform and the least squares support vector machine, denoted WLSSVM. The hybrid model is then used for monthly stream-flow forecasting for two major rivers in Pakistan, the Indus and the Neelum, by applying it individually to each river's flow data. The root mean square error (RMSE), mean absolute error (MAE) and correlation (R) statistics are used to evaluate the accuracy of the proposed WLSSVM model, and the results are compared with those obtained by LSSVM alone. The comparison shows that the WLSSVM model is more accurate and efficient than LSSVM.
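A sketch of a WLSSVM-style pipeline under stated assumptions: a one-level Haar transform stands in for the discrete wavelet decomposition, kernel ridge regression (the usual closed-form stand-in for an LSSVM regressor) models only the approximation subseries for brevity, and the stream-flow series and all parameters are synthetic.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)

# Synthetic monthly stream-flow series: seasonality plus noise.
t = np.arange(240)
flow = 100 + 30 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

def haar_dwt(x):
    """One-level Haar DWT: approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-frequency trend
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-frequency detail
    return a, d

a, d = haar_dwt(flow)   # the full method would model d as well

# Lagged values of the approximation series -> its next value.
lags = 3
X = np.column_stack([a[i: len(a) - lags + i] for i in range(lags)])
y = a[lags:]

split = 80
mu, sd = X[:split].mean(0), X[:split].std(0)
Xs = (X - mu) / sd      # standardise with training statistics only

# Kernel ridge regression as a stand-in for the LSSVM regressor.
model = KernelRidge(kernel="rbf", alpha=0.1).fit(Xs[:split], y[:split])
pred = model.predict(Xs[split:])

rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
mae = np.mean(np.abs(pred - y[split:]))
```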


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Zhichao Deng ◽  
Meiji Yan ◽  
Xu Xiao

In this paper, we propose an early warning model of credit risk for cross-border e-commerce. Our proposed model, KPCA-MPSO-BP, is constructed from kernel principal component analysis (KPCA), modified particle swarm optimization (MPSO), and a BP neural network. First, KPCA is used to reduce the credit risk indices for cross-border e-commerce. Next, the weights and thresholds of the BP neural network are optimized using MPSO. Finally, the BP neural network is trained on credit risk data from 13 cross-border e-commerce enterprises. To analyze the efficiency of our approach, data from five further enterprises are used for testing and evaluation. The experimental results show that the mean absolute error (MAE) and root mean square error (RMSE) of our model are the lowest among the compared models, indicating much better efficiency.
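The particle swarm search referred to above can be illustrated in isolation. The sketch below shows a standard inertia-weight PSO with a linearly decreasing weight (one common "modified" variant) minimizing a toy objective that stands in for the BP network's training error; it is not the paper's MPSO, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def sphere(x):
    """Toy objective standing in for the BP network's training error."""
    return np.sum(x ** 2, axis=-1)

def pso(f, dim=4, n_particles=20, n_iters=100,
        w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
    pos = rng.uniform(-5, 5, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), f(pos)
    gbest = pbest[np.argmin(pbest_val)].copy()
    for it in range(n_iters):
        # Linearly decreasing inertia weight, a common "modified" PSO choice.
        w = w_max - (w_max - w_min) * it / n_iters
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        val = f(pos)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

best_x, best_val = pso(sphere)
```

In the paper's setting, each particle would encode a candidate set of BP network weights and thresholds, and `f` would be the network's training error.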


2018 ◽  
Vol 5 (1) ◽  
pp. 44-57 ◽  
Author(s):  
Santosh Kumar Sahoo ◽  
B. B. Choudhury

This article proposes a unique optimization algorithm, the Adaptive Cuckoo Search (AdCS) algorithm, followed by Intrinsic Discriminant Analysis (IDA) to design an intelligent object classifier for inspecting defective objects, such as bottles, in a manufacturing unit. With this methodology the response time is much faster than with other techniques. The proposed scheme is validated on standard benchmark test functions, together with an effective inspection procedure for bottle identification using AdCS, Principal Component Analysis (PCA) and IDA: the PCA+IDA combination is used for dimension reduction, and AdCS-IDA for classification and identification of defective bottles. Compared with other algorithms, such as the Least Squares Support Vector Machine (LSSVM) with linear and radial basis function (RBF) kernels applied to the proposed model, the responses obtained from AdCS followed by IDA reveal remarkable performance.
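For orientation, a minimal (non-adaptive) cuckoo search with Mantegna Lévy flights is sketched below on a benchmark objective. The adaptive step-size control of AdCS and the IDA classification stage are beyond this sketch, and all parameters are illustrative.

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(4)

def sphere(X):
    """Benchmark objective standing in for the classifier's error surface."""
    return (X ** 2).sum(axis=-1)

def levy_step(shape, beta=1.5):
    """Levy-distributed steps via Mantegna's algorithm."""
    sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, shape)
    v = rng.normal(0, 1, shape)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim=2, n_nests=15, n_iters=200, pa=0.25, alpha=0.05):
    nests = rng.uniform(-5, 5, size=(n_nests, dim))
    fit = f(nests)
    best, best_val = nests[np.argmin(fit)].copy(), fit.min()
    for _ in range(n_iters):
        # New candidate solutions by Levy flights biased toward the best nest.
        new = nests + alpha * levy_step(nests.shape) * (nests - best)
        new_fit = f(new)
        better = new_fit < fit
        nests[better], fit[better] = new[better], new_fit[better]
        # Abandon a fraction pa of nests and replace them with random ones.
        abandon = rng.random(n_nests) < pa
        nests[abandon] = rng.uniform(-5, 5, size=(int(abandon.sum()), dim))
        fit[abandon] = f(nests[abandon])
        if fit.min() < best_val:
            best, best_val = nests[np.argmin(fit)].copy(), fit.min()
    return best, best_val

best_x, best_val = cuckoo_search(sphere)
```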


2018 ◽  
Vol 7 (4.30) ◽  
pp. 573
Author(s):  
Shuhaida Ismail ◽  
Ani Shabri ◽  
Aida Mustapha ◽  
Siraj Mohammed Pandhiani

The ability to obtain accurate information on future river flow is fundamental to water resources planning and management. Traditionally, single models have been used to predict future river flow. This paper investigates Principal Component Analysis (PCA) as a dimensionality reduction technique combined with a single Support Vector Machine (SVM) and Least Squares Support Vector Machine (LSSVM), referred to as PCA-SVM and PCA-LSSVM. This study also compares the proposed models with the single SVM and LSSVM models. The models are ranked on four statistical measures: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Correlation Coefficient (R), and Coefficient of Efficiency (CE). The results show that PCA combined with LSSVM performs better than the other models. The best-ranked models are then measured using the Mean of Forecasting Error (MFE) to determine their forecast rates. PCA-LSSVM proves to be the better model, as it under-predicts the observed river flow values by only 0.89% for the Tualang River while over-predicting by 2.08% for the Bernam River. The study concludes by recommending PCA as the dimension reduction approach combined with LSSVM for river flow forecasting, owing to better prediction results and stability than those achieved by single models.
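A hedged sketch of the four-way comparison on synthetic, deliberately collinear predictors, with kernel ridge regression standing in for LSSVM and scikit-learn defaults elsewhere; none of the settings are the authors'.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(5)

# Synthetic predictors (e.g. lagged flows, rainfall) with redundant columns.
n = 300
base = rng.normal(size=(n, 4))
X = np.hstack([base, base @ rng.normal(size=(4, 6))])   # 10 correlated features
y = base[:, 0] * 2 + np.sin(base[:, 1]) + rng.normal(0, 0.1, n)

split = 240
models = {
    "SVM": make_pipeline(StandardScaler(), SVR()),
    "LSSVM": make_pipeline(StandardScaler(),
                           KernelRidge(kernel="rbf", alpha=0.1)),
    "PCA-SVM": make_pipeline(StandardScaler(), PCA(n_components=4), SVR()),
    "PCA-LSSVM": make_pipeline(StandardScaler(), PCA(n_components=4),
                               KernelRidge(kernel="rbf", alpha=0.1)),
}

scores = {}
for name, m in models.items():
    m.fit(X[:split], y[:split])
    pred = m.predict(X[split:])
    scores[name] = {
        "MAE": mean_absolute_error(y[split:], pred),
        "RMSE": np.sqrt(mean_squared_error(y[split:], pred)),
    }
```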


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Fatemeh Ahouz ◽  
Amin Golabpour

Abstract
Background: The high prevalence of COVID-19 has made it a new pandemic. Predicting both its prevalence and incidence throughout the world is crucial to help health professionals make key decisions. In this study, we aim to predict the incidence of COVID-19 within a two-week period to better manage the disease.
Methods: The COVID-19 datasets provided by Johns Hopkins University contain information on COVID-19 cases in different geographic regions since January 22, 2020 and are updated daily. Data from 252 such regions were analyzed as of March 29, 2020, comprising 17,136 records and 4 variables: latitude, longitude, date, and case records. To model the incidence pattern for each geographic region, information on the region and its neighboring areas gathered over the preceding two weeks was used. A model was then developed to predict the incidence rate for the coming two weeks via a least-squares boosting classification algorithm.
Results: The model was built for three groups based on incidence rate: less than 200, between 200 and 1000, and above 1000. The mean absolute errors of the model evaluation were 4.71%, 8.54%, and 6.13%, respectively. Comparing the forecast results with the actual values over the period in question showed that the proposed model predicted the number of globally confirmed COVID-19 cases with a very high accuracy of 98.45%.
Conclusion: Using data from different geographical regions within a country and discovering the pattern of prevalence in a region and its neighboring areas, our boosting-based model was able to accurately predict the incidence of COVID-19 within a two-week period.
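The windowing-plus-boosting idea in the Methods section might look like the following sketch, which uses scikit-learn's gradient boosting regressor (whose default loss is squared error, i.e. least-squares boosting) on a synthetic logistic-growth case curve; the paper's three-band classification setup and neighboring-region features are not reproduced.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)

# Synthetic daily confirmed-case curve for one region: logistic growth + noise.
days = np.arange(120)
cases = 1000 / (1 + np.exp(-(days - 60) / 10)) + rng.normal(0, 5, days.size)

# Features: the preceding 14 days of counts; target: the next day's count.
window = 14
X = np.array([cases[i: i + window] for i in range(len(cases) - window)])
y = cases[window:]

split = 90
# GradientBoostingRegressor defaults to a squared-error (least-squares) loss,
# the regression analogue of LSBoost.
model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  learning_rate=0.05).fit(X[:split], y[:split])
pred = model.predict(X[split:])
mae = np.mean(np.abs(pred - y[split:]))
```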


2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Khairunnisa Khairunnisa ◽  
Rizka Pitri ◽  
Victor P Butar-Butar ◽  
Agus M Soleh

This research used CFSRv2 data as the output of a general circulation model. CFSRv2 includes variables with high correlation, so principal component regression (PCR) and partial least squares (PLS) were used to handle the multicollinearity in the CFSRv2 data. This research aims to determine the better of the PCR and PLS models for estimating rainfall at the Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station by comparing RMSEP and correlation values. Domain sizes of 3×3, 4×4, 5×5, 6×6, 7×7, 8×8, 9×9, and 11×11 grid points were used, located between 4°S and 9°S and 105°E and 110°E with a grid resolution of 0.5×0.5. The PLS model was the better model for statistical downscaling in this research because it obtained lower RMSEP values and higher correlation values than the PCR model. The best domain and RMSEP value for the Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station are 9×9 with 100.06, 6×6 with 194.3, 8×8 with 117.6, and 6×6 with 108.2, respectively.


Author(s):  
Zoryna Yurynets ◽  
Rostyslav Yurynets ◽  
Nataliya Kunanets ◽  
Ivanna Myshchyshyn

In the current conditions of economic development, it is important to study the main types of banking risks and effective methods for their evaluation, monitoring and analysis. One of the main approaches to quantitatively assessing the creditworthiness of borrowers is credit scoring. The objective of credit scoring is to optimize management decisions regarding the possibility of providing bank loans. This article proposes scientific and methodological provisions for the formation of a regression model for assessing bank risks in the process of granting loans to borrowers. The proposed model is based on logistic regression tools and discriminant analysis combined with expert evaluation. During the formation of the regression model, the relationship between risk factors and the probable magnitude of loan risk was established, and the solvency coefficient of the individual borrower was calculated. Data preparation, including calculation of the indicators selected during discriminant analysis, was carried out in Excel, followed by import into the STATISTICA package for analysis in the "Logistic regression" sub-module of the "Nonlinear Estimation" module. The adequacy of the constructed model was determined using McFadden's likelihood ratio index, whose calculated value confirms the model's adequacy. The ability to issue loans to new clients was evaluated using the regression model; the calculations show that a loan can be granted only to the second and third clients.
The offered method allows assessment of a client's solvency and risk prevention at different stages of lending, supports independent, informed decisions on credit servicing of clients and loan portfolio management, and optimizes management decisions in banks. For a loan scoring model to continue to perform its functions, it must be periodically adjusted.
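McFadden's likelihood ratio index mentioned above is straightforward to compute from the fitted and null (intercept-only) log-likelihoods. A sketch on synthetic borrower data follows; all variables and coefficients are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)

# Synthetic borrower indicators (e.g. income ratio, credit history score).
n = 400
X = rng.normal(size=(n, 4))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = default

clf = LogisticRegression().fit(X, y)
p = clf.predict_proba(X)[:, 1]

# Log-likelihood of the fitted model and of the null model.
ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
p0 = y.mean()
ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))

# McFadden's likelihood ratio index: 1 - LL_model / LL_null.
mcfadden = 1 - ll_model / ll_null
```

Values closer to 1 indicate a model that explains the default outcome far better than the intercept-only baseline.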

