Developing a Scalable and Accurate Job Recommendation System with Distributed Cluster System using Machine Learning Algorithm

2021 · Vol 7 (2) · pp. 71-78
Author(s): Timothy Dicky, Alva Erwin, Heru Purnomo Ipung

The purpose of this research is to develop a job recommender system based on the Hadoop MapReduce framework, so that the system scales as it processes big data, with a machine learning algorithm inside the recommender to produce accurate job recommendations. The project begins by collecting sample data and building an accurate job recommender with a centralized program architecture. A job recommender with a distributed program architecture is then implemented using Hadoop MapReduce and deployed to a Hadoop cluster. After implementation, both systems are tested against a large set of applicant and job data, and the time each program needs to process the data is recorded and analyzed. Based on the experiments, we conclude that the recommender produces the most accurate results when the cosine similarity measure is used inside the algorithm. The centralized job recommender also processes the data faster than the distributed cluster recommender; however, as the data grows, the centralized system eventually lacks the capacity to process it, while the distributed cluster recommender scales with the size of the data.
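The paper does not include code, but the core ranking step it describes can be sketched briefly. Below is a minimal, illustrative cosine-similarity matcher in Python; the skill vectors and job names are hypothetical, and the authors' Hadoop MapReduce implementation is not reproduced.

```python
# Minimal sketch of cosine-similarity job matching (illustrative only;
# not the paper's distributed MapReduce implementation).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical skill-frequency vectors for one applicant and three jobs.
applicant = np.array([1.0, 0.0, 2.0, 1.0])
jobs = {
    "data_engineer": np.array([1.0, 1.0, 2.0, 0.0]),
    "web_developer": np.array([0.0, 2.0, 0.0, 1.0]),
    "ml_engineer":   np.array([2.0, 0.0, 1.0, 1.0]),
}

# Rank jobs by similarity to the applicant's profile.
ranked = sorted(jobs.items(),
                key=lambda kv: cosine_similarity(applicant, kv[1]),
                reverse=True)
for title, vec in ranked:
    print(title, round(cosine_similarity(applicant, vec), 3))
```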

2019 · Vol 8 (4) · pp. 2299-2302

Implementing a machine learning algorithm gives you a deep and practical appreciation for how the algorithm works. This knowledge can also help you internalize the mathematical description of the algorithm by thinking of vectors and matrices as arrays and of the transformations on those structures in computational terms. Implementing a machine learning algorithm involves numerous micro-decisions, such as selecting the problem, selecting and researching the algorithm, selecting the programming language, and unit testing, and these decisions are often missing from formal algorithm descriptions. We introduce the notion of implementing a job recommendation system (a classic machine learning problem) using two algorithms, KNN [3] and logistic regression [3], in more than one programming language (C++ and Python), and we present an analysis and comparison of the performance of each. We focus specifically on building a model that predicts jobs in the field of computer science, but the approach can be applied to a wide range of other areas as well. Implementers can use this paper to deduce which language will best suit their needs for accuracy along with efficiency. We use more than one algorithm to establish that our findings are not singularly applicable.
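As a rough illustration of the comparison the paper describes, the following sketch trains both algorithms on synthetic data with scikit-learn and reports accuracy and fit time. The dataset, hyperparameters, and timing method are assumptions for the example, not the authors' setup.

```python
# Sketch of a KNN vs. logistic regression comparison on synthetic data.
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a job/applicant feature matrix.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

for name, model in [("knn", KNeighborsClassifier(n_neighbors=5)),
                    ("logreg", LogisticRegression(max_iter=1000))]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)
    print(f"{name}: accuracy={acc:.3f}, "
          f"time={time.perf_counter() - start:.3f}s")
```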


2020 · Vol 10 (4) · pp. 5-16
Author(s): V.A. Sudakov, I.A. Trofimov

The article proposes an unsupervised machine learning algorithm for assessing the most probable relationship between two elements of a set of customers and goods/services in order to build a recommendation system. Methods based on collaborative filtering and content-based filtering are considered. A combined algorithm for identifying relationships on sets has been developed that combines the advantages of the analyzed approaches. The complexity of the algorithm is estimated, and recommendations are given for implementing it efficiently so as to reduce the amount of memory used. The application of the combined algorithm is shown using a book recommendation problem as an example. The algorithm can be used for the "cold start" of a recommender system, when there are no labeled, high-quality samples for training more complex models.
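A minimal sketch of the hybrid idea, assuming a simple convex combination of a collaborative score and a content-based score; the component functions, the weight alpha, and the example vectors are invented for illustration and are not the authors' exact combined algorithm.

```python
# Illustrative hybrid scoring for one user-item pair.
import numpy as np

def collaborative_score(similar_users_ratings: np.ndarray,
                        max_rating: float = 5.0) -> float:
    """Mean rating that similar users gave this item, scaled to [0, 1]."""
    return float(similar_users_ratings.mean() / max_rating)

def content_score(item_features: np.ndarray,
                  user_profile: np.ndarray) -> float:
    """Cosine similarity between item features and the user's profile."""
    denom = np.linalg.norm(item_features) * np.linalg.norm(user_profile)
    return float(item_features @ user_profile / denom) if denom else 0.0

def combined_score(cf: float, cb: float, alpha: float = 0.5) -> float:
    """Convex combination; lowering alpha leans on content for cold starts."""
    return alpha * cf + (1 - alpha) * cb

ratings = np.array([4.0, 5.0, 3.0])   # ratings from similar users
item = np.array([1.0, 0.0, 1.0])      # item (book) feature vector
profile = np.array([0.8, 0.2, 0.6])   # user profile in the same space
print(combined_score(collaborative_score(ratings),
                     content_score(item, profile)))
```

For a cold start, where collaborative ratings are scarce, alpha can be pushed toward zero so the content-based term dominates until enough interaction data accumulates.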


2021 · Vol 2069 (1) · pp. 012153
Author(s): Rania Labib

Architects often investigate the daylighting performance of hundreds of design solutions and configurations to ensure an energy-efficient design. To shorten daylighting simulation time, they usually reduce the number of variables or parameters of the building and facade design, a practice that tends to eliminate design variables that could contribute to an energy-optimized configuration. Recent research has therefore focused on machine learning algorithms that require executing only a relatively small subset of the simulations to predict the daylighting and energy performance of buildings. Although machine learning has been shown to be accurate, it remains time-consuming because of the simulations that must be executed to produce training and validation data. To save time, designers often use a small simulation subset, which leads to a poorly trained model that produces inaccurate results. This study therefore introduces an automated framework that uses high performance computing (HPC) to execute the simulations the machine learning algorithm needs while saving time and effort. High performance computing runs thousands of tasks simultaneously for a time-efficient simulation process, allowing designers to increase the size of the simulation subset. Pairing high performance computing with machine learning allows for accurate and nearly instantaneous building performance predictions.
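A toy stand-in for this workflow, assuming Python: the "simulations" run in parallel via a process pool (a real pipeline would submit jobs to an HPC scheduler), and a surrogate regressor is then trained on the results. The daylight function is a made-up analytic placeholder, not an actual daylighting engine such as Radiance.

```python
# Parallel "simulations" followed by surrogate-model training.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def simulate_daylight(params):
    """Placeholder for an expensive daylighting simulation."""
    window_ratio, overhang_depth = params
    return 100 * window_ratio - 20 * overhang_depth * window_ratio

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sample 500 (window ratio, overhang depth) design configurations.
    samples = rng.uniform([0.1, 0.0], [0.9, 1.5], size=(500, 2))

    # Run all simulations concurrently (stand-in for HPC job submission).
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate_daylight, samples))

    # Train a surrogate on the simulated results; predictions are
    # then nearly instantaneous for unseen configurations.
    surrogate = RandomForestRegressor(random_state=0).fit(samples, results)
    print(surrogate.predict([[0.5, 0.75]]))
```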


2019 · Vol 18 (01) · pp. 1950011
Author(s): Jasem M. Alostad

With recent advances in e-commerce platforms, information overload has grown due to the increasing number of users and the rapid generation of data and items in recommender systems, which creates serious problems for such systems. The growing feature space of recommender systems poses new challenges, as these systems have poor resilience against vulnerability attacks. In particular, recommender systems are prone to shilling attacks, and a system with poor attack detection suffers a reduced detection rate and degraded overall performance. Hence, in this paper we improve resilience against shilling attacks using a modified Support Vector Machine (SVM) combined with a machine learning algorithm: a Gaussian Mixture Model is used to increase the detection rate, and it further reduces the dimensionality of the data in the recommender system. The proposed method is evaluated on several metrics, such as recall, precision, and false positive rate across different attacks. The results of the proposed system are compared against probabilistic recommender approaches to demonstrate the efficacy of machine learning in recommender systems.
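A hedged sketch of the pipeline as described, assuming scikit-learn: a Gaussian Mixture Model compresses rating-profile features into per-component responsibilities, and an SVM then separates genuine from shilling profiles. The features and labels below are synthetic placeholders, not the paper's data or exact feature construction.

```python
# GMM-based dimensionality reduction feeding an SVM attack detector.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 1.0, size=(300, 10))   # stand-in user profiles
shilling = rng.normal(2.0, 0.5, size=(60, 10))   # attack profiles cluster
X = np.vstack([genuine, shilling])
y = np.array([0] * 300 + [1] * 60)               # 1 = shilling profile

# GMM responsibilities reduce the 10-D features to n_components dims.
gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
X_reduced = gmm.predict_proba(X)

clf = SVC(kernel="rbf").fit(X_reduced, y)
print("training accuracy:", clf.score(X_reduced, y))
```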


Author(s): Man Tianxing, Ildar Raisovich Baimuratov, Natalia Alexandrovna Zhukova

With the development of Big Data, data analysis technology has advanced rapidly and is now used across many subject fields, and more and more researchers without a computer science background use machine learning algorithms in their work. Unfortunately, datasets can be messy, and knowledge cannot be extracted from them directly, which is why they need preprocessing. Because of the diversity of available algorithms, it is difficult for such researchers to find the most suitable one; most choose algorithms by intuition, and the result is often unsatisfactory. This article therefore proposes a recommendation system for data processing. The system consists of an ontology subsystem and an estimation subsystem: ontology technology is used to represent the taxonomy of machine learning algorithms, and information-theoretic criteria are used to form the recommendations. The system helps users apply data processing algorithms without specific knowledge of the data science field.
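One way to picture an information-theoretic recommendation criterion is a rule that scores each dataset column by its Shannon entropy and missing-value ratio and suggests a preprocessing step. The thresholds and rules below are invented for illustration and do not reproduce the authors' ontology-driven system.

```python
# Toy entropy-based preprocessing recommendation rule.
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def recommend(column):
    """Suggest a preprocessing step from simple, invented thresholds."""
    present = [v for v in column if v is not None]
    missing_ratio = 1 - len(present) / len(column)
    if missing_ratio > 0.2:
        return "imputation"
    if shannon_entropy(present) < 1.0:   # near-constant column
        return "drop or re-encode"
    return "keep as-is"

print(recommend([1, 1, 1, 2, None]))          # low entropy
print(recommend(["a", "b", "c", "d", "a"]))   # diverse categorical
```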


2018
Author(s): C.H.B. van Niftrik, F. van der Wouden, V. Staartjes, J. Fierstra, M. Stienen, ...

Author(s): Kunal Parikh, Tanvi Makadia, Harshil Patel

Dengue is unquestionably one of the biggest health concerns in India and many other developing countries, and many people have lost their lives to it. Every year, approximately 390 million dengue infections occur around the world, of which about 500,000 become serious and about 25,000 result in death. Many factors can drive dengue, such as temperature, humidity, precipitation, inadequate public health infrastructure, and others. In this paper, we propose a method to perform predictive analytics on a dengue dataset using KNN, a machine learning algorithm. This analysis would help predict future cases and could save many lives.
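A minimal sketch in the spirit of the paper, assuming scikit-learn's KNN classifier and fabricated weather features; the data and risk labels are illustrative only.

```python
# KNN on made-up weather features for dengue risk prediction.
from sklearn.neighbors import KNeighborsClassifier

# Columns: temperature (deg C), humidity (%), precipitation (mm).
X_train = [
    [32, 85, 120], [30, 80, 100], [28, 70, 40],
    [25, 60, 10],  [33, 90, 150], [26, 55, 5],
]
y_train = [1, 1, 0, 0, 1, 0]  # 1 = high dengue risk, 0 = low risk

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(model.predict([[31, 82, 110]]))  # expected: high risk (1)
```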

