The Stock Exchange Prediction using Machine Learning Techniques: A Comprehensive and Systematic Literature Review

2021
Vol 14 (2)
pp. 91-112
Author(s):
Rico Bayu Wiranata
Arif Djunaidy

This literature review identifies and analyzes research topic trends, types of data sets, learning algorithms, method improvements, and frameworks used in stock exchange prediction. A total of 81 studies on stock prediction, published between January 2015 and June 2020 and meeting the inclusion and exclusion criteria, were investigated. The literature review methodology was carried out in three major phases: review planning, implementation, and report preparation, comprising nine steps from defining the systematic review requirements to presenting the results. Estimation or regression, clustering, association, classification, and preprocessing analysis of data sets are the five main focuses revealed in the primary studies of stock prediction research. The classification method accounts for 35.80% of the related studies, the estimation method for 56.79%, and data analytics for 4.94%, while the remainder, clustering and association, accounts for 1.23%. Furthermore, technical indicator data sets are used in 74.07% of the studies; the rest use combinations of data sets. To develop stock prediction models, 48 different methods have been applied, of which the nine most widely applied were identified. The methods with the best accuracy and smallest error rates include SVM, DNN, CNN, RNN, LSTM, bagging ensembles such as RF, boosting ensembles such as XGBoost, ensemble majority voting, and the meta-learner approach of ensemble stacking. Several techniques are proposed to improve prediction accuracy: combining several methods, using boosting algorithms, adding feature selection, and applying parameter and hyper-parameter optimization.
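Several of the best-performing approaches listed above are ensembles combined by majority vote. As a minimal, self-contained sketch (entirely synthetic data, not from any of the reviewed studies), the following shows why combining several independent weak classifiers by majority vote can beat each individual one:

```python
import random

random.seed(0)

def noisy_predictor(labels, error_rate):
    # An independent weak classifier: flips each true label with probability error_rate.
    return [y if random.random() > error_rate else 1 - y for y in labels]

def majority_vote(predictions):
    # Combine per-example votes from several classifiers into a single prediction.
    return [1 if sum(votes) > len(votes) / 2 else 0 for votes in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth = [random.randint(0, 1) for _ in range(2000)]
preds = [noisy_predictor(truth, 0.3) for _ in range(9)]  # nine weak learners, ~70% accurate each

individual = [accuracy(p, truth) for p in preds]
ensemble = accuracy(majority_vote(preds), truth)
print(f"mean individual accuracy: {sum(individual) / len(individual):.3f}")
print(f"majority-vote accuracy:   {ensemble:.3f}")
```

Because the nine error patterns are independent, the vote is wrong only when at least five learners err on the same example, which is much rarer than any single learner erring.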

2021
Author(s):
ElMehdi SAOUDI
Said Jai Andaloussi

Abstract With the rapid growth of the volume of video data and the development of multimedia technologies, it has become necessary to be able to accurately and quickly browse and search through information stored in large multimedia databases. For this purpose, content-based video retrieval (CBVR) has become an active area of research over the last decade. In this paper, we propose a content-based video retrieval system that provides similar videos from a large multimedia data set based on a query video. The approach uses vector motion-based signatures to describe the visual content and machine learning techniques to extract key frames for rapid browsing and efficient video indexing. We have implemented the proposed approach on both a single machine and a real-time distributed cluster to evaluate the real-time performance aspect, especially when the number and size of videos are large. Experiments are performed using various benchmark action and activity recognition data sets, and the results reveal the effectiveness of the proposed method in both accuracy and processing time compared to state-of-the-art methods.


2020
pp. 609-623
Author(s):
Arun Kumar Beerala
Gobinath R.
Shyamala G.
Siribommala Manvitha

Water is the most valuable natural resource for all living things and the ecosystem. The quality of groundwater changes due to changes in the ecosystem, industrialisation, urbanisation, etc. In the study, 60 samples were taken and analysed for various physico-chemical parameters. The sampling locations were located using the global positioning system (GPS), and samples were taken over two consecutive years, 2016-2017 and 2017-2018, for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). pH, EC, and TDS were obtained in the field. Hardness and chloride were determined using the titration method. Nitrate and sulphate were determined using a spectrophotometer. Machine learning techniques were used to train on the data set and to predict the unknown values. The dominant elements of the groundwater are as follows: Ca²⁺ and Mg²⁺ for cations and Cl⁻, SO₄²⁻, and NO₃⁻ for anions. The regression value for the training data set was found to be 0.90596, and for the entire network it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.


Author(s):  
Mohamed Elhadi Rahmani
Abdelmalek Amine
Reda Mohamed Hamou

Many drugs in modern medicine originate from plants, and the first step in drug production is the recognition of the plants needed for this purpose. This article presents a bagging approach for medicinal plant recognition based on DNA sequences. In this work, the authors have developed a system that recognizes the DNA sequences of 14 medicinal plants. First, they divided the 14-class data set into two-class sub-data sets; then, instead of using an algorithm to classify the 14-class data set directly, they used the same algorithm to classify the sub-data sets. By doing so, they simplified the problem of classifying 14 plants into sub-problems of two-class classification. To construct the subsets, the authors extracted all possible pairs of the 14 classes, giving each class more chances to be well predicted. This approach also allows the study of the similarity between the DNA sequences of each pair of plants. In terms of results, the authors obtained very good results, with the accuracy almost doubled (from 45% to almost 80%). Classification of a new sequence is completed by majority vote.
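The pairwise decomposition described above (one binary classifier per pair of classes, combined by majority vote) can be sketched as follows. This is a hypothetical toy in one dimension with a trivial midpoint classifier standing in for the authors' DNA-sequence learner:

```python
from itertools import combinations
import random

random.seed(1)

# Toy stand-in for per-class features: 1-D points clustered around a center per class.
classes = {0: 0.0, 1: 5.0, 2: 10.0}
train = [(cls, center + random.gauss(0, 1.0))
         for cls, center in classes.items() for _ in range(30)]

def fit_pairwise(train, a, b):
    # A minimal two-class learner: threshold at the midpoint of the two class means.
    xs_a = [x for c, x in train if c == a]
    xs_b = [x for c, x in train if c == b]
    mean_a, mean_b = sum(xs_a) / len(xs_a), sum(xs_b) / len(xs_b)
    threshold = (mean_a + mean_b) / 2
    return lambda x: a if (x < threshold) == (mean_a < mean_b) else b

# One binary classifier per unordered pair of classes, as in the article's decomposition.
pairwise = {(a, b): fit_pairwise(train, a, b) for a, b in combinations(classes, 2)}

def predict(x):
    # Each pairwise classifier casts one vote; the majority wins.
    votes = [clf(x) for clf in pairwise.values()]
    return max(set(votes), key=votes.count)

print(predict(0.3), predict(5.2), predict(9.7))
```

With 14 classes, the same construction yields 91 pairwise classifiers, each voting on every new sequence.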


Author(s):  
MUSTAPHA LEBBAH
YOUNÈS BENNANI
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis, and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by Bernoulli distributions. Each cell is characterized by a prototype with the same binary coding as used in the data space and by the probability of being different from this prototype. The learning algorithm that we propose, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data-set repository. The results show good quality of the topological ordering and homogeneous clustering.
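Dropping the topographic ordering for brevity, the Bernoulli modeling idea (a binary prototype plus a probability of differing from it) can be sketched as a plain Bernoulli mixture fitted with EM on synthetic binary data. This illustrates the distributional assumption only, not the authors' full self-organizing map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data from two hidden prototypes: each cluster is a binary
# prototype whose bits are flipped with probability 0.1 (hypothetical data).
protos_true = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])
X = np.vstack([np.abs(p - (rng.random((100, 6)) < 0.1)) for p in protos_true]).astype(float)

K, (n, d) = 2, X.shape
theta = rng.uniform(0.3, 0.7, (K, d))   # per-cluster Bernoulli parameters
pi = np.full(K, 1 / K)                  # mixing weights

for _ in range(50):  # EM iterations
    # E-step: responsibilities from Bernoulli log-likelihoods.
    log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and Bernoulli parameters.
    nk = resp.sum(axis=0)
    pi = nk / n
    theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)

# Rounding the learned Bernoulli means recovers the binary prototypes.
print(np.round(theta).astype(int))
```

In the paper's formulation, each map cell carries such a prototype, and the EM updates are additionally smoothed by the map's neighborhood function to obtain the topological ordering.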


Author(s):  
Vasilii Feofanov
Emilie Devijver
Massih-Reza Amini

In this paper, we propose a transductive bound on the risk of the majority vote classifier learned from partially labeled data for multi-class classification. The bound is obtained by considering the class confusion matrix as an error indicator, and it involves the margin distribution of the classifier over each class and a bound on the risk of the associated Gibbs classifier. When this latter bound is tight and the errors of the majority vote classifier per class are concentrated in a low-margin zone, we prove that the bound on the Bayes classifier's risk is tight. As an application, we extend the self-learning algorithm to the multi-class case. The algorithm iteratively assigns pseudo-labels to the subset of unlabeled training examples whose class margins are above a threshold obtained from the proposed transductive bound. Empirical results on different data sets show the effectiveness of our approach compared to the same algorithm with a manually fixed threshold, to the extension of TSVM to multi-class classification, and to a graph-based semi-supervised algorithm.
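The self-learning loop described above (pseudo-label only the examples whose margin exceeds a threshold) can be sketched as follows. The 1-D base learner and the fixed threshold are hypothetical stand-ins, since the paper derives its threshold from the transductive bound rather than fixing it by hand:

```python
import random

random.seed(2)

# Hypothetical 1-D data: two classes around -2 and +2; only four points are labeled.
labeled = [(-2.1, 0), (-1.8, 0), (2.0, 1), (2.3, 1)]
unlabeled = [random.gauss(-2, 0.5) for _ in range(50)] + \
            [random.gauss(2, 0.5) for _ in range(50)]

def fit(labeled):
    # Minimal base learner: decision boundary at the midpoint of the class means.
    m0 = sum(x for x, y in labeled if y == 0) / sum(1 for _, y in labeled if y == 0)
    m1 = sum(x for x, y in labeled if y == 1) / sum(1 for _, y in labeled if y == 1)
    boundary = (m0 + m1) / 2
    return lambda x: (1 if x > boundary else 0, abs(x - boundary))  # (label, margin)

threshold = 1.0  # stand-in for the threshold the paper derives from its bound
while unlabeled:
    clf = fit(labeled)
    confident = [(x, clf(x)[0]) for x in unlabeled if clf(x)[1] >= threshold]
    if not confident:
        break
    labeled += confident  # move high-margin pseudo-labeled points into the training set
    unlabeled = [x for x in unlabeled if clf(x)[1] < threshold]

print(f"{len(labeled)} labeled after self-training, {len(unlabeled)} left below threshold")
```

Only high-margin points are ever pseudo-labeled, so label noise from the classifier's uncertain region near the boundary is kept out of the growing training set.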


CONVERTER
2021
pp. 598-605
Author(s):  
Zhao Jianchao

Behind the rapid development of the Internet industry, Internet security has become a hidden danger. In recent years, the outstanding performance of deep learning in classification and behavior prediction based on massive data has led people to study how to apply deep learning technology. Therefore, this paper attempts to apply deep learning to intrusion detection, to learn and classify network attacks. For the NSL-KDD data set, this paper first uses traditional classification methods and several different deep learning algorithms for learning and classification. It deeply analyzes the correlations among the data sets, algorithm characteristics, and experimental classification results, and identifies the deep learning algorithm that performs relatively well. Then, a normalized coding algorithm is proposed. The experimental results show that the algorithm can improve the detection accuracy and reduce the false alarm rate.
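The paper's normalized coding algorithm is not detailed in this abstract. As general background only, a common preprocessing step for NSL-KDD-style records is to min-max scale numeric fields and one-hot encode categorical ones; the field names and values below are illustrative, not taken from the data set:

```python
# Hypothetical NSL-KDD-style records: numeric fields plus a categorical protocol field.
records = [
    {"duration": 0, "src_bytes": 491, "protocol": "tcp"},
    {"duration": 2, "src_bytes": 146, "protocol": "udp"},
    {"duration": 0, "src_bytes": 0, "protocol": "icmp"},
]

def min_max(values):
    # Scale a numeric column to [0, 1]; constant columns map to 0.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

protocols = sorted({r["protocol"] for r in records})
durations = min_max([r["duration"] for r in records])
src_bytes = min_max([r["src_bytes"] for r in records])

def encode(i):
    # Scaled numeric fields followed by a one-hot code for the categorical field.
    onehot = [1.0 if records[i]["protocol"] == p else 0.0 for p in protocols]
    return [durations[i], src_bytes[i]] + onehot

encoded = [encode(i) for i in range(len(records))]
print(encoded)
```

Normalizing like this keeps large-range counters such as byte counts from dominating distance- and gradient-based learners.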


2020
Vol 10 (12)
pp. 4085
Author(s):
Alla Marchenko
Alenka Temeljotov-Salaj

Since 1997, scientists have been trying to utilize new non-invasive approaches for thermal discomfort detection, which promise to be more effective than frameworks that need direct responses from users. Due to rapid technological development in the bio-metrical field, a systematic literature review was performed to investigate the possibility of thermal discomfort detection in the workplace by non-invasive means using bio-sensing technology. Firstly, the problem, intervention, comparison, outcome, context (PICOC) framework was introduced in the study to identify the main points for meta-analysis and, in turn, to provide relevant keywords for the literature search. In total, 2776 studies were found and processed using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) methodology. After filtering by the defined criteria, 35 articles were obtained for detailed investigation with respect to the facility types used in the experiments, the number of people involved in data collection, and the algorithms used for prediction of the thermal discomfort event. The study concludes that there is potential for the creation of non-invasive thermal discomfort detection models via bio-sensing technologies, which would provide better user interaction with the built environment, potentially decrease energy use, and enable better productivity. There is definitely room for improvement within the field of non-invasive thermal discomfort detection, especially with respect to data collection, algorithm implementation, and sample size, in order to create opportunities for the deployment of developed solutions in real life. Based on the literature review, novel technology shows potential for a more intelligent approach to non-invasive thermal discomfort prediction. The architecture of deep neural networks should be studied further due to the specifics of their hidden layers and their ability to perform hierarchical data extraction. This machine learning approach can provide a better model for thermal discomfort detection based on data sets with different types of bio-metrical variables.


Author(s):  
Tarik A. Rashid
Mohammad K. Hassan
Mokhtar Mohammadi
Kym Fraser

Recently, the population of the world has increased, along with health problems. Diabetes mellitus, for example, affects the health of many patients globally. The task of this chapter is to develop a dynamic and intelligent decision support system for patients with different diseases, and it aims at examining machine-learning techniques supported by optimization techniques. Artificial neural networks have been used in healthcare for several decades. Most research works utilize a multilayer perceptron (MLP) trained with the back propagation (BP) learning algorithm to achieve diabetes mellitus classification. Nonetheless, MLP has some drawbacks, such as slow convergence, susceptibility to local minima during training, poor scalability, and unsuitability for time-series data sets. To overcome these drawbacks, long short-term memory (LSTM), a more advanced form of recurrent neural network, is suggested. In this chapter, an adaptable LSTM trained with two optimization algorithms instead of the back propagation learning algorithm is presented. The optimization algorithms are biogeography-based optimization (BBO) and the genetic algorithm (GA). One data set was collected locally, and a benchmark data set is used as well. Finally, the data sets are fed into the adaptable models, LSTM with BBO (LSTMBBO) and LSTM with GA (LSTMGA), for classification purposes. The experimental and testing results are compared, and they are promising. This system helps physicians and doctors to provide proper health treatment for patients with diabetes mellitus. Details of the source code and implementation of our system can be obtained at the following link: "https://github.com/hamakamal/LSTM."
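As an illustration of training network weights with a genetic algorithm rather than back propagation, the following hypothetical sketch evolves the three weights of a single linear-threshold neuron on synthetic data. The chapter's actual models are LSTMs optimized with BBO and GA, which this toy does not reproduce:

```python
import random

random.seed(4)

# Synthetic two-feature classification data with a linear decision rule.
data = [((x1, x2), 1 if x1 + x2 > 1 else 0)
        for x1, x2 in [(random.random(), random.random()) for _ in range(200)]]

def fitness(w):
    # Classification accuracy of the candidate weight vector (w1, w2, bias).
    correct = 0
    for (x1, x2), y in data:
        correct += (w[0] * x1 + w[1] * x2 + w[2] > 0) == (y == 1)
    return correct / len(data)

pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(30)]
for _ in range(40):  # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # selection: keep the fittest third
    children = []
    for _ in range(20):
        a, b = random.sample(parents, 2)
        # Crossover (average of two parents) plus Gaussian mutation.
        children.append([(ai + bi) / 2 + random.gauss(0, 0.1) for ai, bi in zip(a, b)])
    pop = parents + children

best = max(pop, key=fitness)
print(f"best accuracy: {fitness(best):.2f}")
```

The appeal of GA and BBO here is that they only need a fitness score, not gradients, so they sidestep BP's slow convergence and local-minima issues at the cost of many fitness evaluations.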


Author(s):  
Yasser Khan

Telecommunication customer churn is considered a major cause of dropped revenue and customer-base erosion for voice, multimedia, and broadband service providers. There is a strong need to focus on understanding the contributory factors of churn. Here, factors from data sets obtained from major Pakistani telecom operators are applied for modeling. On the basis of the results obtained from the optimal techniques, a comparative technical evaluation is carried out. This research study mainly comprises the proposition of a conceptual framework for telecom customer churn that leads to the creation of a predictive model. This model is trained, tested, and evaluated on the given data set taken from the Pakistani telecom industry and has provided accurate and reliable outcomes. Of the four prevailing statistical and machine learning algorithms, the artificial neural network is declared the most reliable model, followed by the decision tree. Logistic regression is placed in last position when considering performance metrics such as accuracy, recall, precision, and the ROC curve. The results of the research reveal that the main parameters found responsible for customer churn were data rate, call failure rate, mean time to repair, and monthly billing amount. On the basis of these parameters, the artificial neural network achieved 79% more efficiency compared with the low-performing statistical techniques.


2020
Vol 633
pp. A46
Author(s):
L. Siltala
M. Granvik

Context. The bulk density of an asteroid informs us about its interior structure and composition. To constrain the bulk density, one needs an estimated mass of the asteroid. The mass is estimated by analyzing an asteroid’s gravitational interaction with another object, such as another asteroid during a close encounter. An estimate for the mass has typically been obtained with linearized least-squares methods, despite the fact that this family of methods is not able to properly describe non-Gaussian parameter distributions. In addition, the uncertainties reported for asteroid masses in the literature are sometimes inconsistent with each other and are suspected to be unrealistically low. Aims. We aim to present a Markov-chain Monte Carlo (MCMC) algorithm for the asteroid mass estimation problem based on asteroid-asteroid close encounters. We verify that our algorithm works correctly by applying it to synthetic data sets. We use astrometry available through the Minor Planet Center to estimate masses for a select few example cases and compare our results with results reported in the literature. Methods. Our mass-estimation method is based on the robust adaptive Metropolis algorithm that has been implemented into the OpenOrb asteroid orbit computation software. Our method has the built-in capability to analyze multiple perturbing asteroids and test asteroids simultaneously. Results. We find that our mass estimates for the synthetic data sets are fully consistent with the ground truth. The nominal masses for real example cases typically agree with the literature but tend to have greater uncertainties than what is reported in recent literature. Possible reasons for this include different astrometric data sets and weights, different test asteroids, different force models or different algorithms. For (16) Psyche, the target of NASA’s Psyche mission, our maximum likelihood mass is approximately 55% of what is reported in the literature. 
Such a low mass would imply that the bulk density is significantly lower than previously expected and thus disagrees with the theory of (16) Psyche being the metallic core of a protoplanet. We do, however, note that masses reported in the recent literature remain within our 3-sigma limits. Conclusions. The new MCMC mass-estimation algorithm performs as expected, but a rigorous comparison with results from a least-squares algorithm on the exact same data set remains to be done. The matters of uncertainties in comparison with other algorithms and of correlations between observations also warrant further investigation.
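The robust adaptive Metropolis algorithm used here is implemented in OpenOrb. As a simplified illustration of the underlying MCMC idea only, a plain (non-adaptive) Metropolis sampler can estimate a scalar parameter from synthetic observations; the "mass", noise level, and flat prior below are all toy assumptions:

```python
import math
import random

random.seed(3)

# Toy stand-in for mass estimation: infer a scalar "mass" from noisy observations.
true_mass, sigma = 4.0, 0.5
data = [random.gauss(true_mass, sigma) for _ in range(40)]

def log_likelihood(mass):
    # Gaussian log-likelihood of the observations (constant terms dropped).
    return sum(-0.5 * ((d - mass) / sigma) ** 2 for d in data)

samples, mass, step = [], 1.0, 0.2
for i in range(20000):
    proposal = mass + random.gauss(0, step)               # symmetric random-walk proposal
    log_alpha = log_likelihood(proposal) - log_likelihood(mass)
    if log_alpha >= 0 or random.random() < math.exp(log_alpha):  # Metropolis accept/reject
        mass = proposal
    if i >= 5000:                                         # discard burn-in
        samples.append(mass)

estimate = sum(samples) / len(samples)
print(f"posterior mean mass: {estimate:.2f}")
```

Unlike a linearized least-squares fit, the retained samples describe the full posterior, so skewed or otherwise non-Gaussian mass distributions are captured directly.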

