The Stock Exchange Prediction using Machine Learning Techniques: A Comprehensive and Systematic Literature Review

2021
Vol 14 (2)
pp. 91-112
Author(s):
Rico Bayu Wiranata
Arif Djunaidy

This literature review identifies and analyzes research topic trends, types of data sets, learning algorithms, method improvements, and frameworks used in stock exchange prediction. A total of 81 studies on stock prediction, published between January 2015 and June 2020 and meeting the inclusion and exclusion criteria, were investigated. The literature review methodology was carried out in three major phases: review planning, implementation, and report preparation, comprising nine steps from defining the systematic review requirements to presenting the results. Estimation or regression, clustering, association, classification, and preprocessing analysis of data sets are the five main focuses revealed in the primary studies of stock prediction research. The classification method accounts for 35.80% of the related studies, the estimation method for 56.79%, and data analytics for 4.94%, while the remainder, clustering and association, accounts for 1.23%. Furthermore, technical indicator data sets are used in 74.07% of the studies; the rest use combinations of data sets. To develop stock prediction models, 48 different methods have been applied, of which the nine most widely applied were identified. The methods with the best accuracy and smallest error rates include SVM, DNN, CNN, RNN, LSTM, bagging ensembles such as RF, boosting ensembles such as XGBoost, ensemble majority voting, and the meta-learner approach of ensemble stacking. Several techniques are proposed to improve prediction accuracy: combining several methods, using boosting algorithms, adding feature selection, and applying parameter and hyper-parameter optimization.
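Several of the best-performing approaches listed above are ensembles combined by majority vote. As a minimal, self-contained sketch (entirely synthetic data, not from any of the reviewed studies), the following shows why combining several independent weak classifiers by majority vote can beat each individual one:

```python
import random

random.seed(0)

def noisy_predictor(labels, error_rate):
    # An independent weak classifier: flips each true label with probability error_rate.
    return [y if random.random() > error_rate else 1 - y for y in labels]

def majority_vote(predictions):
    # Combine per-example votes from several classifiers into a single prediction.
    return [1 if sum(votes) > len(votes) / 2 else 0 for votes in zip(*predictions)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth = [random.randint(0, 1) for _ in range(2000)]
preds = [noisy_predictor(truth, 0.3) for _ in range(9)]  # nine weak learners, ~70% accurate each

individual = [accuracy(p, truth) for p in preds]
ensemble = accuracy(majority_vote(preds), truth)
print(f"mean individual accuracy: {sum(individual) / len(individual):.3f}")
print(f"majority-vote accuracy:   {ensemble:.3f}")
```

Because the nine error patterns are independent, the vote is wrong only when at least five learners err on the same example, which is much rarer than any single learner erring.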

2021
Author(s):
ElMehdi SAOUDI
Said Jai Andaloussi

Abstract With the rapid growth of the volume of video data and the development of multimedia technologies, it has become necessary to be able to accurately and quickly browse and search through information stored in large multimedia databases. For this purpose, content-based video retrieval (CBVR) has become an active area of research over the last decade. In this paper, we propose a content-based video retrieval system that provides similar videos from a large multimedia data set based on a query video. The approach uses vector motion-based signatures to describe the visual content and machine learning techniques to extract key frames for rapid browsing and efficient video indexing. We have implemented the proposed approach on both a single machine and a real-time distributed cluster to evaluate the real-time performance aspect, especially when the number and size of videos are large. Experiments are performed using various benchmark action and activity recognition data sets, and the results reveal the effectiveness of the proposed method in both accuracy and processing time compared to state-of-the-art methods.


2020
pp. 609-623
Author(s):
Arun Kumar Beerala
Gobinath R.
Shyamala G.
Siribommala Manvitha

Water is the most valuable natural resource for all living things and the ecosystem. The quality of groundwater changes due to changes in the ecosystem, industrialisation, urbanisation, etc. In the study, 60 samples were taken and analysed for various physico-chemical parameters. The sampling locations were located using the global positioning system (GPS), and samples were taken over two consecutive years, 2016-2017 and 2017-2018, for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). pH, EC, and TDS were obtained in the field. Hardness and chloride were determined using the titration method. Nitrate and sulphate were determined using a spectrophotometer. Machine learning techniques were used to train on the data set and to predict the unknown values. The dominant elements of the groundwater are as follows: Ca²⁺ and Mg²⁺ for cations and Cl⁻, SO₄²⁻, and NO₃⁻ for anions. The regression value for the training data set was found to be 0.90596, and for the entire network it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.


Author(s):  
Mohamed Elhadi Rahmani
Abdelmalek Amine
Reda Mohamed Hamou

Many drugs in modern medicine originate from plants, and the first step in drug production is the recognition of the plants needed for this purpose. This article presents a bagging approach for medicinal plant recognition based on DNA sequences. In this work, the authors have developed a system that recognizes the DNA sequences of 14 medicinal plants. First, they divided the 14-class data set into two-class sub-data sets; then, instead of using an algorithm to classify the 14-class data set directly, they used the same algorithm to classify the sub-data sets. By doing so, they simplified the problem of classifying 14 plants into sub-problems of two-class classification. To construct the subsets, the authors extracted all possible pairs of the 14 classes, giving each class more chances to be well predicted. This approach also allows the study of the similarity between the DNA sequences of each pair of plants. In terms of results, the authors obtained very good results, with the accuracy almost doubled (from 45% to almost 80%). Classification of a new sequence is completed by majority vote.
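The pairwise decomposition described above (one binary classifier per pair of classes, combined by majority vote) can be sketched as follows. This is a hypothetical toy in one dimension with a trivial midpoint classifier standing in for the authors' DNA-sequence learner:

```python
from itertools import combinations
import random

random.seed(1)

# Toy stand-in for per-class features: 1-D points clustered around a center per class.
classes = {0: 0.0, 1: 5.0, 2: 10.0}
train = [(cls, center + random.gauss(0, 1.0))
         for cls, center in classes.items() for _ in range(30)]

def fit_pairwise(train, a, b):
    # A minimal two-class learner: threshold at the midpoint of the two class means.
    xs_a = [x for c, x in train if c == a]
    xs_b = [x for c, x in train if c == b]
    mean_a, mean_b = sum(xs_a) / len(xs_a), sum(xs_b) / len(xs_b)
    threshold = (mean_a + mean_b) / 2
    return lambda x: a if (x < threshold) == (mean_a < mean_b) else b

# One binary classifier per unordered pair of classes, as in the article's decomposition.
pairwise = {(a, b): fit_pairwise(train, a, b) for a, b in combinations(classes, 2)}

def predict(x):
    # Each pairwise classifier casts one vote; the majority wins.
    votes = [clf(x) for clf in pairwise.values()]
    return max(set(votes), key=votes.count)

print(predict(0.3), predict(5.2), predict(9.7))
```

With 14 classes, the same construction yields 91 pairwise classifiers, each voting on every new sequence.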


Author(s):  
MUSTAPHA LEBBAH
YOUNÈS BENNANI
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis, and visualization of multivariate binary data, or of categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by Bernoulli distributions. Each cell is characterized by a prototype with the same binary coding as used in the data space and by the probability of being different from this prototype. The learning algorithm that we propose, a Bernoulli self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data-set repository. The results show good quality of the topological ordering and homogeneous clustering.
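Dropping the topographic ordering for brevity, the Bernoulli modeling idea (a binary prototype plus a probability of differing from it) can be sketched as a plain Bernoulli mixture fitted with EM on synthetic binary data. This illustrates the distributional assumption only, not the authors' full self-organizing map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data from two hidden prototypes: each cluster is a binary
# prototype whose bits are flipped with probability 0.1 (hypothetical data).
protos_true = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]])
X = np.vstack([np.abs(p - (rng.random((100, 6)) < 0.1)) for p in protos_true]).astype(float)

K, (n, d) = 2, X.shape
theta = rng.uniform(0.3, 0.7, (K, d))   # per-cluster Bernoulli parameters
pi = np.full(K, 1 / K)                  # mixing weights

for _ in range(50):  # EM iterations
    # E-step: responsibilities from Bernoulli log-likelihoods.
    log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and Bernoulli parameters.
    nk = resp.sum(axis=0)
    pi = nk / n
    theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)

# Rounding the learned Bernoulli means recovers the binary prototypes.
print(np.round(theta).astype(int))
```

In the paper's formulation, each map cell carries such a prototype, and the EM updates are additionally smoothed by the map's neighborhood function to obtain the topological ordering.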


Author(s):  
Vasilii Feofanov
Emilie Devijver
Massih-Reza Amini

In this paper, we propose a transductive bound on the risk of the majority vote classifier learned from partially labeled data for multi-class classification. The bound is obtained by considering the class confusion matrix as an error indicator, and it involves the margin distribution of the classifier over each class and a bound on the risk of the associated Gibbs classifier. When this latter bound is tight and the errors of the majority vote classifier per class are concentrated in a low-margin zone, we prove that the bound on the Bayes classifier's risk is tight. As an application, we extend the self-learning algorithm to the multi-class case. The algorithm iteratively assigns pseudo-labels to the subset of unlabeled training examples whose class margins are above a threshold obtained from the proposed transductive bound. Empirical results on different data sets show the effectiveness of our approach compared to the same algorithm with a manually fixed threshold, to the extension of TSVM to multi-class classification, and to a graph-based semi-supervised algorithm.
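The self-learning loop described above (pseudo-label only the examples whose margin exceeds a threshold) can be sketched as follows. The 1-D base learner and the fixed threshold are hypothetical stand-ins, since the paper derives its threshold from the transductive bound rather than fixing it by hand:

```python
import random

random.seed(2)

# Hypothetical 1-D data: two classes around -2 and +2; only four points are labeled.
labeled = [(-2.1, 0), (-1.8, 0), (2.0, 1), (2.3, 1)]
unlabeled = [random.gauss(-2, 0.5) for _ in range(50)] + \
            [random.gauss(2, 0.5) for _ in range(50)]

def fit(labeled):
    # Minimal base learner: decision boundary at the midpoint of the class means.
    m0 = sum(x for x, y in labeled if y == 0) / sum(1 for _, y in labeled if y == 0)
    m1 = sum(x for x, y in labeled if y == 1) / sum(1 for _, y in labeled if y == 1)
    boundary = (m0 + m1) / 2
    return lambda x: (1 if x > boundary else 0, abs(x - boundary))  # (label, margin)

threshold = 1.0  # stand-in for the threshold the paper derives from its bound
while unlabeled:
    clf = fit(labeled)
    confident = [(x, clf(x)[0]) for x in unlabeled if clf(x)[1] >= threshold]
    if not confident:
        break
    labeled += confident  # move high-margin pseudo-labeled points into the training set
    unlabeled = [x for x in unlabeled if clf(x)[1] < threshold]

print(f"{len(labeled)} labeled after self-training, {len(unlabeled)} left below threshold")
```

Only high-margin points are ever pseudo-labeled, so label noise from the classifier's uncertain region near the boundary is kept out of the growing training set.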


CONVERTER
2021
pp. 598-605
Author(s):  
Zhao Jianchao

Behind the rapid development of the Internet industry, Internet security has become a hidden danger. In recent years, the outstanding performance of deep learning in classification and behavior prediction based on massive data has led people to study how to apply deep learning technology. Therefore, this paper attempts to apply deep learning to intrusion detection, to learn and classify network attacks. For the NSL-KDD data set, this paper first uses traditional classification methods and several different deep learning algorithms for learning and classification. It deeply analyzes the correlations among the data sets, algorithm characteristics, and experimental classification results, and identifies the deep learning algorithm that performs relatively well. Then, a normalized coding algorithm is proposed. The experimental results show that the algorithm can improve the detection accuracy and reduce the false alarm rate.
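The paper's normalized coding algorithm is not detailed in this abstract. As general background only, a common preprocessing step for NSL-KDD-style records is to min-max scale numeric fields and one-hot encode categorical ones; the field names and values below are illustrative, not taken from the data set:

```python
# Hypothetical NSL-KDD-style records: numeric fields plus a categorical protocol field.
records = [
    {"duration": 0, "src_bytes": 491, "protocol": "tcp"},
    {"duration": 2, "src_bytes": 146, "protocol": "udp"},
    {"duration": 0, "src_bytes": 0, "protocol": "icmp"},
]

def min_max(values):
    # Scale a numeric column to [0, 1]; constant columns map to 0.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

protocols = sorted({r["protocol"] for r in records})
durations = min_max([r["duration"] for r in records])
src_bytes = min_max([r["src_bytes"] for r in records])

def encode(i):
    # Scaled numeric fields followed by a one-hot code for the categorical field.
    onehot = [1.0 if records[i]["protocol"] == p else 0.0 for p in protocols]
    return [durations[i], src_bytes[i]] + onehot

encoded = [encode(i) for i in range(len(records))]
print(encoded)
```

Normalizing like this keeps large-range counters such as byte counts from dominating distance- and gradient-based learners.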


2020
Vol 10 (12)
pp. 4085
Author(s):
Alla Marchenko
Alenka Temeljotov-Salaj

Since 1997, scientists have been trying to utilize new non-invasive approaches for thermal discomfort detection, which promise to be more effective than frameworks that need direct responses from users. Due to rapid technological development in the bio-metrical field, a systematic literature review was performed to investigate the possibility of thermal discomfort detection in the workplace by non-invasive means using bio-sensing technology. Firstly, the problem, intervention, comparison, outcome, context (PICOC) framework was introduced in the study to identify the main points for meta-analysis and, in turn, to provide relevant keywords for the literature search. In total, 2776 studies were found and processed using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) methodology. After filtering by the defined criteria, 35 articles were obtained for detailed investigation with respect to the facility types used in the experiments, the number of people involved in data collection, and the algorithms used for prediction of the thermal discomfort event. The study concludes that there is potential for the creation of non-invasive thermal discomfort detection models via bio-sensing technologies, which would provide better user interaction with the built environment, potentially decrease energy use, and enable better productivity. There is definitely room for improvement within the field of non-invasive thermal discomfort detection, especially with respect to data collection, algorithm implementation, and sample size, in order to create opportunities for the deployment of developed solutions in real life. Based on the literature review, novel technology shows potential for a more intelligent approach to non-invasive thermal discomfort prediction. The architecture of deep neural networks should be studied further due to the specifics of their hidden layers and their ability to perform hierarchical data extraction. This machine learning approach can provide a better model for thermal discomfort detection based on data sets with different types of bio-metrical variables.


Author(s):  
Tarik A. Rashid
Mohammad K. Hassan
Mokhtar Mohammadi
Kym Fraser

Recently, the population of the world has increased, along with health problems. Diabetes mellitus, for example, affects the health of many patients globally. The task of this chapter is to develop a dynamic and intelligent decision support system for patients with different diseases, and it aims at examining machine-learning techniques supported by optimization techniques. Artificial neural networks have been used in healthcare for several decades. Most research works utilize a multilayer perceptron (MLP) trained with the back propagation (BP) learning algorithm to achieve diabetes mellitus classification. Nonetheless, MLP has some drawbacks, such as slow convergence, susceptibility to local minima during training, poor scalability, and unsuitability for time-series data sets. To overcome these drawbacks, long short-term memory (LSTM), a more advanced form of recurrent neural network, is suggested. In this chapter, an adaptable LSTM trained with two optimization algorithms instead of the back propagation learning algorithm is presented. The optimization algorithms are biogeography-based optimization (BBO) and the genetic algorithm (GA). One data set was collected locally, and a benchmark data set is used as well. Finally, the data sets are fed into the adaptable models, LSTM with BBO (LSTMBBO) and LSTM with GA (LSTMGA), for classification purposes. The experimental and testing results are compared, and they are promising. This system helps physicians and doctors to provide proper health treatment for patients with diabetes mellitus. Details of the source code and implementation of our system can be obtained at the following link: "https://github.com/hamakamal/LSTM."
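As an illustration of training network weights with a genetic algorithm rather than back propagation, the following hypothetical sketch evolves the three weights of a single linear-threshold neuron on synthetic data. The chapter's actual models are LSTMs optimized with BBO and GA, which this toy does not reproduce:

```python
import random

random.seed(4)

# Synthetic two-feature classification data with a linear decision rule.
data = [((x1, x2), 1 if x1 + x2 > 1 else 0)
        for x1, x2 in [(random.random(), random.random()) for _ in range(200)]]

def fitness(w):
    # Classification accuracy of the candidate weight vector (w1, w2, bias).
    correct = 0
    for (x1, x2), y in data:
        correct += (w[0] * x1 + w[1] * x2 + w[2] > 0) == (y == 1)
    return correct / len(data)

pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(30)]
for _ in range(40):  # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # selection: keep the fittest third
    children = []
    for _ in range(20):
        a, b = random.sample(parents, 2)
        # Crossover (average of two parents) plus Gaussian mutation.
        children.append([(ai + bi) / 2 + random.gauss(0, 0.1) for ai, bi in zip(a, b)])
    pop = parents + children

best = max(pop, key=fitness)
print(f"best accuracy: {fitness(best):.2f}")
```

The appeal of GA and BBO here is that they only need a fitness score, not gradients, so they sidestep BP's slow convergence and local-minima issues at the cost of many fitness evaluations.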


Author(s):  
Yasser Khan

Telecommunication customer churn is considered a major cause of dropped revenue and customer-base erosion for voice, multimedia, and broadband service providers. There is a strong need to focus on understanding the contributory factors of churn. Here, factors from data sets obtained from major Pakistani telecom operators are applied for modeling. On the basis of the results obtained from the optimal techniques, a comparative technical evaluation is carried out. This research study mainly comprises the proposition of a conceptual framework for telecom customer churn that leads to the creation of a predictive model. This model is trained, tested, and evaluated on the given data set taken from the Pakistani telecom industry and has provided accurate and reliable outcomes. Of the four prevailing statistical and machine learning algorithms, the artificial neural network is declared the most reliable model, followed by the decision tree. Logistic regression is placed in last position when considering performance metrics such as accuracy, recall, precision, and the ROC curve. The results of the research reveal that the main parameters found responsible for customer churn were data rate, call failure rate, mean time to repair, and monthly billing amount. On the basis of these parameters, the artificial neural network achieved 79% more efficiency compared with the low-performing statistical techniques.


2020
Vol 633
pp. A46
Author(s):
L. Siltala
M. Granvik

Context. The bulk density of an asteroid informs us about its interior structure and composition. To constrain the bulk density, one needs an estimated mass of the asteroid. The mass is estimated by analyzing an asteroid’s gravitational interaction with another object, such as another asteroid during a close encounter. An estimate for the mass has typically been obtained with linearized least-squares methods, despite the fact that this family of methods is not able to properly describe non-Gaussian parameter distributions. In addition, the uncertainties reported for asteroid masses in the literature are sometimes inconsistent with each other and are suspected to be unrealistically low. Aims. We aim to present a Markov-chain Monte Carlo (MCMC) algorithm for the asteroid mass estimation problem based on asteroid-asteroid close encounters. We verify that our algorithm works correctly by applying it to synthetic data sets. We use astrometry available through the Minor Planet Center to estimate masses for a select few example cases and compare our results with results reported in the literature. Methods. Our mass-estimation method is based on the robust adaptive Metropolis algorithm that has been implemented into the OpenOrb asteroid orbit computation software. Our method has the built-in capability to analyze multiple perturbing asteroids and test asteroids simultaneously. Results. We find that our mass estimates for the synthetic data sets are fully consistent with the ground truth. The nominal masses for real example cases typically agree with the literature but tend to have greater uncertainties than what is reported in recent literature. Possible reasons for this include different astrometric data sets and weights, different test asteroids, different force models or different algorithms. For (16) Psyche, the target of NASA’s Psyche mission, our maximum likelihood mass is approximately 55% of what is reported in the literature. 
Such a low mass would imply that the bulk density is significantly lower than previously expected and thus disagrees with the theory of (16) Psyche being the metallic core of a protoplanet. We do, however, note that masses reported in the recent literature remain within our 3-sigma limits. Conclusions. The new MCMC mass-estimation algorithm performs as expected, but a rigorous comparison with results from a least-squares algorithm on the exact same data set remains to be done. The matters of uncertainties in comparison with other algorithms and of correlations between observations also warrant further investigation.
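The robust adaptive Metropolis algorithm used here is implemented in OpenOrb. As a simplified illustration of the underlying MCMC idea only, a plain (non-adaptive) Metropolis sampler can estimate a scalar parameter from synthetic observations; the "mass", noise level, and flat prior below are all toy assumptions:

```python
import math
import random

random.seed(3)

# Toy stand-in for mass estimation: infer a scalar "mass" from noisy observations.
true_mass, sigma = 4.0, 0.5
data = [random.gauss(true_mass, sigma) for _ in range(40)]

def log_likelihood(mass):
    # Gaussian log-likelihood of the observations (constant terms dropped).
    return sum(-0.5 * ((d - mass) / sigma) ** 2 for d in data)

samples, mass, step = [], 1.0, 0.2
for i in range(20000):
    proposal = mass + random.gauss(0, step)               # symmetric random-walk proposal
    log_alpha = log_likelihood(proposal) - log_likelihood(mass)
    if log_alpha >= 0 or random.random() < math.exp(log_alpha):  # Metropolis accept/reject
        mass = proposal
    if i >= 5000:                                         # discard burn-in
        samples.append(mass)

estimate = sum(samples) / len(samples)
print(f"posterior mean mass: {estimate:.2f}")
```

Unlike a linearized least-squares fit, the retained samples describe the full posterior, so skewed or otherwise non-Gaussian mass distributions are captured directly.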

