A Method to Identify Anomalies in Stock Market Trading Based on Probabilistic Machine Learning

2019 ◽  
Vol 2 (2) ◽  
pp. 42
Author(s):  
Paulo Andre Lima De Castro ◽  
Anderson R.B. Teodoro

Financial operations involve a significant amount of resources and can directly or indirectly affect the lives of virtually all people. For efficiency and transparency in this context, it is essential to identify financial crimes and to punish those responsible. However, the large number of operations makes analysis performed exclusively by humans unfeasible, so the application of automated data-analysis techniques is essential. Within this scenario, this work presents a method that identifies anomalies that may be associated with stock exchange operations prohibited by law. Specifically, we seek to find patterns related to insider trading, a type of operation that can generate large losses for investors. In this work, information made publicly available by the SEC and CVM, based on real cases from the BOVESPA, NYSE and NASDAQ stock exchanges, is used as a training base. The method includes the creation of several candidate variables and the identification of the relevant ones. With this definition, classifiers based on decision trees and Bayesian networks are constructed and subsequently evaluated and selected. The computational cost of performing such tasks can be quite significant, and it grows quickly with the amount of analyzed data. For this reason, the method considers the use of machine learning algorithms distributed over a computational cluster. To perform such tasks, we use the Weka framework with modules that allow the processing load to be distributed across a Hadoop cluster. The use of a computational cluster to execute learning algorithms on large amounts of data has been an active area of research, and this work contributes to data analysis in the specific context of financial operations. The obtained results show the feasibility of the approach, although the quality of the results is limited by the exclusive use of publicly available data.
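The abstract does not include the authors' implementation. As a rough illustration of the simplest Bayesian classifier family mentioned (a Gaussian naive Bayes, i.e. a Bayesian network with no inter-feature edges), a pure-Python sketch might look like the following. The candidate variables (pre-announcement volume spike, price run-up), the data and the labels are all invented for illustration, not taken from the paper:

```python
import math

# Hypothetical training data: [volume_spike, price_runup] per trade window,
# labeled 1 (suspicious, e.g. flagged in SEC/CVM cases) or 0 (normal).
train = [
    ([5.1, 0.12], 1), ([4.3, 0.09], 1), ([6.0, 0.15], 1),
    ([1.0, 0.01], 0), ([0.8, -0.02], 0), ([1.2, 0.02], 0),
]

def fit_gaussian_nb(data):
    """Estimate per-class feature means, variances and priors."""
    stats = {}
    for label in {y for _, y in data}:
        rows = [x for x, y in data if y == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                 for col, m in zip(zip(*rows), means)]
        stats[label] = (means, varis, n / len(data))
    return stats

def log_gauss(x, m, v):
    # Log-density of N(m, v) at x.
    return -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)

def predict(stats, x):
    # Pick the class maximizing log prior + sum of per-feature log-likelihoods.
    scores = {
        label: math.log(prior) + sum(log_gauss(xi, m, v)
                                     for xi, m, v in zip(x, means, varis))
        for label, (means, varis, prior) in stats.items()
    }
    return max(scores, key=scores.get)

model = fit_gaussian_nb(train)
print(predict(model, [5.5, 0.11]))  # → 1 (suspicious)
print(predict(model, [0.9, 0.00]))  # → 0 (normal)
```

In practice the paper's pipeline evaluates such classifiers over many engineered candidate variables and distributes training via Weka on Hadoop; this sketch only shows the classification step on two toy features.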

PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0252104
Author(s):  
Saeed Mian Qaisar

Significant losses can occur for various smart grid stakeholders due to Power Quality Disturbances (PQDs). Therefore, it is necessary to recognize PQDs correctly and mitigate them in a timely manner. In this context, an emerging trend is the development of machine-learning-assisted PQD management. Based on conventional processing theory, existing PQD identification is time-invariant, which can result in a huge amount of unnecessary information being collected, processed, and transmitted. Consequently, needless processing activity, power consumption and latency can occur. In this paper, a novel combination of signal-piloted acquisition, adaptive-rate segmentation and time-domain feature extraction with machine learning tools is suggested. The signal-piloted acquisition and processing brings real-time compression, so a remarkable reduction can be secured in the data storage, processing and transmission requirements towards the post classifier, along with a reduced computational cost and latency of the classifier. The classification is accomplished using robust machine learning algorithms: a comparison is made among the k-Nearest Neighbor, Naïve Bayes, Artificial Neural Network and Support Vector Machine classifiers. Multiple metrics are used to assess classification success, which avoids any bias in the findings. The applicability of the suggested approach is studied for automated recognition of a power signal's major voltage and transient disturbances. Results show that the system attains a 6.75-fold reduction in the collected information and the processing load while securing a classification accuracy of 98.05%.
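As an illustration of the simplest of the compared classifiers, a k-Nearest Neighbor rule over hypothetical time-domain features (here RMS and crest factor per adaptive-rate segment) could be sketched as follows. The feature values, class names and the choice of k are invented for illustration and are not from the paper:

```python
import math
from collections import Counter

def euclid(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, x, k=3):
    # Majority vote among the k nearest labelled segments.
    nearest = sorted(train, key=lambda row: euclid(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical [rms, crest_factor] features for labelled segments.
train = [
    ([1.00, 1.41], "normal"),    ([1.02, 1.40], "normal"),
    ([0.60, 1.45], "sag"),       ([0.55, 1.50], "sag"),
    ([1.05, 3.20], "transient"), ([1.10, 3.00], "transient"),
]
print(knn_predict(train, [0.58, 1.47]))  # → sag
print(knn_predict(train, [1.00, 3.10]))  # → transient
```

The paper's compression gain comes from the upstream signal-piloted acquisition: the classifier only ever sees features of the adaptively selected segments, not the raw waveform.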


2020 ◽  
pp. 1-67
Author(s):  
David Lubo-Robles ◽  
Thang Ha ◽  
Sivaramakrishnan Lakshmivarahan ◽  
Kurt J. Marfurt ◽  
Matthew J. Pranter

Machine learning algorithms such as principal component analysis (PCA), independent component analysis (ICA), self-organizing maps (SOM), and artificial neural networks (ANN) have been used by geoscientists not only to accelerate the interpretation of their data, but also to provide a more quantitative estimate of the likelihood that any voxel belongs to a given facies. Identifying the best combination of attributes needed to perform either supervised or unsupervised machine learning tasks continues to be the question interpreters ask most often. In past decades, stepwise regression and genetic algorithms have been used together with supervised learning algorithms to select the best number and combination of attributes. For reasons of computational efficiency, these techniques do not test all the seismic attribute combinations, potentially leading to a suboptimal classification. In this study, we develop an exhaustive probabilistic neural network (PNN) algorithm which exploits the PNN's capacity for exploring non-linear relationships to obtain the optimal attribute subset that best differentiates the target seismic facies of interest. We show the efficacy of our proposed workflow in differentiating salt from non-salt seismic facies in a Eugene Island seismic survey, offshore Louisiana. We find that from seven input candidate attributes, the exhaustive PNN is capable of removing irrelevant attributes by selecting a smaller subset of four seismic attributes. The enhanced classification using fewer attributes also reduces the computational cost. We then use the resulting facies probability volumes to construct the 3D distribution of the salt diapir geobodies embedded in a stratigraphic matrix.
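The core idea of the exhaustive search, decoupled from the PNN itself, can be sketched as follows. Here a leave-one-out nearest-centroid scorer stands in for the PNN (the paper uses the PNN's cross-validated performance), and the toy attributes and facies labels are hypothetical:

```python
from itertools import combinations

def loo_accuracy(data, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to attribute indices `cols` (stand-in for the PNN scorer)."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        cents = {}
        for label in {lab for _, lab in rest}:
            rows = [[r[c] for c in cols] for r, lab in rest if lab == label]
            cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
        pred = min(cents, key=lambda lab: sum(
            (x[c] - m) ** 2 for c, m in zip(cols, cents[lab])))
        correct += pred == y
    return correct / len(data)

def exhaustive_select(data, n_attrs):
    """Score every non-empty attribute combination; keep the best."""
    return max((subset
                for r in range(1, n_attrs + 1)
                for subset in combinations(range(n_attrs), r)),
               key=lambda s: loo_accuracy(data, s))

# Toy data: attribute 0 separates the facies, attributes 1 and 2 are noise.
data = [([0.1, 5.0, 9.1], "salt"),  ([0.2, 4.8, 1.2], "salt"),
        ([0.9, 5.1, 8.8], "shale"), ([1.0, 4.9, 1.0], "shale")]
print(exhaustive_select(data, 3))  # subset containing attribute 0
```

Exhaustiveness is what distinguishes this from stepwise selection: all 2^n − 1 subsets are scored, which is why the paper leans on the PNN's modest training cost per subset.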


Author(s):  
P. Sai Shankar ◽  
M. Krishna Reddy

Forecasting is a management function that assists decision making; it is the process of estimation in unknown future situations, commonly known as prediction, and typically refers to the estimation of time series or longitudinal data. The main objective of this paper is to compare a traditional time series model with machine learning algorithms for predicting gold prices based on economic factors such as inflation, the exchange rate, the crude oil price, the bank rate, the repo rate, the reverse repo rate, the gold reserve ratio, and the Bombay and National stock exchange indices. Two lagged variables are taken for each variable in the analysis. An ARIMAX model is developed to forecast Indian gold prices using daily data for the period 2016 to 2020 obtained from the World Gold Council. We fitted the ARIMAX(4,1,1) model, which exhibited the lowest AIC value, to our data. Meanwhile, decision tree, random forest, lasso regression, ridge regression, XGB and ensemble models were also examined to forecast gold prices based on the host of explanatory variables. The forecasting performance of the models was evaluated using the mean absolute error, mean absolute percentage error and root mean squared error. The ensemble model outperforms the other models in predicting gold prices from the set of explanatory variables.
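The construction of two lagged variables per predictor, and a simple averaging ensemble (one plausible reading of the paper's ensemble model, not the authors' exact specification), can be sketched as follows. The series names and values are invented placeholders:

```python
def build_lagged(series_dict, lags=2):
    """Build rows [x_{t-1}, x_{t-2}, ...] for each exogenous series,
    aligned with the target (gold price) at time t."""
    n = len(next(iter(series_dict.values())))
    rows = []
    for t in range(lags, n):
        row = []
        for name in sorted(series_dict):  # fixed column order
            row.extend(series_dict[name][t - k] for k in range(1, lags + 1))
        rows.append(row)
    return rows

def ensemble_predict(models, x):
    """Average the forecasts of several fitted models."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Hypothetical exogenous series (crude oil price, inflation index).
series = {"crude": [10, 20, 30, 40], "inflation": [1, 2, 3, 4]}
print(build_lagged(series, 2))  # [[20, 10, 2, 1], [30, 20, 3, 2]]
```

An ARIMAX(4,1,1) fit would then regress the differenced gold price on its own four AR lags, one MA term and these exogenous columns; libraries such as statsmodels provide this directly, so the sketch above only shows the data layout.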


2021 ◽  
Vol 12 (1) ◽  
pp. 9
Author(s):  
John Gajardo ◽  
Marco Mora ◽  
Guillermo Valdés-Nicolao ◽  
Marcos Carrasco-Benavides

Sentinel-2 satellite images allow high separability for mapping burned and unburned areas. This problem has been extensively addressed using machine-learning algorithms; however, these need a suitable dataset and entail considerable training time. Recently, extreme learning machines (ELM) have shown high precision in classification and regression problems at low computational cost. This paper proposes evaluating ELM for mapping burned areas and comparing it with other broadly used machine-learning algorithms. Several indices, metrics and training times were used to assess the performance of the algorithms. Considering the average over the datasets, the best performance was obtained by random forest (DICE = 0.93; omission and commission = 0.08) and ELM (DICE = 0.90; omission and commission = 0.07). The shortest training times were obtained by ELM (1.45 s) and logistic regression (1.85 s). According to the results, ELM was the best burned-area classification algorithm when both precision and training time are considered, evidencing great potential for mapping burned areas at global scales with medium-high spatial resolution images. This information is essential to fire-risk systems and burned-area records used to design prevention and fire-combat strategies, and it provides valuable knowledge on the effect of fires on the landscape and atmosphere.
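ELM's low training cost comes from its structure: input weights are drawn at random and never trained, and only the output weights are fitted, by a single linear least-squares solve. A minimal pure-Python sketch (hypothetical two-feature pixels labelled +1 burned / −1 unburned; the ridge term, hidden size and data are arbitrary choices, not the paper's configuration):

```python
import math, random

def gauss_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def elm_train(X, y, hidden=20, seed=0):
    """Minimal ELM: random input weights, tanh hidden layer, output
    weights fitted by ridge-regularized least squares."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]

    def hidden_out(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                for row, bi in zip(W, b)]

    H = [hidden_out(x) for x in X]
    # Normal equations: (H^T H + ridge * I) beta = H^T y
    A = [[sum(H[n][i] * H[n][j] for n in range(len(H)))
          + (1e-3 if i == j else 0.0) for j in range(hidden)]
         for i in range(hidden)]
    rhs = [sum(H[n][i] * y[n] for n in range(len(H))) for i in range(hidden)]
    beta = gauss_solve(A, rhs)
    return lambda x: sum(w * h for w, h in zip(beta, hidden_out(x)))

# Hypothetical [burn_index, vegetation_index] pixels.
X = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.15], [0.1, 0.8], [0.2, 0.7], [0.15, 0.9]]
y = [1, 1, 1, -1, -1, -1]
f = elm_train(X, y, hidden=10)
print(f([0.85, 0.1]) > 0, f([0.1, 0.85]) > 0)  # burned?, burned?
```

Because the only fitted parameters are solved in closed form, training is orders of magnitude faster than iteratively trained networks, which is consistent with the 1.45 s figure reported above.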


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics, and the number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosted trees and random forests) are employed, and the impact of each variable group is observed by comparing the performance of the models. Then the number of target classes is increased to five and to eight, and the results are compared with previously developed models in the literature.
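The discretization step (turning raw attendance counts into three, five or eight ordinal classes) is commonly done with quantile cut-offs so that classes are balanced. A sketch, with invented attendance figures and no claim that this is the paper's exact binning rule:

```python
def quantile_bins(values, n_bins=3):
    """Return a labelling function that maps a count to one of n_bins
    classes using quantile cut-offs from the training values."""
    ordered = sorted(values)
    cuts = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    def label(v):
        # Count how many cut-offs v meets: 0 = low, ..., n_bins-1 = high.
        return sum(v >= c for c in cuts)
    return label

# Hypothetical attendance counts (thousands) for six movies.
label = quantile_bins([120, 45, 900, 300, 15, 2200], 3)
print(label(50), label(400), label(5000))  # → 0 1 2
```

Increasing `n_bins` to 5 or 8 reproduces the study's harder classification settings without changing the rest of the pipeline.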


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system places certain requirements on the intelligent scoring system, and the most difficult stage of intelligent scoring in an English test is scoring the English composition with an intelligent model. In order to improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional-neural-network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The research results show that the proposed algorithm has a practical effect and can be applied to English assessment and online homework evaluation systems.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms.  Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.
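The claim can be illustrated with an instance-based learner: the checkerboard label has a one-line closed form that a human states exactly after seeing a few examples, while a 1-nearest-neighbor classifier trained on a sparse sample hovers near (or, with adversarial sampling, below) chance. The grid size and sample counts below are arbitrary choices for illustration:

```python
import math, random

rng = random.Random(42)

def checker_label(x, y):
    # The "human" rule: parity of the unit-cell coordinates.
    return (math.floor(x) + math.floor(y)) % 2

# Sparse training sample: 50 points over an 8x8 grid of cells.
train = [((x, y), checker_label(x, y))
         for x, y in ((rng.uniform(0, 8), rng.uniform(0, 8))
                      for _ in range(50))]
test = [(rng.uniform(0, 8), rng.uniform(0, 8)) for _ in range(200)]

def nn_predict(p):
    # 1-nearest-neighbor: copy the label of the closest training point.
    (_, lab) = min(train, key=lambda t: (t[0][0] - p[0]) ** 2
                                        + (t[0][1] - p[1]) ** 2)
    return lab

human_acc = sum(checker_label(x, y) == checker_label(x, y)
                for x, y in test) / len(test)  # rule is exact: 1.0
nn_acc = sum(nn_predict((x, y)) == checker_label(x, y)
             for x, y in test) / len(test)
print(human_acc, nn_acc)  # 1.0 vs. near-chance
```

The failure mode is that the nearest training point frequently lies in a neighboring cell of opposite parity, so local similarity, the inductive bias of most standard learners, carries almost no information about the label.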

