A Method to Identify Anomalies in Stock Market Trading Based on Probabilistic Machine Learning

2019 ◽  
Vol 2 (2) ◽  
pp. 42
Author(s):  
Paulo Andre Lima De Castro ◽  
Anderson R.B. Teodoro

Financial operations involve a significant amount of resources and can directly or indirectly affect the lives of virtually all people. For efficiency and transparency in this context, it is essential to identify financial crimes and to punish those responsible. However, the large number of operations makes analysis performed exclusively by humans unfeasible, so the application of automated data-analysis techniques is essential. Within this scenario, this work presents a method that identifies anomalies that may be associated with stock exchange operations prohibited by law. Specifically, we seek to find patterns related to insider trading, a type of operation that can generate large losses for investors. In this work, information made publicly available by the SEC and CVM, based on real cases from the BOVESPA, NYSE and NASDAQ stock exchanges, is used as a training base. The method includes the creation of several candidate variables and the identification of the relevant ones. With this definition, classifiers based on decision trees and Bayesian networks are constructed and subsequently evaluated and selected. The computational cost of performing such tasks can be quite significant, and it grows quickly with the amount of analyzed data. For this reason, the method considers the use of machine learning algorithms distributed over a computational cluster. To perform such tasks, we use the Weka framework with modules that allow the processing load to be distributed across a Hadoop cluster. The use of a computational cluster to execute learning algorithms on large amounts of data has been an active area of research, and this work contributes to data analysis in the specific context of financial operations. The obtained results show the feasibility of the approach, although the quality of the results is limited by the exclusive use of publicly available data.
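The abstract does not include the authors' implementation. As a rough illustration of the simplest Bayesian classifier family mentioned (a Gaussian naive Bayes, i.e. a Bayesian network with no inter-feature edges), a pure-Python sketch might look like the following. The candidate variables (pre-announcement volume spike, price run-up), the data and the labels are all invented for illustration, not taken from the paper:

```python
import math

# Hypothetical training data: [volume_spike, price_runup] per trade window,
# labeled 1 (suspicious, e.g. flagged in SEC/CVM cases) or 0 (normal).
train = [
    ([5.1, 0.12], 1), ([4.3, 0.09], 1), ([6.0, 0.15], 1),
    ([1.0, 0.01], 0), ([0.8, -0.02], 0), ([1.2, 0.02], 0),
]

def fit_gaussian_nb(data):
    """Estimate per-class feature means, variances and priors."""
    stats = {}
    for label in {y for _, y in data}:
        rows = [x for x, y in data if y == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                 for col, m in zip(zip(*rows), means)]
        stats[label] = (means, varis, n / len(data))
    return stats

def log_gauss(x, m, v):
    # Log-density of N(m, v) at x.
    return -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)

def predict(stats, x):
    # Pick the class maximizing log prior + sum of per-feature log-likelihoods.
    scores = {
        label: math.log(prior) + sum(log_gauss(xi, m, v)
                                     for xi, m, v in zip(x, means, varis))
        for label, (means, varis, prior) in stats.items()
    }
    return max(scores, key=scores.get)

model = fit_gaussian_nb(train)
print(predict(model, [5.5, 0.11]))  # → 1 (suspicious)
print(predict(model, [0.9, 0.00]))  # → 0 (normal)
```

In practice the paper's pipeline evaluates such classifiers over many engineered candidate variables and distributes training via Weka on Hadoop; this sketch only shows the classification step on two toy features.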

PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0252104
Author(s):  
Saeed Mian Qaisar

Significant losses can occur for various smart grid stakeholders due to Power Quality Disturbances (PQDs). Therefore, it is necessary to recognize PQDs correctly and mitigate them in a timely manner. In this context, an emerging trend is the development of machine-learning-assisted PQD management. Based on conventional processing theory, existing PQD identification is time-invariant, which can result in a huge amount of unnecessary information being collected, processed, and transmitted. Consequently, needless processing activity, power consumption and latency can occur. In this paper, a novel combination of signal-piloted acquisition, adaptive-rate segmentation and time-domain feature extraction with machine learning tools is suggested. The signal-piloted acquisition and processing brings real-time compression, so a remarkable reduction can be secured in the data storage, processing and transmission requirements towards the post classifier, along with a reduced computational cost and latency of the classifier. The classification is accomplished using robust machine learning algorithms: a comparison is made among the k-Nearest Neighbor, Naïve Bayes, Artificial Neural Network and Support Vector Machine classifiers. Multiple metrics are used to assess classification success, which avoids any bias in the findings. The applicability of the suggested approach is studied for automated recognition of a power signal's major voltage and transient disturbances. Results show that the system attains a 6.75-fold reduction in the collected information and the processing load while securing a classification accuracy of 98.05%.
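As an illustration of the simplest of the compared classifiers, a k-Nearest Neighbor rule over hypothetical time-domain features (here RMS and crest factor per adaptive-rate segment) could be sketched as follows. The feature values, class names and the choice of k are invented for illustration and are not from the paper:

```python
import math
from collections import Counter

def euclid(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, x, k=3):
    # Majority vote among the k nearest labelled segments.
    nearest = sorted(train, key=lambda row: euclid(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical [rms, crest_factor] features for labelled segments.
train = [
    ([1.00, 1.41], "normal"),    ([1.02, 1.40], "normal"),
    ([0.60, 1.45], "sag"),       ([0.55, 1.50], "sag"),
    ([1.05, 3.20], "transient"), ([1.10, 3.00], "transient"),
]
print(knn_predict(train, [0.58, 1.47]))  # → sag
print(knn_predict(train, [1.00, 3.10]))  # → transient
```

The paper's compression gain comes from the upstream signal-piloted acquisition: the classifier only ever sees features of the adaptively selected segments, not the raw waveform.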


2020 ◽  
pp. 1-67
Author(s):  
David Lubo-Robles ◽  
Thang Ha ◽  
Sivaramakrishnan Lakshmivarahan ◽  
Kurt J. Marfurt ◽  
Matthew J. Pranter

Machine learning algorithms such as principal component analysis (PCA), independent component analysis (ICA), self-organizing maps (SOM), and artificial neural networks (ANN) have been used by geoscientists not only to accelerate the interpretation of their data, but also to provide a more quantitative estimate of the likelihood that any voxel belongs to a given facies. Identifying the best combination of attributes needed to perform either supervised or unsupervised machine learning tasks continues to be the question interpreters ask most often. In past decades, stepwise regression and genetic algorithms have been used together with supervised learning algorithms to select the best number and combination of attributes. For reasons of computational efficiency, these techniques do not test all the seismic attribute combinations, potentially leading to a suboptimal classification. In this study, we develop an exhaustive probabilistic neural network (PNN) algorithm which exploits the PNN's capacity for exploring non-linear relationships to obtain the optimal attribute subset that best differentiates the target seismic facies of interest. We show the efficacy of our proposed workflow in differentiating salt from non-salt seismic facies in a Eugene Island seismic survey, offshore Louisiana. We find that from seven input candidate attributes, the exhaustive PNN is capable of removing irrelevant attributes by selecting a smaller subset of four seismic attributes. The enhanced classification using fewer attributes also reduces the computational cost. We then use the resulting facies probability volumes to construct the 3D distribution of the salt diapir geobodies embedded in a stratigraphic matrix.
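The core idea of the exhaustive search, decoupled from the PNN itself, can be sketched as follows. Here a leave-one-out nearest-centroid scorer stands in for the PNN (the paper uses the PNN's cross-validated performance), and the toy attributes and facies labels are hypothetical:

```python
from itertools import combinations

def loo_accuracy(data, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to attribute indices `cols` (stand-in for the PNN scorer)."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        cents = {}
        for label in {lab for _, lab in rest}:
            rows = [[r[c] for c in cols] for r, lab in rest if lab == label]
            cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
        pred = min(cents, key=lambda lab: sum(
            (x[c] - m) ** 2 for c, m in zip(cols, cents[lab])))
        correct += pred == y
    return correct / len(data)

def exhaustive_select(data, n_attrs):
    """Score every non-empty attribute combination; keep the best."""
    return max((subset
                for r in range(1, n_attrs + 1)
                for subset in combinations(range(n_attrs), r)),
               key=lambda s: loo_accuracy(data, s))

# Toy data: attribute 0 separates the facies, attributes 1 and 2 are noise.
data = [([0.1, 5.0, 9.1], "salt"),  ([0.2, 4.8, 1.2], "salt"),
        ([0.9, 5.1, 8.8], "shale"), ([1.0, 4.9, 1.0], "shale")]
print(exhaustive_select(data, 3))  # subset containing attribute 0
```

Exhaustiveness is what distinguishes this from stepwise selection: all 2^n − 1 subsets are scored, which is why the paper leans on the PNN's modest training cost per subset.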


Author(s):  
P. Sai Shankar ◽  
M. Krishna Reddy

Forecasting is a management function that assists decision making; it is the process of estimation in unknown future situations, commonly known as prediction, and typically refers to the estimation of time series or longitudinal data. The main objective of this paper is to compare a traditional time series model with machine learning algorithms for predicting gold prices based on economic factors such as inflation, the exchange rate, the crude oil price, the bank rate, the repo rate, the reverse repo rate, the gold reserve ratio, and the Bombay and National stock exchange indices. Two lagged variables are taken for each variable in the analysis. An ARIMAX model is developed to forecast Indian gold prices using daily data for the period 2016 to 2020 obtained from the World Gold Council. We fitted the ARIMAX(4,1,1) model, which exhibited the lowest AIC value, to our data. Meanwhile, decision tree, random forest, lasso regression, ridge regression, XGB and ensemble models were also examined to forecast gold prices based on the host of explanatory variables. The forecasting performance of the models was evaluated using the mean absolute error, mean absolute percentage error and root mean squared error. The ensemble model outperforms the other models in predicting gold prices from the set of explanatory variables.
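The construction of two lagged variables per predictor, and a simple averaging ensemble (one plausible reading of the paper's ensemble model, not the authors' exact specification), can be sketched as follows. The series names and values are invented placeholders:

```python
def build_lagged(series_dict, lags=2):
    """Build rows [x_{t-1}, x_{t-2}, ...] for each exogenous series,
    aligned with the target (gold price) at time t."""
    n = len(next(iter(series_dict.values())))
    rows = []
    for t in range(lags, n):
        row = []
        for name in sorted(series_dict):  # fixed column order
            row.extend(series_dict[name][t - k] for k in range(1, lags + 1))
        rows.append(row)
    return rows

def ensemble_predict(models, x):
    """Average the forecasts of several fitted models."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Hypothetical exogenous series (crude oil price, inflation index).
series = {"crude": [10, 20, 30, 40], "inflation": [1, 2, 3, 4]}
print(build_lagged(series, 2))  # [[20, 10, 2, 1], [30, 20, 3, 2]]
```

An ARIMAX(4,1,1) fit would then regress the differenced gold price on its own four AR lags, one MA term and these exogenous columns; libraries such as statsmodels provide this directly, so the sketch above only shows the data layout.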


2021 ◽  
Vol 12 (1) ◽  
pp. 9
Author(s):  
John Gajardo ◽  
Marco Mora ◽  
Guillermo Valdés-Nicolao ◽  
Marcos Carrasco-Benavides

Sentinel-2 satellite images allow high separability for mapping burned and unburned areas. This problem has been extensively addressed using machine-learning algorithms; however, these need a suitable dataset and entail considerable training time. Recently, extreme learning machines (ELM) have shown high precision in classification and regression problems at low computational cost. This paper proposes evaluating ELM for mapping burned areas and comparing it with other broadly used machine-learning algorithms. Several indices, metrics and training times were used to assess the performance of the algorithms. Considering the average over the datasets, the best performance was obtained by random forest (DICE = 0.93; omission and commission = 0.08) and ELM (DICE = 0.90; omission and commission = 0.07). The shortest training times were obtained by ELM (1.45 s) and logistic regression (1.85 s). According to the results, ELM was the best burned-area classification algorithm when both precision and training time are considered, evidencing great potential for mapping burned areas at global scales with medium-high spatial resolution images. This information is essential to fire-risk systems and burned-area records used to design prevention and fire-combat strategies, and it provides valuable knowledge on the effect of fires on the landscape and atmosphere.
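ELM's low training cost comes from its structure: input weights are drawn at random and never trained, and only the output weights are fitted, by a single linear least-squares solve. A minimal pure-Python sketch (hypothetical two-feature pixels labelled +1 burned / −1 unburned; the ridge term, hidden size and data are arbitrary choices, not the paper's configuration):

```python
import math, random

def gauss_solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def elm_train(X, y, hidden=20, seed=0):
    """Minimal ELM: random input weights, tanh hidden layer, output
    weights fitted by ridge-regularized least squares."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]

    def hidden_out(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                for row, bi in zip(W, b)]

    H = [hidden_out(x) for x in X]
    # Normal equations: (H^T H + ridge * I) beta = H^T y
    A = [[sum(H[n][i] * H[n][j] for n in range(len(H)))
          + (1e-3 if i == j else 0.0) for j in range(hidden)]
         for i in range(hidden)]
    rhs = [sum(H[n][i] * y[n] for n in range(len(H))) for i in range(hidden)]
    beta = gauss_solve(A, rhs)
    return lambda x: sum(w * h for w, h in zip(beta, hidden_out(x)))

# Hypothetical [burn_index, vegetation_index] pixels.
X = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.15], [0.1, 0.8], [0.2, 0.7], [0.15, 0.9]]
y = [1, 1, 1, -1, -1, -1]
f = elm_train(X, y, hidden=10)
print(f([0.85, 0.1]) > 0, f([0.1, 0.85]) > 0)  # burned?, burned?
```

Because the only fitted parameters are solved in closed form, training is orders of magnitude faster than iteratively trained networks, which is consistent with the 1.45 s figure reported above.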


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics, and the number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosted trees and random forests) are employed, and the impact of each variable group is observed by comparing the performance of the models. Then the number of target classes is increased to five and to eight, and the results are compared with previously developed models in the literature.
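The discretization step (turning raw attendance counts into three, five or eight ordinal classes) is commonly done with quantile cut-offs so that classes are balanced. A sketch, with invented attendance figures and no claim that this is the paper's exact binning rule:

```python
def quantile_bins(values, n_bins=3):
    """Return a labelling function that maps a count to one of n_bins
    classes using quantile cut-offs from the training values."""
    ordered = sorted(values)
    cuts = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    def label(v):
        # Count how many cut-offs v meets: 0 = low, ..., n_bins-1 = high.
        return sum(v >= c for c in cuts)
    return label

# Hypothetical attendance counts (thousands) for six movies.
label = quantile_bins([120, 45, 900, 300, 15, 2200], 3)
print(label(50), label(400), label(5000))  # → 0 1 2
```

Increasing `n_bins` to 5 or 8 reproduces the study's harder classification settings without changing the rest of the pipeline.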


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system places certain requirements on the intelligent scoring system, and the most difficult stage of intelligent scoring in an English test is scoring the English composition with an intelligent model. In order to improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional-neural-network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The research results show that the proposed algorithm has a practical effect and can be applied to English assessment and online homework evaluation systems.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms.  Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.
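The claim can be illustrated with an instance-based learner: the checkerboard label has a one-line closed form that a human states exactly after seeing a few examples, while a 1-nearest-neighbor classifier trained on a sparse sample hovers near (or, with adversarial sampling, below) chance. The grid size and sample counts below are arbitrary choices for illustration:

```python
import math, random

rng = random.Random(42)

def checker_label(x, y):
    # The "human" rule: parity of the unit-cell coordinates.
    return (math.floor(x) + math.floor(y)) % 2

# Sparse training sample: 50 points over an 8x8 grid of cells.
train = [((x, y), checker_label(x, y))
         for x, y in ((rng.uniform(0, 8), rng.uniform(0, 8))
                      for _ in range(50))]
test = [(rng.uniform(0, 8), rng.uniform(0, 8)) for _ in range(200)]

def nn_predict(p):
    # 1-nearest-neighbor: copy the label of the closest training point.
    (_, lab) = min(train, key=lambda t: (t[0][0] - p[0]) ** 2
                                        + (t[0][1] - p[1]) ** 2)
    return lab

human_acc = sum(checker_label(x, y) == checker_label(x, y)
                for x, y in test) / len(test)  # rule is exact: 1.0
nn_acc = sum(nn_predict((x, y)) == checker_label(x, y)
             for x, y in test) / len(test)
print(human_acc, nn_acc)  # 1.0 vs. near-chance
```

The failure mode is that the nearest training point frequently lies in a neighboring cell of opposite parity, so local similarity, the inductive bias of most standard learners, carries almost no information about the label.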

