Evolving Decision Rules to Discover Patterns in Financial Data Sets

Author(s):  
Alma Lilia García-Almanza ◽  
Edward P. K. Tsang ◽  
Edgar Galván-López

Author(s):  
Pu Wang ◽  
Edward P. K. Tsang ◽  
Thomas Weise ◽  
Ke Tang ◽  
Xin Yao


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Giacomo Vaccario ◽  
Luca Verginer ◽  
Frank Schweitzer

Abstract: High-skill labour is an important factor underpinning the competitive advantage of modern economies. Therefore, attracting and retaining scientists has become a major concern for migration policy. In this work, we study the migration of scientists on a global scale by combining two large data sets covering the publications of 3.5 million scientists over 60 years. We analyse the geographical distances they moved for a new affiliation and their age when moving, thereby reconstructing their geographical “career paths”. These paths are used to derive the world network of scientists’ mobility between cities and to analyse its topological properties. We further develop and calibrate an agent-based model such that it reproduces the empirical findings both at the level of scientists and of the global network. Our model takes into account that the academic hiring process is largely demand-driven and demonstrates that the probability that a scientist relocates decreases both with age and with distance. Our results allow interpreting the model assumptions as micro-based decision rules that can explain the observed mobility patterns of scientists.



2012 ◽  
pp. 163-186
Author(s):  
Jirí Krupka ◽  
Miloslava Kašparová ◽  
Pavel Jirava ◽  
Jan Mandys

The chapter presents the problem of modeling quality of life in the Czech Republic using classification methods. It compares two methodological approaches: the first is the approach of the Institute of Sociology of the Academy of Sciences of the Czech Republic, and the second comes from a project of the civic association Team Initiative for Local Sustainable Development. On the basis of real data sets from the institute and the Team Initiative, the authors synthesized and analyzed quality of life classification models. They used decision tree classification algorithms to generate transparent decision rules and compared the classification results of the decision trees. Classifier models based on the C5.0, CHAID, C&RT and C5.0 boosting algorithms were proposed and analyzed. The designed classification model was created in Clementine.



Author(s):  
Qinrong Feng ◽  
Duoqian Miao ◽  
Ruizhi Wang

Decision rules mining is an important technique in machine learning and data mining, and it has been studied intensively during the past few years. However, most existing algorithms are based on flat data tables, from which the sets of decision rules mined may be very large for massive data sets. Such sets of rules are neither easily understandable nor really useful for users. Moreover, too many rules may lead to over-fitting. Thus, this chapter provides a method of mining decision rules at different levels of abstraction, which aims to improve the efficiency of decision rules mining by combining the hierarchical structure of the multidimensional model with the techniques of rough set theory. Our algorithm for decision rules mining follows the so-called separate-and-conquer strategy. Namely, certain rules are mined beginning from the most abstract level, and the supporting sets of those certain rules are removed from the universe; the algorithm then drills down to the next level to recursively mine other certain rules whose supporting sets are included in the remaining objects, until no objects remain in the universe or the primitive level is reached. The algorithm can therefore output generalized rules with different degrees of generalization.
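
The level-by-level strategy described above can be sketched roughly as follows. This is a simplified illustration, not the chapter's algorithm: the function name `mine_rules`, the representation of abstraction levels as plain mapping functions, and the toy city/country hierarchy are all assumptions, and the rough-set machinery is reduced to the test "every remaining object matching this condition has the same decision".

```python
def mine_rules(objects, levels):
    """Separate-and-conquer over abstraction levels (illustrative sketch).

    objects: list of (conditions, decision) pairs, conditions being a tuple
             of primitive attribute values.
    levels:  list of functions ordered from most abstract to primitive; each
             maps a primitive conditions tuple to its generalized form.
    Returns (rules, remaining), where each rule is (level, conditions, decision).
    """
    remaining = list(objects)
    rules = []
    for depth, generalize in enumerate(levels):
        # Group the remaining objects by their generalized conditions.
        decisions = {}
        for cond, dec in remaining:
            decisions.setdefault(generalize(cond), set()).add(dec)
        # A rule is "certain" when its supporting set agrees on one decision.
        certain = {g: next(iter(d)) for g, d in decisions.items() if len(d) == 1}
        rules.extend((depth, g, dec) for g, dec in certain.items())
        # Separate: remove the supporting sets of the certain rules.
        remaining = [(c, d) for c, d in remaining if generalize(c) not in certain]
        if not remaining:
            break  # nothing left to cover, no need to drill down further
    return rules, remaining


# Hypothetical two-level hierarchy: country (abstract) above city (primitive).
COUNTRY = {"Paris": "France", "Lyon": "France",
           "Berlin": "Germany", "Munich": "Germany"}
levels = [lambda c: (COUNTRY[c[0]],), lambda c: c]
data = [(("Paris",), "yes"), (("Lyon",), "yes"),
        (("Berlin",), "yes"), (("Munich",), "no")]
rules, left = mine_rules(data, levels)
```

In this toy run, one generalized rule at the country level covers both French cities, and only the German objects need rules at the primitive level.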



2003 ◽  
Vol 17 (22n24) ◽  
pp. 4003-4012 ◽  
Author(s):  
Mogens H. Jensen ◽  
Anders Johansen ◽  
Ingve Simonsen

We consider inverse statistics in turbulence and financial data. By inverse statistics, also sometimes called exit time statistics, we "turn" the variables around such that the fluctuating variable becomes the fixed variable, while the fixed variable becomes fluctuating. In that sense we can probe distinct regimes of the data sets. In the case of turbulence, we obtain a new set of (multi)-scaling exponents which monitor the dissipation regime. In the case of economics, we obtain a distribution of waiting times needed to achieve a predefined level of return. Such a distribution typically goes through a maximum at a time called the optimal investment horizon [Formula: see text], since this defines the most likely waiting time for obtaining a given return ρ. By considering equal positive and negative levels of return, we report on a quantitative gain-loss asymmetry most pronounced for short horizons.
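
To make the waiting-time idea concrete, here is a minimal sketch, assuming returns are measured as log-returns of a price series and that the optimal investment horizon is estimated as the mode of the empirical waiting-time distribution. The function names and the toy price series are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def waiting_times(prices, rho):
    """First-passage times, in steps, for the log-return to reach level rho.

    For each starting index, look forward until the log-return relative to
    the starting price first reaches rho (>= rho for a gain level, <= rho for
    a loss level). Starting points that never reach the level are skipped.
    """
    times = []
    for start, p0 in enumerate(prices):
        for step in range(1, len(prices) - start):
            r = math.log(prices[start + step] / p0)
            if (rho > 0 and r >= rho) or (rho < 0 and r <= rho):
                times.append(step)
                break
    return times


def optimal_horizon(times):
    """Most frequent waiting time, i.e. the peak of the empirical distribution."""
    return Counter(times).most_common(1)[0][0]


prices = [100, 101, 103, 102, 106, 107]   # toy data, not from the paper
gain_times = waiting_times(prices, 0.02)  # waiting times for a +2% log-return
```

Running the same function with `rho` and `-rho` and comparing the two distributions is one way to look at the gain-loss asymmetry the abstract mentions.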



2013 ◽  
Vol 3 (4) ◽  
pp. 31-46 ◽  
Author(s):  
Hanaa Ismail Elshazly ◽  
Ahmad Taher Azar ◽  
Aboul Ella Hassanien ◽  
Abeer Mohamed Elkorany

Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medicine has evolved from physicians' needs. Screening, medical imaging, pattern classification and prognosis are some examples of health care support systems. Medical data typically has its own characteristics, such as large size, many features, and continuous, real-valued attributes that refer to patients' investigations. Therefore, discretization and feature selection are considered key issues in improving the knowledge extracted from patients' investigation records. In this paper, a hybrid system that integrates Rough Set (RS) and Genetic Algorithm (GA) is presented for the efficient classification of medical data sets of different sizes and dimensionalities. The Genetic Algorithm is applied with the aim of reducing the dimension of the medical data sets, and RS decision rules are used for efficient classification. Furthermore, the proposed system applies the Entropy Gain Information (EI) for the discretization process. Four biomedical data sets are tested with the proposed system (EI-GA-RS), and the highest score was obtained on three of them. Other hybrid techniques matched the proposed technique's highest accuracy, but the proposed system remains among the best-performing systems for three different data sets. EI as a discretization technique is also a common component of the best results on these data sets, while RS as an evaluator achieved the best results on three different data sets.
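
As an illustration of entropy-based discretization, the sketch below picks a single binary cut point for one continuous attribute by maximizing information gain. It is a generic textbook-style example under that assumption. The abstract does not spell out the exact EI procedure, so the function names and the single-cut design are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (information_gain, cut_point) for the best binary split.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (0.0, None) if no cut improves on the unsplit entropy.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


# Toy attribute whose two classes separate cleanly around 6.5.
gain, cut = best_cut([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

Applying the same function recursively to each side of the cut would yield more than two intervals per attribute.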



1993 ◽  
Vol 03 (03) ◽  
pp. 745-755 ◽  
Author(s):  
TED JADITZ ◽  
CHERA L. SAYERS

This paper examines recent developments in nonlinear science in economics. Several claims of findings of chaos in economic data are reviewed. We discuss how each claim has been revised in light of further analysis, and point out several traps for empirical researchers in economic data. These traps suggest certain methodological refinements useful for researchers analyzing very small data sets, including diagnostic tests to detect ill conditioned data, filtering data to exclude nonchaotic alternatives, and nonparametric procedures to check the precision of parameter estimates. Most specialists in the field would say there is no conclusive evidence of chaos in economic or financial data.



2011 ◽  
Vol 48 (A) ◽  
pp. 367-378 ◽  
Author(s):  
Paul Embrechts ◽  
Thomas Liniger ◽  
Lu Lin

A Hawkes process is also known under the name of a self-exciting point process and has numerous applications throughout science and engineering. We derive the statistical estimation (maximum likelihood estimation) and goodness-of-fit (mainly graphical) for multivariate Hawkes processes with possibly dependent marks. As an application, we analyze two data sets from finance.
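
For orientation, here is a minimal sketch of the self-exciting intensity and the log-likelihood that maximum likelihood estimation would maximize. It assumes a univariate process with an exponential kernel alpha * exp(-beta * t); the paper's multivariate setting with dependent marks is not reproduced, and the abstract does not name a kernel, so these choices and all function names are assumptions.

```python
import math


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t: baseline mu plus decaying kicks
    from every event strictly before t."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood of event times observed on [0, horizon].

    Sum of log-intensities at the event times minus the integrated
    intensity (compensator) over the observation window.
    """
    log_sum = sum(math.log(intensity(t, events, mu, alpha, beta)) for t in events)
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - s)) for s in events
    )
    return log_sum - compensator


events = [1.0, 2.5, 4.0]  # toy event times, not from the paper
ll = log_likelihood(events, 5.0, mu=0.6, alpha=0.3, beta=1.2)
```

With `alpha = 0` the expression reduces to the log-likelihood of a homogeneous Poisson process, which is a convenient sanity check before handing the function to a numerical optimizer.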



2008 ◽  
Vol 22 (26) ◽  
pp. 2571-2578 ◽  
Author(s):  
CHUNXIA YANG ◽  
HONGFA WU ◽  
YINGCHAO ZHANG

Based on six large empirical data sets, the financial data sequences are decomposed by Empirical Mode Decomposition (EMD) into various quasi-periodic fluctuation modes, including weekly, half-month, seasonal, about-four-years and so on, which may indicate some abnormal return oscillation patterns. The corresponding average periods are calculated by Fast Fourier Transform Algorithm (FFT), about 6 days for the weekly, about 10 days for the half-month, about 60 days for the seasonal and 1020 days or so for the about-four-years. These obtained results show that the mode periods may be universal for different markets.
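
The period-estimation step can be illustrated with a small sketch that finds the dominant period of a single, already extracted fluctuation mode from its Fourier spectrum. The EMD step is not shown, and a plain discrete Fourier transform is used instead of an FFT to keep the example dependency-free; the spectrum is the same, only slower to compute. The function name and the synthetic input are assumptions.

```python
import cmath
import math


def dominant_period(series):
    """Period, in samples, of the strongest non-zero frequency component."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the zero-frequency offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic mode with an 8-sample cycle (illustrative, not market data).
mode = [math.sin(2 * math.pi * t / 8) for t in range(64)]
period = dominant_period(mode)
```

For real daily data, the returned value would be read as a period in trading days, comparable to the roughly 6-day and 10-day averages quoted in the abstract.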




