Application of Predictive Methods to Financial Data Sets

Abstract Financial data sets are growing too fast and need to be analyzed. Data science has many different techniques to store and summarize, mining, running simulations and finally analyzing them. Among data science methods, predictive methods play a critical role in analyzing financial data sets. In the current paper, applications of 22 methods classified in four categories namely data mining and machine learning, numerical analysis, operation research techniques and meta-heuristic techniques, in financial data sets are studied. To this end, first, literature reviews on these methods are given. For each method, a data analysis case (as an illustrative example) is presented and the problem is analyzed with the mentioned method. An actual case is given to apply those methods to solve the problem and to choose a better one. Finally, a conclusion section is proposed.

Download Full-text

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. Prisma method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancements of sophisticated hybrid deep learning models.

Download Full-text

Using of data science in healthcare

Problems of Innovation and Investment Development ◽

10.33813/2224-1213.24.2021.15 ◽

2021 ◽

pp. 149-156

Author(s):

Ihor Ponomarenko ◽

Oleksandra Lubkovska

Keyword(s):

Machine Learning ◽

Health Care ◽

Business Intelligence ◽

Data Science ◽

Large Data ◽

Science Methods ◽

Medical Field ◽

Learning Methods ◽

Machine Learning Methods ◽

Using Data

The subject of the research is the approach to the possibility of using data science methods in the field of health care for integrated data processing and analysis in order to optimize economic and specialized processes The purpose of writing this article is to address issues related to the specifics of the use of Data Science methods in the field of health care on the basis of comprehensive information obtained from various sources. Methodology. The research methodology is system-structural and comparative analyzes (to study the application of BI-systems in the process of working with large data sets); monograph (the study of various software solutions in the market of business intelligence); economic analysis (when assessing the possibility of using business intelligence systems to strengthen the competitive position of companies). The scientific novelty the main sources of data on key processes in the medical field. Examples of innovative methods of collecting information in the field of health care, which are becoming widespread in the context of digitalization, are presented. The main sources of data in the field of health care used in Data Science are revealed. The specifics of the application of machine learning methods in the field of health care in the conditions of increasing competition between market participants and increasing demand for relevant products from the population are presented. Conclusions. The intensification of the integration of Data Science in the medical field is due to the increase of digitized data (statistics, textual informa- tion, visualizations, etc.). Through the use of machine learning methods, doctors and other health professionals have new opportunities to improve the efficiency of the health care system as a whole. Key words: Data science, efficiency, information, machine learning, medicine, Python, healthcare.

Download Full-text

A systematic review of machine learning-based missing value imputation techniques

Data Technologies and Applications ◽

10.1108/dta-12-2020-0298 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Tressy Thomas ◽

Enayat Rajabi

Keyword(s):

Machine Learning ◽

Selection Process ◽

Evaluation Metrics ◽

Correct Prediction ◽

Data Sets ◽

Data Set ◽

Missing Value ◽

Content Type ◽

Missing Value Imputation ◽

Literature Reviews

PurposeThe primary aim of this study is to review the studies from different dimensions including type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding about how well the proposed framework is evaluated and what type and ratio of missingness are addressed in the proposals. The review questions in this study are (1) what are the ML-based imputation methods studied and proposed during 2010–2020? (2) How the experimentation setup, characteristics of data sets and missingness are employed in these studies? (3) What metrics were used for the evaluation of imputation method?Design/methodology/approachThe review process went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers totaling at 2,883. Most of the papers at this stage were not exactly an MVI technique relevant to this study. The literature reviews are first scanned in the title for relevancy, and 306 literature reviews were identified as appropriate. Upon reviewing the abstract text, 151 literature reviews that are not eligible for this study are dropped. This resulted in 155 research papers suitable for full-text review. From this, 117 papers are used in assessment of the review questions.FindingsThis study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are most used evaluation metrics in these studies. For experimentation, majority of the studies sourced the data sets from publicly available data set repositories. A common approach is that the complete data set is set as baseline to evaluate the effectiveness of imputation on the test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while missing datatype and mechanism are pertaining to the capability of imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.Originality/valueIt is understood from the review that there is no single universal solution to missing data problem. Variants of ML approaches work well with the missingness based on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms which are simple and easy to implement make it popular across various domains.

Download Full-text

Social Effects of the Application of Data Science Methods

Encyclopedia of Education and Information Technologies ◽

10.1007/978-3-030-10576-1_300584 ◽

2020 ◽

pp. 1522-1522

Keyword(s):

Data Science ◽

Social Effects ◽

Science Methods

Download Full-text

Multi Objective Optimization Based Feature Selection Algorithms for Big Data Analytics: A Review

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.54105/ijainn.e1040.121521 ◽

2021 ◽

pp. 1-4

Author(s):

Aakriti Shukla ◽

◽

Dr Damodar Prasad Tiwari ◽

Keyword(s):

Feature Selection ◽

Big Data ◽

Data Science ◽

Big Data Analytics ◽

Data Sets ◽

Big Data Applications ◽

High Workload ◽

Long Time ◽

Analytical Result ◽

Result Analysis

Dimension reduction or feature selection is thought to be the backbone of big data applications in order to improve performance. Many scholars have shifted their attention in recent years to data science and analysis for real-time applications using big data integration. It takes a long time for humans to interact with big data. As a result, while handling high workload in a distributed system, it is necessary to make feature selection elastic and scalable. In this study, a survey of alternative optimizing techniques for feature selection are presented, as well as an analytical result analysis of their limits. This study contributes to the development of a method for improving the efficiency of feature selection in big complicated data sets.

Download Full-text

How data science methods can improve the quality and efficiency of ICF and HEDP research

10.2172/1650593 ◽

2020 ◽

Author(s):

Patrick Knapp ◽

Michael Glinsky ◽

Benjamin Tobias ◽

John Kline

Keyword(s):

Data Science ◽

Science Methods

Download Full-text

Closing the research-implementation gap using data science tools: a case study with pollinators of British Columbia

10.1101/2020.10.30.362699 ◽

2020 ◽

Author(s):

Laura Melissa Guzman ◽

Tyler Kelly ◽

Lora Morandin ◽

Leithen M’Gonigle ◽

Elizabeth Elle

Keyword(s):

British Columbia ◽

Data Science ◽

Academic Research ◽

Science Methods ◽

Research Implementation ◽

Conservation Practice ◽

Implementation Gap ◽

Web App ◽

Using Data

AbstractA challenge in conservation is the gap between knowledge generated by researchers and the information being used to inform conservation practice. This gap, widely known as the research-implementation gap, can limit the effectiveness of conservation practice. One way to address this is to design conservation tools that are easy for practitioners to use. Here, we implement data science methods to develop a tool to aid in conservation of pollinators in British Columbia. Specifically, in collaboration with Pollinator Partnership Canada, we jointly develop an interactive web app, the goal of which is two-fold: (i) to allow end users to easily find and interact with the data collected by researchers on pollinators in British Columbia (prior to development of this app, data were buried in supplements from individual research publications) and (ii) employ up to date statistical tools in order to analyse phenological coverage of a set of plants. Previously, these tools required high programming competency in order to access. Our app provides an example of one way that we can make the products of academic research more accessible to conservation practitioners. We also provide the source code to allow other developers to develop similar apps suitable for their data.

Download Full-text

Collaboration between Government and Research Community to Respond to COVID-19: Israel’s Case

Journal of Open Innovation Technology Market and Complexity ◽

10.3390/joitmc7040208 ◽

2021 ◽

Vol 7 (4) ◽

pp. 208

Author(s):

Mor Peleg ◽

Amnon Reichman ◽

Sivan Shachar ◽

Tamir Gadot ◽

Meytal Avgil Tsadok ◽

...

Keyword(s):

Health Policy ◽

Data Science ◽

Research Community ◽

Remote Access ◽

Data Driven ◽

Multidisciplinary Teams ◽

Science Methods ◽

Research Environments ◽

Future Data ◽

The Government

Triggered by the COVID-19 crisis, Israel’s Ministry of Health (MoH) held a virtual datathon based on deidentified governmental data. Organized by a multidisciplinary committee, Israel’s research community was invited to offer insights to help solve COVID-19 policy challenges. The Datathon was designed to develop operationalizable data-driven models to address COVID-19 health policy challenges. Specific relevant challenges were defined and diverse, reliable, up-to-date, deidentified governmental datasets were extracted and tested. Secure remote-access research environments were established. Registration was open to all citizens. Around a third of the applicants were accepted, and they were teamed to balance areas of expertise and represent all sectors of the community. Anonymous surveys for participants and mentors were distributed to assess usefulness and points for improvement and retention for future datathons. The Datathon included 18 multidisciplinary teams, mentored by 20 data scientists, 6 epidemiologists, 5 presentation mentors, and 12 judges. The insights developed by the three winning teams are currently considered by the MoH as potential data science methods relevant for national policies. Based on participants’ feedback, the process for future data-driven regulatory responses for health crises was improved. Participants expressed increased trust in the MoH and readiness to work with the government on these or future projects.

Download Full-text

The R package “eseis” – a software toolbox for environmental seismology

Earth Surface Dynamics ◽

10.5194/esurf-6-669-2018 ◽

2018 ◽

Vol 6 (3) ◽

pp. 669-686 ◽

Cited By ~ 8

Author(s):

Michael Dietze

Keyword(s):

Data Science ◽

Earth Science ◽

Science Research ◽

Worked Examples ◽

R Package ◽

Research Field ◽

Data Sets ◽

Research Fields ◽

Scientific Disciplines ◽

Analysis Environment

Abstract. Environmental seismology is the study of the seismic signals emitted by Earth surface processes. This emerging research field is at the intersection of seismology, geomorphology, hydrology, meteorology, and further Earth science disciplines. It amalgamates a wide variety of methods from across these disciplines and ultimately fuses them in a common analysis environment. This overarching scope of environmental seismology requires a coherent yet integrative software which is accepted by many of the involved scientific disciplines. The statistic software R has gained paramount importance in the majority of data science research fields. R has well-justified advances over other mostly commercial software, which makes it the ideal language to base a comprehensive analysis toolbox on. The article introduces the avenues and needs of environmental seismology, and how these are met by the R package eseis. The conceptual structure, example data sets, and available functions are demonstrated. Worked examples illustrate possible applications of the package and in-depth descriptions of the flexible use of the functions. The package has a registered DOI, is available under the GPL licence on the Comprehensive R Archive Network (CRAN), and is maintained on GitHub.

Download Full-text

A Holistic Approach to Financial Data Science: Data, Technology, and Analytics

The Journal of Financial Data Science ◽

10.3905/jfds.2020.1.031 ◽

2020 ◽

Vol 2 (2) ◽

pp. 64-84

Author(s):

Tamer Khraisha

Keyword(s):

Data Science ◽

Holistic Approach ◽

Financial Data ◽

Science Data

Download Full-text