Big Data: New Tricks for Econometrics

Computers are now involved in many economic transactions and can capture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well, but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning, and so on may allow for more effective ways to model complex relationships. In this essay, I will describe a few of these tools for manipulating and analyzing big data. I believe that these methods have a lot to offer and should be more widely known and used by economists.

Big Data as a Revolutionary Tool in Finance

Journal of Financial Innovation ◽

10.15194/jofi_2015.v1.i2.26 ◽

2015 ◽

Vol 1 (2) ◽

Author(s):

Aureliano Angel Bressan

Keyword(s):

Big Data ◽

Portfolio Management ◽

Predictive Analytics ◽

Research Field ◽

Machine Learning Techniques ◽

Product Reviews ◽

Reputational Risk ◽

Learning Techniques ◽

Model Complex ◽

Complex Relationships

A data-driven culture is emerging as a research field and analytic tool in Finance and Management, following the advent of structured, semi-structured and unstructured socio-economic and demographic information from social media, mobile devices, blogs and consumer product reviews. Big Data, the expression that encompasses this revolution, requires financial professionals and academic researchers to adopt new tools, because the size of the data involved demands more powerful manipulation tools. In this sense, Machine Learning techniques can allow more effective ways to model the complex relationships that arise from the interaction of different types of data, in areas such as Operational and Reputational Risk, Portfolio Management, Business Intelligence and Predictive Analytics. The following books can be a good start for those interested in this new field.

Predictive modeling for peri-implantitis by using machine learning techniques

Scientific Reports ◽

10.1038/s41598-021-90642-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tomoaki Mameno ◽

Masahiro Wada ◽

Kazunori Nozaki ◽

Toshihito Takahashi ◽

Yoshitaka Tsujioka ◽

...

Keyword(s):

Machine Learning ◽

Demographic Data ◽

Risk Indicators ◽

Machine Learning Techniques ◽

Support Vector ◽

Machine Learning Methods ◽

Complex Interactions ◽

Learning Techniques ◽

Increased Risk ◽

Vector Machines

The purpose of this retrospective cohort study was to create a model for predicting the onset of peri-implantitis by using machine learning methods and to clarify interactions between risk indicators. This study evaluated 254 implants, 127 with and 127 without peri-implantitis, from among 1408 implants with at least 4 years in function. Demographic data and parameters known to be risk factors for the development of peri-implantitis were analyzed with three models: logistic regression, support vector machines, and random forests (RF). RF had the highest performance in predicting the onset of peri-implantitis (AUC: 0.71, accuracy: 0.70, precision: 0.72, recall: 0.66, and F1-score: 0.69). The factor that had the most influence on prediction was implant functional time, followed by oral hygiene. In addition, PCR of more than 50% to 60%, smoking more than 3 cigarettes/day, KMW less than 2 mm, and the presence of fewer than two occlusal supports tended to be associated with an increased risk of peri-implantitis. Moreover, these risk indicators were not independent and had complex effects on each other. The results of this study suggest that peri-implantitis onset was predicted in 70% of cases by RF, which allows consideration of nonlinear relational data with complex interactions.
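The F1-score reported above is consistent with the reported precision and recall, since F1 is their harmonic mean. A minimal Python sketch of that check follows; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the random forest model in the abstract.
rf_f1 = f1_score(0.72, 0.66)
print(round(rf_f1, 2))  # 0.69
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 by trading recall away for precision.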

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have used machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.

A Review of Machine Learning Techniques for Anomaly Detection in Static Graphs

Implementing Computational Intelligence Techniques for Security Systems Design - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-2418-3.ch007 ◽

2020 ◽

pp. 146-162

Author(s):

Hesham M. Al-Ammal

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Anomaly Detection ◽

Real Life ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Learning Techniques ◽

Vector Machines

Detection of anomalies in a given data set is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs. These include methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.

Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽

10.3233/ida-194768 ◽

2020 ◽

Vol 24 (5) ◽

pp. 1141-1160

Author(s):

Tomás Alegre Sepúlveda ◽

Brian Keith Norambuena

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Sentiment Analysis ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Traditional Methods ◽

Actual Result ◽

Learning Techniques ◽

Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we use a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained due to the fact that some candidates have been omitted, even though they held a significant number of votes.

A Probe into the Parallel Elements of Diabetes Mellitus and Alzheimer’s Disease

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9238 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3598-3604

Author(s):

M. S. Roobini ◽

M. Lakshmi

Keyword(s):

Alzheimer's Disease ◽

Morphological Changes ◽

Memory Loss ◽

Machine Learning Techniques ◽

Support Vector ◽

Huge Number ◽

Learning Techniques ◽

Vector Machines ◽

The Mind

Alzheimer’s Disease (AD) is one of the most common forms of memory loss, affecting a huge number of older people around the world, and it is the leading cause of dementia. AD causes shrinkage of the hippocampus and cerebral cortex and enlarges the ventricles of the brain. Improving home- and community-based coordinated care is essential to alleviating the effects of Alzheimer’s on individuals and families and to reducing mounting health care costs. Detecting early morphological changes in the brain and making an early diagnosis are crucial for AD. Several machine learning techniques, such as support vector machines, have been used, and some of these methods have been shown to be very effective in diagnosing AD from neuroimages, sometimes even more effective than human radiologists. MRI reveals information about AD, but the atrophied regions differ from person to person, which makes diagnosis harder. Convolutional neural networks can address this problem with a minimal error rate. This paper proposes a deep Convolutional Neural Network (CNN) for Alzheimer’s Disease diagnosis using brain MRI data analysis. The algorithm was trained and tested using MRI data from the Alzheimer’s Disease Neuroimaging Initiative.

Predictive Models for the Medical Diagnosis of Dengue: A Case Study in Paraguay

Computational and Mathematical Methods in Medicine ◽

10.1155/2019/7307803 ◽

2019 ◽

Vol 2019 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Jorge D. Mello-Román ◽

Julio C. Mello-Román ◽

Santiago Gómez-Guerrero ◽

Miguel García-Torres

Keyword(s):

Public Health ◽

Medical Diagnosis ◽

Machine Learning Techniques ◽

Support Vector ◽

The Public ◽

Learning Techniques ◽

Previous Diagnosis ◽

Vector Machines ◽

High Incidence

Early diagnosis of dengue continues to be a concern for public health in countries with a high incidence of this disease. In this work, we compared two machine learning techniques: artificial neural networks (ANN) and support vector machines (SVM) as assistance tools for medical diagnosis. The performance of classification models was evaluated in a real dataset of patients with a previous diagnosis of dengue extracted from the public health system of Paraguay during the period 2012–2016. The ANN multilayer perceptron achieved better results with an average of 96% accuracy, 96% sensitivity, and 97% specificity, with low variation in thirty different partitions of the dataset. In comparison, SVM polynomial obtained results above 90% for accuracy, sensitivity, and specificity.
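The three metrics reported above all derive from a binary confusion matrix. The sketch below shows how they relate. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of true positives that were detected
        "specificity": tn / (tn + fp),   # share of true negatives that were detected
    }


# Hypothetical counts for one train/test partition (NOT from the paper).
metrics = classification_metrics(tp=48, fn=2, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Averaging these values over repeated partitions of a dataset, as the abstract describes for thirty partitions, gives a sense of how stable a classifier's performance is.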

Detection of Loss Zones while Drilling Using Different Machine Learning Techniques

Journal of Energy Resources Technology ◽

10.1115/1.4051553 ◽

2021 ◽

pp. 1-29

Author(s):

Ahmed Alsaihati ◽

Mahmoud Abughaban ◽

Salaheldin Elkatatny ◽

Abdulazeez Abdulraheem

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forests ◽

Nearest Neighbors ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Vector Machines ◽

Testing Set

Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally fractured formations or formations with induced fractures. This could pose significant operational risks, such as well-control, stuck pipe, and wellbore instability, which, in turn, lead to an increase in well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, in detecting loss circulation occurrences while drilling using solely drilling surface parameters. Actual field data from seven wells, which had suffered partial or severe loss circulation, were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Recall, precision, and F1-score measures were used to evaluate the ability of the developed models to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrence in the testing set, while the random forests classifier was second best with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting loss circulation occurrence in the testing set. The K-nearest neighbors classifier outperformed the other models in detecting loss circulation occurrences in Well-8, with an F1-score of 0.80. The main contribution of this research as compared to previous studies is that it identifies loss events based on real-time measurements of the active pit volume.

Acoustic Diversity Classification Using Machine Learning Techniques: Towards Automated Marine Big Data Analysis

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020600118 ◽

2020 ◽

Vol 29 (03n04) ◽

pp. 2060011

Author(s):

Emna Hachicha Belghith ◽

François Rioult ◽

Medjber Bouzidi

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Big Data Analysis ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Techniques ◽

Acoustic Diversity ◽

Marine Data

In recent years, big data has become an emerging trend that increasingly attracts the attention of the R&D community in several fields (e.g., image processing, database engineering, data mining, artificial intelligence). Marine data is one of the fields affected by this growth, hence the emergence of the marine big data paradigm, in which monitoring supports the assessment of human impact on marine data. Nonetheless, support for acoustic sound classification is missing in such environments, given the diversity of such data (i.e., sounds of living undersea species, sounds of human activities, and sounds of environmental effects). To overcome this issue, we propose in this paper an approach that efficiently enables acoustic diversity classification using machine learning techniques. The aim is to achieve automated support for marine big data analysis. We have conducted a set of experiments, using a real marine dataset, in order to validate our approach and show its effectiveness and efficiency. To do so, three machine learning techniques are employed: (i) classic machine learning models (i.e., k-nearest neighbor and support vector machine), (ii) deep learning based on convolutional neural networks, and (iii) transfer learning based on the reuse of pretrained models.

Novel Approach for Rooftop Detection Using Support Vector Machine

ISRN Machine Vision ◽

10.1155/2013/819768 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 5

Author(s):

Hayk Baluyan ◽

Bikash Joshi ◽

Amer Al Hinai ◽

Wei Lee Woon

Keyword(s):

High Rate ◽

Machine Learning Techniques ◽

Support Vector ◽

Histogram Method ◽

Novel Approach ◽

Learning Techniques ◽

Vector Machines ◽

First Pass ◽

Homogeneous Regions ◽

Improved Accuracy

A new method for detecting rooftops in satellite images is presented. The proposed method is based on a combination of machine learning techniques, namely, k-means clustering and support vector machines (SVM). First, k-means clustering is used to segment the image into a set of rooftop candidates; these are homogeneous regions in the image which are potentially associated with rooftop areas. Next, the candidates are submitted to a classification stage which determines which of them correspond to “true” rooftops. To achieve improved accuracy, a novel two-pass classification process is used. In the first pass, a trained SVM is used in the normal way to distinguish between rooftop and nonrooftop regions. However, this can be a challenging task, resulting in a relatively high rate of misclassification. Hence, the second pass, which we call the “histogram method,” was devised with the aim of detecting rooftops which were missed in the first pass. The performance of the model is assessed in terms of both the percentage of correctly classified candidates and the accuracy of the estimated rooftop area.
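The segmentation stage described above can be illustrated with a minimal k-means sketch. This is an assumption-laden toy: it clusters scalar grayscale intensities with Lloyd's algorithm, whereas the paper's actual features, initialization, and number of clusters are not given in the abstract. The SVM pass and the “histogram method” are not shown, and the name `kmeans_1d` and the pixel values are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: spread centroids evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical grayscale intensities: dark ground, bright roofs, mid-tone roads.
pixels = [12, 15, 14, 200, 210, 205, 90, 95, 100]
centroids, labels = kmeans_1d(pixels, k=3)
print(labels)
```

Each resulting cluster groups pixels of similar intensity. In a full pipeline, connected regions within a cluster would become the rooftop candidates passed on to the classifier.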
