Earthquake Investigation and Visual Cognizance of Multivariate Temporal Tabular Data Using Machine Learning

Author(s):  
Arjun Majumdar ◽  
Gent Ymeri ◽  
Sebastian Strumbelj ◽  
Juri Buchmuller ◽  
Udo Schlegel ◽  
...  
Author(s):  
Anurag Yedla ◽  
Fatemeh Davoudi Kakhki ◽  
Ali Jannesari

Mining is known to be one of the most hazardous occupations in the world, and many serious accidents have occurred at mining sites over the years. Despite efforts to create a safer work environment for miners, the number of accidents at mining sites remains significant. Machine learning techniques and predictive analytics are becoming leading resources for creating safer work environments in the manufacturing and construction industries, where they are leveraged to generate actionable insights that improve decision-making. A large amount of mining safety-related data is available and can be analyzed with machine learning algorithms, so the mining industry stands to benefit significantly from these techniques. Decision tree, random forest, and artificial neural network models were implemented to analyze the outcomes of mining accidents and to predict days away from work. An accident dataset provided by the Mine Safety and Health Administration was used to train the models, which were trained separately on tabular data and on accident narratives. A synthetic data augmentation technique based on word embeddings was also investigated to tackle the class imbalance problem. The performance of all models was compared with that of a traditional logistic regression model. The results show that models trained on narratives outperformed models trained on structured/tabular data in predicting the outcome of an accident; this higher predictive power suggests that the narratives contain information relevant to the injury outcome beyond what the tabular entries capture. In predicting days away from work, the models trained on tabular data achieved a lower mean squared error than the models trained on narratives. The results highlight the importance of predictors such as shift start time, accident time, and mining experience in predicting days away from work. After applying the data augmentation technique, the F1 score improved for all underrepresented classes except one. This approach gave greater insight into the factors influencing the outcome of an accident and days away from work.
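Although the study's exact pipeline is not reproduced here, the comparison it describes can be illustrated with a minimal sketch: one random forest trained on structured predictors and another trained on TF-IDF features extracted from the injury narratives, both scored with macro F1. The file name and column names (NARRATIVE, DEGREE_INJURY, SHIFT_START_TIME, etc.) are hypothetical stand-ins for the MSHA accident data, not the actual schema, and numeric encoding of the time fields is assumed.

```python
# Sketch: accident-outcome classifiers trained on tabular features vs. narratives.
# Column names and the CSV file are illustrative placeholders, not the MSHA schema.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("msha_accidents.csv")      # hypothetical export of the accident data
target = df["DEGREE_INJURY"]                # accident outcome class

X_train, X_test, y_train, y_test = train_test_split(
    df, target, test_size=0.2, stratify=target, random_state=42
)

# Model A: structured/tabular predictors only (times assumed numerically encoded).
tabular_cols = ["SHIFT_START_TIME", "ACCIDENT_TIME", "TOTAL_EXPERIENCE", "SUBUNIT"]
tabular_model = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["SUBUNIT"])],
        remainder="passthrough",
    )),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=42)),
])
tabular_model.fit(X_train[tabular_cols], y_train)

# Model B: free-text injury narratives only.
narrative_model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=20000, ngram_range=(1, 2))),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=42)),
])
narrative_model.fit(X_train["NARRATIVE"], y_train)

for name, model, cols in [("tabular", tabular_model, tabular_cols),
                          ("narrative", narrative_model, "NARRATIVE")]:
    pred = model.predict(X_test[cols])
    print(name, "macro F1:", f1_score(y_test, pred, average="macro"))
```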


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 375
Author(s):  
Stavroula Bourou ◽  
Andreas El Saer ◽  
Terpsichori-Helen Velivassaki ◽  
Artemis Voulkidis ◽  
Theodore Zahariadis

Recent technological innovations, along with the vast amount of data available worldwide, have led to a rise in cyberattacks against network systems. Intrusion Detection Systems (IDS) play a crucial role as a defense mechanism against adversarial attackers in networks. Machine learning methods provide various cybersecurity tools; however, these methods require plenty of data to be trained effectively, which may be hard to collect or to use due to privacy concerns. One of the most notable machine learning tools is the Generative Adversarial Network (GAN), which has great potential for tabular data synthesis. In this work, we begin by briefly presenting the most popular GAN architectures: VanillaGAN, WGAN, and WGAN-GP. Focusing on tabular data generation, the CTGAN, CopulaGAN, and TableGAN models are then used to create synthetic IDS data. Specifically, the models are trained and evaluated on the NSL-KDD dataset, taking into account the limitations and requirements of this procedure. Finally, based on quantitative and qualitative methods, we evaluate and identify the most suitable GANs for tabular network data synthesis.
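As an illustration of the tabular-synthesis step, the sketch below fits a CTGAN model to a preprocessed NSL-KDD table and samples synthetic records, assuming the open-source ctgan package; the file name is a placeholder, and the paper's exact training settings and evaluation protocol are not reproduced here.

```python
# Sketch: generating synthetic intrusion-detection records with CTGAN,
# assuming the open-source `ctgan` package and a preprocessed NSL-KDD CSV.
import pandas as pd
from ctgan import CTGAN

real = pd.read_csv("nsl_kdd_train.csv")     # placeholder path to the dataset

# Categorical NSL-KDD attributes must be declared so CTGAN models them as discrete.
discrete_columns = ["protocol_type", "service", "flag", "label"]

synthesizer = CTGAN(epochs=300)
synthesizer.fit(real, discrete_columns)

synthetic = synthesizer.sample(10000)

# A simple fidelity check: compare class and protocol distributions.
print(real["label"].value_counts(normalize=True).head())
print(synthetic["label"].value_counts(normalize=True).head())
```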


2020 ◽  
Author(s):  
Gang Luo ◽  
Michael D Johnson ◽  
Flory L Nkoy ◽  
Shan He ◽  
Bryan L Stone

BACKGROUND: Asthma is a major chronic disease that poses a heavy burden on health care. To facilitate the allocation of care management resources aimed at improving outcomes for high-risk patients with asthma, we recently built a machine learning model to predict asthma hospital visits in the subsequent year in patients with asthma. Our model is more accurate than previous models. However, like most machine learning models, it offers no explanation of its prediction results. This creates a barrier for use in care management, where interpretability is desired.
OBJECTIVE: This study aims to develop a method to automatically explain the prediction results of the model and recommend tailored interventions without lowering the performance measures of the model.
METHODS: Our data were imbalanced, with only a small portion of data instances linking to future asthma hospital visits. To handle imbalanced data, we extended our previous method of automatically offering rule-formed explanations for the prediction results of any machine learning model on tabular data without lowering the model's performance measures. In a secondary analysis of the 334,564 data instances from Intermountain Healthcare between 2005 and 2018 used to form our model, we employed the extended method to automatically explain the prediction results of our model and recommend tailored interventions. The patient cohort consisted of all patients with asthma who received care at Intermountain Healthcare between 2005 and 2018 and resided in Utah or Idaho as recorded at the visit.
RESULTS: Our method explained the prediction results for 89.7% (391/436) of the patients with asthma who, per our model's correct prediction, were likely to incur asthma hospital visits in the subsequent year.
CONCLUSIONS: This study is the first to demonstrate the feasibility of automatically offering rule-formed explanations for the prediction results of any machine learning model on imbalanced tabular data without lowering the performance measures of the model. After further improvement, our asthma outcome prediction model, coupled with the automatic explanation function, could be used by clinicians to guide the allocation of limited asthma care management resources and the identification of appropriate interventions.
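The authors' rule-generation algorithm is not spelled out in the abstract, so the sketch below shows only the general flavor of rule-formed explanation: a shallow surrogate decision tree is fitted to a black-box model's own predictions and its decision paths are printed as rules. The feature file, column names, and choice of models are assumptions for illustration, not the study's actual method.

```python
# Sketch: a generic way to obtain rule-formed explanations for a black-box
# risk model via a shallow surrogate tree. Not the authors' specific algorithm.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X = pd.read_csv("asthma_features.csv")      # hypothetical numeric feature table
y = X.pop("hospital_visit_next_year")       # hypothetical outcome column

black_box = GradientBoostingClassifier().fit(X, y)
risk = black_box.predict(X)                 # the black box's own high-risk flags

# Shallow surrogate trained to mimic the black box; its paths read as rules.
surrogate = DecisionTreeClassifier(max_depth=3, class_weight="balanced")
surrogate.fit(X, risk)

print(export_text(surrogate, feature_names=list(X.columns)))
```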


2021 ◽  
Vol 11 (16) ◽  
pp. 7274
Author(s):  
Raphael Mazzine Barbosa de Oliveira ◽  
David Martens

Counterfactual explanations are viewed as an effective way to explain machine learning predictions. This interest is reflected in a relatively young literature that already contains dozens of algorithms for generating such explanations. These algorithms focus on finding how features can be modified to change the output classification. However, this rather general objective can be achieved in different ways, which creates the need for a methodology to test and benchmark these algorithms. The contributions of this work are manifold: first, a large benchmarking study of 10 algorithmic approaches on 22 tabular datasets, using nine relevant evaluation metrics; second, a novel, first-of-its-kind framework for testing counterfactual generation algorithms; third, a set of objective metrics for evaluating and comparing counterfactual results; and finally, insights from the benchmarking results that indicate which approaches perform best on which types of dataset. This benchmarking study and framework can help practitioners determine which technique and building blocks best suit their context, and can help researchers design and evaluate current and future counterfactual generation algorithms. Our findings show that, overall, there is no single best algorithm for generating counterfactual explanations, as performance depends heavily on properties of the dataset, the model, the scoring function, and the specificities of the factual point.
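The nine evaluation metrics used in the benchmark are not listed in the abstract; the sketch below shows three criteria that commonly appear in this literature (validity, sparsity, and proximity) as plain NumPy functions, purely to make the idea of objective counterfactual metrics concrete. The definitions follow common usage, not necessarily this paper's exact formulas.

```python
# Sketch: three common quantitative criteria for scoring a counterfactual
# point against its factual instance.
import numpy as np

def validity(model, counterfactual, desired_class):
    """1 if the model assigns the desired class to the counterfactual, else 0."""
    return int(model.predict(counterfactual.reshape(1, -1))[0] == desired_class)

def sparsity(factual, counterfactual, tol=1e-8):
    """Number of features that were modified (L0 distance)."""
    return int(np.sum(np.abs(factual - counterfactual) > tol))

def proximity(factual, counterfactual, ranges):
    """Mean per-feature change, normalised by each feature's value range."""
    return float(np.mean(np.abs(factual - counterfactual) / ranges))
```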


2021 ◽  
Author(s):  
Leonid Joffe

Deep learning models for tabular data are restricted to a specific table format. Computer vision models, on the other hand, have broader applicability: they work on all images and can learn universal features. This allows them to be trained on enormous corpora and gives them very wide transferability and applicability. Inspired by these properties, this work presents an architecture that aims to capture useful patterns across arbitrary tables. The model is trained on randomly sampled subsets of features from a table, processed by a convolutional network, and its internal representation captures the feature interactions that appear in the table. Experimental results show that the embeddings produced by this model are useful and transferable across many commonly used machine learning benchmark datasets. Specifically, using the embeddings produced by the network as additional features improves the performance of a number of classifiers.
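The abstract's core idea, random feature subsets encoded by a convolutional network into reusable embeddings, can be sketched roughly as follows in PyTorch. Layer sizes, the pooling choice, and the training objective are assumptions for illustration; the paper's actual architecture may differ.

```python
# Sketch: encode randomly sampled feature subsets of a table with a 1-D
# convolutional network and use the resulting embedding as extra features.
import torch
import torch.nn as nn

class TableEmbedder(nn.Module):
    def __init__(self, subset_size: int = 16, embed_dim: int = 32):
        super().__init__()
        self.subset_size = subset_size
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),        # pool over the feature axis
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, rows: torch.Tensor) -> torch.Tensor:
        # rows: (batch, n_features); sample one random column subset per call
        idx = torch.randperm(rows.shape[1])[: self.subset_size]
        subset = rows[:, idx].unsqueeze(1)  # (batch, 1 channel, subset_size)
        return self.encoder(subset)

# Usage: concatenate the embedding with the original features for a downstream classifier.
rows = torch.randn(64, 100)                 # a batch of 64 table rows, 100 features
emb = TableEmbedder()(rows)                 # (64, 32) learned embedding
augmented = torch.cat([rows, emb], dim=1)   # original features + embedding
```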

