Machine Learning Classification Models for More Effective Mine Safety Inspections

Author(s):  
Jeremy M. Gernand

The safety of mining in the United States has improved significantly over the past few decades, although mining remains one of the more dangerous occupations. Following the Sago mine disaster in January 2006, federal legislation (the Mine Improvement and New Emergency Response [MINER] Act of 2006) tightened regulations and sought to strengthen the authority and safety-inspection practices of the Mine Safety and Health Administration (MSHA). While penalties and inspection frequency have increased, understanding of which types of inspection findings are most indicative of serious future incidents remains limited. The most effective safety management and oversight would require a thorough understanding of which infractions or safety inspection findings best predict serious future personnel injuries. However, given the large number of potentially unique inspection findings, varied mine characteristics, and specific incident types, the question involves a very large set of potentially relevant input parameters. New regulations rely on increasing the frequency and severity of infraction penalties to encourage mining operations to improve worker safety, but without knowledge of which specific infractions truly signal a dangerous work environment. This paper seeks to inform the question: What types of inspection findings are most indicative of serious future incidents for specific types of mining operations? The analysis utilizes publicly available MSHA databases of cited infractions and reportable incidents. These inspection results are used to train machine learning Classification and Regression Tree (CART) and Random Forest (RF) models that divide mines into peer groups based on their recent infractions and other defining characteristics, with the aim of predicting whether a fatal or serious disabling injury is likely to occur in the following 12-month period.
With these characteristics available, additional scrutiny may be appropriately directed at those mining operations at greatest risk of experiencing a worker fatality or disabling injury in the near future. Increased oversight and attention on these mines where workers are at greatest risk may more effectively reduce the likelihood of worker deaths and injuries than increased penalties and inspection frequency alone.
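As a rough sketch of the modeling setup described above, the snippet below trains a Random Forest classifier to flag higher-risk operations; the feature set, counts, and label rule are invented stand-ins for illustration, not the study's actual MSHA inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-mine-year features: counts of three infraction types
# plus employment size (all invented for illustration).
X = rng.poisson(lam=[3.0, 1.0, 2.0, 50.0], size=(n, 4)).astype(float)
# Synthetic label: serious injury in the following 12 months, loosely
# tied to the first two infraction counts.
y = (X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 2.0, n) > 6.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = rf.score(X_te, y_te)
# Importance scores suggest which infraction types drive predicted risk.
importances = rf.feature_importances_
```

In practice the importance scores, rather than raw accuracy, are what would direct additional inspection scrutiny toward specific infraction types.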



Author(s):  
Anurag Yedla, Fatemeh Davoudi Kakhki, Ali Jannesari

Mining is known to be one of the most hazardous occupations in the world, and many serious accidents have occurred at mining sites worldwide over the years. Although there have been efforts to create a safer work environment for miners, the number of accidents occurring at mining sites is still significant. Machine learning techniques and predictive analytics are becoming leading resources for creating safer work environments in the manufacturing and construction industries, where they are leveraged to generate actionable insights that improve decision-making. A large amount of mining safety-related data is available, and machine learning algorithms can be used to analyze it, which can significantly benefit the mining industry. In this study, decision tree, random forest, and artificial neural network models were implemented to analyze the outcomes of mining accidents and to predict days away from work. An accident dataset provided by the Mine Safety and Health Administration (MSHA) was used to train the models, which were trained separately on tabular data and on injury narratives. The use of a synthetic data augmentation technique based on word embeddings was also investigated to tackle the data imbalance problem. The performance of all models was compared with that of a traditional logistic regression model. The results show that models trained on narratives outperformed models trained on structured/tabular data in predicting the outcome of the accident; this higher predictive power suggests that the narratives carry information relevant to the injury outcome beyond what the tabular entries capture. The models trained on tabular data had a lower mean squared error than the models trained on narratives when predicting days away from work.
The results highlight the importance of predictors such as shift start time, accident time, and mining experience in predicting days away from work. The F1 score of all but one of the underrepresented classes improved after applying the data augmentation technique. This approach gave greater insight into the factors influencing the outcome of an accident and days away from work.
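A minimal sketch of the narrative-based classification idea, using toy narratives with a TF-IDF plus random forest pipeline; TF-IDF is a substitute here for whatever text features the study actually used, and the narratives and outcome labels are invented.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy injury narratives and accident outcomes (invented for illustration).
narratives = [
    "employee slipped on wet walkway and twisted ankle",
    "miner struck by falling rock while bolting roof",
    "operator caught hand in conveyor belt pinch point",
    "worker fell from ladder while changing a light",
    "rock fell from the rib and struck miner on the shoulder",
    "hand pulled into moving belt during maintenance",
]
outcomes = ["fall", "fall_of_ground", "machinery",
            "fall", "fall_of_ground", "machinery"]

# Narrative model: TF-IDF text features feeding a random forest.
text_model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
text_model.fit(narratives, outcomes)
pred = text_model.predict(["glove pulled into moving belt during repair"])[0]
```

The tabular counterpart would be the same classifier fit on coded fields (shift start time, accident time, experience) instead of the vectorized text.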


2020, pp. 009102602097756
Author(s):
In Gu Kang, Ben Croft, Barbara A. Bichelmeyer

This study aims to identify important predictors of turnover intention and to characterize subgroups of U.S. federal employees at high risk for turnover intention. Data were drawn from the 2018 Federal Employee Viewpoint Survey (FEVS, unweighted N = 598,003), a nationally representative sample of U.S. federal employees. Machine learning Classification and Regression Tree (CART) analyses, which accounted for sample weights, were conducted to predict turnover intention. The CART analyses identified six at-risk subgroups. Predictor importance scores showed that job satisfaction was the strongest predictor of turnover intention, followed by satisfaction with the organization, loyalty, accomplishment, involvement in decisions, liking of the job, satisfaction with promotion opportunities, skill development opportunities, organizational tenure, and pay satisfaction. Consequently, Human Resource (HR) departments should implement comprehensive HR practices to enhance employees' perceptions of job satisfaction, workplace environments and systems, and favorable organizational policies and supports, and should tailor interventions for the at-risk subgroups.
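The weighted-CART-with-importances setup can be sketched as follows, with invented survey items and synthetic responses standing in for FEVS data; only the mechanics (sample weights, importance scores) mirror the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
# Hypothetical 1-5 survey items: job satisfaction, organization
# satisfaction, loyalty (invented stand-ins for FEVS items).
X = rng.integers(1, 6, size=(n, 3)).astype(float)
# Synthetic turnover intention, driven mainly by the first item.
y = (X[:, 0] + rng.normal(0.0, 1.0, n) < 2.5).astype(int)
weights = rng.uniform(0.5, 1.5, n)  # stand-in for FEVS sample weights

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y, sample_weight=weights)  # CART accounting for survey weights
imp = tree.feature_importances_  # predictor importance scores
```

The leaves of a shallow tree like this are exactly the kind of at-risk subgroups the study reports, each defined by a small set of item thresholds.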


Minerals, 2021, Vol 11 (7), pp. 776
Author(s):
Rajive Ganguli, Preston Miller, Rambabu Pothina

To achieve the goal of preventing serious injuries and fatalities, it is important for a mine site to analyze site-specific mine safety data. Advances in natural language processing (NLP) create an opportunity to develop machine learning (ML) tools that automate analysis of mine health and safety management system (HSMS) data without requiring experts at every mine site. As a demonstration, nine random forest (RF) models were developed to classify narratives from the Mine Safety and Health Administration (MSHA) database into nine accident types. MSHA accident categories are quite descriptive and thus serve as a proxy for a high-level understanding of the incidents. A model trained to recognize a single category (one binary model per category) was more effective than a single model that classified narratives into all nine categories. The developed models were then applied to narratives taken from a mine HSMS (non-MSHA) to classify them into MSHA accident categories. About two thirds of the non-MSHA narratives were automatically classified by the RF models, and manual evaluation of these automated classifications showed an accuracy of 96%. The near-perfect classification of non-MSHA narratives by MSHA-based machine learning models demonstrates that NLP can be a powerful tool for analyzing HSMS data.
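The one-binary-model-per-category scheme can be sketched as below; the narratives, category names, and confidence-based tie-breaking are illustrative assumptions, while the real work used MSHA narratives and nine accident types.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy narratives and accident types (invented for illustration).
narratives = [
    "employee slipped on wet walkway and twisted ankle",
    "worker fell from ladder while changing a light",
    "miner struck by falling rock while bolting roof",
    "rock fell from the rib and struck miner on the shoulder",
    "operator caught hand in conveyor belt pinch point",
    "hand pulled into moving belt during maintenance",
]
labels = ["slip_or_fall", "slip_or_fall", "fall_of_ground",
          "fall_of_ground", "machinery", "machinery"]
categories = sorted(set(labels))

# One binary yes/no random forest per accident category.
models = {}
for cat in categories:
    y = [int(lbl == cat) for lbl in labels]
    models[cat] = make_pipeline(
        TfidfVectorizer(), RandomForestClassifier(random_state=0)
    ).fit(narratives, y)

def classify(text):
    # Assign the category whose binary model is most confident.
    scores = {c: m.predict_proba([text])[0][-1] for c, m in models.items()}
    return max(scores, key=scores.get)
```

Leaving a narrative unassigned when no binary model is confident would reproduce the study's behavior of automatically classifying only about two thirds of the non-MSHA narratives.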


2021, Vol 11 (23), pp. 11227
Author(s):
Arnold Kamis, Yudan Ding, Zhenzhen Qu, Chenchen Zhang

The purpose of this paper is to model the cases of COVID-19 in the United States from 13 March 2020 to 31 May 2020. Our novel contribution is that we have obtained highly accurate models focused on two different regimes, lockdown and reopen, modeling each regime separately. The predictor variables include aggregated individual movement as well as state population density, health rank, climate temperature, and political color. We apply a variety of machine learning methods to each regime: Multiple Regression, Ridge Regression, Elastic Net Regression, Generalized Additive Model, Gradient Boosted Machine, Regression Tree, Neural Network, and Random Forest. We discover that Gradient Boosted Machines are the most accurate in both regimes. The best models achieve a variance explained of 95.2% in the lockdown regime and 99.2% in the reopen regime. We describe the influence of the predictor variables as they change from regime to regime. Notably, we identify individual person movement, as tracked by GPS data, to be an important predictor variable. We conclude that government lockdowns are an extremely important de-densification strategy. Implications and questions for future research are discussed.
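The per-regime gradient boosting fit can be sketched as follows, with synthetic predictors standing in for the paper's mobility and state-level variables; reading variance explained as in-sample R² is a simplification of the paper's evaluation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)

def make_regime(n, slope):
    # Hypothetical standardized predictors: movement index, population
    # density, temperature (all invented).
    X = rng.normal(size=(n, 3))
    y = slope * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.3, n)
    return X, y

# Fit one model per regime, mirroring the lockdown/reopen split.
models, r2 = {}, {}
for regime, slope in [("lockdown", 3.0), ("reopen", 1.5)]:
    X, y = make_regime(400, slope)
    models[regime] = GradientBoostingRegressor(random_state=0).fit(X, y)
    r2[regime] = models[regime].score(X, y)  # variance explained
```

Fitting each regime separately lets the boosted trees learn regime-specific predictor effects rather than averaging across the regime change.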


2021, Vol 9
Author(s):
Manish Pandey, Aman Arora, Alireza Arabameri, Romulus Costache, Naveen Kumar, et al.

This study developed a new ensemble model and tested another ensemble model for flood susceptibility mapping in the Middle Ganga Plain (MGP). The results of the two models were quantitatively compared for performance in zoning flood-susceptible areas of the low-altitude, humid subtropical fluvial floodplain environment of the MGP. This part of the MGP, located in the central Ganga River Basin (GRB), is experiencing worsening floods under a changing climate, causing growing losses of life and property. With its monsoonal humid subtropical climate, ground subsidence induced by active tectonics, increasing population, and shifting land-use/land-cover trends and patterns, the MGP is a natural laboratory for testing susceptibility prediction models, with the goal of model universality: identifying the best-performing model for this type of topoclimatic setting given a constant number and type of input variables. Based on a highly accurate flood inventory and using 12 flood predictors (FPs), selected from field experience of the study area and a literature survey, two machine learning (ML) ensemble models built by bagging frequency ratio (FR) and evidential belief function (EBF) with classification and regression tree (CART), namely CART-FR and CART-EBF, were applied for flood susceptibility zonation mapping. Flood and non-flood points randomly generated from the flood inventory were apportioned in a 70:30 ratio for training and validation of the ensembles.
Based on evaluation using a threshold-independent statistic, the area under the receiver operating characteristic (AUROC) curve, 14 threshold-dependent evaluation metrics, and the seed cell area index (SCAI), each assessing different aspects of the ensembles, the study suggests that CART-EBF (AUCSR = 0.843; AUCPR = 0.819) performed better than CART-FR (AUCSR = 0.828; AUCPR = 0.802). The variability in the performance of these novel ensembles, and their comparison with results of other published models, supports the need to test these and other genres of susceptibility models in other topoclimatic environments. The results of this study are important for natural hazard managers and can be used to compute damages through risk analysis.
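The 70:30 apportionment and AUROC evaluation can be sketched with a generic bagged-CART ensemble on synthetic points; the FR and EBF conditioning steps of the actual CART-FR and CART-EBF ensembles are omitted here, and the predictors are invented.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 600
# Hypothetical flood predictors (e.g., elevation, slope, distance to
# river, rainfall), all synthetic.
X = rng.normal(size=(n, 4))
# Synthetic flood / non-flood labels.
y = (X[:, 0] - X[:, 1] + rng.normal(0.0, 0.5, n) > 0).astype(int)

# 70:30 apportionment for training and validation, as in the study.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
ens = BaggingClassifier(DecisionTreeClassifier(max_depth=4),
                        n_estimators=50, random_state=0).fit(X_tr, y_tr)
# Threshold-independent evaluation via AUROC on the validation split.
auc = roc_auc_score(y_va, ens.predict_proba(X_va)[:, 1])
```

The validation-split AUROC here plays the role of the study's AUCPR (prediction-rate) figure; the training-split counterpart would correspond to AUCSR.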


2019, Vol 18 (05), pp. 1579-1603
Author(s):
Zhijiang Wan, Hao Zhang, Jiajin Huang, Haiyan Zhou, Jie Yang, et al.

Many studies have developed machine learning methods for discriminating between Major Depressive Disorder (MDD) patients and normal controls based on multi-channel electroencephalogram (EEG) data, but few have considered using single-channel EEG collected from the forehead scalp to discriminate MDD. Here, an EEG dataset was collected from the Fp1 and Fp2 electrodes of a 32-channel EEG system. The results demonstrate that classification performance based on EEG from the Fp1 location exceeds that based on the Fp2 location, and show that single-channel EEG analysis can discriminate MDD at a level comparable to multi-channel EEG analysis. Furthermore, a portable EEG device collecting the signal from the Fp1 location was used to collect a second dataset. A Classification and Regression Tree (CART) combined with a genetic algorithm (GA) achieved the highest accuracy of 86.67% under leave-one-participant-out cross-validation, which shows that single-channel EEG-based machine learning is promising for supporting MDD prescreening applications.
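Leave-one-participant-out evaluation can be sketched with scikit-learn's LeaveOneGroupOut splitter; the features below are synthetic stand-ins for single-channel EEG measures, and the GA feature-selection step is omitted.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n_participants, trials = 15, 20
# Synthetic per-trial features standing in for single-channel EEG
# measures (e.g., band powers from Fp1).
X = rng.normal(size=(n_participants * trials, 5))
groups = np.repeat(np.arange(n_participants), trials)
# Synthetic MDD / control labels tied to the first feature.
y = (X[:, 0] + rng.normal(0.0, 1.0, len(X)) > 0).astype(int)

cart = DecisionTreeClassifier(random_state=0)
# One fold per held-out participant, so no subject leaks between
# training and test sets.
scores = cross_val_score(cart, X, y, groups=groups, cv=LeaveOneGroupOut())
mean_acc = scores.mean()
```

Grouping folds by participant, rather than by trial, is what makes the reported accuracy an estimate of performance on unseen people.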


10.2196/18910, 2020, Vol 8 (7), pp. e18910
Author(s):
Debbie Rankin, Michaela Black, Raymond Bond, Jonathan Wallace, Maurice Mulvenna, et al.

Background: The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce.
Objective: This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data.
Methods: A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression tree, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed.
Results: A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data, and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases; this is not the case for models trained on synthetic data. When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility.
Conclusions: The results of this study are promising, with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.
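The train-on-synthetic, test-on-real protocol can be sketched as follows; a simple per-class Gaussian resampler stands in for the study's synthetic data generators, and the data and accuracy gap are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
# "Real" data with simple structure: the label is the sign of feature 0.
X_real = rng.normal(size=(500, 3))
y_real = (X_real[:, 0] > 0).astype(int)

# Crude per-class Gaussian resampler standing in for a synthetic data
# generator: sample each class from its fitted mean and spread.
X_syn_parts, y_syn_parts = [], []
for cls in (0, 1):
    Xc = X_real[y_real == cls]
    X_syn_parts.append(rng.normal(Xc.mean(axis=0), Xc.std(axis=0), size=Xc.shape))
    y_syn_parts.append(np.full(len(Xc), cls))
X_syn = np.vstack(X_syn_parts)
y_syn = np.concatenate(y_syn_parts)

# Train on synthetic data, test on real data -- the paper's protocol --
# and compare against training on the real data itself.
acc_syn = DecisionTreeClassifier(random_state=0).fit(X_syn, y_syn).score(X_real, y_real)
acc_real = DecisionTreeClassifier(random_state=0).fit(X_real, y_real).score(X_real, y_real)
```

The difference between acc_real and acc_syn is the kind of accuracy deviation the study tabulates across its 19 datasets and three generators.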


2021
Author(s):
Younes Shekarian, Elham Rahimi, Naser Shekarian, Mohammad Rezaee, Pedram Roghanchi

In the United States, an unexpected and severe increase in coal miners' lung diseases in the late 1990s prompted researchers to investigate the causes of the disease resurgence. This study aims to scrutinize the effects of various mining parameters, including coal rank, mine size, mining method, coal seam height, and geographical location, on the prevalence of coal workers' pneumoconiosis (CWP) in surface and underground coal mines. A comprehensive dataset was created using the U.S. Mine Safety and Health Administration (MSHA) Employment and Accident/Injury databases, merged on mine ID using SQL data management software. A total of 123,643 mine-year observations were included in the statistical analysis. A generalized estimating equation (GEE) model was used to conduct statistical analyses on 29,707 and 32,643 mine-year observations for underground and surface coal mines, respectively. The results of this econometric approach revealed that workers in underground coal mines are at greater risk of CWP than those in surface coal operations. Furthermore, underground coal mines in the Appalachian and Interior regions are at higher risk of CWP prevalence than those in the Western region, and surface coal miners in the Appalachian region are more susceptible to CWP than miners in the Western region. The analysis also indicated that coal workers in smaller mines are more vulnerable to CWP than those in larger mines, and that workers in thin-seam underground mine operations are more likely to develop CWP.

