Quantitative Mineral Mapping of Drill Core Surfaces II: Long-Wave Infrared Mineral Characterization Using μXRF and Machine Learning

2020
Author(s):  
Rocky D. Barker ◽  
Shaun L.L. Barker ◽  
Matthew J. Cracknell ◽  
Elizabeth D. Stock ◽  
Geoffrey Holmes

Abstract Long-wave infrared (LWIR) spectra can be interpreted using a Random Forest machine learning approach to predict mineral species and abundances. In this study, hydrothermally altered carbonate rock core samples from the Fourmile Carlin-type Au discovery, Nevada, were analyzed by LWIR and micro-X-ray fluorescence (μXRF). Linear programming-derived mineral abundances from quantified μXRF data were used as training data to construct a series of Random Forest regression models. The LWIR Random Forest models produced mineral proportion estimates with root mean square errors of 1.17 to 6.75% (model predictions) and 1.06 to 6.19% (compared to quantitative X-ray diffraction data) for calcite, dolomite, kaolinite, white mica, phlogopite, K-feldspar, and quartz. These results are comparable to the error of proportion estimates from linear spectral deconvolution (±7–15%), a commonly used spectral unmixing technique. Having a mineralogical and chemical training data set makes it possible to identify and quantify mineralogy and provides a more robust and meaningful LWIR spectral interpretation than current methods of utilizing a spectral library or spectral end-member extraction. Using the method presented here, LWIR spectroscopy can be used to overcome the limitations inherent with the use of short-wave infrared (SWIR) in fine-grained, low reflectance rocks. This new approach can be applied to any deposit type, improving the accuracy and speed of infrared data interpretation.
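The regression step the abstract describes can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the "spectral channels," the calcite label, and every numeric choice below are invented stand-ins for the LWIR spectra and μXRF-derived mineral proportions used as training pairs.

```python
# Sketch: Random Forest regression mapping simulated spectral features to a
# mineral proportion, scored by RMSE as in the study. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 64 "LWIR channels" and a calcite proportion (%)
# that depends on a few of them, mimicking muXRF-derived training labels.
X = rng.normal(size=(500, 64))
calcite_pct = 50 + 10 * X[:, 0] - 5 * X[:, 3] + rng.normal(scale=2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, calcite_pct, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Root mean square error of the held-out proportion estimates.
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
```

In the study one such regression model is built per mineral (calcite, dolomite, quartz, etc.), each trained on the same spectra with a different proportion label.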

2020
Vol 8 (6)
pp. 1623-1630

As huge amounts of data accumulate, extracting the required information from the available data becomes a challenge, and machine learning contributes to this task in various fields. The fast-growing population has been accompanied by a wide range of diseases, creating a need for machine learning models that use patients' data sets. Analyses of data sets from different sources indicate that cancer is among the most hazardous diseases and may cause the death of the patient. Surveys conclude that cancer can often be cured when detected in its initial stages, while in later stages it may prove fatal. Lung cancer is one of the major types of cancer, and reducing its impact depends heavily on detection in the early stages using past data. The proposed work applies a machine learning algorithm to classify individuals' details into categories and predict whether they are likely to develop cancer at an early stage. A Random Forest algorithm is implemented and achieves an efficiency of 97%, higher than KNN and Naive Bayes. The KNN algorithm does not learn a model from the training data but uses the data directly for classification, while Naive Bayes yields less accurate predictions. The proposed system predicts the chance of lung cancer at three levels: low, medium, and high. Thus, mortality rates can be reduced significantly.
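The three-way comparison described above can be sketched as below. The patient features and the risk rule are invented placeholders, not the paper's data set; the point is only the protocol of fitting Random Forest, KNN, and Naive Bayes on the same split and comparing hold-out accuracy.

```python
# Sketch: compare Random Forest, KNN and Gaussian Naive Bayes on one
# hold-out split. Features and labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical patient features (e.g. age, smoking years, exposure index).
X = rng.normal(size=(600, 6))
# Invented rule producing three risk levels: 0=low, 1=medium, 2=high.
risk = np.digitize(X[:, 0] + 0.8 * X[:, 1], bins=[-0.5, 0.5])

X_tr, X_te, y_tr, y_te = train_test_split(X, risk, test_size=0.25, random_state=1)

scores = {}
for name, clf in [("random_forest", RandomForestClassifier(random_state=1)),
                  ("knn", KNeighborsClassifier()),
                  ("naive_bayes", GaussianNB())]:
    scores[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
```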


2020
Author(s):  
Piyush Mathur ◽  
Tavpritesh Sethi ◽  
Anya Mathur ◽  
Kamal Maheshwari ◽  
Jacek Cywinski ◽  
...  

Introduction: The COVID-19 pandemic exhibits an uneven geographic spread which leads to a locational mismatch of testing, mitigation measures, and allocation of healthcare resources (human, equipment, and infrastructure) (1). In the absence of effective treatment, understanding and predicting the spread of COVID-19 is unquestionably valuable for public health and hospital authorities to plan for and manage the pandemic. While many models have been developed to predict mortality, the authors sought to develop a machine learning prediction model that provides an estimate of the relative association of socioeconomic, demographic, travel, and healthcare characteristics with COVID-19 mortality among states in the United States (US).

Methods: State-wise data were collected for all the features predicting COVID-19 mortality and for deriving feature importance (eTable 1 in the Supplement) (2). Key feature categories include demographic characteristics of the population, pre-existing healthcare utilization, travel, weather, socioeconomic variables, racial distribution, and timing of disease mitigation measures (Figures 1 & 2). Two machine learning models, CatBoost regression and Random Forest, were trained independently to predict mortality in states on data partitioned into a training (80%) and test (20%) set (3). Accuracy of the models was assessed by R2 score. The importance of the features for prediction of mortality was calculated via two machine learning algorithms: SHAP (SHapley Additive exPlanations), calculated on the CatBoost model, and Boruta, a Random Forest-based method trained with 10,000 trees for calculating statistical significance (3-5).

Results: Results are based on 60,604 total deaths in the US as of April 30, 2020. The actual number of deaths ranged widely, from 7 (Wyoming) to 18,909 (New York). The CatBoost regression model obtained an R2 score of 0.99 on the training data set and 0.50 on the test set. The Random Forest model obtained an R2 score of 0.88 on the training data set and 0.39 on the test set. Nine out of twenty variables were significantly higher than the maximum variable importance achieved by the shadow data set in Boruta regression (Figure 2). Both models showed high feature importance for pre-existing high healthcare utilization, reflected in nursing home beds per capita and doctors per 100,000 population. Overall population characteristics such as total population and population density also correlated positively with the number of deaths. Notably, both models revealed a high positive correlation of deaths with the percentage of African Americans. Direct flights from China, especially Wuhan, were also significant predictors of death in both models, reflecting early spread of the disease. Associations between deaths and weather patterns, hospital bed capacity, median age, and the timing of administrative action to mitigate disease spread, such as the closure of educational institutions or stay-at-home orders, were not significant. The lack of some associations, e.g., with administrative action, may reflect delayed outcomes of interventions that were not yet reflected in the data.

Discussion: COVID-19 has varied spread and mortality across communities in different US states. While our models show that high population density, pre-existing need for medical care, and foreign travel may increase transmission and thus COVID-19 mortality, the effect of geographic, climate, and racial disparities on COVID-19-related mortality is not clear. The purpose of our study was not accurate state-wise prediction of deaths in the US, which has already proven challenging (6). Location-based understanding of key determinants of COVID-19 mortality is critically needed for focused targeting of mitigation and control measures. Risk assessment-based understanding of the determinants affecting COVID-19 outcomes, using a dynamic and scalable machine learning model such as the two proposed, can help guide resource management and policy frameworks.
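A minimal sketch of the second of the two models only (CatBoost, SHAP, and Boruta are separate third-party packages): a Random Forest regression on an 80/20 split, reporting train and test R2 plus the built-in impurity-based importances. The feature names and the state-level data below are invented stand-ins, not the study's data.

```python
# Sketch: Random Forest regression of state-level deaths on assumed features,
# with train/test R^2 and impurity-based feature importances. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
features = ["population", "pop_density", "nursing_beds_pc",
            "doctors_per_100k", "direct_flights"]          # hypothetical names
X = rng.normal(size=(50, len(features)))                   # 50 "states"
deaths = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=50)

# 80% training / 20% test partition, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, deaths, test_size=0.2, random_state=2)
model = RandomForestRegressor(n_estimators=300, random_state=2).fit(X_tr, y_tr)

r2_train, r2_test = model.score(X_tr, y_tr), model.score(X_te, y_te)
importance = dict(zip(features, model.feature_importances_))
```

Note how the train/test R2 gap mirrors the 0.99-versus-0.50 pattern reported in the abstract: tree ensembles fit small training sets very closely, so the test score is the meaningful one.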


2011
Vol 111 (6)
pp. 1804-1812
Author(s):  
Patty S. Freedson ◽  
Kate Lyden ◽  
Sarah Kozey-Keadle ◽  
John Staudenmayer

Previous work from our laboratory provided a “proof of concept” for use of artificial neural networks (nnets) to estimate metabolic equivalents (METs) and identify activity type from accelerometer data (Staudenmayer J, Pober D, Crouter S, Bassett D, Freedson P, J Appl Physiol 107: 1300–1307, 2009). The purpose of this study was to develop new nnets based on a larger, more diverse training data set and apply these nnet prediction models to an independent sample to evaluate the robustness and flexibility of this machine-learning modeling technique. The nnet training data set (University of Massachusetts) included 277 participants who each completed 11 activities. The independent validation sample (n = 65) (University of Tennessee) completed one of three activity routines. Criterion measures were 1) measured METs assessed using open-circuit indirect calorimetry; and 2) observed activity to identify activity type. The nnet input variables included five accelerometer count distribution features and the lag-1 autocorrelation. The bias and root mean square errors for the nnet MET model trained at the University of Massachusetts and applied to the University of Tennessee sample were +0.32 and 1.90 METs, respectively. Seventy-seven percent of the activities were correctly classified as sedentary/light, moderate, or vigorous intensity. For activity type, household and locomotion activities were correctly classified by the nnet 98.1% and 89.5% of the time, respectively, while sport was correctly classified 23.7% of the time. This machine-learning technique performs reasonably well when applied to an independent sample. We propose the creation of an open-access activity dictionary, including accelerometer data from a broad array of activities, leading to further improvements in prediction accuracy for METs, activity intensity, and activity type.
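The six inputs named above (five count-distribution features plus the lag-1 autocorrelation) feeding a small neural network can be sketched as follows. The accelerometer windows and the count-to-MET relationship are simulated, so this shows only the feature construction and model shape, not the study's data.

```python
# Sketch: percentile + lag-1 autocorrelation features from simulated
# accelerometer windows, feeding a small neural network MET regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

def window_features(counts):
    """Five count-distribution features + lag-1 autocorrelation of one window."""
    p10, p25, p50, p75, p90 = np.percentile(counts, [10, 25, 50, 75, 90])
    lag1 = np.corrcoef(counts[:-1], counts[1:])[0, 1]
    return [p10, p25, p50, p75, p90, lag1]

# Simulate 60-sample windows whose count level scales with a "true" MET value.
mets = rng.uniform(1.0, 10.0, size=300)
X = np.array([window_features(rng.normal(loc=100 * m, scale=20, size=60))
              for m in mets])

nnet = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=3000, random_state=3))
nnet.fit(X[:240], mets[:240])

pred = nnet.predict(X[240:])
bias = float(np.mean(pred - mets[240:]))
rmse = float(np.sqrt(np.mean((pred - mets[240:]) ** 2)))
```

Bias and RMSE on the held-out windows are the same two summary statistics the paper reports for its independent sample.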


2021
Author(s):  
Dong Wang ◽  
JinBo Li ◽  
Yali Sun ◽  
Xianfei Ding ◽  
Xiaojuan Zhang ◽  
...  

Abstract
Background: Although numerous studies are conducted every year on how to reduce the fatality rate associated with sepsis, it remains a major challenge for patients, clinicians, and medical systems worldwide. Early identification and prediction of patients at risk of sepsis and its associated adverse outcomes are critical. We aimed to develop an artificial intelligence algorithm that can predict sepsis early.
Methods: This was a secondary analysis of an observational cohort study from the Intensive Care Unit of the First Affiliated Hospital of Zhengzhou University. A total of 4449 infected patients were randomly assigned to the development and validation data sets at a ratio of 4:1. After extracting electronic medical record data, a set of 55 features (variables) was calculated and passed to the random forest algorithm to predict the onset of sepsis.
Results: The pre-procedure clinical variables were used to build a prediction model from the training data set using the random forest machine learning method; 5-fold cross-validation was used to evaluate the prediction accuracy of the model. Finally, we tested the model using the validation data set. The area under the receiver operating characteristic (ROC) curve (AUC) obtained by the model was 0.91, the sensitivity was 87%, and the specificity was 89%.
Conclusions: The newly established model can accurately predict the onset of sepsis in ICU patients in clinical settings as early as possible. Prospective studies are necessary to determine the clinical utility of the proposed sepsis prediction model.
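The evaluation protocol described in the Methods and Results can be sketched as below: a 4:1 development/validation split, 5-fold cross-validation on the development set, and AUC on the held-out validation set. The cohort, the 55 features, and the risk rule are synthetic placeholders, not the study's records.

```python
# Sketch: Random Forest sepsis classifier with 5-fold CV on a development
# set and AUC on a held-out validation set. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 55))                     # 55 clinical features
p = 1 / (1 + np.exp(-(1.5 * X[:, 0] + X[:, 1])))   # invented latent risk
y = rng.random(1000) < p                            # noisy sepsis labels

# 4:1 development/validation split, as in the study.
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.2, random_state=4)

clf = RandomForestClassifier(n_estimators=200, random_state=4)
cv_auc = cross_val_score(clf, X_dev, y_dev, cv=5, scoring="roc_auc").mean()

clf.fit(X_dev, y_dev)
val_auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
```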


Author(s):  
Soroor Karimi ◽  
Bohan Xu ◽  
Alireza Asgharpour ◽  
Siamack A. Shirazi ◽  
Sandip Sen

Abstract
AI approaches include machine learning algorithms, in which models are trained on existing data to predict the behavior of the system for previously unseen cases. Recent studies at the Erosion/Corrosion Research Center (E/CRC) have shown that these methods can be quite effective in predicting erosion. However, these methods are not widely used in the engineering industries due to the lack of work and information in this area. Moreover, in most of the available literature, the reported models and results have not been rigorously tested, which suggests that these models cannot be fully trusted for the applications for which they were trained. Therefore, in this study three machine learning models, Elastic Net, Random Forest, and Support Vector Machine (SVM), are utilized to increase confidence in these tools. First, the models are trained with a training data set. Next, the model hyper-parameters are optimized using nested cross-validation. Finally, the results are verified with a test data set. This process is repeated several times to ensure the accuracy of the results. To enable prediction of erosion under different conditions with these three models, six main variables are considered in the training data set: material hardness, pipe diameter, particle size, liquid viscosity, liquid superficial velocity, and gas superficial velocity. All three studied models show good prediction performance; the Random Forest and SVM approaches, however, show slightly better results than Elastic Net. The performance of these models is compared both to CFD erosion simulation results and to results from Sand Production Pipe Saver (SPPS), a mechanistic erosion prediction software developed at the E/CRC. The comparison shows that the SVM predictions match both CFD and SPPS more closely. The application of the AI models to determine the uncertainty of calculated erosion is also discussed.
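The nested cross-validation step can be sketched as below for the SVM case: hyper-parameters are tuned by an inner cross-validation (here via GridSearchCV), and an outer cross-validation produces the unbiased score. The six inputs are stand-ins for the variables listed above; the data and the parameter grid are invented, not E/CRC measurements.

```python
# Sketch: nested cross-validation for an SVM erosion model. The inner loop
# (GridSearchCV) tunes C; the outer loop scores the tuned model. Synthetic data.
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)
# Six stand-ins for hardness, pipe diameter, particle size, viscosity,
# liquid superficial velocity, and gas superficial velocity.
X = rng.uniform(size=(200, 6))
erosion = X[:, 4] ** 2 + 0.5 * X[:, 5] + rng.normal(scale=0.05, size=200)

inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVR()),
    param_grid={"svr__C": [0.1, 1, 10]},   # hypothetical grid
    cv=3)                                  # inner loop: tunes C
outer_r2 = cross_val_score(inner, X, erosion, cv=5).mean()  # outer loop
```

The key design point is that no data used to pick C ever scores the final model: each outer fold sees a fresh tuning run, which is exactly the rigor the abstract argues is missing from much of the literature.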


2021
Vol 6 (2)
pp. 213
Author(s):  
Nadya Intan Mustika ◽  
Bagus Nenda ◽  
Dona Ramadhan

This study aims to implement a machine learning algorithm for detecting fraud based on a historical data set from a retail consumer financing company. The outcome of the machine learning is used as samples for the fraud detection team. Data analysis is performed through data processing, feature selection, the hold-out method, and accuracy testing. Five machine learning methods are applied in this study: Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, and Support Vector Machine (SVM). Historical data are divided into two groups: training data and test data. The results show that the Random Forest algorithm has the highest accuracy, with a training score of 0.994999 and a test score of 0.745437, making it the most accurate of the tested methods for detecting fraud. Further research is suggested to add more predictor variables to increase the accuracy value and to apply this method to different financial institutions and industries.
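The hold-out comparison across the five algorithms can be sketched as below. The transaction features and fraud rule are simulated placeholders, not the company's records; the structure (fit on the training split, score on both splits) is what matters, and it makes the train/test gap reported above visible.

```python
# Sketch: five classifiers fitted on one training split and scored on both
# splits (hold-out method). All data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(800, 8))            # hypothetical transaction features
fraud = (X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.7, size=800)) > 1.2

X_tr, X_te, y_tr, y_te = train_test_split(X, fraud, test_size=0.25, random_state=6)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=6),
    "random_forest": RandomForestClassifier(random_state=6),
    "svm": SVC(),
}
# For each model: (training score, test score), as the study reports.
results = {name: (m.fit(X_tr, y_tr).score(X_tr, y_tr), m.score(X_te, y_te))
           for name, m in models.items()}
```

A near-perfect training score paired with a much lower test score, as in the study's 0.995 vs 0.745 for Random Forest, signals overfitting; the test score is the one to compare across methods.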


Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set using the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013 (5). In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions as GARF but has fixed peak heights. The accuracy comparison across RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
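The ‘comparison’ concept mentioned above can be illustrated on toy data: instead of scoring single poses, the classifier is trained on pairs, i.e. the feature difference between two poses, and predicts which member of the pair is the more stable (native-like) one. The per-pose features below are invented, not GARF terms; only the pairing trick is being demonstrated.

```python
# Sketch of pairwise 'comparison' training: each example is the feature
# difference between a native pose and a decoy; the label says whether the
# first pose of the pair is the native one. Features are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Hypothetical per-pose features; lower "energy-like" values = more stable.
n_pairs = 400
native = rng.normal(loc=-1.0, size=(n_pairs, 10))
decoy = rng.normal(loc=0.0, size=(n_pairs, 10))

# Difference vectors in both orders, so the classifier sees both labels.
diffs = np.vstack([native - decoy, decoy - native])
labels = np.array([1] * n_pairs + [0] * n_pairs)

clf = RandomForestClassifier(n_estimators=200, random_state=7)
clf.fit(diffs[::2], labels[::2])                  # train on half the pairs
acc = clf.score(diffs[1::2], labels[1::2])        # evaluate on the rest
```

Framing the task as pairwise comparison also sidesteps class imbalance: every pair contributes one native and one decoy, whereas a per-pose data set would be dominated by decoys.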


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between business and data science. With the involvement of data science, the business goal focuses on findings that yield valuable insights from the available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, applying machine learning techniques for classification and prediction. The first step in building a classifier or prediction model is the learning stage, in which the training data set is used to train the model with a chosen technique or algorithm; the rules generated in this stage form the model and are used to predict future trends in different types of organizations.
Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, are applied to find the most efficient and effective results. All of these functionalities are available through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate.
Result: The trained classifiers are compared to identify which produces the most reliable prediction rules.
Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting a movie's success. Production houses can then plan advertisement propaganda and the best time to release the movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets and, from them, relationships that help solve business problems and predict forthcoming trends. This prediction can help production houses with advertisement propaganda; they can also plan their costs and, by accounting for these factors, make the movie more profitable.
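The five-class prediction and the two comparison parameters named above (accuracy and the confusion matrix) can be sketched as below. The movie features, the success rule, and the class cut-offs are all invented placeholders, not the paper's data.

```python
# Sketch: five-class movie-success prediction (Flop..Blockbuster) evaluated
# with accuracy and a confusion matrix. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
classes = ["Flop", "Average", "Hit", "Superhit", "Blockbuster"]

# Hypothetical predictors: budget, star power, screens, trailer views, genre.
X = rng.normal(size=(1000, 5))
score = X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.4, size=1000)
y = np.digitize(score, np.quantile(score, [0.2, 0.4, 0.6, 0.8]))  # 0..4

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=8)
clf = RandomForestClassifier(random_state=8).fit(X_tr, y_tr)
pred = clf.predict(X_te)

acc = accuracy_score(y_te, pred)
cm = confusion_matrix(y_te, pred)   # rows: true class, columns: predicted
```

The off-diagonal cells of the confusion matrix show which adjacent success categories (e.g. Hit vs Superhit) get confused, information a single accuracy number hides.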


2019
Vol 9 (6)
pp. 1128
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Using aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge numbers of aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. With the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on numerous labeled samples, and given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, highlighting the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy yields a 10% improvement in overall accuracy compared with the SSD trained from scratch, and that the data augmentation strategies improve mAP and mF1 by 72% and 20%, respectively. Finally, the approach is further verified on a second data set, from Hurricane Irma, confirming that the proposed method is feasible.
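The augmentation step alone (item 3 above) can be sketched without a deep-learning framework; the SSD and CAE networks themselves are omitted. The tile below is a random stand-in for an aerial image, and the rotation angle and noise level are arbitrary choices, not the paper's settings.

```python
# Sketch: the four augmentation strategies from the abstract (mirroring,
# rotation, Gaussian blur, Gaussian noise) applied to one image tile,
# turning each original tile into five training samples.
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

rng = np.random.default_rng(9)
image = rng.random((64, 64))                 # stand-in aerial image tile

def augment(img):
    """Return the original tile plus four augmented copies of the same shape."""
    return [
        img,
        np.fliplr(img),                                        # mirroring
        rotate(img, angle=15, reshape=False, mode="nearest"),  # rotation
        gaussian_filter(img, sigma=1.0),                       # Gaussian blur
        img + rng.normal(scale=0.05, size=img.shape),          # Gaussian noise
    ]

augmented = augment(image)
```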


Energies
2021
Vol 14 (15)
pp. 4595
Author(s):  
Parisa Asadi ◽  
Lauren E. Beckingham

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is highly necessary but, like any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful approach to reliable phase segmentation, enabling faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques combined with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared and contrasted to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising: all classification algorithms achieved high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with a linear combination of focal and dice loss also performed well, with accuracies of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs for oil recovery.
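The feature-based Random Forest variant above can be sketched on a synthetic phantom (the VGG16 features would need a deep-learning framework): each pixel is described by its intensity plus two classical filter responses, and a Random Forest labels pixels as pore versus grain. The phantom, the filter choices, and the intensity levels are all invented for illustration.

```python
# Sketch: pixel-level phase segmentation from a stack of filter features,
# classified with a Random Forest. The "CT slice" is a synthetic phantom.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(10)

# Synthetic slice: bright grains (1) on a dark pore background (0), plus noise.
labels = (gaussian_filter(rng.random((64, 64)), sigma=3) > 0.5).astype(int)
image = labels * 0.6 + 0.2 + rng.normal(scale=0.1, size=labels.shape)

# Per-pixel feature stack: raw intensity, smoothed intensity, edge response.
features = np.stack([image,
                     gaussian_filter(image, sigma=2),
                     np.hypot(sobel(image, axis=0), sobel(image, axis=1))],
                    axis=-1).reshape(-1, 3)

clf = RandomForestClassifier(n_estimators=100, random_state=10)
clf.fit(features[::2], labels.ravel()[::2])       # train on half the pixels
acc = clf.score(features[1::2], labels.ravel()[1::2])
```

Adding filter responses alongside raw intensity is the "more feature dimensions" idea from the abstract: edges and smoothed context let the classifier resolve pixels whose raw gray value alone is ambiguous.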

