Evaluating Data-Driven Techniques to Optimize Drilling on the Moon

Abstract Several companies and countries have announced plans to drill in the lunar South Pole in the next five years. The drilling process on the Moon or any other planetary body is similar to other exploration drilling by using rotary drills, for example the oil and gas drilling. However, the key performance indicators (KPIs) for this type of drilling are significantly different. This work aimed to develop the drilling optimization algorithms to optimize drilling on the Moon based on the experiences with the terrestrial drilling in related industries. A test drilling unit was designed and fabricated under a NASA Early Stage Innovation (ESI) grant; A high-frequency data acquisition system was used to record drilling responses at 1000 Hz. Parameters like weight on bit (WOB), torque, RPM, rate of penetration (ROP), mechanical specific energy (MSE), field penetration index (FPI), and the uniaxial compressive strength (UCS) were recorded for 40 boreholes in the analog formations. This work utilizes the large dataset comprising of more than 1 billion data points recorded while drilling into various lunar analogous formations and cryogenic lunar formations to optimize power consumption and bit wear during drilling operations. The dataset was processed to minimize the noise. The effect of drilling dysfunctions like auger choking and bit wear was also removed. Extensive feature engineering was performed to identify how each of the parameter affects power consumption and bit wear. The data was then used to train various regression algorithms based on the machine learning approaches like the random forest, gradient boosting, support vector machines, logistic regression, polynomial regression, and artificial neural network to evaluate the applicability of each of these approach in optimizing the power consumption using the control variables like RPM and penetration rate. The best performing algorithm based on ease of application, runtime, and accuracy of the algorithm was selected to provide recommendations for ROP and RPM which would result in minimum power consumption and bit wear for a specific bit design. Since the target location for most lunar expeditions is in permanently shadowed regions, the power available for a drilling operation is extremely limited. The bit wear will significantly affect the mission life too. Algorithms developed here would be vital in ensuring efficient and successful operations on the Moon leading to more robust exploration of the targeted lunar regions.

Download Full-text

Predicting the Tool Wear of a Drilling Process Using Novel Machine Learning XGBoost-SDA

Materials ◽

10.3390/ma13214952 ◽

2020 ◽

Vol 13 (21) ◽

pp. 4952

Author(s):

Mahdi S. Alajmi ◽

Abdullah M. Almeshal

Keyword(s):

Machine Learning ◽

Cast Iron ◽

Tool Wear ◽

Flank Wear ◽

Accurate Prediction ◽

Superior Performance ◽

Gradient Boosting ◽

Support Vector ◽

Drilling Process ◽

Extreme Gradient Boosting

Tool wear negatively impacts the quality of workpieces produced by the drilling process. Accurate prediction of tool wear enables the operator to maintain the machine at the required level of performance. This research presents a novel hybrid machine learning approach for predicting the tool wear in a drilling process. The proposed approach is based on optimizing the extreme gradient boosting algorithm’s hyperparameters by a spiral dynamic optimization algorithm (XGBoost-SDA). Simulations were carried out on copper and cast-iron datasets with a high degree of accuracy. Further comparative analyses were performed with support vector machines (SVM) and multilayer perceptron artificial neural networks (MLP-ANN), where XGBoost-SDA showed superior performance with regard to the method. Simulations revealed that XGBoost-SDA results in the accurate prediction of flank wear in the drilling process with mean absolute error (MAE) = 4.67%, MAE = 5.32%, and coefficient of determination R2 = 0.9973 for the copper workpiece. Similarly, for the cast iron workpiece, XGBoost-SDA resulted in surface roughness predictions with MAE = 5.25%, root mean square error (RMSE) = 6.49%, and R2 = 0.975, which closely agree with the measured values. Performance comparisons between SVM, MLP-ANN, and XGBoost-SDA show that XGBoost-SDA is an effective method that can ensure high predictive accuracy about flank wear values in a drilling process.

Download Full-text

A stacked generalization ensemble model for optimization and prediction of the gas well rate of penetration: a case study in Xinjiang

Journal of Petroleum Exploration and Production Technology ◽

10.1007/s13202-021-01402-z ◽

2021 ◽

Author(s):

Naipeng Liu ◽

Hui Gao ◽

Zhen Zhao ◽

Yule Hu ◽

Longchen Duan

Keyword(s):

Pearson Correlation ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Model ◽

Rate Of Penetration ◽

Gas Drilling ◽

Light Gradient ◽

Stacked Generalization ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting

AbstractIn gas drilling operations, the rate of penetration (ROP) parameter has an important influence on drilling costs. Prediction of ROP can optimize the drilling operational parameters and reduce its overall cost. To predict ROP with satisfactory precision, a stacked generalization ensemble model is developed in this paper. Drilling data were collected from a shale gas survey well in Xinjiang, northwestern China. First, Pearson correlation analysis is used for feature selection. Then, a Savitzky-Golay smoothing filter is used to reduce noise in the dataset. In the next stage, we propose a stacked generalization ensemble model that combines six machine learning models: support vector regression (SVR), extremely randomized trees (ET), random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LightGBM) and extreme gradient boosting (XGB). The stacked model generates meta-data from the five models (SVR, ET, RF, GB, LightGBM) to compute ROP predictions using an XGB model. Then, the leave-one-out method is used to verify modeling performance. The performance of the stacked model is better than each single model, with R2 = 0.9568 and root mean square error = 0.4853 m/h achieved on the testing dataset. Hence, the proposed approach will be useful in optimizing gas drilling. Finally, the particle swarm optimization (PSO) algorithm is used to optimize the relevant ROP parameters.

Download Full-text

Machine learning methods for predicting postpartum depression: A scoping review (Preprint)

10.2196/preprints.29765 ◽

2021 ◽

Author(s):

Kiran Saqib ◽

Amber Fozia Khan ◽

Zahid Ahmad Butt

Keyword(s):

Machine Learning ◽

Big Data ◽

Postpartum Depression ◽

Scoping Review ◽

Early Stage ◽

Maternal Mental Health ◽

Gradient Boosting ◽

Support Vector ◽

Study Results ◽

Extreme Gradient Boosting

BACKGROUND Machine learning (ML) offers vigorous statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely given the rapid technological developments in recent years. OBJECTIVE This paper aims to synthesize the literature on machine learning and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD). METHODS A scoping review methodology using the Arksey and O’Malley framework was employed to rapidly map the research activity in the field of ML for predicting PPD. A literature search was conducted through health and IT research databases, including PsycInfo, PubMed, IEEE Xplore and the ACM Digital Library from Sep 2020 till Jan 2021. Data were extracted on the article’s ML model, data type, and study results. RESULTS A total of fourteen (14) studies were identified. All studies reported the use of supervised learning techniques to predict PPD. Support vector machine (SVM) and random forests (RF) were the most commonly employed algorithms in addition to naïve Bayes, regression, artificial neural network, decision trees and extreme gradient boosting. There was considerable heterogeneity in the best performing ML algorithm across selected studies. The area under the receiver-operating-characteristic curve (AUC) values reported for different algorithms were SVM (Range: 0.78-0.86); RF method (0.88); extreme gradient boosting (0.80); logistic regression (0.93); and extreme gradient boosting (0.71) respectively. CONCLUSIONS ML algorithms are capable of analyzing larger datasets and performing more advanced computations, that can significantly improve the detection of PPD at an early stage. Further clinical-research collaborations are required to fine-tune ML algorithms for prediction and treatments. ML might become part of evidence-based practice, in addition to clinical knowledge and existing research evidence.

Download Full-text

Machine learning methods for predicting postpartum depression: A scoping review (Preprint)

10.2196/preprints.29838 ◽

2021 ◽

Author(s):

Kiran Saqib ◽

Amber Fozia Khan ◽

Zahid Ahmad Butt

Keyword(s):

Machine Learning ◽

Big Data ◽

Postpartum Depression ◽

Scoping Review ◽

Early Stage ◽

Maternal Mental Health ◽

Gradient Boosting ◽

Support Vector ◽

Study Results ◽

Extreme Gradient Boosting

BACKGROUND Machine learning (ML) offers vigorous statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely given the rapid technological developments in recent years. OBJECTIVE This paper aims to synthesize the literature on machine learning and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD). METHODS A scoping review methodology using the Arksey and O’Malley framework was employed to rapidly map the research activity in the field of ML for predicting PPD. Two independent researchers searched PsycInfo, PubMed, IEEE Xplore and the ACM Digital Library in September 2020 to identify relevant publications in the past 12 years. Data were extracted on the article’s ML model, data type, and study results. RESULTS A total of fourteen (14) studies were identified. All studies reported the use of supervised learning techniques to predict PPD. Support vector machine (SVM) and random forests (RF) were the most commonly employed algorithms in addition to naïve Bayes, regression, artificial neural network, decision trees and extreme gradient boosting. There was considerable heterogeneity in the best performing ML algorithm across selected studies. The area under the receiver-operating-characteristic curve (AUC) values reported for different algorithms were SVM (Range: 0.78-0.86); RF method (0.88); extreme gradient boosting (0.80); logistic regression (0.93); and extreme gradient boosting (0.71) respectively. CONCLUSIONS ML algorithms are capable of analyzing larger datasets and performing more advanced computations, that can significantly improve the detection of PPD at an early stage. Further clinical-research collaborations are required to fine-tune ML algorithms for prediction and treatments. ML might become part of evidence-based practice, in addition to clinical knowledge and existing research evidence.

Download Full-text

Machine Learning Models for COVID-19 Detection in Brazil Based on Symptoms (Preprint)

10.2196/preprints.27293 ◽

2021 ◽

Author(s):

Íris Viana dos Santos Santana ◽

Andressa C. M. da Silveira ◽

Álvaro Sobrinho ◽

Lenardo Chaves e Silva ◽

Leandro Dias da Silva ◽

...

Keyword(s):

Machine Learning ◽

Early Stage ◽

Area Under The Curve ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Accuracy Score ◽

K Nearest Neighbors ◽

Runny Nose ◽

Extreme Gradient Boosting

BACKGROUND controlling the COVID-19 outbreak in Brazil is considered a challenge of continental proportions due to the high population and urban density, weak implementation and maintenance of social distancing strategies, and limited testing capabilities. OBJECTIVE to contribute to addressing such a challenge, we present the implementation and evaluation of supervised Machine Learning (ML) models to assist the COVID-19 detection in Brazil based on early-stage symptoms. METHODS firstly, we conducted data preprocessing and applied the Chi-squared test in a Brazilian dataset, mainly composed of early-stage symptoms, to perform statistical analyses. Afterward, we implemented ML models using the Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) algorithms. We evaluated the ML models using precision, accuracy score, recall, the area under the curve, and the Friedman and Nemenyi tests. Based on the comparison, we grouped the top five ML models and measured feature importance. RESULTS the MLP model presented the highest mean accuracy score, with more than 97.85%, when compared to GBM (> 97.39%), RF (> 97.36%), DT (> 97.07%), XGBoost (> 97.06%), KNN (> 95.14%), and SVM (> 94.27%). Based on the statistical comparison, we grouped MLP, GBM, DT, RF, and XGBoost, as the top five ML models, because the evaluation results are statistically indistinguishable. The ML models` importance of features used during predictions varies from gender, profession, fever, sore throat, dyspnea, olfactory disorder, cough, runny nose, taste disorder, and headache. CONCLUSIONS supervised ML models effectively assist the decision making in medical diagnosis and public administration (e.g., testing strategies), based on early-stage symptoms that do not require advanced and expensive exams.

Download Full-text

Aprendizado de máquina para classificação da desocupação de leitos pós-cirúrgicos: Aprendizado de máquina para classificação da desocupação de leitos pós-cirúrgicos

10.31414/em.2020.d.131235 ◽

2020 ◽

Author(s):

◽

A. G. Teramachi

Keyword(s):

Random Forest ◽

Length Of Stay ◽

Congenital Heart ◽

Early Stage ◽

Heart Diseases ◽

Poor Quality ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Heart Patients

Congenital heart diseases are among the most common congenital anomalies and if they aren’t discovered and treated properly at na early stage, babies and children can have a poor quality of life and may die over time. In many cases, surgical intervention is necessary before the first year of life and when it occurs, it is importante to estimate the length of stay in post-surgical beds, both for capacity management, planning and optimization of resources by the hospital and to guide patients and their families. The present study aims to propose two models, through the use of Machine Learning algorithms, one to classify the length of stay in post-surgical ICU beds and the other to classify the length of stay in post-surgical ward beds, since research related to the length of stay in postsurgical ward beds is rare. The data used to train the algoritgms are regarding cardiac surgeries performed on congenital heart patients extracted from the ASSIST, private database of the Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (InCor - FMUSP). The trained algorithms were: Random Forest, Extra Trees, Gradient Boosting, Adaboost, Support Vector Machine and the Multilayer Perceptron neural network trained with the Backpropagation algorithm. The model that presented the best performance to classify the length of stay in ICU beds was the Random Forest and to classify the length of stay of ward beds was the Gradient Boosting

Download Full-text

iPMI: Machine Learning-Aided Identification of Parametrial Invasion in Women with Early-Stage Cervical Cancer

Diagnostics ◽

10.3390/diagnostics11081454 ◽

2021 ◽

Vol 11 (8) ◽

pp. 1454

Author(s):

Phasit Charoenkwan ◽

Watshara Shoombuatong ◽

Chalaithorn Nantasupha ◽

Tanarat Muangmool ◽

Prapaporn Suprasert ◽

...

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Early Stage ◽

Superior Performance ◽

Gradient Boosting ◽

Support Vector ◽

Efficient System ◽

Extreme Gradient Boosting ◽

Independent Test ◽

Early Stage Cervical Cancer

Radical hysterectomy is a recommended treatment for early-stage cervical cancer. However, the procedure is associated with significant morbidities resulting from the removal of the parametrium. Parametrial cancer invasion (PMI) is found in a minority of patients but the efficient system used to predict it is lacking. In this study, we develop a novel machine learning (ML)-based predictive model based on a random forest model (called iPMI) for the practical identification of PMI in women. Data of 1112 stage IA-IIA cervical cancer patients who underwent primary surgery were collected and considered as the training dataset, while data from an independent cohort of 116 consecutive patients were used as the independent test dataset. Based on these datasets, iPMI-Econ was then developed by using basic clinicopathological data available prior to surgery, while iPMI-Power was also introduced by adding pelvic node metastasis and uterine corpus invasion to the iPMI-Econ. Both 10-fold cross-validations and independent test results showed that iPMI-Power outperformed other well-known ML classifiers (e.g., logistic regression, decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes, support vector machine, and extreme gradient boosting). Upon comparison, it was found that iPMI-Power was effective and had a superior performance to other well-known ML classifiers in predicting PMI. It is anticipated that the proposed iPMI may serve as a cost-effective and rapid approach to guide important clinical decision-making.

Download Full-text

Machine-Learning-Based Prediction of Corrosion Behavior in Additively Manufactured Inconel 718

Data ◽

10.3390/data6080080 ◽

2021 ◽

Vol 6 (8) ◽

pp. 80

Author(s):

O. V. Mythreyi ◽

M. Rohith Srinivaas ◽

Tigga Amit Kumar ◽

R. Jayaganthan

Keyword(s):

Machine Learning ◽

Corrosion Behavior ◽

Inconel 718 ◽

Polynomial Regression ◽

Research Work ◽

Model Performance ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

This research work focuses on machine-learning-assisted prediction of the corrosion behavior of laser-powder-bed-fused (LPBF) and postprocessed Inconel 718. Corrosion testing data of these specimens were collected and fit into the following machine learning algorithms: polynomial regression, support vector regression, decision tree, and extreme gradient boosting. The model performance, after hyperparameter optimization, was evaluated using a set of established metrics: R2, mean absolute error, and root mean square error. Among the algorithms, the extreme gradient boosting algorithm performed best in predicting the corrosion behavior, closely followed by other algorithms. Feature importance analysis was executed in order to determine the postprocessing parameters that influenced the most the corrosion behavior in Inconel 718 manufactured by LPBF.

Download Full-text

Investigating the use of random forest, gradient boosting machine, support vector machine and their ensemble applied to fault detection

10.26678/abcm.cobem2017.cob17-1600 ◽

2017 ◽

Author(s):

Luis Felipe Nogoseke ◽

Gabriel Herman Bernardim Andrade ◽

Marco Boaretto ◽

Leandro Coelho

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Fault Detection ◽

Gradient Boosting ◽

Support Vector ◽

Gradient Boosting Machine

Download Full-text

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Taking advantage of finding non-intuitive regularities on high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM) were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods such as yrandomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is full of help for designing novel thrombin inhibitors.

Download Full-text