Development of an Optimal Model for Rate of Penetration (ROP) Using Deep Neural Networks (DNN)

Mapping Intimacies ◽

10.2118/207161-ms ◽

2021 ◽

Author(s):

_ _

Keyword(s):

Machine Learning ◽

Empirical Model ◽

Empirical Models ◽

The Other ◽

Gradient Boosting ◽

Past Century ◽

Learning Models ◽

Continuous Increase ◽

Unseen Data ◽

Machine Learning Models

Abstract For the past century, optimization of drilling has caught the eyes of many researchers. The main areas center on ROP, fluid treatment, and bit selection. They all share the same goal of maximizing ROP and reducing non-productive time (NPT). In order to develop an optimal control system, ROP must be predicted accurately. Unfortunately, it is a complex parameter that is affected by multiple drilling parameters, rock properties, fluid properties, and bit selection. Models used for prediction have developed from empirical models like Bourgoyne and Young's to more intelligent models such as SVM and ANN. With the continuous increase in data obtained from sensors while drilling, there is still much work to be done in this field. In this research, the improvement of an empirical model and the development of an intelligent model are presented. The Bourgoyne and Young model uses multiple linear regression to estimate coefficients, which it then inserts into an empirical formula to predict ROP. This model was modified using non-linear curve-fitting to estimate the coefficients, reducing bias so that it generalizes better. Machine learning models such as Gradient Boosting, Random Forest, ANN, and DNN were used in the development of a predictive model for the ROP. These models were easier to develop than the empirical model because they rely on data rather than on statistical formulas. The data used in this research include drilling data from 3 wells drilled in 2 fields within the Niger Delta region in Nigeria. The models were developed and trained on one of the wells, while the remaining two were used for testing the performance of the models. The modified empirical model improved the efficiency of the base model by 14% during validation but performed poorly on unseen data from the other two wells. The machine learning models outperformed the empirical models and performed accurately on unseen data from the other wells. DNN was the best-performing model, achieving an average accuracy of 0.987 for the 3 wells.

Machine learning models for the estimation of monthly mean daily reference evapotranspiration based on cross-station and synthetic data

Hydrology Research ◽

10.2166/nh.2019.060 ◽

2019 ◽

Vol 50 (6) ◽

pp. 1730-1750 ◽

Cited By ~ 6

Author(s):

Lifeng Wu ◽

Youwen Peng ◽

Junliang Fan ◽

Yicheng Wang

Keyword(s):

Machine Learning ◽

Reference Evapotranspiration ◽

Irrigation Scheduling ◽

Temperature Data ◽

The Other ◽

Estimation Accuracy ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Abstract The estimation of reference evapotranspiration (ET0) is important in hydrology research, irrigation scheduling design and water resources management. This study explored the capability of eight machine learning models, i.e., Artificial Neural Network (ANN), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Multivariate Adaptive Regression Spline (MARS), Support Vector Machine (SVM), Extreme Learning Machine and a novel Kernel-based Nonlinear Extension of Arps Decline (KNEA) Model, for modeling monthly mean daily ET0 using only temperature data from local or cross stations. These machine learning models were also compared with the temperature-based Hargreaves–Samani (HS) equation. The results indicated that the estimation accuracy of these machine learning models differed across scenarios. The tree-based models (RF, GBDT and XGBoost) exhibited higher estimation accuracy than the other models in the local application. When the station had only temperature data, the MARS and SVM models were slightly superior to the other models, while the ANN and HS models performed worse than the others. When there was no temperature data at the target station and the data from adjacent stations were used instead, MARS, SVM and KNEA were the suitable models. The results can provide a solution for ET0 estimation in the absence of complete meteorological data.

Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data

Proceedings of the Workshop on Human-In-the-Loop Data Analytics - HILDA'19 ◽

10.1145/3328519.3329126 ◽

2019 ◽

Author(s):

Sergey Redyuk ◽

Sebastian Schelter ◽

Tammo Rukat ◽

Volker Markl ◽

Felix Biessmann

Keyword(s):

Machine Learning ◽

Black Box ◽

Learning Models ◽

Unseen Data ◽

Machine Learning Models

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Prediction of Head Movement in 360-Degree Videos Using Attention Model

Sensors ◽

10.3390/s21113678 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3678

Author(s):

Dongwon Lee ◽

Minji Choi ◽

Joohyun Lee

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Moving Average ◽

The Other ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Attention Model ◽

Long Short Term Memory ◽

Machine Learning Models

In this paper, we propose a prediction algorithm that combines Long Short-Term Memory (LSTM) with an attention model to predict the vision coordinates when watching 360-degree videos in a Virtual Reality (VR) or Augmented Reality (AR) system. Predicting the vision coordinates during video streaming is important when the network condition is degraded. However, traditional prediction models such as Moving Average (MA) and Autoregressive Moving Average (ARMA) are linear, so they cannot capture nonlinear relationships. Therefore, machine learning models based on deep learning have recently been used for nonlinear predictions. We use the LSTM and Gated Recurrent Unit (GRU) neural network methods, which originate from Recurrent Neural Networks (RNN), to predict the head position in 360-degree videos. We then add the attention model to the LSTM to obtain more accurate results. We also compare the performance of the proposed model with other machine learning models such as Multi-Layer Perceptron (MLP) and RNN using the root mean squared error (RMSE) between predicted and real coordinates. We demonstrate that our model can predict the vision coordinates more accurately than the other models in various videos.
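As a rough illustration of the linear baseline and the error metric named in this abstract, the following sketch forecasts each next sample as the mean of the preceding samples and scores the forecast with RMSE. It is not the authors' code: the function names, the window size, and the yaw-angle values are hypothetical and chosen only for the example.

```python
import math


def moving_average_forecast(series, window):
    """Predict each next value as the mean of the previous `window` values."""
    if window < 1 or len(series) <= window:
        raise ValueError("need window >= 1 and more samples than the window")
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical head yaw angles (degrees) sampled at a fixed rate.
yaw = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
window = 2

predictions = moving_average_forecast(yaw, window)
error = rmse(predictions, yaw[window:])
print(predictions, error)
```

On a steadily turning head the moving average always lags behind the true position, which is one simple way to see why a baseline of this kind can fall short on head-movement traces.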

O-203 Application of machine learning to predict aneuploidy and mosaicism in embryos from in vitro fertilization (IVF) cycles

Human Reproduction ◽

10.1093/humrep/deab128.014 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

J A Ortiz ◽

R Morales ◽

B Lledo ◽

E Garcia-Hernandez ◽

A Cascales ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Predictive Models ◽

Maternal Age ◽

The Other ◽

Predictor Variables ◽

Learning Models ◽

Male Factor ◽

Factors Associated ◽

Machine Learning Models

Abstract Study question Is it possible to predict the likelihood of an IVF embryo being aneuploid and/or mosaic using a machine learning algorithm? Summary answer There are paternal, maternal, embryonic and IVF-cycle factors that are associated with embryonic chromosomal status and that can be used as predictors in machine learning models. What is known already The factors associated with embryonic aneuploidy have been extensively studied. Mostly maternal age and, to a lesser extent, male factor and ovarian stimulation have been related to the occurrence of chromosomal alterations in the embryo. On the other hand, the main factors that may increase the incidence of embryo mosaicism have not yet been established. The models obtained using classical statistical methods to predict embryonic aneuploidy and mosaicism are not highly reliable. As an alternative to traditional methods, different machine and deep learning algorithms are being used to generate predictive models in different areas of medicine, including human reproduction. Study design, size, duration The study design is observational and retrospective. A total of 4654 embryos from 1558 PGT-A cycles were included (January 2017 to December 2020). The trophectoderm biopsies on D5, D6 or D7 blastocysts were analysed by NGS. Embryos with ≤25% aneuploid cells were considered euploid, those with 25-50% were classified as mosaic, and those with >50% as aneuploid. The variables of the PGT-A were recorded in a database from which predictive models of embryonic aneuploidy and mosaicism were developed. Participants/materials, setting, methods The main indications for PGT-A were advanced maternal age, abnormal sperm FISH and recurrent miscarriage or implantation failure. Embryo analyses were performed using Veriseq-NGS (Illumina). The software used to carry out all the analyses was R (RStudio). The library used to implement the different algorithms was caret. In the machine learning models, 22 predictor variables were introduced, which can be classified into 4 categories: maternal, paternal, embryonic and those specific to the IVF cycle. Main results and the role of chance The different couple, embryo and stimulation cycle variables were recorded in a database (22 predictor variables). Two different predictive models were built, one for aneuploidy and the other for mosaicism. The predicted variable was of multi-class type since it included the segmental and whole-chromosome alteration categories. The data frame was first preprocessed and the different classes to be predicted were balanced. Of the data, 80% were used for training the model and 20% were reserved for testing. The classification algorithms applied include multinomial regression, neural networks, support vector machines, neighborhood-based methods, classification trees, gradient boosting, ensemble methods, Bayesian and discriminant analysis-based methods. The algorithms were optimized by minimizing the Log_Loss, which measures accuracy while penalizing misclassifications. The best predictive models were achieved with the XG-Boost and random forest algorithms. The AUC of the predictive model for aneuploidy was 80.8% (Log_Loss: 1.028) and for mosaicism 84.1% (Log_Loss: 0.929). The best predictor variables of the models were maternal age, embryo quality, day of biopsy and whether or not the couple had a history of pregnancies with chromosomopathies. The male factor played a relevant role only in the mosaicism model, not in the aneuploidy model. Limitations, reasons for caution Although the predictive models obtained can be very useful for estimating the probability of achieving euploid embryos in an IVF cycle, increasing the sample size and including additional variables could improve the models and thus increase their predictive capacity. Wider implications of the findings Machine learning can be a very useful tool in reproductive medicine since it can allow the determination of factors associated with embryonic aneuploidies and mosaicism in order to establish a predictive model for both. Identifying couples at risk of embryo aneuploidy/mosaicism could help them benefit from the use of PGT-A. Trial registration number Not Applicable
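The abstract optimizes its classifiers on log loss. A minimal sketch of the standard multi-class definition follows, written in Python rather than the R/caret setup the study reports, so it is an illustration and not the authors' implementation. The function name, the clipping constant `eps`, and the three-class example probabilities are assumptions made for the example.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    true_labels: class indices, one per sample.
    predicted_probs: one probability list per sample, indexed by class.
    eps: clipping bound (an assumption here) that avoids log(0).
    """
    if len(true_labels) != len(predicted_probs):
        raise ValueError("labels and probabilities must have the same length")
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# Hypothetical three-class problem with three samples.
labels = [0, 1, 2]
confident = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3

confident_loss = log_loss(labels, confident)
uniform_loss = log_loss(labels, uniform)
print(confident_loss, uniform_loss)
```

A model that spreads its probability evenly over k classes scores ln(k), so lower values indicate that more probability mass is being placed on the correct class.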

Predicting Electric Vehicle Charging Station Availability Using Ensemble Machine Learning

Energies ◽

10.3390/en14237834 ◽

2021 ◽

Vol 14 (23) ◽

pp. 7834

Author(s):

Christopher Hecht ◽

Jan Figgener ◽

Dirk Uwe Sauer

Keyword(s):

Machine Learning ◽

Binary Data ◽

Training Data ◽

Gradient Boosting ◽

Traffic Density ◽

Learning Models ◽

Charging Infrastructure ◽

Ensemble Models ◽

Charging Station ◽

Machine Learning Models

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using two machine learning models: a Gradient Boosting Classifier and a Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 commonly used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage. Other features such as traffic density or weather have a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can perform non-binary predictions, giving predictions in categories ranging from “low likelihood” to “high likelihood”.
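The abstract reports a Matthews correlation coefficient alongside accuracy. The sketch below computes both from binary labels using the standard confusion-matrix formula; it is illustrative only, and the function names and the occupied/available example labels (1 = occupied, 0 = available) are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(y_true, y_pred):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical station status: 1 = occupied, 0 = available.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

mcc = matthews_corrcoef(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(acc, mcc)
```

Unlike accuracy, the coefficient uses all four cells of the confusion matrix, so a classifier that always predicts the majority class scores 0 here regardless of how unbalanced the labels are.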

A machine learning approach to inform developmental milestone achievement for children with autism (Preprint)

10.2196/preprints.29242 ◽

2021 ◽

Author(s):

Munirul M. Haque ◽

Masud Rabbani ◽

Dipranjan Das Dipal ◽

Md Ishrak Islam Zarif ◽

Anik Iqbal ◽

...

Keyword(s):

Machine Learning ◽

Autism Spectrum Disorder ◽

Children With Autism ◽

Autism Spectrum ◽

The Other ◽

Supervised Machine Learning ◽

Learning Models ◽

Children With Asd ◽

Socio Demographic Factors ◽

Machine Learning Models

BACKGROUND Care for children with autism spectrum disorder (ASD) can be challenging for families and medical care systems. This is especially true in low- and middle-income countries (LMICs) like Bangladesh. To improve family-practitioner communication and developmental monitoring of children with ASD, [spell out] (mCARE) was developed. Within this study, mCARE was used to track child milestone achievement and family socio-demographic assets to inform mCARE feasibility/scalability and family-asset informed practitioner recommendations. OBJECTIVE The objectives of this paper are three-fold. First, document how mCARE can be used to monitor child milestone achievement. Second, demonstrate how advanced machine learning models can inform our understanding of milestone achievement in children with ASD. Third, describe family/child socio-demographic factors that are associated with earlier milestone achievement in children with ASD (across five machine learning models). METHODS Using mCARE-collected data, this study assessed milestone achievement in 300 children with ASD from Bangladesh. In this study, we used four supervised machine learning (ML) algorithms (Decision Tree, Logistic Regression, k-Nearest Neighbors, Artificial Neural Network) and one unsupervised machine learning algorithm (K-means Clustering) to build models of milestone achievement based on family/child socio-demographic details. For analyses, the sample was randomly divided in half to train the ML models, and their accuracy was then estimated on the other half of the sample. Each model was specified for the following milestones: Brushes teeth, Asks to use the toilet, Urinates in the toilet or potty, and Buttons large buttons. RESULTS This study aimed to find a suitable machine learning algorithm for milestone prediction/achievement for children with ASD using family/child socio-demographic characteristics. For Brushes teeth, three of the supervised machine learning models met or exceeded an accuracy of 95%, with Logistic Regression, KNN, and ANN being the most robust. For Asks to use toilet, 84.00% accuracy was achieved with the KNN and ANN models. For these models, the family socio-demographic predictors of “family expenditure” and “parents’ age” accounted for most of the model variability. For the last two milestones, Urinates in toilet or potty and Buttons large buttons, the ANN achieved accuracies of 91.00% and 76.00%, respectively. Overall, the ANN had higher accuracy (above ~80% on average) than the other algorithms for all the milestones. Across the models and milestones, “family expenditure”, “family size/type”, “living places” and “parent’s age and occupation” were the most influential family/child socio-demographic factors. CONCLUSIONS mCARE was successfully deployed in an LMIC (i.e., Bangladesh), giving parents and care practitioners a mechanism to share detailed information on child milestone achievement. Using advanced modeling techniques, this study demonstrates how family/child socio-demographic elements can inform child milestone achievement. Specifically, families with fewer socio-demographic resources reported later milestone attainment. Developmental science theories highlight how family/systems can directly influence child development, and this study provides a clear link between family resources and child developmental progress. Clinical implications for this work could include supporting the larger family system to improve child milestone achievement. CLINICALTRIAL We obtained approval from the Marquette University Institutional Review Board on July 9, 2020, under protocol number HR-1803022959, titled “MOBILE-BASED CARE FOR CHILDREN WITH AUTISM SPECTRUM DISORDER USING REMOTE EXPERIENCE SAMPLING METHOD (MCARE)”, to recruit a total of 316 subjects, of which we recruited 300. (Detailed description of participants in the Methods section)

Estimation of Chlorophyll-a Concentrations in Small Water Bodies: Comparison of Fused Gaofen-6 and Sentinel-2 Sensors

Remote Sensing ◽

10.3390/rs14010229 ◽

2022 ◽

Vol 14 (1) ◽

pp. 229

Author(s):

Jiarui Shi ◽

Qian Shen ◽

Yue Yao ◽

Junsheng Li ◽

Fu Chen ◽

...

Keyword(s):

Machine Learning ◽

Chlorophyll A ◽

Water Bodies ◽

Gradient Boosting ◽

Learning Models ◽

Small Water ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Sentinel 2 ◽

Small Water Bodies

Chlorophyll-a concentrations in water bodies are one of the most important environmental evaluation indicators in monitoring the water environment. Small water bodies include headwater streams, springs, ditches, flushes, small lakes, and ponds, which represent important freshwater resources. However, the relatively narrow and fragmented nature of small water bodies makes it difficult to monitor chlorophyll-a via medium-resolution remote sensing. In the present study, we first fused Gaofen-6 (a new Chinese satellite) images to obtain 2 m resolution images with 8 bands, which proved to be as good a data source for chlorophyll-a monitoring in small water bodies as Sentinel-2. Further, we compared five semi-empirical and four machine learning models to estimate chlorophyll-a concentrations via simulated reflectance using the fused Gaofen-6 and Sentinel-2 spectral response functions. The results showed that the extreme gradient boosting tree model (one of the machine learning models) is the most accurate. The mean relative error (MRE) was 9.03%, and the root-mean-square error (RMSE) was 4.5 mg/m³ for the Sentinel-2 sensor, while for the fused Gaofen-6 image, MRE was 6.73%, and RMSE was 3.26 mg/m³. Thus, both fused Gaofen-6 and Sentinel-2 could estimate the chlorophyll-a concentrations in small water bodies. The fused Gaofen-6 exhibited a higher spatial resolution, while Sentinel-2 exhibited a higher temporal resolution.

Benchmarking of Machine Learning Models to Assist the Prognosis of Tuberculosis

10.20944/preprints202103.0284.v2 ◽

2021 ◽

Author(s):

Maicon Herverton Lino Ferreira da Silva Barros ◽

Geovanne Oliveira Alves ◽

Lubnnia Morais Florêncio Souza ◽

Élisson da Silva Rocha ◽

João Fausto Lorenzato de Oliveira ◽

...

Keyword(s):

Machine Learning ◽

Clinical Symptoms ◽

Treatment Decision ◽

Gradient Boosting ◽

Original Form ◽

Learning Models ◽

Data Set ◽

Risk Of Death ◽

Increased Risk ◽

Machine Learning Models

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory results including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result is achieved by the Gradient Boosting (GB) model using the balanced data set to predict TB mortality, and the ensemble model composed of the Random Forest (RF), GB and Multi-layer Perceptron (MLP) models is the best model to predict the cure class.
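The abstract does not say how the balanced data set was produced. One common option is random undersampling of the majority class, sketched below purely as an assumption and not as the authors' procedure. The function name, the tuple record layout, and the seed are hypothetical; only the class counts (22,876 cured, 1,139 deaths) come from the abstract.

```python
import random


def undersample(records, label_of, seed=0):
    """Randomly keep as many records per class as the smallest class has."""
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Placeholder records using the class sizes reported in the abstract.
records = [("cured", i) for i in range(22876)] + [("death", i) for i in range(1139)]
balanced = undersample(records, label_of=lambda record: record[0])

cured = sum(1 for label, _ in balanced if label == "cured")
deaths = sum(1 for label, _ in balanced if label == "death")
print(len(records), len(balanced), cured, deaths)
```

Undersampling discards most of the majority class, which is why studies of this kind often report results on both the imbalanced and the balanced data, as the two experiments in this abstract do.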

Intra-domain and cross-domain transfer learning for time series

10.5194/egusphere-egu21-12142 ◽

2021 ◽

Author(s):

Erik Otović ◽

Marko Njirjak ◽

Dario Jozinović ◽

Goran Mauša ◽

Alberto Michelini ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Transfer Learning ◽

Time Series Data ◽

The Other ◽

Series Data ◽

Sound Recognition ◽

Transfer Of Knowledge ◽

Learning Models ◽

Machine Learning Models

In this study, we compared the performance of machine learning models trained using transfer learning with that of models trained from scratch on time series data. Four machine learning models were used for the experiment. Two models were taken from the field of seismology, and the other two are general-purpose models for working with time series data. The accuracy of the selected models was systematically observed and analyzed when switching within the same domain of application (seismology), as well as between mutually different domains of application (seismology, speech, medicine, finance). In seismology, we used two databases of local earthquakes (one in counts, and the other with the instrument response removed) and a database of global earthquakes for predicting earthquake magnitude; other datasets targeted classifying spoken words (speech), predicting stock prices (finance) and classifying muscle movement from EMG signals (medicine). In practice, it is very demanding and sometimes impossible to collect datasets of tagged data large enough to successfully train a machine learning model. Therefore, in our experiment, we use reduced data sets of 1,500 and 9,000 data instances to mimic such conditions. Using the same scaled-down datasets, we trained two sets of machine learning models: those that used transfer learning for training and those that were trained from scratch. We compared the performance of pairs of models in order to draw conclusions about the utility of transfer learning. To check the validity of the obtained results, we repeated the experiments several times and applied statistical tests to confirm their significance. The study shows when, within the set experimental framework, the transfer of knowledge brought improvements in terms of model accuracy and model convergence rate. Our results show that it is possible to achieve better performance and faster convergence by transferring knowledge from the domain of global earthquakes to the domain of local earthquakes, and sometimes also vice versa. However, improvements in seismology can sometimes also be achieved by transferring knowledge from medical and audio domains. The results show that the transfer of knowledge between other domains brought even more significant improvements, compared to those within the field of seismology. For example, it has been shown that models in the field of sound recognition have achieved much better performance compared to classical models and that the domain of sound recognition is very compatible with knowledge from other domains. We came to similar conclusions for the domains of medicine and finance. Ultimately, the paper offers suggestions on when transfer learning is useful, and the explanations offered can provide a good starting point for knowledge transfer using time series data.
