Synthetic Sonic Log Generation With Machine Learning: A Contest Summary From Five Methods

Author(s):  
Yanxiang Yu ◽  
◽  
Chicheng Xu ◽  
Siddharth Misra ◽  
Weichang Li ◽  
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. Started on March 1, 2020, and concluded on May 7, 2020, the SPWLA PDDA SIG hosted a contest aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total number of 20,525 data points with half-foot resolution from three wells was collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs: caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density, respectively, as well as two sonic logs (DTC and DTS) as the target. The separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of the model was evaluated using root mean square error (RMSE) as the metric, shown in the equation below: RMSE=sqrt(1/2*1/m* [∑_(i=1)^m▒〖(〖DTC〗_pred^i-〖DTC〗_true^i)〗^2 + 〖(〖DTS〗_pred^i-〖DTS〗_true^i)〗^2 ] In the benchmark model, (Yu et al., 2020), we used a Random Forest regressor and conducted minimal preprocessing to the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In the paper, we will review these five solutions, including preprocess techniques and different machine-learning models, including neural network, long short-term memory (LSTM), and ensemble trees. We found that data cleaning and clustering were critical for improving the performance in all models.

Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


2021 ◽  
Author(s):  
Sophie Goliber ◽  
Taryn Black ◽  
Ginny Catania ◽  
James M. Lea ◽  
Helene Olsen ◽  
...  

Abstract. Marine-terminating outlet glacier terminus traces, mapped from satellite and aerial imagery, have been used extensively in understanding how outlet glaciers adjust to climate change variability over a range of time scales. Numerous studies have digitized termini manually, but this process is labor-intensive, and no consistent approach exists. A lack of coordination leads to duplication of efforts, particularly for Greenland, which is a major scientific research focus. At the same time, machine learning techniques are rapidly making progress in their ability to automate accurate extraction of glacier termini, with promising developments across a number of optical and SAR satellite sensors. These techniques rely on high quality, manually digitized terminus traces to be used as training data for robust automatic traces. Here we present a database of manually digitized terminus traces for machine learning and scientific applications. These data have been collected, cleaned, assigned with appropriate metadata including image scenes, and compiled so they can be easily accessed by scientists. The TermPicks data set includes 39,060 individual terminus traces for 278 glaciers with a mean and median number of traces per glacier of 136 ± 190 and 93, respectively. Across all glaciers, 32,567 dates have been picked, of which 4,467 have traces from more than one author (duplication of 14 %). We find a median error of ∼100 m among manually-traced termini. Most traces are obtained after 1999, when Landsat 7 was launched. We also provide an overview of an updated version of The Google Earth Engine Digitization Tool (GEEDiT), which has been developed specifically for future manual picking of the Greenland Ice Sheet.


2020 ◽  
pp. 609-623
Author(s):  
Arun Kumar Beerala ◽  
Gobinath R. ◽  
Shyamala G. ◽  
Siribommala Manvitha

Water is the most valuable natural resource for all living things and the ecosystem. The quality of groundwater is changed due to change in ecosystem, industrialisation, and urbanisation, etc. In the study, 60 samples were taken and analysed for various physio-chemical parameters. The sampling locations were located using global positioning system (GPS) and were taken for two consecutive years for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). In 2016-2017 and 2017-2018 pH, EC, and TDS were obtained in the field. Hardness and Chloride are determined using titration method. Nitrate and Sulphate were determined using Spectrophotometer. Machine learning techniques were used to train the data set and to predict the unknown values. The dominant elements of groundwater are as follows: Ca2, Mg2 for cation and Cl-, SO42, NO3− for anions. The regression value for the training data set was found to be 0.90596, and for the entire network, it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.


2021 ◽  
Vol 8 (1) ◽  
pp. 28
Author(s):  
S. L. Ávila ◽  
H. M. Schaberle ◽  
S. Youssef ◽  
F. S. Pacheco ◽  
C. A. Penz

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. As more information is available, it easier can become the diagnosis of the machine operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using the electric stator current harmonic analysis, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifies, and One-vs-One strategy to identify rotor unbalance kind and severity problem – a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier for dealing better with changes produced by environmental variations and natural machinery usage. The adaptative update means to update the training data set with an amount of recent data, saving the entire original historical data. It is relevant for engineering maintenance. Our results show that the current signature analysis is appropriate to identify the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.


Author(s):  
Jonathan M. Gumley ◽  
Hayden Marcollo ◽  
Stuart Wales ◽  
Andrew E. Potts ◽  
Christopher J. Carra

Abstract There is growing importance in the offshore floating production sector to develop reliable and robust means of continuously monitoring the integrity of mooring systems for FPSOs and FPUs, particularly in light of the upcoming introduction of API-RP-2MIM. Here, the limitations of the current range of monitoring techniques are discussed, including well established technologies such as load cells, sonar, or visual inspection, within the context of the growing mainstream acceptance of data science and machine learning. Due to the large fleet of floating production platforms currently in service, there is a need for a readily deployable solution that can be retrofitted to existing platforms to passively monitor the performance of floating assets on their moorings, for which machine learning based systems have particular advantages. An earlier investigation conducted in 2016 on a shallow water, single point moored FPSO employed host facility data from in-service field measurements before and after a single mooring line failure event. This paper presents how the same machine learning techniques were applied to a deep water, semi taut, spread moored system where there was no host facility data available, therefore requiring a calibrated hydrodynamic numerical model to be used as the basis for the training data set. The machine learning techniques applied to both real and synthetically generated data were successful in replicating the response of the original system, even with the latter subjected to different variations of artificial noise. Furthermore, utilizing a probability-based approach, it was demonstrated that replicating the response of the underlying system was a powerful technique for predicting changes in the mooring system.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1553
Author(s):  
Dharmendra Sharma ◽  
Pavel Davidson ◽  
Philipp Müller ◽  
Robert Piché

Vertical ground reaction force (vGRF) can be measured by force plates or instrumented treadmills, but their application is limited to indoor environments. Insoles remove this restriction but suffer from low durability (several hundred hours). Therefore, interest in the indirect estimation of vGRF using inertial measurement units and machine learning techniques has increased. This paper presents a methodology for indirectly estimating vGRF and other features used in gait analysis from measurements of a wearable GPS-aided inertial navigation system (INS/GPS) device. A set of 27 features was extracted from the INS/GPS data. Feature analysis showed that six of these features suffice to provide precise estimates of 11 different gait parameters. Bagged ensembles of regression trees were then trained and used for predicting gait parameters for a dataset from the test subject from whom the training data were collected and for a dataset from a subject for whom no training data were available. The prediction accuracies for the latter were significantly worse than for the first subject but still sufficiently good. K-nearest neighbor (KNN) and long short-term memory (LSTM) neural networks were then used for predicting vGRF and ground contact times. The KNN yielded a lower normalized root mean square error than the neural network for vGRF predictions but cannot detect new patterns in force curves.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Burak Cankaya ◽  
Berna Eren Tokgoz ◽  
Ali Dag ◽  
K.C. Santosh

Purpose This paper aims to propose a machine learning-based automatic labeling methodology for chemical tanker activities that can be applied to any port with any number of active tankers and the identification of important predictors. The methodology can be applied to any type of activity tracking that is based on automatically generated geospatial data. Design/methodology/approach The proposed methodology uses three machine learning algorithms (artificial neural networks, support vector machines (SVMs) and random forest) along with information fusion (IF)-based sensitivity analysis to classify chemical tanker activities. The data set is split into training and test data based on vessels, with two vessels in the training data and one in the test data set. Important predictors were identified using a receiver operating characteristic comparative approach, and overall variable importance was calculated using IF from the top models. Findings Results show that an SVM model has the best balance between sensitivity and specificity, at 93.5% and 91.4%, respectively. Speed, acceleration and change in the course on the ground for the vessels are identified as the most important predictors for classifying vessel activity. Research limitations/implications The study evaluates the vessel movements waiting between different terminals in the same port, but not their movements between different ports for their tank-cleaning activities. Practical implications The findings in this study can be used by port authorities, shipping companies, vessel operators and other stakeholders for decision support, performance tracking, as well as for automated alerts. Originality/value This analysis makes original contributions to the existing literature by defining and demonstrating a methodology that can automatically label vehicle activity based on location data and identify certain characteristics of the activity by finding important location-based predictors that effectively classify the activity status.


2020 ◽  
Author(s):  
Derek McKay ◽  
Andreas Kvammen

Abstract. The machine learning research community has focused greatly on bias in algorithms and have identified different manifestations of it. Bias in the training samples is recognised as a potential source of prejudice in machine learning. It can be introduced by human experts who define the training sets. As machine learning techniques are being applied to auroral classification, it is important to identify and address potential sources of expert-injected bias. In an ongoing study, 13 947 auroral images were manually classified with significant differences between classifications. This large data set allowed identification of some of these biases, especially those originating as a result of the ergonomics of the classification process. These findings are presented in this paper, to serve as a checklist for improving training data integrity, not just for expert classifications, but also for crowd-sourced, citizen science projects. As the application of machine learning techniques to auroral research is relatively new, it is important that biases are identified and addressed before they become endemic in the corpus of training data.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
R Hariharan ◽  
P He ◽  
C Hickman ◽  
J Chambost ◽  
C Jacques ◽  
...  

Abstract Study question Is a pre-trained machine learning algorithm able to accurately detect cellular arrangement in 4-cell embryos from a different continent? Summary answer Artificial Intelligence (AI) analysis of 4-cell embryo classification is transferable across clinics globally with 79% accuracy. What is known already Previous studies observing four-cell human embryo configurations have demonstrated that non-tetrahedral embryos (embryos in which cells make contact with fewer than 3 other cells) are associated with compromised blastulation and implantation potential. Previous research by this study group has indicated the efficacy of AI models in classification of tetrahedral and non-tetrahedral embryos with 87% accuracy, with a database comprising 2 clinics both from the same country (Brazil). This study aims to evaluate the transferability and robustness of this model on blind test data from a different country (France). Study design, size, duration The study was a retrospective cohort analysis in which 909 4-cell embryo images (“tetrahedral”, n = 749; “non-tetrahedral”, n = 160) were collected from 3 clinics (2 Brazilian, 1 French). All embryos were captured at the central focal plane using Embryoscope™ time-lapse incubators. The training data consisted solely of embryo images captured in Brazil (586 tetrahedral; 87 non-tetrahedral) and the test data consisted exclusively of embryo images captured in France (163 tetrahedral; 72 non-tetrahedral). Participants/materials, setting, methods The embryo images were labelled as either “tetrahedral” or “non-tetrahedral” at their respective clinics. Annotations were then validated by three operators. A ResNet–50 neural network model pretrained on ImageNet was fine-tuned on the training dataset to predict the correct annotation for each image. We used the cross entropy loss function and the RMSprop optimiser (lr = 1e–5). Simple data augmentations (flips and rotations) were used during the training process to help counteract class imbalances. Main results and the role of chance Our model was capable of classifying embryos in the blind French test set with 79% accuracy when trained with the Brazilian data. The model had sensitivity of 91% and 51% for tetrahedral and non-tetrahedral embryos respectively; precision was 81% and 73%; F1 score was 86% and 60%; and AUC was 0.61 and 0.64. This represents a 10% decrease in accuracy compared to when the model both trained and tested on different data from the same clinics. Limitations, reasons for caution Although strict inclusion and exclusion criteria were used, inter-operator variability may affect the pre-processing stage of the algorithm. Moreover, as only one focal plane was used, ambiguous cases were interpoloated and further annotated. Analysing embryos at multiple focal planes may prove crucial in improving the accuracy of the model. Wider implications of the findings: Though the use of machine learning models in the analysis of embryo imagery has grown in recent years, there has been concern over their robustness and transferability. While previous results have demonstrated the utility of locally-trained models, our results highlight the potential for models to be implemented across different clinics. Trial registration number Not applicable


Sign in / Sign up

Export Citation Format

Share Document