Water cut/salt content forecasting in oil wells using a novel data-driven approach

Author(s):  
Rouhollah Ahmadi ◽  
Jamal Shahrabi ◽  
Babak Aminshahidy

Water cut is an important parameter in reservoir management and surveillance. Traditional approaches to predicting water production in oil wells, including numerical simulation and analytical techniques, rest on simplifying assumptions and have corresponding limitations. This article instead proposes a data-driven approach for forecasting water cut in two different types of oil wells. First, a classification approach is presented for water cut prediction in sweet oil wells with discontinuous salt production patterns. Different classification algorithms, including Support Vector Machine (SVM), Classification Tree (CT), Random Forest (RF), Multi-Layer Perceptron (MLP), Linear Discriminant Analysis (LDA) and Naïve Bayes (NB), are investigated in this regard. According to the results of a case study on a real Iranian sweet oil well, RF, CT, MLP and SVM provide the best performance measures, in that order. Next, a Vector Autoregressive (VAR) model is proposed for forecasting water cut in salty oil wells with continuous water production during the life of the well. The proposed VAR model is verified using data from two real salty oil wells. The results confirm that a well-tuned VAR model can provide reliable forecasts of water production, with very good accuracy, for the near future.
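The VAR forecasting step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' model: the lag order of one, the absence of an intercept, the two-variable setup, and the synthetic noise-free data are all assumptions, and `fit_var1` and `forecast` are hypothetical helper names.

```python
def fit_var1(series):
    """Least-squares fit of x_t = A x_{t-1} for a two-variable series (no intercept)."""
    # Accumulate S_yx = sum of x_t x_{t-1}^T and S_xx = sum of x_{t-1} x_{t-1}^T.
    syx = [[0.0, 0.0], [0.0, 0.0]]
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    for prev, cur in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                syx[i][j] += cur[i] * prev[j]
                sxx[i][j] += prev[i] * prev[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    if abs(det) < 1e-12:
        raise ValueError("lagged observations are collinear; cannot fit VAR(1)")
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # Normal equations: A = S_yx * S_xx^{-1}.
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def forecast(a, last, steps):
    """Iterate the fitted model forward from the last observation."""
    out = []
    x = list(last)
    for _ in range(steps):
        x = [a[0][0] * x[0] + a[0][1] * x[1],
             a[1][0] * x[0] + a[1][1] * x[1]]
        out.append(x)
    return out


# Synthetic two-variable history generated from a known coefficient matrix
# (an assumption for illustration; real well data would replace this).
true_a = [[0.9, 0.05], [0.1, 0.8]]
history = [[1.0, 2.0]]
for _ in range(30):
    history.extend(forecast(true_a, history[-1], 1))

a_hat = fit_var1(history)                       # recovered coefficients
next_days = forecast(a_hat, history[-1], 3)     # three-step-ahead forecast
```

A practical model would typically add an intercept, choose the lag order by an information criterion, and validate forecasts on held-out days.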

Atmosphere ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 701
Author(s):  
Bong-Chul Seo

This study describes a framework that provides qualitative weather information on winter precipitation types using a data-driven approach. The framework incorporates the data retrieved from weather radars and the numerical weather prediction (NWP) model to account for relevant precipitation microphysics. To enable multimodel-based ensemble classification, we selected six supervised machine learning models: k-nearest neighbors, logistic regression, support vector machine, decision tree, random forest, and multi-layer perceptron. Our model training and cross-validation results based on Monte Carlo Simulation (MCS) showed that all the models performed better than our baseline method, which applies two thresholds (surface temperature and atmospheric layer thickness) for binary classification (i.e., rain/snow). Among all six models, random forest presented the best classification results for the basic classes (rain, freezing rain, and snow) and the further refinement of the snow classes (light, moderate, and heavy). Our model evaluation, which uses an independent dataset not associated with model development and learning, led to classification performance consistent with that from the MCS analysis. Based on the visual inspection of the classification maps generated for an individual radar domain, we confirmed the improved classification capability of the developed models (e.g., random forest) compared to the baseline one in representing both spatial variability and continuity.
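The two-threshold baseline and the idea of combining several classifiers, both mentioned in the abstract above, can be sketched as follows. The threshold values, the rule that both criteria must hold for snow, and the use of a simple majority vote are assumptions made for illustration; the abstract does not state them. `baseline_precip_type` and `majority_vote` are hypothetical names.

```python
from collections import Counter


def baseline_precip_type(surface_temp_c, thickness_m,
                         temp_threshold_c=0.0, thickness_threshold_m=5400.0):
    """Binary rain/snow rule from surface temperature and layer thickness.

    Both default thresholds are illustrative assumptions.
    """
    cold_surface = surface_temp_c <= temp_threshold_c
    cold_layer = thickness_m <= thickness_threshold_m
    return "snow" if (cold_surface and cold_layer) else "rain"


def majority_vote(labels):
    """Return the most common label among the individual model predictions."""
    return Counter(labels).most_common(1)[0][0]


# Baseline on two example grid cells.
cell_cold = baseline_precip_type(-2.0, 5300.0)
cell_warm = baseline_precip_type(3.0, 5300.0)

# Combining hypothetical predictions from several trained models.
ensemble_label = majority_vote(["snow", "rain", "snow", "snow", "freezing rain", "snow"])
```

The baseline can only separate two classes, which is why the multi-class models in the study are compared against it on the rain/snow task.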


Water ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 54 ◽  
Author(s):  
Congcong Sun ◽  
Benjamí Parellada ◽  
Vicenç Puig ◽  
Gabriela Cembrano

Leaks in water distribution networks (WDNs) are one of the main reasons for water loss during fluid transportation. Given the worldwide problem of water scarcity and the added challenges of a growing population, minimizing water losses through timely and efficient leak detection and localization using advanced techniques is an urgent humanitarian need. Numerous methods localize water leaks in WDNs by constructing hydraulic models or by analyzing flow/pressure deviations between the observed data and the estimated values. From the application perspective, however, it is far more practical to use an approach that has reasonable computational demand and does not rely heavily on measurements or complex models. In this context, this paper presents a novel method for leak localization that uses a data-driven approach based on limited pressure measurements in WDNs and comprises two stages: (1) two different machine learning classifiers, based on linear discriminant analysis (LDA) and neural networks (NNET), are developed to determine the probability of each node in a WDN having a leak; (2) Bayesian temporal reasoning is then applied to rescale the probabilities of each possible leak location at each time step after a leak is detected, with the aim of improving the localization accuracy. As an initial illustration, the hypothetical benchmark Hanoi district metered area (DMA) is used as the case study to test the performance of the proposed approach. Using the fitting accuracy and average topological distance (ATD) as performance indicators, the preliminary results reach more than 80% accuracy in the best cases.
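The second stage described above, in which per-node leak probabilities are rescaled over successive time steps, can be sketched with a recursive Bayes update. This is an illustrative sketch, not the authors' formulation: treating each classifier output as a likelihood, starting from a uniform prior, and the three-node example values are all assumptions. `bayes_update` is a hypothetical name.

```python
def bayes_update(prior, likelihood):
    """Multiply the prior by the new per-node evidence and renormalize."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        # No node is supported by the new evidence; keep the previous belief.
        return list(prior)
    return [u / total for u in unnormalized]


# Hypothetical classifier outputs for three candidate nodes at three time steps.
classifier_outputs = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.2, 0.2],
]

belief = [1.0 / 3.0] * 3          # uniform prior over candidate leak nodes
for step_probs in classifier_outputs:
    belief = bayes_update(belief, step_probs)

most_likely_node = belief.index(max(belief))
```

Accumulating evidence this way sharpens the belief: node 0 ends with a higher probability than any single time step assigned to it.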


This paper proposes a methodology that uses a large-scale employment dataset to explore which factors affect employment and how. The proposed methodology is a combination of predictive modelling, variable significance analysis, and VEC analysis. Modelling is based on logistic regression, linear discriminant analysis, neural network, classification tree, and support vector machine. Following the CRISP-DM standard process model, we train binary classifiers, optimising their hyper-parameters, and measure their performance by prediction accuracy, ROC analysis, and AUC. Using sensitivity analysis, we rank the variable significance in order to identify and measure factors of employment. Using VEC analysis, we further explore how values of those factors affect employment. Findings show that the best-performing models are neural networks and support vector machines, with preference for the latter for quality of VEC. Experiments also suggest that education and age are the primary contributors to correct classification, with a specific value distribution discussed in the paper. All results were validated using a rigorous testing procedure that involves training, validation, and test data partitions and a combination of multiple runs along with three-fold cross-validation. This study addresses some gaps in previous research publications, which lack quantification of the conclusions made.


Author(s):  
Zhexiang Chi ◽  
Taotao Zhou ◽  
Simin Huang ◽  
Yan-Fu Li

Polygonal wear is one of the most critical failure modes of high-speed train wheels and can significantly compromise the safety and reliability of high-speed train operation. However, the mechanism underpinning wheel polygonization is complex and still not fully understood, which makes it challenging to track the evolution of a polygonal wheel. The large amount of data gathered through regular inspection and maintenance of Chinese high-speed trains provides a promising way to tackle this challenge with data-driven methods. This article proposes a data-driven approach to predict the degree of polygonal wear and to assess the reliability of individual wheels and the health index of all wheels of a high-speed train for maintenance priority ranking. The synthetic minority over-sampling technique for nominal and continuous features (SMOTE-NC) is adopted to augment the maintenance dataset, which is imbalanced and has mixed features. An autoencoder is used to learn abstract features that represent the original datasets, which are then fed into a support vector machine classifier. The approach is coherently optimized by tuning the model hyper-parameters with Bayesian optimization. The effectiveness of the proposed approach is demonstrated on wheel maintenance data obtained from 2016 to 2017. The results can also be used to support practical maintenance priority allocation.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1588 ◽  
Author(s):  
Donghyun Kim ◽  
Sangbong Lee ◽  
Jihwan Lee

The fluctuation of the oil price and the growing requirement to reduce greenhouse gas emissions have forced ship builders and shipping companies to improve the energy efficiency of their vessels. Accurate prediction of the required propulsion power at various operating conditions is essential to evaluate the energy-saving potential of a vessel. Currently, a new ship is expected to use the ISO15016 method in estimating added resistance induced by external environmental factors in power prediction. However, since ISO15016 usually assumes static water conditions, it may result in low accuracy when it is applied to various operating conditions. Moreover, the ISO15016 method is time consuming to apply because it is computationally expensive and requires many input data. To overcome these limitations, we propose a data-driven approach to predict the propulsion power of a vessel. In this study, support vector regression (SVR) is used to learn from big data obtained from onboard measurement and the National Oceanic and Atmospheric Administration (NOAA) database. As a result, we show that our data-driven approach outperforms the ISO15016 method if the big data of the solid line are secured.


2021 ◽  
Vol 27 (1) ◽  
Author(s):  
Nicola I. Lorè ◽  
Rebecca De Lorenzo ◽  
Paola M. V. Rancoita ◽  
Federica Cugnata ◽  
Alessandra Agresti ◽  
...  

Background: Host inflammation contributes to determining whether SARS-CoV-2 infection causes mild or life-threatening disease. Tools are needed for early risk assessment. Methods: In 111 COVID-19 patients prospectively followed at a single reference hospital, we studied fifty-three potential biomarkers, including alarmins, cytokines, adipocytokines and growth factors, humoral innate immune and neuroendocrine molecules, and regulators of iron metabolism. Biomarkers at hospital admission, together with age, degree of hypoxia, neutrophil to lymphocyte ratio (NLR), lactate dehydrogenase (LDH), C-reactive protein (CRP) and creatinine, were analysed within a data-driven approach to classify patients with respect to survival and ICU outcomes. Classification and regression tree (CART) models were used to identify prognostic biomarkers. Results: Among the fifty-three potential biomarkers, the classification tree analysis selected CXCL10 at hospital admission, in combination with NLR and time from onset, as the best predictor of ICU transfer (AUC [95% CI] = 0.8374 [0.6233–0.8435]), while it was selected alone to predict death (AUC [95% CI] = 0.7334 [0.7547–0.9201]). CXCL10 concentration abated in COVID-19 survivors after healing and discharge from the hospital. Conclusions: A data-driven analysis that accounts for the presence of confounding factors identifies CXCL10 as the most robust predictive biomarker of patient outcome in COVID-19.


2019 ◽  
Author(s):  
SF Woodward ◽  
D Reiss ◽  
MO Magnasco

Most research into bottlenose dolphins’ (Tursiops truncatus’) capacity for communication has centered on tonal calls termed whistles, in particular individually distinctive contact calls referred to as signature whistles. While “non-signature” whistles exist, and may be important components of bottlenose dolphins’ communicative repertoire, they have not been studied extensively. This is in part due to the difficulty of attributing whistles to specific individuals, a challenge that has limited the study not only of non-signature whistles but also of general acoustic exchanges among socializing dolphins. In this paper, we propose the first machine-learning-based approach to identifying the source locations of semi-stationary, tonal, whistle-like sounds in a highly reverberant space, specifically a half-cylindrical dolphin pool. We deliver estimated time differences of arrival (TDOAs) and normalized cross-correlation values computed from pairs of hydrophone signals to a random forest model for high-feature-volume classification and feature selection, and subsequently deliver the selected features into linear discriminant analysis, linear and quadratic Support Vector Machine (SVM), and Gaussian process models. In our 14-source-location setup, we achieve perfect accuracy in localization by classification and high accuracy in localization by regression (median absolute deviation of 0.66 m, interquartile range of 0.34 m - 1.57 m), with fewer than 10,000 features. By building a parsimonious (minimum-feature) classification tree model for the same task, we show that a minimally sufficient feature set is consistent with the information valued by a strictly geometric, time-difference-of-arrival-based approach to sound source localization.
Ultimately, our regression models yielded better accuracy than the established Steered-Response Power (SRP) method when all training data were used, and comparable accuracy along the pool surface when deprived of training data at testing sites; our methods additionally boast improved computation time and the potential for superior localization accuracy in all dimensions with more training data.
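The TDOA features mentioned in the abstract above are commonly estimated by locating the peak of the cross-correlation between two sensor signals. The sketch below illustrates that general idea only; it is not the authors' pipeline. The synthetic tone burst, the 7-sample delay, and the sampling rate are assumptions, and `estimate_delay` is a hypothetical name.

```python
import math


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means sig_b arrives later than sig_a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic whistle-like burst: a Gaussian-windowed sinusoid padded with silence.
burst = [math.exp(-((k - 15) / 5.0) ** 2) * math.sin(2 * math.pi * 0.12 * k)
         for k in range(31)]
sig_a = [0.0] * 20 + burst + [0.0] * 40

true_delay = 7                                        # samples (assumed)
sig_b = [0.0] * true_delay + sig_a[:-true_delay]      # delayed copy at a second sensor

sample_rate_hz = 48000.0                              # assumed sampling rate
lag = estimate_delay(sig_a, sig_b, max_lag=20)
tdoa_seconds = lag / sample_rate_hz
```

In a reverberant pool the correlation peak is far less clean than in this noise-free example, which is one reason the study feeds many such features to a learned model instead of relying on geometry alone.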


2019 ◽  
Vol 21 (18) ◽  
pp. 9159-9167 ◽  
Author(s):  
Xinyu Wang ◽  
Yang Hong ◽  
Man Wang ◽  
Gongming Xin ◽  
Yanan Yue ◽  
...  

A data-driven approach combining classical molecular dynamics simulation and machine learning technique is used to investigate the mechanical properties of freestanding h-MoSe2 and t-MoSe2.


Author(s):  
Dongxiu Ou ◽  
Rui Xue ◽  
Ke Cui

Turnout systems on railways are crucial for safety protection and improvements in efficiency. Statistics show that the most common faults in railway systems are turnout system faults. Therefore, many railway systems have adopted the microcomputer monitoring system (MMS) to monitor their health and performance in real time. However, in practice, existing turnout fault diagnosis methods depend largely on human experience. In this paper, we propose a data-driven fault diagnosis method that monitors data from point machines collected using MMS. First, based on a derivative method, data features are extracted by segmenting the original sample. Then, we apply two methods for feature reduction: principal component analysis (PCA) and linear discriminant analysis (LDA). The results show that LDA gave a better performance in the cases studied. A problem that cannot be overlooked is that the imbalance between rare fault samples and abundant normal samples reduces the accuracy of classic fault diagnosis models. To deal with this problem of imbalanced data, we propose a modified support vector machine (SVM) method. Finally, an experiment using real data collected from the Guangzhou Railway Line is presented, which demonstrates that our method is reliable and feasible in fault diagnosis. It can further help engineers perform timely repairs and maintenance work in the future.

