Pitting Judgment Model Based on Machine Learning and Feature Optimization Methods

2021 ◽  
Vol 8 ◽  
Author(s):  
Zhihao Qu ◽  
Dezhi Tang ◽  
Zhu Wang ◽  
Xiaqiao Li ◽  
Hongjian Chen ◽  
...  

Pitting corrosion seriously harms the service life of oil field gathering and transportation pipelines, making it an important subject in corrosion prevention. In this study, we collected corrosion data from pipeline steel immersion experiments and established a pitting judgment model based on a machine learning algorithm. Feature reduction methods, including feature importance calculation and Pearson correlation analysis, were first adopted to find the important factors affecting pitting. Then, the best input feature set for pitting judgment was constructed by combining feature combination and feature creation. Through receiver operating characteristic (ROC) curve and area under the curve (AUC) calculations, the random forest algorithm was selected as the modeling algorithm. As a result, the pitting judgment model based on machine learning and high-dimensional feature parameters (i.e., material factors, solution factors, and environment factors) showed good prediction accuracy. This study provided an effective means of processing high-dimensional, complex corrosion data and demonstrated the feasibility of machine learning for solving material corrosion problems.
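As a rough, runnable sketch of the workflow this abstract describes (Pearson-correlation feature filtering followed by a random forest evaluated via AUC), the example below uses synthetic data; the feature count, labels, and correlation threshold are illustrative stand-ins, not the study's dataset or settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the immersion-test data: 8 candidate features
# (material / solution / environment factors), binary pitting label.
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Step 1: feature reduction -- drop one of each highly correlated
# feature pair (|r| > 0.9, an assumed threshold).
corr = np.corrcoef(X, rowvar=False)
drop = {j for i in range(8) for j in range(i + 1, 8) if abs(corr[i, j]) > 0.9}
keep = [i for i in range(8) if i not in drop]

# Step 2: fit a random forest on the reduced feature set.
X_tr, X_te, y_tr, y_te = train_test_split(X[:, keep], y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Step 3: model assessment via AUC on held-out data.
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```

`rf.feature_importances_` would then give the per-feature importance ranking the abstract mentions.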

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yuichi Okinaga ◽  
Daisuke Kyogoku ◽  
Satoshi Kondo ◽  
Atsushi J. Nagano ◽  
Kei Hirose

Abstract The least absolute shrinkage and selection operator (lasso) and principal component regression (PCR) are popular methods of estimating traits from high-dimensional omics data, such as transcriptomes. The prediction accuracy of these estimation methods is highly dependent on the covariance structure, which is characterized by gene regulation networks. However, the manner in which the structure of a gene regulation network, together with the sample size, affects prediction accuracy has not yet been sufficiently investigated. In this study, Monte Carlo simulations are conducted to investigate the prediction accuracy for several network structures under various sample sizes. When the gene regulation network is a random graph, a sufficiently large number of observations is required to ensure good prediction accuracy with the lasso. PCR provided poor prediction accuracy regardless of the sample size. However, a real gene regulation network is likely to exhibit a scale-free structure. In such cases, the simulation indicates that a relatively small number of observations, such as $$N=300$$, is sufficient to allow the accurate prediction of traits from a transcriptome with the lasso.
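A minimal simulation in the spirit of the study's Monte Carlo experiments, but with an important simplifying assumption: the predictors below are i.i.d. rather than drawn from a network-derived covariance structure, so this only illustrates lasso prediction at N = 300 with a sparse signal, not the paper's graph settings:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 300, 1000  # N = 300 observations, transcriptome-scale feature count

# i.i.d. "expression" matrix (simplification; the paper uses
# network-structured covariance) and a sparse true coefficient vector.
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Lasso(alpha=0.1).fit(X_tr, y_tr)  # alpha is an illustrative choice
r2 = model.score(X_te, y_te)
print(f"test R^2: {r2:.3f}")
```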


Author(s):  
Ho-Chul Park ◽  
Dong-Kyu Kim ◽  
Seung-Young Kho

Traffic state prediction is an important issue in traffic operations. One of the main purposes of traffic operations is to prevent a flow breakdown. Therefore, it is necessary to predict the traffic state in a way that reflects the stochastic process of traffic flow. To predict the traffic state accurately, machine learning-based models have been widely adopted, but their black-box procedures make it difficult to obtain insights for traffic state prediction. A Bayesian network (BN) is a methodology suitable for dealing with problems that involve uncertainty, and it can also improve the understanding of such problems. In this study, we develop a traffic state prediction model using a BN to reflect the dynamic and stochastic characteristics of traffic flow. To improve the BN, which has been used with only simple structures in transportation problems, we propose a modeling procedure using mixtures of Gaussians (MoGs). In the performance evaluation, the BN outperforms a logistic regression and matches the performance of a machine learning-based artificial neural network. Also, by performing sensitivity analyses, we provide insight into traffic state prediction and guidelines for improving the model. The BN developed in this study can be considered a traffic state prediction model with good prediction accuracy and interpretability.
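One way to sketch the MoG building block the abstract mentions: fit a Gaussian mixture per traffic state as the state-conditional density p(x | state) and combine the two with Bayes' rule. The detector features, class means, and priors below are hypothetical, not taken from the study:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical detector features: [speed (km/h), flow (veh/5 min)]
free_flow = rng.normal([95, 120], [8, 20], size=(400, 2))
breakdown = rng.normal([55, 150], [12, 25], size=(400, 2))

# One Gaussian mixture per traffic state approximates the continuous
# state-conditional density used inside the BN node.
gm_free = GaussianMixture(n_components=2, random_state=0).fit(free_flow)
gm_break = GaussianMixture(n_components=2, random_state=0).fit(breakdown)

def predict_breakdown(x, prior=0.5):
    # Bayes' rule over the two state-conditional mixtures.
    lf = np.exp(gm_free.score_samples(x))
    lb = np.exp(gm_break.score_samples(x))
    return prior * lb / (prior * lb + (1 - prior) * lf)

prob = predict_breakdown(np.array([[60.0, 145.0]]))
print(f"P(breakdown | speed=60, flow=145) = {prob[0]:.2f}")
```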


2011 ◽  
Vol 14 (01) ◽  
pp. 35-44 ◽  
Author(s):  
Hong Tang ◽  
Niall Toomey ◽  
W. Scott Meddaugh

Summary The Maastrichtian (Upper Cretaceous) reservoir is one of five prolific oil reservoirs in the giant Wafra oil field. The Maastrichtian oil production is largely from subtidal dolomites at an average depth of 2,500 ft. Carbonate deposition occurred on a very gently dipping, shallow, arid, and restricted ramp setting that transitioned from normal marine conditions to restricted lagoonal environments. The average porosity of the reservoir interval is approximately 15%, although productive zones have porosity values up to 30–40%. The average permeability of the reservoir interval is approximately 30 md. Individual core plugs have measured permeability up to 1,200 md. Predicting sedimentary facies from well logs in carbonate reservoirs is difficult because of the complex carbonate sedimentary facies structures, strong diagenetic overprint, and challenging log analysis, in part owing to the presence of vugs and fractures. In this study, a workflow including (1) core-description preprocessing, (2) log- and core-data cleanup, and (3) probabilistic-neural-network (PNN) facies analysis was used to predict facies accurately from log data. After evaluation of a variety of statistical approaches, a PNN-based approach was used to predict facies from well-log data. The PNN was selected as a tool because it can delineate complex nonlinear relationships between facies and log data. The PNN method was shown to outperform multivariate statistical algorithms and, in this study, gave good prediction accuracy (above 70%). The prediction uncertainty was quantified by two probabilistic logs: discriminant ability and overall confidence. These probabilistic logs can be used to evaluate the prediction uncertainty during interpretation. Lithofacies were predicted for 15 key wells in the Wafra Maastrichtian reservoir and were effectively used to extend the understanding of the Maastrichtian stratigraphy, depositional setting, and facies distribution.
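A PNN is essentially a Parzen-window (kernel density) classifier: one Gaussian kernel per training pattern, summed per class. A minimal sketch on made-up two-facies log data follows; the feature choices, facies names, and kernel width are illustrative assumptions, not the Wafra workflow:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic well-log features [porosity (frac), gamma ray (API)]
# for two hypothetical facies classes.
dolomite = rng.normal([0.25, 30], [0.05, 8], size=(60, 2))
mudstone = rng.normal([0.08, 70], [0.03, 10], size=(60, 2))

def pnn_predict(x, classes, sigma=0.5):
    """PNN / Parzen classifier: average a Gaussian kernel over each
    class's training patterns, pick the class with the largest density."""
    scores = []
    for patterns in classes:
        d2 = np.sum((patterns - x) ** 2, axis=1)
        scores.append(np.mean(np.exp(-d2 / (2 * sigma**2))))
    return int(np.argmax(scores))

# Standardize features so a single kernel width suits both log curves.
all_pts = np.vstack([dolomite, mudstone])
mu, sd = all_pts.mean(axis=0), all_pts.std(axis=0)
classes = [(dolomite - mu) / sd, (mudstone - mu) / sd]

x = (np.array([0.22, 35.0]) - mu) / sd  # a dolomite-like log sample
print("predicted facies:", ["dolomite", "mudstone"][pnn_predict(x, classes)])
```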


2021 ◽  
Vol 108 (Supplement_6) ◽  
Author(s):  
C W L Chia ◽  
S Bhatia ◽  
D Shastin ◽  
M Chamberland

Abstract Aim A third of epilepsy patients suffer from medically refractory seizures. In patients eligible for surgical treatment, seizure freedom rates remain variable. Machine learning (ML) utilises large datasets to detect patterns and make predictions. We systematically review studies employing ML models to predict outcome following resective epilepsy surgery, evaluating their efficacy, applicability, and value in determining surgical candidacy. Method MEDLINE, Cochrane and EMBASE databases were searched for literature published between 2010 and 2020 according to PRISMA guidance. Non-refractory epilepsy, non-clinical outcome prediction, and non-human studies were excluded. Clinical and demographic data, ML features, and discrimination and prediction accuracy metrics were extracted. Results 15 studies were included. Median cohort size was 49 (range 16–4211). Heterogeneous input data sources were utilised: MRI (n = 10), electrophysiology (n = 4), PET (n = 2), clinical data (n = 2), and neuropsychological testing (n = 1). The most common ML model was the support vector machine (n = 7). All studies had good discrimination (AUC > 0.70, range 0.79 [95% CI NR] – 0.94 [95% CI 0.92–0.96]) and good prediction accuracy (> 0.70, range 0.76 [95% CI NR] – 0.95 [95% CI NR]). Limitations included small sample sizes, limited external validation, and lack of comparison with clinician-predicted outcomes. Conclusions Machine learning for outcome prediction could enhance clinical decision-making for surgical candidacy in epilepsy and lead to improved precision medicine delivery. Outcome reporting remains inconsistent, however, and further work is required to externally validate such models before implementing them in large-scale clinical populations.


Geofluids ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Liqiang Wang ◽  
Mingji Shao ◽  
Gen Kou ◽  
Maoxian Wang ◽  
Ruichao Zhang ◽  
...  

Classical decline methods, such as Arps yield decline curve analysis, have the advantages of simple principles and convenient application, and they are widely used for yield decline analysis. However, for carbonate reservoirs with high initial production, rapid decline, and large production fluctuations, where most wells have no stable production period, the adaptability of traditional decline methods is inadequate. Hence, there is an urgent need for a new decline analysis method. Although machine learning methods based on multiple regression and deep learning have been applied to unconventional oil reservoirs in recent years, their application effects have been unsatisfactory. For example, prediction errors of multiple-regression machine learning methods are relatively large, and the sample requirements of deep learning do not match the actual conditions of reservoir management. In this study, a new equal probability gene expression programming (EP-GEP) method was developed to overcome the shortcomings of the conventional Arps decline model in the production decline analysis of carbonate reservoirs. Through model validation and comparative analysis of prediction effects, it was shown that the EP-GEP model exhibited good prediction accuracy, with an average relative error significantly smaller than those of the traditional Arps model and existing machine learning methods. The successful application of the proposed method in the production decline analysis of carbonate reservoirs is expected to provide a new decline analysis tool for field reservoir engineers.
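For reference, the Arps decline family the abstract contrasts against is q(t) = q_i / (1 + b * D_i * t)^(1/b), with the exponential (b = 0) and harmonic (b = 1) special cases. A small sketch with purely illustrative parameters:

```python
import numpy as np

def arps(t, qi, Di, b):
    """Arps decline: exponential (b=0), hyperbolic (0<b<1), harmonic (b=1).
    qi = initial rate, Di = initial decline rate (1/time)."""
    if b == 0:
        return qi * np.exp(-Di * t)
    return qi / (1.0 + b * Di * t) ** (1.0 / b)

t = np.linspace(0, 24, 25)  # months (illustrative)
q_exp = arps(t, qi=1000, Di=0.05, b=0.0)
q_hyp = arps(t, qi=1000, Di=0.05, b=0.5)
q_har = arps(t, qi=1000, Di=0.05, b=1.0)

# At any t > 0 the hyperbolic curve sits between the exponential
# and harmonic limits.
print(q_exp[-1], q_hyp[-1], q_har[-1])
```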


Author(s):  
Pieter Robyns ◽  
Peter Quax ◽  
Wim Lamotte

Sensitive cryptographic information, e.g. AES secret keys, can be extracted from the electromagnetic (EM) leakages unintentionally emitted by a device using techniques such as Correlation Electromagnetic Analysis (CEMA). In this paper, we introduce Correlation Optimization (CO), a novel approach that improves CEMA attacks by formulating the selection of useful EM leakage samples in a trace as a machine learning optimization problem. To this end, we propose the correlation loss function, which aims to maximize the Pearson correlation between a set of EM traces and the true AES key during training. We show that CO works with high-dimensional and noisy traces, regardless of time-domain trace alignment and without requiring prior knowledge of the power consumption characteristics of the cryptographic hardware. We evaluate our approach using the ASCAD benchmark dataset and a custom dataset of EM leakages from an Arduino Duemilanove, captured with a USRP B200 SDR. Our results indicate that the masked AES implementation used in all three ASCAD datasets can be broken with a shallow Multilayer Perceptron model, whilst requiring only 1,000 test traces on average. A similar methodology was employed to break the unprotected AES implementation from our custom dataset, using 22,000 unaligned and unfiltered test traces.
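The baseline CEMA attack that CO improves on can be sketched as follows: for every key guess, Pearson-correlate a hypothetical leakage model against every trace sample, and pick the guess with the largest peak correlation. The traces below are simulated (not the ASCAD or Arduino captures), and the leakage model is the Hamming weight of plaintext XOR key, with no S-box for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)
HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weights

true_key = 0x3C
plaintexts = rng.integers(0, 256, size=1000)

# Simulated EM traces: 20 samples of noise, one of which (index 7)
# leaks the Hamming weight of p XOR k.
traces = rng.normal(size=(1000, 20))
traces[:, 7] += 0.5 * HW[plaintexts ^ true_key]

# CEMA: the correct key guess maximizes |Pearson r| over all samples.
best = max(
    range(256),
    key=lambda k: np.abs(
        np.corrcoef(HW[plaintexts ^ k], traces, rowvar=False)[0, 1:]
    ).max(),
)
print(f"recovered key: {best:#04x}")
```

CO, as described in the abstract, replaces this exhaustive per-sample scan with a learned weighting of samples that maximizes the same correlation during training.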


2014 ◽  
Vol 651-653 ◽  
pp. 1748-1752
Author(s):  
Fu Li Xie ◽  
Guang Quan Cheng

With the development of network science, the link prediction problem has attracted more and more attention, and similarity-based link prediction methods have been the most widely studied. Previous methods depict the similarity of two nodes mainly through their common neighbors. In this paper, by contrast, node similarity is derived from the similarity of the links around a pair of nodes, i.e., from the nodes' network environment, providing a new way to solve the link prediction problem. This paper establishes a link prediction model based on similarity between links and presents the LE index. Finally, the LE index is tested on five real datasets and compared with existing similarity-based link prediction methods; the experimental results show that the LE index achieves good prediction accuracy and, in particular, outperforms the other methods on the Yeast network.
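For contrast with the paper's link-based LE index (whose exact definition is not given in this abstract), the common-neighbors baseline it departs from can be sketched in a few lines on a toy network:

```python
from itertools import combinations

# Toy undirected network; each edge stored as a frozenset of endpoints.
edges = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]}
nodes = {n for e in edges for n in e}
nbrs = {n: {m for e in edges if n in e for m in e if m != n} for n in nodes}

# Common-neighbors similarity: score every currently unconnected pair;
# higher score = more likely missing link.
scores = {
    (u, v): len(nbrs[u] & nbrs[v])
    for u, v in combinations(sorted(nodes), 2)
    if frozenset((u, v)) not in edges
}
best_pair = max(scores, key=scores.get)
print("most likely missing link:", best_pair, "score", scores[best_pair])
```

Here nodes 1 and 4 share the common neighbors {2, 3}, so the baseline ranks the (1, 4) link first.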


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and artificial intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification, and numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed using continuous-valued features. It is assumed that an optimal subset of features has already been selected, so no further feature reduction or feature addition is to be carried out. We then attempt to improve classification performance by passing the given feature set through a transformation that produces a new feature set, which we have named the "Binary Spectrum". Via a case study on Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation's potential for broader usage.


2020 ◽  
Vol 10 (5) ◽  
pp. 1797 ◽  
Author(s):  
Mera Kartika Delimayanti ◽  
Bedy Purnama ◽  
Ngoc Giang Nguyen ◽  
Mohammad Reza Faisal ◽  
Kunti Robiatul Mahmudah ◽  
...  

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have applied low-dimensional fast Fourier transform (FFT) features with many machine learning algorithms. In this paper, we demonstrate that features extracted from EEG signals via the FFT can improve the performance of automated sleep stage classification with machine learning methods. Unlike previous works using the FFT, we incorporate thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features combined with simple feature selection are effective for improving automated sleep stage classification.
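A minimal sketch of the high-dimensional FFT feature extraction the abstract describes, on a synthetic 30 s EEG epoch; the sampling rate and the alpha-band test signal are illustrative assumptions, and a real Sleep-EDF epoch would replace `epoch`:

```python
import numpy as np

fs = 100                      # Hz, an assumed EEG sampling rate
t = np.arange(0, 30, 1 / fs)  # one 30 s scoring epoch
rng = np.random.default_rng(5)

# Synthetic epoch dominated by a 10 Hz alpha rhythm plus noise.
epoch = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=t.size)

# High-dimensional FFT feature vector: one magnitude per frequency bin
# (thousands of features per epoch, as in the paper's approach).
spectrum = np.abs(np.fft.rfft(epoch))
freqs = np.fft.rfftfreq(epoch.size, d=1 / fs)
print(f"{spectrum.size} FFT features; peak at {freqs[spectrum.argmax()]:.1f} Hz")
```

These per-bin magnitudes would then be fed, after feature selection, into a standard classifier.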


Agronomy ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 35
Author(s):  
Xiaodong Huang ◽  
Beth Ziniti ◽  
Michael H. Cosh ◽  
Michele Reba ◽  
Jinfei Wang ◽  
...  

Soil moisture is a key indicator for assessing cropland drought and irrigation status as well as for forecasting production. Compared with optical data, which are obscured by crop canopy cover, Synthetic Aperture Radar (SAR) is an efficient tool for detecting surface soil moisture under vegetation cover owing to its strong penetration capability. This paper studies soil moisture retrieval using polarimetric Phased Array-type L-band SAR 2 (PALSAR-2) data acquired over a study region in Arkansas in the United States. Both two-component model-based decomposition (SAR data alone) and machine learning (SAR + optical indices) methods are tested and compared. Validation against independent ground measurements shows that both methods achieve a Root Mean Square Error (RMSE) of less than 10 (vol.%), while the machine learning methods outperform the model-based decomposition, achieving an RMSE of 7.70 (vol.%) and an R2 of 0.60.
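The validation metrics reported above can be computed as below; the observed/retrieved pairs are hypothetical stand-ins, not the study's ground measurements:

```python
import numpy as np

# Hypothetical validation pairs: ground-measured vs. retrieved soil
# moisture (vol.%), standing in for the PALSAR-2 study's data.
observed = np.array([18.0, 25.0, 31.0, 22.0, 35.0, 28.0, 40.0, 15.0])
retrieved = np.array([21.0, 23.5, 27.0, 26.0, 33.0, 30.5, 35.0, 19.0])

# RMSE in the same units as the measurements (vol.%).
rmse = np.sqrt(np.mean((retrieved - observed) ** 2))

# Coefficient of determination R^2 = 1 - SS_res / SS_tot.
ss_res = np.sum((observed - retrieved) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"RMSE = {rmse:.2f} vol.%, R^2 = {r2:.2f}")
```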

