Quantitative Analysis of the Main Controlling Factors of Oil Saturation Variation

With the rapid development of artificial intelligence, machine learning methods have become key technologies for intelligent exploration, development, and production in oil and gas fields. This article presents a workflow that uses machine learning algorithms to analyse the main controlling factors of oil saturation variation, based on static and dynamic data from actual reservoirs. The dataset, generated from 468 wells, includes thickness, permeability, porosity, net-to-gross (NTG) ratio, oil production variation (OPV), water production variation (WPV), water cut variation (WCV), neighbouring liquid production variation (NLPV), neighbouring water injection variation (NWIV), and oil saturation variation (OSV). A data processing workflow was implemented to replace outliers and increase model accuracy. A total of 10 machine learning algorithms were tested and compared on the dataset. Random forest (RF) and gradient boosting (GBT) performed best and were selected for the quantitative analysis of the main controlling factors. The analysis shows that NWIV has the greatest impact on OSV, with an impact factor of 0.276. Based on this main controlling factor analysis, optimization measures are proposed for the development of this kind of sandstone reservoir. This study provides a reference case for the quantitative analysis of oil saturation with machine learning methods that can help reservoir engineers make better decisions.
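The abstract says outliers were replaced but does not say how. A minimal sketch, assuming a per-column interquartile-range (IQR) rule with replacement by the median of the inlying values; the function name, the fence factor k = 1.5, and the median fill are illustrative assumptions, not the authors' documented workflow.

```python
from statistics import median, quantiles


def replace_outliers_iqr(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the inlier median.

    Hypothetical preprocessing step; the study does not specify its rule.
    Requires at least two values (statistics.quantiles needs two points).
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lo <= v <= hi]
    fill = median(inliers)  # robust replacement value
    return [v if lo <= v <= hi else fill for v in values]
```

In a workflow like the one described, such a function would be applied column by column (for example to porosity, permeability, or NTG) before the models are trained.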

A Comparison of Machine Learning and Bayesian Modelling for Molecular Serotyping

10.1101/138636 ◽

2017 ◽

Author(s):

Richard Newton ◽

Lorenz Wernisch

Keyword(s):

Machine Learning ◽

Bayesian Model ◽

Learning Algorithms ◽

Biological Data ◽

Machine Learning Algorithms ◽

Training Data ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Molecular Serotyping

Abstract Background: Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only a few samples available, a model-driven approach was the only option. In the meantime, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. Results: We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes, owing to the large number of possible combinations. Most of the available training data comprise samples with only a single serotype. To overcome the lack of training data, we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single-serotype arrays. Conclusions: With the enhanced training set, the machine learning algorithms outperform the original Bayesian model. However, for serotypes currently lacking sufficient training data, the best-performing implementation was a combination of the results of the Bayesian model and the Gradient Boosting Machine.
As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological insights, which we illustrate with an example.
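The augmentation step (artificial mixtures built from single-serotype arrays) can be sketched as follows. This is a minimal sketch under a hypothetical additive mixing model: it assumes each array's raw data is a vector of probe intensities, that a mixture's signal is a weighted sum of its components' signals, and that the mixture's label is the union of the component serotypes. The function name and the mixing model are assumptions; the abstract does not describe how the raw data were combined.

```python
import random


def make_artificial_mixture(arrays, labels, n_components=2, rng=None):
    """Build one synthetic mixed-serotype sample from single-serotype arrays.

    arrays : list of raw probe-intensity vectors, one per single-serotype array
    labels : list of sets, the serotype(s) present on each array
    Hypothetical model: probe signals mix additively, in proportion to the
    amount of each serotype in the sample.
    """
    rng = rng or random.Random()
    chosen = rng.sample(range(len(arrays)), n_components)
    # Random positive mixing proportions, normalised to sum to 1.
    weights = [rng.random() + 1e-9 for _ in chosen]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_probes = len(arrays[chosen[0]])
    mixed = [
        sum(w * arrays[i][p] for w, i in zip(weights, chosen))
        for p in range(n_probes)
    ]
    # The mixture's label is the union of the component serotypes.
    mixed_label = set().union(*(labels[i] for i in chosen))
    return mixed, mixed_label
```

Because the generated labels are sets, a downstream classifier trained on them would be multi-label; that framing is also an assumption of this sketch.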

APPLICATION OF MACHINE LEARNING METHODS TO APPROXIMATE THE EXPERIMENTAL CHARACTERISTICS OF A MEMRISTOR

Mathematical modeling in materials science of electronic component ◽

10.29003/m1536.mmmsec-2020/116-119 ◽

2020 ◽

Author(s):

V. Lopatenko

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Scientific Community ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Learning Methods ◽

Passive Element ◽

Machine Learning Methods ◽

Boosting Algorithm

A memristor is a passive element in microelectronics whose properties resemble those of a biological synapse. The possibility of using memristors as analog elements in neural networks has increased the scientific community's interest in studying their properties. In this paper, we study the possibility of modeling some characteristics of a memristor using machine learning algorithms, in particular the gradient boosting algorithm.
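To make the gradient boosting idea concrete, here is a minimal sketch for one-dimensional regression with squared-error loss and decision stumps as weak learners. The abstract does not state the features, loss, or hyperparameters used, so everything here (function names, learning rate, number of rounds) is illustrative; the toy cubic curve in the test only stands in for a measured characteristic such as a current-voltage curve and is not memristor data.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump: one threshold on x, a constant on each side.

    Needs at least two distinct x values.
    """
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2  # candidate split between consecutive distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=100, lr=0.1):
    """Gradient boosting with squared-error loss and stumps as weak learners."""
    base = sum(ys) / len(ys)  # initial model: the mean response
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each weak learner's contribution by the learning rate.
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]

    def model(x):
        return base + lr * sum(s(x) for s in stumps)

    return model
```

Each round fits a stump to the current residuals and adds a shrunken copy of it, so the training error decreases monotonically; library implementations add deeper trees, subsampling, and early stopping on top of this loop.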

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01403-2 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Alan Brnabic ◽

Lisa M. Hess

Keyword(s):

Machine Learning ◽

Decision Making ◽

Literature Review ◽

Systematic Literature Review ◽

Real World ◽

Learning Algorithms ◽

External Validation ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods

Abstract Background Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be translated more rapidly into applications that inform patient-provider decision making. Methods This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented, and studies meeting the eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies used one algorithm, and only eight applied multiple machine learning algorithms to the data. Seven items on the Luo checklist were not met by more than 50% of published studies. Conclusions A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest-quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
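The review distinguishes internal from external validation. As a minimal sketch of the internal kind, assuming plain k-fold cross-validation (the review does not prescribe a specific scheme), the index splitting can be written as below. External validation would instead score the final model on an independent cohort that never enters these folds. The function name and defaults are illustrative.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)
```

Each candidate model is fit on `train` and scored on `test`; averaging the k scores gives the internal-validation estimate, and running several algorithms through the same folds gives the like-for-like comparison the review recommends.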

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Abstract Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ, and anthropogenic factors were considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely the Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Model (GBM), was performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF performed best, followed by GBM. In addition, environmental features such as average temperature, average wind speed, proximity to roadways, and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation, and sensitization of people towards mitigation of wildfires in Sikkim.
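The abstract does not say which metric was used to rank GLM, SVM, RF, and GBM. Area under the ROC curve (AUC) is a common choice for likelihood maps, so, as an assumption, here is a minimal sketch that computes it from a model's predicted likelihoods at labelled cells (1 = fire observed, for example from MODIS-FIRMS detections or ground truthing; 0 = no fire). The function name is illustrative.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation.

    AUC is the probability that a randomly chosen positive (fire) cell gets a
    higher score than a randomly chosen negative cell; ties count one half.
    O(P * N) pairwise version, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Computing this on the same held-out cells for every model gives a threshold-free comparison; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.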

Machine Learning Methods in Precision Medicine Targeting Epigenetic Diseases

Current Pharmaceutical Design ◽

10.2174/1381612824666181112114228 ◽

2019 ◽

Vol 24 (34) ◽

pp. 3998-4006

Author(s):

Shijie Fan ◽

Yu Chen ◽

Cheng Luo ◽

Fanwang Meng

Keyword(s):

Machine Learning ◽

Big Data ◽

Precision Medicine ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Advantages And Disadvantages ◽

Machine Learning Methods ◽

Accelerated Studies ◽

Applications Of Machine Learning

Background: With the rise of big data, machine learning has come into its own. Given the huge amounts of epigenetic data coming from biological experiments and the clinic, machine learning can help in detecting epigenetic features in the genome, finding correlations between phenotypes and modifications in histones or genes, accelerating the screening of lead compounds targeting epigenetic diseases, and many other aspects of epigenetics research, thereby helping to realize the hope of precision medicine. Methods: In this minireview, we review the fundamentals and applications of machine learning methods that are regularly used in the epigenetics field and explain their features. Their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: To make full use of machine learning algorithms, one should become familiar with their pros and cons, so as to benefit from big data by choosing the most suitable method(s).

Eagle View: An Abstract Evaluation of Machine Learning Algorithms based on Data Properties

10.36227/techrxiv.14459361.v1 ◽

2021 ◽

Author(s):

Dhairya Vyas

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Learning Algorithms ◽

Numerical Data ◽

Machine Learning Algorithms ◽

Series Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Almost All

In terms of machine learning, most data can be grouped into four categories: numerical data, categorical data, time-series data, and text. We use different classifiers for different data properties, drawn from the supervised, unsupervised, and reinforcement learning paradigms. Each category has its own classifiers; we tested almost all machine learning methods and compared them.

Learning Geographical Manifolds: A Kernel Trick for Geographical Machine Learning

10.31235/osf.io/75s8v ◽

2019 ◽

Author(s):

Levi John Wolf ◽

Elijah Knaap

Keyword(s):

Machine Learning ◽

Dimension Reduction ◽

Manifold Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Dimensional Structure ◽

Geographic Scale ◽

Learning Methods ◽

Machine Learning Methods ◽

Non Linear

Dimension reduction is one of the oldest concerns in geographical analysis. Despite significant, longstanding attention to it in geographical problems, recent advances in non-linear techniques for dimension reduction, called manifold learning, have not been adopted in classic data-intensive geographical problems. More generally, machine learning methods for geographical problems often focus on applying standard machine learning algorithms to geographic data, rather than on true "spatially-correlated learning," in the words of Kohonen. We therefore suggest a general way to incentivize geographical learning in machine learning algorithms, and link it to many past methods that introduced geography into statistical techniques. We develop a specific instance of this by specifying two geographical variants of Isomap, a non-linear dimension reduction, or "manifold learning," technique. We also provide a method for assessing what is added by incorporating geography and for estimating the manifold's intrinsic geographic scale. To illustrate the concepts and provide interpretable results, we conduct a dimension reduction of the geographical and high-dimensional structure of social and economic data on Brooklyn, New York. Overall, this paper's main endeavor, defining and explaining a way to "geographize" many machine learning methods, yields interesting and novel results for manifold learning and for the estimation of intrinsic geographical scale in unsupervised learning.
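Isomap builds a neighbourhood graph, replaces direct distances with shortest-path (geodesic) distances through that graph, and then embeds those distances with classical multidimensional scaling. The sketch below covers only the first two stages, and it assumes one hypothetical way to "geographize" them: edges are allowed only between spatially adjacent areal units (a contiguity list) instead of k-nearest neighbours in attribute space, with edge weights given by attribute distance. This is not necessarily either of the paper's two variants; the function name and inputs are illustrative.

```python
import math


def geodesic_distances(features, neighbours):
    """Graph and shortest-path stages of Isomap with a spatial neighbourhood.

    features   : list of attribute vectors, one per areal unit
    neighbours : dict mapping unit index -> iterable of spatially adjacent
                 units (hypothetical replacement for k-NN in attribute space)
    Returns an n x n matrix of geodesic distances (math.inf if disconnected).
    """
    n = len(features)
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, adjacent in neighbours.items():
        for j in adjacent:
            w = math.dist(features[i], features[j])  # edge weight: attribute distance
            d[i][j] = d[j][i] = min(d[i][j], w)
    # Floyd-Warshall: all-pairs shortest paths through the neighbourhood graph.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

The final Isomap stage, classical MDS on the squared geodesic distances, is omitted here; it requires the graph to be connected, so infinite entries would have to be handled first.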

Comparison of Selected Machine Learning Algorithms for Industrial Electrical Tomography

Sensors ◽

10.3390/s19071521 ◽

2019 ◽

Vol 19 (7) ◽

pp. 1521 ◽

Cited By ~ 29

Author(s):

Tomasz Rymarczyk ◽

Grzegorz Kłosowski ◽

Edward Kozłowski ◽

Paweł Tchórzewski

Keyword(s):

Machine Learning ◽

Electrical Impedance Tomography ◽

Newton Method ◽

Electrical Impedance ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Electrical Tomography ◽

Learning Methods ◽

Impedance Tomography ◽

Machine Learning Methods

The main goal of this work was to compare selected machine learning methods with the classic deterministic method in the industrial field of electrical impedance tomography. The research focused on the development and comparison of algorithms and models for the analysis and reconstruction of data using electrical tomography. The novelty was the use of original machine learning algorithms, whose characteristic feature is the use of many separately trained subsystems, each of which generates a single pixel of the output image. Artificial Neural Network (ANN), LARS, and Elastic net methods were used to solve the inverse problem. These algorithms were modified by a corresponding increase (multiplication) in the number of equations for electrical impedance tomography on the finite element method grid. The Gauss-Newton method was used as a reference for the machine learning methods. The algorithms were trained using learning data obtained through computer simulation based on real models. The experimental results showed that, in the considered cases, ANN achieved the best reconstruction quality. At the same time, ANN was the slowest in terms of both training and image generation. The other machine learning methods were comparable with the deterministic Gauss-Newton method and with each other.
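The key architectural idea, many separately trained subsystems each producing one pixel, can be sketched as follows. The sketch swaps in ridge regression (the purely L2-penalised special case of elastic net, solved here in closed form) for the paper's ANN, LARS, and Elastic net models, omits an intercept, and represents measurement vectors and images as plain lists; all of these are simplifying assumptions, and the function names are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, lam):
    """Ridge weights from the normal equations (X^T X + lam I) w = X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, rhs)


def train_pixelwise(measurements, images, lam=1e-3):
    """Train one independent regressor per output pixel."""
    n_pixels = len(images[0])
    return [fit_ridge(measurements, [img[p] for img in images], lam)
            for p in range(n_pixels)]


def reconstruct(models, measurement):
    """Each pixel model maps the full measurement vector to its own pixel value."""
    return [sum(w * v for w, v in zip(weights, measurement)) for weights in models]
```

Because every pixel has its own model, training parallelises trivially across pixels, and a slower but more expressive learner (such as an ANN) can replace `fit_ridge` without changing the surrounding structure.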

MARTT: Automatic Markup of Taxonomic Descriptions with XML

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais277 ◽

2013 ◽

Author(s):

Hong Cui

Keyword(s):

Machine Learning ◽

Information Content ◽

Large Scale ◽

Learning Algorithms ◽

General Purpose ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods ◽

Taxonomic Descriptions ◽

Efficient Machine

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools to structure large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve markup performance, especially for elements that have sparse training examples.
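Once a classifier has assigned an element label to each clause of a description, producing the XML is mechanical. A minimal sketch of that final step using Python's standard `xml.etree.ElementTree`, assuming the labels are already available; the element names (`description`, `leaf`, `flower`) and the function name are hypothetical and are not MARTT's schema, and the learning step itself is not shown.

```python
import xml.etree.ElementTree as ET


def mark_up(description_id, labelled_clauses):
    """Wrap classified clauses of one taxonomic description in XML elements.

    labelled_clauses : list of (element_name, text) pairs, e.g. produced by a
                       trained clause classifier (not shown here)
    """
    root = ET.Element("description", id=description_id)
    for label, text in labelled_clauses:
        element = ET.SubElement(root, label)  # one child element per clause
        element.text = text  # ElementTree escapes special characters on output
    return ET.tostring(root, encoding="unicode")
```

Keeping the markup step separate from classification means the same serializer works whichever learner, or learned domain rule, supplies the labels.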

Supervised Machine Learning Algorithms for Evaluation of Solid Lipid Nanoparticles and Particle Size

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181218160704 ◽

2019 ◽

Vol 21 (9) ◽

pp. 693-699 ◽

Cited By ~ 2

Author(s):

A. Alper Öztürk ◽

A. Bilge Gündüz ◽

Ozan Ozisik

Keyword(s):

Machine Learning ◽

Particle Size ◽

High Speed ◽

Mean Absolute Error ◽

Mixing Time ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods

Aims and Objectives: Solid Lipid Nanoparticles (SLNs) are pharmaceutical delivery systems with advantages such as controlled drug release and long-term stability. Particle Size (PS) is one of the important criteria for SLNs, since it affects drug release rate, biodistribution, and other properties. In this study, the formulation of SLNs using the high-speed homogenization technique was evaluated. The main emphasis of the work is to study whether the effect of mixing time and formulation ingredients on PS can be modeled. For this purpose, different machine learning algorithms were applied and evaluated using the mean absolute error metric. Materials and Methods: SLNs were prepared by high-speed homogenization. PS, size distribution, and zeta potential measurements were performed on freshly prepared samples. To model the formulation of the particles in terms of mixing time and formulation ingredients, and to evaluate the predictability of PS from these parameters, different machine learning algorithms were applied to the prepared dataset and their performance was evaluated. Results: The PS of the SLNs obtained was in the range of 263-498 nm. The results show that the PS of SLNs is best estimated by decision-tree-based methods, among which Random Forest has the lowest mean absolute error, 0.028. Overall, the machine learning estimates demonstrate that particle size can be estimated by both decision-rule-based and function-fitting machine learning methods. Conclusion: Our findings show that machine learning methods can be highly useful for determining formulation parameters for further research.
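The evaluation metric is the mean absolute error (MAE). Because the abstract reports an MAE of 0.028 while PS spans 263-498 nm, the targets were presumably rescaled before evaluation, but the abstract does not confirm this. The sketch below assumes min-max scaling; both function names are illustrative.

```python
def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1 (assumed preprocessing)."""
    if hi == lo:
        raise ValueError("lo and hi must differ")
    return [(v - lo) / (hi - lo) for v in values]


def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum |y_true_i - y_pred_i|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Scaling measured and predicted sizes with the same `lo` and `hi` (for example, the training-set minimum and maximum) keeps the MAE in scaled units; multiplying it by `hi - lo` converts it back to nanometres.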
