Reduced-Degrees-of-Freedom Gaussian-Mixture-Model Fitting for Large-Scale History-Matching Problems

SPE Journal ◽  
2019 ◽  
Vol 25 (01) ◽  
pp. 037-055
Author(s):  
Guohua Gao ◽  
Hao Jiang ◽  
Chaohui Chen ◽  
Jeroen C. Vink ◽  
Yaakoub El Khamra ◽  
...  

Summary It has been demonstrated that the Gaussian-mixture-model (GMM) fitting method can construct a GMM that more accurately approximates the posterior probability density function (PDF) by conditioning reservoir models to production data. However, the number of degrees of freedom (DOFs) for all unknown GMM parameters can become prohibitively large for large-scale history-matching problems. A new formulation of GMM fitting with a reduced number of DOFs is proposed in this paper to save memory and reduce computational cost. The performance of the new method is benchmarked against other methods using test problems with different numbers of uncertain parameters. The new method performs more efficiently than the full-rank GMM fitting formulation, reducing the memory use and computational cost by a factor of 5 to 10. Although it is less efficient than the simple GMM approximation dependent on local linearization (L-GMM), it achieves much higher accuracy, reducing the error by a factor of 20 to 600. Finally, the new method together with the parallelized acceptance/rejection (A/R) algorithm is applied to a synthetic history-matching problem for demonstration.
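The DOF explosion this summary refers to is easy to quantify: with K components in d dimensions, full covariance matrices alone contribute K·d(d+1)/2 free parameters, quadratic in d. A small illustrative count (standard GMM parameter bookkeeping, not the paper's reduced-DOF parametrization):

```python
def gmm_dofs(n_components, n_dims, covariance_type="full"):
    """Free parameters of a GMM: mixture weights + means + covariances."""
    weights = n_components - 1            # weights are constrained to sum to 1
    means = n_components * n_dims
    if covariance_type == "full":         # one symmetric d x d matrix per component
        cov = n_components * n_dims * (n_dims + 1) // 2
    elif covariance_type == "diag":       # one variance vector per component
        cov = n_components * n_dims
    else:
        raise ValueError(covariance_type)
    return weights + means + cov

# 5 components, 1000 uncertain parameters: full covariances dominate the count.
print(gmm_dofs(5, 1000, "full"), gmm_dofs(5, 1000, "diag"))
```

Even restricting covariances to diagonal (one crude way of cutting DOFs) shrinks the parameter count by two orders of magnitude in this example, which is the kind of saving the paper pursues with a more principled reduced-DOF formulation.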


2021 ◽  
Author(s):  
Guohua Gao ◽  
Jeroen Vink ◽  
Fredrik Saaf ◽  
Terence Wells

Abstract When formulating history matching within the Bayesian framework, we may quantify the uncertainty of model parameters and production forecasts using conditional realizations sampled from the posterior probability density function (PDF). It is quite challenging to sample such a posterior PDF. Some methods [e.g., Markov chain Monte Carlo (MCMC)] are very expensive, while others are cheaper but may generate biased samples. In this paper, we propose an unconstrained Gaussian Mixture Model (GMM) fitting method to approximate the posterior PDF and investigate new strategies to further enhance its performance. To reduce the CPU time of handling bound constraints, we reformulate the GMM fitting formulation such that an unconstrained optimization algorithm can be applied to find the optimal solution of unknown GMM parameters. To obtain a sufficiently accurate GMM approximation with the lowest number of Gaussian components, we generate random initial guesses, remove components with very small or very large mixture weights after each GMM fitting iteration, and prevent their reappearance using a dedicated filter. To prevent overfitting, we only add a new Gaussian component if the quality of the GMM approximation on a (large) set of blind-test data sufficiently improves. The unconstrained GMM fitting method with the new strategies proposed in this paper is validated using nonlinear toy problems and then applied to a synthetic history-matching example. It can construct a GMM approximation of the posterior PDF that is comparable to the MCMC method, and it is significantly more efficient than the constrained GMM fitting formulation, e.g., reducing the CPU time by a factor of 800 to 7300 for problems we tested, which makes it quite attractive for large-scale history-matching problems.
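The blind-test safeguard described above can be approximated with off-the-shelf tools: keep increasing the component count only while the held-out log-likelihood still improves. This is a hedged sketch using scikit-learn's GaussianMixture on synthetic samples, not the paper's fitting formulation (which fits a posterior PDF directly rather than samples):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic two-mode data standing in for posterior samples.
train = np.vstack([rng.normal(-3, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
blind = np.vstack([rng.normal(-3, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

best_k, best_score = 1, -np.inf
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(train)
    score = gmm.score(blind)          # mean log-likelihood on blind-test data
    if score <= best_score + 1e-3:    # stop once extra components add no real gain
        break
    best_k, best_score = k, score

print(best_k)
```

The improvement threshold (1e-3 here) is an assumed tuning knob; the paper's criterion for "sufficiently improves" on blind-test data plays the analogous role.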


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Qi Sun ◽  
Liwen Jiang ◽  
Haitao Xu

A vehicle-commodity matching problem (VCMP) is presented for service providers to reduce the cost of the logistics system. The vehicle classification model is built as a Gaussian mixture model (GMM), and the expectation-maximization (EM) algorithm is used to estimate the GMM parameters. A nonlinear mixed-integer programming model is constructed to minimize the total cost of the VCMP. The matching between vehicles and commodities is realized by GMM-EM as a preprocessing step of the solution. A vehicle-commodity matching platform is designed to reduce the information asymmetry between supply and demand, so that order allocation can occur at the right time and place using the optimal vehicle-commodity matching. Furthermore, a numerical experiment on an e-commerce supply chain shows that a hybrid evolutionary algorithm (HEA) is superior to the traditional method, providing a decision-making reference for e-commerce VCMP.
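The GMM-EM classification step described above can be sketched with scikit-learn, where EM runs inside `fit()`. The vehicle features below (capacity, volume) and their values are assumptions for illustration, not the paper's data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Synthetic vehicle features (capacity in tons, volume in m^3) for two
# vehicle types; the real VCMP feature set is not specified here.
light = rng.normal([2.0, 10.0], [0.3, 1.5], (200, 2))
heavy = rng.normal([10.0, 40.0], [1.0, 4.0], (200, 2))
fleet = np.vstack([light, heavy])

# EM alternates soft assignments (E-step) with weight/mean/covariance
# updates (M-step) until the likelihood converges.
gmm = GaussianMixture(n_components=2, random_state=0).fit(fleet)
classes = gmm.predict(fleet)   # hard class per vehicle, used for matching
print(gmm.converged_)
```

The resulting class labels would then feed the mixed-integer matching model as fixed vehicle types, which is what "preprocessing" means in the abstract.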


2021 ◽  
Author(s):  
Milana Gataric ◽  
Jun Sung Park ◽  
Tong Li ◽  
Vasy Vaskivskyi ◽  
Jessica Svedlund ◽  
...  

Realising the full potential of novel image-based spatial transcriptomic (IST) technologies requires robust and accurate algorithms for decoding the hundreds of thousands of fluorescent signals, each derived from a single molecule of mRNA. In this paper, we introduce PoSTcode, a probabilistic method for transcript decoding from cyclic multi-channel images, whose effectiveness is demonstrated on multiple large-scale datasets generated using different versions of the in situ sequencing protocols. PoSTcode is based on a re-parametrised matrix-variate Gaussian mixture model designed to account for correlated noise across fluorescence channels and imaging cycles. PoSTcode is shown to recover up to 50% more confidently decoded molecules while simultaneously decreasing transcript mislabeling when compared to existing decoding techniques. In addition, we demonstrate its increased stability to various types of noise and tuning parameters, which makes this new approach reliable and easy to use in practice. Lastly, we show that PoSTcode produces fewer doublet signals compared to a pixel-based decoding algorithm.
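The matrix-variate Gaussian underlying PoSTcode is equivalent to a multivariate normal on the vectorized channel-by-cycle intensity matrix with a Kronecker-structured covariance, which is how correlations across channels and cycles are captured with few parameters. A numerical sketch of that structural idea only (synthetic covariance factors and assumed sizes, not PoSTcode's estimator):

```python
import numpy as np

rng = np.random.default_rng(5)
C, R = 4, 5   # fluorescence channels x imaging cycles (assumed sizes)

# Synthetic positive-definite channel and cycle covariance factors.
A = rng.normal(size=(C, C)); U = A @ A.T + C * np.eye(C)
B = rng.normal(size=(R, R)); V = B @ B.T + R * np.eye(R)

# A matrix-variate Gaussian MN(0, U, V) is the multivariate normal on
# vec(X) with covariance kron(V, U): here 10 + 15 covariance entries to
# estimate instead of 210 for an unstructured 20 x 20 covariance.
cov = np.kron(V, U)
samples = rng.multivariate_normal(np.zeros(C * R), cov, size=50000)
emp = np.cov(samples, rowvar=False)   # empirical covariance recovers kron(V, U)
print(emp.shape)
```

The parameter saving is the same kind of structural reduction that makes the model stable to fit across many imaging cycles.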


SPE Journal ◽  
2021 ◽  
pp. 1-20
Author(s):  
Guohua Gao ◽  
Jeroen Vink ◽  
Fredrik Saaf ◽  
Terence Wells

Summary When formulating history matching within the Bayesian framework, we may quantify the uncertainty of model parameters and production forecasts using conditional realizations sampled from the posterior probability density function (PDF). It is quite challenging to sample such a posterior PDF. Some methods [e.g., Markov chain Monte Carlo (MCMC)] are very expensive, whereas other methods are cheaper but may generate biased samples. In this paper, we propose an unconstrained Gaussian mixture model (GMM) fitting method to approximate the posterior PDF and investigate new strategies to further enhance its performance. To reduce the central processing unit (CPU) time of handling bound constraints, we reformulate the GMM fitting formulation such that an unconstrained optimization algorithm can be applied to find the optimal solution of unknown GMM parameters. To obtain a sufficiently accurate GMM approximation with the lowest number of Gaussian components, we generate random initial guesses, remove components with very small or very large mixture weights after each GMM fitting iteration, and prevent their reappearance using a dedicated filter. To prevent overfitting, we add a new Gaussian component only if the quality of the GMM approximation on a (large) set of blind-test data sufficiently improves. The unconstrained GMM fitting method with the new strategies proposed in this paper is validated using nonlinear toy problems and then applied to a synthetic history-matching example. It can construct a GMM approximation of the posterior PDF that is comparable to the MCMC method, and it is significantly more efficient than the constrained GMM fitting formulation (e.g., reducing the CPU time by a factor of 800 to 7,300 for problems we tested), which makes it quite attractive for large-scale history-matching problems. NOTE: This paper is published as part of the 2021 SPE Reservoir Simulation Special Issue.


SPE Journal ◽  
2017 ◽  
Vol 22 (06) ◽  
pp. 1999-2011 ◽  
Author(s):  
Guohua Gao ◽  
Hao Jiang ◽  
Paul van Hagen ◽  
Jeroen C. Vink ◽  
Terence Wells

Summary Solving the Gauss-Newton trust-region subproblem (TRS) with traditional solvers involves solving a symmetric linear system whose dimension equals the number of uncertain parameters, which is extremely computationally expensive for history-matching problems with a large number of uncertain parameters. A new trust-region (TR) solver is developed to save both memory and computational cost, and its performance is compared with the well-known direct TR solver using factorization and the iterative TR solver using the conjugate-gradient approach. By applying the matrix inverse lemma, the original TRS is transformed to a new problem that involves solving a linear system whose dimension equals the number of observed data. For history-matching problems in which the number of uncertain parameters is much larger than the number of observed data, both memory use and central-processing-unit (CPU) time can be significantly reduced compared with solving the original problem directly. An auto-adaptive power-law transformation technique is developed to transform the original strongly nonlinear function into a new function that behaves more like a linear function. Finally, the Newton-Raphson method with some modifications is applied to solve the TRS. The proposed approach is applied to find best-match solutions in Bayesian-style assisted-history-matching (AHM) problems. It is first validated on a set of synthetic test problems with different numbers of uncertain parameters and different numbers of observed data. In terms of efficiency, the new approach is shown to significantly reduce both the computational cost and memory use compared with the direct TR solver of the GALAHAD optimization library (see http://www.galahad.rl.ac.uk/doc.html). In terms of robustness, the new approach significantly reduces the risk of failing to find the correct solution, compared with the iterative TR solver of the GALAHAD optimization library.
Our numerical results indicate that the new solver can solve large-scale TRSs with reasonably small amounts of CPU time (in seconds) and memory (in MB). Compared with the CPU time and memory used for completing one reservoir simulation run for the same problem (in hours and in GB), the cost for finding the best-match parameter values using our new TR solver is negligible. The proposed approach has been implemented in our in-house reservoir simulation and history-matching system, and has been validated on a real-reservoir-simulation model. This illustrates the main result of this paper: the development of a robust Gauss-Newton TR approach, which is applicable for large-scale history-matching problems with negligible extra cost in CPU and memory.
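The matrix-inverse-lemma transformation this summary describes can be illustrated on a plain regularized Gauss-Newton system: with n parameters and m observed data, m ≪ n, the Woodbury identity turns the n×n solve (λIₙ + JᵀJ)x = g into an m×m solve. A hedged numerical sketch of that identity (not the paper's full TR solver, which also handles the TR constraint and nonlinear transformations):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, lam = 2000, 30, 0.5          # many more parameters than observed data
J = rng.normal(size=(m, n))        # Jacobian of the residuals (synthetic)
g = rng.normal(size=n)             # gradient-like right-hand side

# Direct solve: factorize an n-by-n matrix (expensive in memory and time).
x_direct = np.linalg.solve(lam * np.eye(n) + J.T @ J, g)

# Matrix inverse lemma (Woodbury):
# (lam*I_n + J^T J)^{-1} = (1/lam) * [I_n - J^T (lam*I_m + J J^T)^{-1} J]
small = lam * np.eye(m) + J @ J.T  # only m-by-m
x_woodbury = (g - J.T @ np.linalg.solve(small, J @ g)) / lam

print(np.allclose(x_direct, x_woodbury))
```

The two solutions agree to machine precision, but the second path stores and factorizes only an m×m matrix, which is the source of the memory and CPU savings claimed above.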


2022 ◽  
Vol 355 ◽  
pp. 02024
Author(s):  
Haojing Wang ◽  
Yingjie Tian ◽  
An Li ◽  
Jihai Wu ◽  
Gaiping Sun

In view of the limitation of “hard assignment” in traditional clustering methods, and the difficulty of simultaneously meeting requirements on clustering efficiency and accuracy for massive datasets, a load classification method based on a Gaussian mixture model combined with principal component analysis is proposed. The load data are fed into a Gaussian mixture model clustering algorithm after principal component analysis and dimensionality reduction, achieving classification of large-scale load datasets. The proposed method is used to classify loads in the Canadian AMPds2 public dataset and is compared with K-Means, Gaussian mixture model clustering, and other methods. The results show that the proposed method not only achieves finer and more effective load classification, but also saves computational cost and improves computational efficiency.
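The two-stage pipeline described, PCA for dimensionality reduction followed by GMM "soft" clustering, maps directly onto standard tooling. A minimal sketch with scikit-learn, where the synthetic load profiles, the 5 retained components, and the 2 clusters are all assumptions standing in for the AMPds2 data and the paper's tuning:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Synthetic daily load profiles: 300 loads x 96 quarter-hour readings,
# drawn from two base shapes to emulate two load classes.
base_a = np.sin(np.linspace(0, 2 * np.pi, 96))
base_b = np.cos(np.linspace(0, 2 * np.pi, 96))
profiles = np.vstack([base_a + 0.1 * rng.normal(size=(150, 96)),
                      base_b + 0.1 * rng.normal(size=(150, 96))])

# Step 1: PCA keeps the directions explaining most of the variance.
reduced = PCA(n_components=5).fit_transform(profiles)

# Step 2: GMM clustering (soft assignment) in the reduced space.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(reduced)
print(labels.shape)
```

Clustering in the 5-dimensional PCA space instead of the raw 96-dimensional one is where the computational saving reported in the abstract comes from.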


2019 ◽  
Vol 1 (2) ◽  
pp. 145-153
Author(s):  
Jin-jun Tang ◽  
Jin Hu ◽  
Yi-wei Wang ◽  
He-lai Huang ◽  
Yin-hai Wang

Abstract The GPS traces collected from taxi vehicles provide abundant temporal-spatial information, as well as information on driver activity. Using taxi vehicles as mobile sensors in road networks to collect traffic information is an important emerging approach in efforts to relieve congestion. In this paper, we present a hybrid model for estimating driving paths using a density-based spatial clustering of applications with noise (DBSCAN) algorithm and a Gaussian mixture model (GMM). The first step in our approach is to extract locations from the pick-up and drop-off records (PDR) of taxi GPS equipment. Second, the locations are classified into different clusters using DBSCAN; its two parameters (density threshold and radius) are optimized using real trace data recorded from 1100 drivers. A GMM is then utilized to estimate a significant number of locations; the parameters of the GMM are optimized using the expectation-maximization (EM) algorithm. Finally, applications are used to test the effectiveness of the proposed model. In these applications, locations distributed in two regions (a residential district and a railway station) are clustered and estimated automatically.
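The DBSCAN step in this hybrid model, grouping pick-up/drop-off locations by density, can be sketched with scikit-learn, whose `eps` and `min_samples` arguments correspond to the radius and density threshold the authors tune. The coordinates below are synthetic stand-ins, not the 1100-driver trace data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
# Synthetic pick-up/drop-off coordinates around two hotspots
# (e.g., a residential district and a railway station), plus scattered noise.
hotspot_a = rng.normal([0.0, 0.0], 0.05, (100, 2))
hotspot_b = rng.normal([1.0, 1.0], 0.05, (100, 2))
scatter = rng.uniform(-0.5, 1.5, (20, 2))
points = np.vstack([hotspot_a, hotspot_b, scatter])

# eps ~ radius, min_samples ~ density threshold; both need tuning on real data.
labels = DBSCAN(eps=0.1, min_samples=10).fit_predict(points)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise
print(n_clusters)
```

Each recovered cluster would then be summarized by the GMM/EM stage, one Gaussian component per significant location.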

