Application of l1 Estimation of Gaussian Mixture Model Parameters for Language Identification

Author(s):  
Danila Doroshin ◽  
Maxim Tkachenko ◽  
Nikolay Lubimov ◽  
Mikhail Kotov


2021 ◽  
Author(s):  
Hua Yuan

The objective of this thesis is to acquire abstract image features through statistical modelling in the wavelet domain and then, based on the extracted image features, to develop an effective content-based image retrieval (CBIR) system and a fragile watermarking scheme. In this thesis, we first present a statistical modelling of images in the wavelet domain through a Gaussian mixture model (GMM) and a generalized Gaussian mixture model (GGMM). An Expectation-Maximization (EM) algorithm is developed to estimate the model parameters. A novel similarity measure based on the Kullback-Leibler divergence is also developed to calculate the distance between two distinct model distributions. We then apply the statistical modelling to two application areas: image retrieval and fragile watermarking. In image retrieval, the model parameters are employed as image features to compose the indexing feature space, while the feature distance between two compared images is computed using the novel similarity measure. The new image retrieval method achieves better retrieval performance than most conventional methods. In fragile watermarking, the model parameters are utilized for watermark embedding. The new watermarking scheme achieves virtually imperceptible embedding of watermarks because it modifies only a small amount of image data and embeds watermarks at image texture edges. A multiscale embedding of fragile watermarks is introduced to enhance the embedding rate and, on the other hand, to constitute a semi-fragile approach.
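The thesis pairs per-subband mixture models with a Kullback-Leibler-based distance. The KL divergence between two full mixtures has no closed form, but the single-Gaussian-per-subband special case does, and it illustrates how such a feature distance can be assembled. The sketch below is a minimal illustration, not the thesis implementation; the function names and the symmetrized-sum combination are assumptions.

```python
import math

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form KL divergence KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

def subband_distance(params_a, params_b):
    """Symmetrized KL distance summed over wavelet subbands.

    Each params list holds one (mu, sigma) pair per subband, playing the
    role of the per-subband model parameters used as image features.
    """
    total = 0.0
    for (m1, s1), (m2, s2) in zip(params_a, params_b):
        total += kl_gaussian(m1, s1, m2, s2) + kl_gaussian(m2, s2, m1, s1)
    return total
```

Symmetrizing the divergence makes the distance order-independent, which is desirable when ranking database images against a query.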


2014 ◽  
Vol 599-601 ◽  
pp. 814-818 ◽  
Author(s):  
Xue Yuan Chen ◽  
Xia Fu Lv ◽  
Jie Liu

The Gaussian mixture model is a popular method for detecting moving targets with static cameras. Since the traditional Gaussian mixture model adapts poorly to illumination changes in the scene and uses a fixed learning rate, this paper describes a method that detects illumination variation and updates the learning rate adaptively. It proposes an approach that uses a color histogram matching algorithm and adjusts the learning rate automatically after introducing an illumination variation factor into the model parameters. Furthermore, the proposed method selects the number of model components adaptively, which reduces the computational complexity and improves real-time performance. The experimental results indicate that the detection system achieves better robustness, adaptability, and stability.
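The abstract does not give the exact update rule, but the core idea of tying the learning rate to an illumination-change factor can be sketched as follows. The factor formula, the scaling constant, and the single-Gaussian background simplification are all assumptions for illustration only.

```python
import numpy as np

def adaptive_learning_rate(frame, background, alpha_base=0.01,
                           alpha_min=0.001, alpha_max=0.2):
    """Scale the background-model learning rate by a global illumination factor.

    The factor compares mean gray levels of the current frame and the
    background model; a large jump (e.g., lights switched on) raises the
    learning rate so the model re-adapts quickly. The 10x scaling is ad hoc.
    """
    illum_factor = abs(frame.mean() - background.mean()) / 255.0
    alpha = alpha_base * (1.0 + 10.0 * illum_factor)
    return float(np.clip(alpha, alpha_min, alpha_max))

def update_background(frame, background, alpha):
    """Running-average background update (single-Gaussian simplification)."""
    return (1.0 - alpha) * background + alpha * frame
```

In a full implementation, the illumination factor would come from the paper's color-histogram matching step rather than a plain mean-brightness difference.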


2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Yupeng Li ◽  
Jianhua Zhang ◽  
Ruisi He ◽  
Lei Tian ◽  
Hewen Wei

In this paper, the Gaussian mixture model (GMM) is introduced for channel multipath clustering. In the GMM field, the expectation-maximization (EM) algorithm is usually utilized to estimate the model parameters. However, EM frequently converges to a local optimum. To address this issue, a hybrid differential evolution and EM (DE-EM) algorithm is proposed in this paper. To be specific, DE is employed to initialize the GMM parameters; the parameters are then refined with the EM algorithm. Thanks to the global searching ability of DE, the proposed hybrid DE-EM algorithm is more likely to reach the global optimum. Simulations demonstrate that our proposed DE-EM clustering algorithm can significantly improve the clustering performance.
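The DE-then-EM pipeline can be sketched in one dimension: DE searches globally over the component means (holding variances and weights fixed), and EM then refines all parameters from that starting point. This is a toy sketch of the idea, not the authors' algorithm; the DE variant, population sizes, and the decision to optimize only the means are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gmm_loglik(data, means, sigmas, weights):
    """Log-likelihood of 1-D data under a Gaussian mixture."""
    d = data[:, None]
    comp = weights * np.exp(-0.5 * ((d - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return np.log(comp.sum(axis=1) + 1e-300).sum()

def de_init_means(data, k, pop=20, gens=40, f=0.8, cr=0.9):
    """Differential evolution over the k component means (sigmas/weights fixed)."""
    lo, hi = data.min(), data.max()
    sig = np.full(k, data.std() / k)
    w = np.full(k, 1.0 / k)
    popu = rng.uniform(lo, hi, size=(pop, k))
    fit = np.array([gmm_loglik(data, m, sig, w) for m in popu])
    for _ in range(gens):
        for i in range(pop):
            # mutate from three distinct individuals, binomial crossover
            a, b, c = popu[rng.choice([j for j in range(pop) if j != i], 3, replace=False)]
            trial = np.where(rng.random(k) < cr, a + f * (b - c), popu[i])
            fl = gmm_loglik(data, trial, sig, w)
            if fl > fit[i]:           # greedy selection
                popu[i], fit[i] = trial, fl
    return popu[fit.argmax()]

def em(data, means, iters=50):
    """Standard EM refinement starting from the DE-supplied means."""
    k = len(means)
    sig = np.full(k, data.std())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        d = data[:, None]
        resp = w * np.exp(-0.5 * ((d - means) / sig) ** 2) / sig   # E-step
        resp /= resp.sum(axis=1, keepdims=True)
        nk = resp.sum(axis=0)                                      # M-step
        means = (resp * d).sum(axis=0) / nk
        sig = np.sqrt((resp * (d - means) ** 2).sum(axis=0) / nk) + 1e-6
        w = nk / len(data)
    return means, sig, w
```

Because DE's greedy selection never discards a fitter individual, the best initialization can only improve across generations, which is what reduces EM's exposure to poor local optima.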


Most of the existing LID systems are based on the Gaussian mixture model (GMM). The main drawback of a GMM-based LID system is that it requires a large amount of speech data to train the GMM. Most Indian languages are similar because they are derived from Devanagari. Even though common phonemes exist in the phoneme sets across Indian languages, each language has its own unique phonotactic constraints. Any modeling technique capable of capturing all these slight variations imposed by a language provides an important language identification cue. A GMM-based LID system that captures these variations requires a large number of mixture components, and modeling a large number of mixture components with a GMM in turn requires a large amount of training data for each language class, which is very difficult to obtain for Indian languages. The main advantage of a GMM-UBM-based LID system is that it requires less training data to train (model) the system. In this paper, the importance of GMM-UBM modeling for the language identification (LID) task for Indian languages is explored using a new set of feature vectors. In the GMM-UBM LID system based on the new feature vectors, the phonotactic variations imparted by different Indian languages are modeled using the Gaussian mixture model and universal background model (GMM-UBM) technique. In this type of modeling, some amount of data from each language class is pooled to create a universal background model, and each class model is then adapted from this UBM. In this study, it is found that the performance of the new-feature-vector GMM-UBM-based LID system is superior to that of the conventional GMM-based LID system using the same feature vectors.
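The adaptation step described above is commonly realized as MAP adaptation of the UBM means (Reynolds-style): each language model keeps the UBM's structure and shifts component means toward the language's data in proportion to the soft count of frames that component explains. The sketch below shows this for 1-D features; the relevance factor value and the means-only adaptation are conventional choices, not details taken from this paper.

```python
import numpy as np

def map_adapt_means(features, ubm_means, ubm_sigmas, ubm_weights, r=16.0):
    """MAP-adapt UBM component means to language-specific data.

    new_mean_k = (n_k * E_k + r * ubm_mean_k) / (n_k + r),
    where n_k is the soft frame count for component k, E_k the
    posterior-weighted data mean, and r the relevance factor.
    """
    d = features[:, None]                                 # (n_frames, 1)
    lik = ubm_weights * np.exp(-0.5 * ((d - ubm_means) / ubm_sigmas) ** 2) / ubm_sigmas
    post = lik / lik.sum(axis=1, keepdims=True)           # responsibilities (n, k)
    n_k = post.sum(axis=0)                                # soft counts per component
    e_k = (post * d).sum(axis=0) / np.maximum(n_k, 1e-10)
    alpha = n_k / (n_k + r)                               # data-vs-prior balance
    return alpha * e_k + (1.0 - alpha) * ubm_means
```

Components that see little language-specific data keep their UBM means, which is exactly why this scheme needs far less per-language training data than training a large GMM from scratch.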


SPE Journal ◽  
2021 ◽  
pp. 1-20
Author(s):  
Guohua Gao ◽  
Jeroen Vink ◽  
Fredrik Saaf ◽  
Terence Wells

Summary When formulating history matching within the Bayesian framework, we may quantify the uncertainty of model parameters and production forecasts using conditional realizations sampled from the posterior probability density function (PDF). It is quite challenging to sample such a posterior PDF. Some methods [e.g., Markov chain Monte Carlo (MCMC)] are very expensive, whereas other methods are cheaper but may generate biased samples. In this paper, we propose an unconstrained Gaussian mixture model (GMM) fitting method to approximate the posterior PDF and investigate new strategies to further enhance its performance. To reduce the central processing unit (CPU) time of handling bound constraints, we reformulate the GMM fitting formulation such that an unconstrained optimization algorithm can be applied to find the optimal solution of unknown GMM parameters. To obtain a sufficiently accurate GMM approximation with the lowest number of Gaussian components, we generate random initial guesses, remove components with very small or very large mixture weights after each GMM fitting iteration, and prevent their reappearance using a dedicated filter. To prevent overfitting, we add a new Gaussian component only if the quality of the GMM approximation on a (large) set of blind-test data sufficiently improves. The unconstrained GMM fitting method with the new strategies proposed in this paper is validated using nonlinear toy problems and then applied to a synthetic history-matching example. It can construct a GMM approximation of the posterior PDF that is comparable to the MCMC method, and it is significantly more efficient than the constrained GMM fitting formulation (e.g., reducing the CPU time by a factor of 800 to 7,300 for problems we tested), which makes it quite attractive for large-scale history-matching problems. NOTE: This paper is published as part of the 2021 SPE Reservoir Simulation Special Issue.
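The abstract does not state which reparameterization turns the bound-constrained GMM fitting into an unconstrained problem, but a common device for this purpose is the logistic (sigmoid/logit) change of variables: optimize over an unconstrained y and map it into the box (a, b). The sketch below illustrates that device only; it is an assumption, not the transform used in the paper.

```python
import math

def to_bounded(y, a, b):
    """Map an unconstrained variable y in R into the open interval (a, b)."""
    return a + (b - a) / (1.0 + math.exp(-y))

def to_unbounded(x, a, b):
    """Inverse (logit) map, so the optimizer can work without bound handling."""
    t = (x - a) / (b - a)
    return math.log(t / (1.0 - t))
```

Any unconstrained optimizer (e.g., a quasi-Newton method) can then search over y freely, and every iterate maps back to a feasible x, which removes the per-iteration cost of enforcing bounds.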

