gaussian mixture
Recently Published Documents





2022 ◽ Vol 27 ◽ pp. 70-93
John Patrick Fitzsimmons ◽ Ruodan Lu ◽ Ying Hong ◽ Ioannis Brilakis

The UK commissions about £100 billion in infrastructure construction works every year. More than 50% of them finish later than planned, damaging the interests of stakeholders. Time-risk on construction projects is currently estimated subjectively, largely from experience, even though many techniques exist for analysing risk in construction schedules. Unlike conventional methods, which tend to depend on accurate estimation of risk boundaries for each task, this research proposes a hybrid method that helps planners undertake risk analysis on baseline schedules with improved accuracy. The proposed method is endowed with machine intelligence and is trained on a database of 293,263 tasks from a diverse sample of 302 completed infrastructure construction projects in the UK. It combines a Gaussian Mixture Modelling-based Empirical Bayesian Network and a Support Vector Machine, followed by a Monte Carlo risk simulation. The former is used to investigate uncertainty and correlated risk factors and to predict task duration deviations, while the latter is used to return a simulated time-risk prediction. Ten projects were randomly selected as case studies, and the results of the proposed hybrid method were compared with those of Monte Carlo simulation. The results indicated 54.4% more accurate predictions of project delays.
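
The Monte Carlo step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mixture parameters, the 20-task serial chain, the independence of task delays, and all function names are assumptions made purely for illustration.

```python
import random

def sample_mixture(rng, weights, means, stds):
    """Draw one value from a 1-D Gaussian mixture."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])

def simulate_project_delay(n_tasks, weights, means, stds, n_runs=5000, seed=0):
    """Return the sorted total delays (in days) of n_runs simulated projects.

    Each project is a serial chain of n_tasks tasks whose duration
    deviations are drawn independently from the same mixture.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = sum(sample_mixture(rng, weights, means, stds)
                    for _ in range(n_tasks))
        totals.append(total)
    totals.sort()
    return totals

def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    idx = min(len(sorted_values) - 1, int(q / 100 * len(sorted_values)))
    return sorted_values[idx]

# Hypothetical mixture: most tasks finish on time, some run about 5 days late.
weights = [0.7, 0.3]
means = [0.0, 5.0]
stds = [1.0, 2.0]

delays = simulate_project_delay(20, weights, means, stds)
p50 = percentile(delays, 50)
p80 = percentile(delays, 80)
print(f"median delay: {p50:.1f} days, 80th percentile: {p80:.1f} days")
```

Reporting percentiles rather than a single mean is the usual reason for running such a simulation: a planner can read off the delay that is exceeded in only a chosen fraction of simulated projects.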

2022 ◽ Vol 23 (1)
Lucille Lopez-Delisle ◽ Jean-Baptiste Delisle

Background: The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, sparsity of the data can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes. Results: We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling, i.e., Poisson noise. We present a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell data, which infers the intrinsic expression distribution in scRNA-seq data using a Gaussian mixture model. baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation between the two genes' expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with a known genetic interaction in the embryonic limb. Then, we study the expression of Pitx1 in the embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods for analyzing scRNA-seq data are too sensitive to sampling noise, baredSC reveals this trimodal distribution. Conclusion: baredSC is a powerful tool that aims at retrieving the expression distribution of a few genes of interest from scRNA-seq data.
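
The idea of Poisson sampling noise acting on a Gaussian-mixture expression distribution can be sketched as follows. This is not baredSC's code or its parametrisation: the expression scale, the sampling depth, the mixture parameters, and every function name are assumptions for illustration only. The sketch computes the probability of observing `k` counts by integrating the Poisson probability over a mixture density restricted to `[0, x_max]`.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

def poisson_pmf(k, lam):
    """Poisson probability of k events with rate lam."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def count_probability(k, depth, weights, means, stds, x_max=1.0, n_grid=2000):
    """P(k) = integral of Poisson(k; depth * x) * p(x) dx over [0, x_max].

    Uses the midpoint rule; p(x) is the mixture density renormalised
    on [0, x_max] so that expression cannot be negative.
    """
    dx = x_max / n_grid
    xs = [(i + 0.5) * dx for i in range(n_grid)]
    dens = [mixture_pdf(x, weights, means, stds) for x in xs]
    norm = sum(dens) * dx
    total = sum(poisson_pmf(k, depth * x) * d for x, d in zip(xs, dens))
    return total * dx / norm

# Hypothetical bimodal expression distribution and sampling depth.
weights = [0.6, 0.4]
means = [0.05, 0.5]
stds = [0.03, 0.1]
depth = 20

p_zero = count_probability(0, depth, weights, means, stds)
total_probability = sum(count_probability(k, depth, weights, means, stds)
                        for k in range(81))
print(f"P(count = 0) = {p_zero:.3f}, sum over k = {total_probability:.6f}")
```

A sizeable probability of zero counts arises here even though expression is almost never zero, which is the sparsity problem the abstract describes; an inference tool would search for the mixture parameters that make the observed counts most probable.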

2022
Francesca Azzolini ◽ Geir Berentsen ◽ Hans Skaug ◽ Jacob Hjelmborg ◽ Jaakko Kaprio

The heritability of traits such as body mass index (BMI), a measure of obesity, is generally estimated using family and twin studies and, increasingly, molecular genetic approaches. These studies generally assume that genetic effects are uniform across all trait values, yet there is emerging evidence that this may not always be the case. This paper analyzes twin data using a recently developed measure of heritability called the heritability curve. Under the assumption that trait values in twin pairs are governed by a flexible Gaussian mixture distribution, heritability curves may vary across trait values. The data consist of repeated measures of BMI on 1506 monozygotic (MZ) and 2843 like-sexed dizygotic (DZ) adult twin pairs, gathered from multiple surveys in older Finnish Twin Cohorts. The heritability curve and BMI value-specific MZ and DZ pairwise correlations were estimated, and these varied across the range of BMI. MZ correlations were highest at BMI values from 21 to 24, with a stronger decrease for women than for men at higher values. Models with additive and dominance effects fit best at low and high BMI values, while models with additive genetic and common environmental effects fit best in the normal range of BMI. Thus, we demonstrate that twin and molecular genetic studies need to consider how genetic effects vary across trait values. Such variation may reconcile findings of traits with high heritabilities and major differences in mean values between countries or over time.

Ntogwa N. Bundala

This paper examined the hidden demographic barriers to economic growth. The study used a cross-sectional survey research design. The primary data were collected using a psychometric scale from 211 individuals who were randomly sampled from the Mwanza and Kagera regions in Tanzania. The data were analysed linearly by weighted least squares (WLS) and analysis-weighted automatic linear modelling (AW-ALM), and non-linearly by a Gaussian mixture model (GMM) and neural network analysis (NNA). The study found that the main hidden demographic barrier to economic growth is an individual's negative subjective well-being regarding their current age and education level. Moreover, the GMM revealed no significant data or regional clusters or classes in the study population. Furthermore, the NNA indicated that the most effective predictor of economic growth is age, followed by education. The study concluded that the most hidden demographic factors hindering economic growth are an individual's negative perceptions of his/her current age and level of education, not age maturity and education level themselves. Operationally, the paper has implications for several socio-economic policies, chiefly the National Aging Policy (NAP), the National Education and Training Policy (NETP), the National Employment Policy (NEP), and regulations/laws on national social security fund schemes at national, regional, and global levels. Therefore, the paper recommended that the government and other education stakeholders strengthen their policy commitment to making mathematics, science, and technology subjects compulsory in primary and secondary schools, and that the retirement age be extended from 60 years (voluntary) to 65 years (compulsory).

Stephen Burns Menary ◽ Darren David Price

We show that density models describing multiple observables with (i) hard boundaries and (ii) dependence on external parameters may be created using an auto-regressive Gaussian mixture model. The model is designed to capture how observable spectra are deformed by hypothesis variations, and is made more expressive by projecting data onto a configurable latent space. It may be used as a statistical model for scientific discovery in interpreting experimental observations, for example when constraining the parameters of a physical model or tuning simulation parameters according to calibration data. The model may also be sampled for use within a Monte Carlo simulation chain, or used to estimate likelihood ratios for event classification. The method is demonstrated on simulated high-energy particle physics data considering the anomalous electroweak production of a $Z$ boson in association with a dijet system at the Large Hadron Collider, and the accuracy of inference is tested using a realistic toy example. The developed methods are domain agnostic; they may be used within any field to perform simulation or inference where a dataset consisting of many real-valued observables has conditional dependence on external parameters.
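
For orientation, the generic form of an auto-regressive Gaussian mixture density is written out below. The symbols are assumptions introduced here ($x_1, \dots, x_n$ for the observables, $\theta$ for the external parameters, $K$ for the number of components), and the abstract does not state the exact parametrisation used in the paper, so this should be read as the textbook factorisation rather than the authors' model.

```latex
p(x_1, \dots, x_n \mid \theta)
  = \prod_{i=1}^{n} p(x_i \mid x_{<i}, \theta),
\qquad
p(x_i \mid x_{<i}, \theta)
  = \sum_{k=1}^{K} w_{ik}\,
    \mathcal{N}\!\left(x_i;\ \mu_{ik},\ \sigma_{ik}^{2}\right),
\qquad
\sum_{k=1}^{K} w_{ik} = 1 .
```

Here the weights $w_{ik}$, means $\mu_{ik}$, and widths $\sigma_{ik}$ are functions of the preceding observables $x_{<i}$ and of $\theta$, which is what lets the density of each observable depend on the others and on the external parameters.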

Sensors ◽ 2022 ◽ Vol 22 (2) ◽ pp. 525
Ran Duan ◽ Jie Liu ◽ Jianzhong Zhou ◽ Pei Wang ◽ Wei Liu

Prognostics, which consists of performance state evaluation and degradation trend prediction, is key to the state-based maintenance of Francis turbine units (FTUs). In practical engineering environments, there are three significant difficulties: low data quality, complex variable operation conditions, and prediction model parameter optimization. To address these three problems, this study proposes an ensemble prognostic method for FTUs that uses low-quality data under variable operation conditions. Firstly, to account for the operation condition parameters, the running data set of the FTU is constructed from the water head, active power, and vibration amplitude of the top cover. Then, to improve the robustness of the proposed model against anomalous data, density-based spatial clustering of applications with noise (DBSCAN) is introduced to clean outliers and singularities in the raw running data set. Next, considering the randomness of the monitoring data, a healthy state model based on the Gaussian mixture model is constructed, and the negative log-likelihood probability is calculated as the performance degradation indicator (PDI). Furthermore, to predict the trend of PDIs with a confidence interval and automatically optimize the prediction model for both accuracy and certainty, a multiobjective prediction model is proposed based on the non-dominated sorting genetic algorithm and Gaussian process regression. Finally, monitoring data from an actual large FTU were used for effectiveness verification. The stability and smoothness of the PDI curve are improved by 3.2 times and 1.9 times, respectively, by DBSCAN compared with 3-sigma. The root-mean-squared error, the prediction interval normalized average, the prediction interval coverage probability, the mean absolute percentage error, and the R² score of the proposed method achieved 0.223, 0.289, 1.000, 0.641%, and 0.974, respectively. The comparison experiments demonstrate that the proposed method is more robust to low-quality data and has better accuracy, certainty, and reliability for the prognostics of the FTU under complex operating conditions.
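
The use of a negative log-likelihood as a degradation indicator can be sketched in a few lines. The abstract describes a model built on water head, active power, and vibration amplitude; the sketch below is deliberately reduced to one dimension with fixed, hypothetical mixture parameters and hypothetical function names, so it shows the indicator itself rather than the authors' model.

```python
import math

def log_gaussian(x, mean, std):
    """Log-density of a normal distribution at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))

def negative_log_likelihood(x, weights, means, stds):
    """Negative log-likelihood of x under a 1-D Gaussian mixture.

    Computed with the log-sum-exp trick so that readings far from
    every component do not underflow to log(0).
    """
    logs = [math.log(w) + log_gaussian(x, m, s)
            for w, m, s in zip(weights, means, stds)]
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs)))

# Hypothetical healthy-state model of a vibration amplitude
# (arbitrary units) with two operating regimes.
weights = [0.6, 0.4]
means = [10.0, 14.0]
stds = [1.0, 1.5]

pdi_healthy = negative_log_likelihood(10.5, weights, means, stds)
pdi_degraded = negative_log_likelihood(25.0, weights, means, stds)
print(f"PDI healthy: {pdi_healthy:.2f}, PDI degraded: {pdi_degraded:.2f}")
```

Readings that the healthy-state model considers unlikely receive a larger indicator value, so a rising trend in this quantity over time is what a downstream trend-prediction model would track.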

2022 ◽ Vol 12
Zhuang Xiong ◽ Mengwei Li ◽ Yingke Ma ◽ Rujiao Li ◽ Yiming Bao

The Illumina HumanMethylation BeadChip is one of the most cost-effective methods to quantify DNA methylation levels at single-base resolution across the human genome, which makes it a routine platform for epigenome-wide association studies. Tens of thousands of DNA methylation array samples have accumulated in public databases, providing great support for data integration and further analysis. However, the majority of public DNA methylation data are deposited as processed data without the background probes that are widely used in data normalization. Here, we present Gaussian mixture quantile normalization (GMQN), a reference-based method for correcting batch effects as well as probe bias in the HumanMethylation BeadChip.
