Training Image Free High-Order Stochastic Simulation Based on Aggregated Kernel Statistics

Author(s):  
Lingqing Yao ◽  
Roussos Dimitrakopoulos ◽  
Michel Gamache

Abstract A training-image-free, high-order sequential simulation method is proposed herein, based on the efficient inference of high-order spatial statistics from the available sample data. A statistical learning framework in kernel space is adopted to develop the proposed simulation method. Specifically, a new concept of aggregated kernel statistics is proposed to enable sparse data learning. The conditioning data in the proposed high-order sequential simulation method appear as data events corresponding to the attribute values associated with so-called spatial templates of various geometric configurations. The replicates of the data events act as the training data in the learning framework for inference of the conditional probability distribution and generation of simulated values. These replicates are mapped into spatial Legendre moment kernel spaces, and the kernel statistics are computed thereafter, encapsulating the high-order spatial statistics of the available data. To utilize the incomplete information from replicates that only partially match the spatial template of a given data event, the aggregated kernel statistics combine the ensemble of elements in different kernel subspaces for statistical inference, embedding the high-order spatial statistics of replicates associated with various spatial templates into the same kernel subspace. The aggregated kernel statistics are incorporated into a learning algorithm to obtain the target probability distribution in the underlying random field, while preserving in the simulations the high-order spatial statistics of the available data. The proposed method is tested on a synthetic dataset and shown to reproduce the high-order spatial statistics of the sample data. A comparison with the corresponding high-order simulation method using training images (TIs) highlights the generalization capacity of the proposed method for sparse data learning.
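As a rough illustration of the aggregation idea, the following Python sketch maps replicates of a data event into a tensor-product spatial Legendre moment feature space and averages them into a single kernel statistic. The treatment of missing template nodes (embedding them with the zeroth-order basis so that partial replicates land in the same kernel subspace as complete ones) is an assumption made for the sketch, not the paper's exact aggregation rule; attribute values are assumed rescaled to [-1, 1].

```python
import numpy as np
from scipy.special import eval_legendre

def node_basis(value, max_order):
    """Legendre basis (P_0, ..., P_w) at one template node; value in [-1, 1]."""
    return np.array([eval_legendre(k, value) for k in range(max_order + 1)])

def event_feature(values, mask, max_order):
    """Tensor-product SLM feature map of one replicate. `values` lists only
    the informed nodes, in template order; nodes the replicate does not
    inform (mask == False) are embedded with the constant basis
    e0 = (1, 0, ..., 0), so partial replicates land in the same kernel
    subspace as complete ones (assumed aggregation rule)."""
    e0 = np.eye(max_order + 1)[0]
    vals = iter(values)
    feat = np.ones(1)
    for present in mask:
        b = node_basis(next(vals), max_order) if present else e0
        feat = np.outer(feat, b).ravel()
    return feat

def aggregated_statistic(replicates, masks, max_order):
    """Aggregated kernel statistic: the mean feature map over all
    replicates, complete and partial alike."""
    return np.mean([event_feature(v, m, max_order)
                    for v, m in zip(replicates, masks)], axis=0)
```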

2019 ◽  
Vol 52 (5) ◽  
pp. 693-723
Author(s):  
Lingqing Yao ◽  
Roussos Dimitrakopoulos ◽  
Michel Gamache

Abstract The present work proposes a new high-order simulation framework based on statistical learning. The training data consist of the sample data together with a training image, and the learning target is the underlying random field model of spatial attributes of interest. The learning process attempts to find a model with expected high-order spatial statistics that coincide with those observed in the available data, while the learning problem is approached within the statistical learning framework in a reproducing kernel Hilbert space (RKHS). More specifically, the required RKHS is constructed via a spatial Legendre moment (SLM) reproducing kernel that systematically incorporates the high-order spatial statistics. The target distributions of the random field are mapped into the SLM-RKHS to start the learning process, where solutions of the random field model amount to solving a quadratic programming problem. Case studies with a known data set in different initial settings show that sequential simulation under the new framework reproduces the high-order spatial statistics of the available data and resolves the potential conflicts between the training image and the sample data. This is due to the characteristics of the spatial Legendre moment kernel and the generalization capability of the proposed statistical learning framework. A three-dimensional case study at a gold deposit shows practical aspects of the proposed method in real-life applications.
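The learning step can be caricatured as a small quadratic program. The sketch below is a toy univariate stand-in for the paper's SLM-RKHS formulation: it searches for a discrete conditional distribution over candidate values whose expected Legendre moments match a target statistic. The candidate grid, moment order, and solver choice are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import eval_legendre

def legendre_features(candidates, max_order):
    """Column j holds (P_0, ..., P_w) evaluated at candidates[j] in [-1, 1]."""
    return np.array([[eval_legendre(k, v) for v in candidates]
                     for k in range(max_order + 1)])

def fit_conditional_pmf(candidates, target_moments, max_order):
    """Toy learning step: find a probability vector w over candidate values
    whose expected Legendre moments match the target statistics, posed as
        min_w ||Phi w - mu||^2   s.t.  w >= 0, sum(w) = 1."""
    phi = legendre_features(candidates, max_order)
    obj = lambda w: np.sum((phi @ w - target_moments) ** 2)
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    w0 = np.full(len(candidates), 1.0 / len(candidates))
    res = minimize(obj, w0, method='SLSQP',
                   bounds=[(0.0, 1.0)] * len(candidates), constraints=cons)
    return res.x  # discrete conditional distribution over candidates
```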


Author(s):  
Ilnur Minniakhmetov ◽  
Roussos Dimitrakopoulos

Abstract Modern approaches for the spatial simulation of categorical variables are largely based on multi-point statistical methods, where a training image is used to derive complex spatial relationships using relevant patterns. In these approaches, simulated realizations are driven by the training image utilized, while the spatial statistics of the actual sample data are ignored. This paper presents a data-driven, high-order simulation approach based on the approximation of high-order spatial indicator moments. The high-order spatial statistics are expressed as functions of spatial distances, similar to variogram models for two-point methods, while higher-order statistics are connected with lower orders via boundary conditions. Using an advanced recursive B-spline approximation algorithm, the high-order statistics are reconstructed from the available data and are subsequently used for the construction of conditional distributions using Bayes’ rule. Random values are then simulated at all unsampled grid nodes. The main advantages of the proposed technique are its ability to (a) simulate without a training image while reproducing the high-order statistics of the data, and (b) adapt the model’s complexity to the information available in the data. The practical intricacies and effectiveness of the proposed approach are demonstrated through applications at two copper deposits.
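A minimal sketch of two of the ingredients named above, under strong simplifying assumptions: second-order (two-point) indicator moments are smoothed over lag distance with a B-spline, and a conditional distribution is then assembled with Bayes' rule under an assumed conditional-independence approximation between conditioning nodes. The recursive approximation, boundary conditions, and genuinely high-order moments of the actual method are not reproduced.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def fit_indicator_moment(lags, moments, k=3):
    """Smooth experimental spatial indicator moments (here second-order,
    i.e. a function of lag distance only) with a B-spline; a stand-in for
    the paper's recursive B-spline approximation."""
    return UnivariateSpline(lags, moments, k=k)

def conditional_pmf(categories, neighbor_cats, lag_dists, moment_models, marginals):
    """Conditional distribution of the category at an unsampled node via
    Bayes' rule, under an assumed conditional-independence approximation:
    p(c | neighbors) is proportional to
        p(c) * prod_i m[c, z_i](h_i) / (p(c) * p(z_i)),
    where m[c, z](h) is the fitted two-point indicator moment at lag h."""
    raw = []
    for c in categories:
        p = marginals[c]
        for z_i, h_i in zip(neighbor_cats, lag_dists):
            m = float(moment_models[(c, z_i)](h_i))
            p *= m / (marginals[c] * marginals[z_i])
        raw.append(max(p, 0.0))  # guard against spline undershoot
    raw = np.array(raw)
    return raw / raw.sum()
```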


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Justin Y. Lee ◽  
Britney Nguyen ◽  
Carlos Orosco ◽  
Mark P. Styczynski

Abstract
Background: The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms—two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, there have been few approaches developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR.
Results: We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentrations of two metabolites ranged from 32 to 88% for noiseless data, from 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and from 6.6 to 27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification.
Conclusions: SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.
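A hedged sketch of what one stage of such a stepwise pipeline could look like, using an off-the-shelf classifier from scikit-learn; the choice of random forest, its settings, and the feature layout are assumptions, not details from the paper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

def run_stage(X_train, y_train, X_test, y_test):
    """One stage of a SCOUR-like stepwise pipeline: train on synthetically
    generated examples of a given interaction class (e.g. fluxes controlled
    by a single metabolite) and report the positive predictive value,
    i.e. the precision quoted in the abstract."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)
    return clf, precision_score(y_test, clf.predict(X_test))

# Stepwise use: candidates confidently resolved at the single-regulator
# stage are removed from the pool before the two- and three-regulator
# stages are run on the remainder.
```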


Author(s):  
Lekha Patel ◽  
David Williamson ◽  
Dylan M Owen ◽  
Edward A K Cohen

Abstract
Motivation: Many recent advancements in single-molecule localization microscopy exploit the stochastic photoswitching of fluorophores to reveal complex cellular structures beyond the classical diffraction limit. However, this same stochasticity makes counting the number of molecules to high precision extremely challenging, preventing key insight into the cellular structures and processes under observation.
Results: Modelling the photoswitching behaviour of a fluorophore as an unobserved continuous-time Markov process transitioning between a single fluorescent state and multiple dark states, and fully mitigating for missed blinks and false positives, we present a method for computing the exact probability distribution of the number of observed localizations from a single photoswitching fluorophore. This is then extended to provide the probability distribution of the number of localizations in a direct stochastic optical reconstruction microscopy experiment involving an arbitrary number of molecules. We demonstrate that, when training data are available to estimate photoswitching rates, the unknown number of molecules can be accurately recovered from the posterior mode of the number of molecules given the number of localizations. Finally, we demonstrate the method on experimental data by quantifying the number of molecules of the adapter protein linker for activation of T cells (LAT) on the cell surface of the T-cell immunological synapse.
Availability and implementation: Software and data are available at https://github.com/lp1611/mol_count_dstorm.
Supplementary information: Supplementary data are available at Bioinformatics online.
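The counting step can be illustrated with a short sketch: given the probability mass function of localizations from a single fluorophore (which the paper derives exactly from the Markov photoswitching model; here it is simply supplied as an array), the distribution for n molecules is the n-fold convolution, and the molecule count is recovered as the posterior mode under an assumed uniform prior.

```python
import numpy as np

def pmf_for_n_molecules(p_single, n):
    """PMF of the total localization count from n independent fluorophores:
    the n-fold convolution of the single-molecule PMF, which is passed in
    as an array indexed by localization count."""
    pmf = np.array([1.0])
    for _ in range(n):
        pmf = np.convolve(pmf, p_single)
    return pmf

def posterior_mode_count(p_single, n_localizations, n_max):
    """Posterior mode of the molecule count given the observed number of
    localizations, under an assumed uniform prior over 1..n_max."""
    likelihood = []
    for n in range(1, n_max + 1):
        pmf = pmf_for_n_molecules(p_single, n)
        likelihood.append(pmf[n_localizations]
                          if n_localizations < len(pmf) else 0.0)
    posterior = np.array(likelihood) / np.sum(likelihood)  # uniform prior
    return int(np.argmax(posterior)) + 1
```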


2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and it uses features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based on a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
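The clustering step that creates label-specific features can be sketched in a LIFT-like form: cluster each label's positive and negative instances separately and represent every instance by its distances to the cluster centres. This covers only the first step described above; the correlation-based weighting and margin optimization of MULFE are not reproduced, and the cluster count is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def label_specific_features(X, y_label, n_clusters=5, random_state=0):
    """Build features customized to one label: cluster the label's positive
    and negative instances separately, then represent every instance by its
    distances to all cluster centres."""
    pos, neg = X[y_label == 1], X[y_label == 0]
    c_pos = KMeans(n_clusters=min(n_clusters, len(pos)), n_init=10,
                   random_state=random_state).fit(pos).cluster_centers_
    c_neg = KMeans(n_clusters=min(n_clusters, len(neg)), n_init=10,
                   random_state=random_state).fit(neg).cluster_centers_
    centres = np.vstack([c_pos, c_neg])
    # distance of every instance to every centre -> label-specific space
    return np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
```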


Author(s):  
Peilian Zhao ◽  
Cunli Mao ◽  
Zhengtao Yu

Aspect-Based Sentiment Analysis (ABSA), a fine-grained opinion-mining task that aims to extract the sentiment toward a specific target from text, is important in many real-world applications, especially in the legal field. In this paper, we therefore study two problems facing End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field: the limited labeled training data available and the lack of in-domain knowledge representation. We propose a new method under a deep learning framework, named Semi-ETEKGs, which applies an E2E framework using knowledge graph (KG) embeddings in the legal field after data augmentation (DA). Specifically, we pre-train BERT embeddings and in-domain KG embeddings on unlabeled data and on labeled data with case elements after DA, and then feed the two embeddings into the E2E framework to classify the polarity of the target entity. Finally, we build a case-related dataset based on a popular ABSA benchmark to evaluate the efficiency of Semi-ETEKGs, and experiments on this dataset of microblog comments show that our proposed model significantly outperforms the compared methods.
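A minimal sketch of the embedding-concatenation idea in PyTorch: token-level BERT embeddings and in-domain KG embeddings are concatenated and fed to a sequence tagger that predicts joint aspect-boundary/polarity tags. The dimensions, tag inventory, and architecture details are illustrative assumptions, not the paper's Semi-ETEKGs implementation.

```python
import torch
import torch.nn as nn

class E2EABSATagger(nn.Module):
    """Stand-in for the embedding-concatenation step: per-token BERT
    embeddings are concatenated with KG embeddings of matched case
    elements, then a tagger predicts joint tags (e.g. B-POS, I-NEG, O)."""
    def __init__(self, bert_dim=768, kg_dim=100, n_tags=7):
        super().__init__()
        self.proj = nn.Linear(bert_dim + kg_dim, 256)
        self.tagger = nn.Linear(256, n_tags)

    def forward(self, bert_emb, kg_emb):
        # bert_emb: (batch, seq, bert_dim); kg_emb: (batch, seq, kg_dim)
        h = torch.tanh(self.proj(torch.cat([bert_emb, kg_emb], dim=-1)))
        return self.tagger(h)  # per-token tag logits
```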


1988 ◽  
Vol 4 (03) ◽  
pp. 155-168
Author(s):  
R.L. Storch ◽  
P.J. Giesy

In the modular construction of ships, significant productivity losses can occur during the erection stage, when the modules, or hull blocks, are joined together. Frequently, adjacent blocks do not fit together properly, and rework of one or both of the mating block interfaces is necessary to correct the problem. The specific cause of rework is the variation of plate edges at the block interface, which is itself a cumulative product of numerous manufacturing variations inherent in hull block construction. Variation in manufacturing is unavoidable, but not uncontrollable. The application of accuracy control techniques in shipbuilding has proven that a statistical analysis of variation makes possible an accurate prediction of its effects. This paper presents an examination of block interface variation and the subsequent development of a computer simulation method for predicting rework levels on those blocks. The complex interaction of all the edges' random variations at the block interface gives rise to a unique rework probability distribution. This probability distribution is evaluated by means of the computer simulation program, which provides estimates of the average rework anticipated, the shape of the probability curve, and other parameters. Similar predictions are also available for the cost and labor of required rework. In addition to predicting rework levels, the simulation program can be a useful tool for reducing those levels.
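A compact Monte Carlo sketch of the idea: each plate edge at the block interface carries a random deviation, mismatches beyond a fit-up tolerance require rework, and repeated trials yield the rework probability distribution. The normal deviation model, the independence of edges, and the single tolerance value are assumptions made for illustration.

```python
import numpy as np

def simulate_rework(n_edges, sigma, tolerance, n_trials=100_000, seed=0):
    """Monte Carlo estimate of rework at a block joint: each plate edge on
    each mating block carries an independent normal deviation with standard
    deviation `sigma` (an assumed variation model); an edge pair whose
    mismatch exceeds the fit-up `tolerance` requires rework. Returns the
    expected number of reworked edges and the rework-count distribution."""
    rng = np.random.default_rng(seed)
    d1 = rng.normal(0.0, sigma, size=(n_trials, n_edges))
    d2 = rng.normal(0.0, sigma, size=(n_trials, n_edges))
    counts = (np.abs(d1 - d2) > tolerance).sum(axis=1)
    return counts.mean(), np.bincount(counts, minlength=n_edges + 1) / n_trials
```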


Author(s):  
Hongzuo Xu ◽  
Yongjun Wang ◽  
Zhiyue Wu ◽  
Yijie Wang

Non-IID categorical data are ubiquitous and common in real-world applications. Learning various kinds of couplings has proven to be a reliable measure when detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally focus only on pairwise primary value couplings and fail to uncover the real relations that hide in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework, EMAC, and its instance SCAN to address these issues. SCAN first models primary value couplings. Then, a coupling bias is defined to capture complex value couplings at different granularities and highlight the essence of outliers. An embedding method is performed on the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds these couplings into a value representation matrix. Bidirectional selective value coupling learning is proposed to show how to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.
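The first-order (primary) value couplings that SCAN starts from can be sketched as conditional co-occurrence frequencies between values of different categorical features, as below; the coupling bias, value network, and embedding stages build on such statistics and are not reproduced here.

```python
import numpy as np
import pandas as pd

def primary_value_couplings(df):
    """Pairwise (primary) value couplings for a categorical DataFrame:
    M[i, j] = P(feature cj takes value vj | feature ci takes value vi)
    for every pair of values from different features."""
    values = [(col, v) for col in df.columns for v in df[col].unique()]
    index = {val: i for i, val in enumerate(values)}
    M = np.zeros((len(values), len(values)))
    for (ci, vi) in values:
        mask = df[ci] == vi
        for (cj, vj) in values:
            if ci != cj:
                M[index[(ci, vi)], index[(cj, vj)]] = (df.loc[mask, cj] == vj).mean()
    return M, index
```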


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Lihong Huang ◽  
Canqiang Xu ◽  
Wenxian Yang ◽  
Rongshan Yu

Abstract
Background: Studies on metagenomic data of environmental microbial samples have found that microbial communities seem to be geolocation-specific and that the microbiome abundance profile can be a differentiating feature for identifying samples’ geolocations. In this paper, we present a machine learning framework to determine geolocations from the metagenomic profiling of microbial samples.
Results: Our method was applied to the multi-source microbiome data from the MetaSUB (The Metagenomics and Metadesign of Subways and Urban Biomes) International Consortium for the CAMDA 2019 Metagenomic Forensics Challenge (the Challenge). The goal of the Challenge is to predict the geographical origins of mystery samples by constructing microbiome fingerprints. First, we extracted features from metagenomic abundance profiles. We then randomly split the training data into training and validation sets and trained the prediction models on the training set; prediction performance was evaluated on the validation set. Using logistic regression with L2 regularization, the prediction accuracy of the model reaches 86%, averaged over 100 random splits of the training and validation datasets. The testing data consist of samples from cities that do not occur in the training data. To predict these “mystery” cities, which were not sampled before, we first defined biological coordinates for the sampled cities based on the similarity of microbial samples from them. We then performed an affine transform on the map so that the distance between cities measures their biological difference rather than their geographical distance. After that, we derived the probabilities that a given testing sample comes from the unsampled cities, based on its predicted probabilities for the sampled cities, using Kriging interpolation. Results show that this method can successfully assign high probabilities to the true cities of origin of the testing samples.
Conclusion: Our framework shows good performance in predicting the geographic origin of metagenomic samples for cities where training data are available. Furthermore, we demonstrate the potential of the proposed method to predict the geolocations of metagenomic samples from locations that are not in the training dataset.
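The evaluation protocol for the sampled cities is straightforward to sketch with scikit-learn: L2-regularized logistic regression, with accuracy averaged over repeated random train/validation splits (the split ratio used here is an assumption).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def averaged_accuracy(X, y, n_splits=100, test_size=0.2):
    """Accuracy of L2-regularized logistic regression, averaged over
    repeated random train/validation splits, mirroring the protocol
    described in the abstract."""
    scores = []
    for seed in range(n_splits):
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        clf = LogisticRegression(penalty='l2', max_iter=1000).fit(X_tr, y_tr)
        scores.append(clf.score(X_va, y_va))
    return float(np.mean(scores))
```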


2020 ◽  
Vol 8 ◽  
Author(s):  
Adil Khadidos ◽  
Alaa O. Khadidos ◽  
Srihari Kannan ◽  
Yuvaraj Natarajan ◽  
Sachi Nandan Mohanty ◽  
...  

In this paper, a data mining model on a hybrid deep learning framework is designed to diagnose the medical conditions of patients infected with the coronavirus disease 2019 (COVID-19) virus. The hybrid deep learning model, named the DeepSense method, combines a convolutional neural network (CNN) and a recurrent neural network (RNN). It is designed as a series of layers that extract and classify features of COVID-19 infection from the lungs. A computerized tomography image is used as the input, and the classifier is designed to ease the classification process by learning the multidimensional input data using Expert Hidden layers. The model is validated against medical image datasets to predict infections using deep learning classifiers. The results show that the DeepSense classifier offers better accuracy than conventional deep and machine learning classifiers. The proposed method is validated against three different datasets, with 70%, 80%, and 90% of the data used for training, and it specifically indicates the quality of the diagnostic method adopted for predicting COVID-19 infection in a patient.
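A hedged sketch of a CNN + RNN hybrid in the spirit of the description: convolutional layers extract features from each CT slice, a GRU aggregates them across the slice sequence, and a linear head outputs the diagnosis. Layer sizes and the overall layout are assumptions, not the DeepSense architecture itself.

```python
import torch
import torch.nn as nn

class HybridCnnRnn(nn.Module):
    """Illustrative CNN + RNN hybrid for sequences of CT slices."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))
        self.rnn = nn.GRU(32 * 4 * 4, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        # x: (batch, seq_len, 1, H, W) -- a sequence of CT slices
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).flatten(1).view(b, t, -1)
        _, h = self.rnn(feats)      # final hidden state summarizes the scan
        return self.head(h[-1])     # per-scan diagnosis logits
```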

