Enhanced cancer subtyping via pan-transcriptomics data fusion, Monte-Carlo consensus clustering, and auto classifier creation

Author(s):  
Kristofer Linton-Reid ◽  
Harry Clifford ◽  
Joe Sneath Thompson

ABSTRACT: Subtyping of tumor transcriptome expression profiles is a routine method for characterizing tumor heterogeneity. Unsupervised clustering techniques are often combined with survival analysis to decipher the relationship between genes and the survival times of patients. However, the reproducibility of these subtyping-based studies is poor: multiple reports present conflicting subtypes and conflicting gene-survival time relationships. In this study, we describe the issues underlying the lack of reproducibility in transcriptomic subtyping studies. These issues arise from the routine analysis of small cohorts (< 100 individuals) and the use of biased traditional consensus clustering techniques. Our approach carefully combines multiple RNA-sequencing and microarray datasets, followed by subtyping via Monte-Carlo Consensus Clustering and the creation of deep subtyping classifiers. This paper demonstrates the improved subtyping methodology on pancreatic ductal adenocarcinoma. Importantly, our methodology identifies six biologically novel pancreatic ductal adenocarcinoma subtypes. Our approach also enables a degree of reproducibility, via our pancreatic ductal adenocarcinoma classifier PDACNet, that classical subtyping studies have failed to establish.
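The abstract does not say how the RNA-sequencing and microarray datasets are combined. One common way to make expression values from different platforms comparable, shown here only as a hedged sketch, is to keep the genes shared by all datasets and z-score each gene within each dataset before concatenating the samples. The helper names `zscore_rows` and `fuse_datasets` and the toy gene values are hypothetical, and this is not necessarily the procedure used to build PDACNet.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardise each gene (row) to mean 0 and SD 1 within one dataset.

    Genes with zero variance are mapped to all zeros.
    """
    scaled = {}
    for gene, values in matrix.items():
        mu = mean(values)
        sd = pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def fuse_datasets(*datasets):
    """Fuse gene -> sample-values mappings from several platforms (hypothetical helper).

    Only genes present in every dataset are kept; each dataset is z-scored
    separately, then the sample columns are concatenated in argument order.
    """
    shared = set.intersection(*(set(d) for d in datasets))
    scaled = [zscore_rows({g: d[g] for g in shared}) for d in datasets]
    return {g: [v for s in scaled for v in s[g]] for g in sorted(shared)}


# Toy example: one RNA-seq-like and one microarray-like dataset (made-up values).
rnaseq = {"KRAS": [10.0, 12.0, 14.0], "TP53": [5.0, 5.0, 5.0], "MYC": [1.0, 2.0, 3.0]}
array = {"KRAS": [0.2, 0.4], "TP53": [1.0, 3.0]}
fused = fuse_datasets(rnaseq, array)
print(sorted(fused))  # ['KRAS', 'TP53']: MYC is dropped because the array lacks it
```

Per-dataset scaling removes platform-specific offsets and scales, at the cost of also removing any genuine between-cohort differences in mean expression.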


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1427
Author(s):  
Zhihui Cai ◽  
Zhangmin Jin ◽  
Linyi Zhu ◽  
Yuebing Li ◽  
Yuebao Lei ◽  
...  

Ultrasonic testing is a useful approach for quantifying flaws in mechanical components. In ultrasonic angle beam testing, the measured flaw height is closely related to the calibrated value of the probe refraction angle. To reduce the calibration error, data that are ignored in the traditional calibration process are reanalyzed and fused to determine the refraction angle. An arithmetic measurement fusion method and a weighted measurement fusion method are applied and compared. Monte Carlo simulation is used to estimate the probability distribution of the refraction angle and to obtain the optimal refraction angle weights. Experiments were carried out to verify the results of the Monte Carlo simulation. The applicability of data fusion to refraction angles is investigated. The study found that the refraction angle obtained by data fusion is helpful for measuring the height of flaws.
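The abstract compares arithmetic and weighted fusion and uses Monte Carlo simulation to obtain the weights, but does not state the weighting rule. The sketch below assumes that each refraction-angle estimate has independent Gaussian error with a known standard deviation and uses inverse-variance weights estimated from simulated readings. The 45° angle, the noise levels, and all function names are illustrative assumptions, not the authors' values or method.

```python
import random
from statistics import mean, pvariance


def simulate_readings(true_angle, noise_sds, trials, rng):
    """Monte Carlo: draw `trials` noisy refraction-angle readings per source.

    Assumption: each source has independent Gaussian noise with the given SD.
    """
    return [[rng.gauss(true_angle, sd) for _ in range(trials)] for sd in noise_sds]


def inverse_variance_weights(samples):
    """Weight each source by 1 / (simulated variance), normalised to sum to 1."""
    inv = [1.0 / pvariance(s) for s in samples]
    total = sum(inv)
    return [w / total for w in inv]


def fuse(readings, weights=None):
    """Arithmetic fusion when `weights` is None, weighted fusion otherwise."""
    if weights is None:
        return mean(readings)
    return sum(w * r for w, r in zip(weights, readings))


def mean_squared_error(true_angle, noise_sds, weights, trials, rng):
    """Monte Carlo estimate of the fused angle's mean squared error."""
    errors = []
    for _ in range(trials):
        readings = [rng.gauss(true_angle, sd) for sd in noise_sds]
        errors.append((fuse(readings, weights) - true_angle) ** 2)
    return mean(errors)


rng = random.Random(0)
true_angle = 45.0            # degrees, illustrative
noise_sds = [0.2, 0.5, 1.0]  # hypothetical per-source noise levels
weights = inverse_variance_weights(simulate_readings(true_angle, noise_sds, 5000, rng))
mse_arith = mean_squared_error(true_angle, noise_sds, None, 5000, rng)
mse_weighted = mean_squared_error(true_angle, noise_sds, weights, 5000, rng)
print(weights, mse_arith, mse_weighted)
```

Under these assumptions the weights come out roughly proportional to 1/0.2², 1/0.5² and 1/1² (about 0.83, 0.13 and 0.03), and the weighted estimate's mean squared error is well below that of the arithmetic mean.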


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Christopher R. John ◽  
David Watson ◽  
Dominic Russ ◽  
Katriona Goldmann ◽  
Michael Ehrenstein ◽  
...  

2019 ◽  
Vol 8 (4) ◽  
pp. 2751-2756

Clustering involves grouping similar objects into a set known as a cluster. Objects in one cluster are likely to differ from objects grouped under another cluster. Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product. Subgroup classification is a basic task in high-throughput genomic data analysis, especially for gene expression and methylation data. Unsupervised clustering methods are mostly applied to predict new subgroups or to test consistency with known annotations. To obtain a stable classification of subgroups, consensus clustering is typically performed: it clusters repeatedly on randomly sampled subsets of the data and summarizes the robustness of the resulting clusters. When there is significant uncertainty in making a forecast or estimate, Monte Carlo simulation may be a better solution. Monte Carlo reference-based consensus clustering (M3C) is a consensus clustering algorithm that uses Monte Carlo simulation to eliminate overfitting and can test, and reject, the null hypothesis that only one cluster is present.
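The resampling idea behind consensus clustering can be sketched in plain Python. Assumptions: k-means is the base clusterer, each resample draws 80% of the samples without replacement, and the consensus value for a pair of samples is the fraction of resamples containing both in which they land in the same cluster. The helper names and toy points are hypothetical, and the Monte Carlo reference step that distinguishes M3C is not included here.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on equal-length tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=50, fraction=0.8, seed=0):
    """Entry (i, j): share of resamples containing i and j that co-cluster them."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(fraction * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Two well-separated toy groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
m = consensus_matrix(points, k=2)
print(m[0][1], m[0][3])  # same group always co-clusters; different groups never do
```

Entries near 0 or 1 indicate a stable partition; many intermediate values indicate that the chosen K is not well supported by the data.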


2018 ◽  
Author(s):  
Christopher R. John ◽  
David Watson ◽  
Dominic Russ ◽  
Katriona Goldmann ◽  
Michael Ehrenstein ◽  
...  

Abstract: Genome-wide data are used to stratify patients into classes for precision medicine using clustering algorithms. A common problem in this area is selecting the number of clusters (K). The Monti consensus clustering algorithm is a widely used method that uses stability selection to estimate K. However, the method is biased towards higher values of K and yields high numbers of false positives. As a solution, we developed Monte Carlo reference-based consensus clustering (M3C), which is based on this algorithm. M3C simulates null distributions of stability scores for a range of K values, enabling a comparison with real data that removes bias and statistically tests for the presence of structure. M3C corrects the inherent bias of consensus clustering, as demonstrated on simulated and real expression data from The Cancer Genome Atlas (TCGA). To test M3C, we developed clusterlab, a new method for simulating multivariate Gaussian clusters.
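The comparison against simulated null distributions can be illustrated with a small hedged sketch. It assumes that the stability score is the proportion of ambiguous clustering (PAC: the share of consensus-matrix entries strictly between 0.1 and 0.9, where lower means more stable), that null PAC scores for each K have already been computed from reference datasets without cluster structure, and that an empirical Monte Carlo p-value is computed per K. The function names, thresholds, and decision rule are illustrative assumptions rather than M3C's exact implementation.

```python
def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering over the upper-triangle entries."""
    n = len(consensus)
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(lower < v < upper for v in values) / len(values)


def monte_carlo_p_value(observed, null_scores):
    """Empirical one-sided p-value: chance a null dataset is at least as stable.

    The +1 terms count the observed data as one draw, so p is never exactly 0.
    """
    hits = sum(score <= observed for score in null_scores)
    return (hits + 1) / (len(null_scores) + 1)


def select_k(observed_by_k, null_by_k, alpha=0.05):
    """Return (best K or None, p-values per K).

    None means no K beats its null distribution at level alpha, i.e. the
    single-cluster (no structure) hypothesis is not rejected.
    """
    p_values = {k: monte_carlo_p_value(observed_by_k[k], null_by_k[k]) for k in observed_by_k}
    best = min(p_values, key=lambda k: (p_values[k], observed_by_k[k]))
    return (best if p_values[best] <= alpha else None), p_values


# Hypothetical scores: real-data PAC per K and 19 null PAC scores per K.
observed = {2: 0.02, 3: 0.15}
nulls = {2: [0.20 + 0.01 * i for i in range(19)], 3: [0.10 + 0.01 * i for i in range(19)]}
best_k, p_values = select_k(observed, nulls)
print(best_k, p_values)
```

With 19 null draws the smallest attainable p-value is 1/20 = 0.05, so more reference datasets are needed for finer resolution.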


2021 ◽  
Author(s):  
Fernando Otero

This article analyzes the performance of combining information from Scanning Electron Microscopy (SEM) micrographs with Static Light Scattering (SLS) measurements for retrieving the so-called Particle Size Distribution (PSD) in terms of experimental features. The corresponding data fusion is implemented using a novel Monte Carlo-based method consisting of an SMF (Sampling-Mapping-Filtering) approach. This approach provides an important reference for assessing the experimental strategy for this specific problem by means of solving an inverse problem. Furthermore, low volume fractions and a PSD represented by log-normal distributions are considered in order to reduce processing and model errors due to ill-posedness. The prior statistics corresponding to the SEM micrographs were obtained by means of the jackknife procedure, used as a resampling technique. The likelihood term considers iid normal measurements generated from the Local Monodisperse Approximation (LMA) and also uses the same model as the forward linear model, an inversion case known as the inverse crime. However, it has been proven that the LMA performs well in practice for low-volume-fraction systems such as those considered here. PSD retrieval is measured in terms of the improvement in precision with respect to one of the log-normal parameters in the SEM micrographs, i.e., the desirability. Estimates are expressed as a function of a typical system parameter such as polydispersity, as well as experimental variables, i.e., the number of particles per micrograph (PPM) and the noise level ε in the SLS measurements. These estimates are then analyzed by means of a Box-Behnken (BB) design and response surface methodology (RSM) to generate a surrogate model, from which rules for optimizing the experiment are derived by maximizing the desirability. Finally, a Rule-Based System (RBS) is proposed for future use.
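The abstract obtains the SEM prior statistics with the jackknife. As a generic, hedged illustration (not the article's SMF pipeline), the sketch below computes a jackknife bias-corrected estimate and standard error for any statistic, applied here to the log-normal shape parameter estimated as the standard deviation of log-diameters. The diameters are made-up values and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def jackknife(data, estimator):
    """Leave-one-out jackknife: returns (bias-corrected estimate, standard error)."""
    n = len(data)
    full = estimator(data)
    # Re-estimate with each observation left out in turn.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(loo)
    bias = (n - 1) * (loo_mean - full)
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return full - bias, se


def lognormal_sigma(diameters):
    """Sample SD of log-diameters, an estimate of the log-normal shape parameter."""
    return stdev([math.log(d) for d in diameters])


# Hypothetical particle diameters (nm) measured on one micrograph.
diameters = [95.0, 110.0, 102.0, 88.0, 130.0, 99.0, 105.0, 120.0]
sigma_hat, sigma_se = jackknife(diameters, lognormal_sigma)
print(sigma_hat, sigma_se)
```

For the sample mean, the jackknife reproduces the familiar standard error s/√n exactly, which is a convenient sanity check.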


1974 ◽  
Vol 22 ◽  
pp. 307 ◽  
Author(s):  
Zdenek Sekanina

Abstract: It is suggested that the outbursts of Periodic Comet Schwassmann-Wachmann 1 are triggered by impacts of interplanetary boulders on the surface of the comet's nucleus. The existence of a cloud of such boulders in interplanetary space was predicted by Harwit (1967). We have used this hypothesis to calculate the characteristics of the outbursts (their mean rate, the optically important dimensions of the ejected debris, the expansion velocity of the ejecta, the maximum diameter of the expanding cloud before it fades out, and the magnitude of the accompanying orbital impulse) and found them reasonably consistent with observations, provided that the solid constituent of the comet is assumed to take the form of a porous matrix of low-strength meteoric material. A Monte Carlo method was applied to simulate the distributions of impacts, their directions, and impact velocities.
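A Monte Carlo simulation of impact directions and velocities can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: all boulders move at a single speed in isotropically distributed directions, the comet's velocity is fixed, and every sampled boulder counts as one impact (the encounter rate is not weighted by relative speed). The velocities, units, and function names are illustrative.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def simulate_impacts(comet_velocity, boulder_speed, n, seed=0):
    """Sample n impacts; return relative impact speeds and unit impact directions.

    Relative velocity = boulder velocity - comet velocity (comet's rest frame).
    """
    rng = random.Random(seed)
    speeds, directions = [], []
    while len(speeds) < n:
        u = random_unit_vector(rng)
        rel = tuple(boulder_speed * ui - vi for ui, vi in zip(u, comet_velocity))
        speed = math.sqrt(sum(c * c for c in rel))
        if speed == 0.0:  # degenerate case: no relative motion, no impact
            continue
        speeds.append(speed)
        directions.append(tuple(c / speed for c in rel))
    return speeds, directions


# Illustrative values: comet moving at 3 km/s along z, boulders at 5 km/s.
speeds, directions = simulate_impacts((0.0, 0.0, 3.0), 5.0, 2000)
print(min(speeds), max(speeds), sum(speeds) / len(speeds))
```

Every sampled speed lies between 5 - 3 = 2 and 5 + 3 = 8 km/s, and for isotropic directions the mean relative speed is a + b²/(3a) = 5.6 km/s (with a = 5, b = 3), so the sample mean should land close to that value.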

