M3C: Monte Carlo reference-based consensus clustering

Clustering groups similar objects into sets known as clusters; objects in one cluster are expected to differ from objects in another. Gene expression is the process by which information from a gene is used to synthesize a functional gene product. Subgroup classification is a basic task in high-throughput genomic data analysis, especially for gene expression and methylation data. Unsupervised clustering methods are typically applied to predict new subgroups or to test consistency with known annotations. To obtain a stable classification of subgroups, consensus clustering is often performed: it clusters repeatedly on randomly sampled subsets of the data and summarizes the robustness of the resulting clusters. When a forecast or estimate involves significant uncertainty, Monte Carlo simulation can be a better solution. M3C is a consensus clustering algorithm that uses Monte Carlo simulation to avoid overestimating the number of clusters and can test the null hypothesis that only one cluster is present.
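
The resampling idea behind consensus clustering can be sketched in a few lines. The example below is a minimal illustration, not the M3C implementation: it assumes plain k-means as the inner clustering algorithm, and the names `kmeans`, `consensus_matrix`, `n_resamples`, and `frac` are hypothetical. Each entry of the returned matrix is the fraction of resamples in which two points, when sampled together, landed in the same cluster.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which each co-sampled pair shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))  # subsample without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Two well-separated toy groups (made-up data for illustration only).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
matrix = consensus_matrix(points, k=2)
print("same group:", matrix[0][1], "different groups:", matrix[0][5])
```

Consensus values near 1 within a group and near 0 between groups indicate a stable partition; values scattered around intermediate levels suggest the chosen number of clusters is not well supported.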

Enhanced Cancer Subtyping via Pan-Transcriptomics Data Fusion, Monte-Carlo Consensus Clustering, and Auto Classifier Creation

10.1101/2019.12.16.870188 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kristofer Linton-Reid ◽

Harry Clifford ◽

Joe Sneath Thompson

Keyword(s):

Monte Carlo ◽

Pancreatic Ductal Adenocarcinoma ◽

Expression Profiles ◽

Ductal Adenocarcinoma ◽

Consensus Clustering ◽

Survival Times ◽

Clustering Techniques ◽

Transcriptomics Data ◽

Transcriptome Expression ◽

Microarray Datasets

Abstract: Subtyping of tumor transcriptome expression profiles is a routine method used to distinguish tumor heterogeneity. Unsupervised clustering techniques are often combined with survival analysis to decipher the relationship between genes and the survival times of patients. However, the reproducibility of these subtyping-based studies is poor. Multiple reports give conflicting subtype and gene-survival time relationship results. In this study, we introduce the issues underlying the lack of reproducibility in transcriptomic subtyping studies. This problem arises from the routine analysis of small cohorts (< 100 individuals) and the use of biased traditional consensus clustering techniques. Our approach carefully combines multiple RNA-sequencing and microarray datasets, followed by subtyping via Monte-Carlo Consensus Clustering and creation of deep subtyping classifiers. This paper demonstrates an improved subtyping methodology by investigating pancreatic ductal adenocarcinoma. Importantly, our methodology identifies six biologically novel pancreatic ductal adenocarcinoma subtypes. Our approach also enables a degree of reproducibility, via our pancreatic ductal adenocarcinoma classifier PDACNet, which classical subtyping studies have failed to establish.

M3C: Monte Carlo reference-based consensus clustering

10.1101/377002 ◽

2018 ◽

Cited By ~ 4

Author(s):

Christopher R. John ◽

David Watson ◽

Dominic Russ ◽

Katriona Goldmann ◽

Michael Ehrenstein ◽

...

Keyword(s):

Monte Carlo ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Data ◽

The Cancer Genome Atlas ◽

Consensus Clustering ◽

Null Distributions ◽

Genome Wide ◽

Genome Wide Data ◽

Cancer Genome Atlas

Abstract: Genome-wide data is used to stratify patients into classes for precision medicine using clustering algorithms. A common problem in this area is selection of the number of clusters (K). The Monti consensus clustering algorithm is a widely used method which uses stability selection to estimate K. However, the method has bias towards higher values of K and yields high numbers of false positives. As a solution, we developed Monte Carlo reference-based consensus clustering (M3C), which is based on this algorithm. M3C simulates null distributions of stability scores for a range of K values, thus enabling a comparison with real data to remove bias and statistically test for the presence of structure. M3C corrects the inherent bias of consensus clustering as demonstrated on simulated and real expression data from The Cancer Genome Atlas (TCGA). For testing M3C, we developed clusterlab, a new method for simulating multivariate Gaussian clusters.
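
The comparison of a real score against a simulated null distribution can be illustrated with a small sketch. This is a simplified stand-in under stated assumptions, not the M3C procedure: the null data here are drawn from a single Gaussian matched to the mean and standard deviation of 1-D input, and the hypothetical `split_score` (variance explained by the best two-way split) replaces the consensus-based stability score. The function names and `n_sims` are also assumptions.

```python
import random
import statistics

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def split_score(values):
    """Toy K=2 structure score: best variance fraction explained by one cut."""
    xs = sorted(values)
    total = sum_sq(xs)
    best = 0.0
    for cut in range(1, len(xs)):
        within = sum_sq(xs[:cut]) + sum_sq(xs[cut:])
        best = max(best, 1.0 - within / total)
    return best

def monte_carlo_p_value(values, score_fn, n_sims=200, seed=0):
    """Compare the real score with scores from single-cluster null data."""
    rng = random.Random(seed)
    mu, sigma = mean(values), statistics.stdev(values)
    real = score_fn(values)
    # Null hypothesis: one Gaussian cluster with matched mean and spread.
    null_scores = [score_fn([rng.gauss(mu, sigma) for _ in values])
                   for _ in range(n_sims)]
    exceed = sum(s >= real for s in null_scores)
    # Add-one correction so the estimated p-value is never exactly zero.
    return real, (1 + exceed) / (1 + n_sims)

# Made-up 1-D data with two clearly separated groups.
clustered = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
score, p_value = monte_carlo_p_value(clustered, split_score)
print("score:", round(score, 3), "Monte Carlo p-value:", round(p_value, 4))
```

Because the null scores are generated for the same sample size and score function, any bias the score has on structureless data is present in the reference distribution too, which is the point of testing against a simulated reference rather than a fixed threshold.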

Outbursts of comet Schwassmann-Wachmann 1, and the cloud of interplanetary boulders

International Astronomical Union Colloquium ◽

10.1017/s0252921100034035 ◽

1974 ◽

Vol 22 ◽

pp. 307 ◽

Cited By ~ 3

Author(s):

Zdenek Sekanina

Keyword(s):

Monte Carlo ◽

Monte Carlo Method ◽

Interplanetary Space ◽

Porous Matrix ◽

Maximum Diameter ◽

Expansion Velocity ◽

Solid Constituent ◽

Meteoric Material ◽

Periodic Comet

Abstract: It is suggested that the outbursts of Periodic Comet Schwassmann-Wachmann 1 are triggered by impacts of interplanetary boulders on the surface of the comet’s nucleus. The existence of a cloud of such boulders in interplanetary space was predicted by Harwit (1967). We have used the hypothesis to calculate the characteristics of the outbursts – such as their mean rate, optically important dimensions of ejected debris, expansion velocity of the ejecta, maximum diameter of the expanding cloud before it fades out, and the magnitude of the accompanying orbital impulse – and found them reasonably consistent with observations, if the solid constituent of the comet is assumed to be in the form of a porous matrix of low-strength meteoric material. A Monte Carlo method was applied to simulate the distributions of impacts, their directions and impact velocities.

Monte Carlo Algorithms for Moments of Transition Arrays

International Astronomical Union Colloquium ◽

10.1017/s0252921100107468 ◽

1988 ◽

Vol 102 ◽

pp. 79-81

Author(s):

A. Goldberg ◽

S.D. Bloom

Keyword(s):

Monte Carlo ◽

State Vector ◽

Basis Vector ◽

Collective State ◽

Dispersion Characteristics ◽

Dipole Strength ◽

The Third ◽

Monte Carlo Algorithms ◽

Third Moment ◽

Very High

Abstract: Closed expressions for the first, second, and (in some cases) the third moment of atomic transition arrays now exist. Recently a method has been developed for getting to very high moments (up to the 12th and beyond) in cases where a “collective” state-vector (i.e. a state-vector containing the entire electric dipole strength) can be created from each eigenstate in the parent configuration. Both of these approaches give exact results. Herein we describe a statistical (or Monte Carlo) approach which requires only one representative state-vector |RV> for the entire parent manifold to get estimates of transition moments of high order. The representation is achieved through the random amplitudes associated with each basis vector making up |RV>. This also gives rise to the dispersion characterizing the method, which has been applied to a system (in the M shell) with ≈250,000 lines where we have calculated up to the 5th moment. It turns out that the dispersion in the moments decreases with the size of the manifold, making its application to very big systems statistically advantageous. A discussion of the method and these dispersion characteristics will be presented.
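
The idea of estimating moments from one vector with random amplitudes can be illustrated with a generic random-vector trace estimator. This sketch is an analogy under stated assumptions, not the authors' algorithm: it estimates tr(M^n)/N for a small symmetric matrix `M` using a single vector of random ±1 amplitudes, and the names `matvec` and `random_vector_moment` are hypothetical.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def random_vector_moment(m, order, rng):
    """Estimate tr(m**order) / N from one random-sign vector r.

    The estimate is r . (m**order r) / N. Off-diagonal terms carry random
    signs and average to zero, which is the source of the dispersion.
    """
    n = len(m)
    r = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    v = r
    for _ in range(order):
        v = matvec(m, v)  # apply the matrix `order` times
    return sum(a * b for a, b in zip(r, v)) / n

# Diagonal case: the random signs cancel exactly, so the estimate is exact.
diagonal = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 2.0, 0.0, 0.0],
            [0.0, 0.0, 3.0, 0.0],
            [0.0, 0.0, 0.0, 4.0]]
print(random_vector_moment(diagonal, 2, random.Random(0)))  # (1+4+9+16)/4 = 7.5

# Symmetric case with off-diagonal coupling: single draws scatter around
# the exact value, so several draws are averaged here for illustration.
coupled = [[2.0, 1.0], [1.0, 3.0]]
rng = random.Random(1)
draws = [random_vector_moment(coupled, 2, rng) for _ in range(1000)]
print(sum(draws) / len(draws))
```

For a matrix with off-diagonal coupling, each draw deviates from the exact moment by a sum of randomly signed terms; the relative size of that scatter tends to shrink as the dimension grows, consistent with the abstract's remark that dispersion decreases with the size of the manifold.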

Electron Scattering in Solids

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100133618 ◽

1990 ◽

Vol 48 (2) ◽

pp. 4-5

Author(s):

Ryuichi Shimizu ◽

Ze-Jun Ding

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Angular Distribution ◽

Elastic Scattering ◽

Electron Scattering ◽

Cross Sections ◽

Expansion Method ◽

Electron Excitation ◽

Partial Wave Expansion ◽

Backscattered Electrons

Monte Carlo simulation has become a most powerful tool for describing electron scattering in solids, leading to a more comprehensive understanding of the complicated mechanisms that generate the various types of signals used in microbeam analysis. The present paper proposes a practical model for the Monte Carlo simulation of scattering processes of a penetrating electron and the generation of the slow secondaries in solids. The model is based on the combined use of Gryzinski’s inner-shell electron excitation function and the dielectric function for taking into account the valence electron contribution in inelastic scattering processes, while the cross-sections derived by the partial wave expansion method are used for describing elastic scattering processes. The improvement gained from this elastic scattering cross-section can be seen in the successful description of the anisotropy of the angular distribution of elastically backscattered electrons from Au in the low-energy region, shown in Fig. 1. Fig. 1(a) shows the elastic cross-sections of 600 eV electrons for a single Au atom, clearly indicating that the angular distribution is not the smooth curve expected from the Rutherford scattering formula but has so-called lobes at large scattering angles.

Determination of Stoichiometry of GaAs Component in (Gaas)Ge Epitaxially Grown Thin Films by Electron Microprobe X-Ray Analysis

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100134922 ◽

1990 ◽

Vol 48 (2) ◽

pp. 264-265

Author(s):

D. R. Liu ◽

S. S. Shinozaki ◽

R. J. Baird

Keyword(s):

Thin Film ◽

Thin Films ◽

Monte Carlo Simulation ◽

Monte Carlo ◽

Compound Semiconductors ◽

Energy Dispersive Analysis ◽

X Ray ◽

Metastable Alloys ◽

Lower Voltage

The epitaxially grown (GaAs)Ge thin film has aroused much interest because it is one of the metastable alloys of III-V compound semiconductors with germanium and a possible candidate for optoelectronic applications. It is important to be able to accurately determine the composition of the film, particularly whether or not the GaAs component is stoichiometric, but x-ray energy dispersive analysis (EDS) cannot meet this need. The thickness of the film is usually about 0.5-1.5 μm. If Kα peaks are used for quantification, the accelerating voltage must be more than 10 kV in order for these peaks to be excited. Under this voltage, the generation depth of x-ray photons approaches 1 μm, as evidenced by a Monte Carlo simulation and actual x-ray intensity measurement as discussed below. If a lower voltage is used to reduce the generation depth, the L peaks have to be used. But these L peaks are actually merged into one big hump, simply because the atomic numbers of these three elements are relatively small and close together, and the EDS energy resolution is limited.

Structures and crystallization of vacuum-deposited amorphous Se and Sb films

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100173832 ◽

1990 ◽

Vol 48 (4) ◽

pp. 140-141

Author(s):

Makoto Shiojiri ◽

Toshiyuki Isshiki ◽

Tetsuya Fudaba ◽

Yoshihiro Hirota

Keyword(s):

Monte Carlo ◽

Electron Microscope ◽

Monte Carlo Method ◽

Computer Program ◽

Radial Distribution ◽

Film Formation ◽

Amorphous State ◽

Two Dimensional ◽

Amorphous Films ◽

Binding Condition

In hexagonal Se crystal each atom is covalently bound to two others to form an endless spiral chain, and in Sb crystal each atom is bound to three others to form an extended puckered sheet. Such chains and sheets may be regarded as one- and two-dimensional molecules, respectively. In this paper we investigate the structures of these elements in the amorphous state and their crystallization. HRTEM and ED images of vacuum-deposited amorphous Se and Sb films were taken with a JEM-200CX electron microscope (Cs=1.2 mm). The structure models of amorphous films were constructed on a computer by the Monte Carlo method. Generated atoms were subsequently deposited on a space of 2 nm×2 nm as they fulfilled the binding condition, to form a film 5 nm thick (Fig. 1a-1c). A previous computer program has been improved so as to reproduce the actual film formation. Radial distribution function (RDF) curves, ED intensities and HRTEM images for the constructed structure models were calculated and compared with the observed ones.

Low-voltage EDS of magnesium ferrite Dendrites in a FEG-SEM

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100164854 ◽

1996 ◽

Vol 54 ◽

pp. 478-479

Author(s):

Matthew T. Johnson ◽

Ian M. Anderson ◽

Jim Bentley ◽

C. Barry Carter

Keyword(s):

Monte Carlo ◽

Quantitative Analysis ◽

Monte Carlo Simulations ◽

Cross Section ◽

Spatial Resolution ◽

Low Voltage ◽

Magnesium Ferrite ◽

X Ray ◽

Interaction Volume ◽

Ferrite Spinel

Energy-dispersive X-ray spectrometry (EDS) performed at low (≤ 5 kV) accelerating voltages in the SEM has the potential for providing quantitative microanalytical information with a spatial resolution of ∼100 nm. In the present work, EDS analyses were performed on magnesium ferrite spinel [(MgxFe1−x)Fe2O4] dendrites embedded in a MgO matrix, as shown in Fig. 1. The spatial resolution of X-ray microanalysis at conventional accelerating voltages is insufficient for the quantitative analysis of these dendrites, which have widths of the order of a few hundred nanometers, without deconvolution of contributions from the MgO matrix. However, Monte Carlo simulations indicate that the interaction volume for MgFe2O4 is ∼150 nm at 3 kV accelerating voltage and therefore sufficient to analyze the dendrites without matrix contributions. Single-crystal {001}-oriented MgO was reacted with hematite (Fe2O3) powder for 6 h at 1450°C in air and furnace cooled. The specimen was then cleaved to expose a clean cross-section suitable for microanalysis.
