Accuracy of mutational signature software on correlated signatures

Abstract. Mutational signatures are characteristic patterns of mutations generated by exogenous mutagens or by endogenous mutational processes. Mutational signatures are important for research into DNA damage and repair, aging, cancer biology, genetic toxicology, and epidemiology. Unsupervised learning can infer mutational signatures from the somatic mutations in large numbers of tumors, and separating correlated signatures is a notable challenge for this task. To investigate which methods can best meet this challenge, we assessed 18 computational methods for inferring mutational signatures on 20 synthetic data sets that incorporated varying degrees of correlated activity of two common mutational signatures. Performance varied widely, and four methods noticeably outperformed the others: hdp (based on hierarchical Dirichlet processes), SigProExtractor (based on multiple non-negative matrix factorizations over resampled data), TCSM (based on an approach used in document topic analysis), and mutSpec.NMF (also based on non-negative matrix factorization). The results underscored the complexities of mutational signature extraction, including the importance and difficulty of determining the correct number of signatures and the importance of hyperparameters. Our findings indicate directions for improvement of the software and show a need for care when interpreting results from any of these methods, including the need for assessing sensitivity of the results to input parameters.
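As a rough illustration of the non-negative matrix factorization idea that several of the assessed tools build on, the sketch below factors a toy mutation catalog V (mutation channels by tumors) into signatures W and exposures H using Lee-Seung multiplicative updates. It is not the implementation of any of the 18 methods; the function names, the four-channel toy signatures, and the exposure values are all hypothetical, and the number of signatures is simply assumed to be known.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Sum of squared differences between V and the reconstruction W*H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V ~ W*H with non-negative W and H (multiplicative updates)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update exposures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)]
             for k in range(rank)]
        # Update signatures: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(rows)]
    return w, h

# Hypothetical toy data: two signatures over four mutation channels, with
# exposures in six tumors that rise together (correlated activity).
true_w = [[0.7, 0.1], [0.1, 0.6], [0.1, 0.2], [0.1, 0.1]]
true_h = [[100, 200, 300, 400, 500, 600],
          [60, 90, 160, 210, 240, 310]]
v = matmul(true_w, true_h)

w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

A low reconstruction error does not mean the recovered W matches the true signatures: when exposures are strongly correlated, many factorizations fit the catalog almost equally well, which is one way to see why this case is hard.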

Methods for estimating uncertainty in factor analytic solutions

Atmospheric Measurement Techniques ◽

10.5194/amt-7-781-2014 ◽

2014 ◽

Vol 7 (3) ◽

pp. 781-797 ◽

Cited By ~ 174

Author(s):

P. Paatero ◽

S. Eberly ◽

S. G. Brown ◽

G. A. Norris

Keyword(s):

Environmental Protection Agency ◽

Synthetic Data ◽

Analytic Solutions ◽

Data Sets ◽

Random Errors ◽

Data Set ◽

Factor Analytic ◽

Uncertainty Estimates ◽

Multilinear Engine ◽

Analytic Models

Abstract. The EPA PMF (Environmental Protection Agency positive matrix factorization) version 5.0 and the underlying multilinear engine-executable ME-2 contain three methods for estimating uncertainty in factor analytic models: classical bootstrap (BS), displacement of factor elements (DISP), and bootstrap enhanced by displacement of factor elements (BS-DISP). The goal of these methods is to capture the uncertainty of PMF analyses due to random errors and rotational ambiguity. It is shown that the three methods complement each other: depending on characteristics of the data set, one method may provide better results than the other two. Results are presented using synthetic data sets, including interpretation of diagnostics, and recommendations are given for parameters to report when documenting uncertainty estimates from EPA PMF or ME-2 applications.

Reverse time migration

Geophysics ◽

10.1190/1.1441434 ◽

1983 ◽

Vol 48 (11) ◽

pp. 1514-1524 ◽

Cited By ~ 855

Author(s):

Edip Baysal ◽

Dan D. Kosloff ◽

John W. C. Sherwood

Keyword(s):

Wave Amplitude ◽

Synthetic Data ◽

Data Sets ◽

Field Observations ◽

Reverse Time ◽

Reverse Time Migration ◽

Time Migration ◽

Migration Method ◽

Zero Offset ◽

Time Extrapolation

Migration of stacked or zero‐offset sections is based on deriving the wave amplitude in space from wave field observations at the surface. Conventionally this calculation has been carried out through a depth extrapolation. We examine the alternative of carrying out the migration through a reverse time extrapolation. This approach may offer improvements over existing migration methods, especially in cases of steeply dipping structures with strong velocity contrasts. This migration method is tested using appropriate synthetic data sets.
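The core ingredient of reverse time extrapolation, stepping a wave field backward in time with the same finite-difference scheme that steps it forward, can be sketched in one dimension. This is only an illustration under stated assumptions, not the authors' migration method: it uses a constant-velocity 1D string with fixed ends, a hypothetical Gaussian pulse, and an assumed Courant number of 0.5, and it omits the surface boundary condition and imaging step that an actual migration needs.

```python
import math

def leapfrog(current, other, r2):
    """One step of the second-order 1D wave scheme with fixed (zero) ends.

    Given u(t) as `current` and u(t - dt) as `other`, returns u(t + dt).
    Because the scheme is symmetric in time, passing u(t + dt) as `other`
    returns u(t - dt) instead, i.e. a step backward in time.
    """
    n = len(current)
    out = [0.0] * n
    for i in range(1, n - 1):
        lap = current[i - 1] - 2.0 * current[i] + current[i + 1]
        out[i] = 2.0 * current[i] - other[i] + r2 * lap
    return out

def extrapolate(first, second, steps, r2):
    """Apply `steps` leapfrog steps to the pair (first, second).

    Returns the last two fields as (older, newer) in stepping order.
    """
    older, newer = first, second
    for _ in range(steps):
        older, newer = newer, leapfrog(newer, older, r2)
    return older, newer

n_points, n_steps = 101, 120
r2 = 0.5 ** 2  # (velocity * dt / dx)^2, an assumed stable Courant number

# Hypothetical initial pulse, pinned to zero at both ends.
initial = [math.exp(-((i - 50) / 5.0) ** 2) for i in range(n_points)]
initial[0] = initial[-1] = 0.0

# Forward in time from rest: u(-dt) is approximated by u(0).
u_before_last, u_last = extrapolate(initial, initial, n_steps, r2)

# Reverse time: feed the last two snapshots in swapped order.
recovered, _ = extrapolate(u_last, u_before_last, n_steps, r2)

max_error = max(abs(a - b) for a, b in zip(recovered, initial))
print("max difference after reverse extrapolation:", max_error)
```

The reversal is exact up to rounding here because the same velocity is used in both directions; in migration the velocity model is only an estimate, so the quality of the reverse extrapolation depends on it.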

Quasi-2D inversion of DCR and TDEM data for shallow investigations

Geophysics ◽

10.1190/1.3587218 ◽

2011 ◽

Vol 76 (4) ◽

pp. F239-F250 ◽

Cited By ~ 10

Author(s):

Fernando A. Monteiro Santos ◽

Hesham M. El-Kaliouby

Keyword(s):

Joint Inversion ◽

Synthetic Data ◽

Data Sets ◽

Complex Environments ◽

Inversion Algorithm ◽

2D Inversion ◽

New Approach ◽

Earth Models ◽

Time Domain Electromagnetic ◽

Inversion Techniques

Joint or sequential inversion of direct current resistivity (DCR) and time-domain electromagnetic (TDEM) data is commonly performed for individual soundings assuming layered earth models. DCR and TDEM have different and complementary sensitivities to resistive and conductive structures, making them suitable for joint inversion techniques. Several authors have used joint inversion of DCR and TDEM data to reduce the ambiguities of the models calculated from each method separately. A new approach for joint inversion of these data sets, based on a laterally constrained algorithm, is presented. The method was developed for the interpretation of soundings collected along a line over a 1D or 2D geology. The inversion algorithm was tested on two synthetic data sets, as well as on field data from Saudi Arabia. The results show that the algorithm is efficient and stable in producing quasi-2D models from DCR and TDEM data acquired in relatively complex environments.

Boosting Instance Segmentation with Synthetic Data: A study to overcome the limits of real world data sets

10.1109/iccvw54120.2021.00110 ◽

2021 ◽

Author(s):

Florentin Poucin ◽

Andrea Kraus ◽

Martin Simon

Keyword(s):

Real World ◽

Synthetic Data ◽

Data Sets ◽

Real World Data ◽

World Data ◽

Instance Segmentation

A Non-Parametric Model for Accurate and Provably Private Synthetic Data Sets

Proceedings of the 12th International Conference on Availability, Reliability and Security - ARES '17 ◽

10.1145/3098954.3098962 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jordi Soria-Comas ◽

Josep Domingo-Ferrer

Keyword(s):

Synthetic Data ◽

Parametric Model ◽

Data Sets ◽

Non Parametric

Comparison of iterative desmearing procedures for one-dimensional small-angle scattering data

Journal of Applied Crystallography ◽

10.1107/s0021889810049721 ◽

2011 ◽

Vol 44 (1) ◽

pp. 32-42 ◽

Cited By ~ 10

Author(s):

Thomas Vad ◽

Wiebke F. C. Sager

Keyword(s):

Small Angle ◽

Incident Beam ◽

Scattering Data ◽

Data Sets ◽

Convergence Criteria ◽

Small Momentum Transfer ◽

One Dimensional ◽

Noise Filter ◽

Large Numbers ◽

Angle Scattering

Two simple iterative desmearing procedures – the Lake algorithm and the Van Cittert method – have been investigated by introducing different convergence criteria using both synthetic and experimental small-angle neutron scattering data. Implementing appropriate convergence criteria resulted in stable and reliable solutions in correcting resolution errors originating from instrumental smearing, i.e., finite collimation and polychromaticity of the incident beam. Deviations at small momentum transfer for concentrated ensembles of spheres encountered in earlier studies are not observed. Amplification of statistical errors can be reduced by applying a noise filter after desmearing. In most cases investigated, the modified Lake algorithm yields better results with a significantly smaller number of iterations and is, therefore, suitable for automated desmearing of large numbers of data sets.
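The two iteration schemes can be sketched in their textbook forms: Van Cittert adds the residual between the observed curve and the re-smeared estimate, while the Lake algorithm multiplies the estimate by their ratio. The sketch below is a minimal illustration, not the paper's modified algorithms. The three-point smearing kernel, the toy profile, and the fixed iteration count are assumptions, and a fixed count stands in for the convergence criteria the paper studies.

```python
import math

def smear(signal, kernel):
    """Convolve with a normalized, odd-length kernel; edge values are repeated."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            k = min(max(i + j - half, 0), n - 1)  # clamp index at the edges
            acc += weight * signal[k]
        out.append(acc)
    return out

def van_cittert(observed, kernel, iterations=30):
    """Additive correction: f <- f + (observed - smear(f))."""
    estimate = list(observed)
    for _ in range(iterations):
        resmeared = smear(estimate, kernel)
        estimate = [e + (o - r) for e, o, r in zip(estimate, observed, resmeared)]
    return estimate

def lake(observed, kernel, iterations=30):
    """Multiplicative correction: f <- f * observed / smear(f).

    Assumes strictly positive data so the ratio is well defined.
    """
    estimate = list(observed)
    for _ in range(iterations):
        resmeared = smear(estimate, kernel)
        estimate = [e * o / r for e, o, r in zip(estimate, observed, resmeared)]
    return estimate

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical noise-free test profile: a peak on a flat positive background.
true_profile = [1.0 + math.exp(-((i - 20) / 3.0) ** 2) for i in range(41)]
kernel = [0.25, 0.5, 0.25]  # assumed resolution function
observed = smear(true_profile, kernel)

vc_result = van_cittert(observed, kernel)
lake_result = lake(observed, kernel)

print("smeared error:    ", squared_error(observed, true_profile))
print("Van Cittert error:", squared_error(vc_result, true_profile))
print("Lake error:       ", squared_error(lake_result, true_profile))
```

With noisy data both schemes amplify statistical errors as the iteration count grows, which is why a stopping rule and a noise filter matter in practice.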

Bayesian Classifier for Sparsity-Promoting Feature Selection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415500226 ◽

2015 ◽

Vol 29 (06) ◽

pp. 1550022 ◽

Cited By ~ 1

Author(s):

Danlei Xu ◽

Lan Du ◽

Hongwei Liu ◽

Penghui Wang

Keyword(s):

Feature Selection ◽

Synthetic Data ◽

Original Data ◽

Radar Data ◽

Bayesian Classifier ◽

Classification Model ◽

Data Sets ◽

Data Set ◽

Classification Boundary ◽

Nonlinear Mappings

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings of the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results on a synthetic data set, a measured radar data set, a high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and its comparable classification accuracy compared with some other existing classifiers.

Sparse reflectivity inversion for nonstationary seismic data with surface-related multiples: Numerical and field-data experiments

Geophysics ◽

10.1190/geo2016-0520.1 ◽

2017 ◽

Vol 82 (3) ◽

pp. R199-R217 ◽

Cited By ~ 3

Author(s):

Xintao Chai ◽

Shangxu Wang ◽

Genyang Tang

Keyword(s):

Seismic Data ◽

Resolution Enhancement ◽

Synthetic Data ◽

Data Sets ◽

Data Set ◽

Anelastic Attenuation ◽

Seismic Resolution ◽

Q Filtering ◽

The Stability ◽

Reflectivity Inversion

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth’s Q-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse Q filtering and generates superior Q compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect Q values. However, multiples contain information about subsurface properties. To use information carried by multiples, with the feedback model and NSRI theory, we adapt NSRI to the context of nonstationary seismic data with surface-related multiples. Consequently, not only are the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse Q filtering) extended, but also multiples are considered. Our method is limited to a 1D implementation. Theoretical and numerical analyses verify that given a wavelet, the input Q values primarily affect the inverted reflectivities and exert little effect on the estimated multiples; i.e., multiple estimation need not consider Q filtering effects explicitly. However, NSRI benefits from considering multiples. The periodicity and amplitude of the multiples imply the position of the reflectivities and amplitude of the wavelet. Multiples assist in overcoming scaling and shifting ambiguities of conventional problems in which multiples are not considered. Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support the aforementioned findings and reveal the stability, capabilities, and limitations of the proposed method.

Synthetic Data Sets

Encyclopedia of Social Network Analysis and Mining ◽

10.1007/978-1-4614-7163-9_110190-1 ◽

2017 ◽

pp. 1-4

Author(s):

Sargur N. Srihari

Keyword(s):

Synthetic Data ◽

Data Sets

EVALUATION OF THE INFLUENCE OF SOCIAL AND ECONOMIC DETERMINANTS ON THE STATE OF PUBLIC HEALTH ON THE BASIS OF MICROIMITATION MODELING

Economic Analysis ◽

10.35774/econa2017.02.079 ◽

2017 ◽

pp. 79-90

Author(s):

Dmytro Shushpanov ◽

Volodymyr Sarioglo

Keyword(s):

Research Policy ◽

Synthetic Data ◽

Simulation Models ◽

The State ◽

Data Sets ◽

Economic Determinants ◽

Micro Simulation ◽

State Employment ◽

Health Research Policy ◽

Main Determinants

The article considers the essence and peculiarities of microimitation modeling and substantiates the advantages of microimitation models over statistical models. It outlines microsimulation models that forecast dynamic changes in health and are most appropriate for use in the development of health research policy, such as POHEM, CORSIM and LifePaths. It is proposed to use elements of statistical and dynamic microimitation modeling, agent modeling and the concept of a life course for the estimation of the influence of social and economic determinants. The synthetic model of the population was formed on the basis of representative data sets of sample surveys of living conditions of households and economic activity of the population of the State Employment Service of Ukraine, as well as microdata of the Multicultural Survey of the Population of Ukraine (2012) and the Medical and Demographic Survey (2013). A generalized scheme of the method of microsimulation modeling of the influence of social and economic determinants on the health status of the population of Ukraine has been developed. The influence of the main determinants on the health of certain age, gender, and social and economic groups of the population is estimated on the basis of the methodology of synthetic data.
