Accuracy of mutational signature software on correlated signatures

2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Yang Wu ◽  
Ellora Hui Zhen Chua ◽  
Alvin Wei Tian Ng ◽  
Arnoud Boot ◽  
Steven G. Rozen

Abstract. Mutational signatures are characteristic patterns of mutations generated by exogenous mutagens or by endogenous mutational processes. Mutational signatures are important for research into DNA damage and repair, aging, cancer biology, genetic toxicology, and epidemiology. Unsupervised learning can infer mutational signatures from the somatic mutations in large numbers of tumors, and separating correlated signatures is a notable challenge for this task. To investigate which methods can best meet this challenge, we assessed 18 computational methods for inferring mutational signatures on 20 synthetic data sets that incorporated varying degrees of correlated activity of two common mutational signatures. Performance varied widely, and four methods noticeably outperformed the others: hdp (based on hierarchical Dirichlet processes), SigProExtractor (based on multiple non-negative matrix factorizations over resampled data), TCSM (based on an approach used in document topic analysis), and mutSpec.NMF (also based on non-negative matrix factorization). The results underscored the complexities of mutational signature extraction, including the importance and difficulty of determining the correct number of signatures and the importance of hyperparameters. Our findings indicate directions for improvement of the software and show a need for care when interpreting results from any of these methods, including the need for assessing sensitivity of the results to input parameters.
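
For readers unfamiliar with the factorization that several of the assessed tools build on, the following is a minimal sketch of non-negative matrix factorization with the standard multiplicative update rule, applied to a toy mutation-count matrix. It is not the implementation used by SigProExtractor, mutSpec.NMF, or any other tool named above. The function name `nmf_multiplicative`, the 96-channel layout, the toy data, and the choice of `k=2` are all assumptions made for illustration; in practice the number of signatures is not known in advance, which is one of the difficulties the abstract highlights.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (channels x samples) as W @ H.

    W holds k candidate signatures (columns), H holds their per-sample
    activities. Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1; the product W @ H is unchanged.
    scale = W.sum(axis=0)
    W = W / scale
    H = H * scale[:, None]
    return W, H

# Toy data (hypothetical): two random 96-channel signatures, 50 samples.
rng = np.random.default_rng(1)
true_W = rng.dirichlet(np.ones(96), size=2).T      # shape (96, 2)
true_H = rng.gamma(2.0, 100.0, size=(2, 50))       # shape (2, 50)
V = true_W @ true_H

W, H = nmf_multiplicative(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Rerunning the sketch with a different `k` or `seed` gives a rough feel for how sensitive the recovered signatures are to input parameters.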

2014 ◽  
Vol 7 (3) ◽  
pp. 781-797 ◽  
Author(s):  
P. Paatero ◽  
S. Eberly ◽  
S. G. Brown ◽  
G. A. Norris

Abstract. The EPA PMF (Environmental Protection Agency positive matrix factorization) version 5.0 and the underlying multilinear engine-executable ME-2 contain three methods for estimating uncertainty in factor analytic models: classical bootstrap (BS), displacement of factor elements (DISP), and bootstrap enhanced by displacement of factor elements (BS-DISP). The goal of these methods is to capture the uncertainty of PMF analyses due to random errors and rotational ambiguity. It is shown that the three methods complement each other: depending on characteristics of the data set, one method may provide better results than the other two. Results are presented using synthetic data sets, including interpretation of diagnostics, and recommendations are given for parameters to report when documenting uncertainty estimates from EPA PMF or ME-2 applications.


Geophysics ◽  
1983 ◽  
Vol 48 (11) ◽  
pp. 1514-1524 ◽  
Author(s):  
Edip Baysal ◽  
Dan D. Kosloff ◽  
John W. C. Sherwood

Migration of stacked or zero‐offset sections is based on deriving the wave amplitude in space from wave field observations at the surface. Conventionally this calculation has been carried out through a depth extrapolation. We examine the alternative of carrying out the migration through a reverse time extrapolation. This approach may offer improvements over existing migration methods, especially in cases of steeply dipping structures with strong velocity contrasts. This migration method is tested using appropriate synthetic data sets.


Geophysics ◽  
2011 ◽  
Vol 76 (4) ◽  
pp. F239-F250 ◽  
Author(s):  
Fernando A. Monteiro Santos ◽  
Hesham M. El-Kaliouby

Joint or sequential inversion of direct current resistivity (DCR) and time-domain electromagnetic (TDEM) data is commonly performed for individual soundings assuming layered earth models. DCR and TDEM have different and complementary sensitivities to resistive and conductive structures, making them suitable methods for joint inversion. Several authors have used joint inversion of DCR and TDEM data to reduce the ambiguities of the models calculated from each method separately. A new approach for joint inversion of these data sets, based on a laterally constrained algorithm, is presented. The method was developed for the interpretation of soundings collected along a line over a 1D or 2D geology. The inversion algorithm was tested on two synthetic data sets, as well as on field data from Saudi Arabia. The results show that the algorithm is efficient and stable in producing quasi-2D models from DCR and TDEM data acquired in relatively complex environments.


2011 ◽  
Vol 44 (1) ◽  
pp. 32-42 ◽  
Author(s):  
Thomas Vad ◽  
Wiebke F. C. Sager

Two simple iterative desmearing procedures – the Lake algorithm and the Van Cittert method – have been investigated by introducing different convergence criteria using both synthetic and experimental small-angle neutron scattering data. Implementing appropriate convergence criteria resulted in stable and reliable solutions in correcting resolution errors originating from instrumental smearing, i.e. finite collimation and polychromaticity of the incident beam. Deviations at small momentum transfer for concentrated ensembles of spheres encountered in earlier studies are not observed. Amplification of statistical errors can be reduced by applying a noise filter after desmearing. In most cases investigated, the modified Lake algorithm yields better results with a significantly smaller number of iterations and is, therefore, suitable for automated desmearing of large numbers of data sets.
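
As a rough illustration of the kind of iteration discussed above, the sketch below applies the textbook Van Cittert update (add the current residual to the estimate) to a synthetic curve smeared by a Gaussian resolution function. It is not the authors' code. The Gaussian smearing matrix, the stopping rule based on the relative residual norm, the helper names, and all numerical values are assumptions chosen for illustration; the convergence criteria studied in the paper are not specified in the abstract and are not reproduced here.

```python
import numpy as np

def gaussian_smearing_matrix(q, sigma):
    """Row-normalized Gaussian resolution matrix (hypothetical instrument model)."""
    K = np.exp(-0.5 * ((q[:, None] - q[None, :]) / sigma) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def van_cittert(observed, R, max_iter=200, tol=1e-6):
    """Van Cittert iteration: estimate <- estimate + (observed - R @ estimate).

    Stops when the residual norm falls below tol times the norm of the
    observed data (an assumed criterion), or after max_iter iterations.
    Returns the estimate and the number of iterations performed.
    """
    estimate = observed.copy()
    for iteration in range(1, max_iter + 1):
        residual = observed - R @ estimate
        estimate = estimate + residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(observed):
            break
    return estimate, iteration

# Synthetic, noise-free test curve: a peak on a flat background.
q = np.linspace(0.0, 1.0, 200)
true_curve = np.exp(-((q - 0.5) / 0.05) ** 2) + 0.1
R = gaussian_smearing_matrix(q, sigma=0.02)
observed = R @ true_curve

desmeared, n_used = van_cittert(observed, R)
err_before = np.linalg.norm(observed - true_curve)
err_after = np.linalg.norm(desmeared - true_curve)
print(f"iterations: {n_used}, error before: {err_before:.4f}, after: {err_after:.4f}")
```

With noisy input the same iteration amplifies statistical errors as it proceeds, which is why the stopping rule and any subsequent noise filtering matter in practice.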


Author(s):  
Danlei Xu ◽  
Lan Du ◽  
Hongwei Liu ◽  
Penghui Wang

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings of the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct a nonlinear classification boundary, but also realize feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on a synthetic data set, a measured radar data set, a high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and classification accuracy comparable to that of some existing classifiers.


Geophysics ◽  
2017 ◽  
Vol 82 (3) ◽  
pp. R199-R217 ◽  
Author(s):  
Xintao Chai ◽  
Shangxu Wang ◽  
Genyang Tang

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth’s [Formula: see text]-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse [Formula: see text] filtering and generates superior [Formula: see text] compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect [Formula: see text] values. However, multiples contain information about subsurface properties. To use information carried by multiples, with the feedback model and NSRI theory, we adapt NSRI to the context of nonstationary seismic data with surface-related multiples. Consequently, not only are the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse [Formula: see text] filtering) extended, but also multiples are considered. Our method is limited to a 1D implementation. Theoretical and numerical analyses verify that given a wavelet, the input [Formula: see text] values primarily affect the inverted reflectivities and exert little effect on the estimated multiples; i.e., multiple estimation need not consider [Formula: see text] filtering effects explicitly. However, there are benefits for NSRI considering multiples. The periodicity and amplitude of the multiples imply the position of the reflectivities and amplitude of the wavelet. Multiples assist in overcoming scaling and shifting ambiguities of conventional problems in which multiples are not considered. Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support the aforementioned findings and reveal the stability, capabilities, and limitations of the proposed method.


2017 ◽  
pp. 79-90
Author(s):  
Dmytro Shushpanov ◽  
Volodymyr Sarioglo

The article considers the essence and peculiarities of microsimulation modeling. The advantages of microsimulation models over statistical models are substantiated. Microsimulation models that forecast dynamic changes in health and are most appropriate for health research and policy development, such as POHEM, CORSIM and LifePaths, are outlined. It is proposed to use elements of statistical and dynamic microsimulation modeling, agent modeling and the concept of a life course to estimate the influence of social and economic determinants. A synthetic model of the population has been formed on the basis of representative data sets of sample surveys of living conditions of households and economic activity of the population of the State Employment Service of Ukraine, as well as microdata of the Multicultural Survey of the Population of Ukraine (2012) and the Medical and Demographic Survey (2013). A generalized scheme of the method of microsimulation modeling of the influence of social and economic determinants on the health status of the population of Ukraine has been developed. The influence of the main determinants on the health of certain age, gender, and social and economic groups of the population is estimated on the basis of the methodology of synthetic data.

