Variance in Variants: Propagating Genome Sequence Uncertainty into Phylogenetic Lineage Assignment

2021 ◽  
Author(s):  
David Champredon ◽  
Devan G Becker ◽  
Connor Chato ◽  
Gopi Gugan ◽  
Art G Poon

Genetic sequencing is subject to many different types of error, but most analyses treat the resulting sequences as if they were known without error. Next-generation sequencing methods rely on far larger numbers of reads than previous methods in exchange for a loss of accuracy in each individual read. Even so, the coverage of such machines is imperfect and leaves uncertainty in many of the base calls. On top of this machine-level uncertainty, there is uncertainty introduced by human error, such as mistakes in data entry or incorrect parameter settings. In this work, we demonstrate that uncertainty in sequencing techniques affects downstream analysis and propose a straightforward method to propagate it. Our method uses a probabilistic matrix representation of individual sequences that incorporates base quality scores as a measure of uncertainty, which naturally leads to resampling and replication as a framework for uncertainty propagation. With the matrix representation, resampling possible base calls according to quality scores provides a bootstrap- or prior-distribution-like first step for genetic analysis, and analyses based on these resampled sequences include a more complete evaluation of the error involved. We demonstrate our resampling method on SARS-CoV-2 data. The resampling procedure adds a linear computational cost to the analyses, but its large impact on the variance of downstream estimates makes it clear that ignoring this uncertainty may lead to overly confident conclusions. We show that SARS-CoV-2 lineage designations via Pangolin are much less certain than the bootstrap support reported by Pangolin would imply, and that clock rate estimates for SARS-CoV-2 are much more variable than reported.
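The probabilistic matrix representation and resampling step can be sketched as follows. This is an illustrative sketch, not the authors' implementation: spreading the error mass uniformly over the three alternative bases is a simplifying assumption.

```python
import numpy as np

BASES = np.array(list("ACGT"))

def phred_to_error_prob(q):
    """Convert Phred quality scores Q to the probability that the call is wrong."""
    return 10.0 ** (-np.asarray(q, dtype=float) / 10.0)

def sequence_to_matrix(seq, quals):
    """Probabilistic (L x 4) representation: each row is a distribution over
    A, C, G, T. The called base keeps mass 1 - p_err; the error mass is
    spread uniformly over the three alternatives (a simplifying assumption)."""
    p_err = phred_to_error_prob(quals)
    mat = np.zeros((len(seq), 4))
    for i, (base, p) in enumerate(zip(seq, p_err)):
        mat[i, :] = p / 3.0
        mat[i, BASES.tolist().index(base)] = 1.0 - p
    return mat

def resample_sequence(mat, rng):
    """Draw one plausible sequence by sampling each position independently."""
    idx = [rng.choice(4, p=row) for row in mat]
    return "".join(BASES[idx])

# 200 replicate sequences: high-quality positions barely vary, while the
# low-quality call (Q = 3, error probability ~0.5) flips about half the time.
rng = np.random.default_rng(1)
mat = sequence_to_matrix("ACGT", [40, 40, 10, 3])
replicates = [resample_sequence(mat, rng) for _ in range(200)]
```

Running any downstream analysis on each replicate, here a lineage assignment or clock rate estimate, then yields a distribution of results that reflects the base-call uncertainty.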

Author(s):  
Alessandra Cuneo ◽  
Alberto Traverso ◽  
Shahrokh Shahpar

In engineering design, uncertainty is inevitable and can cause significant deviations in the performance of a system. Uncertainty in input parameters can be categorized into two groups: aleatory and epistemic. The work presented here focuses on aleatory uncertainty, which causes natural, unpredictable and uncontrollable variations in the performance of the system under study. Such uncertainty can be quantified using statistical methods, but the main obstacle is often the computational cost, because the representative model is typically highly non-linear and complex. It is therefore necessary to have a robust tool that can perform the uncertainty propagation with as few evaluations as possible. In recent years, different methodologies for uncertainty propagation and quantification have been proposed. The focus of this study is to evaluate four methods and demonstrate the strengths and weaknesses of each approach. The first is Monte Carlo simulation, a sampling method that can give high accuracy but needs a relatively large computational effort. The second is Polynomial Chaos, an approximation method in which the probabilistic parameters of the response function are modelled with orthogonal polynomials. The third is the Mid-range Approximation Method, based on assembling multiple meta-models into one model to perform optimisation under uncertainty. The fourth applies the first two methods not to the model directly but to a response surface representing it, to decrease the computational cost. All these methods have been applied to a set of analytical test functions and engineering test cases. Relevant aspects of engineering design and analysis, such as a high number of stochastic variables and optimised design problems with and without stochastic design parameters, were assessed. Polynomial Chaos emerged as the most promising methodology and was then applied to a turbomachinery test case based on a thermal analysis of a high-pressure turbine disk.
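The first of the four methods, plain Monte Carlo propagation, can be sketched in a few lines. The response function and input distributions below are illustrative assumptions standing in for an expensive non-linear model.

```python
import numpy as np

def model(x1, x2):
    """Hypothetical nonlinear response, standing in for an expensive solver."""
    return np.sin(x1) + 0.1 * x2 ** 2

def monte_carlo_propagation(n_samples, rng):
    """Propagate aleatory input uncertainty by brute-force sampling and
    summarize the resulting output distribution."""
    x1 = rng.normal(0.0, 0.3, n_samples)  # assumed input distributions
    x2 = rng.normal(1.0, 0.2, n_samples)
    y = model(x1, x2)
    return y.mean(), y.std(ddof=1)

rng = np.random.default_rng(0)
mean, std = monte_carlo_propagation(100_000, rng)
```

The sampling error of the estimates shrinks only as the inverse square root of the sample count, which is exactly the computational-cost obstacle that motivates the cheaper alternatives evaluated in the study.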


2019 ◽  
Vol 141 (6) ◽  
Author(s):  
M. Giselle Fernández-Godino ◽  
S. Balachandar ◽  
Raphael T. Haftka

When simulations are expensive and multiple realizations are necessary, as is the case in uncertainty propagation, statistical inference, and optimization, surrogate models can achieve accurate predictions at low computational cost. In this paper, we explore options for improving the accuracy of a surrogate when the modeled phenomenon presents symmetries. These symmetries provide information for free and, therefore, the possibility of more accurate predictions. We present an analytical example along with a physical example that has parametric symmetries. Although imposing parametric symmetries in surrogate models seems a trivial matter, there is no single way to do it and, furthermore, the achieved accuracy can vary. We present four different ways of using symmetry in surrogate models. Three of them are straightforward, but the fourth is original and based on an optimization of the subset of points used. The performance of the options was compared with 100 random designs of experiments (DoEs) in which symmetries were not imposed. We found that each of the options performed best in one or more of the studied cases and, in all cases, the errors obtained by imposing symmetries were substantially smaller than the worst cases among the 100. We explore the options in two surrogates that present different challenges and opportunities: Kriging and linear regression. Kriging is often used as a black box, so we consider approaches that include the symmetries without changes to the main code. Linear regression, owing to its simplicity, is often built by the user, so we also consider approaches that modify the linear regression basis functions to impose the symmetries.
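One straightforward way to impose a parametric symmetry in a linear regression surrogate is to restrict the basis functions so the symmetry holds by construction, here by keeping only even powers so that f(x) = f(-x). The test function, degree and sample size are illustrative, not taken from the paper.

```python
import numpy as np

def fit_poly(x, y, degree, symmetric=False):
    """Least-squares polynomial surrogate; with symmetric=True only even
    powers are kept, so f(x) = f(-x) holds by construction."""
    powers = list(range(0, degree + 1, 2)) if symmetric else list(range(degree + 1))
    X = np.column_stack([x ** p for p in powers])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda t: np.column_stack([t ** p for p in powers]) @ coef

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 30)
y = np.cos(3.0 * x) + 0.01 * rng.normal(size=30)  # an even (symmetric) truth
surrogate = fit_poly(x, y, degree=6, symmetric=True)
```

Because the basis spans only even functions, every sampled point effectively contributes information at both x and -x, which is the "free information" the symmetry provides.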


2018 ◽  
Vol 12 (3) ◽  
pp. 143-157 ◽  
Author(s):  
Håvard Raddum ◽  
Pavol Zajac

Abstract We show how to build a binary matrix from the MRHS representation of a symmetric-key cipher. The matrix contains the cipher represented as an equation system and can be used to assess a cipher’s resistance against algebraic attacks. We give an algorithm for solving the system and compute its complexity. The complexity is normally close to exhaustive search on the variables representing the user-selected key. Finally, we show that for some variants of LowMC, the joined MRHS matrix representation can be used to speed up regular encryption in addition to exhaustive key search.


Author(s):  
A. Javed ◽  
R. Pecnik ◽  
J. P. van Buijtenen

Compressor impellers for mass-market turbochargers are die-cast and machined with the aim of achieving high dimensional accuracy and specific performance. However, manufacturing uncertainties result in dimensional deviations that degrade operational performance and cause assembly errors. Process capability limitations of the manufacturer can increase part rejections, resulting in high production cost. This paper presents a study of a centrifugal impeller, with a focus on the conceptual design phase, aimed at obtaining a turbomachine that is robust to manufacturing uncertainties. The impeller has been parameterized and evaluated using a commercial computational fluid dynamics (CFD) solver. Considering the computational cost of CFD, a surrogate model has been prepared for the impeller by response surface methodology (RSM) using space-filling Latin hypercube designs. A sensitivity analysis was performed first to identify the geometric parameters that most strongly influence performance, followed by uncertainty propagation and quantification using surrogate-model-based Monte Carlo simulation. Finally, a robust design optimization has been carried out using a stochastic optimization algorithm, leading to an impeller design whose performance is relatively insensitive to geometric variability without reducing the sources of inherent variation, i.e., the manufacturing noise.
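A space-filling Latin hypercube design of the kind used to train the response surface can be sketched as follows. This is a generic construction on the unit hypercube, not the authors' specific tool; sampled points would then be rescaled to the ranges of the geometric parameters.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Space-filling Latin hypercube design on the unit hypercube: each
    dimension is stratified into n_samples equal bins with exactly one
    point per bin, and the bin orderings are permuted independently
    across dimensions."""
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for d in range(n_dims):
        u[:, d] = u[rng.permutation(n_samples), d]
    return u

rng = np.random.default_rng(3)
design = latin_hypercube(20, 3, rng)  # 20 samples over 3 geometric parameters
```

The stratification guarantees that every one-dimensional projection of the design is evenly covered, which is why far fewer CFD evaluations are needed than with plain random sampling.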


Nematology ◽  
2011 ◽  
Vol 13 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Blanca Landa ◽  
Carolina Cantalapiedra-Navarrete ◽  
Juan Palomares-Rius ◽  
Pablo Castillo ◽  
Carlos Gutiérrez-Gutiérrez

Abstract During a recent nematode survey in natural environments of the Los Alcornocales Regional Park in southern Spain, specifically in the narrow valleys excavated in the mountains, the renowned 'canutos', which maintain a humid microclimate, an amphimictic population of Xiphinema globosum was identified. Morphological and morphometric studies on this population fit the original and previous descriptions, and it represents the first report from Spain and southern Europe. Molecular characterisation of X. globosum from Spain using the D2-D3 expansion regions of 28S rRNA, 18S rRNA and ITS1-rRNA is provided, and maximum likelihood and Bayesian inference analyses were used to reconstruct phylogenetic relationships between X. globosum and other Xiphinema species. A supertree solution of the different phylogenetic trees obtained in this study and in other published studies using rDNA genes is presented, using the matrix representation parsimony method (MRP) and the most similar supertree method (MSSA). The results revealed a close phylogenetic relationship of X. globosum with X. diversicaudatum, X. bakeri and some sequences of unidentified Xiphinema spp. deposited in GenBank.


Geophysics ◽  
2016 ◽  
Vol 81 (3) ◽  
pp. S101-S117 ◽  
Author(s):  
Alba Ordoñez ◽  
Walter Söllner ◽  
Tilman Klüver ◽  
Leiv J. Gelius

Several studies have shown the benefits of including multiple reflections together with primaries in the structural imaging of subsurface reflectors. However, to characterize the reflector properties, there is a need to compensate for propagation effects due to multiple scattering and to properly combine the information from primaries and all orders of multiples. From this perspective and based on the wave equation and Rayleigh’s reciprocity theorem, recent works have suggested computing the subsurface image from the Green’s function reflection response (or reflectivity) by inverting a Fredholm integral equation in the frequency-space domain. By following Claerbout’s imaging principle and assuming locally reacting media, the integral equation may be reduced to a trace-by-trace deconvolution imaging condition. For a complex overburden and considering that the structure of the subsurface is angle-dependent, this trace-by-trace deconvolution does not properly solve the Fredholm integral equation. We have inverted for the subsurface reflectivity by solving the matrix version of the Fredholm integral equation at every subsurface level, based on a multidimensional deconvolution of the receiver wavefields with the source wavefields. The total upgoing pressure and the total filtered downgoing vertical velocity were used as receiver and source wavefields, respectively. By selecting appropriate subsets of the inverted reflectivity matrix and by performing an inverse Fourier transform over the frequencies, the process allowed us to obtain wavefields corresponding to virtual sources and receivers located in the subsurface at a given level. The method has been applied to two synthetic examples, showing that the computed reflectivity wavefields are free of propagation effects from the overburden and thus are suited to extracting information about the image point location in the angular and spatial domains. To reduce the computational cost, our approach is target-oriented, i.e., the reflectivity may be computed only in the area of most interest.
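The per-frequency matrix inversion at the heart of such a multidimensional deconvolution can be sketched as a damped least-squares solve of P = R D. The shapes and the Tikhonov regularization below are illustrative assumptions; the paper solves the matrix Fredholm integral equation at every subsurface level.

```python
import numpy as np

def mdd_per_frequency(P, D, eps=1e-6):
    """Multidimensional deconvolution sketch: at each frequency, solve the
    matrix relation P = R D for the reflectivity R with a damped
    least-squares (Tikhonov-regularized) inverse,
        R = P D^H (D D^H + eps * I)^{-1}.
    P (receiver wavefields) and D (source wavefields) are complex arrays of
    shape (n_freq, n, n); square shapes are an illustrative simplification."""
    n_freq, n, _ = P.shape
    R = np.empty_like(P)
    for f in range(n_freq):
        Dh = D[f].conj().T
        R[f] = P[f] @ Dh @ np.linalg.inv(D[f] @ Dh + eps * np.eye(n))
    return R
```

Unlike a trace-by-trace (element-wise) deconvolution, the matrix inverse couples all sources and receivers at each frequency, which is what removes the angle-dependent propagation effects of the overburden.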


Author(s):  
S.N. Masaev

The purpose of the study was to address the problem of controlling a high-dimensional dynamic system. Relying on the Leontief input-output balance, we formalized the dynamic system and synthesized its control. Within the research, we developed a mathematical model that combines different working objects that consume and release various resources. The value of the penalty for all nodes and objects is introduced into the matrix representation of the problem, taking into account various options for their interaction, i.e., the observation problem. A matrix representation of the planning task at each working object is formed. For the resulting system, a control loop is created and the influencing parameters of the external environment are indicated. We calculated the system's operational mode, taking into account the interaction of the nodes of objects with each other when the parameters of the external environment influence them. The findings show that, in achieving a complex result, the system is inefficient without optimal planning and without accounting for the matrix of penalties for the interaction of nodes and objects of the dynamic system. In a specific example, for a dynamic system with a dimension of 4.8 million parameters, we estimated the control taking the penalty matrix into account, which made it possible to increase the inflow of additional resources from outside by a factor of 2.4 in 5 years, from 130 billion conv. units up to 310 billion conv. units. Taking into account the maximum optimization of control in the nodes, a 3.66-fold increase in the inflow of additional resources was ensured, from 200.46 to 726.62 billion rubles.


Author(s):  
A. D. Chowdhury ◽  
S. K. Bhattacharyya ◽  
C. P. Vendhan

The normal mode method is widely used in ocean acoustic propagation, and finite difference and finite element methods are usually used in its solution. Recently, a method has been proposed for heterogeneous layered waveguides in which the depth eigenproblem is solved using the classical Rayleigh–Ritz approximation. The method has high accuracy for low- to high-frequency problems. However, the matrices that appear in the eigenvalue problem for the radial wavenumbers require numerical integration of the matrix elements, since the sound speed and density profiles are defined numerically. In this paper, a technique is proposed to reduce the computational cost of the Rayleigh–Ritz method by expanding the sound speed profile in a Fourier series using a nonlinear least-squares fit, so that the integrals of the matrix elements can be computed in closed form. This technique is tested on a variety of problems and found to be sufficiently accurate in obtaining the radial wavenumbers as well as the transmission loss in a waveguide. The computational savings obtained by this approach are substantial, the improvement being one to two orders of magnitude.
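The core idea, expanding the profile in a Fourier series so that the matrix-element integrals become closed-form, can be sketched as follows. This is a simplification: with the fundamental period fixed to the water depth the fit reduces to linear least squares, whereas the paper fits the series with a nonlinear least-squares procedure.

```python
import numpy as np

def fourier_fit(z, c, n_terms, depth):
    """Fit a sound speed profile c(z) on [0, depth] with a truncated Fourier
    series. Simplification: the fundamental period is fixed to the water
    depth, making the fit linear in the coefficients (the paper instead
    uses a nonlinear least-squares fit)."""
    k = np.arange(1, n_terms + 1)

    def basis(zq):
        arg = np.outer(zq, k) * np.pi / depth
        return np.hstack([np.ones((len(zq), 1)), np.cos(arg), np.sin(arg)])

    coef, *_ = np.linalg.lstsq(basis(z), c, rcond=None)
    return coef, lambda zq: basis(zq) @ coef
```

Because products of these sines and cosines integrate in closed form over [0, depth], the Rayleigh–Ritz matrix elements no longer require numerical quadrature, which is the source of the reported speed-up.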

