Non-parametric and Semi-parametric Support Estimation Using SEquential RESampling Random Walks on Biomolecular Sequences

Author(s):  
Wei Wang ◽  
Jack Smith ◽  
Hussein A. Hejase ◽  
Kevin J. Liu
2018

Abstract

Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors.

To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method.

To demonstrate the flexibility of the new technique, we apply SERES to two different applications: one involving aligned inputs and the other involving unaligned inputs. Using simulated and empirical data, we show that SERES-based support estimation yields performance that is comparable to, and typically better than, that of state-of-the-art methods for both applications.
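The contrast the abstract draws between i.i.d. bootstrap resampling and sequential resampling can be illustrated with a minimal Python sketch. This is not the published SERES algorithm: the function names, the `reverse_prob` parameter, and the reflect-at-the-ends rule are all assumptions made for illustration. The only ideas taken from the abstract are that the bootstrap samples sites with replacement and that the sequential alternative walks along the sequence.

```python
import random


def bootstrap_columns(n_sites, rng):
    """Standard bootstrap: draw n_sites column indices i.i.d. with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


def random_walk_columns(n_sites, reverse_prob, rng):
    """Hypothetical sequential resampler (an assumption, not the SERES procedure).

    Starts at a random column and steps to a neighbouring column each time,
    reversing direction with probability reverse_prob and reflecting at
    either end of the sequence, so adjacent sampled sites stay adjacent.
    """
    pos = rng.randrange(n_sites)
    direction = rng.choice((-1, 1))
    walk = [pos]
    while len(walk) < n_sites:
        if rng.random() < reverse_prob:
            direction = -direction
        nxt = pos + direction
        if nxt < 0 or nxt >= n_sites:
            # Reflect off the end of the sequence.
            direction = -direction
            nxt = pos + direction
        pos = nxt
        walk.append(pos)
    return walk


def resample(alignment, columns):
    """Build one replicate by reading the chosen columns from every row."""
    return ["".join(row[c] for c in columns) for row in alignment]


if __name__ == "__main__":
    rng = random.Random(0)
    alignment = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTTC"]
    n = len(alignment[0])
    print(resample(alignment, bootstrap_columns(n, rng)))
    print(resample(alignment, random_walk_columns(n, 0.1, rng)))
```

In the bootstrap replicate, neighbouring columns are unrelated draws; in the walk replicate, every sampled column is adjacent in the input to the one before it, which is the sense in which sequential dependence is preserved in this sketch.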


2020 ◽  
Author(s):  
Wei Wang ◽  
Kevin J. Liu

Abstract

Motivation: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a means to place statistical confidence intervals on an estimated phylogeny (or estimate “phylogenetic support”). A key simplifying assumption of the bootstrap method is that input data are independent and identically distributed (i.i.d.). However, the i.i.d. assumption is an over-simplification for biomolecular sequence analysis, as Felsenstein noted. Special-purpose fully parametric or semi-parametric methods for phylogenetic support estimation have since been introduced, some of which are intended to address this concern.

Results: In this study, we introduce a new sequence-aware non-parametric resampling technique, which we refer to as RAWR (“RAndom Walk Resampling”). RAWR consists of random walks that synthesize and extend the standard bootstrap method and the “mirrored inputs” idea of Landan and Graur. We apply RAWR to the task of phylogenetic support estimation. RAWR’s performance is compared to the state of the art using synthetic and empirical data that span a range of dataset sizes and evolutionary divergence. We show that RAWR support estimates offer comparable or typically superior type I and type II error compared to phylogenetic bootstrap support as well as GUIDANCE2, a state-of-the-art purpose-built fully parametric method. Additional simulation study experiments help to clarify practical considerations regarding RAWR support estimation. We conclude with thoughts on future research directions and the untapped potential for sequence-aware non-parametric resampling and re-estimation.

Availability: Data and software are publicly available under open-source software and open data licenses at: https://gitlab.msu.edu/liulab/[email protected]


2018 ◽  
Vol 5 (1) ◽  
pp. 52 ◽  
Author(s):  
Bishart Chang

The main purpose of this study is to determine the weak-form efficiency of emerging gold markets, namely China, India and Russia, with a special focus on testing the random walk (RWS) and martingale difference sequence (MDS) hypotheses over different periods of time. The study applies bias-free statistical techniques, such as the runs test, parametric variance ratio tests, and recent modified non-parametric variance ratio tests based on ranks and signs, to daily spot gold prices from January 12, 1993 to October 28, 2016. The findings suggest that the Russian gold market is weak-form efficient throughout the period, whereas the other two markets are weak-form efficient only during the second sub-period, that is, January 2000 to December 2005.
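As a rough illustration of what a variance ratio test measures, the sketch below computes a basic, uncorrected variance ratio on log prices in Python. It is not the procedure used in the study: the rank- and sign-based tests, bias corrections, and significance testing are all omitted, and the function name `variance_ratio` and the simulated prices are assumptions for illustration. Under a random walk, the variance of q-period returns is about q times the variance of one-period returns, so the ratio should be close to 1.

```python
import math
import random


def variance_ratio(prices, q):
    """Basic variance ratio VR(q) on log prices (no bias correction).

    VR(q) = Var(q-period log return) / (q * Var(1-period log return)),
    using overlapping q-period returns.
    """
    logp = [math.log(p) for p in prices]
    r1 = [b - a for a, b in zip(logp, logp[1:])]
    mu = sum(r1) / len(r1)
    var1 = sum((r - mu) ** 2 for r in r1) / len(r1)
    rq = [logp[i + q] - logp[i] for i in range(len(logp) - q)]
    varq = sum((r - q * mu) ** 2 for r in rq) / len(rq)
    return varq / (q * var1)


if __name__ == "__main__":
    # Simulated random-walk prices (illustrative data, not gold prices).
    rng = random.Random(1)
    level, prices = 0.0, []
    for _ in range(5000):
        level += rng.gauss(0.0, 0.01)
        prices.append(100.0 * math.exp(level))
    print(variance_ratio(prices, 5))
```

A ratio well below 1 points to mean reversion in returns and a ratio well above 1 to positive serial correlation; either is evidence against the random walk hypothesis in this simplified setting.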


Author(s):  
Mikhail Menshikov ◽  
Serguei Popov ◽  
Andrew Wade