Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

2018 ◽  
Author(s):  
Wei Wang ◽  
Jack Smith ◽  
Hussein A. Hejase ◽  
Kevin J. Liu

Abstract

Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors.

To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method.

To demonstrate the flexibility of the new technique, we apply SERES to two different applications, one involving aligned inputs and the other involving unaligned inputs. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods for both applications.
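For comparison, the standard site bootstrap that SERES is designed to generalize can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes the input is a list of equal-length aligned sequences, and the function name `bootstrap_alignment` is hypothetical. Columns are drawn uniformly with replacement, so the original order of sites plays no role, which is exactly the i.i.d. assumption discussed above.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Standard (i.i.d.) site bootstrap: resample alignment columns with replacement.

    Each string is one aligned sequence; all strings must have the same length.
    Hypothetical helper for illustration only.
    """
    if not alignment:
        return []
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw n_sites column indices uniformly at random, with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Rebuild every sequence from the sampled columns, in draw order.
    return ["".join(row[c] for c in cols) for row in alignment]


if __name__ == "__main__":
    rng = random.Random(0)
    aln = ["ACGTAC", "ACGTTC", "AGGTAC"]
    for row in bootstrap_alignment(aln, rng):
        print(row)
```

Every column of a replicate is a copy of some original column, but neighbouring columns in the replicate are generally unrelated sites in the input.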

2020 ◽  
Author(s):  
Wei Wang ◽  
Kevin J. Liu

Abstract

Motivation: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a means to place statistical confidence intervals on an estimated phylogeny (or estimate “phylogenetic support”). A key simplifying assumption of the bootstrap method is that input data are independent and identically distributed (i.i.d.). However, the i.i.d. assumption is an over-simplification for biomolecular sequence analysis, as Felsenstein noted. Special-purpose fully parametric or semi-parametric methods for phylogenetic support estimation have since been introduced, some of which are intended to address this concern.

Results: In this study, we introduce a new sequence-aware non-parametric resampling technique, which we refer to as RAWR (“RAndom Walk Resampling”). RAWR consists of random walks that synthesize and extend the standard bootstrap method and the “mirrored inputs” idea of Landan and Graur. We apply RAWR to the task of phylogenetic support estimation. RAWR’s performance is compared to the state of the art using synthetic and empirical data that span a range of dataset sizes and evolutionary divergence. We show that RAWR support estimates offer comparable or typically lower type I and type II error compared to phylogenetic bootstrap support as well as GUIDANCE2, a state-of-the-art purpose-built fully parametric method. Additional simulation study experiments help to clarify practical considerations regarding RAWR support estimation. We conclude with thoughts on future research directions and the untapped potential for sequence-aware non-parametric resampling and re-estimation.

Availability: Data and software are publicly available under open-source software and open data licenses at: https://gitlab.msu.edu/liulab/[email protected]
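The abstract does not give RAWR's exact walk rules, so the following is only a hypothetical sketch of the general idea of resampling sites along a random walk, not the authors' algorithm. It assumes equal-length aligned sequences, a uniformly random start site, a per-step direction-reversal probability `reverse_prob`, and reflection at the alignment ends; all of these, and the name `random_walk_resample`, are illustrative choices. Unlike the i.i.d. site bootstrap, consecutive columns in a replicate are always adjacent sites in the input, so local sequential dependence is preserved.

```python
import random
from typing import List


def random_walk_resample(alignment: List[str], reverse_prob: float,
                         rng: random.Random) -> List[str]:
    """Hypothetical sequence-aware resampler: collect columns along a random walk.

    The walk starts at a uniformly random site and moves one site per step.
    Before each step it reverses direction with probability reverse_prob,
    and it also reverses whenever the next step would leave the alignment.
    """
    if not 0.0 <= reverse_prob <= 1.0:
        raise ValueError("reverse_prob must lie in [0, 1]")
    n_sites = len(alignment[0])
    if n_sites < 2:
        raise ValueError("need at least two sites")
    pos = rng.randrange(n_sites)
    step = rng.choice((-1, 1))
    cols = [pos]
    while len(cols) < n_sites:
        if rng.random() < reverse_prob:
            step = -step
        if not 0 <= pos + step < n_sites:  # bounce off either alignment end
            step = -step
        pos += step
        cols.append(pos)
    # Replicate has the same length as the input; columns follow the walk.
    return ["".join(row[c] for c in cols) for row in alignment]


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_walk_resample(["ACGTACGT", "ACGTTCGA"], 0.2, rng))
```

With `reverse_prob=0` the walk traverses contiguous runs of sites and turns back only at the ends; a walk that starts at the last site heading left reproduces the reversed (mirrored) column order. Larger values keep each replicate more local.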


Author(s):  
Suman Debnath ◽  
Anirban Banik ◽  
Tarun Kanti Bandyopadhyay ◽  
Mrinmoy Majumder ◽  
Apu Kumar Saha

2021 ◽  
pp. 1-14
Author(s):  
Ana López-Cheda ◽  
María-Amalia Jácome ◽  
Ricardo Cao ◽  
Pablo M. De Salazar

Author(s):  
Mehdi Ahmadian ◽  
Xubin Song

Abstract

A non-parametric model for magneto-rheological (MR) dampers is presented. After the merits of parametric and non-parametric models for MR dampers are discussed, test data for an MR damper are used to develop a non-parametric model. The model's predictions are compared with the test data to illustrate its accuracy. The comparison shows that the non-parametric model accurately predicts the damper force characteristics, including the damper non-linearity and electro-magnetic saturation. It is further shown that the non-parametric model can be solved numerically more efficiently than the parametric models.


Author(s):  
Tatsunori B. Hashimoto ◽  
David Alvarez-Melis ◽  
Tommi S. Jaakkola

Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are indeed consistent with a Euclidean semantic space hypothesis. Framing word embedding as metric recovery of a semantic space unifies existing word embedding algorithms, ties them to manifold learning, and demonstrates that existing algorithms are consistent metric recovery methods given co-occurrence counts from random walks. Furthermore, we propose a simple, principled, direct metric recovery algorithm that performs on par with the state-of-the-art word embedding and manifold learning methods. Finally, we complement the recent focus on analogies by constructing two new inductive reasoning datasets (series completion and classification) and demonstrate that word embeddings can be used to solve them as well.


Author(s):  
Gavin C. Hudson-Lamb ◽  
Johan P. Schoeman ◽  
Emma H. Hooijberg ◽  
Sonja K. Heinrich ◽  
Adrian S.W. Tordiffe

Published haematologic and serum biochemistry reference intervals are scarce for captive cheetahs and even scarcer for free-ranging cheetahs. The current study was performed to establish reference intervals for selected serum biochemistry analytes in cheetahs. Baseline serum biochemistry analytes were analysed in samples from 66 healthy Namibian cheetahs: 30 captive cheetahs at the AfriCat Foundation and 36 free-ranging cheetahs from central Namibia. The effects of captivity status, age, sex and haemolysis score on the tested serum analytes were investigated. The analytes measured were sodium, potassium, magnesium, chloride, urea and creatinine. The 90% confidence interval of each reference limit was obtained using the non-parametric bootstrap method. Reference intervals were preferentially determined by the non-parametric method and were as follows: sodium (128 mmol/L – 166 mmol/L), potassium (3.9 mmol/L – 5.2 mmol/L), magnesium (0.8 mmol/L – 1.2 mmol/L), chloride (97 mmol/L – 130 mmol/L), urea (8.2 mmol/L – 25.1 mmol/L) and creatinine (88 µmol/L – 288 µmol/L). Reference intervals from the current study were compared with International Species Information System values for cheetahs and found to be narrower. Age, sex and haemolysis score had no significant effect on the serum analytes in this study. Separate reference intervals for captive and free-ranging cheetahs were also determined. Captive cheetahs had higher urea values, most likely due to dietary factors. This study is the first to establish reference intervals for serum biochemistry analytes in cheetahs according to international guidelines. These results can be used for future health and disease assessments in both captive and free-ranging cheetahs.
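The non-parametric reference-interval procedure described above can be sketched as follows. Assumptions not stated in the abstract: the reference interval is taken as the central 95% of values (2.5th to 97.5th percentiles), the usual convention in clinical reference-interval guidelines, and the 90% confidence interval of each limit is a simple percentile bootstrap. The function names are hypothetical, and the demo values are synthetic, not data from the study.

```python
import random
import statistics
from typing import List, Tuple


def reference_interval(values: List[float]) -> Tuple[float, float]:
    """Non-parametric central 95% interval: 2.5th and 97.5th percentiles (assumed)."""
    cuts = statistics.quantiles(values, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def bootstrap_limit_cis(values: List[float], n_boot: int, rng: random.Random
                        ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """90% percentile-bootstrap confidence intervals for the lower and upper limits."""
    lowers: List[float] = []
    uppers: List[float] = []
    for _ in range(n_boot):
        # Resample individual animals' values with replacement.
        sample = rng.choices(values, k=len(values))
        lo, hi = reference_interval(sample)
        lowers.append(lo)
        uppers.append(hi)

    def ci90(xs: List[float]) -> Tuple[float, float]:
        cuts = statistics.quantiles(xs, n=20)  # cut points at 5%, 10%, ..., 95%
        return cuts[0], cuts[-1]

    return ci90(lowers), ci90(uppers)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, illustrative values only (NOT study data); n = 66 as in the study.
    synthetic = [rng.gauss(4.5, 0.35) for _ in range(66)]
    lo, hi = reference_interval(synthetic)
    lo_ci, hi_ci = bootstrap_limit_cis(synthetic, n_boot=2000, rng=rng)
    print(f"RI: {lo:.2f}-{hi:.2f}")
    print(f"90% CI of lower limit: {lo_ci[0]:.2f}-{lo_ci[1]:.2f}")
    print(f"90% CI of upper limit: {hi_ci[0]:.2f}-{hi_ci[1]:.2f}")
```

With only 66 animals the 2.5th and 97.5th percentiles are interpolated from the two most extreme values at each end, which is why the bootstrap confidence intervals of the limits tend to be wide.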
