A unified genealogy of modern and ancient genomes

Author(s):  
Anthony Wilder Wohns ◽  
Yan Wong ◽  
Ben Jeffery ◽  
Ali Akbari ◽  
Swapan Mallick ◽  
...  

Abstract: The sequencing of modern and ancient genomes from around the world has revolutionised our understanding of human history and evolution [1,2]. However, the general problem of how best to characterise the full complexity of ancestral relationships from the totality of human genomic variation remains unsolved. Patterns of variation in each data set are typically analysed independently, and often using parametric models or data reduction techniques that cannot capture the full complexity of human ancestry [3,4]. Moreover, variation in sequencing technology [5,6], data quality [7] and in silico processing [8,9], coupled with complexities of data scale [10], limit the ability to integrate data sources. Here, we introduce a non-parametric approach to inferring human genealogical history that overcomes many of these challenges and enables us to build the largest genealogy of both modern and ancient humans yet constructed. The genealogy provides a lossless and compact representation of multiple datasets, addresses the challenges of missing and erroneous data, and benefits from using ancient samples to constrain and date relationships. Using simulations and empirical analyses, we demonstrate the power of the method to recover relationships between individuals and populations, as well as to identify descendants of ancient samples. Finally, we show how applying a simple non-parametric estimator of ancestor geographical location to the inferred genealogy recapitulates key events in human history. Our results demonstrate that whole-genome genealogies are a powerful means of synthesising genetic data and provide rich insights into human evolution.

Metrika ◽  
2021 ◽  
Author(s):  
Jorge Navarro

Abstract: The purpose of the paper is to provide a general method based on conditional quantile curves to predict record values from preceding records. The predictions are based on conditional median (or median regression) curves. Moreover, conditional quantile curves are used to provide confidence bands for these predictions. The method is based on the recently introduced concept of multivariate distorted distributions that are used instead of copulas to represent the dependence structure. This concept allows us to compute the conditional quantile curves in a simple way. The theoretical findings are illustrated with a non-parametric model (standard uniform), two parametric models (exponential and Pareto), and a non-parametric procedure for the general case. A real data set and a simulated case study in reliability are analysed.
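As a rough illustration of what a conditional median prediction with a quantile band looks like, the sketch below covers only the simplest textbook case: i.i.d. exponential observations, where memorylessness makes the excess of the next upper record over the current one exponential again. It is not the paper's distorted-distribution method, and the function names and the `rate` parameter are assumptions introduced here for illustration.

```python
import math


def next_record_quantile(current_record, rate, p):
    """p-quantile of the next upper record given the current one.

    Assumes i.i.d. Exp(rate) observations (an illustrative assumption).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    # Memorylessness: the excess over the current record is again Exp(rate),
    # so its p-quantile is -log(1 - p) / rate.
    return current_record - math.log(1.0 - p) / rate


def predict_next_record(current_record, rate, level=0.90):
    """Return (lower, median, upper): a median prediction with a central band."""
    alpha = 1.0 - level
    lower = next_record_quantile(current_record, rate, alpha / 2.0)
    median = next_record_quantile(current_record, rate, 0.5)
    upper = next_record_quantile(current_record, rate, 1.0 - alpha / 2.0)
    return lower, median, upper
```

Calling `predict_next_record(2.0, 1.0)` returns the median prediction `2 + ln 2` together with a central 90% band around it. For other parent distributions the conditional quantile curves are no longer simple shifts of the current record.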


2007 ◽  
Vol 19 (3) ◽  
pp. 672-705 ◽  
Author(s):  
Wilson Truccolo ◽  
John P. Donoghue

Statistical nonparametric modeling tools that enable the discovery and approximation of functional forms (e.g., tuning functions) relating neural spiking activity to relevant covariates are desirable tools in neuroscience. In this article, we show how stochastic gradient boosting regression can be successfully extended to the modeling of spiking activity data while preserving their point process nature, thus providing a robust nonparametric modeling tool. We formulate stochastic gradient boosting in terms of approximating the conditional intensity function of a point process in discrete time and use the standard likelihood of the process to derive the loss function for the approximation problem. To illustrate the approach, we apply the algorithm to the modeling of primary motor and parietal spiking activity as a function of spiking history and kinematics during a two-dimensional reaching task. Model selection, goodness of fit via the time rescaling theorem, model interpretation via partial dependence plots, ranking of covariates according to their relative importance, and prediction of peri-event time histograms are illustrated and discussed. Additionally, we use the tenfold cross-validated log likelihood of the modeled neural processes (67 cells) to compare the performance of gradient boosting regression to two alternative approaches: standard generalized linear models (GLMs) and Bayesian P-splines with Markov chain Monte Carlo (MCMC) sampling. In our data set, gradient boosting outperformed both Bayesian P-splines (in approximately 90% of the cells) and GLMs (100%). Because of its good performance and computational efficiency, we propose stochastic gradient boosting regression as an off-the-shelf nonparametric tool for initial analyses of large neural data sets (e.g., more than 50 cells; more than 10^5 samples per cell) with corresponding multidimensional covariate spaces (e.g., more than four covariates). In the cases where a functional form might be amenable to a more compact representation, gradient boosting might also lead to the discovery of simpler, parametric models.
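A minimal sketch of how a likelihood-derived loss can drive boosting is given below. It assumes the Poisson form of the discrete-time point process likelihood and models the log of the expected spike count per bin. The function names, the constant base learner, and the omission of random subsampling (the "stochastic" part) are simplifications introduced here, not the authors' implementation.

```python
import math


def log_likelihood(spike_counts, log_rates):
    """Discrete-time Poisson log-likelihood, dropping terms constant in the model.

    log_rates[k] is F_k = log(lambda_k * delta), the log expected count in bin k.
    """
    return sum(n * f - math.exp(f) for n, f in zip(spike_counts, log_rates))


def pseudo_residuals(spike_counts, log_rates):
    """Gradient of the log-likelihood with respect to each F_k.

    This is the observed count minus the expected count in each bin.
    """
    return [n - math.exp(f) for n, f in zip(spike_counts, log_rates)]


def boost_constant_step(spike_counts, log_rates, learning_rate=0.1):
    """One boosting iteration with the simplest possible base learner.

    A constant is fitted to the pseudo-residuals by least squares (their mean)
    and added to the current model, scaled by the learning rate.
    """
    residuals = pseudo_residuals(spike_counts, log_rates)
    shift = learning_rate * sum(residuals) / len(residuals)
    return [f + shift for f in log_rates]
```

In a fuller version the constant would be replaced by a regression tree fitted to the pseudo-residuals as a function of the covariates, and each iteration would use a random subsample of the bins.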


2015 ◽  
Vol 0 (0) ◽  
Author(s):  
Nikola Gradojevic

Abstract: This paper builds a novel multi-criteria, non-parametric classification framework in order to improve the accuracy of pricing European options. The proposed approach is based on classifying financial options according to their implied volatility, time to maturity and moneyness. Using a recent data set for the daily S&P 500 index call options, the multi-criteria modular neural network model demonstrates its superior out-of-sample pricing performance relative to competing parametric and non-parametric models. By observing the model’s pricing errors across various option types, the analysis provides additional insights into pricing biases and stresses the importance of selecting appropriate classification criteria.


2021 ◽  
pp. 135481662110088
Author(s):  
Sefa Awaworyi Churchill ◽  
John Inekwe ◽  
Kris Ivanovski

Using a historical data set and recent advances in non-parametric time series modelling, we investigate the nexus between tourism flows and house prices in Germany over nearly 150 years. We use time-varying non-parametric techniques given that historical data tend to exhibit abrupt changes and other forms of non-linearities. Our findings show evidence of a time-varying effect of tourism flows on house prices, although with mixed effects. The pre-World War II time-varying estimates of tourism show both positive and negative effects on house prices. While changes in tourism flows contribute to increasing housing prices over the post-1950 period, this is short-lived, and the effect declines until the mid-1990s. However, we find a positive and significant relationship after 2000, where the impact of tourism on house prices becomes more pronounced in recent years.


2021 ◽  
Author(s):  
Zaynab Shaik ◽  
Nicola Georgina Bergh ◽  
Bengt Oxelman ◽  
Anthony George Verboom

We applied species delimitation methods based on the Multi-Species Coalescent (MSC) model to 500+ loci derived from genotyping-by-sequencing on the South African Seriphium plumosum (Asteraceae) species complex. The loci were represented either as multiple sequence alignments or single nucleotide polymorphisms (SNPs), and analysed by the STACEY and Bayes Factor Delimitation (BFD)/SNAPP methods, respectively. Both methods supported species taxonomies where virtually all of the 32 sampled individuals, each representing its own geographical population, were identified as separate species. Computational efforts required to achieve adequate mixing of MCMC chains were considerable, and the species/minimal cluster trees identified similar strongly supported clades in replicate runs. The resolution was, however, higher in the STACEY trees than in the SNAPP trees, which is consistent with the higher information content of full sequences. The computational efficiency, measured as effective sample sizes of likelihood and posterior estimates per time unit, was consistently higher for STACEY. A random subset of 56 alignments had similar resolution to the 524-locus SNP data set. The STRUCTURE-like sparse Non-negative Matrix Factorisation (sNMF) method was applied to six individuals from each of 48 geographical populations and 28023 SNPs. Significantly fewer (13) clusters were identified as optimal by this analysis compared to the MSC methods. The sNMF clusters correspond closely to clades consistently supported by MSC methods, and showed evidence of admixture, especially in the western Cape Floristic Region. We discuss the significance of these findings, and conclude that it is important to a priori consider the kind of species one wants to identify when using genome-scale data, the assumptions behind the parametric models applied, and the potential consequences that model violations may have.


Author(s):  
Mehdi Ahmadian ◽  
Xubin Song

Abstract: A non-parametric model for magneto-rheological (MR) dampers is presented. After discussing the merits of parametric and non-parametric models for MR dampers, the test data for an MR damper is used to develop a non-parametric model. The results of the model are compared with the test data to illustrate the accuracy of the model. The comparison shows that the non-parametric model is able to accurately predict the damper force characteristics, including the damper non-linearity and electro-magnetic saturation. It is further shown that the non-parametric model can be numerically solved more efficiently than the parametric models.


2008 ◽  
Vol 35 (5) ◽  
pp. 567-582 ◽  
Author(s):  
Adam J. Branscum ◽  
Timothy E. Hanson ◽  
Ian A. Gardner

2017 ◽  
Vol 77 (1) ◽  
pp. 95-110 ◽  
Author(s):  
Maria Bampasidou ◽  
Ashok K. Mishra ◽  
Charles B. Moss

Purpose: The purpose of this paper is to investigate the endogeneity of asset values and how it relates to farm financial stress in US agriculture. The authors conceptualize an implied measure of farm financial stress as a function of debt position. The authors posit that there are variations in the asset values that are beyond the farmer’s control and therefore have implications on farm debt.
Design/methodology/approach: The framework recognizes the endogeneity of return on assets (ROA). It uses a non-parametric technique to approximate the variance of expected ROA (VEROA). The authors model the rate of return on agricultural assets and interest rate with a formulation that focuses on macroeconomic policy. Further, the authors use a dynamic balanced panel data set from 1960 to 2011 for 15 US agricultural states from the Agricultural Resource Management Survey, and information from traditional state-level financial statements.
Findings: Estimation of linear dynamic debt panel data models accounting for the endogeneity of ROA and VEROA is a challenging task. Estimated variances are unstable. Hence, the authors focus on a variance specification that uses the squared residuals from the ARIMA specification and non-parametric estimators. Arellano-Bover/Blundell-Bond generalized method of moments estimation procedures, although they may be biased, show that VEROA has a negative and significant effect on the total amount of debt in the agricultural sector.
Research limitations/implications: The instruments used in this analysis are lagged regressors which may be weakly correlated with the relevant first-order condition, hence not properly identifying the parameters of interest. Future research could include the identification of better instruments and, potentially, the use of sequential moment conditions.
Originality/value: Unlike previous studies, the authors use a non-parametric approximation of VEROA. The authors model the rate of return on agricultural assets and interest rate with a formulation that focuses on macroeconomic policy. Second, the authors make use of a large dynamic balanced panel data set from 1960 to 2011 for 15 agricultural states in the USA. To the best of the authors’ knowledge, this study is one of the few that provides evidence on risk-balancing behavior at the agricultural sector level in the USA.

