Composite Likelihood Method for Inferring Local Pedigrees

2017 ◽  
Author(s):  
Amy Ko ◽  
Rasmus Nielsen

Abstract Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit.

Author Summary Pedigrees contain information about the genealogical relationships among individuals. This information can be used in many areas of genetic studies such as disease association studies, conservation efforts, and learning about the demographic history and social structure of a population. Despite their importance, pedigrees are often unknown and must be estimated from genetic information. However, pedigree inference remains a difficult problem due to the high cost of likelihood computation and the enormous number of possible pedigrees we must consider. These difficulties limit existing methods in their ability to infer pedigrees when the sample size or the number of markers is large, or when the sample contains only distant relatives. In this report, we present a method that circumvents these computational barriers in order to infer pedigrees of complex structure for a large number of individuals. From our simulation studies, we found that our method can infer distant relatives much more accurately than existing methods. Our ability to infer pedigrees with greater accuracy opens up possibilities for developing or improving pedigree-based methods in many areas of research such as linkage analysis, demographic inference, association studies, and conservation.
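
To make the two ingredients named in the abstract concrete, here is a minimal sketch of simulated annealing driven by a composite (pairwise) log-likelihood. The relationship categories, expected kinship values, Gaussian noise model, and pairwise (rather than full-pedigree) configuration are assumptions made only for this toy example and are not the authors' model.

```python
# Illustrative sketch (not the authors' implementation): simulated annealing over
# pairwise relationship assignments, scored by a composite (pairwise) log-likelihood.
import math
import random

# Hypothetical expected kinship coefficients for a few relationship categories.
EXPECTED_KINSHIP = {"unrelated": 0.0, "first_cousins": 0.0625, "half_sibs": 0.125, "full_sibs": 0.25}
NOISE_SD = 0.02  # assumed measurement noise on estimated kinship

def pair_loglik(observed_kinship, relationship):
    """Gaussian log-likelihood of an observed kinship estimate given a relationship."""
    mu = EXPECTED_KINSHIP[relationship]
    return -0.5 * ((observed_kinship - mu) / NOISE_SD) ** 2

def composite_loglik(observed, config):
    """Composite likelihood: sum of pairwise terms, ignoring their dependence."""
    return sum(pair_loglik(observed[p], config[p]) for p in observed)

def anneal(observed, n_iter=20000, t0=1.0, t_min=1e-3):
    config = {p: "unrelated" for p in observed}       # start from 'everyone unrelated'
    best, best_ll = dict(config), composite_loglik(observed, config)
    cur_ll = best_ll
    for i in range(n_iter):
        temp = max(t_min, t0 * (1 - i / n_iter))      # linear cooling schedule
        pair = random.choice(list(observed))          # propose a local change
        old = config[pair]
        config[pair] = random.choice(list(EXPECTED_KINSHIP))
        new_ll = composite_loglik(observed, config)
        if new_ll >= cur_ll or random.random() < math.exp((new_ll - cur_ll) / temp):
            cur_ll = new_ll                           # accept (Metropolis criterion)
            if new_ll > best_ll:
                best, best_ll = dict(config), new_ll
        else:
            config[pair] = old                        # reject and revert
    return best, best_ll

# Toy usage: three pairs with noisy kinship estimates.
obs = {("A", "B"): 0.24, ("A", "C"): 0.01, ("B", "C"): 0.06}
print(anneal(obs))
```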

2011 ◽  
Vol 19 (2) ◽  
pp. 188-204 ◽  
Author(s):  
Jong Hee Park

In this paper, I introduce changepoint models for binary and ordered time series data based on Chib's hidden Markov model. The extension of the changepoint model to a binary probit model is straightforward in a Bayesian setting. However, detecting parameter breaks from ordered regression models is difficult because ordered time series data often have clustering along the break points. To address this issue, I propose an estimation method that uses the linear regression likelihood function for the sampling of hidden states of the ordinal probit changepoint model. The marginal likelihood method is used to detect the number of hidden regimes. I evaluate the performance of the introduced methods using simulated data and apply the ordinal probit changepoint model to the study of Eichengreen, Watson, and Grossman on violations of the “rules of the game” of the gold standard by the Bank of England during the interwar period.
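
For readers unfamiliar with the setup, the following sketch simulates the kind of data an ordinal probit changepoint model targets: a latent Gaussian regression whose coefficient shifts at an unknown break, thresholded into ordered categories. The coefficients, cutpoints, and break location are arbitrary illustrative choices, not values from the paper.

```python
# Minimal sketch of the data-generating process behind an ordinal probit changepoint model.
import numpy as np

rng = np.random.default_rng(0)
T, breakpoint = 200, 120
x = rng.normal(size=T)
beta = np.where(np.arange(T) < breakpoint, 0.5, 2.0)   # regime 1 vs regime 2 slope
ystar = beta * x + rng.normal(size=T)                   # latent variable on the probit scale
cutpoints = np.array([-0.5, 0.5])                       # thresholds defining 3 ordered categories
y = np.digitize(ystar, cutpoints)                       # observed ordinal outcome in {0, 1, 2}
print(np.bincount(y))                                   # category counts across both regimes
```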


2018 ◽  
Vol 3 ◽  
pp. 82
Author(s):  
Benard W. Kulohoma

Paucity of data from African populations due to under-representation in human genetic studies has impeded detailed understanding of the heritable human genome variation. This is despite the fact that Africa has sizeable genetic, cultural and linguistic diversity. There are renewed efforts to understand health problems relevant to African populations using more comprehensive datasets, and by improving expertise in health-related genomics among African scientists. We emphasise that careful consideration of the sampled populations from national and within-continental cohorts in large multi-ethnic genetic research efforts is required to maximise the prospects of identifying and fine-mapping novel risk variants in indigenous populations. We caution that human demographic history should be taken into consideration in such prospective genetic-association studies.


2019 ◽  
Author(s):  
Ruichen Rong ◽  
Shuang Jiang ◽  
Lin Xu ◽  
Guanghua Xiao ◽  
Yang Xie ◽  
...  

Abstract Simulation is a critical component of experimental design and of the evaluation of analysis methods in microbiome association studies. However, statistically modeling microbiome data is challenging because the complex structure of real data is difficult to represent fully with statistical models. To address this challenge, we designed a novel simulation framework for microbiome data using a generative adversarial network (GAN), called MB-GAN, by utilizing methodology advancements from the deep learning community. MB-GAN can automatically learn from a given dataset and generate simulated datasets that are indistinguishable from it. When MB-GAN was applied to a case-control microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These properties make MB-GAN suitable for further microbiome methodology development where high-fidelity simulated microbiome data are needed.
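
The sketch below shows the adversarial training loop such a framework is built on, not MB-GAN itself: a generator maps noise to a simplex-valued "abundance" vector via a softmax output, and a discriminator scores real versus simulated samples. The network sizes, the softmax output layer, and the toy Dirichlet stand-in for real data are assumptions for illustration.

```python
# Minimal GAN sketch for compositional abundance-like data (illustrative, not MB-GAN).
import torch
import torch.nn as nn

n_taxa, noise_dim = 100, 32

generator = nn.Sequential(
    nn.Linear(noise_dim, 128), nn.ReLU(),
    nn.Linear(128, n_taxa), nn.Softmax(dim=1),   # outputs sum to 1, like relative abundances
)
discriminator = nn.Sequential(
    nn.Linear(n_taxa, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),                            # raw logit: real vs simulated
)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    b = real_batch.size(0)
    # Discriminator update: real samples labeled 1, simulated samples labeled 0.
    fake = generator(torch.randn(b, noise_dim)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(b, 1)) + \
             bce(discriminator(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: try to make simulated samples be labeled as real.
    fake = generator(torch.randn(b, noise_dim))
    g_loss = bce(discriminator(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage with sparse Dirichlet "abundances" standing in for real data.
real = torch.distributions.Dirichlet(torch.full((n_taxa,), 0.1)).sample((64,))
print(train_step(real))
```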


Biometrika ◽  
2020 ◽  
Vol 107 (4) ◽  
pp. 907-917
Author(s):  
Jing Huang ◽  
Yang Ning ◽  
Nancy Reid ◽  
Yong Chen

Summary Composite likelihood functions are often used for inference in applications where the data have a complex structure. While inference based on the composite likelihood can be more robust than inference based on the full likelihood, the inference is not valid if the associated conditional or marginal models are misspecified. In this paper, we propose a general class of specification tests for composite likelihood inference. The test statistics are motivated by the fact that the second Bartlett identity holds for each component of the composite likelihood function when these components are correctly specified. We construct the test statistics based on the discrepancy between the so-called composite information matrix and the sensitivity matrix. As an illustration, we study three important cases of the proposed tests and establish their limiting distributions under both null and local alternative hypotheses. Finally, we evaluate the finite-sample performance of the proposed tests in several examples.
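
To illustrate the quantity these tests are built on: from per-observation composite score contributions one can form the sensitivity matrix (minus the mean Hessian) and the composite information, or variability, matrix (the score covariance); under correct specification of each component the second Bartlett identity makes them agree. The toy working model below (independence composite likelihood with a common normal mean and an assumed unit variance) and the naive discrepancy are purely illustrative; the paper's actual test statistics and their limiting distributions differ.

```python
# Sketch of the sensitivity-vs-variability discrepancy that signals misspecification.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Data generated with variance 4, so the unit-variance working model is misspecified.
y = rng.normal(loc=1.0, scale=2.0, size=(n, 2))

mu_hat = y.mean()                  # maximum composite likelihood estimate of the common mean
scores = (y - mu_hat).sum(axis=1)  # per-observation composite score under the unit-variance model
H_hat = 2.0                        # minus the mean per-observation Hessian: each component contributes 1
J_hat = np.mean(scores ** 2)       # empirical variability of the composite score

discrepancy = abs(J_hat - H_hat)   # near 0 under correct specification; large here
print(H_hat, J_hat, discrepancy)
```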


Genetics ◽  
2004 ◽  
Vol 166 (4) ◽  
pp. 1963-1979 ◽  
Author(s):  
Jinliang Wang

Abstract Likelihood methods have been developed to partition individuals in a sample into full-sib and half-sib families using genetic marker data without parental information. They invariably make the critical assumption that marker data are free of genotyping errors and mutations and are thus completely reliable in inferring sibships. Unfortunately, however, this assumption is rarely tenable for virtually all kinds of genetic markers in practical use and, if violated, can severely bias sibship estimates as shown by simulations in this article. I propose a new likelihood method with simple and robust models of typing error incorporated into it. Simulations show that the new method can be used to infer full- and half-sibships accurately from marker data with a high error rate and to identify typing errors at each locus in each reconstructed sib family. The new method also improves previous ones by adopting a fresh iterative procedure for updating allele frequencies with reconstructed sibships taken into account, by allowing for the use of parental information, and by using efficient algorithms for calculating the likelihood function and searching for the maximum-likelihood configuration. It is tested extensively on simulated data with a varying number of marker loci, different rates of typing errors, and various sample sizes and family structures and applied to two empirical data sets to demonstrate its usefulness.
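
The following sketch illustrates the kind of error-tolerant genotype likelihood that makes sibship reconstruction robust to mistyping: with probability 1 - e the observed genotype equals the true one, and with probability e it is replaced by a random genotype drawn from population frequencies. This simple mixture is a stand-in for the paper's actual error models, and the allele frequencies and error rate below are arbitrary.

```python
# Sketch of a simple mistyping model for a single locus (illustrative only).
import itertools

def genotype_freqs(allele_freqs):
    """Hardy-Weinberg genotype frequencies from allele frequencies {allele: freq}."""
    g = {}
    for a, b in itertools.combinations_with_replacement(sorted(allele_freqs), 2):
        p = allele_freqs[a] * allele_freqs[b]
        g[(a, b)] = p if a == b else 2 * p
    return g

def obs_given_true(obs, true, geno_freqs, e):
    """P(observed genotype | true genotype) under the simple mistyping mixture."""
    match = 1.0 if obs == true else 0.0
    return (1 - e) * match + e * geno_freqs[obs]

# Toy usage at one biallelic locus with a 1% mistyping rate.
freqs = {"A": 0.7, "T": 0.3}
gf = genotype_freqs(freqs)
print(obs_given_true(("A", "A"), ("A", "T"), gf, e=0.01))  # nonzero despite the mismatch
```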


Author(s):  
Eduardo de Freitas Costa ◽  
Silvana Schneider ◽  
Giulia Bagatini Carlotto ◽  
Tainá Cabalheiro ◽  
Mauro Ribeiro de Oliveira Júnior

Abstract The dynamics of the wild boar population have become a pressing issue not only for ecological purposes, but also for agricultural and livestock production. Data on wild boar dispersal distance can have a complex structure, including an excess of zeros and right-censored observations, which makes them challenging to model. We therefore propose two different zero-inflated right-censored regression models, assuming Weibull and gamma distributions. First, we present the construction of the likelihood function; we then apply both models to simulated datasets, demonstrating that both regression models behave well. The simulation results point to the consistency and asymptotic unbiasedness of the developed methods. Afterwards, we fitted both models to a simulated dataset of wild boar dispersal, including an excess of zeros, right-censored observations, and two covariates: age and sex. We show that the models are useful for drawing inferences about wild boar dispersal, correctly describing data that mimic a situation in which males disperse farther than females and age has a positive effect on dispersal. These results help overcome some limitations regarding inference in zero-inflated right-censored datasets, especially for wild boar populations. Users will be provided with an R function to run the proposed models.
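
As a concrete picture of the likelihood construction described above, here is a minimal sketch of a zero-inflated, right-censored Weibull log-likelihood with covariates omitted: zeros contribute log p, uncensored positive distances contribute the Weibull log-density, and censored distances contribute the Weibull log-survival function. The parameterization, variable names, and toy data are assumptions for illustration and do not reproduce the paper's regression structure.

```python
# Sketch of a zero-inflated, right-censored Weibull log-likelihood (no covariates).
import numpy as np
from scipy.stats import weibull_min

def zi_weibull_loglik(params, dist, censored):
    """params = (p_zero, shape k, scale lam); dist >= 0; censored is a boolean array."""
    p, k, lam = params
    if not (0 < p < 1 and k > 0 and lam > 0):
        return -np.inf
    zero = dist == 0
    ll = np.sum(zero) * np.log(p)                                     # structural zeros
    pos_obs = (~zero) & (~censored)                                   # observed positive distances
    ll += np.sum(np.log(1 - p) + weibull_min.logpdf(dist[pos_obs], k, scale=lam))
    pos_cens = (~zero) & censored                                     # right-censored distances
    ll += np.sum(np.log(1 - p) + weibull_min.logsf(dist[pos_cens], k, scale=lam))
    return ll

# Toy usage: a mix of zeros, observed, and right-censored dispersal distances (km).
d = np.array([0.0, 0.0, 1.2, 3.5, 7.0, 2.0])
c = np.array([False, False, False, False, True, True])
print(zi_weibull_loglik((0.3, 1.5, 4.0), d, c))
```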


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Abstract Background The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, with a Lasso penalization that links all the independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at both the variable level and the group level. Results Our method has the advantage of producing a globally interpretable model while respecting the architecture of the data. It can outperform traditional methods and provides broader insight by exploiting a priori information. We compare the performance of the proposed method to other benchmark methods on simulated data and give an example of application to real data, with the aim of highlighting common susceptibility variants to breast and thyroid cancers. Conclusion Joint-sgPLS shows promising properties for signal detection. As an extension of PLS, the method is suited to data with a large number of variables. The Lasso penalization accommodates grouped variables and multiple observation sets. Furthermore, although the method is applied here to a genetic study, its formulation is suited to any data with a large number of variables and a known a priori group structure, in other application fields as well.
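
A minimal sketch of the group-level sparsity operation at the heart of sparse group PLS follows: the leading loading direction of X'y is shrunk block-wise, zeroing entire groups (e.g. genes or pathways) whose contribution falls below the penalty. The group sizes, penalty value, and single-study, single-response setting are simplifications; joint-sgPLS additionally links several independent studies through the penalty.

```python
# Sketch of group soft-thresholding applied to a PLS loading direction (illustrative).
import numpy as np

def group_soft_threshold(u, groups, lam):
    """Block soft-thresholding: shrink each group's block toward zero, dropping weak groups."""
    out = np.zeros_like(u)
    for idx in groups:
        block = u[idx]
        norm = np.linalg.norm(block)
        if norm > lam:
            out[idx] = block * (1 - lam / norm)
    return out

rng = np.random.default_rng(2)
n, p = 100, 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]   # three "genes" of 4 SNPs each
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:4] = 1.0                              # only the first group is active
y = X @ beta + rng.normal(size=n)

u = X.T @ y / n                     # leading PLS direction for a single response
u_sparse = group_soft_threshold(u, groups, lam=0.5)
print(np.round(u_sparse, 2))        # inactive groups are shrunk to exactly zero
```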


Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1835
Author(s):  
Antonio Barrera ◽  
Patricia Román-Román ◽  
Francisco Torres-Ruiz

A joint and unified view of stochastic diffusion models associated with the family of hyperbolastic curves is presented. The motivation behind this approach stems from the fact that all hyperbolastic curves satisfy a linear differential equation of the Malthusian type. By virtue of this, and by adding multiplicative noise to said ordinary differential equation, a diffusion process whose mean function is the curve itself may be associated with each curve. Inference for the resulting processes is presented jointly, together with the strategies developed to obtain the initial solutions required for the numerical resolution of the system of equations arising from the application of the maximum likelihood method. The common perspective presented is especially useful for implementing the procedures needed to fit the models to real data. Some examples based on simulated data support the suitability of the development described in the present paper.
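
The construction can be sketched in a few lines: take a mean curve mu(t) satisfying the Malthusian-type relation x'(t) = h(t) x(t) with h(t) = d/dt log mu(t), add multiplicative noise to obtain dX = h(t) X dt + sigma X dW, and simulate by Euler-Maruyama. A logistic curve stands in for a hyperbolastic one purely for brevity; each hyperbolastic curve supplies its own h(t), and the parameters below are arbitrary.

```python
# Sketch of a diffusion whose mean function is a chosen growth curve (Euler-Maruyama).
import numpy as np

def mu(t, M=10.0, a=9.0, b=1.0):
    """Illustrative mean curve (logistic); any hyperbolastic curve could be used instead."""
    return M / (1 + a * np.exp(-b * t))

def h(t, dt=1e-5):
    """Relative growth rate h(t) = d/dt log mu(t), computed by central differences."""
    return (np.log(mu(t + dt)) - np.log(mu(t - dt))) / (2 * dt)

def simulate_path(x0=1.0, sigma=0.1, T=10.0, n=1000, seed=3):
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    x = np.empty(n + 1); x[0] = x0                    # start on the mean curve: mu(0) = 1
    for i in range(n):
        dw = rng.normal(scale=np.sqrt(dt))
        x[i + 1] = x[i] + h(t[i]) * x[i] * dt + sigma * x[i] * dw   # Euler-Maruyama step
    return t, x

t, x = simulate_path()
print(x[-1], mu(10.0))   # one noisy path endpoint versus the deterministic mean curve
```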

