Detecting rare haplotypes associated with complex diseases using both population and family data: Combined logistic Bayesian Lasso

2020 ◽  
Vol 29 (11) ◽  
pp. 3340-3350
Author(s):  
Xiaofei Zhou ◽  
Meng Wang ◽  
Shili Lin

Haplotype-based association methods have been developed to understand the genetic architecture of complex diseases. Compared to single-variant-based methods, haplotype methods are thought to be more biologically relevant, since there are typically multiple non-independent genetic variants involved in complex diseases, and the use of haplotypes implicitly accounts for non-independence caused by linkage disequilibrium. In recent years, with the focus moving from common to rare variants, haplotype-based methods have evolved accordingly to uncover the roles of rare haplotypes. One particular approach is regularization-based, with the Bayesian least absolute shrinkage and selection operator (Lasso) as an example. Methods of this type have been developed for either case-control population data (the logistic Bayesian Lasso (LBL)) or family data (family-triad-based logistic Bayesian Lasso (famLBL)). In some situations, both family data and case-control data are available, and it would be a waste of resources if only one of them could be analyzed. To make full use of the available data and increase power, we propose a unified approach that can combine both case-control and family data (combined logistic Bayesian Lasso (cLBL)). Through simulations, we characterized the performance of cLBL and showed its advantage over existing methods. We further applied cLBL to the Framingham Heart Study data to demonstrate its utility in real data applications.
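
As a rough illustration of the regularization idea mentioned in this abstract, the sketch below evaluates an unnormalized log posterior for an ordinary logistic regression whose coefficients carry a Laplace (double-exponential) prior, which is the prior usually associated with the Bayesian Lasso. This is a simplified prospective model written for illustration only: it is not the LBL, famLBL, or cLBL likelihood, and the function name `log_posterior` and the rate parameter `lam` are assumptions that do not come from the paper.

```python
import math


def log_posterior(beta, X, y, lam):
    """Unnormalized log posterior of a logistic model with Laplace priors.

    beta: list of coefficients (hypothetical; one per column of X)
    X:    list of covariate rows, e.g. haplotype indicator codes
    y:    list of 0/1 disease labels
    lam:  rate of the Laplace prior; larger values shrink harder toward 0
    """
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log p(y | eta) = y * eta - log(1 + exp(eta)), computed stably
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    # Laplace(0, 1/lam) log density, summed over all coefficients
    logprior = sum(math.log(lam / 2.0) - lam * abs(b) for b in beta)
    return loglik + logprior
```

Because the prior term decreases linearly in `abs(b)`, coefficients with little support from the data are pulled toward zero, which is the behavior that makes this family of priors attractive when most haplotypes are expected to have no effect.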

2015 ◽  
Vol 14s2 ◽  
pp. CIN.S17290 ◽  
Author(s):  
Yuan Zhang ◽  
Swati Biswas

The importance of haplotype association and gene-environment interactions (GxE) in the context of rare variants has been underlined in a voluminous literature. Recently, software based on the logistic Bayesian LASSO (LBL), called LBL-GxE, was proposed for detecting GxE, where G is a rare (or common) haplotype variant (rHTV). However, it required relatively long computation time and could handle only one environmental covariate with two levels. Here we propose an improved version of LBL-GxE, which is not only computationally faster but can also handle multiple covariates, each with multiple levels. We also discuss details of the software, including input, output, and some options. We apply LBL-GxE to a lung cancer dataset and find a rare haplotype with a protective effect for current smokers. Our results indicate that LBL-GxE, especially with the improvements proposed here, is a useful and computationally viable tool for investigating rare haplotype interactions.


2018 ◽  
Vol 12 (S9) ◽  
Author(s):  
Xiaofei Zhou ◽  
Meng Wang ◽  
Han Zhang ◽  
William C. L. Stewart ◽  
Shili Lin

Genetics ◽  
2000 ◽  
Vol 155 (3) ◽  
pp. 1369-1378 ◽  
Author(s):  
Grant A Walling ◽  
Peter M Visscher ◽  
Leif Andersson ◽  
Max F Rothschild ◽  
Lizhen Wang ◽  
...  

For many species, several similar QTL mapping populations have been produced and analyzed independently. Joint analysis of such data could be used to increase power to detect QTL and evaluate population differences. In this study, data were collated on almost 3000 pigs from seven different F2 crosses between Western commercial breeds and either the European wild boar or the Chinese Meishan breed. Genotypes were available for 31 markers on chromosome 4 (on average 8.3 markers per population). Data from three traits common to all populations (birth weight, mean backfat depth at slaughter or end of test, and growth rate from birth to slaughter or end of test) were analyzed for individual populations and jointly. A QTL influencing birth weight was detected in one individual population and in the combined data, with no significant interaction of the QTL effect with population. A QTL affecting backfat that had a significantly greater effect in wild boar than in Meishan crosses was detected. Some evidence for a QTL affecting growth rate was detected in all populations, with no significant differences between populations. This study is the largest F2 QTL analysis achieved in a livestock species and demonstrates the potential of joint analysis.


2017 ◽  
Vol 111 (4) ◽  
pp. 637-652 ◽  
Author(s):  
BRYN ROSENFELD

A large literature expects rising middle classes to promote democracy. However, few studies provide direct evidence on this group in nondemocratic settings. This article focuses on politically important differentiation within the middle classes, arguing that middle-class growth in state-dependent sectors weakens potential coalitions in support of democratization. I test this argument using surveys conducted at mass demonstrations in Russia and detailed population data. I also present a new approach to studying protest based on case-control methods from epidemiology. The results reveal that state-sector professionals were significantly less likely to mobilize against electoral fraud, even after controlling for ideology. If this group had participated at the same rate as middle-class professionals from the private sector, I estimate that another 90,000 protesters would have taken to the streets. I trace these patterns of participation to the interaction of individual resources and selective incentives. These findings have implications for authoritarian stability and democratic transitions.


2019 ◽  
Vol 29 (2) ◽  
pp. 589-602
Author(s):  
Chan Wang ◽  
Shufang Deng ◽  
Leiming Sun ◽  
Liming Li ◽  
Yue-Qing Hu

Genome-wide association studies aim to identify common or rare variants associated with common diseases and to explain more heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. To detect association signals with high power, it is highly desirable to take account of correlations or linkage disequilibrium (LD) information among multiple SNPs when testing for association. In this article, we propose a test, SLIDE, that captures the difference in the average multi-locus genotypes between cases and controls, and we derive its variance–covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. Thus SLIDE can borrow strength from an external database in the population of interest, with a few thousand to hundreds of thousands of individuals, to improve the power for detecting association. Extensive simulations show that SLIDE is clearly superior to the existing methods, especially in situations involving both common and rare variants and both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in an application to data from the Wellcome Trust Case Control Consortium.
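
For readers unfamiliar with the pairwise LD quantities this abstract refers to, the following minimal sketch computes the standard coefficient D and the squared correlation r² for two biallelic loci from haplotype and allele frequencies. It illustrates only the textbook definitions and is not the SLIDE test itself; the function name `pairwise_ld` and its argument names are assumptions made for this example.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D measures the departure of the haplotype frequency from independence
    d = p_ab - p_a * p_b
    # r^2 rescales D^2 by the product of the allele-frequency variances
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r_squared
```

When the two alleles always travel together (for example `pairwise_ld(0.5, 0.5, 0.5)`), r² reaches its maximum of 1; when the haplotype frequency equals the product of the allele frequencies, both D and r² are 0.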


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bing Song ◽  
August E. Woerner ◽  
John Planz

Background: Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, dependence within a panel might involve triplets, quartets, or larger sets of markers. Therefore, compatible and user-friendly software is necessary for testing and assessing the global linkage disequilibrium among mixed genetic data.

Results: This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package “mixIndependR” calculates basic genetic parameters like allele frequency, genotype frequency, heterozygosity, Hardy–Weinberg equilibrium, and linkage disequilibrium (LD) by mutual independence from population data, regardless of the type of markers, such as simple nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. The overall independence is tested by comparing the observed distributions of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence.

Conclusion: The package “mixIndependR” is compatible with all categories of genetic markers and detects the overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, like mixed panels with different kinds of markers. In population genetics, the package “mixIndependR” makes it possible to discover more about admixture of populations, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach to real data is expected in the future, and this might bring a leap in the field of genetic technology.

Availability: The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html.
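
To make the idea of an "expected distribution under mutual independence" concrete, the sketch below computes the distribution of the number of heterozygous loci K when each locus is heterozygous independently with its own probability (a Poisson-binomial distribution). This is a hedged illustration in Python, not the mixIndependR implementation: the function name `heterozygous_loci_distribution` and the input `het_probs` (per-locus expected heterozygosities) are assumptions made for this example.

```python
def heterozygous_loci_distribution(het_probs):
    """Return [P(K = 0), ..., P(K = n)] for n independent loci.

    het_probs: per-locus probabilities of being heterozygous
               (hypothetical input, e.g. expected heterozygosities)
    """
    dist = [1.0]  # with zero loci, K = 0 with probability 1
    for h in het_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - h)   # this locus is homozygous
            nxt[k + 1] += p * h       # this locus is heterozygous
        dist = nxt
    return dist
```

An observed histogram of K across individuals could then be compared with these expected probabilities, for instance with a goodness-of-fit test; a marked discrepancy would suggest that the loci are not mutually independent.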


2012 ◽  
Vol 53 ◽  
Author(s):  
Gintautas Jakimauskas ◽  
Leonidas Sakalauskas

We consider the efficiency of adding an auxiliary regression variable to the logit model when estimating small probabilities in large populations. Two models for the distribution of the unknown probabilities are considered: the probabilities follow a gamma distribution (model (A)), or the logits of the probabilities follow a Gaussian distribution (model (B)). In a modification of model (B), an additional regression variable is used for the Gaussian mean (model (BR)). We selected real data from the Database of Indicators of Statistics Lithuania: working-age persons recognized as disabled for the first time, by administrative territory, year 2010 (number of populations K = 60). Additionally, we used average annual population data by administrative territory. The auxiliary regression variable was based on the number of hospital discharges by administrative territory, year 2010. We obtained initial parameters using simple iterative procedures for models (A), (B) and (BR). In the second stage, we performed various tests using Monte Carlo simulation (using models (A), (B) and (BR)). The main goal was to select an appropriate model and to propose recommendations for using the gamma and logit (with or without an auxiliary regression variable) models for Bayesian estimation. The results show that Monte Carlo simulation enables us to determine which estimation model is preferable.


Author(s):  
Sana Amanat ◽  
Teresa Requena ◽  
Jose Antonio Lopez-Escamez

Exome sequencing has been commonly used in rare diseases by selecting multiplex families or singletons with an extreme phenotype (EP) to search for rare variants in coding regions. The EP strategy covers both extreme ends of a disease spectrum, and it has also been used to investigate the contribution of rare variants to heritability in complex clinical traits. We conducted a systematic review to find evidence supporting the use of EP strategies to search for rare variants in genetic studies of complex diseases and to highlight the contribution of rare variation to the genetic structure of multiallelic conditions. After assessing the quality of the retrieved records, we selected 19 genetic studies that considered EP to demonstrate genetic association. All the studies successfully identified several rare variants and de novo mutations, and many novel candidate genes were also identified by selecting an EP. There is enough evidence to support that the EP approach in patients with an early onset of the disease can contribute to the identification of rare variants in candidate genes or pathways involved in complex diseases. EP patients may contribute to a better understanding of the underlying genetic architecture of common heterogeneous disorders such as tinnitus or age-related hearing loss.

