A performance assessment of relatedness inference methods using genome-wide data from thousands of relatives

2017 ◽  
Author(s):  
Monica D. Ramstetter ◽  
Thomas D. Dyer ◽  
Donna M. Lehman ◽  
Joanne E. Curran ◽  
Ravindranath Duggirala ◽  
...  

Abstract Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a dataset of 2,485 individuals from several large pedigrees that span up to six generations. We find that all methods have high accuracy (~92%–99%) when detecting first- and second-degree relationships, but their accuracy falls to less than 43% for seventh-degree relationships. However, most IBD segment-based methods inferred seventh-degree relatives correctly to within one relatedness degree for more than 76% of relative pairs. Overall, the most accurate methods are ERSA and approaches that infer relatedness from total IBD sharing computed from the output of GERMLINE and Refined IBD. Combining information from the most accurate methods provides little improvement in accuracy, indicating that novel approaches, such as methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
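To illustrate the total-IBD-sharing idea, here is a minimal Python sketch, not the code of any method evaluated in the paper. It assumes IBD segment lengths in centimorgans from a detector such as GERMLINE or Refined IBD, an autosomal genetic map length of roughly 3,500 cM (an assumed constant), and the common convention that a degree-d relative pair has expected kinship 2^-(d+1). The function names are hypothetical.

```python
import math

# Assumed (approximate) sex-averaged autosomal genetic map length in cM.
GENOME_CM = 3500.0

def kinship_from_ibd(segment_lengths_cm, genome_cm=GENOME_CM):
    """Estimate the kinship coefficient from IBD segment lengths (cM).

    Assumes segments do not overlap within a haplotype pair and that IBD2
    regions are listed twice (once per shared haplotype), so that
    total / genome_cm estimates k1 + 2*k2 and kinship = (k1 + 2*k2) / 4.
    """
    total = sum(segment_lengths_cm)
    return total / (4.0 * genome_cm)

def degree_from_kinship(phi, max_degree=10):
    """Map kinship to the nearest relatedness degree on a log2 scale.

    Degree d has expected kinship 2**-(d + 1); degree 0 means identical
    genomes (e.g. monozygotic twins). Returns None when phi <= 0 or the
    inferred degree exceeds max_degree (treated here as unrelated).
    """
    if phi <= 0:
        return None
    degree = math.floor(-math.log2(phi) - 1 + 0.5)  # round half up
    degree = max(degree, 0)
    return degree if degree <= max_degree else None
```

For example, 875 cM of total sharing gives a kinship of 0.0625, which this sketch maps to a third-degree relationship. Rounding on the log2 scale places each decision boundary at the geometric mean of adjacent expected kinship values.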

GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe and to cross-validate data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes from an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced against one sample that had been resequenced at high coverage on an Illumina NovaSeq6000 S4. Results The genome data were searched for genomic variation present in this population, and a number of variant types are reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource that aims to provide data for medical research in a large, understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth novel, endemic, and medically related alleles.
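The abstract does not say which metric was used to cross-validate sequence-based and chip-based genotypes. A simple, commonly reported one is per-sample genotype concordance, sketched below in Python. It assumes both call sets have already been matched to the same variant IDs and allele orientation, and it encodes genotypes as alternate-allele counts with `None` for missing calls. The function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[int]  # alternate-allele count 0, 1 or 2; None = missing

def genotype_concordance(calls_a: Dict[str, Genotype],
                         calls_b: Dict[str, Genotype]) -> Tuple[int, float]:
    """Compare two genotype call sets for the same individual.

    Only variants present and non-missing in both sets are compared.
    Returns (number of compared sites, fraction of identical calls).
    """
    shared = [v for v in calls_a
              if v in calls_b
              and calls_a[v] is not None and calls_b[v] is not None]
    if not shared:
        return 0, float("nan")
    matches = sum(1 for v in shared if calls_a[v] == calls_b[v])
    return len(shared), matches / len(shared)
```

Overall concordance is dominated by homozygous-reference sites, so studies often also report concordance restricted to sites where at least one platform calls a non-reference allele.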


2007 ◽  
Vol 16 (20) ◽  
pp. 2494-2505 ◽  
Author(s):  
Yasuhito Nannya ◽  
Kenjiro Taura ◽  
Mineo Kurokawa ◽  
Shigeru Chiba ◽  
Seishi Ogawa

Author(s):  
Guia Guffanti ◽  
Milissa L. Kaufman ◽  
Lauren A. M. Lebois ◽  
Kerry J. Ressler

Post-traumatic stress disorder (PTSD) is a debilitating psychiatric disorder; genetic factors are estimated to account for 30%–40% of the variance in risk for the disease. This chapter begins with a review of the biological hypotheses and related genetic mechanisms currently proposed to be associated with PTSD and trauma-related disorders. It then describes state-of-the-art methodologies and their application to mapping genetic loci and identifying biomarkers associated with PTSD. Finally, we review the latest results from genome-wide association studies of genetic variants, as well as those from the emerging fields of epigenetics and gene expression.


Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1300 ◽  
Author(s):  
Elisabetta Manca ◽  
Alberto Cesarani ◽  
Giustino Gaspa ◽  
Silvia Sorbolini ◽  
Nicolò P.P. Macciotta ◽  
...  

Genome-wide association studies (GWAS) are traditionally carried out using the single marker regression model, which often yields very few associations when only a small number of individuals is involved. Bayesian methods such as BayesR have obtained encouraging results when applied to GWAS. However, these approaches require a posterior inclusion probability threshold to be fixed a priori, which arbitrarily affects the associations obtained. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values for a specific trait carry different allelic combinations at the genes involved in its determination. Three multivariate techniques were used to highlight the differences between individuals assembled into high and low phenotype groups: canonical discriminant analysis, discriminant analysis, and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The results from the simulation study showed that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.
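The "traditional" baseline in this abstract is single-marker regression, in which each SNP is tested separately. Below is a minimal pure-Python sketch of that baseline. It is not the authors' multivariate method, and it assumes no covariates, no genomic relationship matrix, and genotypes coded as 0/1/2 allele counts. P-values are omitted, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Tuple

def single_marker_scan(genotypes: List[List[float]],
                       phenotype: List[float]) -> List[Optional[Tuple[float, float]]]:
    """Regress the phenotype on each SNP separately: y = a + b * x_j + e.

    genotypes[j] holds the allele counts (0/1/2) of SNP j for every animal.
    Returns (slope b, t statistic) per SNP, or None for monomorphic SNPs.
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    results: List[Optional[Tuple[float, float]]] = []
    for x in genotypes:
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            results.append(None)  # no genotype variation: slope undefined
            continue
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, phenotype))
        b = sxy / sxx
        a = y_mean - b * x_mean
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, phenotype))
        se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
        t = b / se_b if se_b > 0 else math.copysign(math.inf, b)
        results.append((b, t))
    return results
```

Because each SNP is fitted alone, this model ignores the joint allelic combinations that the multivariate approach is designed to exploit.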


2014 ◽  
Vol 618 ◽  
pp. 278-282
Author(s):  
Tao Peng ◽  
Hao Wang ◽  
Yi Ran Wang ◽  
Wen Wen Xie ◽  
Jia Wei Luo

With the completion of the International HapMap Project and the development of high-throughput technologies, designing more effective epistasis detection algorithms for genome-wide data poses a significant challenge. This paper proposes a new method based on the Markov blanket to address limitations of existing algorithms, such as a high proportion of false positives and low accuracy. The algorithm uses the G² statistic to judge the strength of correlation between variables, together with a self-adaptive removal strategy and a SNP matching method. This combination effectively eliminates variables that are unrelated to the target as well as weakly correlated variables, significantly reduces the search space and run time, avoids unnecessary retrieval analysis, and improves the accuracy of the detection algorithm to a certain extent.
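The G² (likelihood-ratio) statistic is a standard independence test in Markov blanket learning. The Python sketch below shows only its simplest form, a marginal test between one SNP's genotypes and a case/control target in a contingency table. It does not cover the conditional tests, the self-adaptive removal strategy, or the SNP matching method described in the abstract, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

def g2_statistic(table: List[List[int]]) -> Tuple[float, int]:
    """Likelihood-ratio G^2 statistic for independence in an r x c table.

    G^2 = 2 * sum(O * ln(O / E)), where E = row_total * col_total / n.
    Cells with O == 0 contribute nothing. Returns (G^2, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g2, df
```

Rows here are genotypes (0, 1, 2) and columns are controls and cases. Under independence, G² is approximately chi-square distributed with the returned degrees of freedom, so a small value suggests the SNP can be dropped as unrelated to the target.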


Cephalalgia ◽  
2014 ◽  
Vol 35 (6) ◽  
pp. 489-499 ◽  
Author(s):  
Dale R Nyholt ◽  
Verneri Anttila ◽  
Bendik S Winsvold ◽  
Tobias Kurth ◽  
Hreinn Stefansson ◽  
...  

Background There has been intensive debate over whether migraine with aura (MA) and migraine without aura (MO) should be considered distinct subtypes or parts of the same disease spectrum. There is also discussion of the extent to which migraine cases collected in specialised headache clinics differ from cases from population cohorts, and how female cases differ from male cases with respect to their migraine. To assess the genetic overlap between these migraine subgroups, we examined genome-wide association (GWA) results from an analysis of 23,285 migraine cases and 95,425 population-matched controls. Methods Detailed heterogeneity analysis of single-nucleotide polymorphism (SNP) effects (odds ratios) between migraine subgroups was performed for the 12 independent SNP loci significantly associated with migraine susceptibility (p < 5 × 10⁻⁸, surpassing the threshold for genome-wide significance). Overall genetic overlap was assessed using SNP effect concordance analysis (SECA) at over 23,000 independent SNPs. Results Significant heterogeneity of SNP effects (p_het < 1.4 × 10⁻³) was observed between the MA and MO subgroups (for SNP rs9349379) and between the clinic- and population-based subgroups (for SNPs rs10915437, rs6790925 and rs6478241). However, for all 12 SNPs the risk-increasing allele was the same, and SECA found the majority of genome-wide SNP effects to be in the same direction across the subgroups. Conclusions Any differences in common genetic risk across these subgroups are outweighed by the similarities. Meta-analysis of additional migraine GWA datasets, regardless of their major subgroup composition, will identify new susceptibility loci for migraine.
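The abstract does not give the formula behind its heterogeneity analysis. One standard way to compare a SNP's effect between two subgroups is a z-test on the difference of log odds ratios, which for two groups is equivalent to Cochran's Q with one degree of freedom. The Python sketch below uses that test as an assumption and also assumes the two subgroup estimates are independent. In practice, subgroups that share controls would need a correlation correction. The function name is hypothetical.

```python
import math
from typing import Tuple

def subgroup_heterogeneity(beta1: float, se1: float,
                           beta2: float, se2: float) -> Tuple[float, float]:
    """Two-sided z-test that two independent log odds ratios are equal.

    z = (beta1 - beta2) / sqrt(se1^2 + se2^2); z^2 equals Cochran's Q
    for two subgroups. Returns (z, two-sided p-value).
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|N(0,1)| > |z|)
    return z, p
```

For instance, log odds ratios of 0.20 (SE 0.03) and 0.05 (SE 0.04) give z = 3.0 and p ≈ 0.0027. A small p-value indicates heterogeneity even when both subgroups share the same risk-increasing allele.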


2010 ◽  
Vol 25 (5) ◽  
pp. 307-309 ◽  
Author(s):  
J. Lasky-Su ◽  
C. Lange

Abstract The etiology of suicide is complex, with extremely diverse environmental and genetic causes. This extensive heterogeneity weakens the relationship between genotype and phenotype, and as a result we face many challenges when studying the genetic etiology of suicide. We are now in the midst of a genetics revolution in which genotyping costs are falling and genotyping speed is increasing rapidly, allowing genetic association studies to genotype thousands to millions of SNPs that cover the entire human genome. As such, genome-wide association studies (GWAS) are now the norm. In this article we address several statistical challenges that arise when studying the genetic etiology of suicidality in the age of the genetics revolution. These challenges include: (1) the large number of statistical tests; (2) complex phenotypes that are difficult to quantify; and (3) modest genetic effect sizes. We address these statistical issues in the context of family-based study designs. Specifically, we discuss several statistical extensions of family-based association tests (FBATs) that help alleviate these challenges. As our intention is to describe how statistical methodology can help identify disease variants for suicidality, we avoid the mathematical details of the methodologies presented.
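The FBAT framework generalizes the classic transmission disequilibrium test (TDT), which is the simplest family-based association test. The Python sketch below shows the TDT for one biallelic marker, together with a Bonferroni per-test threshold for challenge (1), the large number of tests. It does not implement the FBAT extensions discussed in the article, and the function names are hypothetical.

```python
import math
from typing import Tuple

def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Transmission disequilibrium test for one biallelic marker.

    transmitted:   heterozygous parents who passed the tested allele to an
                   affected child
    untransmitted: heterozygous parents who passed the other allele
    Returns the 1-df chi-square statistic and its p-value.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    p = math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi2 with 1 df
    return chi2, p

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests
```

With 60 versus 40 transmissions, the TDT gives χ² = 4.0 and p ≈ 0.046. That result is nominally significant but far above the Bonferroni threshold of 5 × 10⁻⁸ for one million tests, which illustrates why modest effect sizes and many tests are hard to reconcile.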


2011 ◽  
Vol 09 (05) ◽  
pp. 631-645 ◽  
Author(s):  
WENLONG TANG ◽  
HONGBAO CAO ◽  
JUNBO DUAN ◽  
YU-PING WANG

With the development of genomic techniques, the demand for new methods that can handle high-throughput genome-wide data effectively is stronger than ever. Compressed sensing (CS) is an emerging approach in statistics and signal processing. Under CS theory, a signal can be uniquely reconstructed or approximated from a sparse representation, and such representations can help distinguish different types of signals. However, the application of CS to genome-wide data analysis has rarely been investigated. We propose a novel CS-based approach for genomic data classification and test its performance in subtyping leukemia through gene expression analysis. Detecting cancer subtypes, such as those of leukemia, according to their genetic makeup is important because it holds promise for individualizing therapies and improving treatments. In our work, four statistical features were employed to select significant genes for the classification. Using genes selected from 7,129 candidates, the proposed CS method achieved a classification accuracy of 97.4% under cross-validation and 94.3% on an independent data set. The method also performed well when its robustness to noise was tested. Therefore, this work demonstrates that the CS method can effectively detect subtypes of leukemia, suggesting that it could improve the accuracy of leukemia diagnosis.
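The abstract does not specify the CS formulation. A widely used way to turn sparse representations into a classifier is sparse-representation-based classification. In that approach, a test sample is written as a sparse combination of training samples and assigned to the class whose coefficients reconstruct it with the smallest residual. The sketch below uses that scheme as an assumption and substitutes orthogonal matching pursuit (OMP), a greedy solver, for ℓ1 minimization. It requires NumPy and omits the paper's four-feature gene selection step. All names are hypothetical.

```python
import numpy as np

def omp(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal matching pursuit: greedily pick at most k columns of A
    and least-squares fit y on them. Returns a sparse coefficient vector."""
    k = min(k, A.shape[1])
    residual = y.astype(float).copy()
    support: list = []
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(A.T @ residual)
        scores[support] = -1.0  # never re-select a chosen column
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

def src_classify(train: np.ndarray, labels: np.ndarray,
                 sample: np.ndarray, k: int = 2):
    """Sparse-representation classification.

    train:  genes x samples expression matrix (one column per training sample)
    labels: class label of each training column
    Assigns `sample` to the class whose coefficients reconstruct it best.
    """
    A = train / np.linalg.norm(train, axis=0)  # unit-norm columns
    x = omp(A, sample, k)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(sample - A @ x_c)
    return min(residuals, key=residuals.get)
```

The sparsity level `k` is a tuning parameter of this sketch. With thousands of genes and few samples, the training matrix is underdetermined, which is exactly the regime where a sparsity constraint makes the representation well defined.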


Genetics ◽  
2017 ◽  
Vol 207 (1) ◽  
pp. 75-82 ◽  
Author(s):  
Monica D. Ramstetter ◽  
Thomas D. Dyer ◽  
Donna M. Lehman ◽  
Joanne E. Curran ◽  
Ravindranath Duggirala ◽  
...  

2017 ◽  
Vol 28 (7) ◽  
pp. 1927-1941
Author(s):  
Jiyuan Hu ◽  
Wei Zhang ◽  
Xinmin Li ◽  
Dongdong Pan ◽  
Qizhai Li

In the past decade, genome-wide association studies have identified thousands of susceptibility variants associated with complex human diseases and traits. Conducting follow-up genetic association studies has become a standard approach to validating the findings of genome-wide association studies. A problem of particular interest in genetic association studies is accurately estimating the strength of the association, which is often quantified by odds ratios in case-control studies. However, estimating the association from follow-up studies alone is inefficient because this approach ignores information from the genome-wide association studies. In this article, an estimator called GFcom, which integrates information from genome-wide association studies and follow-up studies, is proposed. The estimator provides both a point estimate and a corresponding confidence interval. GFcom is more efficient than competing estimators in terms of mean squared error (MSE) and confidence interval length. The superiority of GFcom is particularly evident when the genome-wide association study suffers from severe selection bias. Comprehensive simulation studies and applications to three real follow-up studies demonstrate the performance of the proposed estimator. An R package, “GFcom”, implementing our method is publicly available at https://github.com/JiyuanHu/GFcom.
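For context, the Python sketch below shows two baseline quantities this abstract builds on. The first is an allelic odds ratio with a Woolf (log-scale) confidence interval from case-control allele counts. The second is a naive fixed-effect, inverse-variance pooling of GWAS and follow-up log odds ratios. This is explicitly not GFcom: naive pooling ignores the selection bias (winner's curse) of variants chosen for their GWAS significance, which is the problem GFcom is designed to handle. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def allelic_odds_ratio(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    a, b: risk and non-risk allele counts in cases
    c, d: risk and non-risk allele counts in controls
    Returns (odds ratio, lower bound, upper bound); z = 1.96 gives ~95% CI.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def inverse_variance_pool(log_ors: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect pooled log odds ratio and its standard error.

    Naive baseline only: it does NOT correct for GWAS selection bias.
    """
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))
```

With 30/70 risk/non-risk alleles in cases and 20/80 in controls, the odds ratio is 30·80 / (70·20) ≈ 1.71. Pooling that kind of estimate with an inflated GWAS estimate would pull the combined value upward, which motivates a bias-aware estimator.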
