Unbalanced Sample Size Introduces Spurious Correlations to Genome-wide Heterozygosity Analyses

2020 ◽  
Author(s):  
Li Liu ◽  
Richard J Caselli

Excess of heterozygosity (H) is a widely used measure of the genetic diversity of a population. As high-throughput sequencing and genotyping data become readily available, it has been applied to investigating the associations of genome-wide genetic diversity with human diseases and traits. However, these studies often report contradictory results. In this paper, we present a meta-analysis of five whole-exome studies to examine the association of H scores with Alzheimer’s disease. We show that the mean H score of a group is not associated with the disease status, but is associated with the sample size. Across all five studies, the group with more samples has a significantly lower H score than the group with fewer samples. To remove potential confounders in empirical data sets, we perform computer simulations to create artificial genomes controlled for the number of polymorphic loci, the sample size and the allele frequency. Analyses of these simulated data confirm the negative correlation between the sample size and the H score. Furthermore, we find that genomes with a large number of rare variants also have inflated H scores. Together, these biases can lead to spurious associations between genetic diversity and the phenotype of interest. Based on these findings, we advocate that studies should balance the sample sizes when using genome-wide H scores to assess the genetic diversity of different populations, which helps improve the reproducibility of future research.
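As a rough illustration of the kind of statistic involved, the sketch below computes a per-group excess heterozygosity as observed heterozygosity minus the Hardy-Weinberg expectation, averaged over the loci that are polymorphic in that group. This definition, the function name `excess_heterozygosity`, and the 0/1/2 genotype coding are assumptions made for illustration; the H score used in the paper may be defined differently.

```python
def excess_heterozygosity(genotypes):
    """Mean (observed - expected) heterozygosity over polymorphic loci.

    genotypes: list of individuals, each a list of alternate-allele
    counts (0, 1 or 2) per locus. Hypothetical input format.
    """
    n = len(genotypes)
    n_loci = len(genotypes[0])
    excesses = []
    for j in range(n_loci):
        column = [individual[j] for individual in genotypes]
        # Alternate-allele frequency estimated from this group only.
        p = sum(column) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue  # locus is monomorphic in this group, so skip it
        observed = sum(1 for g in column if g == 1) / n
        expected = 2 * p * (1 - p)  # Hardy-Weinberg plug-in estimate
        excesses.append(observed - expected)
    return sum(excesses) / len(excesses) if excesses else 0.0
```

Under Hardy-Weinberg sampling, the plug-in estimate 2p̂(1 − p̂) underestimates the true expected heterozygosity by a factor of (1 − 1/(2n)) for n diploid individuals, so a statistic of this form tends to be somewhat larger in smaller groups. That is in the same direction as the bias described in the abstract, although it is not necessarily the mechanism the authors identify.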

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jaakko Laaksonen ◽  
Pashupati P. Mishra ◽  
Ilkka Seppälä ◽  
Leo-Pekka Lyytikäinen ◽  
Emma Raitoharju ◽  
...  

High blood pressure (BP) is a major risk factor for many noncommunicable diseases. The effect of mitochondrial DNA single-nucleotide polymorphisms (mtSNPs) on BP is less well known than that of nuclear SNPs. We investigated the mitochondrial genetic determinants of systolic, diastolic, and mean arterial BP. MtSNPs were determined from peripheral blood by sequencing or with genome-wide association study SNP arrays in two independent Finnish cohorts, the Young Finns Study and the Finnish Cardiovascular Study, respectively. In total, over 4200 individuals were included. The effects of individual common mtSNPs, with an additional focus on sex-specificity, and aggregates of rare mtSNPs grouped by mitochondrial genes were evaluated by meta-analysis of linear regression and a sequence kernel association test, respectively. We accounted for the predicted pathogenicity of the rare variants within the protein-encoding and tRNA regions. In the meta-analysis of 87 common mtSNPs, we did not observe significant associations with any of the BP traits. Sex-specific and rare-variant analyses did not pinpoint any significant associations either. Our results are in agreement with several previous studies suggesting that mtDNA variation does not have a significant role in the regulation of BP. Future studies might need to reconsider the mechanisms thought to link mtDNA with hypertension.


2005 ◽  
Vol 37 (12) ◽  
pp. 1320-1322 ◽  
Author(s):  
Eleftheria Zeggini ◽  
William Rayner ◽  
Andrew P Morris ◽  
Andrew T Hattersley ◽  
Mark Walker ◽  
...  

2018 ◽  
Vol 21 (2) ◽  
pp. 84-88 ◽  
Author(s):  
W. David Hill

Intelligence and educational attainment are strongly genetically correlated. This relationship can be exploited by Multi-Trait Analysis of GWAS (MTAG) to add power to Genome-wide Association Studies (GWAS) of intelligence. MTAG allows the user to meta-analyze GWASs of different phenotypes, based on their genetic correlations, to identify associations specific to the trait of choice. An MTAG analysis using GWAS data sets on intelligence and education was conducted by Lam et al. (2017), who reported 70 loci that they described as ‘trait specific’ to intelligence. This article examines whether the analysis conducted by Lam et al. (2017) has resulted in genetic information about a phenotype that is more similar to education than intelligence.


2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Yuxiang Tan ◽  
Yann Tambouret ◽  
Stefano Monti

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
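The accuracy measures named above can be sketched as follows. This is a minimal illustration that assumes fusion calls and the simulated ground truth are both available as collections of fusion identifiers; the function name `precision_recall` and the identifier format are hypothetical and are not part of SimFuse.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of fusion calls.

    called: fusion identifiers reported by a detection algorithm.
    truth:  fusion identifiers known to be present in the simulated data.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: fraction of reported fusions that are real.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of real fusions that were reported.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

For example, if an algorithm reports three fusions of which two are among four simulated fusions, the function returns a precision of 2/3 and a recall of 0.5.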


2020 ◽  
Author(s):  
Samuel Hokin ◽  
Alan Cleary ◽  
Joann Mudge

Complex diseases, with many associated genetic and environmental factors, are a challenging target for genomic risk assessment. Genome-wide association studies (GWAS) associate disease status with, and compute risk from, individual common variants, which can be problematic for diseases with many interacting or rare variants. In addition, GWAS typically employ a reference genome which is not built from the subjects of the study, whose genetic background may differ from the reference and whose genetic characterization may be limited. We present a complementary method based on disease association with collections of genotypes, called frequented regions, on a pangenomic graph built from subjects' genomes. We introduce the pangenomic genotype graph, which is better suited than sequence graphs to human disease studies. Our method draws out collections of features, across multiple genomic segments, which are associated with disease status. We show that the frequented regions method consistently improves machine-learning classification of disease status over GWAS classification, allowing incorporation of rare or interacting variants. Notably, genomic segments that have few or no variants of genome-wide significance (p < 5 × 10⁻⁸) provide much-improved classification with frequented regions, encouraging their application across the entire genome. Frequented regions may also be utilized for purposes such as choice of treatment in addition to prediction of disease risk.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hung-Hsin Chen ◽  
Lauren E. Petty ◽  
Jin Sha ◽  
Yi Zhao ◽  
Amanda Kuzma ◽  
...  

Late-onset Alzheimer disease (LOAD) is highly polygenic, with a heritability estimated between 40 and 80%, yet risk variants identified in genome-wide studies explain only ~8% of phenotypic variance. Due to its increased power and interpretability, genetically regulated expression (GReX) analysis is an emerging approach to investigate the genetic mechanisms of complex diseases. Here, we conducted GReX analysis within and across 51 tissues on 39 LOAD GWAS data sets comprising 58,713 cases and controls from the Alzheimer’s Disease Genetics Consortium (ADGC) and the International Genomics of Alzheimer’s Project (IGAP). Meta-analysis across studies identified 216 unique significant genes, including 72 with no previously reported LOAD GWAS associations. Cross-brain-tissue and cross-GTEx models revealed eight additional genes significantly associated with LOAD. Conditional analysis of previously reported loci using established LOAD-risk variants identified eight genes reaching genome-wide significance independent of known signals. Moreover, the proportion of SNP-based heritability is highly enriched in genes identified by GReX analysis. In summary, GReX-based meta-analysis in LOAD identifies 216 genes (including 72 novel genes), illuminating the role of gene regulatory models in LOAD.


2018 ◽  
Author(s):  
BW Kunkle ◽  
B Grenier-Boley ◽  
R Sims ◽  
JC Bis ◽  
AC Naj ◽  
...  

Late-onset Alzheimer’s disease (LOAD, onset age > 60 years) is the most prevalent dementia in the elderly [1], and risk is partially driven by genetics [2]. Many of the loci responsible for this genetic risk were identified by genome-wide association studies (GWAS) [3–8]. To identify additional LOAD risk loci, we performed the largest GWAS to date (89,769 individuals), analyzing both common and rare variants. We confirm 20 previous LOAD risk loci and identify four new genome-wide loci (IQCK, ACE, ADAM10, and ADAMTS1). Pathway analysis of these data implicates the immune system and lipid metabolism, and for the first time tau binding proteins and APP metabolism. These findings show that genetic variants affecting APP and Aβ processing are not only associated with early-onset autosomal dominant AD but also with LOAD. Analysis of AD risk genes and pathways shows enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified.


2020 ◽  
Vol 7 (4) ◽  
pp. 55
Author(s):  
Marcel Grunert ◽  
Sandra Appelt ◽  
Paul Grossfeld ◽  
Silke R. Sperling

Congenital heart defects (CHDs) are the most common birth defect in humans, with an incidence of almost 1% of all live births. Most cases have a multifactorial origin, with both genetics and the environment playing a role in their development and progression. An additional epigenetic component is exemplified by monozygotic twins, who share the same genetic background but have a different disease status. As a result, the interplay between genetic, epigenetic and environmental conditions might contribute to the etiology and phenotype. To date, the underlying causes of the majority of CHDs remain poorly understood. In this study, we performed genome-wide high-throughput sequencing to examine the genetic, structural genomic and epigenetic differences of two identical twin pairs discordant for Tetralogy of Fallot (TOF), the most common cyanotic form of CHD. Our results show that the twins are almost identical in their genetic and structural genomic makeup. In contrast, several epigenetic alterations could be observed in the form of DNA methylation changes in regulatory regions of known cardiac-relevant genes. Overall, this study provides first insights into the impact of genetic and especially epigenetic factors underlying monozygotic twins discordant for CHDs such as TOF.


2016 ◽  
Vol 34 (7) ◽  
pp. 1042-1068 ◽  
Author(s):  
Mohammad Nejad

Purpose: The purpose of this paper is to present a systematic overview of the current state of research on innovations in financial services and to identify the areas that have received less attention and hence offer opportunities for future research. Design/methodology/approach: An extensive search identified 121 research papers that have studied innovations in financial services from January 1990 to March 2015. A thorough content analysis objectively organized and coded the studies based on various aspects including publication year, focus of study, methodology, unit of analysis, sample, data analysis method, and geographical region. Analysis of the resulting data presents an overview of the research and identifies areas for future research. Findings: The findings indicate that research on innovations in financial services is diverse and has explored various topics. The findings summarize the research papers with regard to each of the aforementioned aspects and offer researchers directions for future research. Research limitations/implications: The sample of 121 articles is adequate for the purpose of the study and is in line with similar studies on innovations in other areas. However, future research can expand the study to include more academic journals, in addition to reviewing and synthesizing the qualitative aspects of studies and conducting a meta-analysis of the identified relationships. Originality/value: The study is the first to present a holistic overview of the current state of research on innovations in financial services. The findings offer clear directions to researchers for future research and hence can be used to promote research in these areas.

