Unbalanced Sample Size Introduces Spurious Correlations to Genome-wide Heterozygosity Analyses

Mapping Intimacies ◽

10.1101/2020.02.06.937599 ◽

2020 ◽

Author(s):

Li Liu ◽

Richard J Caselli

Keyword(s):

Genetic Diversity ◽

Sample Size ◽

High Throughput Sequencing ◽

Rare Variants ◽

Meta Analysis ◽

Simulated Data ◽

Disease Status ◽

Future Research ◽

Data Sets ◽

Genome Wide

Abstract: Excess of heterozygosity (H) is a widely used measure of the genetic diversity of a population. As high-throughput sequencing and genotyping data become readily available, it has been applied to investigating the associations of genome-wide genetic diversity with human diseases and traits. However, these studies often report contradictory results. In this paper, we present a meta-analysis of five whole-exome studies to examine the association of H scores with Alzheimer’s disease. We show that the mean H score of a group is not associated with the disease status, but is associated with the sample size. Across all five studies, the group with more samples has a significantly lower H score than the group with fewer samples. To remove potential confounders in empirical data sets, we perform computer simulations to create artificial genomes controlled for the number of polymorphic loci, the sample size and the allele frequency. Analyses of these simulated data confirm the negative correlation between the sample size and the H score. Furthermore, we find that genomes with a large number of rare variants also have inflated H scores. Together, these biases can lead to spurious associations between genetic diversity and the phenotype of interest. Based on these findings, we advocate that studies should balance the sample sizes when using genome-wide H scores to assess the genetic diversity of different populations, which helps improve the reproducibility of future research.
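The simulation design described in this abstract (artificial genomes with a controlled number of loci, sample size and allele frequency) can be illustrated with a rough sketch. The abstract does not define the H score, so the statistic below is a stand-in assumption: the mean fraction of heterozygous genotypes per individual, counted over loci that are polymorphic within the sampled group. The function names `simulate_group` and `mean_heterozygosity`, the Hardy-Weinberg sampling, and the allele-frequency range are all hypothetical choices for illustration, not the authors' method.

```python
import random


def simulate_group(n_samples, allele_freqs, rng):
    """Draw diploid genotypes (0, 1 or 2 alternate alleles per locus).

    Assumption: each of the two alleles is drawn independently with the
    locus allele frequency, i.e. Hardy-Weinberg proportions.
    """
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in allele_freqs]
        for _ in range(n_samples)
    ]


def mean_heterozygosity(genotypes):
    """Mean per-individual heterozygous fraction over group-polymorphic loci.

    This is a simplified stand-in statistic, not the paper's H score.
    """
    n_samples = len(genotypes)
    n_loci = len(genotypes[0])
    # A locus is polymorphic in the group if both alleles are observed.
    polymorphic = [
        j for j in range(n_loci)
        if 0 < sum(g[j] for g in genotypes) < 2 * n_samples
    ]
    if not polymorphic:
        return 0.0
    per_sample = [
        sum(g[j] == 1 for j in polymorphic) / len(polymorphic)
        for g in genotypes
    ]
    return sum(per_sample) / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical allele frequencies, skewed toward rare variants.
    freqs = [rng.uniform(0.001, 0.5) ** 2 * 2 for _ in range(2000)]
    for n in (20, 200):
        group = simulate_group(n, freqs, rng)
        print(n, round(mean_heterozygosity(group), 4))
```

Running the script prints the stand-in statistic for a small and a large group drawn from the same allele frequencies, which makes it possible to check whether the value depends on group size under these assumptions.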

Examining the effect of mitochondrial DNA variants on blood pressure in two Finnish cohorts

Scientific Reports ◽

10.1038/s41598-020-79931-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jaakko Laaksonen ◽

Pashupati P. Mishra ◽

Ilkka Seppälä ◽

Leo-Pekka Lyytikäinen ◽

Emma Raitoharju ◽

...

Keyword(s):

Blood Pressure ◽

Mitochondrial DNA ◽

Genome Wide Association Study ◽

Rare Variants ◽

Meta Analysis ◽

Noncommunicable Diseases ◽

Nucleotide Polymorphisms ◽

Genetic Determinants ◽

Genome Wide ◽

Mitochondrial DNA Variants

Abstract: High blood pressure (BP) is a major risk factor for many noncommunicable diseases. The effect of mitochondrial DNA single-nucleotide polymorphisms (mtSNPs) on BP is less well known than that of nuclear SNPs. We investigated the mitochondrial genetic determinants of systolic, diastolic, and mean arterial BP. MtSNPs were determined from peripheral blood by sequencing or with genome-wide association study SNP arrays in two independent Finnish cohorts, the Young Finns Study and the Finnish Cardiovascular Study, respectively. In total, over 4200 individuals were included. The effects of individual common mtSNPs, with an additional focus on sex-specificity, and aggregates of rare mtSNPs grouped by mitochondrial genes were evaluated by meta-analysis of linear regression and a sequence kernel association test, respectively. We accounted for the predicted pathogenicity of the rare variants within the protein-encoding and tRNA regions. In the meta-analysis of 87 common mtSNPs, we did not observe significant associations with any of the BP traits. Sex-specific and rare-variant analyses did not pinpoint any significant associations either. Our results are in agreement with several previous studies suggesting that mtDNA variation does not have a significant role in the regulation of BP. Future studies might need to reconsider the mechanisms thought to link mtDNA with hypertension.
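As a rough illustration of how per-cohort linear regression estimates for one mtSNP might be combined, the sketch below uses fixed-effect inverse-variance weighting. The abstract says only "meta-analysis of linear regression", so the weighting scheme is an assumption, and the function name `fixed_effect_meta` and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Combine per-cohort effect estimates by inverse-variance weighting.

    Assumption: a fixed-effect model, in which every cohort estimates the
    same underlying effect. Returns (pooled beta, pooled SE, z statistic).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


if __name__ == "__main__":
    # Hypothetical regression slopes (mmHg per allele) from two cohorts.
    beta, se, z = fixed_effect_meta([0.8, -0.2], [0.5, 0.6])
    print(f"pooled beta={beta:.3f}, SE={se:.3f}, z={z:.2f}")
```

Cohorts with smaller standard errors receive more weight, so a large, precisely measured cohort dominates the pooled estimate.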

An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets

Nature Genetics ◽

10.1038/ng1670 ◽

2005 ◽

Vol 37 (12) ◽

pp. 1320-1322 ◽

Cited By ~ 76

Author(s):

Eleftheria Zeggini ◽

William Rayner ◽

Andrew P Morris ◽

Andrew T Hattersley ◽

Mark Walker ◽

...

Keyword(s):

Sample Size ◽

Large Scale ◽

Simulated Data ◽

Data Sets ◽

HapMap Sample ◽

Tagging SNP ◽

Simulated Data Sets

Comment on ‘Large-Scale Cognitive GWAS Meta-Analysis Reveals Tissue-Specific Neural Expression and Potential Nootropic Drug Targets’ by Lam et al.

Twin Research and Human Genetics ◽

10.1017/thg.2018.12 ◽

2018 ◽

Vol 21 (2) ◽

pp. 84-88 ◽

Cited By ~ 6

Author(s):

W. David Hill

Keyword(s):

Genetic Information ◽

Drug Targets ◽

Large Scale ◽

Association Studies ◽

Meta Analysis ◽

Genetic Correlations ◽

Data Sets ◽

Genome Wide Association Studies ◽

Nootropic Drug ◽

Genome Wide

Intelligence and educational attainment are strongly genetically correlated. This relationship can be exploited by Multi-Trait Analysis of GWAS (MTAG) to add power to genome-wide association studies (GWAS) of intelligence. MTAG allows the user to meta-analyze GWASs of different phenotypes, based on their genetic correlations, to identify associations specific to the trait of choice. An MTAG analysis using GWAS data sets on intelligence and education was conducted by Lam et al. (2017), who reported 70 loci that they described as ‘trait specific’ to intelligence. This article examines whether the analysis conducted by Lam et al. (2017) has resulted in genetic information about a phenotype that is more similar to education than intelligence.

SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

BioMed Research International ◽

10.1155/2015/780519 ◽

2015 ◽

Vol 2015 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Yuxiang Tan ◽

Yann Tambouret ◽

Stefano Monti

Keyword(s):

Sample Size ◽

RNA Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Simulated Data ◽

Real Data ◽

RNA-Seq ◽

Sequencing Data ◽

Detection Algorithms ◽

Fusion Detection

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
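The accuracy measures named in this abstract, precision and recall, can be computed once the simulated (known) fusions are compared with a detector's calls. The sketch below is a minimal illustration and is not part of SimFuse; the function name `precision_recall` and the representation of a fusion as a pair of gene names are assumptions.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted fusions against known ones.

    Assumption: each fusion is a hashable item, e.g. a (gene_a, gene_b)
    tuple, and an exact match counts as a true positive.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical gene pairs, for illustration only.
    simulated = {("GENE1", "GENE2"), ("GENE3", "GENE4"),
                 ("GENE5", "GENE6"), ("GENE7", "GENE8")}
    called = {("GENE1", "GENE2"), ("GENE3", "GENE4"), ("GENE9", "GENE10")}
    print(precision_recall(called, simulated))
```

With a known set of simulated fusions, the same function can be applied to each data set generated at a different number of supporting reads to see how detection performance varies.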

Evaluation of association tests for rare variants using simulated data sets in the Genetic Analysis Workshop 17 data

BMC Proceedings ◽

10.1186/1753-6561-5-s9-s86 ◽

2011 ◽

Vol 5 (S9) ◽

Cited By ~ 2

Author(s):

Wenan Chen ◽

Xi Gao ◽

Jiexun Wang ◽

Chuanyu Sun ◽

Wen Wan ◽

...

Keyword(s):

Genetic Analysis ◽

Genetic Analysis Workshop ◽

Rare Variants ◽

Simulated Data ◽

Data Sets ◽

Association Tests ◽

Simulated Data Sets

Disease association with frequented regions of genotype graphs

10.1101/2020.09.25.20201640 ◽

2020 ◽

Author(s):

Samuel Hokin ◽

Alan Cleary ◽

Joann Mudge

Keyword(s):

Rare Variants ◽

Disease Risk ◽

Association Studies ◽

Disease Status ◽

Disease Association ◽

Genome Wide Association Studies ◽

Entire Genome ◽

Machine Learning Classification ◽

Complementary Method ◽

Genome Wide

Complex diseases, with many associated genetic and environmental factors, are a challenging target for genomic risk assessment. Genome-wide association studies (GWAS) associate disease status with, and compute risk from, individual common variants, which can be problematic for diseases with many interacting or rare variants. In addition, GWAS typically employ a reference genome which is not built from the subjects of the study, whose genetic background may differ from the reference and whose genetic characterization may be limited. We present a complementary method based on disease association with collections of genotypes, called frequented regions, on a pangenomic graph built from subjects' genomes. We introduce the pangenomic genotype graph, which is better suited than sequence graphs to human disease studies. Our method draws out collections of features, across multiple genomic segments, which are associated with disease status. We show that the frequented regions method consistently improves machine-learning classification of disease status over GWAS classification, allowing incorporation of rare or interacting variants. Notably, genomic segments that have few or no variants of genome-wide significance (p < 5 × 10⁻⁸) provide much-improved classification with frequented regions, encouraging their application across the entire genome. Frequented regions may also be utilized for purposes such as choice of treatment in addition to prediction of disease risk.

Genetically regulated expression in late-onset Alzheimer’s disease implicates risk genes within known and novel loci

Translational Psychiatry ◽

10.1038/s41398-021-01677-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hung-Hsin Chen ◽

Lauren E. Petty ◽

Jin Sha ◽

Yi Zhao ◽

Amanda Kuzma ◽

...

Keyword(s):

Alzheimer’s Disease ◽

Late Onset ◽

Meta Analysis ◽

Data Sets ◽

Phenotypic Variance ◽

Risk Variants ◽

Genome Wide ◽

Regulated Expression ◽

Genetic Mechanisms

Abstract: Late-onset Alzheimer disease (LOAD) is highly polygenic, with a heritability estimated between 40 and 80%, yet risk variants identified in genome-wide studies explain only ~8% of phenotypic variance. Due to its increased power and interpretability, genetically regulated expression (GReX) analysis is an emerging approach to investigate the genetic mechanisms of complex diseases. Here, we conducted GReX analysis within and across 51 tissues on 39 LOAD GWAS data sets comprising 58,713 cases and controls from the Alzheimer’s Disease Genetics Consortium (ADGC) and the International Genomics of Alzheimer’s Project (IGAP). Meta-analysis across studies identified 216 unique significant genes, including 72 with no previously reported LOAD GWAS associations. Cross-brain-tissue and cross-GTEx models revealed eight additional genes significantly associated with LOAD. Conditional analysis of previously reported loci using established LOAD-risk variants identified eight genes reaching genome-wide significance independent of known signals. Moreover, the proportion of SNP-based heritability is highly enriched in genes identified by GReX analysis. In summary, GReX-based meta-analysis in LOAD identifies 216 genes (including 72 novel genes), illuminating the role of gene regulatory models in LOAD.

Meta-analysis of genetic association with diagnosed Alzheimer’s disease identifies novel risk loci and implicates Abeta, Tau, immunity and lipid processing

10.1101/294629 ◽

2018 ◽

Cited By ~ 9

Author(s):

BW Kunkle ◽

B Grenier-Boley ◽

R Sims ◽

JC Bis ◽

AC Naj ◽

...

Keyword(s):

Alzheimer’s Disease ◽

Rare Variants ◽

Late Onset ◽

Association Studies ◽

Meta Analysis ◽

The Elderly ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Load Risk

Introduction: Late-onset Alzheimer’s disease (LOAD, onset age > 60 years) is the most prevalent dementia in the elderly [1], and risk is partially driven by genetics [2]. Many of the loci responsible for this genetic risk were identified by genome-wide association studies (GWAS) [3–8]. To identify additional LOAD risk loci, we performed the largest GWAS to date (89,769 individuals), analyzing both common and rare variants. We confirm 20 previous LOAD risk loci and identify four new genome-wide loci (IQCK, ACE, ADAM10, and ADAMTS1). Pathway analysis of these data implicates the immune system and lipid metabolism, and for the first time tau binding proteins and APP metabolism. These findings show that genetic variants affecting APP and Aβ processing are not only associated with early-onset autosomal dominant AD but also with LOAD. Analysis of AD risk genes and pathways shows enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified.

The Needle in the Haystack—Searching for Genetic and Epigenetic Differences in Monozygotic Twins Discordant for Tetralogy of Fallot

Journal of Cardiovascular Development and Disease ◽

10.3390/jcdd7040055 ◽

2020 ◽

Vol 7 (4) ◽

pp. 55

Author(s):

Marcel Grunert ◽

Sandra Appelt ◽

Paul Grossfeld ◽

Silke R. Sperling

Keyword(s):

Tetralogy of Fallot ◽

High Throughput Sequencing ◽

Heart Defects ◽

Monozygotic Twins ◽

Disease Status ◽

Epigenetic Alterations ◽

Structural Genomic ◽

Genome Wide ◽

Underlying Causes ◽

The Impact

Congenital heart defects (CHDs) are the most common birth defect in humans, with an incidence of almost 1% of all live births. Most cases have a multifactorial origin, with both genetics and the environment playing a role in their development and progression. An additional epigenetic component is exemplified by monozygotic twins, which share the same genetic background but have a different disease status. As a result, the interplay between the genetic, epigenetic and environmental conditions might contribute to the etiology and phenotype. To date, the underlying causes of the majority of CHDs remain poorly understood. In this study, we performed genome-wide high-throughput sequencing to examine the genetic, structural genomic and epigenetic differences of two identical twin pairs discordant for Tetralogy of Fallot (TOF), the most common cyanotic form of CHD. Our results show the almost identical genetic and structural genomic identity of the twins. In contrast, several epigenetic alterations were observed, in the form of DNA methylation changes in regulatory regions of known cardiac-relevant genes. Overall, this study provides initial insights into the impact of genetic and especially epigenetic factors underlying monozygotic twins discordant for CHDs such as TOF.

Research on financial services innovations

International Journal of Bank Marketing ◽

10.1108/ijbm-08-2015-0129 ◽

2016 ◽

Vol 34 (7) ◽

pp. 1042-1068 ◽

Cited By ~ 4

Author(s):

Mohammad Nejad

Keyword(s):

Sample Size ◽

Financial Services ◽

Meta Analysis ◽

Future Research ◽

Research Papers ◽

Content Type ◽

Current State ◽

Sample Data ◽

Study Methodology ◽

State of Research

Purpose: The purpose of this paper is to present a systematic overview of the current state of research on innovations in financial services and to identify the areas that have received less attention and hence offer opportunities for future research. Design/methodology/approach: An extensive search identified 121 research papers that have studied innovations in financial services from January 1990 to March 2015. A thorough content analysis objectively organized and coded the studies based on various aspects including publication year, focus of study, methodology, unit of analysis, sample, data analysis method, and geographical region. Analysis of the resulting data presents an overview of the research and identifies areas for future research. Findings: The findings indicate that research on innovations in financial services is diverse and has explored various topics. The findings summarize the research papers with regard to each of the aforementioned aspects and offer researchers directions for future research. Research limitations/implications: The sample of 121 articles is adequate for the purpose of the study and is in line with similar studies on innovations in other areas. However, future research can expand the study to include more academic journals, review and synthesize the qualitative aspects of studies, and conduct a meta-analysis of the identified relationships. Originality/value: The study is the first to present a holistic overview of the current state of research on innovations in financial services. The findings offer clear directions to researchers for future research and hence can be used to promote research in these areas.
