Cohort Profile: Genetic data in the German Socio-Economic Panel Innovation Sample (Gene-SOEP)

The German Socio-Economic Panel (SOEP) serves a global research community by providing representative annual longitudinal data of private households in Germany. The sample provides a detailed life course perspective based on a rich collection of information about living conditions, socio-economic status, family relationships, personality, values, preferences, and health. We collected genetic data from 2,598 individuals in the SOEP Innovation Sample, yielding the first genotyped sample that is representative of the entire German population (Gene-SOEP). The Gene-SOEP sample is a longitudinal study that includes 107 full-sibling pairs, 501 parent-offspring pairs, and 152 parent-offspring trios that are overlapping with the parent-offspring pairs. We constructed a repository of 66 polygenic indices in the Gene-SOEP sample based on results from well-powered genome-wide association studies. The Gene-SOEP data provides a valuable resource to study individual differences, inequalities, life-course development, health, and interactions between genetic predispositions and environment.
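
A polygenic index of the kind described above is, in its simplest form, a weighted sum of a person's allele counts, with weights taken from GWAS effect-size estimates. The sketch below illustrates only that basic idea; it is not the Gene-SOEP construction pipeline, and the function name `polygenic_index`, the toy genotypes, and the weights are hypothetical.

```python
from statistics import mean, pstdev


def polygenic_index(genotypes, weights):
    """Return standardized polygenic indices, one per individual.

    genotypes: list of rows, each row holding allele counts (0, 1, or 2)
               for the same ordered set of variants.
    weights:   per-variant effect sizes (assumed to come from a GWAS).
    """
    # Raw index: weighted sum of allele counts for each individual.
    raw = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("all individuals have the same raw index")
    # Standardize to mean 0 and standard deviation 1 within the sample.
    return [(x - mu) / sd for x in raw]


# Toy data (hypothetical): three individuals, three variants.
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
toy_weights = [0.1, -0.2, 0.3]
toy_scores = polygenic_index(toy_genotypes, toy_weights)
```

Standardizing within the sample makes indices for different traits comparable, which is one common convention; other scalings are possible.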

Life Course Adiposity and Alzheimer’s Disease: A Mendelian Randomization Study

Journal of Alzheimer's Disease ◽

10.3233/jad-210345 ◽

2021 ◽

pp. 1-10

Author(s):

Xian Li ◽

Yan Tian ◽

Yu-Xiang Yang ◽

Ya-Hui Ma ◽

Xue-Ning Shen ◽

...

Keyword(s):

Alzheimer's Disease ◽

Life Course ◽

Mendelian Randomization ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association Studies ◽

Fat Percentage ◽

Regression Methods ◽

Weighted Median

Background: Several studies have shown that life course adiposity is associated with Alzheimer’s disease (AD). However, the underlying causality remains unclear. Objective: We aimed to examine the causal relationship between life course adiposity and AD using Mendelian randomization (MR) analysis. Methods: Instrumental variants were obtained from large genome-wide association studies (GWAS) for life course adiposity, including birth weight (BW), childhood body mass index (BMI), adult BMI, waist circumference (WC), waist-to-hip ratio (WHR), and body fat percentage (BFP). A meta-analysis of GWAS for AD including 71,880 cases and 383,378 controls was used in this study. MR analyses were performed using inverse variance weighted (IVW), weighted median, and MR-Egger regression methods. We calculated odds ratios (ORs) for AD per genetically predicted standard deviation (1-SD) unit increase in each trait. Results: A genetically predicted 1-SD increase in adult BMI was significantly associated with higher risk of AD (IVW: OR = 1.03, 95% confidence interval [CI] = 1.01–1.05, p = 2.7×10⁻³) after Bonferroni correction. The weighted median method indicated a significant association between BW and AD (OR = 0.94, 95% CI = 0.90–0.98, p = 1.8×10⁻³). We also found suggestive associations of AD with WC (IVW: OR = 1.03, 95% CI = 1.00–1.07, p = 0.048) and WHR (weighted median: OR = 1.04, 95% CI = 1.00–1.07, p = 0.029). No association of AD with childhood BMI or BFP was detected. Conclusion: Our study demonstrated that lower BW and higher adult BMI had causal effects on increased AD risk.
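
The inverse variance weighted (IVW) estimator named in the Methods can be sketched in a few lines. This is a minimal fixed-effect version written for illustration, not the authors' analysis code; the function names and the toy summary statistics are hypothetical, and the weighted median and MR-Egger methods are not shown.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect (log odds ratio).

    Equivalent to a weighted regression through the origin of the
    variant-outcome effects on the variant-exposure effects, with
    weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by
              for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log odds ratio and its standard error to an OR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy summary statistics (hypothetical): per-variant effects on the
# exposure (in SD units) and on the outcome (log odds), with outcome SEs.
beta, se = ivw_estimate([0.1, 0.2], [0.01, 0.02], [0.01, 0.01])
or_point, or_low, or_high = odds_ratio_with_ci(beta, se)
```

Exponentiating the estimate gives an odds ratio per 1-SD increase in the exposure, which is the scale on which the results above are reported.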

Gene4PD: A Comprehensive Genetic Database of Parkinson’s Disease

Frontiers in Neuroscience ◽

10.3389/fnins.2021.679568 ◽

2021 ◽

Vol 15 ◽

Author(s):

Bin Li ◽

Guihu Zhao ◽

Qiao Zhou ◽

Yali Xie ◽

Zheng Wang ◽

...

Keyword(s):

Parkinson's Disease ◽

Neurodegenerative Disorder ◽

Late Onset ◽

Association Studies ◽

Age At Onset ◽

Genetic Data ◽

Genome Wide Association Studies ◽

Onset Age ◽

Loss Of Function

Parkinson’s disease (PD) is a complex neurodegenerative disorder with a strong genetic component. A growing number of variants and genes have been reported to be associated with PD; however, no database integrates the different types of genetic data and supports the analysis of PD-associated genes (PAGs). Through systematic review and curation of multiple lines of public studies, we integrated multiple layers of genetic data (rare variants and copy-number variants identified in patients with PD, associated variants identified in genome-wide association studies, differentially expressed genes, and differentially DNA-methylated genes) together with age at onset (AAO) in PD. We integrated five layers of genetic data (8,302 terms) with different levels of evidence from more than 3,000 studies and prioritized 124 PAGs with strong or suggestive evidence. These PAGs interacted significantly with each other and formed an interconnected functional network enriched in several functional pathways involved in PD, suggesting that these genes may contribute to the pathogenesis of PD. Furthermore, we identified 10 genes associated with juvenile-onset PD (age ≤ 30 years), 11 genes associated with early-onset PD (age 30–50 years), and another 10 genes associated with late-onset PD (age > 50 years). Notably, the AAOs of patients with loss-of-function variants in five genes were significantly lower than those of patients with deleterious missense variants, whereas the opposite held for patients with VPS13C variants (P = 0.01). Finally, we developed an online database named Gene4PD (http://genemed.tech/gene4pd), which integrates published genetic data in PD, the PAGs, and 63 popular genomic data sources, as well as an online pipeline for prioritizing risk variants in PD. In conclusion, Gene4PD provides researchers and clinicians with comprehensive genetic knowledge and an analytic platform for PD, and would also improve the understanding of PD pathogenesis.

Genetic Data Analysis

Handbook of Research on Disease Prediction Through Data Analytics and Machine Learning - Advances in Medical Diagnosis, Treatment, and Care ◽

10.4018/978-1-7998-2742-9.ch017 ◽

2021 ◽

pp. 358-372

Author(s):

M. Shamila ◽

Amit Kumar Tyagi

Keyword(s):

Data Analysis ◽

Association Studies ◽

Single Gene ◽

Genetic Data ◽

Building Blocks ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Human Beings ◽

Genome Wide ◽

Genetic Data Analysis

Genome-wide association studies (GWAS), a form of genetic data analysis, are used to discover common genetic factors that influence human health and contribute to disease. The use of genomics has increased in recent years, especially in e-healthcare, yet the field still requires substantial improvement. Note that genomics and genetics are not the same thing. The human genome is made up of DNA, which consists of four different chemical building blocks (called bases and abbreviated A, T, C, and G). Differences in the sequence of these bases distinguish one human being from another. The term ‘genetics’ derives from the Greek word ‘genetikos’, meaning ‘origin’. In simple terms, genetics is the branch of biology that studies the function and composition of individual genes in an organism. It has three main branches: classical genetics, molecular genetics, and population genetics.

Identification of Genetic Modifiers Associated with Risk of Stroke in Children with Sickle Cell Anemia

Blood ◽

10.1182/blood.v120.21.3228.3228 ◽

2012 ◽

Vol 120 (21) ◽

pp. 3228-3228

Author(s):

Jonathan Michael Flanagan ◽

Heidi Linder ◽

Vivien Sheehan ◽

Thad A Howard ◽

Banu Aygun ◽

...

Keyword(s):

Cerebrovascular Disease ◽

Sickle Cell ◽

Sickle Cell Anemia ◽

Conflicts Of Interest ◽

Association Studies ◽

SNP Markers ◽

Genome Wide Association Studies ◽

Genetic Modifiers ◽

Sibling Pairs ◽

Synonymous Mutations

Abstract 3228. Introduction: Stroke is one of the most catastrophic acute complications of sickle cell anemia (SCA), occurring in 11% of patients before 20 years of age. A further 20 to 30% of children with SCA will develop less clinically overt cerebrovascular disease events such as transient ischemic attacks (TIA) and silent infarcts. There is a definite need for biomarkers that could determine the cause of these irreversible cerebrovascular events and that might identify the children at greatest risk. Previous studies of sibling pairs have shown that there is a genetic component to cerebrovascular disease development, but few genetic modifiers have been validated as having a substantial effect on risk of stroke. The aim of this study was to perform an unbiased whole genome search for genetic modifiers of stroke risk in SCA. Methods: Pediatric patients with SCA and documented primary stroke (n=177) were compared to a pediatric control non-stroke group with SCA (n=335). All control patients were over 5 years old and without previous clinical stroke prior to beginning any clinical treatment. Genome-wide association studies (GWAS) were performed using genotype data obtained from Affymetrix SNP6.0 arrays. A pooled DNA approach was used to perform whole exome sequencing (WES) by Illumina next generation sequencing of pooled control (n=104) and pooled stroke (n=120) groups. Results: From the Affymetrix SNP6.0 GWAS data, 139 single nucleotide polymorphisms (SNP) were identified with stroke association. From the WES, 294 non-synonymous mutations were found to be significantly associated with stroke. In combination, 11 mutations identified by WES were located within 250kb of a SNP identified by GWAS (Table 1). These 11 mutations represent key areas of the genome that are targets for further in-depth study. To validate the genetic variants identified by WES as associated with risk of stroke, 21 candidate mutations were genotyped in an independent cohort of control (n=231) and stroke (n=57) patients with SCA. One mutation in GOLGB1 (Y1212C) was corroborated as having a significant association with lower risk of stroke (p=0.02). Conclusion: This mutation in GOLGB1 is predicted to affect the Golgi-associated function of the encoded protein, and future studies will focus on how this functional mutation may protect against development of cerebrovascular disease in the context of SCA.

Table 1: For all variants with significant association with stroke, the chromosomal position of each variant identified by WES (n=300, p<0.001) was compared to the location of all SNP markers (n=139, p<0.0001). We identified 11 variants by WES where there was at least one SNP marker within 250kb. These variants all represent excellent regions of the genome for future study. The four variants highlighted with an asterisk (*) are variants predicted by PolyPhen2 or SIFT to be deleterious.

Disclosures: No relevant conflicts of interest to declare.
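
The positional comparison described above, keeping WES variants that have at least one GWAS SNP within 250kb, can be sketched as follows. This is an illustrative reconstruction rather than the authors' code; the function name `variants_near_snps`, the coordinate tuples, and the choice to treat the 250kb boundary as inclusive are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def variants_near_snps(variants, snps, window=250_000):
    """Return the variants that have at least one SNP within `window` bp.

    variants, snps: lists of (chromosome, position) tuples.
    """
    # Group SNP positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = []
    for chrom, pos in variants:
        positions = by_chrom.get(chrom, [])
        # First SNP at or after the left edge of the window.
        i = bisect_left(positions, pos - window)
        # It is a hit if that SNP also lies at or before the right edge.
        if i < len(positions) and positions[i] <= pos + window:
            hits.append((chrom, pos))
    return hits


# Toy coordinates (hypothetical).
toy_variants = [("3", 1_000_000), ("3", 5_000_000), ("7", 200_000)]
toy_snps = [("3", 1_200_000), ("7", 900_000)]
toy_hits = variants_near_snps(toy_variants, toy_snps)
```

Sorting once per chromosome and using binary search keeps the comparison fast even when both lists are large, although with a few hundred variants a nested loop would work equally well.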

Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations

10.1101/503144 ◽

2018 ◽

Cited By ~ 3

Author(s):

Yang Luo ◽

Xinyi Li ◽

Xin Wang ◽

Steven Gazal ◽

Josep Maria Mercader ◽

...

Keyword(s):

African American ◽

Complex Traits ◽

Association Studies ◽

Genetic Data ◽

Age At Menarche ◽

Specific Gene ◽

Genome Wide Association Studies ◽

Diverse Populations ◽

Tissue Specific ◽

Different Populations

Abstract: The increasing size and diversity of genome-wide association studies provide an exciting opportunity to study how the genetics of complex traits vary among diverse populations. Here, we introduce covariate-adjusted LD score regression (cov-LDSC), a method to accurately estimate genetic heritability and its enrichment in both homogeneous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the GWAS samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates heritability by 10%–60% in admixed populations; in contrast, cov-LDSC is robust to all simulation parameters. We apply cov-LDSC to genotyping data from approximately 170,000 Latino, 47,000 African American and 135,000 European individuals. We estimate and detect heritability enrichment in three quantitative and five dichotomous phenotypes respectively, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals. Our results show that most traits have high concordance of heritability estimates and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size τ* is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in Latino, African American and European populations respectively. Our results demonstrate that our approach is a powerful way to analyze genetic data for complex traits from underrepresented populations.

Author summary: Admixed populations such as African Americans and Hispanic Americans bear a disproportionately high burden of disease but remain underrepresented in current genetic studies. It is important to extend current methodological advancements for understanding the genetic basis of complex traits in homogeneous populations to individuals with admixed genetic backgrounds. Here, we develop a computationally efficient method to answer two specific questions. First, does genetic variation contribute the same amount of phenotypic variation (heritability) across diverse populations? Second, are the genetic mechanisms shared among different populations? To answer these questions, we use our novel method to conduct the first comprehensive heritability-based analysis of a large number of admixed individuals. We show that there is a high degree of concordance in total heritability and tissue-specific enrichment between different ancestral groups. However, traits such as age at menarche show noticeable differences among populations. Our work provides a powerful way to analyze genetic data in admixed populations and may contribute to the applicability of genomic medicine to admixed population groups.
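
For orientation, standard LD score regression rests on the relation E[χ²_j] = (N h² / M) ℓ_j + intercept, where ℓ_j is the LD score of variant j, N the sample size, and M the number of variants, so the regression slope rescaled by M / N estimates heritability. The sketch below fits that line by ordinary least squares on toy data. It is a simplified illustration only: the function name and inputs are hypothetical, real LDSC uses weighted regression with block-jackknife standard errors, and cov-LDSC as described above would supply covariate-adjusted, in-sample LD scores rather than the plain scores assumed here.

```python
def ldsc_heritability(chi2, ld_scores, n_samples, n_snps):
    """Estimate heritability and intercept from an unweighted LD score regression.

    chi2:      per-variant association chi-square statistics.
    ld_scores: per-variant LD scores (same order as chi2).
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    # Ordinary least squares slope and intercept of chi2 on LD score.
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    # Slope equals N * h2 / M, so rescale to recover h2.
    return slope * n_snps / n_samples, intercept


# Toy inputs (hypothetical), generated from h2 = 0.5 and intercept = 1
# with N = 10,000 samples and M = 1,000 variants.
h2, intercept = ldsc_heritability(
    chi2=[6.0, 11.0, 16.0, 21.0],
    ld_scores=[1.0, 2.0, 3.0, 4.0],
    n_samples=10_000,
    n_snps=1_000,
)
```

An intercept above 1 is usually read as a sign of confounding such as population stratification, which is one reason the intercept is estimated rather than fixed.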

Leveraging human genetic data to investigate the cardiometabolic effects of glucose-dependent insulinotropic polypeptide signalling

Diabetologia ◽

10.1007/s00125-021-05564-7 ◽

2021 ◽

Author(s):

Ville Karhunen ◽

Iyas Daghlas ◽

Verena Zuber ◽

Marijana Vujkovic ◽

Anette K. Olsen ◽

...

Keyword(s):

Type 2 Diabetes ◽

Clinical Investigation ◽

Association Studies ◽

Genetic Data ◽

Data Availability ◽

Mendelian Randomisation ◽

Cardiometabolic Health ◽

Genome Wide Association Studies ◽

Genetic Associations

Aims/hypothesis: The aim of this study was to leverage human genetic data to investigate the cardiometabolic effects of glucose-dependent insulinotropic polypeptide (GIP) signalling. Methods: Data were obtained from summary statistics of large-scale genome-wide association studies. We examined whether genetic associations for type 2 diabetes liability in the GIP and GIPR genes co-localised with genetic associations for 11 cardiometabolic outcomes. For those outcomes that showed evidence of co-localisation (posterior probability >0.8), we performed Mendelian randomisation analyses to estimate the association of genetically proxied GIP signalling with risk of cardiometabolic outcomes, and to test whether this exceeded the estimate observed when considering type 2 diabetes liability variants from other regions of the genome. Results: Evidence of co-localisation with genetic associations of type 2 diabetes liability at both the GIP and GIPR genes was observed for five outcomes. Mendelian randomisation analyses provided evidence for associations of lower genetically proxied type 2 diabetes liability at the GIP and GIPR genes with lower BMI (estimate in SD units −0.16, 95% CI −0.30, −0.02), C-reactive protein (−0.13, 95% CI −0.19, −0.08) and triacylglycerol levels (−0.17, 95% CI −0.22, −0.12), and higher HDL-cholesterol levels (0.19, 95% CI 0.14, 0.25). For all of these outcomes, the estimates were greater in magnitude than those observed when considering type 2 diabetes liability variants from other regions of the genome. Conclusions/interpretation: This study provides genetic evidence to support a beneficial role of sustained GIP signalling on cardiometabolic health greater than that expected from improved glycaemic control alone. Further clinical investigation is warranted. Data availability: All data used in this study are publicly available. The scripts for the analysis are available at: https://github.com/vkarhune/GeneticallyProxiedGIP.

Genome-wide genetic data on ~500,000 UK Biobank participants

10.1101/166298 ◽

2017 ◽

Cited By ~ 303

Author(s):

Clare Bycroft ◽

Colin Freeman ◽

Desislava Petkova ◽

Gavin Band ◽

Lloyd T. Elliott ◽

...

Keyword(s):

Quality Control ◽

Allelic Variation ◽

Association Studies ◽

Genetic Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genotype Data ◽

UK Biobank ◽

Genome Wide ◽

Wide Range

The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged 40–69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offer novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data, such as population structure and relatedness, that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.

Ancestral haplotype reconstruction in endogamous populations using identity-by-descent

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008638 ◽

2021 ◽

Vol 17 (2) ◽

pp. e1008638

Author(s):

Kelly Finke ◽

Michael Kourakos ◽

Gabriela Brown ◽

Huyen Trang Dang ◽

Shi Jie Samuel Tan ◽

...

Keyword(s):

Family Relationships ◽

Sequence Data ◽

Association Studies ◽

Haplotype Diversity ◽

Building Blocks ◽

The United States ◽

Genome Wide Association Studies ◽

Ancestral Reconstruction ◽

Identity By Descent ◽

Endogamous Populations

In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling IBD segments placed within each ancestral individual. Unlike previous approaches, our method is able to accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to the United States from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family through tracking haplotype transmission over time. Using our algorithm, thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, on average we reconstruct 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. The thread algorithm was developed for endogamous populations, but can be applied to any extensive pedigree with the recent generations genotyped. We anticipate that this type of practical ancestral reconstruction will become more common and necessary to understand rare and complex heritable diseases in extended families.

Mendelian imputation of parental genotypes for genome-wide estimation of direct and indirect genetic effects

10.1101/2020.07.02.185199 ◽

2020 ◽

Cited By ~ 3

Author(s):

Alexander I. Young ◽

Seyed Moeen Nehzati ◽

Chanwook Lee ◽

Stefania Benonisdottir ◽

David Cesarini ◽

...

Keyword(s):

Association Studies ◽

Genetic Material ◽

Genetic Data ◽

Smoking Initiation ◽

Genetic Effects ◽

Effective Sample Size ◽

Genome Wide Association Studies ◽

Degree Relative ◽

Indirect Genetic Effects ◽

Genome Wide

Associations between genotype and phenotype derive from four sources: direct genetic effects, indirect genetic effects from relatives, population stratification, and correlations with other variants affecting the phenotype through assortative mating. Genome-wide association studies (GWAS) of unrelated individuals have limited ability to distinguish the different sources of genotype-phenotype association, confusing interpretation of results and potentially leading to bias when those results are applied, for example in genetic prediction of traits. With genetic data on families, the randomisation of genetic material during meiosis can be used to distinguish direct genetic effects from other sources of genotype-phenotype association. Genetic data on siblings is the most common form of genetic data on close relatives. We develop a method that takes advantage of identity-by-descent sharing between siblings to impute missing parental genotypes. Compared to no imputation, this increases the effective sample size for estimation of direct genetic effects and indirect parental effects by up to one third and one half respectively. We develop a related method for imputing missing parental genotypes when a parent-offspring pair is observed. We provide the imputation methods in a software package, SNIPar (single nucleotide imputation of parents), that also estimates genome-wide direct and indirect effects of SNPs. We apply this to a sample of 45,826 White British individuals in the UK Biobank who have at least one genotyped first-degree relative. We estimate direct and indirect genetic effects for ∼5 million genome-wide SNPs for five traits. We estimate the correlation between direct genetic effects and effects estimated by standard GWAS to be 0.61 (S.E. 0.09) for years of education, 0.68 (S.E. 0.10) for neuroticism, 0.72 (S.E. 0.09) for smoking initiation, 0.87 (S.E. 0.04) for BMI, and 0.96 (S.E. 0.01) for height. These results suggest that GWAS based on unrelated individuals provides an inaccurate picture of direct genetic effects for certain human traits.

The Contribution of the Life-Course Perspective to the Study of Family Relationships: Advances, Challenges, and Limitations

The Palgrave Handbook of Family Sociology in Europe ◽

10.1007/978-3-030-73306-3_28 ◽

2021 ◽

pp. 557-574

Author(s):

Gaëlle Aeby ◽

Jacques-Antoine Gauthier

Keyword(s):

Life Course ◽

Family Relationships ◽

Life Course Perspective ◽

The Life Course
