Genetic Data Analysis

Author(s):  
M. Shamila ◽  
Amit Kumar Tyagi

Genome-wide association studies (GWAS), a form of genetic data analysis, are used to discover common genetic factors that influence human health and contribute to disease. The use of genomics has grown in recent years, especially in e-healthcare, yet the field still requires substantial improvement. Note that the terms genomics and genetics are not interchangeable here. The human genome is made up of DNA, which consists of four chemical building blocks (called bases and abbreviated A, T, C, and G). Differences in the sequence of these bases distinguish one human being from another. The term ‘genetics’ originates from the Greek word ‘genetikos’, meaning ‘origin’. In simple terms, genetics is the branch of biology that studies the function and composition of single genes in an organism. It has three main branches: classical genetics, molecular genetics, and population genetics.
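To make the idea of testing a genetic factor for association with disease concrete, here is a minimal sketch of one common single-variant test: a Pearson chi-square test on a 2x2 table of allele counts in cases and controls. The abstract does not name a specific test, so the choice of test, the function name `allelic_chi_square`, and the counts used below are illustrative assumptions.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Rows are cases/controls, columns are alternate/reference alleles.
    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of a
    # chi-square variable equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts for a single variant (not from the source).
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In a real GWAS this kind of test is repeated across hundreds of thousands of variants, so a much stricter significance threshold than 0.05 is applied to account for multiple testing.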

2017 ◽  
Author(s):  
Clare Bycroft ◽  
Colin Freeman ◽  
Desislava Petkova ◽  
Gavin Band ◽  
Lloyd T. Elliott ◽  
...  

The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data – such as population structure and relatedness – that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.
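The "over 100-fold" figure can be checked directly from the two approximate counts quoted in the abstract. The short sketch below only does that arithmetic; the variable and function names are illustrative.

```python
# Marker counts quoted in the abstract (both are approximate figures).
genotyped_markers = 805_000
imputed_variants = 96_000_000


def fold_increase(before: int, after: int) -> float:
    """Ratio of variant counts after vs. before imputation."""
    return after / before


ratio = fold_increase(genotyped_markers, imputed_variants)
print(f"Imputation increases testable variants about {ratio:.0f}-fold")
# prints "Imputation increases testable variants about 119-fold"
```

The result, roughly 119-fold, is consistent with the abstract's statement.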


2021 ◽  
Author(s):  
Guy Hindley ◽  
Kevin S O'Connell ◽  
Zillur Rahman ◽  
Oleksandr Frei ◽  
Shahram Bahrami ◽  
...  

Mood instability (MOOD) is a transdiagnostic phenomenon with a prominent neurobiological basis. Recent genome-wide association studies found significant positive genetic correlation between MOOD and major depression (DEP) and weak correlations with other psychiatric disorders. We investigated the polygenic overlap between MOOD and psychiatric disorders beyond genetic correlation to better characterize putative shared genetic determinants. Summary statistics for schizophrenia (SCZ, n=105,318), bipolar disorder (BIP, n=413,466), DEP (n=450,619), attention-deficit hyperactivity disorder (ADHD, n=53,293) and MOOD (n=363,705) were analysed using the bivariate causal mixture model and conjunctional false discovery rate methods to estimate the proportion of shared variants influencing MOOD and each disorder, and to identify jointly associated genomic loci. MOOD correlated positively with all psychiatric disorders, but with wide variation in strength (rg=0.10-0.62). Of 10.4K genomic variants influencing MOOD, 4K-9.4K were estimated to influence psychiatric disorders. MOOD was jointly associated with DEP at 163 loci, SCZ at 110, BIP at 60 and ADHD at 25, with consistent genetic effects in independent samples. Fifty-three jointly associated loci were overlapping across two or more disorders (transdiagnostic), seven of which had discordant effect directions on psychiatric disorders. Genes mapped to loci associated with MOOD and all four disorders were enriched in a single gene-set, synapse organization. The extensive polygenic overlap indicates shared molecular underpinnings across MOOD and psychiatric disorders. However, distinct patterns of genetic correlation and effect directions of shared loci suggest divergent effects on corresponding neurobiological mechanisms which may relate to differences in the core clinical features of each disorder.
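The conjunctional false discovery rate mentioned above is usually defined, per variant, as the larger of the two conditional FDR values (trait A given trait B, and trait B given trait A). The sketch below shows only that final combining step, assuming the conditional FDR values have already been estimated elsewhere. The function names, the example values, and the 0.05 threshold are illustrative assumptions, not details taken from the abstract.

```python
def conjunctional_fdr(cond_fdr_a_given_b, cond_fdr_b_given_a):
    """Per-variant conjunctional FDR: the maximum of the two conditional FDRs."""
    if len(cond_fdr_a_given_b) != len(cond_fdr_b_given_a):
        raise ValueError("both inputs must cover the same variants")
    return [max(a, b) for a, b in zip(cond_fdr_a_given_b, cond_fdr_b_given_a)]


def jointly_associated(conj_fdr, threshold=0.05):
    """Indices of variants whose conjunctional FDR falls below the threshold."""
    return [i for i, value in enumerate(conj_fdr) if value < threshold]


# Hypothetical conditional FDR values for three variants (not from the source).
mood_given_dep = [0.01, 0.20, 0.03]
dep_given_mood = [0.04, 0.01, 0.10]

conj = conjunctional_fdr(mood_given_dep, dep_given_mood)
print(conj)                      # [0.04, 0.2, 0.1]
print(jointly_associated(conj))  # [0]
```

Taking the maximum is a conservative choice: a variant is reported as jointly associated only when the evidence is strong in both directions.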


2019 ◽  
Author(s):  
C.J. Battey ◽  
Peter L. Ralph ◽  
Andrew D. Kern

Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies. We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright’s neighborhood size is less than 100 and sampling is spatially clustered. Stepping-stone models reproduce some of these effects, but discretizing the landscape introduces artifacts which in some cases are exacerbated at higher resolutions. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations were surprisingly robust to isolation by distance. We also show that the combination of spatially autocorrelated environments and limited dispersal causes genome-wide association studies to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.
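Wright's neighborhood size in two-dimensional space is commonly defined as 4πρσ², where ρ is the population density and σ is the dispersal distance (the standard deviation of parent-offspring displacement along one axis). The abstract does not spell the formula out, so the sketch below assumes this standard definition; the function name and the parameter values are illustrative.

```python
import math


def neighborhood_size(dispersal_sigma: float, density: float) -> float:
    """Wright's neighborhood size in 2D, assuming N = 4 * pi * rho * sigma^2."""
    return 4.0 * math.pi * density * dispersal_sigma ** 2


# Hypothetical parameter values (not from the source): 5 individuals per
# unit area, with two different dispersal distances.
for sigma in (1.0, 2.0):
    size = neighborhood_size(dispersal_sigma=sigma, density=5.0)
    regime = "below" if size < 100 else "at or above"
    print(f"sigma={sigma}: neighborhood size = {size:.1f} ({regime} 100)")
```

Under these assumed values, doubling the dispersal distance quadruples the neighborhood size and moves the population out of the regime (below 100) where the abstract reports the strongest departures from well-mixed expectations.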


2020 ◽  
Author(s):  
Janet C. Harwood ◽  
Ganna Leonenko ◽  
Rebecca Sims ◽  
Valentina Escott-Price ◽  
Julie Williams ◽  
...  

More than 50 genetic loci have been identified as being associated with Alzheimer’s disease (AD) from genome-wide association studies (GWAS) and many of these are involved in immune pathways and lipid metabolism. Therefore, we performed a transcriptome-wide association study (TWAS) of immune-relevant cells, to study the mis-regulation of genes implicated in AD. We used expression and genetic data from naive and induced CD14+ monocytes and two GWAS of AD to study genetically controlled gene expression in monocytes at different stages of differentiation and compared the results with those from TWAS of brain and blood. We identified nine genes with statistically independent TWAS signals: seven are known AD risk genes from GWAS (BIN1, PTK2B, SPI1, MS4A4A, MS4A6E, APOE and PVR) and two, LACTB2 and PLIN2/ADRP, are novel candidate genes for AD. Three genes, SPI1, PLIN2 and LACTB2, are TWAS significant specifically in monocytes. LACTB2 is a mitochondrial endoribonuclease and PLIN2/ADRP associates with intracellular neutral lipid storage droplets (LSDs) which have been shown to play a role in the regulation of the immune response. Notably, LACTB2 and PLIN2 were not detected from GWAS alone.

