Utilizing population-based genomic data to expedite the curation of genes and genomic regions for the ClinGen “dosage sensitivity unlikely” classification

2021 ◽  
Vol 132 ◽  
pp. S228-S229
Author(s):  
Molly Good ◽  
Erica Andersen ◽  
Adam Clayton ◽  
Jian Zhao ◽  
Christa Martin ◽  
...  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chathura J. Gunasekara ◽  
Eilis Hannon ◽  
Harry MacKay ◽  
Cristian Coarfa ◽  
Andrew McQuillin ◽  
...  

Abstract: Epigenetic dysregulation is thought to contribute to the etiology of schizophrenia (SZ), but the cell type-specificity of DNA methylation makes population-based epigenetic studies of SZ challenging. To train an SZ case–control classifier based on DNA methylation in blood, therefore, we focused on human genomic regions of systemic interindividual epigenetic variation (CoRSIVs), a subset of which are represented on the Illumina Human Methylation 450K (HM450) array. HM450 DNA methylation data on whole blood of 414 SZ cases and 433 non-psychiatric controls were used as training data for a classification algorithm with built-in feature selection, sparse partial least squares discriminant analysis (SPLS-DA); application of SPLS-DA to HM450 data has not been previously reported. Using the first two SPLS-DA dimensions we calculated a “risk distance” to identify individuals with the highest probability of SZ. The model was then evaluated on an independent HM450 data set on 353 SZ cases and 322 non-psychiatric controls. Our CoRSIV-based model classified 303 individuals as cases with a positive predictive value (PPV) of 80%, far surpassing the performance of a model based on polygenic risk score (PRS). Importantly, risk distance (based on CoRSIV methylation) was not associated with medication use, arguing against reverse causality. Risk distance and PRS were positively correlated (Pearson r = 0.28, P = 1.28 × 10^−12), and mediational analysis suggested that genetic effects on SZ are partially mediated by altered methylation at CoRSIVs. Our results indicate two innate dimensions of SZ risk: one based on genetic, and the other on systemic epigenetic variants.
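
As a reading aid for the PPV figure quoted above, the following minimal Python sketch shows how a positive predictive value is computed from classification counts. The abstract reports only the rounded PPV (80%) and the number of individuals classified as cases (303); the split into 242 true positives and 61 false positives used below is a hypothetical illustration chosen to be consistent with those two numbers, not a result taken from the paper.

```python
def positive_predictive_value(true_positives, false_positives):
    """Return PPV = TP / (TP + FP), the fraction of predicted cases that are real cases."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("PPV is undefined when nothing is classified as a case")
    return true_positives / predicted_positive


# Hypothetical split of the 303 individuals classified as cases (assumption, not from the paper).
hypothetical_tp = 242
hypothetical_fp = 303 - hypothetical_tp

ppv = positive_predictive_value(hypothetical_tp, hypothetical_fp)
print(f"PPV = {ppv:.1%}")
```

PPV depends on the proportion of cases in the evaluated sample, so a value obtained on a roughly balanced case–control set would not carry over unchanged to a general population.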


2018 ◽  
Vol 53 (5) ◽  
pp. 527-539 ◽  
Author(s):  
Tiago do Prado Paim ◽  
Patrícia Ianella ◽  
Samuel Rezende Paiva ◽  
Alexandre Rodrigues Caetano ◽  
Concepta Margaret McManus Pimentel

Abstract: The recent development of genome-wide single nucleotide polymorphism (SNP) arrays made it possible to carry out several studies with different species. The selection process can increase or reduce allelic (or genic) frequencies at specific loci in the genome, besides dragging neighboring alleles in the chromosome. This way, genomic regions with increased frequencies of specific alleles are formed, characterizing selection signatures or selective sweeps. The detection of these signatures is important to characterize genetic resources, as well as to identify genes or regions involved in the control and expression of important production and economic traits. Sheep are an important species for these studies as they are dispersed worldwide and have great phenotypic diversity. Due to the large amounts of genomic data generated, specific statistical methods and software are necessary for the detection of selection signatures. Therefore, the objectives of this review are to address the main statistical methods and software currently used for the analysis of genomic data and the identification of selection signatures; to describe the results of recent works published on selection signatures in sheep; and to discuss some challenges and opportunities in this research field.


2018 ◽  
Vol 63 (No. 4) ◽  
pp. 136-143
Author(s):  
N. Moravčíková ◽  
M. Simčič ◽  
G. Mészáros ◽  
J. Sölkner ◽  
V. Kukučková ◽  
...  

The aim of this study was to analyse the genomic regions that have been targets of natural selection, with a view to identifying the loci responsible mainly for fitness traits across six alpine cattle breeds. The genome-wide scan for selection signatures was performed using genotyping data from a total of 465 animals. After applying data quality control, a total of 35 873 single nucleotide polymorphisms were usable for the subsequent analysis. The detection of genomic regions affected by natural selection was carried out using principal component analysis. The analysis was based on the assumption that markers strongly related to population structure are also candidates for the local adaptation potential of the population. At an expected false discovery rate of 10%, up to 1138 loci were identified as outliers. The strongest signals of selection were found in genomic regions on BTA 1, 2, 3, 6, 9, 11, 13, and 22. Most genes located in the identified regions have been previously associated with the immune system as well as with body growth and muscle formation, which mainly reflects the pressure of both natural and artificial selection for adaptation of the analysed breeds to local environmental conditions. The results also indicated that these regions represent a correlated selection response that maintains the fitness of the analysed breeds.
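
The abstract above states that outlier loci were selected at an expected false discovery rate of 10% but does not name the procedure used. The sketch below shows one standard way to apply such a threshold, the Benjamini-Hochberg step-up procedure, in plain Python. It is an assumption for illustration, not necessarily the authors' method, and the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of tests declared significant at the given FDR.

    Benjamini-Hochberg: sort the m p-values, find the largest rank k with
    p_(k) <= fdr * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical per-SNP p-values from an outlier test (not data from the study).
example_p = [0.001, 0.02, 0.03, 0.5, 0.9]
outliers = benjamini_hochberg(example_p, fdr=0.10)
print(outliers)
```

In a PCA-based scan, each p-value would come from a test of how strongly a marker is associated with the retained principal components; that step is omitted here.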


2011 ◽  
Vol 7 (6) ◽  
pp. 896-898 ◽  
Author(s):  
Alison G. Scoville ◽  
Young Wha Lee ◽  
John H. Willis ◽  
John K. Kelly

Most natural populations display substantial genetic variation in behaviour, morphology, physiology, life history and the susceptibility to disease. A major challenge is to determine the contributions of individual loci to variation in complex traits. Quantitative trait locus (QTL) mapping has identified genomic regions affecting ecologically significant traits of many species. In nearly all cases, however, the importance of these QTLs to population variation remains unclear. In this paper, we apply a novel experimental method to parse the genetic variance of floral traits of the annual plant Mimulus guttatus into contributions of individual QTLs. We first use QTL-mapping to identify nine loci and then conduct a population-based breeding experiment to estimate V_Q, the genetic variance attributable to each QTL. We find that three QTLs with moderate effects explain up to one-third of the genetic variance in the natural population. Variation at these loci is probably maintained by some form of balancing selection. Notably, the largest effect QTLs were relatively minor in their contribution to heritability.


2015 ◽  
Author(s):  
Karen Y. Oróstica ◽  
Ricardo A. Verdugo

Abstract. Summary: Visualizing genomic data in chromosomal context can help detect errors in data generation or analysis and can suggest new hypotheses to be tested. Here we report a new tool for displaying large and diverse genomic data in idiograms of one or multiple chromosomes. The package is implemented in R so that visualization can be easily integrated with its numerous packages for processing genomic data. It supports simultaneous visualization of multiple tracks of data, each of potentially different nature. Large genomic regions such as QTLs or synteny tracts may be shown along histograms of number of genes, genetic variants, or any other type of genomic element. Tracks can also contain values for continuous or categorical variables and the user can choose among points, points connected by lines, line segments, barplots or histograms for representing data. chromPlot reads data from tables in BED format which are imported in R using its built-in functions. The information necessary to draw chromosomes for mouse and human is included with the package. Chromosomes for other organisms are downloaded automatically from the Ensembl website or can be provided by the user. We present common use cases here, and a full tutorial is included as the package's vignette. Availability: chromPlot is distributed under a GPL-2 licence at Genomed Lab: http://genomed.med.uchile.cl. Contact: [email protected]
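
To illustrate the kind of input described above, here is a minimal Python sketch that reads BED-style records (chromosome, 0-based start, end) and counts features per fixed-size window, which is the quantity a per-chromosome histogram track would display. This is not chromPlot's API (chromPlot is an R package); the function name, the 1 Mb default bin size and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def bin_bed_features(lines, bin_size=1_000_000):
    """Count BED features per (chromosome, bin index), binning by start coordinate."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines allowed in BED files.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start = fields[0], int(fields[1])
        counts[(chrom, start // bin_size)] += 1
    return counts


# Hypothetical records; a real file would be read with open(path).
sample = [
    "track name=example",
    "chr1\t100\t200",
    "chr1\t999999\t1000001",
    "chr1\t1500000\t1500100",
    "chr2\t0\t10",
]
for (chrom, bin_index), n in sorted(bin_bed_features(sample).items()):
    print(chrom, bin_index, n)
```

Binning by start coordinate keeps the sketch short; a feature that spans a window boundary is counted only in the window where it begins.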


2020 ◽  
Vol 37 (11) ◽  
pp. 3267-3291 ◽  
Author(s):  
Xiaoheng Cheng ◽  
Michael DeGiorgio

Abstract: Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169–SOHLH2, both of which are related to gamete functions. We further applied B2 to a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.


Author(s):  
Minakshi Bhardwaj

Abstract: Scientists are attempting to develop a detailed understanding of the heritable variation in the human genome at the individual and population level. It is claimed that population-based genomics research will be crucial in understanding the differences in human susceptibility to diseases, drug responses and the complex interaction of genetic and environmental factors in the production of particular phenotypes (DoH white paper 2003). These efforts led to the phenomenon of biobanking and the setting up of mega genetic databases. Since the development of the Icelandic Health Sector Database, the biobanking phenomenon has been gathering pace, and several countries are starting large-scale population-based genomics projects under the umbrella of biobanking and genetic databases. At a time of increased interest in acquiring information on gene-gene and gene-environment interactions, it is important to acknowledge that, although promising, these prospects face major scientific hurdles and ethical questions in practice. The definitions, applications and prospective implications of genetic databases and biobanks for health and healthcare are understood and implied in several ways in both clinical and biological research. The term genetic database can refer to a collection of biological material (biobank) from which genetic information, for example genealogical and clinical information, can be derived, systematically organized and used for research purposes. Different countries have used different terminologies for their collections of biological materials and have organised genetic information in large databases. Sometimes the term biobank also includes a complex network of databases assembled and accredited in one system. COGENE, the ‘Coordination of Genome Research Across Europe’, refers to biobanks as cohort studies.
Cohort studies involve comparisons between groups defined by some common parameter, such as geography, age, employment, a disease condition or any other determinant, within a general population. The groups are compared over a long period of time using specific tests. For instance, the unified database of the Latvian population is popularly called the Latvian Genome Project. The Latvian Project aims to create ‘a unified national network of genetic information and data processing, to collect representative amount of genetic material for genotyping of the Latvian population and to compare genomic data with the clinical information and the information available about specific pedigrees’ (Pirags and Grens 2005). Genomic data will contain the sum total of genetic information in the entire DNA of the Latvian population. Another example is the UK population-based genetic database, which was started under the name UK Biobank. It involves systematic collections of biological samples and medical and genetic information. The Estonian Genome Project uses the concept of a Gene bank to refer to its genetic database. Gene banks and genomic banks are similar in that they contain large sets of genetic information in the form of datasets and sequences of the population.


2017 ◽  
Vol 13 (7) ◽  
pp. P1494 ◽  
Author(s):  
Vadim A. Stepanov ◽  
Oksana A. Makeeva ◽  
Andrey V. Marusin ◽  
Anna V. Bocharova ◽  
Kseniya V. Vagaitseva ◽  
...  

2018 ◽  
Author(s):  
Shao-Pei Chou ◽  
Charles G. Danko

Abstract: How DNA sequence variation influences gene expression remains poorly understood. Diploid organisms have two homologous copies of their DNA sequence in the same nucleus, providing a rich source of information about how genetic variation affects a wealth of biochemical processes. However, few computational methods have been developed to discover allele-specific differences in functional genomic data. Existing methods either treat each SNP independently, limiting statistical power, or combine SNPs across gene annotations, preventing the discovery of allele-specific differences in unexpected genomic regions. Here we introduce AlleleHMM, a new computational method to identify blocks of neighboring SNPs that share similar allele-specific differences in mark abundance. AlleleHMM uses a hidden Markov model to divide the genome among three hidden states based on allele frequencies in genomic data: a symmetric state (state S), which shows no difference between alleles, and regions with a higher signal on the maternal (state M) or paternal (state P) allele. AlleleHMM substantially outperformed naive methods using both simulated and real genomic data, particularly when input data had realistic levels of overdispersion. Using PRO-seq data, AlleleHMM identified thousands of allele-specific blocks of transcription in both coding and non-coding genomic regions. AlleleHMM is a powerful tool for discovering allele-specific regions in functional genomic datasets.
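
The three-state idea described above can be sketched with a small Viterbi decoder in Python. This is not the AlleleHMM implementation: the binomial emission model, the expected maternal read fractions per state (0.5, 0.9, 0.1), the 0.9 probability of staying in the same state and the example read counts are all assumptions chosen only to show how neighbouring SNPs can be grouped into S, M and P blocks.

```python
import math

STATES = ("S", "M", "P")
# Assumed expected fraction of maternal reads in each state (illustrative values).
MATERNAL_FRACTION = {"S": 0.5, "M": 0.9, "P": 0.1}
STAY = 0.9  # assumed probability of remaining in the same state at the next SNP


def log_binomial(k, n, p):
    """Log-probability of k maternal reads out of n under a binomial model."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def viterbi(counts):
    """Most likely state path for a list of (maternal_reads, total_reads) per SNP."""
    if not counts:
        return []
    log_stay = math.log(STAY)
    log_switch = math.log((1.0 - STAY) / (len(STATES) - 1))
    k0, n0 = counts[0]
    # Uniform prior over the three states at the first SNP.
    score = {s: -math.log(len(STATES)) + log_binomial(k0, n0, MATERNAL_FRACTION[s])
             for s in STATES}
    backpointers = []
    for k, n in counts[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            def arrive(prev, s=s):
                return score[prev] + (log_stay if prev == s else log_switch)
            best_prev = max(STATES, key=arrive)
            pointer[s] = best_prev
            new_score[s] = arrive(best_prev) + log_binomial(k, n, MATERNAL_FRACTION[s])
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical read counts at seven consecutive SNPs (maternal, total).
snps = [(5, 10), (6, 10), (9, 10), (10, 10), (9, 10), (1, 10), (0, 10)]
print("".join(viterbi(snps)))  # prints SSMMMPP
```

Because the transition penalty discourages switching, a single noisy SNP tends to be absorbed into its neighbours' block, which is the advantage of a block-based model over testing each SNP independently. A plain binomial emission does not model the overdispersion mentioned in the abstract.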

