Genome Diversity in Ukraine

2020 ◽  
Author(s):  
Taras K. Oleksyk ◽  
Walter W. Wolfsberger ◽  
Alexandra Weber ◽  
Khrystyna Shchubelka ◽  
Olga T. Oleksyk ◽  
...  

Abstract The main goal of this collaborative effort is to provide genome wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from consented individuals representing major regions of Ukraine that were consented for the public data release. DNBSEQ-G50 sequences, and genotypes by an Illumina GWAS chip were cross-validated on multiple samples, and additionally referenced to one sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. The genome data has been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, CNVs, SNPs and microsatellites. This study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for historic and medical research in a large understudied population. While most of the common variation is shared with other European populations, this survey of population variation contributes a number of novel SNPs and structural variants that have not been reported in the gnomAD/1KG databases representing global distribution of genomic variation. These endemic variants will become a valuable resource for designing future population and clinical studies, help address questions about ancestry and admixture, and will fill a missing place in the puzzle characterizing human population diversity in Eastern Europe. Our results indicate that genetic diversity of the Ukrainian population is uniquely shaped by the evolutionary and demographic forces, and cannot be ignored in the future genetic and biomedical studies. This data will contribute a wealth of new information bringing forth different risk and/or protective alleles. The newly discovered low frequency and local variants can be added to the current genotyping arrays for genome wide association studies, clinical trials, and in genome assessment of proliferating cancer cells.

GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from consented individuals representing major regions of Ukraine that were consented for public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information bringing forth a wealth of novel, endemic and medically related alleles.


2021 ◽  
Author(s):  
Marsha M. Wheeler ◽  
Adrienne M Stilp ◽  
Shuquan Rao ◽  
Bjarni V Halldorsson ◽  
Doruk V Beyter ◽  
...  

Genome-wide association studies (GWAS) have identified thousands of single nucleotide variants and small indels that contribute to the genetic architecture of hematologic traits. While structural variants (SVs) are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of SVs to quantitative blood cell trait variation is unknown. Here we utilized SVs detected from whole genome sequencing (WGS) in ancestrally diverse participants of the NHLBI TOPMed program (N=50,675). Using single variant tests, we assessed the association of common and rare SVs with red cell-, white cell-, and platelet-related quantitative traits. The results show 33 independent SVs (23 common and 10 rare) reaching genome-wide significance. The majority of significant association signals (N=27) replicated in independent datasets from deCODE genetics and the UK BioBank. Moreover, most trait-associated SVs (N=24) are within 1Mb of previously-reported GWAS loci. SV analyses additionally discovered an association between a complex structural variant on 17p11.2 and white blood cell-related phenotypes. Based on functional annotation, the majority of significant SVs are located in non-coding regions (N=26) and predicted to impact regulatory elements and/or local chromatin domain boundaries in blood cells. We predict that several trait-associated SVs represent the causal variant. This is supported by genome-editing experiments which provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.


2021 ◽  
Author(s):  
Zhishuang Yang ◽  
Xueqin Yang ◽  
Mingshu Wang ◽  
Renyong Jia ◽  
Shun Chen ◽  
...  

The disease caused by Riemerella anatipestifer (R. anatipestifer) causes large economic losses to the global duck industry every year. Serotype-related genomic variation (such as in O-antigen and capsular polysaccharide gene clusters) has been widely used for serotyping in many gram-negative bacteria. To date, few studies have focused on the genetic basis of serotypes in R. anatipestifer. Here, we used pan-genome-wide association studies (Pan-GWAS) to identify the serotype-specific genetic loci of 38 R. anatipestifer strains. Analyses of the loci of 11 serotypes showed that the loci could be well mapped with the serotypes of the corresponding strains. We constructed the knockout strain for the wzy gene at the locus, and the results showed that the mutant lost the agglutination characteristics to positive antisera. Based on the Pan-GWAS results, we developed a multiple PCR method to identify serotypes 1, 2, and 11 of R. anatipestifer. Our study provides a precedent for systematically analysing the genetic basis of the R. anatipestifer serotypes and establishing a complete serotyping system in the future.


Author(s):  
Anubha Mahajan ◽  
Cassandra N Spracklen ◽  
Weihua Zhang ◽  
Maggie CY Ng ◽  
Lauren E Petty ◽  
...  

We assembled an ancestrally diverse collection of genome-wide association studies of type 2 diabetes (T2D) in 180,834 cases and 1,159,055 controls (48.9% non-European descent). We identified 277 loci at genome-wide significance (p<5×10⁻⁸), including 237 attaining a more stringent trans-ancestry threshold (p<5×10⁻⁹), which were delineated to 338 distinct association signals. Trans-ancestry meta-regression offered substantial enhancements to fine-mapping, with 58.6% of associations more precisely localised due to population diversity, and 54.4% of signals resolved to a single variant with >50% posterior probability. This improved fine-mapping enabled systematic assessment of candidate causal genes and molecular mechanisms through which T2D associations are mediated, laying foundations for functional investigations. Trans-ancestry genetic risk scores enhanced transferability across diverse populations, providing a step towards more effective clinical translation to improve global health.
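As a reading aid for the two significance thresholds quoted above, here is a minimal Python sketch of the usual Bonferroni arithmetic. The conventional genome-wide threshold of 5×10⁻⁸ corresponds to a family-wise error rate of 0.05 spread over roughly one million independent tests. The abstract does not say how the stricter trans-ancestry threshold was derived, so the ten-million-test figure below is purely an illustrative assumption, and `bonferroni_threshold` is a hypothetical helper name.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Conventional genome-wide threshold: 0.05 over about one million independent tests.
conventional = bonferroni_threshold(0.05, 1_000_000)

# Illustrative assumption only: a ten-fold larger effective number of tests
# would give the stricter cutoff quoted in the abstract.
stricter = bonferroni_threshold(0.05, 10_000_000)

print(f"conventional: {conventional:.1e}")
print(f"stricter:     {stricter:.1e}")
```

Under this reading, a ten-fold stricter threshold is equivalent to correcting for ten times as many effectively independent tests.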


2021 ◽  
Author(s):  
Tomas W Fitzgerald ◽  
Ewan Birney

Copy number variation (CNV) has long been known to influence human traits and has a rich history of research into common and rare genetic disease. Although CNV is accepted as an important class of genomic variation, progress on copy number (CN) phenotype associations from next-generation sequencing (NGS) data has been limited, in part because of the relative difficulty of CNV detection and an enrichment for false positives. To date, most successful CN genome-wide association studies (CN-GWAS) have focused on using predictive measures of dosage intolerance or gene burden tests to gain sufficient power for detecting CN effects. Here we present a novel method for large-scale CN analysis from NGS data that generates robust CN estimates and allows CN-GWAS to be performed genome-wide in discovery mode. We provide a detailed analysis in the large-scale UK Biobank resource and a specifically designed software package for deriving CN estimates from NGS data that are robust enough to be used for CN-GWAS. We use these methods to perform genome-wide CN-GWAS analysis across 78 human traits, discovering 862 genetic associations that are likely to contribute strongly to trait distributions based solely on their CN or by acting in concert with other genetic variation. Finally, we undertake an analysis comparing CNV and SNP association signals across the same traits and samples, defining specific CNV association classes based on whether they could be detected using standard SNP-GWAS in the UK Biobank.


2018 ◽  
Author(s):  
Venice Juanillas ◽  
Alexis Dereeper ◽  
Nicolas Beaume ◽  
Gaetan Droc ◽  
Joshua Dizon ◽  
...  

Abstract Background Rice molecular genetics, breeding, genetic diversity, and allied research (such as rice-pathogen interaction) have adopted sequencing technologies and high density genotyping platforms for genome variation analysis and gene discovery. Germplasm collections representing rice diversity, improved varieties and elite breeding materials are accessible through rice gene banks for use in research and breeding, with many having genome sequences and high density genotype data available. Combining phenotypic and genotypic information on these accessions enables genome-wide association analysis, which is driving quantitative trait loci (QTL) discovery and molecular marker development. Comparative sequence analyses across QTL regions facilitate the discovery of novel alleles. Analyses involving DNA sequences and large genotyping matrices for thousands of samples, however, pose a challenge to non-computer savvy rice researchers. Findings We adopted the Galaxy framework to build the federated Rice Galaxy resource, with shared datasets, tools, and analysis workflows relevant to rice research. The shared datasets include high density genotypes from the 3,000 Rice Genomes project and sequences with corresponding annotations from nine published rice genomes. Rice Galaxy includes tools for designing single nucleotide polymorphism (SNP) assays, analyzing genome-wide association studies, population diversity, rice-bacterial pathogen diagnostics, and a suite of published genomic prediction methods. A prototype Rice Galaxy compliant to Open Access, Open Data, and Findable, Accessible, Interoperable, and Reproducible principles is also presented. Conclusions Rice Galaxy is a freely available resource that empowers the plant research community to perform state-of-the-art analyses and utilize publicly available big datasets for both fundamental and applied science.


2021 ◽  
Author(s):  
Artur F. Schumacher-Schuh ◽  
Andrei Bieger ◽  
Olaitan Okunoye ◽  
Kin Mok ◽  
Shen-Yang Lim ◽  
...  

Abstract Human genetics research lacks diversity; over 80% of genome-wide association studies (GWAS) have been conducted on individuals of European ancestry. In addition to limiting insights regarding disease mechanisms, disproportionate representation can create disparities preventing equitable implementation of personalized medicine. This systematic review provides an overview of research involving Parkinson’s disease (PD) genetics in under-represented populations (URP), and sets a baseline to measure the future impact of current efforts in those populations. We searched PubMed and EMBASE until October 2021 using search strings for “PD”, “genetics”, the main “URP”, and “lower-to-upper-middle-income countries”. Inclusion criteria were original studies, written in English, reporting genetic results on PD patients from non-European populations. Two levels of independent reviewers identified and extracted relevant information. We observed considerable imbalances in PD genetic studies among URP. Asian participants from China were described in the majority of the articles published (61%), but other populations were less well studied, for example, Blacks were represented in just 4.0% of the publications. Also, although idiopathic PD was more studied than monogenic forms of the disease, most studies analyzed a limited number of genetic variants. We identified just seven studies using a genome-wide approach published up to 2021 including URP. This review provides insight into the significant lack of population diversity in PD research highlighting the urgent need for better representation. The Global Parkinson’s Genetics Program (GP2) and similar initiatives aim to impact research in URP, and the early metrics presented here can be used to measure progress in the field of PD genetics in the future.


2019 ◽  
Author(s):  
Yoav Voichek ◽  
Detlef Weigel

Abstract Structural variants and presence/absence polymorphisms are common in plant genomes, yet they are routinely overlooked in genome-wide association studies (GWAS). Here, we expand the genetic variants detected in GWAS to include major deletions, insertions, and rearrangements. We first use raw sequencing data directly to derive short sequences, k-mers, that mark a broad range of polymorphisms independently of a reference genome. We then link k-mers associated with phenotypes to specific genomic regions. Using this approach, we re-analyzed 2,000 traits measured in Arabidopsis thaliana, tomato, and maize populations. Associations identified with k-mers recapitulate those found with single-nucleotide polymorphisms (SNPs), however, with stronger statistical support. Moreover, we identified new associations with structural variants and with regions missing from reference genomes. Our results demonstrate the power of performing GWAS before linking sequence reads to specific genomic regions, which allows detection of a wider range of genetic variants responsible for phenotypic variation.
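The reference-free idea in this abstract can be illustrated with a toy Python sketch: derive k-mers directly from each sample's reads, record which samples carry each k-mer, and compare phenotypes between carriers and non-carriers. This is not the authors' pipeline. The function names, the toy reads, and the plain mean-difference score are all assumptions made for illustration; a real analysis would work from canonical k-mers counted across large read sets and would test association with a model that accounts for population structure.

```python
from collections import defaultdict
from statistics import mean


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def presence_table(samples, k):
    """Map each k-mer to the set of sample names whose reads contain it."""
    table = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for km in kmers(read, k):
                table[km].add(name)
    return table


def mean_difference(table, phenotype):
    """Score each polymorphic k-mer: mean phenotype of carriers minus non-carriers."""
    everyone = set(phenotype)
    scores = {}
    for km, carriers in table.items():
        absent = everyone - carriers
        if not carriers or not absent:
            continue  # k-mer present in all samples or none: uninformative
        scores[km] = (mean(phenotype[s] for s in carriers)
                      - mean(phenotype[s] for s in absent))
    return scores


# Hypothetical toy data: two samples carry a T variant, two carry a G variant.
samples = {
    "s1": ["ACGTAC"],
    "s2": ["ACGTAC"],
    "s3": ["ACGGAC"],
    "s4": ["ACGGAC"],
}
phenotype = {"s1": 10.0, "s2": 12.0, "s3": 4.0, "s4": 6.0}

table = presence_table(samples, k=4)
scores = mean_difference(table, phenotype)
```

In this toy setup no reference genome is consulted at any point; the k-mers that differ between the two groups of samples are the ones that receive non-zero scores, and only afterwards would they be mapped to genomic regions.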


Heart ◽  
2017 ◽  
Vol 104 (3) ◽  
pp. 201-206 ◽  
Author(s):  
Aneesh Bapat ◽  
Christopher D Anderson ◽  
Patrick T Ellinor ◽  
Steven A Lubitz

Atrial fibrillation (AF) is a prevalent arrhythmia associated with substantial morbidity, mortality and costs. Available management strategies generally have limited efficacy and are associated with potential adverse effects. In part, the limited efficacy of approaches to managing AF reflects an incomplete understanding of the biological mechanisms underlying the arrhythmia, and only a partial understanding of how best to individualise management. Over the last several decades, a greater understanding of genome biology has led to recognition of a widespread genetic susceptibility to AF. Through genome-wide association studies, at least 30 genetic loci have been identified in association with AF, most of which implicate mechanisms not previously appreciated to be involved in the development of AF. We now recognise that AF is a polygenic condition, yet a great deal of work lies ahead to better understand the precise mechanisms by which genomic variation causes AF. Understanding the genetic basis of AF could provide a better understanding of AF mechanisms and cardiovascular biology, inform the management of patients through risk-guided approaches and facilitate the development of novel therapeutics.

