Analysis of the Batch Effect Due to Sequencing Center in Population Statistics Quantifying Rare Events in the 1000 Genomes Project

Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 44
Author(s):  
Iago Maceda ◽  
Oscar Lao

The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosted our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center on predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independent of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.
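As a rough illustration of what testing for a sequencing-center effect within one population can look like, the sketch below runs a label-permutation test on per-sample singleton counts. It is not the method used in the study: the function name `center_permutation_test`, the choice of test statistic (largest difference between center means), and the toy counts are all assumptions made for this example.

```python
import random
from statistics import mean


def center_permutation_test(counts, centers, n_perm=2000, seed=0):
    """Permutation test for a difference in mean counts between centers.

    counts  : per-sample values (e.g. singleton counts) from ONE population
    centers : sequencing-center label for each sample, same order as counts
    Returns (observed statistic, permutation p-value).
    """
    rng = random.Random(seed)

    def spread_of_means(labels):
        # Group the counts by label and return max(mean) - min(mean).
        groups = {}
        for value, label in zip(counts, labels):
            groups.setdefault(label, []).append(value)
        means = [mean(values) for values in groups.values()]
        return max(means) - min(means)

    observed = spread_of_means(centers)
    shuffled = list(centers)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sample and center
        if spread_of_means(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented for illustration): two hypothetical centers, one population.
toy_counts = [100, 105, 98, 102, 101, 99, 130, 128, 135, 131, 129, 133]
toy_centers = ["A"] * 6 + ["B"] * 6
observed, p_value = center_permutation_test(toy_counts, toy_centers)
print(f"difference in means = {observed:.2f}, p = {p_value:.4f}")
```

Restricting the test to samples from a single population keeps real demographic differences from being mistaken for a batch effect.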

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. M. Bea ◽  
E. Franco-Marín ◽  
V. Marco-Benedí ◽  
E. Jarauta ◽  
I. Gracia-Rubio ◽  
...  

Abstract Angiopoietin-like 3 (ANGPTL3) plays an important role in lipid metabolism in humans. Loss-of-function variants in ANGPTL3 cause a monogenic disease named familial combined hypolipidemia. However, the potential contribution of the ANGPTL3 gene in subjects with familial combined hyperlipidemia (FCHL) has not been studied. For that reason, the aim of this work was to investigate the potential contribution of ANGPTL3 in the aetiology of FCHL by identifying gain-of-function (GOF) genetic variants in the ANGPTL3 gene in FCHL subjects. The ANGPTL3 gene was sequenced in 162 unrelated subjects with severe FCHL and 165 normolipemic controls. Pathogenicity of genetic variants was predicted with PredictSNP2 and FruitFly. Frequency of identified variants in FCHL was compared with that of normolipemic controls and that described in the 1000 Genomes Project. No GOF mutations in ANGPTL3 were present in subjects with FCHL. Four variants were identified in FCHL subjects, showing a different frequency from that observed in normolipemic controls: c.607-109T>C, c.607-47_607-46delGT, c.835+41C>A and c.*52_*60del. This last variant, c.*52_*60del, is a microRNA-associated sequence in the 3′UTR of ANGPTL3, and it was present 2.7 times more frequently in normolipemic controls than in FCHL subjects. Our research shows that no GOF mutations in ANGPTL3 were found in a large group of unrelated subjects with FCHL.


2014 ◽  
Author(s):  
Debora Yoshihara Caldeira Brandt ◽  
Vitor Rezende da Costa Aguiar ◽  
Bárbara Domingues Bitarello ◽  
Kelly Nunes ◽  
Jérôme Goudet ◽  
...  

Next Generation Sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the Human Leukocyte Antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the SNPs reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1,092 1000G samples, and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect, and that allele frequencies are estimated with an error higher than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias towards overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates, and discuss the outcomes of including those sites in different kinds of analyses. Since the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using NGS data at other genomic regions of high diversity.
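The two benchmark quantities mentioned above, the genotype discordance rate and the allele frequency error relative to a gold standard, can be sketched for a single SNP as follows. This is only an illustration under stated assumptions: genotypes are coded as alternate-allele counts (0, 1, 2, or `None` for missing), the function names are hypothetical, and the genotype vectors are invented toy data rather than values from the study.

```python
def genotype_discordance(calls, truth):
    """Fraction of samples whose called genotype differs from the gold standard.

    Samples missing in either set (None) are skipped.
    """
    pairs = [(c, t) for c, t in zip(calls, truth) if c is not None and t is not None]
    return sum(c != t for c, t in pairs) / len(pairs)


def alt_allele_frequency(genotypes):
    """Alternate allele frequency from diploid genotypes coded 0/1/2."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called))


# Toy data for ONE SNP across eight samples (invented for illustration).
gold_standard = [0, 1, 2, 1, 0, 1, 1, 2]  # e.g. Sanger-based genotypes
ngs_calls = [0, 0, 2, 1, 0, 0, 1, 1]      # e.g. NGS-based genotypes

discordance = genotype_discordance(ngs_calls, gold_standard)
freq_error = alt_allele_frequency(ngs_calls) - alt_allele_frequency(gold_standard)
flagged = abs(freq_error) > 0.1  # same ±0.1 threshold quoted in the abstract

print(discordance, freq_error, flagged)
```

In this toy case the error is negative, meaning the alternate allele is underestimated and the reference allele overestimated, which is the direction of bias expected when reads carrying non-reference alleles map less well.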


2015 ◽  
Vol 94 (4) ◽  
pp. 731-740 ◽  
Author(s):  
WENQIAN ZHANG ◽  
HUI WEN NG ◽  
MAO SHU ◽  
HENG LUO ◽  
ZHENQIANG SU ◽  
...  

2016 ◽  
Author(s):  
Petr Danecek ◽  
Shane A. McCarthy

Abstract Motivation: Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues. Results: BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory. Availability: The program is freely available for commercial and non-commercial use in the BCFtools package which is available for download from http://samtools.github.io/bcftools Contact: [email protected]
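The same-codon problem described above can be seen in a toy example. The sketch below is not the BCFtools/csq implementation; it only contrasts predicting each single nucleotide variant in isolation with predicting both together when they sit on the same haplotype. The helper names and the example codon are assumptions chosen for illustration, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(codon, snvs):
    """Return the codon after applying (position, alt_base) substitutions."""
    bases = list(codon)
    for position, alt_base in snvs:
        bases[position] = alt_base
    return "".join(bases)


def consequence(ref_codon, alt_codon):
    """Classify a codon change with a deliberately small set of labels."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


ref_codon = "CAA"              # glutamine
snvs = [(0, "T"), (1, "C")]    # two adjacent SNVs in the same codon

# Each variant treated as an isolated event.
isolated = [consequence(ref_codon, apply_snvs(ref_codon, [v])) for v in snvs]
# Both variants applied together, as when they share a haplotype.
joint = consequence(ref_codon, apply_snvs(ref_codon, snvs))

print(isolated, joint)
```

Taken alone, the first variant turns CAA into the stop codon TAA, but together with the second variant the codon becomes TCA (serine), so the haplotype-aware call is a missense change rather than a premature stop.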


2019 ◽  
Vol 35 (22) ◽  
pp. 4851-4853 ◽  
Author(s):  
Mihir A Kamat ◽  
James A Blackshaw ◽  
Robin Young ◽  
Praveen Surendran ◽  
Stephen Burgess ◽  
...  

Abstract Summary: PhenoScanner is a curated database of publicly available results from large-scale genetic association studies in humans. This online tool facilitates ‘phenome scans’, where genetic variants are cross-referenced for association with many phenotypes of different types. Here we present a major update of PhenoScanner (‘PhenoScanner V2’), including over 150 million genetic variants and more than 65 billion associations (compared to 350 million associations in PhenoScanner V1) with diseases and traits, gene expression, metabolite and protein levels, and epigenetic markers. The query options have been extended to include searches by genes, genomic regions and phenotypes, as well as for genetic variants. All variants are positionally annotated using the Variant Effect Predictor and the phenotypes are mapped to Experimental Factor Ontology terms. Linkage disequilibrium statistics from the 1000 Genomes project can be used to search for phenotype associations with proxy variants. Availability and implementation: PhenoScanner V2 is available at www.phenoscanner.medschl.cam.ac.uk.


PLoS Genetics ◽  
2013 ◽  
Vol 9 (12) ◽  
pp. e1003959 ◽  
Author(s):  
Carrie B. Moore ◽  
John R. Wallace ◽  
Daniel J. Wolfe ◽  
Alex T. Frase ◽  
Sarah A. Pendergrass ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fadilla Wahyudi ◽  
Farhang Aghakhanian ◽  
Sadequr Rahman ◽  
Yik-Ying Teo ◽  
Michał Szpak ◽  
...  

Abstract Background: In population genomics, polymorphisms that are highly differentiated between geographically separated populations are often suggestive of Darwinian positive selection. Genomic scans have highlighted several such regions in African and non-African populations, but only a handful of these have functional data that clearly associates candidate variations driving the selection process. Fine-Mapping of Adaptive Variation (FineMAV) was developed to address this in a high-throughput manner using population based whole-genome sequences generated by the 1000 Genomes Project. It pinpoints positively selected genetic variants in sequencing data by prioritizing high frequency, population-specific and functional derived alleles. Results: We developed a stand-alone software that implements the FineMAV statistic. To graphically visualise the FineMAV scores, it outputs the statistics as bigWig files, which is a common file format supported by many genome browsers. It is available as a command-line and graphical user interface. The software was tested by replicating the FineMAV scores obtained using 1000 Genomes Project African, European, East and South Asian populations and subsequently applied to whole-genome sequencing datasets from Singapore and China to highlight population specific variants that can be subsequently modelled. The software tool is publicly available at https://github.com/fadilla-wahyudi/finemav. Conclusions: The software tool described here determines genome-wide FineMAV scores, using low or high-coverage whole-genome sequencing datasets, that can be used to prioritize a list of population specific, highly differentiated candidate variants for in vitro or in vivo functional screens. The tool displays these scores on the human genome browsers for easy visualisation, annotation and comparison between different genomic regions in worldwide human populations.


2019 ◽  
Vol 3 (s1) ◽  
pp. 115-115
Author(s):  
Manasi Malik ◽  
Naiqi Shi ◽  
Geraldine Serwald ◽  
Grace Y. Lee ◽  
Antonina I. Frolova ◽  
...  

OBJECTIVES/SPECIFIC AIMS: Previous studies suggest that genetic variants in the oxytocin receptor (OXTR) may alter oxytocin dose requirement for labor induction and may increase risk for preterm labor and neurodevelopmental disorders. However, the mechanisms of actions of these variants remain unknown. The goal of this study was to functionally characterize common missense and noncoding variants in OXTR. First, we aimed to determine the effects of missense variants on two major aspects of receptor function: calcium signaling and β-arrestin recruitment. Second, we used allelic expression imbalance assays in an effort to identify regulatory single nucleotide polymorphisms (SNPs) in noncoding regions of OXTR that alter OXTR mRNA expression. METHODS/STUDY POPULATION: We used the Exome Aggregation Consortium database to identify the 12 most prevalent missense single nucleotide variants in OXTR. To determine the functional effects of these variants, we transfected human embryonic kidney cells (a common model system used to study receptor function) with wild type OXTR, variant OXTR, or empty vector control. We used the calcium-sensitive dye Fluo4 to quantify intracellular calcium flux in response to oxytocin treatment, and used bioluminescence resonance energy transfer assays to measure recruitment of the signaling partner β-arrestin to the receptor. To investigate potential effects of noncoding SNPs on OXTR mRNA expression, we quantified allele-specific expression of OXTR in human uterine tissue obtained from participants at the time of Cesarean section. We used next-generation sequencing (Illumina MiSeq) to count alleles of a reporter SNP in OXTR exon 3. RESULTS/ANTICIPATED RESULTS: Of the 12 most prevalent missense single nucleotide variants, four were predicted to be deleterious by PolyPhen variant annotation software. We anticipate that these variants will alter receptor signaling through calcium or β-arrestin pathways. 
We further observed that a reporter SNP in OXTR exon 3 exhibits significant allelic expression imbalance in a subset of our myometrial tissue samples, indicating that OXTR expression may be regulated by a functional SNP. Our current work focuses on discovering the functional SNPs in OXTR responsible for the pattern of allelic expression imbalance seen in mRNA. In the future, we will seek to explore the effects of these variants on uterine function by using genome editing of uterine smooth muscle cells. DISCUSSION/SIGNIFICANCE OF IMPACT: Our results suggest that both missense and noncoding variants may affect OXTR expression and function. Future studies may suggest that OXTR sequencing, genotyping, or expression analysis would be useful to identify individuals likely to respond or fail to respond to safe doses of oxytocin for labor induction. Personalizing approaches for labor induction in this way would increase the safety of oxytocin and potentially reduce maternal morbidity and mortality.


PLoS ONE ◽  
2013 ◽  
Vol 8 (5) ◽  
pp. e64343 ◽  
Author(s):  
Andrew R. Wood ◽  
John R. B. Perry ◽  
Toshiko Tanaka ◽  
Dena G. Hernandez ◽  
Hou-Feng Zheng ◽  
...  

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Farah Qaiser ◽  
Yue Yin ◽  
Carolyn B. Mervis ◽  
Colleen A. Morris ◽  
Bonita P. Klein-Tasman ◽  
...  

Abstract Background: 7q11.23 duplication (Dup7) is one of the most frequent recurrent copy number variants (CNVs) in individuals with autism spectrum disorder (ASD), but based on gold-standard assessments, only 19% of Dup7 carriers have ASD, suggesting that additional genetic factors are necessary to manifest the ASD phenotype. To assess the contribution of additional genetic variants to the Dup7 phenotype, we conducted whole-genome sequencing analysis of 20 Dup7 carriers: nine with ASD (Dup7-ASD) and 11 without ASD (Dup7-non-ASD). Results: We identified three rare variants of potential clinical relevance for ASD: a 1q21.1 microdeletion (Dup7-non-ASD) and two deletions which disrupted IMMP2L (one Dup7-ASD, one Dup7-non-ASD). There were no significant differences in gene-set or pathway variant burden between the Dup7-ASD and Dup7-non-ASD groups. However, overall intellectual ability negatively correlated with the number of rare loss-of-function variants present in nervous system development and membrane component pathways, and adaptive behaviour standard scores negatively correlated with the number of low-frequency likely-damaging missense variants found in genes expressed in the prenatal human brain. ASD severity positively correlated with the number of low frequency loss-of-function variants impacting genes expressed at low levels in the brain, and genes with a low level of intolerance. Conclusions: Our study suggests that in the presence of the same pathogenic Dup7 variant, rare and low frequency genetic variants act additively to contribute to components of the overall Dup7 phenotype.

