The quiescent X, the replicative Y and the Autosomes

2018
Author(s): Guillaume Achaz, Serge Gangloff, Benoit Arcangioli

From the analysis of the mutation spectrum in the 2,504 sequenced human genomes from the 1000 Genomes Project (phase 3), we show that the sex chromosomes (X and Y) exhibit a different proportion of indel mutations than autosomes (A), ranking X>A>Y. We further show that X chromosomes exhibit a higher deletion/insertion ratio than autosomes. This simple pattern shows that the recent report that non-dividing quiescent yeast cells accumulate relatively more indels (and particularly deletions) than replicating ones also applies to metazoan cells, including humans. Indeed, the X chromosomes display more indels than the autosomes, having spent more time in quiescent oocytes, whereas the Y chromosomes are solely present in the replicating spermatocytes. From the proportion of indels, we have inferred that de novo mutations arising in the maternal lineage are twice as likely to be indels as mutations from the paternal lineage. Our observation, consistent with a recent trio analysis of the spectrum of mutations inherited from the maternal lineage, is likely a major component in our understanding of the origin of anisogamy.
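The two quantities compared across X, Y and autosomes in this abstract (the proportion of variants that are indels, and the deletion/insertion ratio) can be illustrated with a minimal sketch. It assumes variants are given as VCF-style REF/ALT allele strings; the function names and the toy variant list are hypothetical and are not taken from the paper.

```python
from collections import Counter

def classify(ref: str, alt: str) -> str:
    """Classify a biallelic variant by comparing REF and ALT allele lengths."""
    if len(ref) == len(alt):
        return "snv" if len(ref) == 1 else "mnv"
    return "deletion" if len(ref) > len(alt) else "insertion"

def indel_summary(variants):
    """Return the indel proportion and deletion/insertion ratio for (ref, alt) pairs."""
    counts = Counter(classify(ref, alt) for ref, alt in variants)
    indels = counts["deletion"] + counts["insertion"]
    total = sum(counts.values())
    return {
        "indel_proportion": indels / total if total else float("nan"),
        "del_ins_ratio": (counts["deletion"] / counts["insertion"]
                          if counts["insertion"] else float("nan")),
    }

# Hypothetical toy data: two SNVs, two deletions, one insertion.
toy_variants = [("A", "G"), ("AT", "A"), ("C", "CTT"), ("GCA", "G"), ("T", "C")]
toy_summary = indel_summary(toy_variants)
```

Running the same summary separately on variants from the X chromosome, the Y chromosome and the autosomes would give the per-class values that the abstract ranks as X>A>Y.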

Author(s): Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao, Anna O. Basile, Haley J. Abel, ...

The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world and was based on a combination of technologies including low coverage WGS (mean depth 7.4X), high coverage whole exome sequencing (mean depth 65.7X), and microarray genotyping. Here, we present a new, high coverage WGS resource encompassing the original 2,504 1kGP samples, as well as an additional 698 related samples that result in 602 complete trios in the 1kGP cohort. We sequenced this expanded 1kGP cohort of 3,202 samples to a targeted depth of 30X using Illumina NovaSeq 6000 instruments. We performed SNV/INDEL calling against the GRCh38 reference using GATK’s HaplotypeCaller, and generated a comprehensive set of SVs by integrating multiple analytic methods through a sophisticated machine learning model, upgrading the 1kGP dataset to current state-of-the-art standards. Using this strategy, we defined over 111 million SNVs, 14 million INDELs, and ∼170 thousand SVs across the entire cohort of 3,202 samples with estimated false discovery rate (FDR) of 0.3%, 1.0%, and 1.8%, respectively. By comparison to the low-coverage phase 3 callset, we observed substantial improvements in variant discovery and estimated FDR that were facilitated by high coverage re-sequencing and expansion of the cohort. Specifically, we called 7% more SNVs, 59% more INDELs, and 170% more SVs per genome than the phase 3 callset. Moreover, we leveraged the presence of families in the cohort to achieve superior haplotype phasing accuracy and we demonstrate improvements that the high coverage panel brings especially for INDEL imputation. We make all the data generated as part of this project publicly available and we envision this updated version of the 1kGP callset to become the new de facto public resource for the worldwide scientific community working on genomics and genetics.


2021
Author(s): Tamara Soledad Frontanilla, Guilherme Valle Silva, Jesus Ayala, Celso Teixeira Mendes

Accurate STR genotyping from next-generation sequencing (NGS) data has been challenging. Haplotype inference and phasing for STRs (HipSTR) was specifically developed to deal with genotyping errors and obtain reliable STR genotypes from whole-genome sequencing datasets. The objective of this investigation was to perform a comprehensive genotyping analysis of a set of STRs of broad forensic interest from the 1000 Genomes populations and release a reliable open-access STR database to the forensic genetics community. A set of 22 STR markers was analyzed using the CRAM files of the 1000 Genomes Project Phase 3 high-coverage (30x) dataset generated by the New York Genome Center (NYGC). HipSTR was used to call genotypes from 2,504 samples from 26 populations organized into five groups: African, East Asian, European, South Asian, and admixed American. The D21S11 marker could not be detected in the present study. Moreover, the Hardy-Weinberg equilibrium analysis, coupled with a comprehensive analysis of allele frequencies, revealed that HipSTR could not identify longer Penta E alleles (and, to a lesser extent, longer Penta D alleles). This issue, which results in heterozygote deficiency, is probably due to the limited length of sequencing reads available for genotype calling. Notwithstanding that, AMOVA, a clustering analysis using STRUCTURE, and a Principal Coordinates Analysis revealed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium (AFR, EUR, EAS, SAS). Meanwhile, the AMOVA results corroborated previous reports that most of the variance (97.12%) is observed within populations. This set of analyses revealed that, except for larger Penta D and Penta E alleles, allele frequencies and genotypes defined by HipSTR from the 1000 Genomes Project phase 3 data and offered as an open-access database are consistent and highly reliable.
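The heterozygote deficiency mentioned in this abstract can be illustrated by comparing observed heterozygosity with the value expected under Hardy-Weinberg equilibrium. The sketch below is a simplified illustration, not the authors' analysis: it assumes each genotype is a pair of repeat-count alleles, and the function name and toy genotypes are hypothetical.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for a list of diploid genotypes.

    Expected heterozygosity is 1 - sum(p_i^2) over allele frequencies p_i,
    which is the Hardy-Weinberg expectation without small-sample correction.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    observed = sum(1 for a, b in genotypes if a != b) / n
    return observed, expected

# Hypothetical toy locus: if a long allele is missed in a true heterozygote,
# that sample is called homozygous and observed heterozygosity drops.
toy_genotypes = [(12, 12), (12, 15), (15, 15), (12, 12)]
h_obs, h_exp = heterozygosity(toy_genotypes)
deficiency = 1.0 - h_obs / h_exp  # positive values indicate a heterozygote deficit
```

A locus where long alleles drop out of genotype calls would be expected to show observed heterozygosity below the Hardy-Weinberg expectation, as in this toy example.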


Genes, 2021, Vol 12 (5), pp. 779
Author(s): Artem Lisachov, Daria Andreyushkova, Guzel Davletshina, Dmitry Prokopov, Svetlana Romanenko, ...

Heteromorphic W and Y sex chromosomes often experience gene loss and heterochromatinization, which is frequently viewed as their “degeneration”. However, the evolutionary trajectories of the heterochromosomes are in fact more complex since they may not only lose but also acquire new sequences. Previously, we found that the heterochromatic W chromosome of the lizard Eremias velox (Lacertidae) is decondensed and thus transcriptionally active during the lampbrush stage. To determine possible sources of this transcription, we sequenced DNA from a microdissected W chromosome sample and a total female DNA sample and analyzed the results of reference-based and de novo assembly. We found a new repetitive sequence, consisting of fragments of an autosomal protein-coding gene ATF7IP2, several SINE elements, and sequences of unknown origin. This repetitive element is distributed across the whole length of the W chromosome, except the centromeric region. Since it retained only 3 out of 10 original ATF7IP2 exons, it remains unclear whether it is able to produce a protein product. Subsequent studies are required to test for the presence of this element in other species of Lacertidae and to assess its possible functionality. Our results provide further evidence for the view of W and Y chromosomes as not just “degraded” copies of Z and X chromosomes but independent genomic segments in which novel genetic elements may arise.


2019, Vol 4, pp. 50
Author(s): Ernesto Lowy-Gallego, Susan Fairley, Xiangqun Zheng-Bradley, Magali Ruffier, Laura Clarke, ...

We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called de novo on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.


2016
Author(s): Cathal Seoighe, Aylwyn Scally

The rate of germline mutation varies widely between species but little is known about the extent of variation in the germline mutation rate between individuals of the same species. Here we demonstrate that an allele that increases the rate of germline mutation can result in a distinctive signature in the genomic region linked to the affected locus, characterized by a number of haplotypes with a locally high proportion of derived alleles, against a background of haplotypes carrying a typical proportion of derived alleles. We searched for this signature in human haplotype data from phase 3 of the 1000 Genomes Project and report a number of candidate mutator loci, several of which are located close to or within genes involved in DNA repair or the DNA damage response. To investigate whether mutator alleles remained active at any of these loci, we used de novo mutation counts from human parent-offspring trios in the 1000 Genomes and Genome of the Netherlands cohorts, looking for an elevated number of de novo mutations in the offspring of parents carrying a candidate mutator haplotype at each of these loci. We found some support for two of the candidate loci, including one locus just upstream of the BRSK2 gene, which is expressed in the testis and has been reported to be involved in the response to DNA damage.

Author Summary: Each time a genome is replicated there is the possibility of error resulting in the incorporation of an incorrect base or bases in the genome sequence. When these errors occur in cells that lead to the production of gametes they can be incorporated into the germline. Such germline mutations are the basis of evolutionary change; however, to date there has been little attempt to quantify the extent of genetic variation in human populations in the rate at which they occur. This is particularly important because new spontaneous mutations are thought to make an important contribution to many human diseases. Here we present a new way to identify genetic loci that may be associated with an elevated rate of germline mutation and report the application of this method to data from a large number of human genomes, generated by the 1000 Genomes Project. Several of the candidate loci we report are in or near genes involved in DNA repair and some were supported by direct measurement of the mutation rate obtained from parent-offspring trios.


2020
Author(s): Nathan S. Harris, Alan R. Rogers

Signals of selection are not often shared between populations. When a mutual signal is detected, it is often not known if selection occurred before or after populations split. Here we develop a method to detect genomic regions at which selection has favored different haplotypes in two populations. This method is verified through simulations and tested on small regions of the genome. This method was then expanded to scan the phase 3 genomes of the 1000 Genomes Project populations for regions in which the evidence for independent selection is strongest. We identify several genes which likely underwent selection independently in different populations.


2022, Vol 12
Author(s): Lu Cao, Ruixue Zhang, Yirui Wang, Xia Hu, Liang Yong, ...

The important role of MHC in the pathogenesis of vitiligo and SLE has been confirmed in various populations. To map the most significant MHC variants associated with the risk of vitiligo and SLE, we conducted fine mapping analysis using 1117 vitiligo cases, 1046 SLE cases and 1693 healthy control subjects in the Han-MHC reference panel and 1000 Genomes Project phase 3. rs113465897 (P=1.03×10⁻¹³, OR=1.64, 95% CI=1.44–1.87) and rs3129898 (P=4.21×10⁻¹⁷, OR=1.93, 95% CI=1.66–2.25) were identified as being most strongly associated with vitiligo and SLE, respectively. Stepwise conditional analysis revealed additional independent signals at rs3130969 (P=1.48×10⁻⁷, OR=0.69, 95% CI=0.60–0.79) and HLA-DPB1*03:01 (P=1.07×10⁻⁶, OR=1.94, 95% CI=1.49–2.53) being linked to vitiligo and HLA-DQB1*03:01 (P=4.53×10⁻⁷, OR=0.62, 95% CI=0.52–0.75) to SLE. Considering that epidemiological studies have confirmed comorbidities of vitiligo and SLE, we used the GCTA tool to analyse the genetic correlation between these two diseases in the HLA region; the correlation coefficient was 0.79 (P=5.99×10⁻¹⁰, SE=0.07), confirming their similar genetic backgrounds. Our findings highlight the value of the MHC region in vitiligo and SLE and provide a new perspective for comorbidities among autoimmune diseases.


2016
Author(s): G. David Poznik

We have developed an algorithm to rapidly and accurately identify the Y-chromosome haplogroup of each male in a sample of one to millions. The algorithm, implemented in the yHaplo* software package (yHaplo), does not rely on any particular genotyping modality or platform. Full sequences yield the most granular haplogroup classifications, but genotyping arrays can yield reliable calls, provided a reasonable number of phylogenetically informative variants has been assayed. The algorithm is robust to missing data, genotype errors, mutation recurrence, and other complications. We have tested the software on full sequences from phase 3 of the 1000 Genomes Project and on subsets thereof constructed by downsampling to SNPs present on each of four genotyping arrays. We have also run the software on array data from more than 600,000 males.


2021
Author(s): Zhong Wang, Lei Sun, Andrew D Paterson

An unexpectedly high proportion of SNPs on the X chromosome in the 1000 Genomes Project phase 3 data were identified with significant sex differences in minor allele frequencies (sdMAF). sdMAF persisted for many of these SNPs in the recently released high coverage whole genome sequence, and it was consistent between the five super-populations. Among the 245,825 common biallelic SNPs in phase 3 data presumed to be high quality, 2,039 have genome-wide significant sdMAF (p-value <5e-8). sdMAF varied by location: non-pseudo-autosomal region (NPR)=0.83%, pseudo-autosomal region (PAR1)=0.29%, PAR2=13.1%, and PAR3=0.85% of SNPs had sdMAF, and they were clustered at the NPR-PAR boundaries, among others. sdMAF at the NPR-PAR boundaries are biologically expected due to sex-linkage, but have generally been ignored in association studies. For comparison, similar analyses found only 6, 1 and 0 SNPs with significant sdMAF on chromosomes 1, 7 and 22, respectively. Future X chromosome analyses need to take sdMAF into account.
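The idea of testing one SNP for a sex difference in allele frequency can be sketched with a plain 2x2 chi-square test on allele counts. This is a swapped-in, simplified test and is not claimed to be the authors' sdMAF method; the function name, argument layout and example counts are hypothetical. It assumes allele counts are supplied per sex, with males contributing one X allele each outside the pseudo-autosomal regions.

```python
import math

def allele_freq_sex_test(male_alt, male_total, female_alt, female_total):
    """Pearson chi-square (1 df) comparing alternate-allele counts between sexes.

    *_total are numbers of alleles (chromosomes), not numbers of individuals.
    Returns (chi2, p_value).
    """
    a, b = male_alt, male_total - male_alt
    c, d = female_alt, female_total - female_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a margin is empty, so no difference can be tested
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in the abstract

# Hypothetical counts: identical frequencies, then a large sex difference.
same_chi2, same_p = allele_freq_sex_test(10, 100, 20, 200)
diff_chi2, diff_p = allele_freq_sex_test(300, 1000, 200, 2000)
is_significant = diff_p < GENOME_WIDE_ALPHA
```

A SNP would be flagged here when its p-value falls below the genome-wide threshold of 5e-8 used in the abstract.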

