Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
PingHsun Hsieh ◽  
Vy Dang ◽  
Mitchell R. Vollger ◽  
Yafei Mao ◽  
Tzu-Hsueh Huang ◽  
...  

Abstract TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated because the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, all archaic-hominin samples are fixed for a specific H4 haplotype without duplication, likely due to positive selection. Together, the TCAF copy number expansion, the selection signals in hominins, the differential TCAF2 expression between haplogroups, and the high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply that TCAF diversified among hominins, potentially in response to cold or dietary adaptations.

2020 ◽  
Author(s):  
PingHsun Hsieh ◽  
Vy Dang ◽  
Mitchell Vollger ◽  
Yafei Mao ◽  
Tzu-Hsueh Huang ◽  
...  

Abstract TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated given the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in human and nonhuman primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations that altered TCAF copy number and regulation. Conversely, in all archaic-hominin samples the fixation for a specific H4 haplotype without duplication is likely due to positive selection. The significant, positive effect of H4 on TCAF2 expression in modern-day humans with candidate associations for hypothyroidism, nerve compression, and diabetes suggests TCAF diversification among hominins potentially in response to cold or dietary adaptations.


Author(s):  
Justin Wagner ◽  
Nathan D Olson ◽  
Lindsay Harris ◽  
Ziad Khan ◽  
Jesse Farek ◽  
...  

Abstract Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling and sequencing methods. Here we use accurate long and linked reads to expand the prior benchmark to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). We increase coverage of the autosomal GRCh38 assembly from 85% to 92%, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and assembly errors) that should not have been in the previous version. Our new benchmark reliably identifies both false positives and false negatives across multiple short-, linked-, and long-read based variant calling methods. As an example of its utility, this benchmark identifies eight times more false negatives in a short-read variant call set relative to our previous benchmark, mostly in difficult-to-map regions. To enable robust small variant benchmarking, we still exclude 3.6% of GRCh37 and 5.0% of GRCh38 in (1) highly repetitive regions, such as large, highly similar segmental duplications and the centromeres, that are not accessible to our data and (2) regions where our sample is highly divergent from the reference due to large indels, structural variation, copy number variation, and/or errors in the reference (e.g., some KIR genes that have duplications in HG002). We have demonstrated the utility of this benchmark to assess performance in more challenging regions, which enables benchmarking in more difficult genes and continued technology and bioinformatics development. The v4.2.1 benchmarks are available under ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/.
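The core bookkeeping behind such a benchmark (counting true positives, false positives, and false negatives only inside the benchmark's confident regions, so that excluded regions do not count against a caller) can be sketched as follows. This is a simplified illustration, not the GIAB workflow: the function names are hypothetical, variants are matched by exact (chrom, pos, ref, alt) identity, and positions and region intervals are assumed to share one coordinate system with half-open intervals. Production comparisons typically use haplotype-aware tools such as hap.py or RTG vcfeval, which also reconcile equivalent variants written in different representations.

```python
# Hypothetical, simplified small-variant benchmarking: exact-match comparison
# of (chrom, pos, ref, alt) tuples, restricted to benchmark regions.

def in_regions(chrom, pos, regions):
    """True if pos falls in any half-open [start, end) interval for chrom."""
    return any(start <= pos < end for start, end in regions.get(chrom, ()))


def benchmark(truth, query, regions):
    """Count TP/FP/FN and compute precision and recall inside regions only."""
    truth_in = {v for v in truth if in_regions(v[0], v[1], regions)}
    query_in = {v for v in query if in_regions(v[0], v[1], regions)}
    tp = len(truth_in & query_in)   # called and present in the benchmark
    fp = len(query_in - truth_in)   # called but absent from the benchmark
    fn = len(truth_in - query_in)   # in the benchmark but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    truth = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr1", 500, "G", "A")}
    query = {("chr1", 10, "A", "G"), ("chr1", 30, "T", "C"), ("chr1", 600, "A", "T")}
    regions = {"chr1": [(0, 100)]}  # toy confident region
    print(benchmark(truth, query, regions))
```

In the toy input, the variants at positions 500 and 600 lie outside the confident region and are ignored, which mirrors why excluding problematic regions changes the counted false positives and false negatives.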


2020 ◽  
Author(s):  
Christopher W. Whelan ◽  
Robert E. Handsaker ◽  
Giulio Genovese ◽  
Seva Kashin ◽  
Monkol Lek ◽  
...  

Abstract Two intriguing forms of genome structural variation (SV), dispersed duplications and de novo rearrangements of complex, multi-allelic loci, have long escaped genomic analysis. We describe a new way to find and characterize such variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of "IBD-discordant" (IBDD) CNVs: loci at which siblings' CNV measurements and IBD states are mathematically inconsistent. We found that commonly IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association study (GWAS) signals for common diseases or with gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more conventional CNVs, we estimate that segmental mutations larger than 1 kb arise in about one in every 22 human meioses. These methods complement previous techniques in that they interrogate genomic regions that harbor segmental duplications, CNVs with high allele frequencies, and multi-allelic CNVs.
Author Summary Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data. For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect that has been difficult to study is the detection of new copy number mutations in DNA segments transmitted from parents to their children, particularly when the mutations affect genome segments that already display common copy number variation in the population. We develop an analytical approach to these problems when sequencing data are available for all members of families with at least two children. This method determines the number of parental haplotypes the two siblings share at each location in their genome and uses that information to determine the possible inheritance patterns that might explain the copy numbers observed in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications that are dispersed to distant regions of the genome, and show that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
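The consistency test at the heart of the IBD-discordance idea can be illustrated with a deliberately simplified model. The sketch below assumes error-free integer copy numbers for both parents and two siblings, treats each parental haplotype as carrying an unknown integer copy number at the locus, and caps per-haplotype copy number at a hypothetical `max_hap_cn`. It asks whether any assignment of haplotype copy numbers and transmissions reproduces the observed totals while matching the siblings' IBD state (0, 1, or 2 shared parental haplotypes). It is not the authors' implementation, which works from measured, non-integer copy number and must tolerate measurement noise.

```python
from itertools import product


def consistent_with_ibd(father_cn, mother_cn, sib1_cn, sib2_cn, ibd, max_hap_cn=6):
    """Return True if some haplotype-level explanation fits the observed totals.

    ibd: number of parental haplotypes the two siblings share here (0, 1 or 2).
    Assumes integer, error-free diploid copy numbers (a simplification).
    max_hap_cn is a hypothetical cap on copies carried by one haplotype.
    """
    hap_values = range(max_hap_cn + 1)
    for f in product(hap_values, repeat=2):        # father's two haplotypes
        if sum(f) != father_cn:
            continue
        for m in product(hap_values, repeat=2):    # mother's two haplotypes
            if sum(m) != mother_cn:
                continue
            # p1/p2 and m1/m2 index which paternal / maternal haplotype each sib inherited
            for p1, m1, p2, m2 in product((0, 1), repeat=4):
                if (p1 == p2) + (m1 == m2) != ibd:
                    continue
                if f[p1] + m[m1] == sib1_cn and f[p2] + m[m2] == sib2_cn:
                    return True
    return False


if __name__ == "__main__":
    # Siblings sharing both parental haplotypes (IBD2) must match in copy number.
    print(consistent_with_ibd(3, 2, 3, 3, ibd=2))  # True: consistent
    print(consistent_with_ibd(3, 2, 3, 4, ibd=2))  # False: IBD-discordant
```

Under this toy model, a copy that actually sits at a distant locus (a dispersed duplication) is inherited according to the IBD state at that other locus, which is why its measured copy number can fail this check at the position where it is naively assigned.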


2008 ◽  
Vol 18 (5) ◽  
pp. 683-694 ◽  
Author(s):  
I. Cusco ◽  
R. Corominas ◽  
M. Bayes ◽  
R. Flores ◽  
N. Rivera-Brugues ◽  
...  

2018 ◽  
Author(s):  
Thomas A. Sasani ◽  
Kelsey R. Cone ◽  
Aaron R. Quinlan ◽  
Nels C. Elde

Abstract Large DNA viruses rapidly evolve to defeat host defenses. Poxvirus adaptation can involve combinations of recombination-driven gene copy number variation and beneficial single nucleotide variants (SNVs) at the same locus, yet how these distinct mechanisms of genetic diversification might simultaneously facilitate adaptation to immune blocks is unknown. We performed experimental evolution with a vaccinia virus population harboring an SNV in a gene actively undergoing copy number amplification. Comparisons of virus genomes using the Oxford Nanopore Technologies sequencing platform allowed us to phase SNVs within large gene copy arrays for the first time, and uncovered a mechanism of adaptive SNV homogenization reminiscent of gene conversion, which is actively driven by selection. Our work reveals a new mechanism for the fluid gain of beneficial mutations in genetic regions undergoing active recombination in viruses, and illustrates the value of long read sequencing technologies for investigating complex genome dynamics in diverse biological systems.


Genomics ◽  
2006 ◽  
Vol 88 (2) ◽  
pp. 152-162 ◽  
Author(s):  
Cecilia de Bustos ◽  
Teresita Díaz de Ståhl ◽  
Arkadiusz Piotrowski ◽  
Kiran K. Mantripragada ◽  
Patrick G. Buckley ◽  
...  

2021 ◽  
Author(s):  
Riccardo Vicedomini ◽  
Lelia Polit ◽  
Silvana Condemi ◽  
Laura Longo ◽  
Alessandra Carbone

Dietary adaptation is the acquisition of an efficient system to digest food available in an ecosystem. To find the genetic basis for human dietary adaptation, we searched 16 genomes from Neandertal, Denisovan and Early Sapiens for food digestion genes that tend to have more or fewer copies than the modern human reference genome. Here, we identify 11 genes, including three gene clusters, with discernible copy number variation trends at the population level. The genomic variation shows how metabolic pathways for lipid, brown fat, protein or carbohydrate metabolism adapt to metabolize food from animal or plant sources. Interpreting the copy number profiles in relation to fossil evidence shows that Homo sapiens had an evolutionary advantage compared to Neandertal and Denisovan in adapting to cold and temperate ecosystems.
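One common way to ask whether a gene has more or fewer copies than the reference is to scale its read depth by the genome-wide average depth. The sketch below uses that read-depth approach as an illustration only; the abstract does not state that this is how the authors estimated copy number, and the ploidy of 2, the tolerance, and the function names are assumptions. Ancient genomes often have low and uneven coverage, so real estimates would typically also need GC correction, mappability filtering, and attention to contamination.

```python
def estimate_copy_number(gene_depth, genome_mean_depth, ploidy=2):
    """Diploid copy-number estimate from mean read depth (simple read-depth model)."""
    return ploidy * gene_depth / genome_mean_depth


def trend_vs_reference(estimated_cn, reference_copies, ploidy=2, tolerance=0.5):
    """Classify a sample's estimate against the copies in the haploid reference.

    reference_copies: copies of the gene in the (haploid) reference assembly,
    so the expected diploid copy number is ploidy * reference_copies.
    tolerance is a hypothetical threshold, not a published cutoff.
    """
    expected = ploidy * reference_copies
    if estimated_cn > expected + tolerance:
        return "more copies"
    if estimated_cn < expected - tolerance:
        return "fewer copies"
    return "same"


if __name__ == "__main__":
    cn = estimate_copy_number(gene_depth=30.0, genome_mean_depth=10.0)
    print(cn, trend_vs_reference(cn, reference_copies=2))  # 6.0 more copies
```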


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Thomas A Sasani ◽  
Kelsey R Cone ◽  
Aaron R Quinlan ◽  
Nels C Elde

Poxvirus adaptation can involve combinations of recombination-driven gene copy number variation and beneficial single nucleotide variants (SNVs) at the same loci. How these distinct mechanisms of genetic diversification might simultaneously facilitate adaptation to host immune defenses is unknown. We performed experimental evolution with vaccinia virus populations harboring an SNV in a gene actively undergoing copy number amplification. Using long sequencing reads from the Oxford Nanopore Technologies platform, we phased SNVs within large gene copy arrays for the first time. Our analysis uncovered a mechanism of adaptive SNV homogenization reminiscent of gene conversion, which is actively driven by selection. This study reveals a new mechanism for the fluid gain of beneficial mutations in genetic regions undergoing active recombination in viruses and illustrates the value of long read sequencing technologies for investigating complex genome dynamics in diverse biological systems.
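Because a single long read can span an entire gene copy array, the allele at the SNV site can be read off for every copy on the same molecule, which is what makes phasing within arrays possible. The sketch below summarizes such per-read calls; the input encoding (one character per copy, `V` for the SNV and `R` for the reference allele) and the function names are hypothetical, and extracting the per-copy alleles from Nanopore reads is assumed to have been done upstream. Under this encoding, a rise across passages in the fraction of multi-copy arrays whose copies all carry the same allele is one way a homogenization signal could show up.

```python
from collections import Counter


def array_profile(reads):
    """Tally (copies in array, copies carrying the SNV) across array-spanning reads."""
    return Counter((len(r), r.count("V")) for r in reads)


def homogeneous_fraction(reads):
    """Fraction of multi-copy arrays in which every copy carries the same allele."""
    multi = [r for r in reads if len(r) > 1]
    if not multi:
        return 0.0
    return sum(len(set(r)) == 1 for r in multi) / len(multi)


if __name__ == "__main__":
    reads = ["VVV", "RVV", "RR", "V"]  # toy per-read allele strings
    print(array_profile(reads))
    print(homogeneous_fraction(reads))
```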


2005 ◽  
Vol 77 (1) ◽  
pp. 78-88 ◽  
Author(s):  
Andrew J. Sharp ◽  
Devin P. Locke ◽  
Sean D. McGrath ◽  
Ze Cheng ◽  
Jeffrey A. Bailey ◽  
...  

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ksenia Lavrichenko ◽  
Stefan Johansson ◽  
Inge Jonassen

Abstract Background SNP arrays and short- and long-read genome sequencing are genome-wide, high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well known but not all of which have been thoroughly quantified. Results We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This collection represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons of their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls using the raw data of each technology. Conclusions Our study confirms that long-read platforms enable calling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV across different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on the technology on which the database was built.
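Comparing CNV calls across technologies first requires a rule for when two calls describe the same event. A widely used convention is reciprocal overlap, in which the shared span must cover at least some fraction of each call. The 50% threshold, the interval encoding, and the function names below are assumptions for illustration; the abstract does not say which matching rule the study used. A fuller comparison would also require the CNV type (deletion or duplication) to match.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two CNV calls given as (chrom, start, end), half-open."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    # The smaller of the two fractions, so both calls must be covered.
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def supported_calls(calls, other_calls, threshold=0.5):
    """Calls from one technology matched by at least one call from another."""
    return [c for c in calls
            if any(reciprocal_overlap(c, o) >= threshold for o in other_calls)]


if __name__ == "__main__":
    short_read = [("chr1", 0, 100), ("chr2", 0, 10)]  # toy calls
    long_read = [("chr1", 40, 120)]
    print(supported_calls(short_read, long_read))  # [('chr1', 0, 100)]
```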

