segmental duplications Latest Research Papers

Haplotype-resolved inversion landscape reveals hotspots of mutational recurrence associated with genomic disorders

10.1101/2021.12.20.472354 ◽

2021 ◽

Author(s):

David Porubsky ◽

Wolfram Höps ◽

Hufsah Ashraf ◽

PingHsun Hsieh ◽

Bernardo Rodriguez-Martin ◽

...

Keyword(s):

Copy Number ◽

Copy Number Variants ◽

Segmental Duplications ◽

Common Variants ◽

Base Pairs ◽

Genomic Technologies ◽

Human Genomes ◽

Heterozygous Carriers ◽

Critical Regions ◽

Genomic Disorders

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1-retrotransposition; 80% of the larger inversions are balanced and affect twice as many base pairs as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or mobile elements. Since this suggests recurrence due to non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10^-4 per locus and generation. Recurrent inversions exhibit a sex-chromosomal bias, and significantly co-localize to the critical regions of genomic disorders. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes to disease-causing CNVs.

No evidence of paralogous loci or new bona fide microRNAs in telomere to telomere (T2T) genomic data

10.1101/2021.12.09.471935 ◽

2021 ◽

Author(s):

Arun H. Patil ◽

Marc K. Halushka ◽

Bastian K. Fromm

Keyword(s):

Human Genome ◽

Genome Project ◽

Segmental Duplications ◽

Base Pairs ◽

Repeat Elements ◽

Satellite Sequences ◽

Bona Fide ◽

Additional Base ◽

Genomic Regions ◽

Unmapped Reads

The telomere to telomere (T2T) genome project discovered and mapped ~240 million additional base pairs of primarily telomeric and centromeric reads. Much of this sequence consists of satellite sequences and large segmental duplications. We evaluated the extent to which human bona fide microRNAs (miRNAs) may be found in additional paralogous genomic loci or if previously undescribed microRNAs are present in these newly sequenced regions of the human genome. New genomic regions of the T2T project spanning ~240 million bp of sequence were obtained and evaluated by blastn for the human miRNAs contained in MirGeneDB2.0 (N = 556) and miRBase (N = 1917) along with all species of MirGeneDB2.0 miRNAs (N = 10,899). Additionally, bowtie was used to compare unmapped reads from >4,000 primary cell samples to the new T2T sequence. Based on sequence and structure, no bona fide miRNAs were identified. Ninety-seven miRNAs of questionable authenticity (frequently known repeat elements) were identified from the miRBase dataset across the newly described regions of the human genome. These 97 represent only 51 miRNA families due to paralogy of highly similar miRNAs such as 24 members of the hsa-mir-548 family. Altogether, these data strongly support our having identified widely expressed bona fide miRNAs in the human genome and move us further toward the completion of human miRNA discovery.

A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes

10.1101/2021.12.08.471837 ◽

2021 ◽

Author(s):

Huishi Toh ◽

Chentao Yang ◽

Giulio Formenti ◽

Kalpana Raja ◽

Lily Yan ◽

...

Keyword(s):

Type 2 Diabetes ◽

Genome Assembly ◽

Genetic Basis ◽

Model Organism ◽

Diurnal Rhythms ◽

Segmental Duplications ◽

Genetic Studies ◽

Reference Genome Assembly ◽

Metabolic Dysfunctions

The Nile rat (Arvicanthis niloticus) is an important animal model for biomedical research, including the study of diurnal rhythms and type 2 diabetes. Here, we report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with contig N50 of 11.1 Mb, scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3,613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including those that affect genes associated with type 2 diabetes and metabolic dysfunctions. These include 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse. Our findings reflect the exceptional level of genomic detail present in this assembly, which will greatly expand the potential of the Nile rat as a model organism for genetic studies.
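The contig and scaffold N50 values quoted in this abstract follow a standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that definition only; the function name `n50` and the toy lengths are assumptions made for this example and are not taken from the paper or its workflow.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper (hypothetical name): sort lengths from longest to
    shortest and return the length at which the running total first reaches
    half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths chosen for illustration, not data from the Nile rat assembly.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

A higher N50 at a fixed assembly size means that fewer, longer sequences make up half of the assembly, which is why it is commonly reported as a contiguity measure.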

Stepwise evolution of a butterfly supergene via duplication and inversion

10.1101/2021.12.06.471392 ◽

2021 ◽

Author(s):

Kang-Wook Kim ◽

Rishi De-Kayne ◽

Ian J. Gordon ◽

Kennedy Saitoti Omufwoko ◽

Dino J. Martins ◽

...

Keyword(s):

Structural Changes ◽

Monarch Butterfly ◽

Segmental Duplications ◽

Recombination Suppression ◽

Structural Differences ◽

The Face ◽

Suppressed Recombination ◽

Stepwise Evolution ◽

And Inversion

Supergenes maintain adaptive clusters of alleles in the face of genetic mixing. Although usually attributed to inversions, there are few cases in which the specific mechanisms of recombination suppression, and their timing, have been reconstructed in detail. We investigated the origin of the BC supergene, which controls variation in warning colouration in the African Monarch butterfly, Danaus chrysippus. By generating chromosome-scale assemblies for all three alleles, we identified multiple structural differences. Most strikingly, we find that a region of >1 million bp underwent several segmental duplications at least 7.5 million years ago. The resulting duplicated fragments appear to have triggered four inversions in surrounding parts of the chromosome, resulting in stepwise growth of the region of suppressed recombination. Phylogenies for the inversions are incongruent with the species tree, and suggest that structural polymorphisms have persisted for at least 4.1 million years. In addition to the role of duplications in triggering inversions, our results suggest a previously undescribed mechanism of recombination suppression through independent losses of divergent duplicated tracts. Overall, our findings challenge the idea of instantaneous supergene evolution through a single inversion event, instead pointing towards a stepwise process involving a variety of structural changes.

NPGREAT: Assembly of the human subtelomere regions with the use of ultralong Nanopore reads and Linked-Reads

10.21203/rs.3.rs-1080088/v1 ◽

2021 ◽

Author(s):

Eleni Adam ◽

Desh Ranjan ◽

Harold Riethman

Keyword(s):

Segmental Duplication ◽

Tandem Repeats ◽

Segmental Duplications ◽

Sequence Contigs ◽

Correct Orientation ◽

Assembly Method ◽

Wide Range ◽

Complete Sequencing ◽

Lower Depth ◽

High Depth

Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function.
Results: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs, as well as to identify and correct any misassemblies (associated primarily with tandem repeats). While we focus on subtelomeres, the method is generally applicable to assembly of segmental duplications and other complex genome regions. Our method was tested for a subset of representative subtelomeres with ultralong nanopore read coverage in GM12878. 10X Linked-Read datasets with high depth of coverage and a TELL-seq Linked-Read dataset with lower depth of coverage were each combined with the ultralong nanopore reads from the same genome to provide improved assemblies. Tandem repeat regions of the short-read assemblies, which are especially prone to misassembly due to collapse of matching tandemly repeated reads, were readily identified and properly sized by comparison with the nanopore reads.
Conclusion: The NPGREAT method resulted in extension of high-quality assemblies into otherwise inaccessible segmental duplication regions near telomeres, enhancing our ability to accurately assemble human subtelomere DNA. This information will enable improved analyses of the structure, function, and evolution of these key regions.

New Genes in the Drosophila Y Chromosome: Lessons from D. willistoni

Genes ◽

10.3390/genes12111815 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1815

Author(s):

João Ricchio ◽

Fabiana Uno ◽

A. Bernardo Carvalho

Keyword(s):

Y Chromosome ◽

Strong Evidence ◽

Male Fertility ◽

Gene Loss ◽

Segmental Duplications ◽

Duplicated Genes ◽

New Genes ◽

Y Chromosomes ◽

Autosomal Copy

Y chromosomes play important roles in sex determination and male fertility. In several groups (e.g., mammals) there is strong evidence that they evolved through gene loss from a common X-Y ancestor, but in Drosophila the acquisition of new genes plays a major role. This conclusion came mostly from studies in two species. Here we report the identification of the 22 Y-linked genes in D. willistoni. They all fit the previously observed pattern of autosomal or X-linked testis-specific genes that duplicated to the Y. The ratio of gene gains to gene losses is ~25 in D. willistoni, confirming the prominent role of gene gains in the evolution of Drosophila Y chromosomes. We also found four large segmental duplications (ranging from 62 kb to 303 kb) from autosomal regions to the Y, containing ~58 genes. All but four of these duplicated genes became pseudogenes in the Y or disappeared. In the GK20609 gene the Y-linked copy remained functional, whereas its original autosomal copy degenerated, demonstrating how autosomal genes are transferred to the Y chromosome. Since the segmental duplication that carried GK20609 contained six other testis-specific genes, it seems that chance plays a significant role in the acquisition of new genes by the Drosophila Y chromosome.

Comprehensive In Silico Characterization and Expression Profiling of TCP Gene Family in Rapeseed

Frontiers in Genetics ◽

10.3389/fgene.2021.794297 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yunfei Wen ◽

Ali Raza ◽

Wen Chu ◽

Xiling Zou ◽

Hongtao Cheng ◽

...

Keyword(s):

Abiotic Stress ◽

Expression Analysis ◽

Stress Responses ◽

Enrichment Analysis ◽

Class Ii ◽

Class I ◽

Segmental Duplications ◽

Evolutionary Analysis ◽

Specific Expression ◽

A Genome

TCP proteins are plant-specific transcription factors with diverse roles in plant development and stress responses. We therefore performed a genome-wide analysis to categorize the TCP genes in the rapeseed genome. In this study, a total of 80 BnTCP genes were identified in the rapeseed genome and grouped into two main classes (PCF and CYC/TB1) according to phylogenetic analysis. Evolutionary analysis indicated that BnTCP genes had experienced segmental duplications and positive selection pressure. Gene-structure and conserved-motif analyses showed that Class I and Class II differ in intron-exon patterns and motif numbers. Overall, nine conserved motifs were identified, with two to seven present per TCP gene, and some were gene-specific. Class II (PCF and CYC/TB1), in particular, was structurally diverse compared with Class I. We identified four hormone-related and four stress-related responsive cis-elements in the promoter regions. Moreover, 32 bna-miRNAs from 14 families were found to target 21 BnTCP genes. Gene ontology enrichment analysis showed that the BnTCP genes were primarily related to RNA/DNA binding, metabolic processes, transcriptional regulatory activities, etc. Transcriptome-based tissue-specific expression analysis showed that only a few genes (mainly BnTCP9, BnTCP22, BnTCP25, BnTCP48, BnTCP52, BnTCP60, BnTCP66, and BnTCP74) had higher expression across root, stem, leaf, flower, seed, and silique tissues. Likewise, qRT-PCR-based expression analysis showed that BnTCP36, BnTCP39, BnTCP53, BnTCP59, and BnTCP60 had higher expression at certain time points under various hormone and abiotic stress conditions, but not under drought or MeJA treatment. Our results lay the groundwork for future understanding of the intricate mechanisms of BnTCP genes in various developmental processes and abiotic stress signaling pathways in rapeseed.

Quantitative assessment reveals the dominance of duplicated sequences in germline-derived extrachromosomal circular DNA

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2102842118 ◽

2021 ◽

Vol 118 (47) ◽

pp. e2102842118

Author(s):

Lila Mouakkad-Montoya ◽

Michael M. Murata ◽

Arvis Sulovari ◽

Ryusuke Suzuki ◽

Beth Osia ◽

...

Keyword(s):

Total Population ◽

Nuclear Dna ◽

Single Copy ◽

Quantitative Information ◽

Copy Number Variations ◽

Segmental Duplications ◽

Chromosomal Dna ◽

Circular Dna ◽

Genomic Regions

Extrachromosomal circular DNA (eccDNA) originates from linear chromosomal DNA in various human tissues under physiological and disease conditions. The genomic origins of eccDNA have largely been investigated using in vitro–amplified DNA. However, in vitro amplification obscures quantitative information by skewing the total population stoichiometry. In addition, the analyses have focused on eccDNA stemming from single-copy genomic regions, leaving eccDNA from multicopy regions unexamined. To address these issues, we isolated eccDNA without in vitro amplification (naïve small circular DNA, nscDNA) and assessed the populations quantitatively by integrated genomic, molecular, and cytogenetic approaches. nscDNA of up to tens of kilobases were successfully enriched by our approach and were predominantly derived from multicopy genomic regions including segmental duplications (SDs). SDs, which account for 5% of the human genome and are hotspots for copy number variations, were significantly overrepresented in sperm nscDNA, with three times more sequencing reads derived from SDs than from the entire single-copy regions. SDs were also overrepresented in mouse sperm nscDNA, which we estimated to comprise 0.2% of nuclear DNA. Considering that eccDNA can be integrated into chromosomes, germline-derived nscDNA may be a mediator of genome diversity.

Genome-wide identification of RING finger genes in flax (Linum usitatissimum) and analyses of their evolution

PeerJ ◽

10.7717/peerj.12491 ◽

2021 ◽

Vol 9 ◽

pp. e12491

Author(s):

Xianwen Meng ◽

Jing Liu ◽

Mingde Zhao

Keyword(s):

Gene Family ◽

Stress Responses ◽

Linum Usitatissimum ◽

Ring Finger ◽

Synonymous Substitution ◽

Segmental Duplications ◽

Biotic And Abiotic Stress ◽

Gene Pairs ◽

Genome Wide ◽

Substitution Ratio

Background: Flax (Linum usitatissimum) is an important crop for its seed oil and stem fiber. Really Interesting New Gene (RING) finger genes play essential roles in growth, development, and biotic and abiotic stress responses in plants. However, little is known about these genes in flax.
Methods: Here, we performed a systematic genome-wide analysis to identify RING finger genes in flax.
Results: We identified 587 RING domains in 574 proteins and classified them into RING-H2 (292), RING-HCa (181), RING-HCb (23), RING-v (53), RING-C2 (31), RING-D (2), RING-S/T (3), and RING-G (2). These proteins were further divided into 45 groups according to domain organization. These genes were located on 15 chromosomes and clustered into three clades according to their phylogenetic relationships. A total of 312 segmentally duplicated gene pairs were inferred from 411 RING finger genes, indicating a major contribution of segmental duplications to the RING finger gene family expansion. The non-synonymous/synonymous substitution ratio of the segmentally duplicated gene pairs was less than 1, suggesting that the gene family has been under negative selection since duplication. Further, most RING genes in flax were differentially expressed during seed development or in the shoot apex. This study provides useful information for further functional analysis of RING finger genes in flax and for developing gene-derived molecular markers in flax breeding.
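The inference drawn in this abstract from the non-synonymous/synonymous substitution ratio (often written Ka/Ks) rests on a conventional reading: a ratio below 1 suggests negative (purifying) selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. The sketch below only encodes that rule of thumb. The function name `selection_mode`, the tolerance parameter, and the example rates are assumptions for illustration; they are not the authors' pipeline or values from the study.

```python
import math


def selection_mode(ka, ks, rel_tol=1e-9):
    """Classify a gene pair by its Ka/Ks ratio (illustrative rule of thumb).

    ka: non-synonymous substitution rate; ks: synonymous substitution rate.
    Returns "negative", "neutral", or "positive".
    """
    if ks <= 0:
        # The ratio is undefined without synonymous substitutions.
        raise ValueError("ks must be positive to form a ratio")
    ratio = ka / ks
    if math.isclose(ratio, 1.0, rel_tol=rel_tol):
        return "neutral"
    return "negative" if ratio < 1.0 else "positive"


# Hypothetical rates for one duplicated gene pair, not data from the study.
print(selection_mode(ka=0.02, ks=0.10))
```

In practice Ka and Ks are estimated from codon alignments of each gene pair with dedicated software; this sketch covers only the final interpretation step.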

Multi-modal investigation of the schizophrenia-associated 3q29 genomic interval reveals global genetic diversity with unique haplotypes and segments that increase the risk for non-allelic homologous recombination

10.1101/2021.11.10.21266197 ◽

2021 ◽

Author(s):

Feyza Yilmaz ◽

Umamaheswaran Gurusamy ◽

Trenell Mosley ◽

Yulia Mostovoy ◽

Tamim H. Shaikh ◽

...

Keyword(s):

Homologous Recombination ◽

Copy Number ◽

Chromosomal Rearrangements ◽

Deletion Syndrome ◽

Segmental Duplications ◽

Structural Variations ◽

Genomic Interval ◽

Genomic Disorders ◽

Recurrent Deletion ◽

Duplication Syndrome

Chromosomal rearrangements that alter the copy number of dosage-sensitive genes can result in genomic disorders, such as the 3q29 deletion syndrome. At the 3q29 region, non-allelic homologous recombination (NAHR) between paralogous copies of segmental duplications (SDs) leads to a recurrent ~1.6 Mbp deletion or duplication, causing neurodevelopmental and psychiatric phenotypes. However, risk factors contributing to NAHR at this locus are not well understood. In this study, we used an optical mapping approach to identify structural variations within the 3q29 interval. We identified 18 novel haplotypes among 161 unaffected individuals and used this information to characterize this region in 18 probands with either the 3q29 deletion or 3q29 duplication syndrome. A significant amount of variation in haplotype prevalence was observed between populations. Within probands, we narrowed down the breakpoints to a ~5 kbp segment within the SD blocks in 89% of the 3q29 deletion and duplication cases studied. Furthermore, all 3q29 deletion and duplication cases could be categorized into one of five distinct classes based on their breakpoints. Contrary to previous findings for other recurrent deletion and duplication loci, there was no evidence for inversions in either parent of the probands mediating the deletion or duplication seen in this syndrome.
