repeat structure
Recently Published Documents


TOTAL DOCUMENTS

126
(FIVE YEARS 24)

H-INDEX

29
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Dan Levy ◽  
Zihua Wang ◽  
Andrea Moffitt ◽  
Michael H. Wigler

Replication of tandem repeats of simple sequence motifs, also known as microsatellites, is error prone and variable lengths frequently occur during population expansions. Therefore, microsatellite length variations could serve as markers for cancer. However, accurate error-free quantitation of microsatellite lengths is difficult with current methods because of a high error rate during amplification and sequencing. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure so that it can replicate faithfully, yet not so much that the flanking regions cannot be reliably identified. In this work we use bisulfite mutagenesis to convert a C to a U, later read as T. Compared to untreated templates, we achieve three orders of magnitude reduction in the error rate per round of replication. By requiring two independent first copies of an initial template, we reach error rates below one in a million. We discuss potential clinical applications of this method.
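As a rough illustration of the partial-mutagenesis idea in this abstract, the sketch below converts each C in a toy microsatellite to T with some probability, so the perfect repeat is interrupted while most bases are preserved. The function name `partial_bisulfite`, the 50% conversion rate, and the assumption that each C converts independently are illustrative choices, not details taken from the paper.

```python
import random


def partial_bisulfite(template, conversion_rate, rng):
    """Convert each C to T independently with probability conversion_rate.

    Hypothetical helper: models the C -> U (read as T) change only,
    with no strand, methylation, or sequencing-error effects.
    """
    converted = []
    for base in template:
        if base == "C" and rng.random() < conversion_rate:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Toy (CA)n microsatellite; a seeded generator keeps the run reproducible.
rng = random.Random(0)
microsatellite = "CA" * 10
mutated = partial_bisulfite(microsatellite, 0.5, rng)
print(microsatellite)
print(mutated)
```

With a rate strictly between 0 and 1, some CA units typically become TA, so the template is no longer a perfect tandem repeat; a rate of 1 would simply turn it into a different perfect repeat, which is why the conversion has to be partial.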


2021 ◽  
Vol 11 (23) ◽  
pp. 11123
Author(s):  
Siqi Cheng ◽  
Ruonan Li ◽  
Lili Lin ◽  
Haojie Shi ◽  
Xunyan Liu ◽  
...  

Polygalacturonase-inhibiting protein (PGIP) is an important plant biochemical anti-disease factor. PGIP has a leucine-rich repeat structure that can selectively bind and inhibit the activity of endo-polygalacturonase (endo-PG) in fungi, playing a key role in plant disease resistance. The regulation of PGIP in plant disease resistance has been well studied, and the effect of PGIP to increase disease resistance is clear. This review summarizes recent advances in understanding the PGIP protein structure, the PGIP mechanism of plant disease resistance, and anti-disease activity by PGIP gene transfer. This overview should contribute to a better understanding of PGIP function and can help guide resistance breeding of PGIP for anti-disease effects.


2021 ◽  
Vol 55 (1) ◽  
pp. 583-602
Author(s):  
Karen H. Miga ◽  
Ivan A. Alexandrov

We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.


2021 ◽  
Vol 17 (11) ◽  
pp. e1009449
Author(s):  
Shahab Sarmashghi ◽  
Metin Balaban ◽  
Eleonora Rachtman ◽  
Behrouz Touri ◽  
Siavash Mirarab ◽  
...  

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome-skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show using a mix of theoretical and empirical analysis that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, and that has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had < 1.5% error in length estimation compared to 34% error previously achieved. In shotgun sequenced read samples with contaminants, RESPECT length estimates had median error 4%, in contrast to other methods that had median error 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git
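To make the term "k-mer repeat spectrum" concrete, the sketch below computes it directly from a fully known toy sequence: for each multiplicity j, it counts how many distinct k-mers occur exactly j times. This is only the definition of the quantity, not the RESPECT estimator, which has to infer the spectrum from low-coverage reads; the function name and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def kmer_repeat_spectrum(sequence, k):
    """Map each multiplicity j to the number of distinct k-mers seen j times.

    Hypothetical helper: works on one strand of a known sequence and
    ignores reverse complements and ambiguous bases.
    """
    seq = sequence.upper()
    # Count every k-mer along the sequence (len(seq) - k + 1 windows).
    kmer_counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Histogram of those counts: multiplicity -> number of distinct k-mers.
    spectrum = Counter(kmer_counts.values())
    return dict(sorted(spectrum.items()))


toy_genome = "ACGTACGTTT"
spectrum = kmer_repeat_spectrum(toy_genome, 3)
print(spectrum)  # {1: 4, 2: 2}
```

Here ACG and CGT each occur twice and four other 3-mers occur once, so the multiplicity-weighted sum (4 × 1 + 2 × 2 = 8) equals the number of 3-mer windows in the sequence.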


2021 ◽  
Vol 13 (10) ◽  
Author(s):  
Joseph L M Charboneau ◽  
Richard C Cronn ◽  
Aaron Liston ◽  
Martin F Wojciechowski ◽  
Michael J Sanderson

The plastid genomes of photosynthetic green plants have largely maintained conserved gene content and order as well as structure over hundreds of millions of years of evolution. Several plant lineages, however, have departed from this conservation and contain many plastome structural rearrangements, which have been associated with an abundance of repeated sequences both overall and near rearrangement endpoints. We sequenced the plastomes of 25 taxa of Astragalus L. (Fabaceae), a large genus in the inverted repeat-lacking clade of legumes, to gain a greater understanding of the connection between repeats and plastome inversions. We found plastome repeat structure has a strong phylogenetic signal among these closely related taxa mostly in the New World clade of Astragalus called Neo-Astragalus. Taxa without inversions also do not differ substantially in their overall repeat structure from four taxa each with one large-scale inversion. For two taxa with inversion endpoints between the same pairs of genes, differences in their exact endpoints indicate the inversions occurred independently. Our proposed mechanism for inversion formation suggests the short inverted repeats now found near the endpoints of the four inversions may be there as a result of these inversions rather than their cause. The longer inverted repeats now near endpoints may have allowed the inversions first mediated by shorter microhomologous sequences to propagate, something that should be considered in explaining how any plastome rearrangement becomes fixed regardless of the mechanism of initial formation.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tsung-Yu Lu ◽  
Katherine M. Munson ◽  
Alexandra P. Lewis ◽  
Qihui Zhu ◽  
Luke J. Tallon ◽  
...  

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and are associated with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.


2021 ◽  
Author(s):  
Giuliana Giannuzzi ◽  
Glennis A. Logsdon ◽  
Nicolas Chatron ◽  
Danny E. Miller ◽  
Julie Reversat ◽  
...  

Human centromeres are composed of alpha satellite DNA hierarchically organized as higher-order repeats and epigenetically specified by CENP-A binding. Current evolutionary models assert that new centromeres are first epigenetically established and subsequently acquire an alphoid array. We identified during routine prenatal aneuploidy diagnosis by FISH a de novo insertion of alpha satellite DNA array (~50-300 kbp) from the centromere of chromosome 18 (D18Z1) into chromosome 15q26 euchromatin. Although bound by CENP-B, this locus did not acquire centromeric functionality as demonstrated by lack of constriction and absence of CENP-A binding. We characterized the rearrangement by FISH and sequencing using Illumina, PacBio, and Nanopore adaptive sampling which revealed that the insertion was associated with a 2.8 kbp deletion and likely occurred in the paternal germline. Notably, the site was located ~10 Mbp distal from the location where a centromere was ancestrally seeded and then became inactive sometime between 20 and 25 million years ago (Mya), in the common ancestor of humans and apes. Long reads spanning either junction showed that the organization of the alphoid insertion followed the 12-mer higher-order repeat structure of the D18Z1 array. Mapping to the CHM13 human genome assembly revealed that the satellite segment transposed from a specific location of chromosome 18 centromere. The rearrangement did not directly disrupt any gene or predicted regulatory element and did not alter the epigenetic status of the surrounding region, consistent with the absence of phenotypic consequences in the carrier. This case demonstrates a likely rare but new class of structural variation that we name ‘alpha satellite insertion’. It also expands our knowledge about the evolutionary life cycle of centromeres, conveying the possibility that alphoid arrays can relocate near vestigial centromeric sites.


Author(s):  
William Rice

Centromeres are among the fastest evolving genomic regions in a diverse array of organisms. The evolutionary process driving this rapid evolution has not been unambiguously established. Here I integrate diverse information to motivate a model in which centromeres evolve rapidly because of their intrinsic molecular phenotype: they tightly bind centromeric proteins throughout the cell cycle. DNA-bound proteins have been shown to cause stalling and collapse of DNA replication forks in many genomic regions, including centromeres. Collapsed replication forks generate one-sided double strand breaks (DSBs) that are repaired by the Break-Induced Repair (BIR) pathway. Here I show why this repair is expected to generate tandem repeat structure and three key features at centromeres: i) increased nucleotide substitution mutation rates, ii) out-of-register re-initiation of replication that leads to indels spanning one or more repeat units, and iii) elevated rates of large and small transpositions within centromeres and between genomic regions. These phenotypes lead to: i) a rapid rate of nucleotide substitutions within a clade of centromeric sequences, ii) continual turnover of monomers within centromeres that fosters molecular-drift and molecular-drive, and iii) recurrent quantum leaps in centromere sequence due to the formation of mosaic monomers and new sequences transposed into non-homologous centromeres. These features are plausibly the major reason centromeres evolve so rapidly. I also speculate on how the DNA sequence of centromeres might perpetually coevolve with the protein sequence of histone CENH3, the major epigenetic mark of centromeres.


Toxins ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 579
Author(s):  
Nur Shidaa Mohd Ali ◽  
Abu Bakar Salleh ◽  
Thean Chor Leow ◽  
Raja Noor Zaliha Raja Abd Rahman ◽  
Mohd Shukuri Mohamad Ali

Calcium-binding plays a decisive role in the folding and stabilization of many RTX proteins, especially for the RTX domain. Although many studies have been conducted to prove the contribution of Ca2+ ion toward the folding and stabilization of RTX proteins, its functional dynamics and conformational structural changes remain elusive. Here, molecular docking and molecular dynamics (MD) simulations were performed to analyze the contribution of Ca2+ ion toward the folding and stabilization of the RTX lipase (AMS8 lipase) structure. AMS8 lipase contains six Ca2+ ions (Ca1–Ca6). Three Ca2+ ions (Ca3, Ca4, and Ca5) were bound to the RTX parallel β-roll motif repeat structure (RTX domain). The metal ion (Ca2+) docking analysis gives a high binding energy, especially for Ca4 and Ca5 which are tightly bound to the RTX domain. The function of each Ca2+ ion is further analyzed using the MD simulation. The removal of Ca3, Ca4, and Ca5 caused the AMS8 lipase structure to become unstable and unfolded. The results suggested that Ca3, Ca4, and Ca5 stabilized the RTX domain. In conclusion, Ca3, Ca4, and Ca5 play a crucial role in the folding and stabilization of the RTX domain, which sustain the integrity of the overall AMS8 lipase structure.

