genomic interval
Recently Published Documents


TOTAL DOCUMENTS

38
(FIVE YEARS 16)

H-INDEX

8
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Feyza Yilmaz ◽  
Umamaheswaran Gurusamy ◽  
Trenell Mosley ◽  
Yulia Mostovoy ◽  
Tamim H. Shaikh ◽  
...  

Chromosomal rearrangements that alter the copy number of dosage-sensitive genes can result in genomic disorders, such as the 3q29 deletion syndrome. At the 3q29 region, non-allelic homologous recombination (NAHR) between paralogous copies of segmental duplications (SDs) leads to a recurrent ~1.6 Mbp deletion or duplication, causing neurodevelopmental and psychiatric phenotypes. However, risk factors contributing to NAHR at this locus are not well understood. In this study, we used an optical mapping approach to identify structural variations within the 3q29 interval. We identified 18 novel haplotypes among 161 unaffected individuals and used this information to characterize this region in 18 probands with either the 3q29 deletion or 3q29 duplication syndrome. A significant amount of variation in haplotype prevalence was observed between populations. Within probands, we narrowed down the breakpoints to a ~5 kbp segment within the SD blocks in 89% of the 3q29 deletion and duplication cases studied. Furthermore, all 3q29 deletion and duplication cases could be categorized into one of five distinct classes based on their breakpoints. Contrary to previous findings for other recurrent deletion and duplication loci, there was no evidence for inversions in either parent of the probands mediating the deletion or duplication seen in this syndrome.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Aaron Gu ◽  
Hyun Jae Cho ◽  
Nathan C. Sheffield

Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics, such as that the Jaccard score is most sensitive to added or dropped regions, while coverage score is most sensitive to shifted regions.
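The three perturbations named in this abstract (shifting, adding, dropping) can be illustrated with a small sketch. This is not Bedshift's API or implementation: the function name `perturb_regions`, its rate parameters, the fixed `chrom_size`, and the 50 to 500 bp length range for added regions are all assumptions chosen for illustration.

```python
import random

def perturb_regions(regions, shift_rate=0.2, drop_rate=0.1, add_rate=0.1,
                    max_shift=100, chrom_size=10_000, seed=0):
    """Return a perturbed, sorted copy of (chrom, start, end) regions.

    Hypothetical helper: regions are assumed to lie within [0, chrom_size).
    """
    rng = random.Random(seed)
    out = []
    for chrom, start, end in regions:
        if rng.random() < drop_rate:
            continue  # drop this region entirely
        if rng.random() < shift_rate:
            delta = rng.randint(-max_shift, max_shift)
            # clamp the shift so the region stays inside the chromosome
            delta = max(delta, -start)
            delta = min(delta, chrom_size - end)
            start, end = start + delta, end + delta
        out.append((chrom, start, end))
    # add new random regions, in proportion to the size of the input
    n_add = round(add_rate * len(regions))
    for _ in range(n_add):
        chrom = rng.choice(regions)[0]
        length = rng.randint(50, 500)  # assumed length range
        start = rng.randint(0, chrom_size - length)
        out.append((chrom, start, start + length))
    return sorted(out)

reference = [("chr1", 100, 200), ("chr1", 500, 800), ("chr1", 2000, 2300)]
perturbed = perturb_regions(reference, shift_rate=0.5, seed=1)
```

Because the random generator is seeded, the same arguments always yield the same perturbed set, which is what makes such files usable as a benchmark with a known relationship to the reference.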


F1000Research ◽  
2021 ◽  
Vol 9 ◽  
pp. 581
Author(s):  
Selena C. Feng ◽  
Nathan C. Sheffield ◽  
Jianglin Feng

Searching genomic interval sets produced by sequencing methods has been widely and routinely performed; however, existing metrics for quantifying similarities among interval sets are inconsistent. Here we introduce Seqpare, a self-consistent and effective metric of similarity and tool for comparing sequences based on their interval sets. With this metric, the similarity of two interval sets is quantified by a single index, the ratio of their effective overlap over the union: an index of zero indicates unrelated interval sets, and an index of one means that the interval sets are identical. Analysis and tests confirm the effectiveness and self-consistency of the Seqpare metric.
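To make the "overlap over union" idea concrete, here is a minimal sketch of a plain base-pair Jaccard index for two interval sets on one chromosome. It is a stand-in only: the abstract does not define Seqpare's "effective overlap", so this code does not reproduce the Seqpare metric. The function names and the half-open `(start, end)` convention are assumptions.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def total_length(intervals):
    """Total number of bases covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)

def overlap_over_union(a, b):
    """Base-pair Jaccard index of two interval sets (0.0 if both are empty)."""
    a, b = merge(a), merge(b)
    i = j = inter = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            inter += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0

identical = overlap_over_union([(0, 10)], [(0, 10)])   # 1.0
disjoint = overlap_over_union([(0, 10)], [(20, 30)])   # 0.0
```

As in the abstract's description, an index of zero corresponds to sets that share no bases and an index of one to identical sets.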


Author(s):  
Jianglin Feng ◽  
Nathan C Sheffield

Summary: Databases of large-scale genome projects now contain thousands of genomic interval datasets. These data are a critical resource for understanding the function of DNA. However, our ability to examine and integrate interval data of this scale is limited. Here, we introduce the integrated genome database (IGD), a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory. IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions. Availability: https://github.com/databio/IGD
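The general idea behind binning for interval search can be sketched as follows. This is a generic fixed-width binning index written for illustration, not IGD's actual data layout or algorithm, which the abstract does not describe; the class name `BinnedIndex` and the default bin size are assumptions.

```python
from collections import defaultdict

class BinnedIndex:
    """Toy interval index: each interval is stored in every bin it touches."""

    def __init__(self, bin_size=16_384):
        self.bin_size = bin_size
        self.bins = defaultdict(list)  # (chrom, bin number) -> intervals

    def _bin_range(self, start, end):
        # bins touched by the half-open interval [start, end), with end > start
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, chrom, start, end, label):
        for b in self._bin_range(start, end):
            self.bins[(chrom, b)].append((start, end, label))

    def query(self, chrom, start, end):
        """Return all stored intervals overlapping [start, end)."""
        hits = set()  # a set removes intervals seen in more than one bin
        for b in self._bin_range(start, end):
            for s, e, label in self.bins.get((chrom, b), ()):
                if s < end and start < e:
                    hits.add((s, e, label))
        return sorted(hits)

index = BinnedIndex(bin_size=100)
index.add("chr1", 50, 250, "a")
index.add("chr1", 300, 350, "b")
hits = index.query("chr1", 240, 310)
```

A query only inspects the bins its own range touches, so the cost depends on the local interval density rather than on the total number of stored regions.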


2020 ◽  
Author(s):  
Aaron Gu ◽  
Hyun Jae Cho ◽  
Nathan C. Sheffield

Functional genomics experiments such as ChIP-Seq or ATAC-Seq produce data summarized as a region set. Many tools have been developed to analyze region sets, including computing similarity metrics to compare them. However, there is no way to objectively evaluate the effectiveness of region set similarity metrics. In this paper we present bedshift, a command-line tool and Python API to generate new BED files by making random perturbations to an original BED file. Perturbed files have known similarity to the original file and are therefore useful to benchmark similarity metrics. To demonstrate, we used bedshift to create an evaluation dataset of 3,600 perturbed files generated by shifting, adding, and dropping regions from a reference BED file. Then, we compared four similarity metrics: Jaccard score, coverage score, Euclidean distance, and cosine similarity. The results show that the Jaccard score is most sensitive to added and dropped regions, while the coverage score is more sensitive to shifted regions. Availability: BSD2-licensed source code and documentation can be found at https://bedshift.databio.org.
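Two of the four metrics listed here, Euclidean distance and cosine similarity, operate on vectors, so a region set first has to be encoded as one. The sketch below assumes one possible encoding, a binary vector over fixed-width bins; the abstract does not say how the authors vectorized their region sets, so `to_bin_vector`, the bin size, and the single-chromosome `(start, end)` input are all hypothetical.

```python
import math

def to_bin_vector(regions, genome_length, bin_size):
    """Mark each fixed-width bin 1 if any (start, end) region overlaps it."""
    n_bins = math.ceil(genome_length / bin_size)
    vec = [0] * n_bins
    for start, end in regions:
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(start // bin_size, last + 1):
            vec[b] = 1
    return vec

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

original = to_bin_vector([(0, 100), (300, 400)], genome_length=500, bin_size=100)
shifted = to_bin_vector([(100, 200), (300, 400)], genome_length=500, bin_size=100)
similarity = cosine_similarity(original, shifted)
distance = euclidean_distance(original, shifted)
```

With this encoding, how strongly a shift registers depends on the bin size: a shift that stays inside one bin leaves the vector unchanged.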


2020 ◽  
Vol 117 (42) ◽  
pp. 26307-26317
Author(s):  
Asier Ullate-Agote ◽  
Ingrid Burgelin ◽  
Adrien Debry ◽  
Carine Langrez ◽  
Florent Montange ◽  
...  

Reptiles exhibit a spectacular diversity of skin colors and patterns brought about by the interactions among three chromatophore types: black melanophores with melanin-packed melanosomes, red and yellow xanthophores with pteridine- and/or carotenoid-containing vesicles, and iridophores filled with light-reflecting platelets generating structural colors. Whereas the melanosome, the only color-producing endosome in mammals and birds, has been documented as a lysosome-related organelle, the maturation paths of xanthosomes and iridosomes are unknown. Here, we first use 10x Genomics linked-reads and optical mapping to assemble and annotate a nearly chromosome-quality genome of the corn snake Pantherophis guttatus. The assembly is 1.71 Gb long, with an N50 of 16.8 Mb and L50 of 24. Second, we perform mapping-by-sequencing analyses and identify a 3.9-Mb genomic interval where the lavender variant resides. The lavender color morph in corn snakes is characterized by gray, rather than red, blotches on a pink, instead of orange, background. Third, our sequencing analyses reveal a single nucleotide polymorphism introducing a premature stop codon in the lysosomal trafficking regulator gene (LYST) that shortens the corresponding protein by 603 amino acids and removes evolutionarily conserved domains. Fourth, we use light and transmission electron microscopy comparative analyses of wild type versus lavender corn snakes and show that the color-producing endosomes of all chromatophores are substantially affected in the LYST mutant. Our work provides evidence characterizing xanthosomes in xanthophores and iridosomes in iridophores as lysosome-related organelles.
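For readers unfamiliar with the assembly statistics quoted here: N50 is the length of the shortest sequence among the longest sequences that together cover at least half of the assembly, and L50 is how many sequences that takes. A minimal sketch of the standard calculation follows; the function name and the toy lengths are illustrative and are not data from the paper.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # the first sequence that pushes the running total to half the
        # assembly defines N50; the number used so far is L50
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")

# toy example: total length 290, half is 145, reached after 80 + 70
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])  # (70, 2)
```

Read this way, the reported figures mean that the 24 longest sequences of the 1.71 Gb assembly cover at least half of it, and the shortest of those 24 is 16.8 Mb.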


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 581 ◽  
Author(s):  
Selena C. Feng ◽  
Nathan C. Sheffield ◽  
Jianglin Feng

Searching genomic interval sets produced by sequencing methods has been widely and routinely performed; however, existing metrics for quantifying similarities among interval sets are inconsistent. Here we introduce Seqpare, a self-consistent and effective metric of similarity and tool for comparing sequences based on their interval sets. With this metric, the similarity of two interval sets is quantified by a single index, the ratio of their effective overlap over the union: an index of zero indicates unrelated interval sets, and an index of one means that the interval sets are identical. Analysis and tests confirm the effectiveness and self-consistency of the Seqpare metric.


2020 ◽  
Vol 10 (7) ◽  
pp. 2457-2464
Author(s):  
Mingmin Zheng ◽  
Tian Yang ◽  
Xiaowei Liu ◽  
Guihua Lü ◽  
Peng Zhang ◽  
...  

C-type cytoplasmic male sterility (CMS-C), one of the three major CMS types in maize, has a promising application prospect in hybrid seed production. However, the complex genetic mechanism underlying the fertility restoration of CMS-C remains poorly understood. The maize inbred line A619 is one of the rare strong restorer lines carrying the restorer gene Rf4, but different fertility segregation ratios are found in several F2 populations derived from crosses between isocytoplasmic allonucleus CMS-C lines and A619. In the present study, the segregation ratios of fertile to sterile plants in the (CHuangzaosi × A619) F2 and BC1F1 populations (36.77:1 and 2.36:1, respectively) did not follow a typical monogenic model of inheritance, which suggested that some F2 and BC1F1 plants displayed restored fertility even without Rf4. To determine the hidden locus affecting fertility restoration, next-generation sequencing-based QTL-seq was performed with two specific extreme bulks consisting of 30 fertile and 30 sterile rf4rf4 individuals from the F2 population. A major QTL related to fertility restoration, designated qRf8-1, was detected on the long arm of chromosome 8 in A619. Subsequently, qRf8-1 was further validated and narrowed down to a 17.93-Mb genomic interval by insertion and deletion (InDel) and simple sequence repeat (SSR) marker-based traditional QTL mapping, explaining 12.59% (LOD = 25.06) of the phenotypic variation. Thus, using genetic analyses and molecular markers, we revealed another fertility restoration system acting in parallel with Rf4 in A619 that could rescue the male sterility of CHuangzaosi. This study not only expands the original fertility restoration system but also provides valuable insights into the complex genetic mechanisms underlying the fertility restoration of CMS-C.
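The claim that these segregation ratios depart from a monogenic model is the kind of statement usually checked with a chi-square goodness-of-fit test. The sketch below assumes a single dominant restorer gene, which would predict 3:1 fertile to sterile in an F2 and 1:1 in a BC1F1. The abstract reports only ratios, so the plant counts used here are hypothetical values chosen to give roughly 36.8:1, not the study's data.

```python
def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    stat = 0.0
    for obs, part in zip(observed, expected_ratio):
        expected = total * part / ratio_sum
        stat += (obs - expected) ** 2 / expected
    return stat

# critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05
CRITICAL_DF1_ALPHA_05 = 3.841

# hypothetical F2 counts (fertile, sterile), NOT taken from the paper
f2_counts = [368, 10]
stat = chi_square_gof(f2_counts, expected_ratio=[3, 1])
rejects_monogenic = stat > CRITICAL_DF1_ALPHA_05
```

With two phenotype classes there is one degree of freedom, so a statistic above 3.841 rejects the tested ratio at the 5% level.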


Author(s):  
Selena C. Feng ◽  
Nathan C. Sheffield ◽  
Jianglin Feng

Summary: Searching genomic interval sets produced by sequencing methods has been widely and routinely performed; however, existing metrics for quantifying similarities among interval sets are inconsistent. Here we introduce Seqpare, a self-consistent and effective metric of similarity and tool for comparing sequences based on their interval sets. With this metric, the similarity of two interval sets is quantified by a single index, the ratio of their effective overlap over the union: an index of zero indicates unrelated interval sets, and an index of one means that the interval sets are identical. Analysis and tests confirm the effectiveness and self-consistency of the Seqpare metric. Availability: https://github.com/deepstanding/[email protected]

