No evidence of paralogous loci or new bona fide microRNAs in telomere to telomere (T2T) genomic data

2021 ◽  
Author(s):  
Arun H. Patil ◽  
Marc K. Halushka ◽  
Bastian K. Fromm

The telomere to telomere (T2T) genome project discovered and mapped ~240 million additional base pairs of primarily telomeric and centromeric reads. Much of this sequence consisted of satellite sequences and large segmental duplications. We evaluated the extent to which human bona fide microRNAs (miRNAs) may be found in additional paralogous genomic loci and whether previously undescribed microRNAs are present in these newly sequenced regions of the human genome. New genomic regions of the T2T project spanning ~240 million bp of sequence were obtained and evaluated by blastn for the human miRNAs contained in MirGeneDB2.0 (N = 556) and miRBase (N = 1917), along with MirGeneDB2.0 miRNAs from all species (N = 10,899). Additionally, bowtie was used to compare unmapped reads from >4,000 primary cell samples to the new T2T sequence. Based on sequence and structure, no bona fide miRNAs were identified. Ninety-seven miRNAs of questionable authenticity (frequently known repeat elements) were identified from the miRBase dataset across the newly described regions of the human genome. These 97 represent only 51 miRNA families due to paralogy of highly similar miRNAs such as 24 members of the hsa-mir-548 family. Altogether, these data strongly support our having identified widely expressed bona fide miRNAs in the human genome and move us further toward the completion of human miRNA discovery.
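The collapse of 97 hits into 51 families can be illustrated with a minimal sketch. It assumes that a family key can be approximated from a miRBase-style precursor name by keeping only the species prefix and the numeric part; this naming heuristic, the function names, and the example hit list are all hypothetical and are not the authors' method or data.

```python
import re
from collections import defaultdict

def family_of(name: str) -> str:
    """Approximate a family key from a miRBase-style name (heuristic assumption).

    'hsa-mir-548a-1' -> 'hsa-mir-548'. Names that do not follow the
    '<species>-mir-<number>' pattern (e.g. let-7) are returned unchanged.
    """
    match = re.match(r"^([a-z]{3}-mir-\d+)", name.lower())
    return match.group(1) if match else name.lower()

def group_by_family(names):
    """Group hit names by their approximate family key."""
    families = defaultdict(list)
    for name in names:
        families[family_of(name)].append(name)
    return dict(families)

# Illustrative names only; these are not the hits reported in the abstract.
hits = ["hsa-mir-548a-1", "hsa-mir-548b", "hsa-mir-548ab", "hsa-mir-1273c"]
families = group_by_family(hits)
print(len(hits), "hits in", len(families), "families")
```

A curated family annotation would be more reliable than name parsing; the sketch only shows why many paralogous hits can reduce to far fewer families.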

ESC CardioMed ◽  
2018 ◽  
pp. 669-671
Author(s):  
Eric Schulze-Bahr

The human genome consists of approximately 3 billion (3 × 10⁹) base pairs of DNA (around 20,000 genes), organized as 23 chromosomes (diploid parental set), and a small mitochondrial genome (37 genes, 13 of them protein-coding; 16,569 base pairs) of maternal origin. Most human genetic variation is natural, that is, common or rare (minor allele frequency >0.1%), and does not cause disease. Apart from every true disease-causing (bona fide) mutation, each individual genome harbours more than 3.5 million single nucleotide variants (including >10,000 non-synonymous changes causing amino acid substitutions) and 200–300 large structural or copy number variants (insertions/deletions, up to several thousand base pairs) that are non-disease-causing variations scattered throughout coding and non-coding genomic regions.


2018 ◽  
Vol 47 (4) ◽  
pp. e21-e21 ◽  
Author(s):  
Marius Gheorghe ◽  
Geir Kjetil Sandve ◽  
Aziz Khan ◽  
Jeanne Chèneby ◽  
Benoit Ballester ◽  
...  

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF–DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF–DNA interactions within the ChIP-seq peaks remains challenging. We developed dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF–DNA interactions. Our work culminated with predicted interactions covering >2% of the human genome, obtained by uniformly processing 1983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high quality ChIP-seq peaks. The set of predicted direct TF–DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. We provide this collection of direct TF–DNA interactions and cis-regulatory modules through the UniBind web-interface (http://unibind.uio.no).


Author(s):  
Jonathon Keats

Sequencing the 3.2 billion base pairs of the human genome took thirteen years and cost $3 billion. Yet even before the genome was released in 2003, scientists were beginning to question whether an index of our DNA, however exhaustive, would genetically encapsulate Homo sapiens. One of the most articulate skeptics was Joshua Lederberg, a Nobel Prize–winning biologist who pioneered the field of bacterial genetics in the 1950s, while Watson and Crick were discovering the double helix. Lederberg publicly expressed his reservations in the year 2000, amid rampant hype about the Human Genome Project, including a Clinton White House press conference. “Just as scientists study entire ecological systems to see how the various parts interact,” he wrote in a syndicated editorial, “we must regard the human body as an extended genome. Its parts consist of the nuclear DNA genome (karyome), a chondriome (mitochondria), and what I call the microbiome: the menagerie of the body’s attendant microbes. We must study the microbes that we carry within us and on our surfaces as part of a shared embodiment.” Lederberg coined the term microbiome to echo the older word genome and to establish a sort of equivalence. If anything, his claim on behalf of single-cell organisms was understated. The average body is host to an estimated 100 trillion microbes, ten times the number of human cells. These microorganisms span some twenty-two phyla, with more than six hundred species living in the mouth and 150 species on each palm. Taken together, that adds up to vast genetic diversity: human DNA contains approximately 23,000 genes, whereas the microbiome may contain as many as 23 million. Were these microbes mere fellow travelers, the numbers would make no difference. (Because of their minuscule size, microorganisms comprise less than 2 percent of our body mass.) What makes them significant from a human perspective, and makes their genetic legacy inseparable from ours, is that they perform essential functions in concert with human cells, facilitating digestion, for instance, and also aiding the immune system.


2001 ◽  
Vol 11 (6) ◽  
pp. 1005-1017 ◽  
Author(s):  
Jeffrey A. Bailey ◽  
Amy M. Yavor ◽  
Hillary F. Massa ◽  
Barbara J. Trask ◽  
Evan E. Eichler

Segmental duplications play fundamental roles in both genomic disease and gene evolution. To understand their organization within the human genome, we have developed the computational tools and methods necessary to detect identity between long stretches of genomic sequence despite the presence of high copy repeats and large insertion-deletions. Here we present our analysis of the most recent genome assembly (January 2001), in which we focus on the global organization of these segments and the role they play in the whole-genome assembly process. Initially, we considered only large recent duplication events that fell well below levels of draft sequencing error (alignments 90%–98% similar and ≥1 kb in length). Duplications (90%–98%; ≥1 kb) comprise 3.6% of all human sequence. These duplications show clustering and up to 10-fold enrichment within pericentromeric and subtelomeric regions. In terms of assembly, duplicated sequences were found to be over-represented in unordered and unassigned contigs, indicating that duplicated sequences are difficult to assign to their proper position. To assess coverage of these regions within the genome, we selected BACs containing interchromosomal duplications and characterized their duplication pattern by FISH. Only 47% (106/224) of chromosomes positive by FISH had a corresponding chromosomal position by BLAST comparison. We present data that indicate that this is attributable to misassembly, misassignment, and/or decreased sequencing coverage within duplicated regions. Surprisingly, if we consider putative duplications >98% identity, we identify 10.6% (286 Mb) of the current assembly as paralogous. The majority of these alignments, we believe, represent unmerged overlaps within unique regions. Taken together, the above data indicate that segmental duplications represent a significant impediment to accurate human genome assembly, requiring the development of specialized techniques to finish these exceptional regions of the genome. The identification and characterization of these highly duplicated regions represents an important step in the complete sequencing of a human reference genome.
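The selection criteria stated in the abstract (alignments 90%–98% similar and ≥1 kb in length) can be sketched as a simple filter. The `Alignment` record, its field names, and the example values below are hypothetical; the thresholds are the only elements taken from the text, and this is not the authors' detection pipeline.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """Hypothetical pairwise-alignment record; field names are assumptions."""
    name: str
    percent_identity: float
    length_bp: int

def is_recent_duplication(aln: Alignment,
                          min_identity: float = 90.0,
                          max_identity: float = 98.0,
                          min_length_bp: int = 1000) -> bool:
    """Apply the thresholds quoted in the abstract: 90%-98% identity, >=1 kb."""
    in_identity_window = min_identity <= aln.percent_identity <= max_identity
    return in_identity_window and aln.length_bp >= min_length_bp

# Made-up alignments, for illustration only.
alignments = [
    Alignment("a", 95.0, 1500),   # passes both thresholds
    Alignment("b", 99.2, 5000),   # too similar: above the 98% ceiling
    Alignment("c", 93.0, 800),    # too short: below 1 kb
]
kept = [aln.name for aln in alignments if is_recent_duplication(aln)]

# The FISH concordance quoted in the abstract, recomputed from its counts.
fish_concordance_percent = round(100 * 106 / 224)
```

The upper bound keeps the filter away from alignments above 98% identity, which the abstract treats separately and largely attributes to unmerged overlaps within unique regions.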


2018 ◽  
Vol 12 ◽  
pp. 117793221881610 ◽  
Author(s):  
Oluwadurotimi S Aworunse ◽  
Oluwatomiwa Adeniji ◽  
Olusola L Oyesola ◽  
Itunuoluwa Isewon ◽  
Jelili Oyelade ◽  
...  

Lately, the term “genomics” has become ubiquitous in many scientific articles. It is a rapidly growing aspect of the biomedical sciences that studies the genome. The human genome contains a torrent of information that gives clues about human origin, evolution, biological function, and diseases. In a bid to demystify the workings of the genome, the Human Genome Project (HGP) was initiated in 1990, with the chief goal of sequencing the approximately 3 billion nucleotide base pairs of human DNA. Since its completion in 2003, the HGP has opened new avenues for the application of genomics in clinical practice. This review gives an overview of some milestone discoveries that paved the way for the initiation of the HGP, remarkable revelations from the HGP, and how genomics is influencing a paradigm shift in routine clinical practice. It further highlights the challenges facing the implementation of genomic medicine, particularly in Africa. Possible solutions are also discussed.


2014 ◽  
Vol 42 (S1) ◽  
pp. 9-21 ◽  
Author(s):  
Gail H. Javitt ◽  
Katherine Strong Carner

Since the first draft of the human genome was published in 2001, DNA sequencing technology has advanced at a remarkable pace. Launched in 1990, the Human Genome Project sought to sequence all three billion base pairs of the haploid human genome, an endeavor that took more than a decade and cost nearly three billion dollars. The subsequent development of so-called “next generation” sequencing (NGS) methods has raised the possibility that real-time, affordable genome sequencing will soon be widely available. Currently, NGS methods can be used to sequence up to 60 billion base pairs per day. Whole-genome sequencing costs an estimated $5,000-10,000, with that number predicted to fall to $1,000 in the near future. In the past few years, the availability of high-throughput NGS methods has led to a proliferation of potential and actual clinical applications for NGS. NGS therefore has the potential to usher in the long-awaited era of personalized medicine.
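To put the quoted throughput in perspective, the following back-of-the-envelope sketch divides the stated 60 billion base pairs per day by the three-billion-base-pair haploid genome. The `coverage` parameter is an assumption added here for illustration (the abstract says nothing about sequencing depth), and the 30-fold value in the example is likewise only an illustrative choice.

```python
GENOME_BP = 3_000_000_000        # haploid genome size quoted in the text
NGS_BP_PER_DAY = 60_000_000_000  # daily NGS throughput quoted in the text

def genomes_per_day(coverage: float = 1.0) -> float:
    """Haploid genome equivalents sequenced per day at a given depth.

    `coverage` is the average number of times each base is read; it is an
    assumption of this sketch, not a figure from the abstract.
    """
    return NGS_BP_PER_DAY / (GENOME_BP * coverage)

single_pass = genomes_per_day()     # one read per base
deep = genomes_per_day(coverage=30) # illustrative 30-fold depth
print(f"{single_pass:.0f} genome equivalents/day at 1x, {deep:.2f} at 30x")
```

At single-pass depth the quoted throughput corresponds to 20 genome equivalents per day; any realistic depth requirement reduces that figure proportionally.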


2018 ◽  
Author(s):  
Marius Gheorghe ◽  
Geir Kjetil Sandve ◽  
Aziz Khan ◽  
Jeanne Chèneby ◽  
Benoit Ballester ◽  
...  

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated with predicted interactions covering >4% of the human genome, obtained by uniformly processing 1,983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. Finally, we provide this collection of direct TF-DNA interactions and cis-regulatory modules in the human genome through the UniBind web-interface (http://unibind.uio.no).

