Genetic Complexity of the Human Genome in Health and Disease: Basic Concepts

2020 ◽  
Vol 23 (2) ◽  
pp. 113-120
Author(s):  
A. Athanassiadou

Determination of the DNA sequence of the human genome, revealing extensive genetic variation, and the mapping of the genes and the various regulatory elements of genome function within the genomic DNA, have revolutionized the way we view the states of health and disease in our time. Genetic complexity of the genome is manifested on different levels. The first level refers to the expression of protein-coding genes, as regulated by their individual promoters in linear proximity. The next level of genetic complexity involves long-distance action by far-away enhancers, interacting with promoters through DNA looping. This 3-dimensional (3D) regulation is further developed by chromosome folding into the so-called transcription factories, for fully physiological expression. Chromosome folding, mediated by specific genetic elements (insulators), adds to the genetic complexity by facilitating movements of chromatin of specific genomic regions, the so-called topologically associated domains (TADs), in support of transcription and other cellular functions. Further genetic complexity has emerged with the finding that over 75% of the genome is transcribed and that, apart from the coding genes, a plethora of RNA transcripts are produced (the non-coding RNAs) that have important regulatory roles in gene expression. The great variation in genome sequence and the regulatory elements of the genome architecture are exploited in genome-wide association studies of disease, in the framework of Precision Medicine and, more generally, of Genomic Medicine.

Science ◽  
2021 ◽  
Vol 371 (6531) ◽  
pp. eabc6405 ◽  
Author(s):  
Rachel L. Cosby ◽  
Julius Judd ◽  
Ruiling Zhang ◽  
Alan Zhong ◽  
Nathaniel Garry ◽  
...  

Genes with novel cellular functions may evolve through exon shuffling, which can assemble novel protein architectures. Here, we show that DNA transposons provide a recurrent supply of materials to assemble protein-coding genes through exon shuffling. We find that transposase domains have been captured—primarily via alternative splicing—to form fusion proteins at least 94 times independently over the course of ~350 million years of tetrapod evolution. We find an excess of transposase DNA binding domains fused to host regulatory domains, especially the Krüppel-associated box (KRAB) domain, and identify four independently evolved KRAB-transposase fusion proteins repressing gene expression in a sequence-specific fashion. The bat-specific KRABINER fusion protein binds its cognate transposons genome-wide and controls a network of genes and cis-regulatory elements. These results illustrate how a transcription factor and its binding sites can emerge.


2021 ◽  
Author(s):  
Naoto Kubota ◽  
Mikita Suyama

Abstract Genome-wide association studies (GWAS) have identified thousands of variants in the human genome as disease risk markers, but the functional variants that actually affect gene regulation, and their genomic features, remain largely unknown. Here we performed a comprehensive survey of functional variants in the regulatory elements of the human genome. We integrated hematopoietic transcription factor (TF) footprint datasets generated by the ENCODE project with multiple quantitative trait locus (QTL) datasets (eQTL, caQTL, bQTL, and hQTL) and investigated the associations between functional variants and immune system disease risk. We identified candidate regulatory variants highly linked with GWAS lead variants and found that they were strongly enriched in active enhancers in hematopoietic cells, emphasizing the clinical relevance of enhancers in disease risk. Moreover, we found some strong relationships between traits and hematopoietic cell types or TFs. We highlighted some credible regulatory variants and found that one variant, rs2291668, which potentially functions in the molecular pathogenesis of multiple sclerosis, is located within a TF footprint present in a protein-coding exon of the TNFSF14 gene, indicating that protein-coding exons as well as noncoding regions can possess clinically relevant regulatory elements. Collectively, our results shed light on the molecular pathogenesis of immune system diseases. The methods described in this study can readily be applied to the study of the risk factors of other diseases.


2018 ◽  
Author(s):  
Alex Wells ◽  
David Heckerman ◽  
Ali Torkamani ◽  
Li Yin ◽  
Bing Ren ◽  
...  

The identification of essential regulatory elements is central to the understanding of the consequences of genetic variation. Here we use novel genomic data and machine learning techniques to map essential regulatory elements and to guide functional validation. We train an XGBoost model using 38 functional and structural features, including genome essentiality metrics, 3D genome organization and enhancer reporter STARR-seq data to differentiate between pathogenic and control non-coding genetic variants. We validate the accuracy of prediction by using data from tiling-deletion-based and CRISPR interference screens of activity of cis-regulatory elements. In neurodevelopmental disorders, the model (ncER, non-coding Essential Regulation) maps essential genomic segments within deletions and rearranged topologically associated domains linked to human disease. We show that the approach successfully identifies essential regulatory elements in the human genome.


2019 ◽  
Vol 41 (3) ◽  
pp. 46-48
Author(s):  
Jon M. Laurent ◽  
Sudarshan Pinglay ◽  
Leslie Mitchell ◽  
Ran Brosh

Less than 2% of our genome is protein-coding DNA. The vast expanses of non-coding DNA make up the genome's “dark matter”, where introns, repetitive and regulatory elements reside. Variation between individuals in non-coding regulatory DNA is emerging as a major factor in the genetics of numerous diseases and traits, yet very little is known about how such variations contribute to disease risk. Studying the genetics of regulatory variation is technically challenging as regulatory elements can affect genes located tens of thousands of base pairs away, and often, multiple distal regulatory variations, each with a very small effect, combine in an unknown way to significantly modulate the expression of genes. At the Center for Synthetic Regulatory Genomics (SyRGe) we directly tackle these problems in order to systematically elucidate the mechanisms of regulatory variation underlying human disease.


2017 ◽  
Author(s):  
Ankit Gupta ◽  
Alexander M. Rush

Abstract We consider the task of detecting regulatory elements in the human genome directly from raw DNA. Past work has focused on small snippets of DNA, making it difficult to model long-distance dependencies that arise from DNA’s 3-dimensional conformation. In order to study long-distance dependencies, we develop and release a novel dataset for a larger-context modeling task. Using this new dataset, we model long-distance interactions using dilated convolutional neural networks, and compare them to standard convolutions and recurrent neural networks. We show that dilated convolutions are effective at modeling the locations of regulatory markers in the human genome, such as transcription factor binding sites, histone modifications, and DNase hypersensitivity sites.
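To illustrate why dilation helps with long-distance dependencies, the following is a minimal sketch and not the authors' model: the function names, kernel size, and dilation schedule are assumptions chosen for illustration. It computes the receptive field of a stack of stride-1 1D convolutions and applies a single dilated convolution to a short numeric sequence.

```python
def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the receptive field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D dilated convolution (cross-correlation form) over a list of numbers."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Hypothetical comparison: four layers, kernel size 3.
standard = receptive_field(3, [1, 1, 1, 1])  # ordinary convolutions
dilated = receptive_field(3, [1, 2, 4, 8])   # exponentially growing dilation
```

With the same number of layers and parameters, the dilated stack covers a much wider window of the input than the standard stack, which is the property the abstract relies on for larger-context modeling.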


2019 ◽  
Vol 98 (9) ◽  
pp. 949-955 ◽  
Author(s):  
K. Divaris

Understanding the “code of life” and mapping the human genome have been monumental and era-defining scientific landmarks—analogous to setting foot on the moon. The last century has been characterized by exponential advances in our understanding of the biological and specifically molecular basis of health and disease. The early part of the 20th century was marked by fundamental theoretical and scientific advances in understanding heredity, the identification of the DNA molecule and genes, and the elucidation of the central dogma of biology. The second half was characterized by experimental and increasingly molecular investigations, including clinical and population applications. The completion of the Human Genome Project in 2003 and the continuous technological advances have democratized access to this information and the ability to generate health and disease association data; however, the realization of genomic and precision medicine, to practically improve people’s health, has lagged. The oral health domain has made great strides and substantially benefited from the last century of advances in genetics and genomics. Observations regarding a hereditary component of dental caries were reported as early as the 1920s. Subsequent breakthroughs were made in the discovery of genetic causes of rare diseases, such as ectodermal dysplasias, orofacial clefts, and other craniofacial and dental anomalies. More recently, genome-wide investigations have been conducted and reported for several diseases and traits, including periodontal disease, dental caries, tooth agenesis, cancers of the head and neck, orofacial pain, temporomandibular disorders, and craniofacial morphometrics. Gene therapies and gene editing with CRISPR/Cas represent the latest frontier surpassed in the era of genomic medicine. Amid rapid genomics progress, several challenges and opportunities lie ahead. 
Importantly, systematic efforts supported by implementation science are needed to realize the full potential of genomics, including the improvement of public and practitioner genomics literacy, the promotion of individual and population oral health, and the reduction of disparities.


2020 ◽  
Vol 13 (663) ◽  
pp. eabd8379
Author(s):  
Heba Ali ◽  
Lena Marth ◽  
Dilja Krueger-Burg

Postsynaptic organizational protein complexes play central roles both in orchestrating synapse formation and in defining the functional properties of synaptic transmission that together shape the flow of information through neuronal networks. A key component of these organizational protein complexes is the family of synaptic adhesion proteins called neuroligins. Neuroligins form transsynaptic bridges with presynaptic neurexins to regulate various aspects of excitatory and inhibitory synaptic transmission. Neuroligin-2 (NLGN2) is the only member that acts exclusively at GABAergic inhibitory synapses. Altered expression and mutations in NLGN2 and several of its interacting partners are linked to cognitive and psychiatric disorders, including schizophrenia, autism, and anxiety. Research on NLGN2 has fundamentally shaped our understanding of the molecular architecture of inhibitory synapses. Here, we discuss the current knowledge on the molecular and cellular functions of mammalian NLGN2 and its role in the neuronal circuitry that regulates behavior in rodents and humans.


2008 ◽  
Vol 33 (2) ◽  
pp. 139-147 ◽  
Author(s):  
Chunxiang Zhang

Genomic evidence reveals that gene expression in humans is precisely controlled in cellular, tissue-type, temporal, and condition-specific manners. Completely understanding the regulatory mechanisms of gene expression is therefore one of the most important issues in genomic medicine. Surprisingly, recent analyses of the human and animal genomes have demonstrated that the majority of RNA transcripts are relatively small, noncoding RNAs (sncRNAs), rather than large, protein-coding messenger RNAs (mRNAs). Moreover, these sncRNAs may represent a novel important layer of regulation for gene expression. The most important breakthrough in this new area is the discovery of microRNAs (miRNAs). miRNAs comprise a novel class of endogenous, small, noncoding RNAs that negatively regulate gene expression via degradation or translational inhibition of their target mRNAs. As a group, miRNAs may directly regulate ∼30% of the genes in the human genome. In keeping with the nomenclature of RNomics, which is the study of sncRNAs on the genomic scale, “microRNomics” is coined here to describe a novel subdiscipline of genomics that studies the identification, expression, biogenesis, structure, regulation of expression, targets, and biological functions of miRNAs on the genomic scale. A growing body of exciting evidence suggests that miRNAs are important regulators of cell differentiation, proliferation/growth, mobility, and apoptosis. These miRNAs therefore play important roles in development and physiology. Consequently, dysregulation of miRNA function may lead to human diseases such as cancer, cardiovascular disease, liver disease, immune dysfunction, and metabolic disorders. microRNomics may be a newly emerging approach for human disease biology.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Geneviève Bart ◽  
Daniel Fischer ◽  
Anatoliy Samoylenko ◽  
Artem Zhyvolozhnyi ◽  
Pavlo Stehantsev ◽  
...  

Abstract Background Human sweat is a mixture of secretions from three types of glands: eccrine, apocrine, and sebaceous. Eccrine glands open directly on the skin surface and produce high amounts of water-based fluid in response to heat, emotion, and physical activity, whereas the other glands produce oily fluids and waxy sebum. While most body fluids have been shown to contain nucleic acids, both as ribonucleoprotein complexes and associated with extracellular vesicles (EVs), these have not been investigated in sweat. In this study we aimed to explore and characterize the nucleic acids associated with sweat particles. Results We used next generation sequencing (NGS) to characterize DNA and RNA in pooled and individual samples of EV-enriched sweat collected from volunteers performing rigorous exercise. In all sequenced samples, we identified DNA originating from all human chromosomes, but only the mitochondrial chromosome was highly represented, with 100% coverage. Most of the DNA mapped to unannotated regions of the human genome, with some regions highly represented in all samples. Approximately 5% of the reads mapped to other genomes, including bacteria (83%), archaea (3%), and viruses (13%); the identified bacterial species were consistent with those commonly colonizing the human upper body and arm skin. Small RNA-seq from EV-enriched pooled sweat RNA resulted in 74% of the trimmed reads mapping to the human genome, with 29% corresponding to unannotated regions. Over 70% of the RNA reads mapping to an annotated region were tRNA, while misc. RNA (18.5%), protein-coding RNA (5%), and miRNA (1.85%) were much less represented. RNA-seq from individually processed EV-enriched sweat collections generally resulted in a lower percentage of reads mapping to the human genome (7–45%), with 50–60% of those reads mapping to unannotated regions of the genome, 30–55% being tRNAs, and lower percentages of reads being rRNA, lincRNA, misc. RNA, and protein-coding RNA.
Conclusions Our data demonstrate that sweat, like all other body fluids, contains a wealth of nucleic acids, including DNA and RNA of human and microbial origin, opening the possibility of investigating sweat as a source of biomarkers for specific health parameters.

