The Human Genome—Structure and Organization

2014 ◽  
pp. 13-26
Author(s):  
Andrew P. Read
2010 ◽  
Vol 107 (24) ◽  
pp. 10848-10853 ◽  
Author(s):  
B. Teague ◽  
M. S. Waterman ◽  
S. Goldstein ◽  
K. Potamousis ◽  
S. Zhou ◽  
...  

2019 ◽  
Author(s):  
Jonathan A. Shortt ◽  
Robert P. Ruggiero ◽  
Corey Cox ◽  
Aaron C. Wacholder ◽  
David D. Pollock

AbstractBackgroundPreviously, 3% of the human genome has been annotated as simple sequence repeats (SSRs), similar to the proportion annotated as protein coding. The origin of much of the genome is not well annotated, however, and some of the unidentified regions are likely to be ancient SSR-derived regions not identified by current methods. The identification of these regions is complicated because SSRs appear to evolve through complex cycles of expansion and contraction, often interrupted by mutations that alter both the repeated motif and mutation rate. We applied an empirical, kmer-based, approach to identify genome regions that are likely derived from SSRs.ResultsThe sequences flanking annotated SSRs are enriched for similar sequences and for SSRs with similar motifs, suggesting that the evolutionary remains of SSR activity abound in regions near obvious SSRs. Using our previously described P-clouds approach, we identified ‘SSR-clouds’, groups of similar kmers (or ‘oligos’) that are enriched near a training set of unbroken SSR loci, and then used the SSR-clouds to detect likely SSR-derived regions throughout the genome.ConclusionsOur analysis indicates that the amount of likely SSR-derived sequence in the human genome is 6.77%, over twice as much as previous estimates, including millions of newly identified ancient SSR-derived loci. SSR-clouds identified poly-A sequences adjacent to transposable element termini in over 74% of the oldest class ofAlu(roughly,AluJ), validating the sensitivity of the approach. Poly-A’s annotated by SSR-clouds also had a length distribution that was more consistent with their poly-A origins, with mean about 35 bp even in olderAlus. This work demonstrate that the high sensitivity provided by SSR-Clouds improves the detection of SSR-derived regions and will enable deeper analysis of how decaying repeats contribute to genome structure.


2020 ◽  
Vol 48 (W1) ◽  
pp. W170-W176
Author(s):  
Michal Wlasnowolski ◽  
Michal Sadowski ◽  
Tymon Czarnota ◽  
Karolina Jodkowska ◽  
Przemyslaw Szalaj ◽  
...  

Abstract Structural variants (SVs) that alter DNA sequence emerge as a driving force involved in the reorganisation of DNA spatial folding, thus affecting gene transcription. In this work, we describe an improved version of our integrated web service for structural modeling of three-dimensional genome (3D-GNOME), which now incorporates all types of SVs to model changes to the reference 3D conformation of chromatin. In 3D-GNOME 2.0, the default reference 3D genome structure is generated using ChIA-PET data from the GM12878 cell line and SVs data are sourced from the population-scale catalogue of SVs identified by the 1000 Genomes Consortium. However, users may also submit their own structural data to set a customized reference genome structure, and/or a custom input list of SVs. 3D-GNOME 2.0 provides novel tools to inspect, visualize and compare 3D models for regions that differ in terms of their linear genomic sequence. Contact diagrams are displayed to compare the reference 3D structure with the one altered by SVs. In our opinion, 3D-GNOME 2.0 is a unique online tool for modeling and analyzing conformational changes to the human genome induced by SVs across populations. It can be freely accessed at https://3dgnome.cent.uw.edu.pl/.


2015 ◽  
Author(s):  
Ryan D. Hernandez ◽  
Lawrence H. Uricchio

SUMMARY: Modern implementations of forward population genetic simulations are efficient and flexible, enabling the exploration of complex models that may otherwise be intractable. Here we describe an updated version of SFS_CODE, which has increased efficiency and includes many novel features. Among these features is an arbitrary model of dominance, the ability to simulate partial and soft selective sweeps, as well as track the trajectories of mutations and/or ancestries across multiple populations under complex models that are not possible under a coalescent framework. We also release sfs_coder, a Python wrapper to SFS_CODE allowing the user to easily generate command lines for common models of demography, selection, and human genome structure, as well as parse and simulate phenotypes from SFS_CODE output. Availability and Implementation: Our open source software is written in C and Python, and are available under the GNU General Public License at http://sfscode.sourceforge.net. Contact: [email protected] Supplementary information: Detailed usage information is available from the project website at http://sfscode.sourceforge.net.


2001 ◽  
Vol 48 (3) ◽  
pp. 587-598 ◽  
Author(s):  
W Makałowski

Genetic information of human is encoded in two genomes: nuclear and mitochondrial. Both of them reflect molecular evolution of human starting from the beginning of life (about 4.5 billion years ago) until the origin of Homo sapiens species about 100,000 years ago. From this reason human genome contains some features that are common for different groups of organisms and some features that are unique for Homo sapiens. 3.2 x 10(9) base pairs of human nuclear genome are packed into 23 chromosomes of different size. The smallest chromosome - 21st contains 5 x 10(7) base pairs while the biggest one -1st contains 2.63 x 10(8) base pairs. Despite the fact that the nucleotide sequence of all chromosomes is established, the organisation of nuclear genome put still questions: for example: the exact number of genes encoded by the human genome is still unknown giving estimations from 30 to 150 thousand genes. Coding sequences represent a few percent of human nuclear genome. The majority of the genome is represented by repetitiVe sequences (about 50%) and noncoding unique sequences. This part of the genome is frequently wrongly called "junk DNA". The distribution of genes on chromosomes is irregular, DNA fragments containing low percentage of GC pairs code lower number of genes than the fragments of high percentage of GC pairs.


Sign in / Sign up

Export Citation Format

Share Document