Characterising genome architectures using Genome Decomposition Analysis

2021
Author(s): Eerik Aunin, Matthew Berriman, Adam James Reid

Abstract
Genome architecture describes how genes and other features are arranged in genomes. These arrangements reflect the evolutionary pressures on genomes and underlie biological processes such as chromosomal segregation and the regulation of gene expression. We present a new tool called Genome Decomposition Analysis (GDA) that characterises genome architectures and provides an accessible way to discover hidden features of a genome assembly. With the imminent deluge of high-quality genome assemblies from projects such as the Darwin Tree of Life and the Earth BioGenome Project, GDA has been designed to facilitate their exploration and the discovery of novel genome biology. We highlight the effectiveness of our approach in characterising the genome architectures of single-celled eukaryotic parasites from the phylum Apicomplexa and show that it scales well to large genomes.
Significance
Genome sequencing has revealed that there are functionally important arrangements of genes, repetitive elements and regulatory sequences within chromosomes. Identifying these arrangements requires extensive computation and analysis. Furthermore, improvements in genome sequencing technology and the establishment of consortia aiming to sequence all eukaryotic species mean that high-throughput methods are needed for discovering new genome biology. Here we present a software pipeline, named GDA, which determines the patterns of genomic features across chromosomes and uses these to characterise genome architecture. We show that it recapitulates the known genome architecture of several Apicomplexan parasites and use it to identify features in a recently sequenced, less well-characterised genome. GDA scales well to large genomes and is freely available.

2021, Vol 22 (1)
Author(s): Kyle Fletcher, Lin Zhang, Juliana Gil, Rongkui Han, Keri Cavanaugh, ...

Abstract
Our assembly-free linkage analysis pipeline (AFLAP) identifies segregating markers as k-mers directly in the raw reads, without requiring a reference genome assembly for variant calling, and produces genotype tables for constructing unbiased, high-density genetic maps. AFLAP is validated using simulated data and compared with a conventional workflow. AFLAP is applied to whole-genome sequencing and genotyping-by-sequencing data of F1, F2, and recombinant inbred populations of two different plant species, producing genetic maps that are concordant with genome assemblies. The AFLAP-based genetic map for Bremia lactucae enables the production of a chromosome-scale genome assembly.


2021, Vol 13 (1)
Author(s): Pinpin Long, Qiuhong Wang, Yizhi Zhang, Xiaoyan Zhu, Kuai Yu, ...

Abstract
Background: Acute coronary syndrome (ACS) is a cardiac emergency with high mortality. Exposure to high copper (Cu) concentrations has been linked to ACS. However, whether DNA methylation contributes to the association between Cu and ACS is unclear.
Methods: We measured methylation levels at more than 485,000 cytosine-phosphoguanine sites (CpGs) in blood leukocytes using the HumanMethylation450 BeadChip and conducted a genome-wide meta-analysis of plasma Cu in a total of 1243 Chinese individuals. For plasma Cu-related CpGs, we evaluated their associations with the expression of nearby genes and with major cardiovascular risk factors. Furthermore, we examined their longitudinal associations with incident ACS in a nested case-control study.
Results: We identified four novel Cu-associated CpGs (cg20995564, cg18608055, cg26470501 and cg05825244) at a 5% false discovery rate (FDR). Methylation levels of cg18608055, cg26470501 and cg05825244 were also significantly correlated with the expression of SBNO2, BCL3 and EBF4, respectively. Higher methylation at the cg05825244 locus was associated with lower high-density lipoprotein cholesterol and higher C-reactive protein levels. Furthermore, we demonstrated that higher cg05825244 methylation was associated with increased risk of ACS (odds ratio [OR], 1.23; 95% CI 1.02–1.48; P = 0.03).
Conclusions: We identified novel DNA methylation alterations associated with plasma Cu in Chinese populations and linked these loci to risk of ACS, providing new insights into the regulation of gene expression by Cu-related DNA methylation and suggesting a role for DNA methylation in the association between copper and ACS.
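The abstract reports CpGs significant "at a 5% FDR" across more than 485,000 tests but does not name the correction procedure. For illustration only, here is a minimal sketch of one widely used FDR method, the Benjamini-Hochberg step-up procedure; whether the authors used this method is an assumption, and the function name is hypothetical. It sorts the m p-values, finds the largest rank k whose p-value is at most (k/m)·q, and declares the k smallest p-values significant.

```python
from typing import List


def benjamini_hochberg(pvals: List[float], q: float = 0.05) -> List[bool]:
    """Return a significance flag for each p-value, controlling the FDR at level q
    with the Benjamini-Hochberg step-up rule (illustrative sketch)."""
    m = len(pvals)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k (1-based) with p_(k) <= (k / m) * q; 0 means nothing passes.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Every hypothesis ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            significant[i] = True
    return significant
```

For example, `benjamini_hochberg([0.001, 0.04, 0.03, 0.5])` flags only the first test, because 0.03 at rank 2 exceeds its threshold of 0.025. The step-up design means a p-value can pass even if it fails its own threshold, as long as some larger-ranked p-value passes.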


2021
Author(s): Dingxia Feng, Zhiwei Zhai, Zhiyong Shao, Yi Zhang, Jo Anne Powell-Coffman

Abstract
During development, homeostasis, and disease, organisms must balance responses that allow adaptation to low oxygen (hypoxia) with those that protect cells from oxidative stress. The evolutionarily conserved hypoxia-inducible factors are central to these processes, as they orchestrate transcriptional responses to oxygen deprivation. Here, we employ genetic strategies in C. elegans to identify stress-responsive genes and pathways that modulate the HIF-1 hypoxia-inducible factor and facilitate oxygen homeostasis. Through a genome-wide RNAi screen, we show that RNAi-mediated mitochondrial or proteasomal dysfunction increases the expression of the hypoxia-responsive reporter Pnhr-57:GFP in C. elegans. Interestingly, only a subset of these effects requires hif-1. Of particular importance, we found that skn-1 RNAi increases the expression of the hypoxia-responsive reporter Pnhr-57:GFP and elevates HIF-1 protein levels. The SKN-1/NRF transcription factor has been shown to promote oxidative stress resistance. We present evidence that the crosstalk between HIF-1 and SKN-1 is mediated by EGL-9, the prolyl hydroxylase that targets HIF-1 for oxygen-dependent degradation. Treatments that induce SKN-1, such as heat, increase expression of a Pegl-9:GFP reporter, and this effect requires skn-1 function and a putative SKN-1 binding site in egl-9 regulatory sequences. Collectively, these data support a model in which SKN-1 promotes egl-9 transcription, thereby inhibiting HIF-1. We propose that this interaction enables animals to adapt quickly to changes in cellular oxygenation and to better survive accompanying oxidative stress.


2020
Author(s): Kyle Fletcher, Lin Zhang, Juliana Gil, Rongkui Han, Keri Cavanaugh, ...

Abstract
Background: Genetic maps are an important resource for validation of genome assemblies, trait discovery, and breeding. Next-generation sequencing has enabled production of high-density genetic maps constructed with tens of thousands of markers. Most current approaches require a genome assembly to identify markers. Our Assembly Free Linkage Analysis Pipeline (AFLAP) removes this requirement by using uniquely segregating k-mers as markers to rapidly construct a genotype table and perform subsequent linkage analysis. This avoids potential biases, including preferential read alignment and variant calling.
Results: The performance of AFLAP was determined in simulations and compared with a conventional workflow. We tested AFLAP using 100 F2 individuals of Arabidopsis thaliana, sequenced to low coverage. Genetic maps generated using k-mers contained over 130,000 markers that were concordant with the genome assembly. The utility of AFLAP was then demonstrated by generating an accurate genetic map using genotyping-by-sequencing data of 235 recombinant inbred lines of Lactuca spp. AFLAP was then applied to 83 F1 individuals of the oomycete Bremia lactucae, sequenced to >5× coverage. The genetic map contained over 90,000 markers ordered in 19 large linkage groups. This genetic map was used to fragment, order, orient, and scaffold the genome, resulting in a much-improved reference assembly.
Conclusions: AFLAP can be used with whole-genome sequencing or genotyping-by-sequencing data to generate high-density linkage maps and improve genome assemblies of any organism for which a mapping population is available. Genetic maps produced for B. lactucae were accurately aligned to the genome and guided significant improvements of the reference assembly.
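To make the idea of "k-mers as markers" concrete, here is a toy sketch of building a presence/absence genotype table. It is not AFLAP's implementation: the function names are hypothetical, and it assumes a simplified rule in which a candidate marker is a k-mer found in parent A's reads but not parent B's, kept only if it segregates (present in some but not all progeny). A real pipeline would likely also need a dedicated k-mer counter, count thresholds to discard sequencing errors, and markers from both parents.

```python
from typing import Dict, List, Set, Tuple

# Complement table for DNA bases, used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement, so a k-mer
    read from either DNA strand is recorded once."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(reads: List[str], k: int) -> Set[str]:
    """Collect the canonical k-mers present in a list of reads."""
    found: Set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(canonical(read[i:i + k]))
    return found


def genotype_table(
    parent_a: List[str],
    parent_b: List[str],
    progeny: Dict[str, List[str]],
    k: int,
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Score each progeny 1 (present) or 0 (absent) for parent-A-specific
    k-mers that segregate among the progeny (hypothetical simplified rule)."""
    candidates = sorted(kmer_set(parent_a, k) - kmer_set(parent_b, k))
    progeny_kmers = {name: kmer_set(reads, k) for name, reads in progeny.items()}
    n = len(progeny_kmers)
    # Keep only k-mers present in at least one, but not every, progeny.
    markers = [
        m for m in candidates
        if 0 < sum(m in ks for ks in progeny_kmers.values()) < n
    ]
    table = {
        name: [1 if m in ks else 0 for m in markers]
        for name, ks in progeny_kmers.items()
    }
    return markers, table
```

With parent reads `["ACGTAC"]` and `["ACGTTT"]`, progeny `{"p1": ["TTGTAC"], "p2": ["ACGTTT"]}` and `k=3`, the only segregating marker is `"GTA"`, scored 1 in p1 and 0 in p2. Rows of such a table are what a linkage-mapping step would then order into linkage groups.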


2020
Author(s): Yi Feng, Leslie Y. Beh, Wei-Jen Chang, Laura F. Landweber

Abstract
Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Post-zygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. While many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain because of the germline's lower DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Splitread Inference of Genome Architecture and Rearrangements), to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short germline DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited access to DNA material, thereby providing new insights into the evolution of chromosome rearrangements.


Plants, 2019, Vol 8 (8), pp. 270
Author(s): Yun Gyeong Lee, Sang Chul Choi, Yuna Kang, Kyeong Min Kim, Chon-Sik Kang, ...

Whole-genome sequencing (WGS) has become a crucial tool for understanding genome structure and genetic variation. The MinION sequencer from Oxford Nanopore Technologies (ONT) is an excellent platform for WGS and has several advantages over other next-generation sequencing (NGS) technologies: it is relatively inexpensive and portable, library preparation is simple, runs can be monitored in real time, and there is no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence is available in the Phytozome database, so sorghum can serve as a good reference. However, many plant species have larger and more complex genomes than animals or microorganisms, which makes complete genome sequencing difficult. Because MinION produces long reads, it can help overcome the weaknesses of assemblies built from NGS short reads by reducing gaps and spanning the repetitive sequences found in plant genomes. Here, we sequenced the genome of S. bicolor cv. BTx623 on the MinION platform and obtained 895,678 long reads totalling 17.9 gigabases (Gb), ca. 25× coverage of the reference. We performed de novo assembly with two different tools and mapped the assembled contigs against the sorghum reference genome: Canu generated 6124 contigs (covering 45.9%), and Minimap and Miniasm followed by Racon generated 2661 contigs (covering 50%). Our results provide an optimized long-read sequencing analysis workflow for plant species on the MinION platform, and guidance for determining the total sequencing scale needed for optimal coverage across different genome sizes.
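The "ca. 25×" figure follows from dividing total sequence yield by genome size. A worked check, assuming the 17.9 Gb yield is counted in bases and using the ~730 Mb genome size given above:

```latex
\text{coverage} \approx \frac{\text{total sequenced bases}}{\text{genome size}}
= \frac{17.9\ \text{Gb}}{0.73\ \text{Gb}} \approx 24.5 \approx 25\times
```

Rearranged, the same relation gives the yield needed for a target depth: for example, 30× of a 730 Mb genome would require roughly 30 × 0.73 ≈ 21.9 Gb of sequence. This is the kind of per-genome-size scaling the authors point to for planning sequencing runs.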


2019, Vol 9 (10), pp. 3213-3223
Author(s): Giovanna Cáceres, María E. López, María I. Cádiz, Grazyella M. Yoshida, Ana Jedlicki, ...

Nile tilapia (Oreochromis niloticus) is one of the most cultivated and economically important species in world aquaculture. Intensive production favors monosex (all-male) stocks because of a marked sexual dimorphism in growth that favors males. Currently, the main way to obtain all-male populations is hormone treatment through the feed during the larval and fry stages. Identifying genomic regions associated with sex determination in Nile tilapia is therefore a research topic of great interest. The objective of this study was to identify genomic variants associated with sex determination in three commercial populations of Nile tilapia. Whole-genome sequencing of 326 individuals was performed, and a total of 2.4 million high-quality bi-allelic single nucleotide polymorphisms (SNPs) were identified after quality control. A genome-wide association study (GWAS) was conducted to identify markers associated with the binary sex trait (males = 1; females = 0). A mixed logistic regression GWAS model was fitted, and a genome-wide significant signal comprising 36 SNPs, spanning a genomic region of 536 kb on chromosome 23, was identified. Ten of these 36 variants lie within the anti-Müllerian hormone (Amh) gene, and other significant SNPs were located in the region neighboring Amh. This gene has been strongly associated with sex determination in several vertebrate species, playing an essential role in the differentiation of male and female reproductive tissue in early stages of development. This finding provides useful information for better understanding the genetic mechanisms underlying sex determination in Nile tilapia.


2019, Vol 6 (2), pp. 180608
Author(s): Marvin Choquet, Irina Smolina, Anusha K. S. Dhanasiri, Leocadio Blanco-Bercial, Martina Kopp, ...

Advances in next-generation sequencing technologies and the development of reduced-representation genome protocols have opened the way to genome-wide population studies in non-model species. However, species with large genomes remain challenging, hampering the development of genomic resources for a number of taxa, including marine arthropods. Here, we developed a reduced-representation method for the ecologically important marine copepod Calanus finmarchicus (haploid genome size of 6.34 Gbp). We optimized a capture-enrichment protocol targeting 2656 single-copy genes, yielding a total of 154,087 high-quality SNPs in C. finmarchicus, including 62,372 shared among the three locations tested. The set of capture probes was also successfully applied to the congeneric C. glacialis. Preliminary analyses of these markers revealed similar levels of genetic diversity in the two Calanus species, while populations of C. glacialis showed stronger genetic structure than those of C. finmarchicus. Using this powerful set of markers, we did not detect any evidence of hybridization between C. finmarchicus and C. glacialis. Finally, we propose a shortened version of our protocol, offering a promising solution for population genomics studies in non-model species with large genomes.


2019, Vol 31 (7), pp. 1189
Author(s): Janine E. Deakin, Sally Potter

Marsupials have unique features that make them particularly interesting to study, and sequencing of marsupial genomes is helping to understand their evolution. A decade ago, it was a huge feat to sequence the first marsupial genome. Now, the advances in sequencing technology have made the sequencing of many more marsupial genomes possible. However, the DNA sequence is only one component of the structures it is packaged into: chromosomes. Knowing the arrangement of the DNA sequence on each chromosome is essential for a genome assembly to be used to its full potential. The importance of combining sequence information with cytogenetics has previously been demonstrated for rapidly evolving regions of the genome, such as the sex chromosomes, as well as for reconstructing the ancestral marsupial karyotype and understanding the chromosome rearrangements involved in the Tasmanian devil facial tumour disease. Despite the recent advances in sequencing technology assisting in genome assembly, physical anchoring of the sequence to chromosomes is required to achieve a chromosome-level assembly. Once chromosome-level assemblies are achieved for more marsupials, we will be able to investigate changes in the packaging and interactions between chromosomes to gain an understanding of the role genome architecture has played during marsupial evolution.


2005, Vol 79 (11), pp. 6610-6619
Author(s): M. K. Lewinski, D. Bisgrove, P. Shinn, H. Chen, C. Hoffmann, ...

Abstract
We have investigated regulatory sequences in noncoding human DNA that are associated with repression of an integrated human immunodeficiency virus type 1 (HIV-1) promoter. HIV-1 integration results in the formation of precise and homogeneous junctions between viral and host DNA, but integration takes place at many locations. Thus, the variation in HIV-1 gene expression at different integration sites reports the activity of regulatory sequences at nearby chromosomal positions. Negative regulation of HIV transcription is of particular interest because of its association with maintaining HIV in a latent state in cells from infected patients. To identify chromosomal regulators of HIV transcription, we infected Jurkat T cells with an HIV-based vector transducing green fluorescent protein (GFP) and separated cells into populations containing well-expressed (GFP-positive) or poorly expressed (GFP-negative) proviruses. We then determined the chromosomal locations of the two classes by sequencing 971 junctions between viral and cellular DNA. Possible effects of endogenous cellular transcription were characterized by transcriptional profiling. Low-level GFP expression correlated with integration in (i) gene deserts, (ii) centromeric heterochromatin, and (iii) very highly expressed cellular genes. These data provide a genome-wide picture of chromosomal features that repress transcription and suggest models for transcriptional latency in cells from HIV-infected patients.
