Chromosome-level genome assembly of Gynostemma pentaphyllum provides insights into gypenoside biosynthesis

Abstract Gynostemma pentaphyllum (Thunb.) Makino is an economically valuable medicinal plant belonging to the Cucurbitaceae family that produces the bioactive compound gypenoside. Despite several transcriptomes having been generated for G. pentaphyllum, a reference genome is still unavailable, which has limited the understanding of the gypenoside biosynthesis and regulatory mechanism. Here, we report a high-quality G. pentaphyllum genome with a total length of 582 Mb comprising 1,232 contigs and a scaffold N50 of 50.78 Mb. The G. pentaphyllum genome comprised 59.14% repetitive sequences and 25,285 protein-coding genes. Comparative genome analysis revealed that G. pentaphyllum was related to Siraitia grosvenorii, with an estimated divergence time dating to the Paleogene (∼48 million years ago). By combining transcriptome data from seven tissues, we reconstructed the gypenoside biosynthetic pathway and potential regulatory network using tissue-specific gene co-expression network analysis. Four UDP-glucuronosyltransferases (UGTs), belonging to the UGT85 subfamily and forming a gene cluster, were involved in catalyzing glycosylation in leaf-specific gypenoside biosynthesis. Furthermore, candidate biosynthetic genes and transcription factors involved in the gypenoside regulatory network were identified. The genetic information obtained in this study provides insights into gypenoside biosynthesis and lays the foundation for further exploration of the gypenoside regulatory mechanism.
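
The scaffold N50 quoted above is a standard assembly contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and walk down the list until the
    running total reaches at least half of the summed length; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (not taken from the assembly).
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 40: 50 + 40 = 90 covers half of 150
```

Applied to the real scaffold lengths of an assembly (for example, parsed from a FASTA index), the same function would return the value reported as scaffold N50.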


Combined genomic, transcriptomic, and metabolomic analyses provide insights into chayote (Sechium edule) evolution and fruit development

Horticulture Research ◽

10.1038/s41438-021-00487-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Anzhen Fu ◽

Qing Wang ◽

Jianlou Mu ◽

Lili Ma ◽

Changlong Wen ◽

...

Keyword(s):

Fruit Development ◽

Repetitive Sequences ◽

Genetic Research ◽

Future Research ◽

Agricultural Crop ◽

Protein Coding ◽

Third Generation Sequencing ◽

Sechium Edule ◽

Generation Sequencing ◽

Cucurbitaceae Family

Abstract Chayote (Sechium edule) is an agricultural crop in the Cucurbitaceae family that is rich in bioactive components. To enhance genetic research on chayote, we used Nanopore third-generation sequencing combined with Hi-C data to assemble a draft chayote genome. A chromosome-level assembly anchored on 14 chromosomes (N50 contig and scaffold sizes of 8.40 and 46.56 Mb, respectively) estimated the genome size as 606.42 Mb, which is large for the Cucurbitaceae, with 65.94% (401.08 Mb) of the genome comprising repetitive sequences; 28,237 protein-coding genes were predicted. Comparative genome analysis indicated that chayote and snake gourd diverged from sponge gourd and that a whole-genome duplication (WGD) event occurred in chayote at 25 ± 4 Mya. Transcriptional and metabolic analysis revealed genes involved in fruit texture, pigment, flavor, flavonoids, antioxidants, and plant hormones during chayote fruit development. The analysis of the genome, transcriptome, and metabolome provides insights into chayote evolution and lays the groundwork for future research on fruit and tuber development and genetic improvements in chayote.


A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research ◽

10.1038/s41438-020-00391-0 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Qingzhen Wei ◽

Jinglei Wang ◽

Wuhong Wang ◽

Tianhua Hu ◽

Haijiao Hu ◽

...

Keyword(s):

Genome Assembly ◽

Reference Genome ◽

Repetitive Sequences ◽

Gene Families ◽

Specific Gene ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Fruit Length ◽

Protein Coding Genes

Abstract Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10x Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, consisting of 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn. In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.


NKL Homeobox Gene VENTX Is Part of a Regulatory Network in Human Conventional Dendritic Cells

International Journal of Molecular Sciences ◽

10.3390/ijms22115902 ◽

2021 ◽

Vol 22 (11) ◽

pp. 5902

Author(s):

Stefan Nagel ◽

Claudia Pommerenke ◽

Corinna Meyer ◽

Hans G. Drexler

Keyword(s):

Dendritic Cells ◽

Transcription Factors ◽

Cell Lines ◽

Expression Profiling ◽

Regulatory Network ◽

Homeobox Gene ◽

Leukemia Cell Line ◽

Specific Gene ◽

Rna Seq ◽

And Function

Recently, we documented a hematopoietic NKL-code mapping physiological expression patterns of NKL homeobox genes in human myelopoiesis including monocytes and their derived dendritic cells (DCs). Here, we enlarge this map to include normal NKL homeobox gene expressions in progenitor-derived DCs. Analysis of public gene expression profiling and RNA-seq datasets containing plasmacytoid and conventional dendritic cells (pDC and cDC) demonstrated HHEX activity in both entities while cDCs additionally expressed VENTX. The consequent aim of our study was to examine regulation and function of VENTX in DCs. We compared profiling data of VENTX-positive cDC and monocytes with VENTX-negative pDC and common myeloid progenitor entities and revealed several differentially expressed genes encoding transcription factors and pathway components, representing potential VENTX regulators. Screening of RNA-seq data for 100 leukemia/lymphoma cell lines identified prominent VENTX expression in an acute myelomonocytic leukemia cell line, MUTZ-3 containing inv(3)(q21q26) and t(12;22)(p13;q11) and representing a model for DC differentiation studies. Furthermore, extended gene analyses indicated that MUTZ-3 is associated with the subtype cDC2. In addition to analysis of public chromatin immunoprecipitation data, subsequent knockdown experiments and modulations of signaling pathways in MUTZ-3 and control cell lines confirmed identified candidate transcription factors CEBPB, ETV6, EVI1, GATA2, IRF2, MN1, SPIB, and SPI1 and the CSF-, NOTCH-, and TNFa-pathways as VENTX regulators. Live-cell imaging analyses of MUTZ-3 cells treated for VENTX knockdown excluded impacts on apoptosis or induced alteration of differentiation-associated cell morphology. In contrast, target gene analysis performed by expression profiling of knockdown-treated MUTZ-3 cells revealed VENTX-mediated activation of several cDC-specific genes including CSFR1, EGR2, and MIR10A and inhibition of pDC-specific genes like RUNX2. Taken together, we added NKL homeobox gene activities for progenitor-derived DCs to the NKL-code, showing that VENTX is expressed in cDCs but not in pDCs and forms part of a cDC-specific gene regulatory network operating in DC differentiation and function.


Developmental gene expression in Leishmania donovani: differential cloning and analysis of an amastigote-stage-specific gene

Molecular and Cellular Biology ◽

10.1128/mcb.14.5.2975-2984.1994 ◽

1994 ◽

Vol 14 (5) ◽

pp. 2975-2984

Author(s):

H Charest ◽

G Matlashewski

Keyword(s):

Life Cycle ◽

Leishmania Donovani ◽

Repetitive Sequences ◽

Protozoan Parasite ◽

Life Cycle Stage ◽

Immune Serum ◽

Sand Fly ◽

Specific Gene ◽

Reading Frame ◽

Developmental Gene Expression

Leishmania protozoans are the causative agents of leishmaniasis, a major parasitic disease in humans. During their life cycle, Leishmania protozoans exist as flagellated promastigotes in the sand fly vector and as nonmotile amastigotes in the mammalian hosts. The promastigote-to-amastigote transformation occurs in the phagolysosomal compartment of the macrophage cell and is a critical step for the establishment of the infection. To study this cytodifferentiation process, we differentially screened an amastigote cDNA library with life cycle stage-specific cDNA probes and isolated seven cDNAs representing amastigote-specific transcripts. Five of these were closely related (A2 series) and recognized, by Northern (RNA) blot analyses, a 3.5-kb transcript in amastigotes and in amastigote-infected macrophages. Expression of the amastigote-specific A2 gene was induced in promastigotes when they were transferred from culture medium at 26 °C and pH 7.4 to medium at 37 °C and pH 4.5, conditions which mimic the macrophage phagolysosomal environment. A2 genes are clustered in tandem arrays, and a 6-kb fragment corresponding to a unit of the cluster was cloned and partially sequenced. An open reading frame found within the A2-transcribed region potentially encoded a 22-kDa protein containing repetitive sequences. The recombinant A2 protein produced in Escherichia coli cells was specifically recognized by immune serum from a patient with visceral leishmaniasis. The A2 protein repetitive element has strong homology with an S antigen of Plasmodium falciparum, the protozoan parasite responsible for malaria. Both the A2 protein of Leishmania donovani and the S antigen of P. falciparum are stage specific and developmentally expressed in mammalian hosts.


Deciphering the genetic links between NAFLD and co-occurring conditions using a liver gene regulatory network

10.1101/2021.12.08.471841 ◽

2021 ◽

Author(s):

Sreemol Gokuladhas ◽

William Schierding ◽

Roan Eltigani Zaied ◽

Tayaza Fadason ◽

Murim Choi ◽

...

Keyword(s):

Gene Regulatory Network ◽

Regulatory Network ◽

Complex Traits ◽

Target Genes ◽

Specific Gene ◽

Fat Percentage ◽

Alcoholic Fatty Liver ◽

Regulatory Interactions ◽

Hepatic Diseases ◽

Gene Regulatory

Background & Aims: Non-alcoholic fatty liver disease (NAFLD) is a multi-system metabolic disease that co-occurs with various hepatic and extra-hepatic diseases. The phenotypic manifestation of NAFLD is primarily observed in the liver. Therefore, identifying liver-specific gene regulatory interactions between variants associated with NAFLD and multimorbid conditions may help to improve our understanding of underlying shared aetiology. Methods: Here, we constructed a liver-specific gene regulatory network (LGRN) consisting of genome-wide spatially constrained expression quantitative trait loci (eQTLs) and their target genes. The LGRN was used to identify regulatory interactions involving NAFLD-associated genetic modifiers and their inter-relationships to other complex traits. Results and Conclusions: We demonstrate that MBOAT7 and IL32, which are associated with NAFLD progression, are regulated by spatially constrained eQTLs that are enriched for an association with liver enzyme levels. MBOAT7 transcript levels are also linked to eQTLs associated with cirrhosis, and other traits that commonly co-occur with NAFLD. In addition, genes that encode interacting partners of NAFLD-candidate genes within the liver-specific protein-protein interaction network were affected by eQTLs enriched for phenotypes relevant to NAFLD (e.g. IgG glycosylation patterns, OSA). Furthermore, we identified distinct gene regulatory networks formed by the NAFLD-associated eQTLs in normal versus diseased liver, consistent with the context-specificity of the eQTLs effects. Interestingly, genes targeted by NAFLD-associated eQTLs within the LGRN were also affected by eQTLs associated with NAFLD-related traits (e.g. obesity and body fat percentage). Overall, the genetic links identified between these traits expand our understanding of shared regulatory mechanisms underlying NAFLD multimorbidities.


Loss of critical developmental and human disease-causing genes in 58 mammals

10.1101/819169 ◽

2019 ◽

Author(s):

Yatish Turakhia ◽

Heidi I. Chen ◽

Amir Marcovitz ◽

Gill Bejerano

Keyword(s):

Evolutionary Biology ◽

Large Scale ◽

Gene Annotation ◽

Synonymous Substitution ◽

Specific Gene ◽

High Confidence ◽

Protein Coding ◽

Congenital Diseases ◽

Manual Curation ◽

Human Genes

Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools and protein databases focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (deletion and non-synonymous substitution) as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence protein-coding gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using the hg38 human assembly as a reference, we discovered over 500 unique human genes affected by such high-confidence erosion events in different clades across 58 mammals. While most of these events likely have benign consequences, we also found dozens of clade-specific gene losses that result in early lethality in outgroup mammals or are associated with severe congenital diseases in humans. Our discoveries yield intriguing potential for translational medical genetics and for evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.


Identification and characterization of cis-regulatory elements for photoreceptor type-specific transcription in zebrafish

10.1101/683284 ◽

2019 ◽

Author(s):

Wei Fang ◽

Yi Wen ◽

Xiangyun Wei

Keyword(s):

Core Promoter ◽

Regulatory Elements ◽

Specific Gene ◽

Protein Coding ◽

Core Promoters ◽

Protein Coding Genes ◽

The Core ◽

Cell Type Specific ◽

Identification And Characterization

Abstract Tissue-specific or cell type-specific transcription of protein-coding genes is controlled by both trans-regulatory elements (TREs) and cis-regulatory elements (CREs). However, it is challenging to identify TREs and CREs, which are unknown for most genes. Here, we describe a protocol for identifying two types of transcription-activating CREs (core promoters and enhancers) of zebrafish photoreceptor type-specific genes. This protocol is composed of three phases: bioinformatic prediction, experimental validation, and characterization of the CREs. To better illustrate the principles and logic of this protocol, we exemplify it with the discovery of the core promoter and enhancer of the mpp5b apical polarity gene (also known as ponli), whose red, green, and blue (RGB) cone-specific transcription requires its enhancer, a member of the rainbow enhancer family. While exemplified with an RGB cone-specific gene, this protocol is general and can be used to identify the core promoters and enhancers of other protein-coding genes.


Chromatin accessibility is dynamically regulated across C. elegans development and ageing

10.1101/279158 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jürgen Jänes ◽

Yan Dong ◽

Michael Schoof ◽

Jacques Serizay ◽

Alex Appert ◽

...

Keyword(s):

Regulatory Mechanism ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Protein Coding ◽

C Elegans ◽

Transcription Profiles ◽

Physiological Processes ◽

Global Identification ◽

Identification And Characterization

Abstract An essential step for understanding the transcriptional circuits that control development and physiology is the global identification and characterization of regulatory elements. Here we present the first map of regulatory elements across the development and ageing of an animal, identifying 42,245 elements accessible in at least one C. elegans stage. Based on nuclear transcription profiles, we define 15,714 protein-coding promoters and 19,231 putative enhancers, and find that both types of element can drive orientation-independent transcription. Additionally, hundreds of promoters produce transcripts antisense to protein-coding genes, suggesting involvement in a widespread regulatory mechanism. We find that the accessibility of most elements is regulated during development and/or ageing and that patterns of accessibility change are linked to specific developmental or physiological processes. The map and characterization of regulatory elements across C. elegans life provide a platform for understanding how transcription controls development and ageing.


Natural Selection at an Exceptionally Long GGC Repeat in the Human RASGEF1C and Divergent Genotypes in Late-onset Neurocognitive Disorder

10.21203/rs.3.rs-517583/v1 ◽

2021 ◽

Author(s):

Z Jafarian ◽

S Khamse ◽

H Afshar ◽

Khorram Khorshid HR ◽

A Delbari ◽

...

Keyword(s):

Natural Selection ◽

Evolutionary Biology ◽

Late Onset ◽

Core Promoter ◽

Human Subjects ◽

Specific Gene ◽

Protein Coding ◽

Repeat Allele ◽

Complex Disorders ◽

Selection For

Abstract Across the human protein-coding genes, the neuron-specific gene, RASGEF1C, contains the longest (GGC)-repeat, spanning its core promoter and 5′ untranslated region (RASGEF1C-201 ENST00000361132.9). RASGEF1C expression dysregulation occurs in late-onset neurocognitive disorders (NCDs), such as Alzheimer’s disease. Here we sequenced the GGC-repeat in a sample of human subjects (N = 269), consisting of late-onset NCDs (N = 115) and controls (N = 154). We also studied the status of this STR across vertebrates. The 6-repeat allele of this repeat was the predominant allele in the controls (frequency = 0.85) and NCD patients (frequency = 0.78). The NCD genotype compartment consisted of an excess of genotypes that lacked the 6-repeat (Mid-P exact = 0.004). We also detected divergent genotypes that were present in five NCD patients and not in the controls (Mid-P exact = 0.007). This STR expanded beyond 2-repeats specifically in primates, and was at maximum length in humans. We conclude that there is natural selection for the 6-repeat allele of the RASGEF1C (GGC)-repeat in humans, and significant divergence from that allele in late-onset NCDs. Indications of natural selection for predominantly abundant STR alleles and of divergent genotypes enhance the perspective of evolutionary biology and disease pathogenesis in human complex disorders.


Chromosome-scale assembly of the Sparassis latifolia genome obtained using long-read and Hi-C sequencing

10.1101/2021.01.08.426014 ◽

2021 ◽

Author(s):

Chi Yang ◽

Lu Ma ◽

Donglai Xiao ◽

Xiaoyu Liu ◽

Xiaoling Jiang ◽

...

Keyword(s):

Repetitive Sequences ◽

Draft Genome ◽

Edible Mushroom ◽

Illumina Hiseq ◽

Protein Coding ◽

Long Reads ◽

Oxford Nanopore ◽

Genome Features ◽

Long Read ◽

Genomic Studies

Sparassis latifolia is a valuable edible mushroom cultivated in China. In 2018, our research group reported an incomplete, low-quality genome of S. latifolia obtained by Illumina HiSeq 2500 sequencing. The limitations of that genome have constrained genetic and genomic studies of this mushroom resource. Herein, an updated draft genome sequence of S. latifolia was generated by Oxford Nanopore sequencing and the Hi-C technique. A total of 8.24 Gb of Oxford Nanopore long reads representing ~198.08X coverage of the S. latifolia genome were generated. Subsequently, a high-quality genome of 41.41 Mb, with scaffold and contig N50 sizes of 3.31 Mb and 1.51 Mb, respectively, was assembled. Hi-C scaffolding of the genome resulted in 12 pseudochromosomes containing 93.56% of the bases in the assembled genome. Genome annotation further revealed that 17.47% of the genome was composed of repetitive sequences. In addition, 13,103 protein-coding genes were predicted, among which 98.72% were functionally annotated. BUSCO analysis further indicated 92.07% complete BUSCOs. The improved chromosome-scale assembly and genome features described here will aid further molecular elucidation of various traits, breeding of S. latifolia, and evolutionary studies with related taxa.
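
The coverage figure in this abstract follows from simple arithmetic: sequencing depth is the total number of sequenced bases divided by the genome size. The short Python sketch below recomputes it from the rounded values in the abstract; the function name `sequencing_depth` is hypothetical, and using the 41.41 Mb assembly size as the denominator is an assumption, since the abstract does not say which genome size estimate was used.

```python
def sequencing_depth(total_bases, genome_size):
    """Return average sequencing depth (fold coverage) as bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Rounded values from the abstract; the denominator is an assumption.
reads_bp = 8.24e9       # 8.24 Gb of Oxford Nanopore long reads
assembly_bp = 41.41e6   # 41.41 Mb assembled genome

depth = sequencing_depth(reads_bp, assembly_bp)
print(f"{depth:.1f}X")
```

With these rounded inputs the result is close to 199X, slightly above the reported ~198.08X. The small gap would be expected if the authors divided by a marginally larger genome size estimate or used an unrounded read total.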
