New Data and New Features of the FunRiceGenes (Functionally Characterized Rice Genes) Database: 2021 Update

Author(s):  
Fangfang Huang ◽  
Yingru Jiang ◽  
Tiantian Chen ◽  
Haoran Li ◽  
Mengjia Fu ◽  
...  

Abstract As a major food crop and model organism, rice has been studied extensively and has the largest number of functionally characterized genes among all crops. We previously built the funRiceGenes database including ∼2800 functionally characterized rice genes and ∼5000 members of different gene families. Since being published, the funRiceGenes database has been accessed by more than 49,000 users with over 490,000 page views. The funRiceGenes database has been continuously updated with newly cloned rice genes and newly published literature, based on the progress of rice functional genomics studies. As of Nov 2021, ≥4100 functionally characterized rice genes and ∼6000 members of different gene families were collected in funRiceGenes, accounting for 22.3% of the 39,045 annotated protein-coding genes in the rice genome. Here, we summarize the update of the funRiceGenes database with new data and new features in the last five years.

2021 ◽  
Author(s):  
Aaron Wacholder ◽  
Omer Acar ◽  
Anne-Ruxandra Carvunis

Ribosome profiling experiments demonstrate widespread translation of eukaryotic genomes outside of annotated protein-coding genes. However, it is unclear how much of this "noncanonical" translation contributes biologically relevant microproteins rather than insignificant translational noise. Here, we developed an integrative computational framework (iRibo) that leverages hundreds of ribosome profiling experiments to detect signatures of translation with high sensitivity and specificity. We deployed iRibo to construct a reference translatome in the model organism S. cerevisiae. We identified ~19,000 noncanonical translated elements outside of the ~5,400 canonical yeast protein-coding genes. Most (65%) of these noncanonical translated elements were located on transcripts annotated as non-coding, or entirely unannotated, while the remainder were located on the 5' and 3' ends of mRNA transcripts. Only 14 noncanonical translated elements were evolutionarily conserved. In stark contrast with canonical protein-coding genes, the great majority of the yeast noncanonical translatome appeared evolutionarily transient and showed no signatures of selection. Yet, we uncovered phenotypes for 53% of a representative subset of evolutionarily transient translated elements. The iRibo framework and reference translatome described here provide a foundation for further investigation of a largely unexplored, but biologically significant, evolutionarily transient translatome.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Rashmi Jain ◽  
Jerry Jenkins ◽  
Shengqiang Shu ◽  
Mawsheng Chern ◽  
Joel A. Martin ◽  
...  

Abstract Background The availability of thousands of complete rice genome sequences from diverse varieties and accessions has laid the foundation for in-depth exploration of the rice genome. One drawback to these collections is that most of these rice varieties have long life cycles, and/or low transformation efficiencies, which limits their usefulness as model organisms for functional genomics studies. In contrast, the rice variety Kitaake has a rapid life cycle (9 weeks seed to seed) and is easy to transform and propagate. For these reasons, Kitaake has emerged as a model for studies of diverse monocotyledonous species. Results Here, we report the de novo genome sequencing and analysis of Oryza sativa ssp. japonica variety KitaakeX, a Kitaake plant carrying the rice XA21 immune receptor. Our KitaakeX sequence assembly contains 377.6 Mb, consisting of 33 scaffolds (476 contigs) with a contig N50 of 1.4 Mb. Complementing the assembly are detailed gene annotations of 35,594 protein coding genes. We identified 331,335 genomic variations between KitaakeX and Nipponbare (ssp. japonica), and 2,785,991 variations between KitaakeX and Zhenshan97 (ssp. indica). We also compared Kitaake resequencing reads to the KitaakeX assembly and identified 219 small variations. The high-quality genome of the model rice plant KitaakeX will accelerate rice functional genomics. Conclusions The high quality, de novo assembly of the KitaakeX genome will serve as a useful reference genome for rice and will accelerate functional genomics studies of rice and other species.
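The contig N50 quoted in this abstract is a length-weighted median: the contig length L such that contigs of length L or longer account for at least half of the total assembly. The sketch below shows the standard calculation in Python. The function name `n50` and the toy contig lengths are illustrative assumptions, not the KitaakeX data or the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downward, first reaches half
    of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, two assemblies of the same total size can have very different N50 values; a higher value indicates that more of the assembly sits in long contigs.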


Author(s):  
Alaina Shumate ◽  
Aleksey V. Zimin ◽  
Rachel M. Sherman ◽  
Daniela Puiu ◽  
Justin M. Wagner ◽  
...  

Abstract Here we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Qingzhen Wei ◽  
Jinglei Wang ◽  
Wuhong Wang ◽  
Tianhua Hu ◽  
Haijiao Hu ◽  
...  

Abstract Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10x Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, consisting of 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.


2020 ◽  
Vol 12 (3) ◽  
pp. 185-202
Author(s):  
Xia Han ◽  
Jindan Guo ◽  
Erli Pang ◽  
Hongtao Song ◽  
Kui Lin

Abstract How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.


2021 ◽  
Author(s):  
Kazi Rahman ◽  
Alex A. Compton

The interferon-induced transmembrane (IFITM) family performs multiple functions in immunity, including inhibition of virus entry into cells. The IFITM repertoire varies widely between species and consists of protein-coding genes and pseudogenes. The selective forces driving pseudogenization within gene families are rarely understood. In this issue, the human pseudogene IFITM4P is characterized as a virus-induced, long non-coding RNA that contributes to restriction of Influenza A virus by regulating mRNA levels of IFITM1, IFITM2, and IFITM3.


Author(s):  
Yun-Xia Luan ◽  
Yingying Cui ◽  
Wan-Jun Chen ◽  
Jianfeng Jin ◽  
Ai-Min Liu ◽  
...  

The collembolan Folsomia candida Willem, 1902, is an important representative soil arthropod that is widely distributed throughout the world and has been frequently used as a test organism in soil ecology and ecotoxicology studies. However, its suitability as an ideal “standard” has been questioned because of differences in reproductive modes and cryptic genetic diversity between strains from various geographical origins. In this study, we present two high-quality chromosome-level genomes of F. candida, for the parthenogenetic Danish strain (FCDK, 219.08 Mb, N50 of 38.47 Mb, 25,139 protein-coding genes) and the sexual Shanghai strain (FCSH, 153.09 Mb, N50 of 25.75 Mb, 21,609 protein-coding genes). The seven chromosomes of FCDK are each 25–54% larger than the corresponding chromosomes of FCSH, showing obvious repetitive element expansions and large-scale inversions and translocations but no whole-genome duplication. The strain-specific genes, expanded gene families and genes in nonsyntenic chromosomal regions identified in FCDK are highly related to its broader environmental adaptation. In addition, the overall sequence identity of the two mitogenomes is only 78.2%, and FCDK has fewer strain-specific microRNAs than FCSH. In conclusion, FCDK and FCSH have accumulated independent genetic changes and evolved into distinct species since diverging 10 Mya. Our work shows that F. candida represents a good model of rapid cryptic speciation. Moreover, it provides important genomic resources for studying the mechanisms of species differentiation, soil arthropod adaptation to soil ecosystems, and Wolbachia-induced parthenogenesis as well as the evolution of Collembola, a pivotal phylogenetic clade between Crustacea and Insecta.


1998 ◽  
Vol 06 (01) ◽  
pp. 49-70 ◽  
Author(s):  
Julius H. Jackson ◽  
Roy George ◽  
Hezekiah O. Adeyemi ◽  
Michael A. Winrow ◽  
Patricia A. Herring ◽  
...  

A Fourier Transform of Equal Symbols (FTES) was applied as a spectral density analysis method to identify DNA bases that repeat at any frequency in selected protein-coding genes. The analysis especially focused on identification of bases responsible for the dominant signal at frequency f=1/3 found in all protein-coding genes. The study included homologous sequences from two gene families and multiple unrelated sequences from single organisms. No signal pattern or spectrum specifically characterized either gene family. However, the patterns of bases comprising the signal at f=1/3 suggested the presence of a genome-specific label for protein-coding genes from the same genome. Data suggest that three factors form the informational basis for the signal structure at f=1/3: (1) codon base positional bias; (2) codon preference; and (3) codon arrangement. Quantitative measure of the contribution of each base to the period-3 signal suggests a basis to distinguish protein-coding genes from different organisms. Application of the FTES analysis characterized genes from Escherichia coli as different from the genes from Pseudomonas aeruginosa. Preliminary analyses of genes from these and three other bacteria by artificial neural nets, using FTES parameters, support our suggestion that the period-3 informational structure contains labels for the genomic origins of protein-coding genes. FTES analysis alone or in combination with other informational measures may reveal pathways and processes of gene flow into and through natural systems of microbial cell populations.
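The abstract does not spell out how FTES is computed, so the following is not the authors' method. It is a minimal Python sketch of one common way to measure a period-3 signal: build a binary indicator sequence for each base and evaluate its discrete Fourier transform at f = 1/3. The function name `base_power_at`, the normalization by sequence length, and the toy sequence are all assumptions made for illustration.

```python
import cmath


def base_power_at(seq, freq):
    """Per-base spectral power at `freq` (cycles per base).

    For each base, the sequence is treated as a 0/1 indicator
    (1 where the base occurs), and the squared magnitude of its
    Fourier coefficient at `freq` is divided by the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    powers = {}
    for base in "ACGT":
        # Sum the complex exponential only over positions holding `base`.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * freq * k)
            for k, symbol in enumerate(seq)
            if symbol == base
        )
        powers[base] = abs(coeff) ** 2 / n
    return powers


# Hypothetical sequence with a strict period of three bases.
periodic = "ATG" * 30
for base, power in base_power_at(periodic, 1 / 3).items():
    print(base, round(power, 3))
```

In this toy input each of A, T, and G occupies a fixed codon position, so each contributes strongly at f = 1/3 while C contributes nothing. In real coding sequences the per-base contributions differ, which is the kind of per-base breakdown the abstract describes.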


2016 ◽  
Author(s):  
Chia-Yi Cheng ◽  
Vivek Krishnakumar ◽  
Agnes Chan ◽  
Seth Schobel ◽  
Christopher D. Town

ABSTRACT The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. A comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, noncoding RNA, and small RNA. The most recent annotation update (TAIR10) released more than five years ago had a profound impact on Arabidopsis research. Maintaining the accuracy of the annotation continues to be a prerequisite for future progress. Using an integrative annotation pipeline, we assembled tissue-specific RNA-seq libraries from 113 datasets and constructed 48,359 transcript models of protein-coding genes in eleven tissues. In addition, we annotated various classes of noncoding RNA including small RNA, long intergenic RNA, small nucleolar RNA, natural antisense transcript, small nuclear RNA, and microRNA using published datasets and in-house analytic results. Altogether, we identified 738 novel protein-coding genes, 508 novel transcribed regions, 5,051 non-coding genes, and 35,846 small-RNA loci that formerly eluded annotation. Analysis of the splicing events and RNA-seq-based expression profiles revealed the landscapes of gene structures, untranslated regions, and splicing activities to be more intricate than previously appreciated. Furthermore, we present 692 uniformly expressed housekeeping genes, 43% of whose human orthologs are also housekeeping genes. This updated Arabidopsis genome annotation with a substantially increased resolution of gene models will not only further our understanding of the biological processes of this plant model but also of other species.

