Genome-Wide Search to Identify Reference Gene Candidates for Gene Expression Analysis in Gossypium hirsutum
Abstract
Background: With the advent of newer breeds and transgenic varieties of commercial crops, qPCR (quantitative polymerase chain reaction) experiments have become extremely popular for quick expression checks. Selecting appropriate reference genes is critical for quantifying the expression of a target gene. The most commonly used reference genes in expression studies are “housekeeping genes”, which are involved in basic cellular processes. However, the expression levels of such genes often vary in response to experimental conditions, forcing researchers to validate the reference genes in every experiment. This study presents the results of an unbiased, data-driven genome-wide search for reference genes, based on the expression variation of >50,000 genes in a publicly available RNA-seq dataset for the cotton species Gossypium hirsutum. Selected candidate genes were validated experimentally across 33 samples from normal and transgenic G. hirsutum plants, harvested from different parts of the plant at different time points and developmental stages. The experimental validation also included commonly used reference genes from the literature, in order to suggest the most stable set of 5 genes for quantitative expression analysis in cotton plants (Fig. 1).
Results: Five genes (TMN5, TBL6, UTR5B, AT1g65240, CYP76B6) identified by the data-driven analysis, along with two reference genes commonly used for cotton in the literature (GhPP2A1 and GhuBQ14), were validated by qPCR in a set of 33 experimental samples comprising different tissues (leaf, square, stem and root) and different stages of leaf (young and mature) and square (small, medium and large) development, in both transgenic and non-transgenic plants. Expression stability was evaluated using four algorithms: DeltaCT, geNorm, BestKeeper and NormFinder. GhPP2A1 and TMN5 were identified as the most stable genes, followed by GhuBQ14, across all the samples tested.
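To illustrate the kind of calculation behind one of the four stability algorithms named above, the following is a minimal sketch of the comparative DeltaCT approach, in which each gene is scored by the mean standard deviation of its pairwise Ct differences against every other candidate (lower is more stable). It is not the study's own pipeline: the function name `delta_ct_stability`, the gene labels `refA`, `refB`, `refC` and all Ct values are made up for illustration.

```python
from itertools import combinations
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Rank genes by the comparative DeltaCT method.

    ct maps a gene name to its Ct values, one per sample, with the
    same sample order for every gene. Returns (gene, score) pairs
    sorted from most stable (lowest score) to least stable.
    """
    genes = list(ct)
    pair_sd = {g: [] for g in genes}
    for a, b in combinations(genes, 2):
        # Per-sample Ct difference between the two genes.
        diffs = [x - y for x, y in zip(ct[a], ct[b])]
        sd = stdev(diffs)
        pair_sd[a].append(sd)
        pair_sd[b].append(sd)
    # A gene's score is the mean SD over all its pairwise comparisons.
    scores = {g: mean(sds) for g, sds in pair_sd.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical toy Ct values (four samples), not data from the study.
toy_ct = {
    "refA": [20.0, 20.5, 21.0, 20.2],
    "refB": [22.0, 22.5, 23.0, 22.2],
    "refC": [18.0, 21.0, 19.0, 23.0],
}
ranking = delta_ct_stability(toy_ct)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

In this toy input `refA` and `refB` move together across samples, so their pairwise difference is nearly constant, while `refC` fluctuates independently and ends up ranked last.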
Conclusion: This study demonstrates, for the first time, that a data-driven genome-wide search followed by experimental validation is a method of choice for selecting stable reference genes for experiments with cotton species. Based on the results, we recommend GhPP2A1, TMN5 and GhuBQ14 as the optimal candidate reference genes for qPCR experiments with normal or transgenic cotton plant tissues.
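As a rough illustration of a genome-wide screen for low-variation genes, the sketch below ranks genes in an expression table by their coefficient of variation (CV) after discarding weakly expressed genes. The abstract does not state which variation metric or filters were used, so the CV criterion, the `min_mean` threshold, the function name `rank_by_cv` and the toy values are all assumptions made for this example.

```python
from statistics import mean, stdev


def rank_by_cv(expr, min_mean=1.0):
    """Rank genes by coefficient of variation across samples.

    expr maps a gene name to its normalized expression values, one
    per sample. Genes whose mean expression is below min_mean are
    skipped, because a low CV is not informative for genes that are
    barely expressed. Returns (gene, cv) pairs, lowest CV first.
    """
    ranked = []
    for gene, values in expr.items():
        m = mean(values)
        if m < min_mean:
            continue
        ranked.append((gene, stdev(values) / m))
    ranked.sort(key=lambda item: item[1])
    return ranked


# Hypothetical toy expression values (four samples), not study data.
toy_expr = {
    "geneX": [10.0, 11.0, 9.0, 10.0],  # steady expression
    "geneY": [5.0, 50.0, 20.0, 1.0],   # highly variable
    "geneZ": [0.1, 0.0, 0.2, 0.1],     # too weakly expressed
}
candidates = rank_by_cv(toy_expr)
for gene, cv in candidates:
    print(f"{gene}: CV = {cv:.3f}")
```

Genes at the top of such a ranking would only be candidates; as the abstract stresses, they still need experimental validation by qPCR.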