On predicting the highly expressed genes for Escherichia coli based on mRNA microarray data
Highly expressed genes (HEG) are genes in an organism that carry the codons preferred by its expression system. Identifying HEG helps to find the preferred codons and to use them in gene optimization for expressing a target protein. Currently, HEG-DB is the only database that stores HEG data for many strains of microorganisms, but its data are not being updated or maintained. Therefore, our research was carried out to predict HEG in the E. coli K-12 MG1655 strain based on two reference sets: the commonly used ribosomal protein (RP) coding genes, and the genes with high transcription levels in microarray data, as proposed by this research. Next, the HEG results from these two reference sets, HEG-RP and HEG-mRNA, were compared. Finally, we analyzed and compared the HEG predicted in this project with the HEG from the HEG-DB database. The results from the RP and 100-mRNA reference sets were completely identical and were better than the data from HEG-DB in the number of HEGs, the codon adaptation index (CAI) values, and the number of genes contributing to important metabolic pathways. The results showed that reference sets from mRNA microarray data can be used instead of ribosomal protein reference sets in HEG prediction.
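The CAI values mentioned above depend on the chosen reference set, so a small illustration may help. The following is a minimal sketch of the classic codon adaptation index definition (relative adaptiveness of each codon within its synonymous family, then the geometric mean over a gene's codons). It is not necessarily the exact procedure used in this work: the function names, the toy sequences, and the 0.5 pseudocount for unseen codons are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete triplets (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon = its count / count of the most used synonym.

    Counts come from the reference set (e.g. RP genes or top-mRNA genes).
    The pseudocount for codons absent from the reference is an assumption.
    """
    counts = Counter()
    for s in reference_seqs:
        counts.update(c for c in codons(s) if c in CODON_TABLE)
    weights = {}
    for aa in set(CODON_TABLE.values()):
        if aa in "*MW":  # stops and single-codon amino acids carry no signal
            continue
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        fam = {c: counts[c] or pseudocount for c in family}
        top = max(fam.values())
        for c in family:
            weights[c] = fam[c] / top
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over the gene."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set (hypothetical sequences, not real E. coli genes).
reference = ["CTGCTGCTGCTA", "AAAAAAAAG"]
weights = relative_adaptiveness(reference)
print(cai("CTGAAA", weights), cai("CTAAAG", weights))
```

Under this sketch, swapping the reference set (for example, ribosomal protein genes versus the 100 most highly transcribed genes) only changes the input to `relative_adaptiveness`; genes can then be ranked by `cai` under either set of weights and the resulting HEG lists compared.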