Using Optimal F-Measure and Random Resampling in Gene Ontology Enrichment Calculations

Author(s):  
Weihao Ge ◽  
Zeeshan Fazal ◽  
Eric Jakobsson
2017

Abstract Background A central question in bioinformatics is how to minimize arbitrariness and bias in the analysis of patterns of enrichment in data. A prime example is the enrichment of gene ontology (GO) classes in lists of genes. Our paper deals with two issues within this larger question. The first is how to calculate the false discovery rate (FDR) within a set of apparently enriched ontologies, and the second is how to set that FDR within the context of assessing significance for addressing biological questions. To answer these questions, we compare a random resampling method with a commonly used method for assessing FDR, the Benjamini-Hochberg (BH) method. We further develop a heuristic method for evaluating Type II (false negative) errors to enable the use of F-measure binary classification theory for distinguishing “significant” from “non-significant” degrees of enrichment. Results The results show the preferability and feasibility of random resampling assessment of FDR over the analytical methods with which we compare it. They also show that the reasonableness of any arbitrary threshold depends strongly on the structure of the dataset being tested, suggesting that the less arbitrary method of F-measure optimization to determine the significance threshold is preferable. Conclusion We therefore suggest using F-measure optimization, rather than an arbitrary threshold, to evaluate the significance of Gene Ontology enrichment results, and using resampling to replace analytical methods.
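For readers unfamiliar with the two standard quantities named in this abstract, the following is a minimal Python sketch of the textbook Benjamini-Hochberg adjustment and of the F-measure (F1) computed from true-positive, false-positive, and false-negative counts. It is not the authors' code: the abstract does not specify their resampling procedure or their Type II error heuristic, so neither is reproduced here. The function names, the `best_threshold` helper, and the counts in the usage note are all hypothetical and serve only as illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(candidates):
    """Pick the threshold whose estimated (tp, fp, fn) maximizes F1.

    `candidates` maps each threshold to a (tp, fp, fn) tuple; how those
    counts are estimated is left open (hypothetical inputs).
    """
    return max(candidates, key=lambda t: f_measure(*candidates[t]))
```

As a usage illustration with made-up counts, `best_threshold({0.01: (5, 0, 5), 0.05: (8, 2, 2), 0.25: (10, 10, 0)})` selects `0.05`, the threshold at which precision and recall are best balanced. This is the general idea of choosing a cutoff by optimizing F-measure instead of fixing it in advance.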


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Maksim A. Nesterenko ◽  
Viktor V. Starunov ◽  
Sergei V. Shchenkov ◽  
Anna R. Maslova ◽  
Sofia A. Denisova ◽  
...  

Abstract Background Parasitic flatworms (Trematoda: Digenea) represent one of the most remarkable examples of drastic morphological diversity among the stages within a life cycle. Which genes are responsible for the extreme differences in anatomy, physiology, behavior, and ecology among the stages? Here we report a comparative transcriptomic analysis of parthenogenetic and amphimictic generations in two evolutionarily informative species of Digenea belonging to the family Psilostomatidae. Methods In this study, the transcriptomes of the redia, cercaria, and adult worm stages of Psilotrema simillimum and Sphaeridiotrema pseudoglobulus were sequenced and analyzed. High-quality transcriptomes were generated, and the reference sets of protein-coding genes were used for differential expression analysis in order to identify stage-specific genes. Comparative analysis of gene sets and their expression dynamics, together with Gene Ontology enrichment analysis, was performed for three life stages within each species and between the two species. Results Reference transcriptomes for P. simillimum and S. pseudoglobulus include 21,433 and 46,424 sequences, respectively. Among 14,051 orthologous groups (OGs), 1354 are common and specific to the two analyzed psilostomatid species, whereas 13 and 43 OGs were unique to P. simillimum and S. pseudoglobulus, respectively. In contrast to P. simillimum, where more than 60% of analyzed genes were active in the redia, cercaria, and adult worm stages, in S. pseudoglobulus less than 40% of genes had such a ubiquitous expression pattern. In general, 7805 (36.41%) and 30,622 (65.96%) genes were preferentially expressed in one of the analyzed stages of P. simillimum and S. pseudoglobulus, respectively. In both species, 12 clusters of co-expressed genes were identified, and more than half of the genes belonging to the reference sets were included in these clusters. Functional specialization of the life cycle stages was clearly supported by Gene Ontology enrichment analysis. Conclusions During the life cycles of the two species studied, most of the genes change their expression levels considerably. Consequently, the molecular signature of a stage is not only a unique set of expressed genes but also the specific levels of their expression. Our results indicate an unexpectedly high level of plasticity in gene regulation between closely related species. The transcriptomes of P. simillimum and S. pseudoglobulus provide a high-quality reference resource for future evolutionary studies and comparative analyses.


2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Ashley J. Waardenberg ◽  
Samuel D. Bassett ◽  
Romaric Bouveret ◽  
Richard P. Harvey
