gene predictor
Recently Published Documents


TOTAL DOCUMENTS: 33 (FIVE YEARS 5)

H-INDEX: 8 (FIVE YEARS 1)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Raíssa Silva ◽  
Kleber Padovani ◽  
Fabiana Góes ◽  
Ronnie Alves

Abstract Background: Microbes play a fundamental economic, social, and environmental role in our society. Metagenomics makes it possible to investigate microbes in their natural environments (the complex communities) and their interactions. The way they act is usually estimated by looking at the functions they perform in those environments, and their responsibility is measured by their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are available as never before. Many gene predictors are available that can aid the gene annotation process, though they do not appropriately handle the complexities of metagenomic data. There is no standard metagenomic benchmark data for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates. Results: We introduce geneRFinder, an ML-based gene predictor able to outperform state-of-the-art gene prediction tools across this benchmark by using only one pre-trained Random Forest model. Average prediction rates of geneRFinder differed in percentage terms by 54% and 64%, respectively, against Prodigal and FragGeneScan while handling high complexity metagenomes. The specificity rate of geneRFinder had the largest distance against FragGeneScan, 79 percentage points, and was 66 percentage points higher than that of Prodigal. According to McNemar’s test, all percentage differences between the predictors' performances are statistically significant for all datasets with a 99% confidence interval.
Conclusions: We provide geneRFinder, an approach for gene prediction in distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/. We also provide a novel, comprehensive benchmark dataset for gene prediction, based on the Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled data from gene regions, available at https://sourceforge.net/p/generfinder-benchmark.
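The abstract above reports specificity differences and McNemar's test without showing how either is computed. The following is a minimal sketch of both, assuming binary gene/non-gene labels (1 = gene, 0 = non-gene) and two predictors evaluated on the same labeled sequences. It is not the authors' evaluation code; the function names and toy data are illustrative assumptions, and the test uses the common continuity-corrected chi-square form.

```python
import math


def specificity(y_true, y_pred):
    """True-negative rate: TN / (TN + FP), with 0 as the negative class."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if (tn + fp) else float("nan")


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (continuity-corrected) on paired predictions.

    b counts items predictor A gets right and B gets wrong;
    c counts the reverse. Returns (b, c, statistic, p_value).
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        other_ok = other == truth
        if a_ok and not other_ok:
            b += 1
        elif other_ok and not a_ok:
            c += 1
    if b + c == 0:
        # The predictors never disagree in correctness: no evidence of a difference.
        return b, c, 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return b, c, stat, p_value


if __name__ == "__main__":
    # Toy labels (hypothetical): 10 gene regions followed by 10 non-gene regions.
    truth = [1] * 10 + [0] * 10
    predictor_a = list(truth)   # classifies everything correctly
    predictor_b = [1] * 20      # calls every sequence a gene
    print(specificity(truth, predictor_a), specificity(truth, predictor_b))
    print(mcnemar_test(truth, predictor_a, predictor_b))
```

A p-value below 0.01 from such a test would correspond to significance at the 99% level mentioned in the abstract.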


2020 ◽  
Author(s):  
Raíssa Silva ◽  
Kleber Padovani ◽  
Fabiana Góes ◽  
Ronnie Alves

Abstract Motivation: Microbes play a fundamental economic, social, and environmental role in our society. Metagenomics makes it possible to investigate microbes in their natural environments (the complex communities) and their interactions. The way they act is usually estimated by looking at the functions they perform in those environments, and their responsibility is measured by their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are available as never before. Many gene predictors are available that can aid the gene annotation process, though they do not appropriately handle the complexities of metagenomic data. There is no standard metagenomic benchmark data for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates. Results: We introduce geneRFinder, an ML-based gene predictor able to outperform state-of-the-art gene prediction tools across this benchmark by using only one pre-trained Random Forest model. Average prediction rates of geneRFinder differed in percentage terms by 54% and 64%, respectively, against Prodigal and FragGeneScan while handling high complexity metagenomes. The specificity rate of geneRFinder had the largest distance against FragGeneScan, 79 percentage points, and was 66 percentage points higher than that of Prodigal. According to McNemar’s test, all percentage differences between the predictors' performances are statistically significant for all datasets with a 99% confidence interval. Conclusions: We provide geneRFinder, an approach for gene prediction in distinct metagenomic complexities, available at github.com/railorena/geneRFinder. We also provide a novel, comprehensive benchmark dataset for gene prediction, based on the Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled data from gene regions, available at sourceforge.net/p/generfinder-benchmark.


Oncotarget ◽  
2020 ◽  
Vol 11 (24) ◽  
pp. 2302-2309
Author(s):  
Joshua D. Bloomstein ◽  
Rie von Eyben ◽  
Andy Chan ◽  
Erinn B. Rankin ◽  
Daniel R. Fregoso ◽  
...  

Author(s):  
J. Bloomstein ◽  
R. Von Eyben ◽  
E. Rankin ◽  
J. Wang-Chiang ◽  
S. MacLaughlan David ◽  
...  

2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Bruno V. Sinn ◽  
Chunxiao Fu ◽  
Rosanna Lau ◽  
Jennifer Litton ◽  
Tsung-Heng Tsai ◽  
...  

Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 4095-4095
Author(s):  
Anjali Silva ◽  
Clementine Sarkozy ◽  
Tracy Lackraj ◽  
Anja Mottok ◽  
Vindi Jurinovic ◽  
...  

Abstract Introduction: Follicular lymphoma (FL) is a clinically and genetically heterogeneous disease with highly variable patient outcomes. Recently, Huet et al. proposed a 23-gene expression-based risk score for predicting progression-free survival (PFS) in FL patients treated with rituximab and chemotherapy (Huet et al. Lancet Oncology 2018). The m7-FLIPI risk score has also been described as a clinico-genetic model predicting patient outcomes (Pastore et al. Lancet Oncology 2015). Moreover, EZH2 wild-type status and high expression of the FOXP1 transcription factor are associated with increased risk of lymphoma progression (Mottok et al. Blood 2018). This multitude of prognostic tools in FL raises the question of whether they identify common biology. The aims of this study were to assess whether the 23-gene predictor score identifies a poor-risk group of patients in our own gene expression dataset, and whether commonality exists between the 23-gene score, the m7-FLIPI, EZH2 mutation status and FOXP1 expression. Methods: In our previous work, we generated Illumina DASL microarray expression profiles for 137 FL patients who were treated with rituximab and CVP chemotherapy (cyclophosphamide, vincristine and prednisone). Using genes from the 23-gene linear risk predictor, we determined each patient's risk score by setting coefficients at -1 and +1 for genes associated with favorable and unfavorable PFS, respectively. We dichotomized the distribution of scores using the maximally selected log-rank statistic. We also performed unsupervised hierarchical clustering to identify underlying subgroups in an unbiased fashion. Survival analyses were performed using the log-rank test and Cox regression analyses. We used gene set enrichment analysis to identify concordant differences of relevant gene signatures between specimens with either low or high expression of FOXP1.
Results: Twenty genes from the 23-gene predictor (87%) were identified in the DASL gene expression dataset. The coefficients from univariate Cox regression analysis from our data were correlated with coefficients from Huet et al. (Pearson r = .7, P < .001; Spearman r = .44, P = .051). All poor-risk genes from the 23-gene predictor were associated with poor PFS in our data, and vice versa. Concordantly, calculated risk scores were significantly associated with PFS in the univariate Cox regression analysis (P = .007). Dichotomizing the distribution of risk scores identified 68% of cases with high risk score who had inferior PFS and OS compared to 32% of cases with low risk score (5-year PFS 54% vs. 77%, P = .004; 5-year OS 73% vs. 86%, P = .04). Hence, the risk score stratified patients into groups with diverging outcomes. This association was found to be independent of the Follicular Lymphoma Prognostic Index (FLIPI). In addition, the mean risk score was significantly higher in cases with high expression of FOXP1 (P < .001) and in cases with high m7-FLIPI risk score (P = .023). Unsupervised hierarchical clustering identified two main clusters ("cluster 1" and "cluster 2") that were characterized by low and high expression of genes associated with poor outcome, respectively. Patients from "cluster 2" experienced worse PFS compared to patients in "cluster 1" (P = .046; 5-year PFS 54% vs. 68%). The 5-year OS was 72% for patients in "cluster 2", vs. 81% in "cluster 1" (P = .13). We have previously reported that a germinal centre dark zone signature is enriched in cases with high FOXP1 expression, and the ICA13 signature reported by Huet et al. has been described as being highly expressed in centroblasts. Using gene set enrichment analysis, we found that genes with positive weight and coefficients in the ICA13 and the 23-gene predictor score, respectively, were enriched in the FOXP1-high phenotype (adjusted P = .009 and .005, respectively). GeneMANIA illustrated co-expression interconnectivity among ORAI2, TCF4, AFF3, FOXO1, CXCR4 and FOXP1, suggesting that genes with prognostic significance operate in tightly regulated networks.
Conclusions: Our results exemplify the robustness of the predictor model by Huet et al. Further, we demonstrate biomarker convergence on a common phenotype: FOXP1 expression, EZH2 wild-type status and expression of dark zone-related genes, which characterize a subset of FL cases with adverse outcome following rituximab and chemotherapy.
Disclosures: Sarkozy: Roche/Genentech: Consultancy. Sehn: Roche/Genentech: Consultancy, Honoraria; Amgen: Consultancy, Honoraria; Karyopharm: Consultancy, Honoraria; Lundbeck: Consultancy, Honoraria; Seattle Genetics: Consultancy, Honoraria; Janssen: Consultancy, Honoraria; Abbvie: Consultancy, Honoraria; Celgene: Consultancy, Honoraria; TG Therapeutics: Consultancy, Honoraria; Merck: Consultancy, Honoraria; Morphosys: Consultancy, Honoraria. Weigert: Novartis: Research Funding; Roche: Research Funding. Steidl: Juno Therapeutics: Consultancy; Tioma: Research Funding; Bristol-Myers Squibb: Research Funding; Roche: Consultancy; Seattle Genetics: Consultancy; Nanostring: Patents & Royalties: patent holding.
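The Methods of the abstract above describe a signed risk score: each predictor gene contributes its expression with coefficient +1 if it is associated with unfavorable PFS and -1 if favorable. The sketch below illustrates that arithmetic under stated assumptions. The gene names, expression values, and cutoff are hypothetical placeholders, and the cutoff is passed in directly rather than derived from the maximally selected log-rank statistic used in the study.

```python
def risk_score(expression, unfavorable, favorable):
    """Signed sum of expression values: +1 for unfavorable genes, -1 for favorable.

    Genes in `expression` that belong to neither set are ignored.
    """
    score = 0.0
    for gene, value in expression.items():
        if gene in unfavorable:
            score += value
        elif gene in favorable:
            score -= value
    return score


def dichotomize(scores, cutoff):
    """Label each patient 'high' or 'low' risk relative to a given cutoff.

    Assumption: the cutoff is supplied by the caller. The abstract selects
    it with the maximally selected log-rank statistic, which is not
    implemented here.
    """
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical gene sets and expression profiles, for illustration only.
    unfavorable = {"GENE_A"}
    favorable = {"GENE_B"}
    patients = {
        "patient_1": {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": 0.5},
        "patient_2": {"GENE_A": 0.5, "GENE_B": 1.5, "GENE_C": 3.0},
    }
    scores = {pid: risk_score(expr, unfavorable, favorable)
              for pid, expr in patients.items()}
    print(scores)
    print(dichotomize(scores, cutoff=0.0))
```

Using unit coefficients, as the abstract describes, avoids relying on the fitted weights of the original model when only a subset of the predictor genes is available on a different platform.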


2017 ◽  
pp. 1-10 ◽  
Author(s):  
Tomohiro F. Nishijima ◽  
Jordan Kardos ◽  
Shengjie Chai ◽  
Christof C. Smith ◽  
Dante S. Bortone ◽  
...  

Purpose Claudin-low molecular subtypes have been identified in breast and bladder cancers and are characterized by low expression of claudins, enrichment for epithelial-to-mesenchymal transition (EMT), and tumor-initiating cell (TIC) features. We evaluated whether the claudin-low subtype also exists in gastric cancer. Materials and Methods Four hundred fifteen tumors from The Cancer Genome Atlas (TCGA) gastric cancer mRNA data set were clustered on the claudin, EMT, and TIC gene sets to identify claudin-low tumors. We derived a 24-gene predictor that classifies gastric cancer into claudin-low and non–claudin-low subtypes. This predictor was validated with the Asian Cancer Research Group (ACRG) data set. We characterized molecular and clinical features of claudin-low tumors. Results We identified 46 tumors that had consensus enrichment for claudin-low features in TCGA data set. Claudin-low tumors were most commonly diffuse histologic type (82%) and originally classified as TCGA genomically stable (GS) subtype (78%). Compared with GS subtype, claudin-low subtype had significant activation in Rho family of GTPases signaling, which appears to play a key role in its EMT and TIC properties. In the ACRG data set, 28 of 300 samples were classified as claudin-low tumors by the 24-gene predictor and were phenotypically similar to the initially derived claudin-low tumors. Clinically, claudin-low subtype had the worst overall survival. Of note, the hazard ratios that compared claudin-low versus GS subtype were 2.10 (95% CI, 1.07 to 4.11) in TCGA and 2.32 (95% CI, 1.18 to 4.55) in the ACRG cohorts, with adjustment for age and pathologic stage. Conclusion We identified a gastric claudin-low subtype that carries a poor prognosis likely related to therapeutic resistance as a result of its EMT and TIC phenotypes.


2017 ◽  
Author(s):  
Robert M. Waterhouse ◽  
Mathieu Seppey ◽  
Felipe A. Simão ◽  
Mosè Manni ◽  
Panagiotis Ioannidis ◽  
...  

Abstract Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). Now in its third release, BUSCO utilities extend beyond quality control to applications in comparative genomics, gene predictor training, metagenomics, and phylogenomics.

