Multidimensional Classification of Catalysts in Oxidative Coupling of Methane through Machine Learning and High-Throughput Data

Abstract

Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. Discovering specific genes that serve as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for discovering HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated how robust the identification is across different feature selection techniques.

Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation of feature selection in machine learning methods. We also use several methods to validate the screened biomarkers.

Results: We propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are referred to as the identified biomarkers. We interpret the machine-learning feature selection process using the AIC from statistics. Furthermore, the features selected by backward stepwise logistic regression under AIC minimization are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable, robust biomarker discovery method.

Conclusions: The gene subsets selected by RFE-CV with the six classifiers contain different numbers of features but overlap with one another. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets carry greater biological meaning and implication. Intersecting the biomarkers selected by different classifiers improves the quality of feature selection. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
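The intersection step and the AIC comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the classifier labels, gene names, log-likelihood values, and the helper names `robust_features`, `feature_support`, and `aic` are all hypothetical placeholders, and the feature selection runs themselves (RFE-CV) are assumed to have been done elsewhere. The AIC formula used is the standard one, AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood.

```python
from collections import Counter


def robust_features(selected_subsets):
    """Return the features present in every selected subset, sorted by name."""
    subsets = [set(s) for s in selected_subsets.values()]
    if not subsets:
        return []
    return sorted(set.intersection(*subsets))


def feature_support(selected_subsets):
    """Count how many of the selected subsets each feature appears in."""
    counts = Counter()
    for subset in selected_subsets.values():
        counts.update(set(subset))
    return dict(counts)


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical results of six feature selection runs (placeholder names only).
selected = {
    "classifier_1": ["gene_A", "gene_B", "gene_C", "gene_D"],
    "classifier_2": ["gene_A", "gene_B", "gene_C"],
    "classifier_3": ["gene_A", "gene_B", "gene_D", "gene_E"],
    "classifier_4": ["gene_A", "gene_B"],
    "classifier_5": ["gene_A", "gene_B", "gene_C", "gene_E"],
    "classifier_6": ["gene_A", "gene_B", "gene_F"],
}

# Genes shared by all six subsets are the candidate robust biomarkers.
shared = robust_features(selected)

# Support counts show which genes appear in more of the subsets.
support = feature_support(selected)

# Comparing two hypothetical fitted models: the lower AIC is preferred.
aic_small = aic(log_likelihood=-12.5, n_params=3)
aic_large = aic(log_likelihood=-12.0, n_params=5)
preferred = "small" if aic_small < aic_large else "large"
```

In this toy data the strict intersection keeps only the genes chosen by every run, while the support counts give a softer ranking that could be used if the strict intersection turns out to be empty.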

Revisiting Machine Learning Predictions for Oxidative Coupling of Methane (OCM) based on Literature Data

ChemCatChem, 2020, Vol 12 (23), pp. 5888-5892. DOI: 10.1002/cctc.202001032

Author(s): Shun Nishimura, Junya Ohyama, Takaaki Kinoshita, Son Dinh Le, Keisuke Takahashi

Keyword(s): Machine Learning, Oxidative Coupling, Oxidative Coupling Of Methane

Front Cover: Unveiling Hidden Catalysts for the Oxidative Coupling of Methane based on Combining Machine Learning with Literature Data (ChemCatChem 15/2018)

ChemCatChem, 2018, Vol 10 (15), pp. 3133-3133. DOI: 10.1002/cctc.201801203. Cited By ~ 1

Author(s): Keisuke Takahashi, Itsuki Miyazato, Shun Nishimura, Junya Ohyama

Keyword(s): Machine Learning, Oxidative Coupling, Oxidative Coupling Of Methane, Front Cover

Constructing catalyst knowledge networks from catalyst big data in oxidative coupling of methane for designing catalysts

Chemical Science, 2021. DOI: 10.1039/d1sc04390k

Author(s): Lauren Takahashi, Thanh Nhat Nguyen, Sunao Nakanowatari, Aya Fujiwara, Toshiaki Taniike, ...

Keyword(s): Big Data, High Throughput, Oxidative Coupling, Oxidative Coupling Of Methane, Catalyst Design, Knowledge Networks, New Method, High Throughput Experimentation

Catalyst data created through high-throughput experimentation is transformed into catalyst knowledge networks, leading to a new method of catalyst design where successfully designed catalysts result in high C2 yields during the OCM reaction.
