Identification of the diagnostic signature of sepsis based on bioinformatic analysis of gene expression and machine learning

Background: Sepsis is a life-threatening disease caused by a dysregulated host response to infection and is the major cause of death among patients in the intensive care unit (ICU). Objective: Early diagnosis of sepsis could significantly reduce in-hospital mortality. Although it originates from infection, the development of sepsis follows its own psychological process and rules, and varies with gender, health status and other factors. Hence, analysis of large-scale data with bioinformatic tools and machine learning is a promising approach to exploring methods of early diagnosis. Method: We collected miRNA and mRNA expression data of sepsis blood samples from the Gene Expression Omnibus (GEO) and ArrayExpress databases, screened for differentially expressed genes (DEGs) using R, predicted miRNA targets with the TargetScanHuman and miRTarBase websites, and conducted Gene Ontology (GO) term and KEGG pathway enrichment analyses based on the overlapping DEGs. The STRING database and Cytoscape were used to build a protein-protein interaction (PPI) network and predict hub genes. We then constructed a Random Forest model using the hub genes to assess sample type. Results: Bioinformatic analysis of the GEO dataset revealed 46 overlapping DEGs in sepsis. The PPI network analysis identified five hub genes: SOCS3, KBTBD6, FBXL5, FEM1C and WSB1. The Random Forest model based on these five hub genes was used to assess the GSE95233 and GSE95233 datasets, and the areas under the ROC curve (AUC) were 0.900 and 0.7988, respectively, which confirmed the efficacy of the model. Conclusion: The integrated analysis of gene expression in sepsis and the effective Random Forest model built in this study may provide promising diagnostic methods for sepsis.
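
The AUC values quoted above can be read as the probability that a randomly chosen sepsis sample receives a higher model score than a randomly chosen control. The following is a minimal sketch of that computation, assuming only a list of true labels and classifier scores. The function name `roc_auc` and the toy data are illustrative and not taken from the study, and no Random Forest is fitted here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0 (control) / 1 (case).
    scores: model outputs, higher meaning "more likely a case".
    Tied case/control pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not data from the paper):
# 3 of the 4 case/control pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of samples, which is acceptable for datasets of the size discussed here; a rank-based formula gives the same value more efficiently.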

Data efficient Random Forest model for avalanche forecasting

10.5194/nhess-2019-379 ◽ 2019

Author(s): Manesh Chawla ◽ Amreek Singh

Keyword(s): Machine Learning ◽ Random Forest ◽ Direct Reduction ◽ Random Forest Model ◽ Machine Learning Techniques ◽ Control Structures ◽ Avalanche Forecasting ◽ Forest Model ◽ Avalanche Flow ◽ Data Efficiency

Abstract. Fast downslope release of snow (avalanche) is a serious hazard to people living in snowbound mountains. The released snow mass can gain sufficient momentum on its downslope path to kill humans, uproot trees and rocks, and destroy buildings. Direct reduction of the avalanche threat is done by building control structures that add mechanical support to the snowpack and reduce or deflect downward avalanche flow. On large terrains it is economically infeasible to use these methods at each high-risk site. Therefore, predicting and avoiding avalanches is the only feasible method of reducing the threat, but sufficient snow stability data for accurate forecasting are generally unavailable and difficult to collect. Forecasters infer snow stability from their knowledge of local weather, terrain and sparsely available snowpack observations. This inference process is vulnerable to human bias; therefore, machine learning models are used to find patterns in past data and generate helpful outputs that minimise and quantify uncertainty in the forecasting process. These machine learning techniques require long past records of avalanches, which are difficult to obtain. In this paper we propose a data-efficient Random Forest model to address this problem. The model can generate a descriptive forecast showing reasoning and patterns that are difficult to observe manually. Our model advances the field by being inexpensive and convenient for operational forecasting due to its data efficiency, ease of automation and ability to describe its decisions.

Random Forest Model for Predicting IDH1 Gene Expression Based on Volumetric Texture Parameters of WHO Grade II/III Glioma and Peritumoral Edema

10.21203/rs.3.rs-496762/v1 ◽ 2021

Author(s): Wenting Lan ◽ Zhan Feng ◽ Yan Zhang ◽ ZhengYa Zhao ◽ Yi Huang ◽ ...

Keyword(s): Gene Expression ◽ Random Forest ◽ Random Forest Model ◽ Peritumoral Edema ◽ Wild Type ◽ WHO Grade ◽ Forest Model ◽ Texture Parameters ◽ Grade II ◽ Volumetric Texture

Abstract Background: The incidence of isocitrate dehydrogenase (IDH) gene mutation is closely associated with the development and prognosis of WHO grade II/III glioma. This study aims to establish and evaluate a random forest model for predicting IDH1 gene mutation based on ADC image texture parameters of the parenchyma and peritumoral edema of WHO grade II/III glioma. Materials and Methods: 146 patients (77 males and 69 females) with histologically confirmed anaplastic glioma were divided into training and validation groups in a ratio of 7:3 according to the requirements of the random forest model. The training group consisted of 102 patients (42 IDH1 mutant and 60 wild type) and the validation group included 44 patients (18 IDH1 mutant and 26 wild type). Conventional MRI features of the two independent samples (IDH1 mutant and wild type) were evaluated with the Visually Accessible Rembrandt Images (VASARI) scoring system. Texture analysis (TA) of ADC images was based on the entire tumor volume and peritumoral edema, and principal component analysis (PCA) was used to screen texture feature labels. Random forest diagnosis models (VASARI+TumorADC, VASARI+TumorADC+EdemaADC) were constructed on the basis of morphological single-factor variables and texture feature labels. Result: For the random forest diagnosis model (VASARI+TumorADC), the diagnostic accuracy was 71.5%, the specificity was 75.40%, and the AUC was 0.769; for the model (VASARI+TumorADC+EdemaADC), the corresponding values were 80.9%, 79.5%, and 0.819. Conclusion: The texture parameters of peritumoral edema ADC images are non-invasive markers for predicting IDH1 mutational status, and they play a role in improving the efficiency of the diagnostic model.
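
The abstract does not say how the 7:3 split was drawn. One way to arrive at the reported group sizes is a stratified split, which samples 70% of each class separately; the sketch below assumes this. The function name `stratified_split`, the seed, and the label strings are illustrative choices, not details from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), sampling train_frac of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_train = round(len(members) * train_frac)
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Class totals derived from the abstract: 42 + 18 = 60 IDH1 mutant,
# 60 + 26 = 86 wild type.
labels = ["mutant"] * 60 + ["wild"] * 86
train, valid = stratified_split(labels)
print(len(train), len(valid))
print(sum(labels[i] == "mutant" for i in train),
      sum(labels[i] == "mutant" for i in valid))
```

Splitting per class keeps the mutant-to-wild-type ratio nearly identical in both groups, which matters when the validation group is as small as 44 patients.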

Integrated Bioinformatic Analysis of Key Biomarkers and Signaling Pathways in Psoriasis

10.21203/rs.3.rs-421570/v1 ◽ 2021

Author(s): Suwei Tang ◽ Ping Xu ◽ Shaoqiong Xie ◽ Wencheng Jiang ◽ Jiajing Lu ◽ ...

Keyword(s): Gene Expression ◽ Signaling Pathways ◽ Immune Cell ◽ Cell Types ◽ Bioinformatic Analysis ◽ Receptor Interaction ◽ PPI Network ◽ Hub Genes ◽ Diagnostic Efficacy ◽ Network Analyses

Abstract Background: Psoriasis is a relatively common autoimmune inflammatory skin disease with a chronic etiology. The present study was designed to detect novel biomarkers and pathways associated with psoriasis incidence. Methods: Differentially expressed genes (DEGs) associated with psoriasis in the Gene Expression Omnibus (GEO) database were identified, and their functional roles and interactions were then annotated and evaluated through GO, KEGG, and gene set variation (GSVA) analyses. In addition, the STRING database was leveraged to construct a protein-protein interaction (PPI) network, and key hub genes from this network were validated as being relevant through receiver operating characteristic (ROC) curve analyses of three additional GEO datasets. The CIBERSORT database was additionally used to assess the relationship between these gene expression-related findings and immune cell infiltration. Results: In total 197 psoriasis-related DEGs were identified and found to primarily be associated with the NOD-like receptor, IL-17, and cytokine-cytokine receptor interaction signaling pathways. GSVA revealed significant differences between normal and lesional groups (P < 0.05), while PPI network analyses identified CXCL10 as the hub gene with the highest degree value, whereas IRF7, IFIT3, OAS1, GBP1, and ISG15 were promising candidate genes for the therapeutic treatment of psoriasis. ROC analyses confirmed that these 6 hub genes exhibited good diagnostic efficacy (AUC > 70%), and were predicted to be associated with increased sensitivity to 10 drugs (P < 0.01). The CIBERSORT database further predicted that these hub genes were associated with infiltration by 22 different immune cell types. Conclusion: These results offer a robust foundation for future studies of the molecular basis for psoriasis, potentially guiding efforts to treat this common and disruptive disease.

Machine Learning Models That Integrate Tumor Texture and Perfusion Characteristics Using Low-Dose Breast Computed Tomography Are Promising for Predicting Histological Biomarkers and Treatment Failure in Breast Cancer Patients

Cancers ◽ 10.3390/cancers13236013 ◽ 2021 ◽ Vol 13 (23) ◽ pp. 6013

Author(s): Hyun-Soo Park ◽ Kwang-sig Lee ◽ Bo-Kyoung Seo ◽ Eun-Sil Kim ◽ Kyu-Ran Cho ◽ ...

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Random Forest ◽ Cancer Patients ◽ Low Dose ◽ Random Forest Model ◽ Breast Cancer Patients ◽ Learning Models ◽ Forest Model ◽ Machine Learning Models

This prospective study enrolled 147 women with invasive breast cancer who underwent low-dose breast CT (80 kVp, 25 mAs, 1.01–1.38 mSv) before treatment. From each tumor, we extracted eight perfusion parameters using the maximum slope algorithm and 36 texture parameters using the filtered histogram technique. Relationships between CT parameters and histological factors were analyzed using five machine learning algorithms. Performance was compared using the area under the receiver-operating characteristic curve (AUC) with the DeLong test. The AUCs of the machine learning models increased when using both features instead of the perfusion or texture features alone. The random forest model that integrated texture and perfusion features was the best model for prediction (AUC = 0.76). In the integrated random forest model, the AUCs for predicting human epidermal growth factor receptor 2 positivity, estrogen receptor positivity, progesterone receptor positivity, ki67 positivity, high tumor grade, and molecular subtype were 0.86, 0.76, 0.69, 0.65, 0.75, and 0.79, respectively. Entropy of pre- and postcontrast images and perfusion, time to peak, and peak enhancement intensity of hot spots are the five most important CT parameters for prediction. In conclusion, machine learning using texture and perfusion characteristics of breast cancer with low-dose CT has potential value for predicting prognostic factors and risk stratification in breast cancer patients.

Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning

Mathematical Biosciences and Engineering ◽ 10.3934/mbe.2021443 ◽ 2021 ◽ Vol 18 (6) ◽ pp. 8997-9015

Author(s): Ahmed Hammad ◽ Mohamed Elshaer ◽ Xiuwen Tang ◽ ...

Keyword(s): Gene Expression ◽ Colorectal Cancer ◽ Machine Learning ◽ ROC Curve ◽ Functional Enrichment ◽ Differentially Expressed ◽ PPI Network ◽ Hub Genes ◽ Survival Analyses ◽ Potential Biomarkers

Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform for studying the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes, which were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve and survival analyses to explore their diagnostic value. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data presented in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of the biomarkers in CRC diagnosis, with the area under the ROC curve (AUC) of these genes exceeding 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts. Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.

Machine Learning Approaches to Radiogenomics of Breast Cancer using Low-Dose Perfusion Computed Tomography: Predicting Prognostic Biomarkers and Molecular Subtypes

Scientific Reports ◽ 10.1038/s41598-019-54371-z ◽ 2019 ◽ Vol 9 (1) ◽ Cited By ~ 2

Author(s): Eun Kyung Park ◽ Kwang-sig Lee ◽ Bo Kyoung Seo ◽ Kyu Ran Cho ◽ Ok Hee Woo ◽ ...

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Random Forest ◽ Invasive Breast Cancer ◽ Low Dose ◽ Molecular Subtypes ◽ Random Forest Model ◽ Prognostic Biomarkers ◽ Learning Approaches ◽ Forest Model

Abstract. Radiogenomics investigates the relationship between imaging phenotypes and genetic expression. Breast cancer is a heterogeneous disease that manifests complex genetic changes and varied prognoses and treatment responses. We investigate the value of machine learning approaches to radiogenomics using low-dose perfusion computed tomography (CT) to predict prognostic biomarkers and molecular subtypes of invasive breast cancer. This prospective study enrolled a total of 723 cases involving 241 patients with invasive breast cancer. The 18 CT parameters of cancers were analyzed using 5 machine learning models to predict lymph node status, tumor grade, tumor size, hormone receptors, HER2, Ki67, and the molecular subtypes. The random forest model was the best model in terms of accuracy and the area under the receiver-operating characteristic curve (AUC). On average, the random forest model had 13% higher accuracy and 0.17 higher AUC than the logistic regression. The most important CT parameters in the random forest model for prediction were peak enhancement intensity (Hounsfield units), time to peak (seconds), blood volume permeability (mL/100 g), and perfusion of tumor (mL/min per 100 mL). Machine learning approaches to radiogenomics using low-dose perfusion breast CT are a useful noninvasive tool for predicting prognostic biomarkers and molecular subtypes of invasive breast cancer.

Characterization of Molecular Cluster Detection and Evaluation of Cluster Investigation Criteria Using Machine Learning Methods and Statewide Surveillance Data in Washington State

Viruses ◽ 10.3390/v12020142 ◽ 2020 ◽ Vol 12 (2) ◽ pp. 142 ◽ Cited By ~ 1

Author(s): Steven J. Erly ◽ Joshua T. Herbeck ◽ Roxanne P. Kerani ◽ Jennifer R. Reuer

Keyword(s): Machine Learning ◽ Random Forest ◽ HIV Transmission ◽ White Male ◽ Washington State ◽ Cluster Detection ◽ Random Forest Model ◽ Molecular Cluster ◽ Forest Model ◽ Sex With Men

Molecular cluster detection can be used to interrupt HIV transmission but is dependent on identifying clusters where transmission is likely. We characterized molecular cluster detection in Washington State, evaluated the current cluster investigation criteria, and developed a criterion using machine learning. The population living with HIV (PLWH) in Washington State, those with an analyzable genotype sequence, and those in clusters were described across demographic characteristics from 2015 to 2018. The relationships between 3- and 12-month cluster growth and demographic, clinical, and temporal predictors were described, and a random forest model was fit using data from 2016 to 2017. The ability of this model to identify clusters with future transmission was compared to the Centers for Disease Control and Prevention (CDC) and Washington State criteria in 2018. The population with a genotype was similar to all PLWH, but people in a cluster were disproportionately white, male, and men who have sex with men. The clusters selected for investigation by the random forest model grew on average 2.3 cases (95% CI 1.1–1.4) in 3 months, which was not significantly larger than the CDC criteria (2.0 cases, 95% CI 0.5–3.4). Disparities in the cases analyzed suggest that molecular cluster detection may not benefit all populations. Jurisdictions should use auxiliary data sources for prediction or continue using established investigation criteria.

Accurate prediction of birth implementing a statistical model through the determination of steroid hormones in saliva

Scientific Reports ◽ 10.1038/s41598-021-84924-0 ◽ 2021 ◽ Vol 11 (1)

Author(s): Silvia Alonso ◽ Sara Cáceres ◽ Daniel Vélez ◽ Luis Sanz ◽ Gema Silvan ◽ ...

Keyword(s): Machine Learning ◽ Random Forest ◽ Area Under The Curve ◽ Machine Learning Algorithms ◽ Random Forest Model ◽ Forest Model ◽ Spontaneous Labour ◽ Hormonal Mechanism ◽ First Time ◽ Estrone Sulphate

Abstract. Steroidal hormone interaction in pregnancy is crucial for adequate fetal evolution and preparation for childbirth and extrauterine life. Estrone sulphate, estriol, progesterone and cortisol play important roles in the initiation of the labour mechanism at the start of contractions and cervical effacement. However, their interaction remains uncertain. Although several studies regarding the hormonal mechanism of labour have been reported, the prediction of the date of birth remains a challenge. In this study, we present for the first time machine learning algorithms for predicting whether spontaneous labour will occur from week 37 onwards. Estrone sulphate, estriol, progesterone and cortisol were analysed by enzyme-immunoassay (EIA) techniques in saliva samples collected from 106 pregnant women from week 34 onwards. We compared a random forest model with a traditional logistic regression on a dataset constructed from the observed values of these measures. We observed that the results, evaluated in terms of accuracy and area under the curve (AUC), are appreciably better with the random forest model. For this reason, we consider that machine learning methods make an important contribution to obstetric practice.

Utilizing physics-based input features within a machine learning model to predict wind speed forecasting error

Wind Energy Science ◽ 10.5194/wes-6-295-2021 ◽ 2021 ◽ Vol 6 (1) ◽ pp. 295-309

Author(s): Daniel Vassallo ◽ Raghavendra Krishnamurthy ◽ Harindra J. S. Fernando

Keyword(s): Machine Learning ◽ Wind Speed ◽ Random Forest ◽ Hybrid Model ◽ ARIMA Model ◽ Time Of Day ◽ Random Forest Model ◽ Forest Model ◽ Forecasting Error ◽ Atmospheric Variables

Abstract. Machine learning is quickly becoming a commonly used technique for wind speed and power forecasting. Many machine learning methods utilize exogenous variables as input features, but there remains the question of which atmospheric variables are most beneficial for forecasting, especially in handling non-linearities that lead to forecasting error. This question is addressed via creation of a hybrid model that utilizes an autoregressive integrated moving-average (ARIMA) model to make an initial wind speed forecast followed by a random forest model that attempts to predict the ARIMA forecasting error using knowledge of exogenous atmospheric variables. Variables conveying information about atmospheric stability and turbulence as well as inertial forcing are found to be useful in dealing with non-linear error prediction. Streamwise wind speed, time of day, turbulence intensity, turbulent heat flux, vertical velocity, and wind direction are found to be particularly useful when used in unison for hourly and 3 h timescales. The prediction accuracy of the developed ARIMA–random forest hybrid model is compared to that of the persistence and bias-corrected ARIMA models. The ARIMA–random forest model is shown to improve upon the latter commonly employed modeling methods, reducing hourly forecasting error by up to 5 % below that of the bias-corrected ARIMA model and achieving an R² value of 0.84 with true wind speed.
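
The hybrid structure described above, a base forecast plus a second model trained on the base model's errors, can be sketched in a few lines. To stay dependency-free, the sketch below swaps in a persistence forecast for ARIMA and a one-variable least-squares line for the random forest. Both substitutions, the function names, and the synthetic data are assumptions made for illustration and are not the authors' setup.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hybrid_forecasts(speeds, exog):
    """Base forecast plus a learned correction for its error.

    speeds[t] is the observed wind speed and exog[t] an exogenous
    variable known at time t. Returns (base, corrected) forecasts
    for t = 1 .. len(speeds) - 1.
    """
    base = speeds[:-1]  # persistence stand-in: forecast(t) = speed(t - 1)
    errors = [obs - f for obs, f in zip(speeds[1:], base)]
    slope, intercept = fit_line(exog[1:], errors)  # stand-in error model
    corrected = [f + slope * x + intercept for f, x in zip(base, exog[1:])]
    return base, corrected


def mean_abs_error(observed, forecast):
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(forecast)


# Synthetic series in which the persistence error is exactly 2 * exog + 1.
exog = [0.0, 0.5, -1.0, 1.5, 0.25, -0.75]
speeds = [8.0]
for x in exog[1:]:
    speeds.append(speeds[-1] + 2.0 * x + 1.0)

base, corrected = hybrid_forecasts(speeds, exog)
print(mean_abs_error(speeds[1:], base), mean_abs_error(speeds[1:], corrected))
```

The error model here is fitted and evaluated on the same points, so the example only shows the mechanics of the two-stage design; a real comparison would hold out data for evaluation.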

Prediction Of Plastic Degrading Microbes

10.1101/2021.08.01.454681 ◽ 2021

Author(s): Hemalatha N ◽ Akhil Wilson ◽ Akhil Thankachan

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Random Forest ◽ Decision Tree ◽ Nearest Neighbor ◽ Random Forest Model ◽ Support Vector ◽ K Nearest Neighbor ◽ Plastic Pollution ◽ Forest Model

Plastic pollution is one of the most challenging environmental problems, yet life without plastic is hard to imagine. This paper deals with the prediction of plastic-degrading microbes using machine learning. We used the Decision Tree, Random Forest, Support Vector Machine and K Nearest Neighbor algorithms to predict plastic-degrading microbes. Among the four classifiers, the Random Forest model gave the best accuracy, 99.1%.
