Prediction of oxygen requirement in COVID-19 patients using dynamic change of inflammatory markers: CRP, hypertension, age, neutrophil and lymphocyte (CHANeL)

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Eunyoung Emily Lee ◽  
Woochang Hwang ◽  
Kyoung-Ho Song ◽  
Jongtak Jung ◽  
Chang Kyung Kang ◽  
...  

Abstract The objective of the study was to develop and validate a prediction model that identifies COVID-19 patients at risk of requiring oxygen support based on five parameters: C-reactive protein (CRP), hypertension, age, and neutrophil and lymphocyte counts (CHANeL). This retrospective cohort study included 221 consecutive COVID-19 patients, who were randomly assigned to a training set and a test set in a ratio of 1:1. Logistic regression, logistic LASSO regression, Random Forest, Support Vector Machine, and XGBoost analyses were performed based on age, hypertension status, and serial CRP, neutrophil, and lymphocyte counts during the first 3 days of hospitalization. The ability of each model to predict oxygen requirement during hospitalization was tested. During hospitalization, 45 (41.8%) patients in the training set (n = 110) and 41 (36.9%) in the test set (n = 111) required supplementary oxygen support. The logistic LASSO regression model exhibited the highest AUC for the test set, with a sensitivity of 0.927 and a specificity of 0.814. An online risk calculator for oxygen requirement using the CHANeL predictors was developed. The CHANeL prediction models, based on serial CRP, neutrophil, and lymphocyte counts during the first 3 days of hospitalization along with age and hypertension status, provide a reliable estimate of the risk of supplemental oxygen requirement among patients hospitalized with COVID-19.
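As a rough illustration of the approach described in this abstract, the following is a minimal sketch of an L1-penalized (LASSO) logistic model for oxygen-requirement risk with a 1:1 train/test split. The file name, column names, and regularization strength are assumptions for illustration, not the study's actual data dictionary or tuned model.

```python
# Minimal sketch of a logistic LASSO model for oxygen-requirement risk, assuming
# a feature table with age, hypertension status, and serial CRP, neutrophil and
# lymphocyte values for days 1-3. All names below are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

df = pd.read_csv("chanel_cohort.csv")          # hypothetical cohort file
feature_cols = ["age", "hypertension"] + [
    f"{m}_day{d}" for m in ("crp", "neutrophil", "lymphocyte") for d in (1, 2, 3)
]
X, y = df[feature_cols], df["oxygen_required"]

# 1:1 split, mirroring the study design
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

# LASSO = logistic regression with an L1 penalty; C controls penalty strength
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="liblinear", C=1.0))
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
print("test AUC:", roc_auc_score(y_te, prob))
```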

2011 ◽  
Vol 460-461 ◽  
pp. 667-672
Author(s):  
Yun Zhao ◽  
Xing Xu ◽  
Yong He

The main objective of this paper is to classify four kinds of automobile lubricant by near-infrared (NIR) spectral technology and to observe whether NIR spectroscopy could be used for predicting water content. Principal component analysis (PCA) was applied to reduce the information from the spectral data, and the first two PCs were used to cluster the samples. Partial least squares (PLS), least squares support vector machine (LS-SVM), and Gaussian process classification (GPC) were employed to develop prediction models. There were 120 samples for the training set and test set. Two LS-SVM models, with the first five PCs and the first six PCs respectively, were built; the accuracy of the model with five PCs was adequate while requiring less calculation. The results of the experiment indicate that the LS-SVM model outperforms the PLS model and the GPC model outperforms the LS-SVM model.
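A minimal sketch of the PCA-plus-classifier pipeline follows. scikit-learn has no LS-SVM implementation, so a standard RBF-kernel SVC stands in for it; the spectra file, label column, and kernel parameters are hypothetical.

```python
# Sketch: PCA on NIR spectra followed by a kernel SVM classifier trained on the
# first five principal components. SVC is a stand-in for LS-SVM; file and column
# names are hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

spectra = pd.read_csv("nir_spectra.csv")       # rows = samples, columns = wavelengths + label
X = spectra.drop(columns="lubricant_type").values
y = spectra["lubricant_type"].values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf", C=10.0))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```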


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yuqi Wang ◽  
Liangxu Wang ◽  
Yanli Sun ◽  
Miao Wu ◽  
Yingjie Ma ◽  
...  

Abstract Background Osteoporosis is a gradually recognized health problem with risks related to disease history and living habits. This study aims to establish the optimal prediction model by comparing the performance of four prediction models that incorporate disease history and living habits in predicting the risk of osteoporosis in Chongqing adults. Methods We conducted a cross-sectional survey with convenience sampling. From January 2019 to December 2019, we used a questionnaire to collect data on disease history and living habits of adults who underwent dual-energy X-ray absorptiometry. We established the prediction models of osteoporosis in three steps. First, we performed feature selection to identify risk factors related to osteoporosis. Second, the qualified participants were randomly divided into a training set and a test set in the ratio of 7:3. The prediction models of osteoporosis were then established based on an Artificial Neural Network (ANN), a Deep Belief Network (DBN), a Support Vector Machine (SVM) and a combinatorial heuristic method (Genetic Algorithm - Decision Tree, GA-DT). Finally, we compared the prediction models' performance through accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC) to select the optimal prediction model. Results The univariate logistic model found that taking calcium tablets (odds ratio [OR] = 0.431), SBP (OR = 1.010), fracture (OR = 1.796), coronary heart disease (OR = 4.299), drinking alcohol (OR = 1.835), physical exercise (OR = 0.747) and other factors were related to the risk of osteoporosis. The AUCs of the training and test sets of the prediction models based on ANN, DBN, SVM and GA-DT were 0.901 and 0.762; 0.622 and 0.618; 0.698 and 0.627; and 0.744 and 0.724, respectively. After evaluating the four models' performance, we selected a three-layer back-propagation neural network (BPNN) with 18, 4, and 1 neurons in the input, hidden and output layers, respectively, as the optimal prediction model. When the predicted probability was greater than 0.330, osteoporosis was predicted to occur. Conclusions Compared with DBN, SVM and GA-DT, the established ANN model had the best prediction ability and can be used to predict the risk of osteoporosis in physical examinations of the Chongqing population. The model needs to be further improved through large-sample research.
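The following sketch reproduces the 7:3 split and the small 18-4-1 back-propagation network described above, using scikit-learn's MLPClassifier as a stand-in for the authors' ANN. The survey file, the 18 selected features, and the training hyperparameters are assumptions; only the 0.330 decision threshold is taken from the abstract.

```python
# Sketch: 7:3 split and an 18-4-1 back-propagation network (MLPClassifier as a
# stand-in). The CSV file and feature names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

df = pd.read_csv("osteoporosis_survey.csv")    # 18 selected risk factors + label
X, y = df.drop(columns="osteoporosis"), df["osteoporosis"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

net = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0))
net.fit(X_tr, y_tr)

prob = net.predict_proba(X_te)[:, 1]
pred = (prob > 0.330).astype(int)              # decision threshold reported in the abstract
print("test AUC:", roc_auc_score(y_te, prob))
```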


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 2032-2032
Author(s):  
Protiva Rahman ◽  
Michele LeNoue-Newton ◽  
Sandip Chaugai ◽  
Marilyn Holt ◽  
Neha M Jain ◽  
...  

2032 Background: 30-50% of patients with non-early NSCLC will eventually develop brain metastases (BM), with a median survival of less than one year from BM diagnosis. There are no widely accepted clinical risk models for development of BM in patients without them at baseline. We predicted the binary risk of BM using clinical and genetic factors from a large multi-institutional cohort. Methods: Stage II-IV NSCLC patients from the AACR Project GENIE Biopharma Consortium dataset were eligible. This dataset was curated by 4 academic institutions from the clinical records of patients who had somatic next-generation tumor sequencing (NGS) between 2015-2017. We excluded patients who had BM at baseline, died within 30 days of NSCLC diagnosis, or did not undergo brain imaging. Covariates included demographics, anticancer therapies (received up to 90 days prior to BM development and within 5 years of NSCLC diagnosis), and NGS data; radiotherapy (RT) data were not available. NGS features included mutations and copy number alterations, restricted to those classified as oncogenic by OncoKB. Univariate feature selection with Fisher’s test (p<.1) was performed on medication and genetic features. We compared 5 different machine learning models for prediction: random forest (RF), support vector machine (SVM), lasso regression, ridge regression, and an ensemble classifier. We split our data into training and test sets, and 10-fold cross-validation was done on the training set for parameter tuning. The area under the receiver operating characteristic curve (AUC) is reported on the test set. Results: 956 patients were included, 192 (20%) in the test set. Univariate features associated with BM were treatment with etoposide, Asian race, presence of bone metastases at NSCLC diagnosis, mutations in TP53 and EGFR, amplifications of ERBB2 and EGFR, and deletions of RB1, CDKN2A and CDKN2B. Univariate features inversely associated with BM were older age; treatment with nivolumab, vinorelbine, alectinib, pembrolizumab, atezolizumab, and gemcitabine; and mutations in NOTCH1 and KRAS. Ridge regression had the best AUC, 0.73 (Table). Conclusions: We achieved reasonable prediction performance using commonly obtained clinical and genomic information in non-early NSCLC. The biologic role of the associated alterations deserves further scrutiny; this study replicates similar findings for EGFR and KRAS in a much smaller cohort. Certain subsets of NSCLC patients may benefit from increased surveillance for BM and transition to drug therapies known to effectively cross the blood-brain barrier, e.g., nivolumab and alectinib. Inclusion of additional covariates, e.g., brain RT, may further improve model performance. [Table: see text]
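A sketch of the screening-plus-ridge pipeline described here is shown below: univariate Fisher's exact testing (p < 0.1) over binary covariates, then an L2-penalized logistic model tuned by 10-fold cross-validation on the training set. The data file, column names, and grid of penalty values are assumptions.

```python
# Sketch: Fisher's exact test screening of binary covariates (p < 0.1), then an
# L2-penalized ("ridge") logistic model tuned by 10-fold CV. Hypothetical data.
import pandas as pd
from scipy.stats import fisher_exact
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("nsclc_cohort.csv")           # binary medication/genomic features + "brain_met"
y = df["brain_met"]
candidates = [c for c in df.columns if c != "brain_met"]

# keep features whose 2x2 table with the outcome gives p < 0.1
selected = []
for c in candidates:
    table = pd.crosstab(df[c], y)
    if table.shape == (2, 2) and fisher_exact(table)[1] < 0.1:
        selected.append(c)

X_tr, X_te, y_tr, y_te = train_test_split(df[selected], y, test_size=0.2,
                                          stratify=y, random_state=0)
ridge = LogisticRegressionCV(Cs=10, cv=10, penalty="l2", scoring="roc_auc", max_iter=5000)
ridge.fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, ridge.predict_proba(X_te)[:, 1]))
```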


2020 ◽  
pp. 410-423
Author(s):  
Prabhu RV Shankar ◽  
Anupama Kesari ◽  
Priya Shalini ◽  
N. Kamalashree ◽  
Charan Bharadwaj ◽  
...  

As part of a data mining competition, a training set and a test set of laboratory test data from patients with and without surgical site infection (SSI) were provided. The task was to develop predictive models on the training set and identify patients with SSI in the unlabeled test set. Lab test results are vital resources that guide healthcare providers in making decisions about all aspects of surgical patient management. Many machine learning models were developed after pre-processing and imputing the lab test data, and only the top-performing methods are discussed. Overall, Random Forest (RF) algorithms performed better than Support Vector Machine and Logistic Regression. Using a set of 74 lab tests, the RF model produced only 4 false positives in the training set and predicted 35 out of 50 SSI patients in the test set (accuracy 0.86, sensitivity 0.68, and specificity 0.91). Optimal ways to address healthcare data quality concerns and imputation methods, as well as newer generalizable algorithms, need to be explored further to decipher new associations and knowledge among laboratory biomarkers and SSI.
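A compact sketch of this kind of pipeline, with median imputation of lab values, a Random Forest, and the same performance metrics, is given below; the file name, label column, and forest size are illustrative assumptions.

```python
# Sketch: median imputation of lab-test values followed by a Random Forest,
# reporting accuracy, sensitivity and specificity. Hypothetical file and label.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

df = pd.read_csv("ssi_lab_tests.csv")          # 74 lab-test columns + "ssi" label
X, y = df.drop(columns="ssi"), df["ssi"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = make_pipeline(SimpleImputer(strategy="median"),
                   RandomForestClassifier(n_estimators=500, random_state=0))
rf.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, rf.predict(X_te)).ravel()
print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
```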


SPE Journal ◽  
2018 ◽  
Vol 23 (04) ◽  
pp. 1075-1089 ◽  
Author(s):  
Jared Schuetter ◽  
Srikanta Mishra ◽  
Ming Zhong ◽  
Randy LaFollette (ret.)

Summary Considerable amounts of data are being generated during the development and operation of unconventional reservoirs. Statistical methods that can provide data-driven insights into production performance are gaining in popularity. Unfortunately, the application of advanced statistical algorithms remains somewhat of a mystery to petroleum engineers and geoscientists. The objective of this paper is to provide some clarity to this issue, focusing on how to build robust predictive models and how to develop decision rules that help identify factors separating good wells from poor performers. The data for this study come from wells completed in the Wolfcamp Shale Formation in the Permian Basin. Data categories used in the study included well location and assorted metrics capturing various aspects of well architecture, well completion, stimulation, and production. Predictive models for the production metric of interest are built using simple regression and other advanced methods such as random forests (RFs), support-vector regression (SVR), gradient-boosting machine (GBM), and multidimensional Kriging. The data-fitting process involves splitting the data into a training set and a test set, building a regression model on the training set and validating it with the test set. Repeated application of a “cross-validation” procedure yields valuable information regarding the robustness of each regression-modeling approach. Furthermore, decision rules that can identify extreme behavior in production wells (i.e., top x% of the wells vs. bottom x%, as ranked by the production metric) are generated using the classification and regression-tree algorithm. The resulting decision tree (DT) provides useful insights regarding what variables (or combinations of variables) can drive production performance into such extreme categories. The main contributions of this paper are to provide guidelines on how to build robust predictive models, and to demonstrate the utility of DTs for identifying factors responsible for good vs. poor wells.
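The workflow summarized above pairs cross-validated regression models with a classification tree for extreme wells. The sketch below shows that two-part structure with a gradient-boosting regressor and a shallow decision tree on top-decile versus bottom-decile wells; the well data file, column names, and 10% cut-offs are assumptions.

```python
# Sketch: cross-validated gradient boosting for the production metric, plus a
# shallow classification tree separating top-decile from bottom-decile wells.
# File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

wells = pd.read_csv("wolfcamp_wells.csv")      # completion/stimulation variables + "prod_metric"
X, y = wells.drop(columns="prod_metric"), wells["prod_metric"]

# repeated cross-validation gives a sense of the robustness of the regression model
gbm = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(gbm, X, y, cv=5, scoring="r2")
print("CV R^2: %.2f +/- %.2f" % (scores.mean(), scores.std()))

# label extreme wells (top 10% vs bottom 10%) and fit a small tree to extract rules
lo, hi = y.quantile([0.10, 0.90])
extreme = wells[(y <= lo) | (y >= hi)]
labels = (extreme["prod_metric"] >= hi).astype(int)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(extreme.drop(columns="prod_metric"), labels)
print(export_text(tree, feature_names=list(X.columns)))
```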


2007 ◽  
Vol 06 (03) ◽  
pp. 495-509 ◽  
Author(s):  
GUI-KAI YAN ◽  
JUN-JIE LI ◽  
BING-RUI LI ◽  
JIA HU ◽  
WEN-PING GUO

A support vector machine (SVM) is used to predict the enthalpies of formation at 298 K for 261 molecules based on B3LYP/6-311g(3df,2p) results. With the data randomly separated into two parts, 195 molecules for the training set and 66 for the test set, the resulting mean absolute deviation (MAD) and maximum deviation (MD) for the training set are 1.51 kcal/mol and 9.23 kcal/mol (correlation coefficient R = 0.9995), and for the test set they are 1.78 kcal/mol and 7.31 kcal/mol (R = 0.9990). The results are improved compared with the G2 method.
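A minimal sketch of SVM regression on quantum-chemical descriptors, with the same 195/66 split and MAD/MD evaluation, might look as follows; the descriptor file, target column, and kernel parameters are assumptions.

```python
# Sketch: support vector regression on DFT-derived descriptors, reporting mean
# absolute deviation (MAD) and maximum deviation (MD). Hypothetical data file.
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

data = pd.read_csv("b3lyp_descriptors.csv")    # molecular descriptors + enthalpy target
X, y = data.drop(columns="dHf_298K"), data["dHf_298K"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=195, test_size=66, random_state=0)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.5))
svr.fit(X_tr, y_tr)

err = np.abs(svr.predict(X_te) - y_te)
print("test MAD:", err.mean(), "kcal/mol")
print("test MD :", err.max(), "kcal/mol")
```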


Author(s):  
Ade Nurhopipah ◽  
Uswatun Hasanah

The performance of classification models in machine learning algorithms is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study presents a comparison of four dataset splitting techniques, namely Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV) and Moralis Lima Martin Validation (MLMV). The comparison is carried out on face classification from CCTV images using the Convolutional Neural Network (CNN) and Support Vector Machine (SVM) algorithms, and is applied to two image datasets. The results are reviewed using model accuracy on the training set, validation set and test set, as well as the bias and variance of the model. The experiment shows that the k-FCV technique has more stable performance and provides high accuracy on the training set as well as good generalization on the validation set and test set. Meanwhile, data splitting using the MLMV technique performs worse than the other three techniques, since it yields lower accuracy. This technique also shows higher bias and variance values and builds overfitting models, especially when applied on the validation set.
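To make the contrast between splitting strategies concrete, the sketch below compares k-fold cross-validation with a single random sub-sampling hold-out for an SVM classifier; the pre-extracted face feature arrays and the SVM parameters are assumptions.

```python
# Sketch: k-fold cross-validation (k-FCV) versus a single random sub-sampling
# split (RSV) for an SVM classifier. Feature/label arrays are hypothetical.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X = np.load("face_features.npy")               # hypothetical pre-extracted features
y = np.load("face_labels.npy")

clf = SVC(kernel="rbf", C=10.0)

# k-FCV: every sample serves as validation data exactly once
kfcv = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print("5-FCV accuracy: %.3f +/- %.3f" % (kfcv.mean(), kfcv.std()))

# RSV: one random hold-out split; estimates vary more between runs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
print("RSV accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
```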


Author(s):  
Botao Jiang ◽  
Fuyu Zhao

Critical heat flux (CHF) is one of the most crucial design criteria in boiling systems such as evaporators, steam generators, fuel cooling systems and boilers. This paper presents an alternative CHF prediction method named projection support vector regression (PSVR), which is a combination of the feature vector selection (FVS) method and support vector regression (SVR). In PSVR, the FVS method is first used to select a relevant subset (feature vectors, FVs) from the training data; then both the training data and the test data are projected into the subspace constructed by the FVs; and finally SVR is applied to estimate the projected data. An available CHF dataset taken from the literature is used in this paper. The CHF data are split into two subsets, the training set and the test set. The training set is used to train the PSVR model and the test set is then used to evaluate the trained model. The predicted results of PSVR are compared with those of artificial neural networks (ANNs). The parametric trends of CHF are also investigated using the PSVR model. It is found that the results of the proposed method not only fit the general understanding but also agree well with the experimental data. Thus, PSVR can be used successfully for the prediction of CHF, in contrast to ANNs.
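The sketch below is a simplified stand-in for the PSVR idea under stated assumptions: the projection onto a subspace spanned by a subset of training points in kernel space is approximated with scikit-learn's Nystroem transformer rather than the authors' exact FVS algorithm, and SVR is then applied to the projected features. The CHF data file and column names are hypothetical.

```python
# Simplified PSVR-like sketch: project data into a subspace spanned by a subset
# of training points in kernel space (Nystroem approximation, not the authors'
# FVS algorithm), then regress with SVR. Hypothetical data file and columns.
import pandas as pd
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

chf = pd.read_csv("chf_dataset.csv")           # pressure, mass flux, quality, ... + "chf"
X, y = chf.drop(columns="chf"), chf["chf"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

psvr_like = make_pipeline(StandardScaler(),
                          Nystroem(kernel="rbf", n_components=50, random_state=0),
                          SVR(kernel="linear", C=10.0))
psvr_like.fit(X_tr, y_tr)
print("test MAPE:", mean_absolute_percentage_error(y_te, psvr_like.predict(X_te)))
```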


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Qing Ning ◽  
Dali Wang ◽  
Fei Cheng ◽  
Yuheng Zhong ◽  
Qi Ding ◽  
...  

Abstract Background Mutations in an enzyme target are one of the most common mechanisms whereby antibiotic resistance arises. Identification of the resistance mutations in bacteria is essential for understanding the structural basis of antibiotic resistance and for the design of new drugs. However, the traditionally used experimental approaches to identify resistance mutations are usually labor-intensive and costly. Results We present a machine learning (ML)-based classifier for predicting rifampicin (Rif) resistance mutations in the bacterial RNA polymerase subunit β (RpoB). A total of 186 mutations were gathered from the literature for developing the classifier, using 80% of the data as the training set and the rest as the test set. The features of the mutated RpoB and their binding energies with Rif were calculated through computational methods and used as the mutation attributes for modeling. Classifiers based on five ML algorithms, i.e. decision tree, k-nearest neighbors, naïve Bayes, probabilistic neural network and support vector machine, were first built, and a majority consensus (MC) approach was then used to obtain a new classifier based on the classifications of the five individual ML algorithms. The MC classifier comprehensively improved the predictive performance, with accuracy, F-measure and AUC of 0.78, 0.83 and 0.81 for the training set and 0.84, 0.87 and 0.83 for the test set, respectively. Conclusion The MC classifier provides an alternative methodology for rapid identification of resistance mutations in bacteria, which may help with early detection of antibiotic resistance and new drug discovery.
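A majority-consensus classifier of this kind can be sketched as a hard-voting ensemble over the five algorithm families named in the abstract. scikit-learn has no probabilistic neural network, so an MLP stands in for it, and the mutation feature table and label column are assumptions.

```python
# Sketch: majority-consensus (hard-voting) ensemble over five base classifiers.
# MLPClassifier stands in for the probabilistic neural network; data is hypothetical.
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

df = pd.read_csv("rpob_mutations.csv")         # structural/energetic features + "rif_resistant"
X, y = df.drop(columns="rif_resistant"), df["rif_resistant"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

mc = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier()),
                ("nb", GaussianNB()),
                ("nn", MLPClassifier(max_iter=2000, random_state=0)),
                ("svm", SVC(random_state=0))],
    voting="hard")                             # majority consensus of the five votes
mc.fit(X_tr, y_tr)
print("test accuracy:", mc.score(X_te, y_te))
```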


2020 ◽  
Author(s):  
Dongyan Ding ◽  
Tingyuan Lang ◽  
Dongling Zou ◽  
Jiawei Tan ◽  
Jia Chen ◽  
...  

Abstract Background: Accurately forecasting the prognosis could improve the therapeutic management of cancer patients; however, the currently used clinical features provide insufficient information. The purpose of this study is to develop a survival prediction model for cervical cancer patients with big data and machine learning algorithms. Results: The Cancer Genome Atlas cervical cancer data, including the expression of 1046 microRNAs and the clinical information of 309 cervical and endocervical cancer samples and 3 control samples, were downloaded. Imputation of missing values and outliers, sample normalization, log transformation and feature scaling were performed for preprocessing, and 3 control samples, 2 metastatic samples and 707 microRNAs with missing values ≥ 20% were excluded. By Cox proportional-hazards analysis, 55 prognosis-related microRNAs (20 positively and 35 negatively correlated with survival) were identified. K-means clustering analysis showed that the cervical cancer samples can be separated into two and three subgroups, with the top 20 identified survival-related microRNAs providing the best stratification. By the Support Vector Machine algorithm, two prediction models were developed which can segment the patients into two and three groups with different survival rates, respectively. The models exhibit high performance: for two classes, the area under the curve (AUC) = 0.976 (training set), 0.972 (test set), 0.974 (whole data set); for three classes, AUC = 0.983, 0.996 and 0.991 (groups 1, 2 and 3 in the training set), 0.955, 0.989 and 0.991 (groups 1, 2 and 3 in the test set), and 0.974, 0.993 and 0.991 (groups 1, 2 and 3 in the whole data set). Conclusion: Survival prediction models for cervical cancer were developed. Patients with a very low survival rate (≤ 40%) can first be separated by the three-class prediction model. The remaining patients can then be identified by the two-class prediction model as having a high survival rate (≈ 75%) or a low survival rate (≈ 50%).
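A simplified sketch of the screening-clustering-classification chain is shown below: univariate Cox screening of microRNAs with the lifelines package, K-means grouping on the top-ranked ones, and an SVM trained to reproduce the grouping. The file name, column names, and the use of lifelines are assumptions; this is not the authors' exact pipeline.

```python
# Sketch: univariate Cox screening of microRNAs, K-means grouping on the top 20,
# and an SVM that learns to assign patients to the groups. Hypothetical data.
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

df = pd.read_csv("tcga_cesc_mirna.csv")        # miRNA expression + "time", "event"
mirna_cols = [c for c in df.columns if c not in ("time", "event")]

# univariate Cox model per miRNA; keep those most associated with survival
pvals = {}
for m in mirna_cols:
    cph = CoxPHFitter()
    cph.fit(df[[m, "time", "event"]], duration_col="time", event_col="event")
    pvals[m] = cph.summary.loc[m, "p"]
top20 = sorted(pvals, key=pvals.get)[:20]

# K-means defines the survival groups; an SVM learns to assign new patients
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df[top20])
X_tr, X_te, y_tr, y_te = train_test_split(df[top20], groups, test_size=0.2, random_state=0)
svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", svm.score(X_te, y_te))
```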

