A machine learning based prediction model of anti-PD-1 therapy response using noninvasive clinical information and blood markers of lung cancer patients.

2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e14138-e14138
Author(s):  
Beung-Chul AHN ◽  
Kyoung Ho Pyo ◽  
Dongmin Jung ◽  
Chun-Feng Xin ◽  
Chang Gon Kim ◽  
...  

e14138 Background: Immune checkpoint inhibitors have become a breakthrough therapy for various types of cancer. However, given an overall response rate of around 20% in clinical trials, there is no established way to accurately predict an individual patient's response to anti-PD-1 (aPD-1) therapy. PD-L1 expression or the presence of tumor-infiltrating lymphocytes may be used as indicators of response, but both are limited. We developed models using machine learning methods to predict the aPD-1 response. Methods: A total of 126 patients with advanced NSCLC treated with aPD-1 therapy were enrolled. Their clinical characteristics, treatment outcomes, and adverse events were collected. The full clinical data set (n = 126), consisting of 15 variables, was divided into two subsets: a discovery set (n = 63) and a test set (n = 63). Thirteen supervised learning algorithms, including support vector machine and regularized regression (lasso, ridge, elastic net), were applied to the discovery set for model development and to the test set for validation. Each model was evaluated using the ROC curve and cross-validation. The same methods were applied to the subset that had additional flow cytometry data (n = 40). Results: The median age was 64 and 69.8% of patients were male. Adenocarcinoma was the predominant histology (69.8%), and twenty patients (15.1%) were driver mutation positive. On the clinical data set (n = 126), ridge regression (AUC: 0.79) was the best model for prediction. Of the 15 clinical variables, tumor burden, age, ECOG PS, and PD-L1 were the most important according to the random forest algorithm. When the clinical and flow cytometry data were merged, the ridge regression model (AUC: 0.82) performed better than with clinical data alone. Among the 52 variables of the merged set, the most important immune markers were CD3+CD8+CD25+/Teff-CD28, CD3+CD8+CD25-/Teff-Ki-67, and CD3+CD8+CD25+/Teff-NY-ESO/Teff-PD-1, which indicate an activated tumor-specific T cell subset.
Conclusions: Our machine learning based model is useful for predicting aPD-1 responses. After further validation in an independent patient cohort, a supervised learning based non-invasive predictive score could be established to predict aPD-1 response.
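
For readers unfamiliar with the AUC values quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted scores. It uses the pairwise-comparison definition (the probability that a randomly chosen responder is scored above a randomly chosen non-responder). The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = responder, 0 = non-responder.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.4f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.79 and 0.82 should be read.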

Large volumes of data are now collected and stored in many fields; such collections are referred to as big data. In healthcare, big data comprises the clinical records of every patient, held in huge quantities and maintained in Electronic Health Records (EHR). More than 80% of clinical data is unstructured and stored in hundreds of different forms. The main challenge in data storage and analysis is handling large data sets efficiently and scalably. The Hadoop MapReduce framework stores and processes any kind of data at speed. It is not solely a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction on these data sets is handled by machine learning algorithms. This work focuses on the Extreme Learning Machine (ELM) algorithm and predicts disease risk by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm handles the computing workload well and achieves good results in both veracity and efficiency.


2012 ◽  
pp. 875-897
Author(s):  
Guo-Zheng Li

This chapter introduces the major challenges of clinical data processing and the novel machine learning techniques employed to address them. It argues that these techniques, including support vector machines, ensemble learning, feature selection, feature reuse through multi-task learning, and multi-label learning, provide potentially more substantive solutions for decision support and clinical data analysis. The authors demonstrate the generalization performance of these techniques on real-world data sets, including a brain glioma data set, a coronary heart disease data set from Chinese Medicine, and several microarray tumor data sets. More machine learning techniques will be developed to improve the precision of clinical data analysis.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional data sets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to find the appropriate descriptors. Four methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best of these methods, with R2 = 0.84 and MSE = 0.55 for the training set and R2 = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
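
The R2, MSE, and Y-randomization checks mentioned in this abstract can be illustrated with a small sketch. It assumes a single hypothetical descriptor and an ordinary least-squares line in place of the SVM model (a deliberate simplification, not the authors' method); all function names and data values are invented for illustration. The idea of Y-randomization is that a model refitted on shuffled responses should score far worse than the model fitted on the real responses.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_mse(x, y, slope, intercept):
    """Coefficient of determination and mean squared error of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, ss_res / len(y)


def y_randomization(x, y, rounds=50, seed=0):
    """Refit on shuffled responses and return the R2 of each scrambled model."""
    rng = random.Random(seed)
    scrambled = []
    for _ in range(rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)
        slope, intercept = fit_line(x, y_perm)
        scrambled.append(r2_and_mse(x, y_perm, slope, intercept)[0])
    return scrambled


# Hypothetical toy data: one descriptor and a noisy linear response.
x = [float(i) for i in range(30)]
y = [0.2 * xi + 5.0 + (0.3 if i % 2 else -0.3) for i, xi in enumerate(x)]

slope, intercept = fit_line(x, y)
real_r2, real_mse = r2_and_mse(x, y, slope, intercept)
scrambled_r2 = y_randomization(x, y)
mean_scrambled_r2 = sum(scrambled_r2) / len(scrambled_r2)
print(f"real R2 = {real_r2:.3f}, MSE = {real_mse:.3f}")
print(f"mean R2 after Y-randomization = {mean_scrambled_r2:.3f}")
```

A large gap between the real R2 and the scrambled R2 values suggests the fit is not a chance correlation; a small gap would be a warning sign.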


2020 ◽  
Vol 16 (8) ◽  
pp. 1088-1105
Author(s):  
Nafiseh Vahedi ◽  
Majid Mohammadhosseini ◽  
Mehdi Nekoei

Background: The poly(ADP-ribose) polymerases (PARP) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, several efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset with the most significant contributions to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation on an external test set, and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that non-linear modeling approaches, including SVM and ANN, provided considerably better predictive capability. Conclusion: Among the constructed models, the GA-SVM approach had the better predictive power in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), and R2 and F-statistic values for the training set. However, compared with MLR and SVM, the GA-ANN model gave better statistical parameters for the test set.
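
As a rough illustration of two of the statistics named in this abstract, the sketch below computes a leave-one-out cross-validation coefficient (Q2 LOO) and an RMSEP. It assumes the commonly used definition Q2 = 1 - PRESS / SS_tot, with SS_tot taken about the overall mean, and substitutes a one-descriptor least-squares line for the GA-selected MLR, SVM, and ANN models. The helper names and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Leave-one-out Q2: refit without each sample, then predict that sample."""
    mean_y = sum(y) / len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def rmsep(observed, predicted):
    """Root mean square error of prediction on an external test set."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: one descriptor and a noisy linear activity.
descriptor = [float(i) for i in range(10)]
activity = [xi + (0.5 if i % 2 else -0.5) for i, xi in enumerate(descriptor)]
q2_toy = q2_loo(descriptor, activity)
print(f"Q2 (LOO) on toy data: {q2_toy:.3f}")
```

Leave-group-out (Q2 LGO) follows the same pattern but withholds several samples per round instead of one.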


2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and to human beings. With deepening globalization and the growing abundance of data resources, the use of big data to predict and analyze animal diseases is becoming increasingly important. Machine learning focuses on enabling computers to learn from data and to use what they have learned to analyze and predict. This paper first introduces the animal epidemic situation and machine learning. It then briefly introduces the application of machine learning to animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and maxent. This discussion aims to give readers a clearer understanding of machine learning and of its application prospects in animal diseases.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ruolan Zeng ◽  
Jiyong Deng ◽  
Limin Dang ◽  
Xinliang Yu

Abstract: A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R2 of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while handling more samples in the test set. An SVM algorithm was thus successfully applied to develop a nonlinear QSAR model for skin permeability.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract: In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in the text when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets. SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural or social benefits, such as predicting or countering riots.


2017 ◽  
Vol 71 (2) ◽  
pp. 174-179 ◽  
Author(s):  
Gregory David Scott ◽  
Susan K Atwater ◽  
Dita A Gratzinger

Aims: To create clinically relevant normative flow cytometry data for understudied benign lymph nodes and characterise outliers. Methods: Clinical, histological and flow cytometry data were collected and distributions summarised for 380 benign lymph node excisional biopsies. Outliers for kappa:lambda light chain ratio, CD10:CD19 coexpression, CD5:CD19 coexpression, CD4:CD8 ratios and CD7 loss were summarised for histological pattern, concomitant diseases and follow-up course. Results: We generated the largest data set of benign lymph node immunophenotypes by an order of magnitude. B and T cell antigen outliers often had background immunosuppression or inflammatory disease but did not subsequently develop lymphoma. Conclusions: Diagnostic immunophenotyping data from benign lymph nodes provide normative ranges for clinical use. Outliers raising suspicion for B or T cell lymphoma are not infrequent (26% of benign lymph nodes). Caution is indicated when interpreting outliers in the absence of excisional biopsy or clinical history, particularly in patients with concomitant immunosuppression or inflammatory disease.


Author(s):  
Noor Asyikin Sulaiman ◽  
Md Pauzi Abdullah ◽  
Hayati Abdullah ◽  
Muhammad Noorazlan Shah Zainudin ◽  
Azdiana Md Yusop

An air conditioning system is a complex system and consumes the most energy in a building. Any fault in system operation, such as a faulty cooling tower fan, compressor failure, or a stuck damper, could lead to energy wastage and a reduction in the system’s coefficient of performance (COP). Because of the complexity of the air conditioning system, detecting these faults is hard and requires exhaustive inspections. This paper has two parts: (i) to investigate the impact of different air conditioning faults on the COP, and (ii) to analyse the performance of machine learning algorithms in classifying those faults. Three supervised learning classifier models were developed: deep learning, support vector machine (SVM), and multi-layer perceptron (MLP). The performance of each classifier was investigated across six different classes of faults. Results showed that different faults have different negative impacts on the COP. In addition, the three supervised learning classifier models classified all faults at rates above 94%, and MLP produced the highest accuracy and precision of all.
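
The accuracy and precision figures used to compare the classifiers in this abstract can be computed as in the minimal sketch below. The fault labels, the predictions, and the function names are hypothetical and chosen only for illustration; the abstract does not state how its metrics were aggregated across the six fault classes.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def precision_per_class(true_labels, predicted_labels):
    """For each predicted class: correct predictions / all predictions of it."""
    predicted_counts = Counter(predicted_labels)
    correct_counts = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {
        label: correct_counts[label] / count
        for label, count in predicted_counts.items()
    }


# Hypothetical toy data with three of the fault types named in the abstract.
true_faults = ["fan", "fan", "compressor", "damper", "damper", "compressor"]
pred_faults = ["fan", "compressor", "compressor", "damper", "damper", "compressor"]

toy_accuracy = accuracy(true_faults, pred_faults)
toy_precision = precision_per_class(true_faults, pred_faults)
print(f"accuracy = {toy_accuracy:.3f}")
print(f"precision per class = {toy_precision}")
```

Reporting precision per class, rather than a single averaged number, makes it easier to see which fault types a classifier confuses.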

