Explainable artificial intelligence (XAI) for exploring spatial variability of lung and bronchus cancer (LBC) mortality rates in the contiguous USA

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zia U. Ahmed ◽  
Kang Sun ◽  
Michael Shelly ◽  
Lina Mu

Abstract: Machine learning (ML) has demonstrated promise in predicting mortality; however, understanding spatial variation in the contributions of risk factors to mortality rates requires explainability. We applied explainable artificial intelligence (XAI) to a stack-ensemble machine learning framework to explore and visualize the spatial distribution of the contributions of known risk factors to lung and bronchus cancer (LBC) mortality rates in the conterminous United States. We used five base learners for developing the stack-ensemble models: generalized linear model (GLM), random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), and deep neural network (DNN). We then applied several model-agnostic approaches to interpret and visualize the stack-ensemble model's output at global and local (county-level) scales. The stack ensemble generally performs better than all the base learners and three spatial regression models. A permutation-based feature importance technique ranked smoking prevalence as the most important predictor, followed by poverty and elevation. However, the impact of these risk factors on LBC mortality rates varies spatially. This is the first study to use ensemble machine learning with explainable algorithms to explore and visualize the spatial heterogeneity of the relationships between LBC mortality and its risk factors in the contiguous USA.
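The stack ensemble with permutation-based feature importance described above can be sketched in a few lines of scikit-learn; this is a minimal illustration with synthetic stand-ins for the county-level predictors (smoking prevalence, poverty, elevation), not the study's data or exact base-learner set.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors in columns: smoking prevalence, poverty rate, elevation.
X = rng.uniform(size=(n, 3))
y = 40 * X[:, 0] + 10 * X[:, 1] - 5 * X[:, 2] + rng.normal(scale=2, size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stack three base learners; a linear meta-model combines their predictions.
stack = StackingRegressor(
    estimators=[("glm", LinearRegression()),
                ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("gbm", GradientBoostingRegressor(random_state=0))],
    final_estimator=LinearRegression(),
)
stack.fit(X_tr, y_tr)

# Permutation importance: drop in held-out score when one feature is shuffled.
imp = permutation_importance(stack, X_te, y_te, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
```

Because the synthetic target weights the first column most heavily, the permutation ranking recovers it as the top predictor, mirroring how the study ranks smoking prevalence.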

2021 ◽  
Vol 5 (4) ◽  
pp. 55
Author(s):  
Anastasiia Kolevatova ◽  
Michael A. Riegler ◽  
Francesco Cherubini ◽  
Xiangping Hu ◽  
Hugo L. Hammer

A general issue in climate science is the handling of big data and running complex and computationally heavy simulations. In this paper, we explore the potential of using machine learning (ML) to spare computational time and optimize data usage. The paper analyzes the effects of changes in land cover (LC), such as deforestation or urbanization, on local climate. Along with green house gas emission, LC changes are known to be important causes of climate change. ML methods were trained to learn the relation between LC changes and temperature changes. The results showed that random forest (RF) outperformed other ML methods, and especially linear regression models representing current practice in the literature. Explainable artificial intelligence (XAI) was further used to interpret the RF method and analyze the impact of different LC changes on temperature. The results mainly agree with the climate science literature, but also reveal new and interesting findings, demonstrating that ML methods in combination with XAI can be useful in analyzing the climate effects of LC changes. All parts of the analysis pipeline are explained including data pre-processing, feature extraction, ML training, performance evaluation, and XAI.
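The paper's core comparison (RF vs. the linear regressions of current practice) hinges on the nonlinearity of the LC-change/temperature relation; a small sketch with a simulated nonlinear response, not the study's data, shows why the forest wins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Hypothetical single feature: signed land-cover-change fraction (e.g. deforested share).
lc_change = rng.uniform(-1, 1, size=(800, 1))
# Simulated nonlinear local temperature response plus noise.
delta_t = np.sin(2.5 * lc_change[:, 0]) + rng.normal(scale=0.05, size=800)

X_tr, X_te, y_tr, y_te = train_test_split(lc_change, delta_t, random_state=1)
rf = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)
lin = LinearRegression().fit(X_tr, y_tr)

rf_r2, lin_r2 = rf.score(X_te, y_te), lin.score(X_te, y_te)
```

On a non-monotone response like this, the forest's piecewise fit gives a clearly higher held-out R² than the straight line, which is the pattern the abstract reports.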


2020 ◽  
Author(s):  
Carlos Pedro Gonçalves ◽  
José Rouco

Abstract: We compare the performance of major decision-tree-based ensemble machine learning models on the task of COVID-19 death probability prediction, conditional on three risk factors: age group, sex, and underlying comorbidity or disease, using the US Centers for Disease Control and Prevention (CDC)'s COVID-19 case surveillance dataset. To evaluate the impact of the three risk factors on COVID-19 death probability, we extract and analyze the conditional probability profile produced by the best performer. The results show an exponential rise in death probability from COVID-19 with age group, with males exhibiting a higher exponential growth rate than females, an effect that is stronger when an underlying comorbidity or disease is present; comorbidity also acts as an accelerator of the rise in COVID-19 death probability for both male and female subjects. The results are discussed in connection with healthcare and epidemiological concerns and in terms of the degree to which they reinforce findings from other studies on COVID-19.
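Extracting a conditional probability profile from a fitted ensemble amounts to evaluating predicted probabilities on a grid of risk-factor combinations; this sketch uses simulated data with an exponential age effect (not the CDC dataset) and a gradient boosting classifier as a stand-in for the best performer.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n = 5000
age_group = rng.integers(0, 8, size=n)      # 0 = youngest ... 7 = oldest
sex = rng.integers(0, 2, size=n)            # 1 = male
comorbidity = rng.integers(0, 2, size=n)
# Simulated log-odds rising with age, slightly steeper for males,
# shifted upward by comorbidity, mimicking the reported profile shape.
logit = -8 + 0.9 * age_group + 0.5 * sex * age_group / 7 + 1.5 * comorbidity
death = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([age_group, sex, comorbidity])
clf = GradientBoostingClassifier(random_state=2).fit(X, death)

# Conditional profile: P(death | age group, male, comorbidity present).
grid = np.array([[a, 1, 1] for a in range(8)])
profile = clf.predict_proba(grid)[:, 1]
```

Plotting `profile` against age group would reproduce the exponential-rise shape the abstract describes; swapping the sex or comorbidity column in `grid` gives the other conditional curves.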


2021 ◽  
Vol 15 (1) ◽  
pp. 112-120
Author(s):  
Umar Farooq

In the current era, the SQL injection attack is a serious threat to the security of the cyber world, particularly for the many web applications that reside on the internet. Many webpages accept sensitive information (e.g., usernames, passwords, bank details) from users and store it in databases that also reside on the internet. Although these online databases are important for remote access to information for various business purposes, attackers can gain unrestricted access to them or bypass authentication procedures with the help of a SQL injection attack. This attack can cause great damage to and alteration of a database, and injection has been ranked as the topmost security risk in the OWASP Top 10. Given the difficulty of detecting unknown attacks with current pattern-matching techniques, a strategy for SQL injection detection based on machine learning is proposed. Our approach detects this attack by splitting queries into their corresponding tokens (tokenization) and then applying our algorithms to the tokenized dataset. We used four ensemble machine learning algorithms: Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBM), and Light Gradient Boosting Machine (LGBM). The results yielded by our models are near perfect, with an almost negligible error rate. The best results are yielded by LGBM, with an accuracy of 0.993371 and precision, recall, and F1 of 0.993373, 0.993371, and 0.993370, respectively. LGBM also yielded a low error rate, with a False Positive Rate (FPR) of 0.120761 and a Root Mean Squared Error (RMSE) of 0.007. The worst results are yielded by AdaBoost, with an accuracy of 0.991098 and precision, recall, and F1 of 0.990733, 0.989175, and 0.989942, respectively. AdaBoost also yielded a high False Positive Rate (FPR) of 0.009.


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution, based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosted trees, and random forest) are employed, and the impact of each variable group is observed by comparing model performance. The number of target classes is then increased to five and eight, and the results are compared with models previously developed in the literature.
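The discretize-then-classify setup above (attendance binned into three classes before prediction) can be sketched like this; the features are synthetic placeholders for the pre-release and distributor variables, and only the dataset size matches the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 1559                                    # matches the dataset size above
features = rng.normal(size=(n, 5))          # stand-ins for pre-release/distributor variables
# Simulated attendance driven by a few of the features, log-normally distributed.
attendance = np.exp(2 + features @ np.array([1.0, 0.5, 0.3, 0.0, 0.0])
                    + rng.normal(scale=0.3, size=n))

# Discretize attendance into three roughly equal classes by tercile.
terciles = np.quantile(attendance, [1 / 3, 2 / 3])
classes = np.digitize(attendance, terciles)

clf = RandomForestClassifier(n_estimators=100, random_state=3)
acc = cross_val_score(clf, features, classes, cv=3).mean()
```

Moving to five or eight classes, as the study does, only changes the quantile grid passed to `np.quantile`; accuracy naturally drops as the class count grows because the bins get narrower.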


2021 ◽  
Vol 10 (1) ◽  
pp. 42
Author(s):  
Kieu Anh Nguyen ◽  
Walter Chen ◽  
Bor-Shiun Lin ◽  
Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. It was found that GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as groups, the bagging and boosting methods performed equally well, and the stacking method came third for the erosion pin dataset considered in this study.
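The three scores reported above (RMSE, Nash-Sutcliffe efficiency, and index of agreement) have simple closed forms; a small numpy sketch with made-up erosion values (not the watershed data) follows.

```python
import numpy as np

def rmse(obs, sim):
    # Root-mean-square error: typical magnitude of the prediction error.
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 minus the ratio of model error
    # to the variance of the observations; 1 is perfect, 0 matches the mean.
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def index_of_agreement(obs, sim):
    # Willmott's d: bounded in [0, 1], 1 is perfect agreement.
    num = np.sum((obs - sim) ** 2)
    den = np.sum((np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return float(1 - num / den)

obs = np.array([1.2, 3.4, 2.1, 5.0, 4.2])   # e.g. erosion pin readings, mm/year
sim = np.array([1.0, 3.0, 2.5, 4.6, 4.0])   # hypothetical model predictions
```

A perfect model gives RMSE = 0, NSE = 1, and d = 1; the study's GBM (RMSE = 1.72 mm/year, NSE = 0.54, d = 0.81) sits well short of that, as erosion models typically do.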


2020 ◽  
Vol 6 ◽  
pp. 205520762096835
Author(s):  
C Blease ◽  
C Locher ◽  
M Leon-Carlyle ◽  
M Doraiswamy

Background: The potential for machine learning to disrupt the medical profession is the subject of ongoing debate within biomedical informatics.
Objective: This study aimed to explore psychiatrists' opinions about the potential impact of innovations in artificial intelligence and machine learning on psychiatric practice.
Methods: In Spring 2019, we conducted a web-based survey of 791 psychiatrists from 22 countries worldwide. The survey measured opinions about the likelihood that future technology would fully replace physicians in performing ten key psychiatric tasks. This study involved qualitative descriptive analysis of written responses ("comments") to three open-ended questions in the survey.
Results: Comments were classified into four major categories relating to the impact of future technology on: (1) patient-psychiatrist interactions; (2) the quality of patient medical care; (3) the profession of psychiatry; and (4) health systems. Overwhelmingly, psychiatrists were skeptical that technology could replace human empathy. Many predicted that 'man and machine' would increasingly collaborate in undertaking clinical decisions, with mixed opinions about the benefits and harms of such an arrangement. Participants were optimistic that technology might improve efficiencies and access to care, and reduce costs. Ethical and regulatory considerations received limited attention.
Conclusions: This study presents timely information on psychiatrists' views about the scope of artificial intelligence and machine learning in psychiatric practice. Psychiatrists expressed divergent views about the value and impact of future technology, with worrying omissions concerning practice guidelines and ethical and regulatory issues.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

Abstract: Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as classical CPH regression (c-index ~0.63), and in the case of XGB even better (c-index ~0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models' predictions. We concluded that the difference in performance can be attributed to XGB's ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models' predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial to increasing the trust in and adoption of innovative ML techniques in oncology and healthcare overall.
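The c-index used above is the fraction of comparable patient pairs whose predicted risks are correctly ordered, with ties counted as half; a minimal pure-Python sketch with toy survival data (not the registry's) follows.

```python
def c_index(times, events, risks):
    """Harrell's concordance index.

    times:  observed follow-up times
    events: 1 if death observed, 0 if censored at that time
    risks:  predicted risk scores (higher = predicted earlier death)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if subject i's death is observed
            # before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

times = [2, 5, 7, 9]
events = [1, 1, 0, 1]          # third subject is censored
risks = [0.9, 0.6, 0.4, 0.2]   # perfectly anti-ordered with survival time
```

With these toy risks every comparable pair is correctly ordered, so the c-index is 1.0; a value of ~0.63 (CPH) or ~0.73 (XGB), as in the abstract, means roughly that fraction of comparable pairs is ranked correctly.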


Author(s):  
Gudipally Chandrashakar

In this article, we used historical time-series data up to the current-day gold price. In this study of predicting the gold price, we consider several correlated factors: the silver price, the copper price, the Standard and Poor's 500 (S&P 500) value, the dollar-rupee exchange rate, and the Dow Jones Industrial Average value. We consider the prices of every correlated factor along with the gold price data for dates ranging from January 2008 to February 2021. The machine learning algorithms used to analyze the time-series data are Random Forest Regression, Support Vector Regressor, Linear Regressor, Extra Trees Regressor, and Gradient Boosting Regression. The results show that the Extra Trees Regressor algorithm predicts the gold price most accurately.
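A minimal version of this setup regresses the gold price on its correlated factors with an extra-trees model and a chronological train/test split; the simulated prices below stand in for the 2008-2021 series, and only two of the five factors are included for brevity.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(4)
n = 600
# Simulated daily correlate prices (stationary stand-ins, not real market data).
silver = rng.uniform(20, 30, size=n)
sp500 = rng.uniform(2500, 3500, size=n)
gold = 50 * silver + 0.2 * sp500 + rng.normal(scale=5, size=n)

X = np.column_stack([silver, sp500])

# Chronological split: train on the first 80% of days, test on the rest,
# so the model never sees "future" prices during training.
cut = int(0.8 * n)
model = ExtraTreesRegressor(n_estimators=200, random_state=4)
model.fit(X[:cut], gold[:cut])
score = model.score(X[cut:], gold[cut:])
```

One caveat worth noting when reproducing such studies: tree ensembles cannot extrapolate beyond the price range seen in training, so strongly trending series are usually differenced or otherwise detrended first.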

