Application of Machine Learning Algorithms for Geogenic Radon Potential Mapping in Danyang-Gun, South Korea

2021 ◽  
Vol 9 ◽  
Author(s):  
Fatemeh Rezaie ◽  
Sung Won Kim ◽  
Mohsen Alizadeh ◽  
Mahdi Panahi ◽  
Hyesu Kim ◽  
...  

Continuous generation of radon gas by soil and rocks rich in components of the uranium chain, along with prolonged inhalation of radon progeny in enclosed spaces, can lead to severe respiratory diseases. Detection of radon-prone areas and acquisition of detailed knowledge regarding relationships between indoor radon variations and geogenic factors can facilitate the implementation of more appropriate radon mitigation strategies in high-risk residential zones. In the present study, 10 factors (i.e., lithology; fault density; mean soil calcium oxide [CaO], copper [Cu], lead [Pb], and ferric oxide [Fe2O3] concentrations; elevation; slope; valley depth; and the topographic wetness index [TWI]) were selected to map radon potential areas based on measurements of indoor radon levels in 1,452 dwellings. Mapping was performed using three machine learning methods: long short-term memory (LSTM), extreme learning machine (ELM), and random vector functional link (RVFL). The results were validated in terms of the area under the receiver operating characteristic curve (AUROC), root mean square error (RMSE), and standard deviation (StD). The prediction abilities of all models were satisfactory; however, the ELM model had the best performance, with AUROC, RMSE, and StD values of 0.824, 0.209, and 0.207, respectively. Moreover, approximately 40% of the study area was covered by very high and high-risk radon potential zones that mainly included populated areas in Danyang-gun, South Korea. Therefore, the map can be used to establish more appropriate construction regulations in radon-priority areas, and identify more cost-effective remedial actions for existing buildings, thus reducing indoor radon levels and, by extension, radon exposure-associated effects on human health.
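The validation metrics named above (AUROC and RMSE) can be computed without any modeling library; a minimal pure-Python sketch, assuming binary high/low-radon labels and continuous model scores with no tied scores:

```python
import math

def auroc(labels, scores):
    # Rank-based AUROC (Mann-Whitney U statistic): the probability that a
    # randomly chosen positive receives a higher score than a randomly
    # chosen negative. Assumes no tied scores for simplicity.
    ranked = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, (_, y) in enumerate(ranked, start=1) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def rmse(labels, scores):
    # Root mean square error between observed labels and predicted scores.
    return math.sqrt(sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels))
```

An AUROC of 0.824, as reported for the ELM model, means a randomly chosen high-radon dwelling outscores a randomly chosen low-radon dwelling about 82% of the time.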

2021 ◽  
Author(s):  
Fang He ◽  
John H Page ◽  
Kerry R Weinberg ◽  
Anirban Mishra

BACKGROUND The current COVID-19 pandemic is unprecedented; in resource-constrained settings, predictive algorithms can help stratify disease severity and alert physicians to high-risk patients. However, few risk scores have been derived from a substantially large EHR dataset using simplified predictors as input. OBJECTIVE To develop and validate simplified machine learning algorithms that predict COVID-19 adverse outcomes; to evaluate the AUC (area under the receiver operating characteristic curve), sensitivity, specificity, and calibration of the algorithms; and to derive clinically meaningful thresholds. METHODS We conducted machine learning model development and validation via a cohort study using multi-center, patient-level, longitudinal electronic health records (EHR) from the Optum® COVID-19 database, which provides anonymized, longitudinal EHR data from across the US. The models were developed from clinical characteristics to predict 28-day in-hospital mortality, ICU admission, respiratory failure, and mechanical ventilator usage in the inpatient setting. Data from patients admitted prior to Sep 7, 2020, were randomly sampled into development, test, and validation datasets; data collected from Sep 7, 2020 through Nov 15, 2020 were reserved as a prospective validation dataset. RESULTS Of the 3.7 million patients in the analysis, a total of 585,867 patients were diagnosed with or tested positive for SARS-CoV-2, and 50,703 adult patients were hospitalized with COVID-19 between Feb 1 and Nov 15, 2020. Among the study cohort (N=50,703), there were 6,204 deaths, 9,564 ICU admissions, and 6,478 mechanically ventilated or ECMO patients, and 25,169 patients developed ARDS or respiratory failure within 28 days of hospital admission.
The algorithms demonstrated high accuracy (AUC = 0.89 (0.89 - 0.89) on the validation dataset (N=10,752)), consistent prediction through the second wave of the pandemic from September to November (AUC = 0.85 (0.85 - 0.86) on post-development validation (N=14,863)), and strong clinical relevance and utility. In addition, a comprehensive set of 386 input covariates from baseline and at admission was included in the analysis; the end-to-end pipeline automates the feature selection and model development process, producing 10 key predictors, such as age, blood urea nitrogen, and oxygen saturation, that are both commonly measured and concordant with recognized risk factors for COVID-19. CONCLUSIONS The systematic approach and rigorous validations demonstrate consistent model performance even beyond the time period of data collection, with satisfactory discriminatory power and strong clinical utility. Overall, the study offers an accurate, validated, and reliable prediction model based on only ten clinical features as a prognostic tool for stratifying COVID-19 patients into intermediate-, high-, and very high-risk groups. This simple predictive tool could be shared with the wider healthcare community to serve as an early warning system alerting physicians to possible high-risk patients, or as a triage tool to optimize healthcare resources. CLINICALTRIAL N/A
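The final stratification step described in the conclusions can be sketched as simple thresholding of the model's predicted probability; the cutoff values below are hypothetical placeholders, since the study derives its clinically meaningful thresholds from model calibration:

```python
def stratify_risk(prob, cutoffs=(0.2, 0.5)):
    # Map a predicted probability of an adverse outcome to one of the
    # three risk groups named in the abstract. The cutoffs are
    # illustrative only, not the study's derived thresholds.
    intermediate_cut, very_high_cut = cutoffs
    if prob >= very_high_cut:
        return "very high"
    if prob >= intermediate_cut:
        return "high"
    return "intermediate"
```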


2020 ◽  
Vol 9 (8) ◽  
pp. 2603 ◽  
Author(s):  
Dong-Woo Seo ◽  
Hahn Yi ◽  
Beomhee Park ◽  
Youn-Jung Kim ◽  
Dae Ho Jung ◽  
...  

Clinical risk-scoring systems are important for identifying patients with upper gastrointestinal bleeding (UGIB) who are at high risk of hemodynamic instability. We developed an algorithm that predicts adverse events in patients with initially stable non-variceal UGIB using machine learning (ML). Using a prospective observational registry, 1,439 of 3,363 consecutive patients were enrolled. Primary outcomes included adverse events such as mortality, hypotension, and rebleeding within 7 days. Four machine learning algorithms, namely logistic regression with regularization (LR), random forest classifier (RF), gradient boosting classifier (GB), and voting classifier (VC), were compared with the Glasgow–Blatchford score (GBS) and Rockall scores. The RF model showed the highest accuracy and a significant improvement over conventional methods for predicting mortality (area under the curve: RF 0.917 vs. GBS 0.710), whereas the VC model performed best for hypotension (VC 0.757 vs. GBS 0.668) and rebleeding within 7 days (VC 0.733 vs. GBS 0.694). Clinically significant variables, including blood urea nitrogen, albumin, hemoglobin, platelet count, prothrombin time, age, and lactate, were identified by the global feature importance analysis. These results suggest that ML models will be useful early predictive tools for identifying high-risk patients with initially stable non-variceal UGIB admitted to an emergency department.
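The voting classifier that performed best for hypotension and rebleeding combines the outputs of its member models; a minimal sketch of soft voting, assuming each member (e.g., LR, RF, GB) supplies one predicted probability per patient — the abstract does not state whether soft or hard voting was used:

```python
def soft_vote(member_probs):
    # Soft voting: average the predicted probabilities of the member
    # classifiers for each patient.
    n_members = len(member_probs)
    return [sum(ps) / n_members for ps in zip(*member_probs)]

def vote_labels(member_probs, threshold=0.5):
    # Convert the averaged probabilities into 0/1 adverse-event labels.
    return [1 if p >= threshold else 0 for p in soft_vote(member_probs)]
```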


2020 ◽  
Vol 21 (Supplement_1) ◽  
Author(s):  
D M Adamczak ◽  
M Bednarski ◽  
A Rogala ◽  
M Antoniak ◽  
T Kiebalo ◽  
...  

Abstract BACKGROUND Hypertrophic cardiomyopathy (HCM) is a heart disease characterized by hypertrophy of the left ventricular myocardium. The disease is the most common cause of sudden cardiac death (SCD) in young people and competitive athletes due to fatal ventricular arrhythmias; in most patients, however, HCM has a benign course. Therefore, it is of the utmost importance to properly evaluate patients and identify those who would benefit from implantation of a cardioverter-defibrillator (ICD). The HCM SCD-Risk Calculator is a useful tool for estimating the 5-year risk of SCD. The parameters included in the model at evaluation are: age, maximum left ventricular wall thickness, left atrial dimension, maximum gradient in the left ventricular outflow tract, family history of SCD, non-sustained ventricular tachycardia, and unexplained syncope. Patients’ risk of SCD is classified as low (<4%), intermediate (4-<6%), or high (≥6%). Those in the high-risk group should receive an ICD; implantation can also be considered in the intermediate-risk group. However, the calculator still needs improvement, and machine learning (ML) has the potential to fulfill this task. An ML algorithm creates a model for solving a specific problem without explicit programming, relying only on available data to discover patterns and relations. METHODS 252 HCM patients (aged 20-88 years, 49.6% men) treated in our department from 2005 to 2018 were enrolled. The follow-up lasted 0-13 years (average: 3.8 years). SCD was defined as sudden cardiac arrest (SCA) or an appropriate ICD intervention. All parameters from the HCM SCD-Risk Calculator were obtained, and the risk of SCD was calculated for all patients during the first echocardiographic evaluation. An ML model with the variables from the HCM SCD-Risk Calculator was created, and the two methods were compared. RESULTS 20 patients reached an SCD endpoint: 1 patient died due to SCA and 19 had an appropriate ICD intervention.
Among them, there were 6, 7, and 7 patients in the low-, intermediate-, and high-risk groups of SCD, respectively. The 1 patient who died had been classified as low risk. The ML model correctly assessed the SCD event in only 1 patient. According to the ML model, an SCD risk of ≤2.07% was a negative predictor. CONCLUSIONS The study did not show an advantage of ML over the HCM SCD-Risk Calculator. Because of the characteristics of the dataset (approximately the same number of features and observations), the selection of machine learning algorithms was limited. The best results (evaluated using leave-one-out cross-validation, LOOCV) were achieved with a decision tree. We expect that a bigger dataset would allow improved model performance, given the strong regularization needed in the current setup.
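The LOOCV scheme mentioned in the conclusions can be sketched generically: train on all patients but one, test on the held-out patient, and average over every patient. The trivial majority-class learner below is a hypothetical placeholder standing in for the study's decision tree:

```python
def loocv_accuracy(X, y, fit, predict):
    # Leave-one-out cross-validation: each patient serves once as the
    # single-element test set.
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

# Placeholder learner: always predict the training set's majority class.
fit_majority = lambda X_train, y_train: max(set(y_train), key=y_train.count)
predict_majority = lambda model, x: model
```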


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Ju-Young Shin ◽  
Yonghun Ro ◽  
Joo-Wan Cha ◽  
Kyu-Rang Kim ◽  
Jong-Chul Ha

Machine learning algorithms should be tested for use in quantitative precipitation estimation models of rain radar data in South Korea because such an application can provide a more accurate estimate of rainfall than the conventional Z-R relationship-based model. The applicability of random forest, stochastic gradient boosted model, and extreme learning machine methods to quantitative precipitation estimation models was investigated using case studies with polarization radar data from the Gwangdeoksan radar station. Various combinations of input variable sets were tested, and the results showed that machine learning algorithms can be applied to build a quantitative precipitation estimation model for polarization radar data in South Korea. The machine learning-based quantitative precipitation estimation models performed better than Z-R relationship-based models, particularly for heavy rainfall events. The extreme learning machine is considered the best of the algorithms used, based on the evaluation criteria.
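The conventional Z-R baseline takes the power-law form Z = aR^b; a minimal sketch of inverting it for rain rate, assuming the common Marshall-Palmer coefficients (a=200, b=1.6), which may differ from the coefficients used operationally in South Korea:

```python
def zr_rain_rate(dbz, a=200.0, b=1.6):
    # Invert the Z-R power law Z = a * R**b for rain rate R (mm/h).
    # Radar reflectivity arrives in dBZ and is converted to linear
    # units (mm^6 m^-3) before inversion.
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)
```

A machine-learning QPE model replaces this single fixed power law with a mapping learned from polarimetric variables, which is why it can adapt better to heavy-rainfall regimes.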


2020 ◽  
Vol 12 (21) ◽  
pp. 3568
Author(s):  
Shahab S. Band ◽  
Saeid Janizadeh ◽  
Subodh Chandra Pal ◽  
Asish Saha ◽  
Rabin Chakrabortty ◽  
...  

Flash flooding is considered one of the most dynamic natural disasters, and mapping flood susceptibility is a key measure for minimizing its economic damage and adverse consequences. Identifying areas prone to flash flooding is a crucial step in flash flood hazard management. In the present study, the Kalvan watershed in Markazi Province, Iran, was chosen for flash flood susceptibility modeling. To detect flash flood-prone zones in this study area, five machine learning (ML) algorithms were tested: boosted regression tree (BRT), random forest (RF), parallel random forest (PRF), regularized random forest (RRF), and extremely randomized trees (ERT). Fifteen climatic and geo-environmental variables were used as inputs to the flash flood susceptibility models. The results showed that ERT was the best-performing model, with an area under the curve (AUC) value of 0.82. The AUC values of the remaining models, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively. In the ERT model, the areal coverage of the very high to moderate flash flood susceptibility classes was 582.56 km2 (28.33%), and the remainder of the watershed fell into the very low to low susceptibility zones. Topographical and hydrological parameters, e.g., altitude, slope, rainfall, and distance from the river, were the most effective parameters. The results of this study will play a vital role in the planning and implementation of flood mitigation strategies in the region.
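Areal-coverage statistics like the 582.56 km2 (28.33%) figure above can be reproduced from a classified susceptibility grid; a minimal sketch, assuming equal-area cells and a hypothetical class threshold:

```python
def areal_coverage(cell_scores, cell_area_km2, threshold):
    # Area and percentage of the watershed whose susceptibility score
    # meets or exceeds the given class threshold.
    n_high = sum(1 for s in cell_scores if s >= threshold)
    area_km2 = n_high * cell_area_km2
    percent = 100.0 * n_high / len(cell_scores)
    return area_km2, percent
```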


2018 ◽  
Author(s):  
Jaram Park ◽  
Jeong-Whun Kim ◽  
Borim Ryu ◽  
Eunyoung Heo ◽  
Se Young Jung ◽  
...  

BACKGROUND Prevention and management of chronic diseases are the main goals of national health maintenance programs. Widely used screening tools, such as Health Risk Appraisal, are limited in achieving this goal by their static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management via the nationwide health maintenance program, and health care providers should inform patients about their risk of complications caused by hypertension. OBJECTIVE Our goal was to develop and compare machine learning models predicting high-risk vascular diseases in hypertensive patients so that they can manage their blood pressure based on their risk level. METHODS We used a 12-year longitudinal dataset of the nationwide sample cohort, which contains the data of 514,866 patients and allows tracking of patients’ medical history across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, comprising one million different patients, published by the National Health Insurance Service. From these datasets, we obtained the data of 74,535 and 59,738 patients with essential hypertension, respectively, and developed machine learning models for predicting cardiovascular and cerebrovascular events. Six machine learning models were developed, and their performance was compared using validation metrics. RESULTS Machine learning algorithms enabled us to detect high-risk patients based on their medical history. The long short-term memory-based algorithm performed best in the within test (F1-score=.772, external test F1-score=.613), and the random forest-based algorithm showed better generalization than the other machine learning algorithms (within test F1-score=.757, external test F1-score=.705).
In the within test, the long short-term memory-based algorithm outperformed the others regardless of the number of features; in the external test, however, the random forest-based algorithm was the best, irrespective of the number of features. CONCLUSIONS We developed and compared machine learning models predicting high-risk vascular diseases in hypertensive patients so that they may manage their blood pressure based on their risk level. By relying on the prediction model, a government can identify high-risk patients at the nationwide level and establish health care policies in advance.
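The F1-scores reported for the within and external tests combine precision and recall; a minimal sketch for binary event labels:

```python
def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall for binary 0/1 labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it suits the imbalanced event rates typical of vascular-outcome prediction better than raw accuracy.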


2015 ◽  
Vol 2 (2) ◽  
pp. e17 ◽  
Author(s):  
Li Guan ◽  
Bibo Hao ◽  
Qijin Cheng ◽  
Paul SF Yip ◽  
Tingshao Zhu

Background Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media has an advantage in its efficiency and its potential to reach hidden individuals, yet little research has focused on this specific field. Objective The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users in China with high suicide possibility through profile and linguistic features extracted from Internet-based data. Methods A total of 909 Chinese microblog users completed an Internet survey; those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, as well as one SD above the mean on each of the four subscale scores, were labeled as high-risk individuals for the respective scale. Profile and linguistic features were fed into the two machine learning algorithms (SLR and RF) to train models that identify high-risk individuals in general suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross-validation, in which both the training set and the test set were generated by stratified random sampling from the whole sample. Three classic performance metrics (precision, recall, F1 measure) and a specifically defined metric, “Screening Efficiency,” were adopted to evaluate model effectiveness. Results Classification performance was generally matched between SLR and RF. Given the best performance of the classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. The Screening Efficiency of most models varied from 1/4 to 1/2, and the precision of the models was generally below 30%.
Conclusions Individuals in China with high suicide probability are recognizable from profile and text-based information on microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side by side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media.
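The labeling rule described in the methods (one SD above the sample mean) can be sketched as follows, assuming raw SPS scores as input and the population SD as a simple choice (the abstract does not specify population vs. sample SD):

```python
import math

def label_high_risk(scores):
    # Flag respondents scoring more than one standard deviation above
    # the sample mean as high risk (1), all others as 0.
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    cutoff = mean + sd
    return [1 if s > cutoff else 0 for s in scores]
```

The same rule, applied separately to each of the four subscale scores, yields the four dimension-specific label sets used to train the per-dimension models.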

