Link Adaptation on an Underwater Communications Network Using Machine Learning Algorithms: Boosted Regression Tree Approach

Data mining is the process of extracting useful information from large data sets. It gives users insight into the data and lets them make informed decisions based on the knowledge mined from databases. The purpose of higher education organizations is to offer superior opportunities to their students. Like data mining in general, Education Data Mining (EDM) is now considered a powerful tool in the field of education. It offers an effective method for mining students' performance across various parameters to predict and analyze whether a student will be recruited in campus placement. Predictions are made using the machine learning algorithms J48, Naïve Bayes, Random Forest, and Random Tree in the Weka tool, and Multiple Linear Regression, binomial logistic regression, Recursive Partitioning and Regression Tree (rpart), conditional inference tree (ctree), and Neural Network (nnet) algorithms in RStudio. The results obtained from each approach are then compared with respect to performance and accuracy by graphical analysis. Based on the results, higher education organizations can offer superior training to their students.

Downscaling Satellite Retrieved Soil Moisture Using Regression Tree‐Based Machine Learning Algorithms Over Southwest France

Earth and Space Science ◽

10.1029/2020ea001267 ◽

2020 ◽

Vol 7 (10) ◽

Author(s):

Yangxiaoyue Liu ◽

Xiaolin Xia ◽

Ling Yao ◽

Wenlong Jing ◽

Chenghu Zhou ◽

...

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Learning Algorithms ◽

Regression Tree ◽

Machine Learning Algorithms

Blood Pressure Estimation Using Photoplethysmography Only: Comparison between Different Machine Learning Approaches

Journal of Healthcare Engineering ◽

10.1155/2018/1548647 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 26

Author(s):

Syed Ghufran Khalid ◽

Jufen Zhang ◽

Fei Chen ◽

Dingchang Zheng

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Cardiovascular Diseases ◽

Learning Algorithms ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Estimation Accuracy ◽

Online Database ◽

Iso Standard ◽

Device Validation

Introduction. Blood pressure (BP) has been a potential risk factor for cardiovascular diseases. BP measurement is one of the most useful parameters for early diagnosis, prevention, and treatment of cardiovascular diseases. At present, BP measurement mainly relies on cuff-based techniques that cause inconvenience and discomfort to users. Although some of the present prototype cuffless BP measurement techniques are able to reach overall acceptable accuracies, they require an electrocardiogram (ECG) and a photoplethysmograph (PPG), which makes them unsuitable for true wearable applications. Therefore, developing a single PPG-based cuffless BP estimation algorithm with enough accuracy would be clinically and practically useful. Methods. The University of Queensland vital sign dataset (online database) was accessed to extract raw PPG signals and their corresponding reference BPs (systolic BP and diastolic BP). The online database consisted of PPG waveforms of 32 cases from whom 8133 (good quality) signal segments (5 s for each) were extracted, preprocessed, and normalised in both width and amplitude. The three most significant pulse features (pulse area, pulse rising time, and width 25%) with their corresponding reference BPs were used to train and test three machine learning algorithms (regression tree, multiple linear regression (MLR), and support vector machine (SVM)). A 10-fold cross-validation was applied to obtain overall BP estimation accuracy, separately for the three machine learning algorithms. Their estimation accuracies were further analysed separately for three clinical BP categories (normotensive, hypertensive, and hypotensive). Finally, they were compared with the ISO standard for noninvasive BP device validation (average difference no greater than 5 mmHg and SD no greater than 8 mmHg). Results. In terms of overall estimation accuracy, the regression tree achieved the best overall accuracy for SBP (mean and SD of difference: −0.1 ± 6.5 mmHg) and DBP (mean and SD of difference: −0.6 ± 5.2 mmHg). MLR and SVM achieved an overall mean difference of less than 5 mmHg for both SBP and DBP, but their SD of difference was >8 mmHg. Regarding the estimation accuracy in each BP category, only the regression tree met the ISO standard for SBP (−1.1 ± 5.7 mmHg) and DBP (−0.03 ± 5.6 mmHg) in the normotensive category. MLR and SVM did not achieve acceptable accuracies in any BP category. Conclusion. This study developed and compared three machine learning algorithms to estimate BPs using PPG only and revealed that the regression tree algorithm was the best approach, with overall acceptable accuracy according to the ISO standard for BP device validation. Furthermore, this study demonstrated that the regression tree algorithm achieved acceptable measurement accuracy only in the normotensive category, suggesting that future algorithm development for BP estimation should be more specific for different BP categories.
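The ISO acceptance check quoted in this abstract (average difference no greater than 5 mmHg, SD no greater than 8 mmHg) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and two details are assumptions not stated in the abstract, namely that the difference is computed as estimated minus reference and that the sample standard deviation is used.

```python
from statistics import mean, stdev

# Thresholds as quoted in the abstract (mmHg).
ISO_MAX_MEAN_DIFF = 5.0
ISO_MAX_SD_DIFF = 8.0


def bp_agreement(estimated, reference):
    """Return (mean, SD) of the differences estimated - reference, in mmHg.

    Assumptions: the sign convention (estimated minus reference) and the
    use of the sample SD are illustrative choices, not taken from the paper.
    """
    diffs = [e - r for e, r in zip(estimated, reference)]
    return mean(diffs), stdev(diffs)


def meets_iso(mean_diff, sd_diff):
    """True when both the mean and the SD of the difference are in limits."""
    return abs(mean_diff) <= ISO_MAX_MEAN_DIFF and sd_diff <= ISO_MAX_SD_DIFF


# Overall regression-tree results reported in the abstract.
print("SBP passes:", meets_iso(-0.1, 6.5))
print("DBP passes:", meets_iso(-0.6, 5.2))
```

A model whose mean difference is small but whose SD exceeds 8 mmHg, as the abstract reports for MLR and SVM, fails this check on the second condition alone.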

429. County-level predictors of COVID-19 testing across the 62 counties in New York State: A comparison across machine learning algorithms

Open Forum Infectious Diseases ◽

10.1093/ofid/ofaa439.623 ◽

2020 ◽

Vol 7 (Supplement_1) ◽

pp. S281-S281

Author(s):

Chengbo Zeng ◽

Yunyu Xiao

Keyword(s):

Machine Learning ◽

New York ◽

Ridge Regression ◽

Prediction Models ◽

Learning Algorithms ◽

New York State ◽

Regression Tree ◽

Machine Learning Algorithms ◽

County Level ◽

Public Datasets

Background. More than 360,000 people were infected with COVID-19 in New York State (NYS) by the end of May 2020. Although expanded testing could effectively control the statewide COVID-19 outbreak, the county-level factors predicting the number of tests are unknown. Accurately identifying the county-level predictors of testing may contribute to more effective testing allocation across counties in NYS. This study leveraged multiple public datasets and machine learning algorithms to construct and compare county-level prediction models of COVID-19 testing in NYS. Methods. Testing data up to May 15th were extracted from the Department of Health in NYS. A total of 28 county-level predictors derived from multiple public datasets (e.g., American Community Survey and US Health Data) were used to construct the prediction models. Three machine learning algorithms, namely generalized linear regression with the least absolute shrinkage and selection operator (LASSO), ridge regression, and regression tree, were used to identify the most important county-level predictors, adjusting for prevalence and incidence. Model performance was assessed using the mean square error (MSE), with a smaller MSE indicating better model performance. Results. The testing rate was 70.3 per 1,000 people in NYS. Counties close to the epicenter (Rockland and Westchester) had high testing rates, while counties located at the boundary of NYS and far away from the epicenter (Chautauqua and Clinton) had low testing rates. The MSEs of linear regression with the LASSO penalty, ridge regression, and regression tree were 123.60, 40.59, and 298.0, respectively. Ridge regression was selected as the final model and revealed that the mental health provider rate was positively associated with testing (β=5.11, p=.04), while the proportion of religious adherents (β=-3.91, p=.05) was inversely related to the variation of testing rate across counties. Conclusion. This study identified healthcare resources and religious environment as the strongest predictors of spatial variations of COVID-19 testing across NYS. Structural or policy efforts should address the spatial variations and target the relevant county-level predictors to promote statewide testing. Disclosures. All Authors: No reported disclosures
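The selection rule described in this abstract, choosing the candidate with the smallest MSE, can be sketched as follows. The MSE values are the ones reported above; the dictionary keys and the helper function are hypothetical names introduced only for illustration.

```python
def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# MSEs reported in the abstract; the keys are illustrative labels.
reported_mse = {
    "lasso": 123.60,
    "ridge": 40.59,
    "regression_tree": 298.0,
}

# A smaller MSE indicates a better model, so take the minimum.
best_model = min(reported_mse, key=reported_mse.get)
print(best_model)
```

With the reported values this picks ridge regression, which matches the final model named in the abstract.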

A Comparison of Supervised Machine Learning Algorithms for Classification of Communications Network Traffic

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-70087-8_47 ◽

2017 ◽

pp. 445-454 ◽

Cited By ~ 10

Author(s):

Pramitha Perera ◽

Yu-Chu Tian ◽

Colin Fidge ◽

Wayne Kelly

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Communications Network

A Generalized Method for Modeling the Adsorption of Heavy Metals with Machine Learning Algorithms

Water ◽

10.3390/w12123490 ◽

2020 ◽

Vol 12 (12) ◽

pp. 3490

Author(s):

Noor Hafsa ◽

Sayeed Rushd ◽

Mohammed Al-Yaari ◽

Muhammad Rahman

Keyword(s):

Machine Learning ◽

Heavy Metals ◽

Mean Squared Error ◽

Learning Algorithms ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting

Applications of machine learning algorithms (MLAs) to modeling the adsorption efficiencies of different heavy metals have been limited by the adsorbate–adsorbent pair and the selection of specific MLAs. In the current study, adsorption efficiencies of fourteen heavy metal–adsorbent (HM-AD) pairs were modeled with a variety of ML models such as support vector regression with polynomial and radial basis function kernels, random forest (RF), stochastic gradient boosting, and Bayesian additive regression tree (BART). The wet experiment-based actual measurements were supplemented with synthetic data samples. The first batch of dry experiments was performed to model the removal efficiency of an HM with a specific AD. The ML modeling was then implemented on the whole dataset to develop a generalized model. A ten-fold cross-validation method was used for the model selection, while the comparative performance of the MLAs was evaluated with statistical metrics comprising Spearman’s rank correlation coefficient, coefficient of determination (R²), mean absolute error, and root-mean-squared error. The regression tree methods, BART, and RF demonstrated the most robust and optimum performance with 0.96 ≤ R² ≤ 0.99. The current study provides a generalized methodology to implement ML in modeling the efficiency of not only a specific adsorption process but also a group of comparable processes involving multiple HM-AD pairs.
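Three of the evaluation metrics named in this abstract (coefficient of determination, mean absolute error, and root-mean-squared error) have simple closed forms, sketched below in plain Python. The function names and the sample removal-efficiency values are hypothetical and are not data from the study.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical removal efficiencies (%), made up for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(mean_absolute_error(measured, predicted))
print(root_mean_squared_error(measured, predicted))
print(r_squared(measured, predicted))
```

In a ten-fold cross-validation these metrics would typically be computed on each held-out fold and then averaged; that aggregation step is omitted here.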

Urban flood risk mapping using data-driven geospatial techniques for a flood-prone case area in Iran

Hydrology Research ◽

10.2166/nh.2019.090 ◽

2019 ◽

Vol 51 (1) ◽

pp. 127-142 ◽

Cited By ~ 5

Author(s):

Hamid Darabi ◽

Ali Torabi Haghighi ◽

Mohamad Ayob Mohamadi ◽

Mostafa Rashidpour ◽

Alan D. Ziegler ◽

...

Keyword(s):

Machine Learning ◽

Land Use ◽

Population Density ◽

Flood Risk ◽

Flood Hazard ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Runoff Generation ◽

Boosted Regression Tree ◽

Building Density

In an effort to improve tools for effective flood risk assessment, we applied machine learning algorithms to predict flood-prone areas in Amol city (Iran), a site with recent floods (2017–2018). An ensemble approach was then implemented to predict hazard probabilities using the best machine learning algorithms (boosted regression tree, multivariate adaptive regression spline, generalized linear model, and generalized additive model) based on a receiver operating characteristic area under the curve (ROC-AUC) assessment. The algorithms were all trained and tested on 92 randomly selected points, information from a flood inundation survey, and geospatial predictor variables (precipitation, land use, elevation, slope percent, curve number, distance to river, distance to channel, and depth to groundwater). The ensemble model had 0.925 and 0.892 accuracy for training and testing data, respectively. We then created a vulnerability map from data on building density, building age, population density, and socio-economic conditions and assessed risk as a product of hazard and vulnerability. The results indicated that distance to channel, land use, and runoff generation were the most important factors associated with flood hazard, while population density and building density were the most important factors determining vulnerability. Areas of highest and lowest flood risks were identified, leading to recommendations on where to implement flood risk reduction measures to guide flood governance in Amol city.
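The last step of this workflow, risk as the product of hazard and vulnerability, can be sketched per map cell as below. The abstract does not say how the four models were combined, so the unweighted mean used here is an assumption, and all probabilities, vulnerability scores, and function names are hypothetical.

```python
def ensemble_hazard(model_probabilities):
    """Combine per-cell flood probabilities from several models.

    Assumption: an unweighted mean across models. The abstract does not
    specify the combination rule used in the study.
    """
    n_models = len(model_probabilities)
    return [sum(cell) / n_models for cell in zip(*model_probabilities)]


def flood_risk(hazard, vulnerability):
    """Risk per cell as the product of hazard and vulnerability."""
    return [h * v for h, v in zip(hazard, vulnerability)]


# Hypothetical probabilities for three map cells from four models.
model_probabilities = [
    [0.9, 0.2, 0.6],  # boosted regression tree
    [0.8, 0.3, 0.5],  # multivariate adaptive regression spline
    [0.7, 0.1, 0.7],  # generalized linear model
    [0.8, 0.2, 0.6],  # generalized additive model
]
vulnerability = [0.5, 0.9, 1.0]  # hypothetical scores scaled to 0..1

hazard = ensemble_hazard(model_probabilities)
risk_map = flood_risk(hazard, vulnerability)
print(hazard)
print(risk_map)
```

Because risk is a product, a cell with high hazard but low vulnerability can rank below a cell with moderate values on both, which is why the two maps are built separately before being multiplied.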

Quantitative Precipitation Estimates Using Machine Learning Approaches with Operational Dual-Polarization Radar Data

Remote Sensing ◽

10.3390/rs13040694 ◽

2021 ◽

Vol 13 (4) ◽

pp. 694

Author(s):

Kyuhee Shin ◽

Joon Jin Song ◽

Wonbae Bang ◽

GyuWon Lee

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Regression Tree ◽

Radar Data ◽

Rainfall Rate ◽

Machine Learning Algorithms ◽

Dual Polarization ◽

Forest Model ◽

Independent Variables

Traditional radar-based rainfall estimation typically relies on known functional relationships between the rainfall intensity (R) and radar measurables, such as R–Zh, R–(Zh, ZDR), etc. One of the biggest advantages of machine learning algorithms is their applicability to a non-linear relationship between a dependent variable and independent variables without any predefined relationships. We explored the potential use of two supervised machine learning methods (regression tree and random forest) in rainfall estimation using dual-polarization radar variables. The regression tree does not require normalization and scaling of data; however, this method is quite unstable since each split depends on the parent split. Since the random forest is an ensemble method of regression trees, it has less variability in prediction compared with regression trees, but consumes more computer resources. We considered several different configurations for machine learning algorithms with different sets of dependent and independent variables. The random forest model was appropriately tuned. In the test of variable importance, the specific differential phase was the most important variable for predicting the rainfall rate, and the differential reflectivity was the most important variable for predicting the residual (the difference between the true rainfall rate and the one estimated from the R–Z relationship). The models were evaluated by 10-fold cross-validation. The best model was the random forest model using a residual with the non-classified training set. The results indicated that the machine learning algorithms outperformed the traditional R–Z relationship. Then, we applied the best machine learning model to an S-band dual-polarization radar (Mt. Myeonbong) and validated the result with ground rain gauges. The results of the application to radar data showed that the estimates of the residuals had spatial variability. The stratiform and weak rain areas had positive residuals while convective areas had negative residuals, indicating that the spatial error structure driven by the R–Z relationship was well captured by the model. The rainfall rates of all pixels over the study area were adjusted with the estimated residuals. The rainfall rates adjusted by residual showed excellent agreement with the rain gauge, especially at high rainfall rates.
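The residual adjustment described in this abstract can be sketched as a two-step calculation: estimate the rain rate from reflectivity with an R–Z power law, then add the residual predicted by the model. The abstract does not give the R–Z coefficients, so the Marshall–Palmer values (Z = 200 R^1.6) are used here as an assumption, and the function names, the clamp at zero, and the sample residual are hypothetical.

```python
import math


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Rain rate (mm/h) from reflectivity (dBZ) via Z = a * R**b.

    Assumption: Marshall-Palmer coefficients a=200, b=1.6. The study's
    own R-Z coefficients are not stated in the abstract.
    """
    z_linear = 10.0 ** (dbz / 10.0)  # convert dBZ to linear Z (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)


def adjust_with_residual(rate_rz, predicted_residual):
    """Add the model-predicted residual to the R-Z estimate.

    The residual is defined as true rate minus R-Z estimate, so the
    corrected rate is estimate + residual. Clamping at zero is an
    illustrative choice to avoid negative rain rates.
    """
    return max(0.0, rate_rz + predicted_residual)


rate = rain_rate_from_dbz(40.0)  # R-Z estimate for one radar pixel
corrected = adjust_with_residual(rate, 1.5)  # hypothetical positive residual
print(round(rate, 2), round(corrected, 2))
```

Under this definition a positive residual means the R–Z relationship underestimated the rain rate at that pixel, and a negative residual means it overestimated.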

Supplemental Material for One Model to Rule Them All? Using Machine Learning Algorithms to Determine the Number of Factors in Exploratory Factor Analysis

Psychological Methods ◽

10.1037/met0000262.supp ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Factor Analysis ◽

Exploratory Factor Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Number Of Factors
