Composite Model Fabrication of Classification with Transformed Target Regressor for Customer Segmentation using Machine Learning

In the current internet world, customers prefer to buy products online rather than spend their time in showrooms. The number of online wine customers increases day by day due to the availability of high-end brands from online sellers, so customers buy wine products based on the product description and the satisfaction of other customers who have bought before. This leads industries to focus on machine learning techniques that concentrate on target transformation of the dependent variable. This paper endeavors to forecast customer segmentation for the wine data set extracted from the UCI Machine Learning Repository. The raw wine data set is subjected to target transformation for various models, namely the Huber Regressor, SGD Regressor, RidgeCV Regression, Logistic RegressionCV and Passive Aggressive Regressor. The performance of these models is analyzed with and without target transformation using metrics such as Mean Absolute Error and R2 Score. The implementation is done in Anaconda Navigator with Python. Experimental results show that, after applying target transformation, RidgeCV Regression is the most effective, with an R2 Score of 82% and a Mean Absolute Error of 0.0, compared to the other models.
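The with/without target-transformation comparison described above can be sketched with scikit-learn's `TransformedTargetRegressor`. The quantile transform and the use of scikit-learn's bundled wine data are illustrative assumptions, not the paper's exact pipeline:

```python
# Sketch: comparing RidgeCV with and without target transformation on the wine
# data. The choice of QuantileTransformer is an assumption for illustration.
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_wine
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# plain RidgeCV, no target transformation
plain = RidgeCV().fit(X_tr, y_tr)

# RidgeCV wrapped so the target is transformed before fitting
# and inverse-transformed at prediction time
wrapped = TransformedTargetRegressor(
    regressor=RidgeCV(),
    transformer=QuantileTransformer(n_quantiles=100, output_distribution="normal"),
).fit(X_tr, y_tr)

for name, model in [("plain", plain), ("transformed", wrapped)]:
    pred = model.predict(X_te)
    print(name, round(mean_absolute_error(y_te, pred), 3), round(r2_score(y_te, pred), 3))
```

The same wrapper accepts any regressor from the paper's list, so the five models can be compared in one loop.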

2020 ◽  
Vol 8 (5) ◽  
pp. 1526-1531

In the modern scenario of technological growth, the lifestyle of an individual varies with economic status. The world population is prone to chronic, deadly diseases due to varied food habits, and the use of electronic equipment has led people to give up their quality time for exercise. This lack of physical activity is symptomatic of a poor quality of life. With this background, this paper concentrates on predicting the type of heart disease by applying target transformation with various machine learning regression models. It uses the heart disease data set extracted from the UCI Machine Learning Repository, and the Anaconda Navigator IDE along with Spyder is used for implementing the Python code. Our contribution is five-fold. First, the data are segregated and preprocessed to extract the relationship and dependency of each parameter. Second, the dataset is processed to identify the target distribution of classes in the dependent variable. Third, the dataset is fitted to the Ridge, Huber, SGD and PerceptronCV regressors with and without target transformation. Fourth, the dataset is feature-scaled and then fitted to the same regressors, again with and without target transformation. Fifth, performance is analyzed using the Mean Absolute Error and R2 Score. Experimental results show that the PerceptronCV regressor is the most effective, with a mean absolute error of 1.00 and an R2 score of 0.04 for heart disease prediction.
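The fourth step above (feature scaling before fitting) can be sketched as follows; synthetic data stands in for the UCI heart-disease file, and the column count and scaler choice are assumptions for illustration:

```python
# Sketch: SGDRegressor fitted with and without feature scaling, as in the
# paper's third and fourth steps. make_regression stands in for the heart data.
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# without feature scaling
unscaled = SGDRegressor(max_iter=2000, random_state=0).fit(X_tr, y_tr)

# with feature scaling: StandardScaler runs before the regressor in a pipeline
scaled = make_pipeline(StandardScaler(), SGDRegressor(max_iter=2000, random_state=0)).fit(X_tr, y_tr)

for name, m in [("unscaled", unscaled), ("scaled", scaled)]:
    p = m.predict(X_te)
    print(name, round(mean_absolute_error(y_te, p), 2), round(r2_score(y_te, p), 3))
```

On real, unstandardized clinical columns (age in years, cholesterol in mg/dL) the scaled pipeline typically converges far better for gradient-based models such as SGD.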


Vibration ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 341-356
Author(s):  
Jessada Sresakoolchai ◽  
Sakdirat Kaewunruen

Various techniques have been developed to detect railway defects, one popular technique being machine learning. This unprecedented study applies deep learning, a branch of machine learning, to detect and evaluate the severity of combined rail defects. The combined defects in the study are settlement and dipped joint. Features used to detect and evaluate the severity of combined defects are axle box accelerations simulated using a verified rolling stock dynamic behavior simulation called D-Track. A total of 1650 simulations are run to generate numerical data. The deep learning techniques used in the study are the deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). Simulated data are used in two ways: simplified data and raw data. Simplified data are used to develop the DNN model, while raw data are used to develop the CNN and RNN models. For simplified data, features are extracted from the raw data: the weight of the rolling stock, the speed of the rolling stock, and three peak and bottom accelerations from two wheels of the rolling stock. In total, 14 features are used as simplified data for developing the DNN model. For raw data, time-domain accelerations are used directly to develop the CNN and RNN models without processing or data extraction. Hyperparameter tuning is performed via grid search to ensure that the performance of each model is optimized. To detect the combined defects, the study proposes two approaches. The first approach uses one model to detect settlement and dipped joint together, and the second uses two models to detect settlement and dipped joint separately. The results show that the CNN models of both approaches provide the same accuracy of 99%, so one model is sufficient to detect settlement and dipped joint. To evaluate the severity of the combined defects, the study applies classification and regression concepts.
Classification is used to evaluate the severity by categorizing defects into light, medium, and severe classes, and regression is used to estimate the size of defects. From the study, the CNN model is suitable for evaluating dipped joint severity with an accuracy of 84% and mean absolute error (MAE) of 1.25 mm, and the RNN model is suitable for evaluating settlement severity with an accuracy of 99% and mean absolute error (MAE) of 1.58 mm.
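The grid-search tuning step described above can be sketched with scikit-learn; a small dense network stands in for the study's DNN, and synthetic 14-feature data stands in for the D-Track simulation output (the parameter grid is an illustrative assumption):

```python
# Sketch: grid-search hyperparameter tuning of a small dense network on the
# 14 simplified features; make_classification stands in for simulated axle-box data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# 3 severity classes (light / medium / severe), 14 simplified features
X, y = make_classification(n_samples=400, n_features=14, n_classes=3,
                           n_informative=6, random_state=1)

grid = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=1),
    param_grid={
        "hidden_layer_sizes": [(16,), (32, 16)],  # candidate architectures
        "alpha": [1e-4, 1e-2],                    # candidate L2 penalties
    },
    cv=3,  # 3-fold cross-validation per grid point
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same exhaustive-search pattern applies to the CNN and RNN models, only with architecture-specific hyperparameters in the grid.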


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Hai-Bang Ly ◽  
Thuy-Anh Nguyen ◽  
Binh Thai Pham

Soil cohesion (C) is one of the critical soil properties and is closely related to basic soil properties such as particle size distribution, pore size, and shear strength. Hence, it is mainly determined by experimental methods, which are often time-consuming and costly. Therefore, developing an alternative approach based on machine learning (ML) techniques is highly desirable. In this study, machine learning models, namely support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF), were built on a data set of 145 soil samples collected from the Da Nang-Quang Ngai expressway project, Vietnam. The database includes six input parameters: clay content, moisture content, liquid limit, plastic limit, specific gravity, and void ratio. The performance of the models was assessed by three statistical criteria, namely the correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE). The results demonstrated that the proposed RF model could predict soil cohesion with high accuracy (R = 0.891) and low error (RMSE = 3.323 and MAE = 2.511), and that its predictive capability is better than that of SVM and GPR. Therefore, the RF model can be used as a cost-effective approach for predicting the soil cohesion forces used in the design and inspection of constructions.
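The RF evaluation with the three criteria (R, MAE, RMSE) can be sketched as below; synthetic six-feature data stands in for the 145 expressway soil samples:

```python
# Sketch: random-forest regression scored with R, MAE, and RMSE as in the study;
# make_regression stands in for the 145-sample, six-parameter soil database.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=145, n_features=6, noise=5.0, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

rf = RandomForestRegressor(n_estimators=200, random_state=7).fit(X_tr, y_tr)
pred = rf.predict(X_te)

r = np.corrcoef(y_te, pred)[0, 1]                 # correlation coefficient R
mae = mean_absolute_error(y_te, pred)             # mean absolute error
rmse = float(np.sqrt(np.mean((y_te - pred) ** 2)))  # root mean square error
print(round(r, 3), round(mae, 2), round(rmse, 2))
```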


2021 ◽  
Author(s):  
Thomas Ka-Luen Lui ◽  
Ka Shing, Michael Cheung ◽  
Wai Keung Leung

BACKGROUND Immunotherapy is a promising new treatment for patients with advanced hepatocellular carcinoma (HCC), but it is costly and potentially associated with considerable side effects. OBJECTIVE This study aimed to evaluate the role of machine learning (ML) models in predicting one-year cancer-related mortality in advanced HCC patients treated with immunotherapy. METHODS 395 HCC patients who had received immunotherapy (including nivolumab, pembrolizumab or ipilimumab) in 2014-2019 in Hong Kong were included. The whole data set was randomly divided into a training set (n=316) and a validation set (n=79). The data set, which includes 45 clinical variables, was used to construct six different ML models for predicting the risk of one-year mortality. The performance of the ML models was measured by the area under the receiver operating characteristic curve (AUC) and the mean absolute error (MAE) using calibration analysis. RESULTS The overall one-year cancer-related mortality was 51.1%. Of the six ML models, the random forest (RF) model had the highest AUC of 0.93 (95% CI: 0.86-0.98), which was better than logistic regression (0.82, p=0.01) and XGBoost (0.86, p=0.04). RF also had the lowest false positive (6.7%) and false negative (2.8%) rates. High baseline AFP, bilirubin and alkaline phosphatase were three common risk factors identified by all ML models. CONCLUSIONS ML models can predict one-year cancer-related mortality of HCC patients treated with immunotherapy, which may help select the patients who would benefit most from this new treatment option.
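The train/validation split and AUC scoring described in the methods can be sketched as follows; synthetic 45-variable data stands in for the Hong Kong cohort, and the forest size is an illustrative assumption:

```python
# Sketch: random-forest mortality classifier with the study's 316/79 split,
# scored by AUC. make_classification stands in for the 45 clinical variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=395, n_features=45, n_informative=10,
                           random_state=3)
# 79 patients held out for validation, matching the study's split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=79, random_state=3)

rf = RandomForestClassifier(n_estimators=300, random_state=3).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])  # AUC on validation set
print(round(auc, 3))
```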


2021 ◽  
Author(s):  
Hangsik Shin

BACKGROUND Arterial stiffness due to vascular aging is a major indicator for evaluating cardiovascular risk. OBJECTIVE In this study, we propose a method of estimating age by applying machine learning to the photoplethysmogram for non-invasive vascular age assessment. METHODS The machine learning-based age estimation model, which consists of three convolutional layers and two fully connected layers, was developed using photoplethysmograms segmented by pulse from a total of 752 adults aged 19-87 years. The performance of the developed model was quantitatively evaluated using the mean absolute error, root mean squared error, Pearson's correlation coefficient, and coefficient of determination. Grad-CAM was used to explain the contribution of photoplethysmogram waveform characteristics to vascular age estimation. RESULTS A mean absolute error of 8.03, a root mean squared error of 9.96, a correlation coefficient of 0.62, and a coefficient of determination of 0.38 were obtained through 10-fold cross-validation. Grad-CAM, used to determine the weight that the input signal contributes to the result, confirmed that the photoplethysmogram segment's contribution to age estimation was highest around the systolic peak. CONCLUSIONS The machine learning-based vascular aging analysis method using the PPG waveform showed performance comparable or superior to previous studies in evaluating vascular aging, without complex feature detection. CLINICALTRIAL 2015-0104
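The 10-fold cross-validated evaluation with the four metrics listed above can be sketched as below; a small MLP stands in for the three-convolutional-layer network, and synthetic pulse-segment features stand in for the 752-subject PPG data:

```python
# Sketch: 10-fold cross-validated age regression scored with MAE, RMSE,
# Pearson's r, and R2; the model and data are stand-ins for the PPG CNN.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor

# 50 synthetic per-pulse features; the target plays the role of age
X, y = make_regression(n_samples=300, n_features=50, noise=8.0, random_state=5)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=5)
pred = cross_val_predict(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=5))

mae = mean_absolute_error(y, pred)
rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
r = np.corrcoef(y, pred)[0, 1]      # Pearson's correlation coefficient
r2 = r2_score(y, pred)              # coefficient of determination
print(round(mae, 2), round(rmse, 2), round(r, 3), round(r2, 3))
```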


Author(s):  
Mohammed Al Zobbi ◽  
Belal Alsinglawi ◽  
Omar Mubin ◽  
Fady Alnajjar

Coronavirus Disease 2019 (COVID-19) has affected day-to-day life and slowed down the global economy. Most countries are enforcing strict quarantine to control the havoc of this highly contagious disease. Since the outbreak of COVID-19, many data analyses have been done to provide close support to decision-makers. We propose a method comprising data analytics and machine learning classification for evaluating the effectiveness of lockdown regulations, which should be reviewed regularly by governments to enable reasonable control over the outbreak. The model aims to measure the efficiency of lockdown procedures for various countries, and it shows a direct correlation between lockdown procedures and the infection rate. Lockdown efficiency is measured by finding a correlation coefficient between the lockdown attributes and the infection rate. The lockdown attributes include retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, residential areas, and schools. Our results show that combining all the independent attributes yielded the highest correlation (0.68) with the dependent value, Interquartile 3 (Q3), and the lowest Mean Absolute Error (MAE).
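The core correlation measure can be sketched in a few lines of NumPy; the series below are synthetic stand-ins for the mobility attributes and the infection rate, not the paper's data:

```python
# Sketch: correlating a combined mobility index with an infection-rate series,
# in the spirit of the lockdown-efficiency measure. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_days = 120
infection_rate = np.cumsum(rng.normal(1.0, 0.3, n_days))  # toy cumulative series

# hypothetical combined mobility index that loosely tracks infections
combined_mobility = 0.8 * infection_rate + rng.normal(0.0, 5.0, n_days)

r = np.corrcoef(combined_mobility, infection_rate)[0, 1]   # correlation coefficient
mae = float(np.mean(np.abs(combined_mobility - infection_rate)))  # MAE between series
print(round(r, 2), round(mae, 2))
```

In the paper's setup, each individual attribute (parks, workplaces, ...) would be correlated the same way, and the combination of all attributes is what reaches r = 0.68 against Q3.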


2020 ◽  
pp. 1-12
Author(s):  
Qinglong Ding ◽  
Zhenfeng Ding

Sports competition characteristics play an important role in judging the fairness of a game and improving the skills of athletes. At present, feature recognition in sports competition is hampered by the environmental background. To improve recognition performance, this study improves the TLD algorithm in view of its shortcomings and uses machine learning to build a sports-competition feature recognition model based on the improved algorithm. Moreover, the study applies the TLD algorithm to long-term pedestrian tracking with PTZ cameras. The improved TLD algorithm is experimentally analyzed and verified on a standard data set, and the experimental results are visually represented using mathematical statistics methods. The research shows that the proposed method is effective.


Author(s):  
Maraza-Quispe Benjamín ◽  
◽  
Enrique Damián Valderrama-Chauca ◽  
Lenin Henry Cari-Mogrovejo ◽  
Jorge Milton Apaza-Huanca ◽  
...  

The present research aims to implement a predictive model on the KNIME platform to analyze and compare predictions of academic performance using data from a Learning Management System (LMS), identifying students at academic risk in order to generate timely interventions. The CRISP-DM methodology was used, structured in six phases: problem analysis, data understanding, data preparation, modeling, evaluation and implementation. The analysis is based on online learning behavior captured through 22 behavioral indicators observed in the LMS of the Faculty of Educational Sciences of the National University of San Agustin. These indicators are distributed across five dimensions: Academic Performance, Access, Homework, Social Aspects and Quizzes. The model has been implemented on the KNIME platform using the Simple Regression Tree Learner training algorithm. The total population consists of 30,000 student records, from which a sample of 1,000 records was taken by simple random sampling. The accuracy of the model for early prediction of students' academic performance is evaluated by comparing the 22 observed behavioral indicators with the mean academic performance in three courses. The prediction results of the implemented model are satisfactory: the mean absolute error against the mean of the first course was 3.813 with an accuracy of 89.7%, against the mean of the second course it was 2.809 with an accuracy of 94.2%, and against the mean of the third course it was 2.779 with an accuracy of 93.8%. These results demonstrate that the proposed model can be used to predict students' future academic performance from an LMS data set.
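The sampling-plus-regression-tree pipeline can be sketched in scikit-learn (whose `DecisionTreeRegressor` plays the role of KNIME's Simple Regression Tree Learner); the 22 indicator columns and the performance target below are synthetic stand-ins for the LMS export:

```python
# Sketch: simple random sample of 1,000 from 30,000 records, then a regression
# tree predicting a performance score from 22 behavioural indicators.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
full = rng.normal(size=(30_000, 22))                    # 22 behavioural indicators
target = full[:, :5].sum(axis=1) * 2 + 12 + rng.normal(0, 1, 30_000)  # toy score

# simple random sampling of 1,000 records, as in the study
idx = rng.choice(30_000, size=1_000, replace=False)
X, y = full[idx], target[idx]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

tree = DecisionTreeRegressor(max_depth=6, random_state=42).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, tree.predict(X_te))
print(round(mae, 3))
```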


2020 ◽  
Author(s):  
huiyi su ◽  
Wenjuan Shen ◽  
Jingrui Wang ◽  
Arshad Ali ◽  
Mingshi Li

Abstract Background: Aboveground biomass (AGB) is a fundamental indicator of forest ecosystem productivity and health and hence plays an essential role in evaluating forest carbon reserves and supporting the development of targeted forest management plans. Methods: Here, we propose a random forest/co-kriging framework that integrates the strengths of machine learning and geostatistical approaches to improve the mapping accuracy of AGB in northern Guangdong province of China. We used Landsat time-series observations, Advanced Land Observing Satellite (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) data, and National Forest Inventory (NFI) plot measurements to generate forest AGB maps at three time points (1992, 2002, and 2010), showing the spatio-temporal dynamics of AGB in the subtropical forests of Guangdong, China. Results: The proposed model provided excellent performance for mapping AGB using spectral, textural, and topographical variables, and the radar backscatter coefficients. The root mean square error of the plot-level AGB validation was between 15.62 and 53.78 t/ha, the mean absolute error ranged from 6.54 to 32.32 t/ha, and the relative improvement over the random forest algorithm was between 3.8% and 17.7%. The highest coefficient of determination (0.81) and the lowest mean absolute error (6.54 t/ha) were observed in the 1992 AGB map. The spectral saturation effect was minimized by adding the PALSAR data to the modeling variable set in 2010. By adding elevation as a covariable, co-kriging outperformed ordinary kriging for the prediction of the AGB residuals, because co-kriging gave better interpolation results in the valleys and plains of the study area.
Conclusions: Validation of the three AGB maps with an independent dataset indicated that the random forest/co-kriging performed best for AGB prediction, followed by random forest coupled with ordinary kriging (random forest/ordinary kriging), and the random forest model. The proposed random forest/co-kriging framework provides an accurate and reliable method for AGB mapping in subtropical forest regions with complex topography. The resulting AGB maps are suitable for the targeted development of forest management actions to promote carbon sequestration and sustainable forest management in the context of climate change.
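The hybrid idea behind the framework — fit a trend model, then spatially interpolate its residuals and add them back — can be sketched as below. True co-kriging requires a geostatistics library (e.g. pykrige); here a simple inverse-distance weighting stands in for the kriging step, and all data are synthetic:

```python
# Sketch of the hybrid trend + residual-interpolation idea behind RF/co-kriging.
# IDW is used as a stand-in for kriging; the plot data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(200, 2))     # plot coordinates (km)
feats = rng.normal(size=(200, 4))           # spectral/textural covariates
agb = feats @ np.array([20.0, 10.0, 5.0, 3.0]) + 0.5 * xy[:, 0] + rng.normal(0, 5, 200)

# 1) random forest captures the covariate-driven trend
rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(feats, agb)
resid = agb - rf.predict(feats)             # spatially structured leftovers

def idw(pt, pts, vals, power=2.0):
    """Inverse-distance-weighted residual at pt (kriging stand-in)."""
    d = np.linalg.norm(pts - pt, axis=1) + 1e-9
    w = d ** -power
    return float(np.sum(w * vals) / np.sum(w))

# 2) corrected prediction at a new location = RF trend + interpolated residual
new_xy, new_feats = np.array([50.0, 50.0]), rng.normal(size=(1, 4))
pred = rf.predict(new_feats)[0] + idw(new_xy, xy, resid)
print(round(pred, 2))
```

Co-kriging additionally uses a secondary variable (elevation, in the paper) when interpolating the residuals, which is what improved the results in valleys and plains.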


2022 ◽  
Vol 24 (3) ◽  
pp. 1-25
Author(s):  
Nishtha Paul ◽  
Arpita Jadhav Bhatt ◽  
Sakeena Rizvi ◽  
Shubhangi

The frequency of malware attacks on Android apps is increasing day by day. Current studies have revealed startling facts about data-harvesting incidents in which users' personal data are at stake. To preserve user privacy, we propose MalApp, a permission-induced risk interface that identifies privacy violations arising from permissions granted during app installation. It comprises a multi-fold process that performs static analysis based on the app's category. First, reverse engineering is applied to extract app permissions and construct a Boolean-valued permission matrix. Second, the permissions are ranked to identify risky permissions across categories. Third, machine learning and ensembling techniques are incorporated to test the efficacy of the proposed approach on a data set of 404 benign and 409 malicious apps. The empirical studies identified that the proposed algorithm gives a best-case malware detection rate of 98.33%. The highlight of the interface is that any app can be classified as benign or malicious even before running it, using static analysis.
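The Boolean permission matrix, permission ranking, and ensemble classification steps can be sketched as follows; the permission columns, the labeling rule, and the ranking heuristic are synthetic stand-ins for the MalApp pipeline, not its exact method:

```python
# Sketch: Boolean permission matrix -> permission ranking -> ensemble classifier.
# The apps, permissions, and toy "risky permission" rule are all synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n_apps, n_perms = 813, 30                         # 404 benign + 409 malicious
X = rng.integers(0, 2, size=(n_apps, n_perms))    # 1 = permission requested
y = (X[:, :5].sum(axis=1) > 2).astype(int)        # toy label: many risky perms

# rank permissions by how much more often malware requests them
risk_gap = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
risk_rank = np.argsort(risk_gap)[::-1]            # most risky first

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=9)
clf = RandomForestClassifier(n_estimators=100, random_state=9).fit(X_tr, y_tr)
print("top risky permission index:", int(risk_rank[0]),
      "accuracy:", round(clf.score(X_te, y_te), 3))
```

Because the features are extracted from the APK before execution, the same matrix can classify an app without ever running it, which is the property the abstract highlights.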

