scholarly journals Supervised machine learning based liver disease prediction approach with LASSO feature selection

2021 ◽  
Vol 10 (6) ◽  
pp. 3369-3376
Author(s):  
Saima Afrin ◽  
F. M. Javed Mehedi Shamrat ◽  
Tafsirul Islam Nibir ◽  
Mst. Fahmida Muntasim ◽  
Md. Shakil Moharram ◽  
...  

In this contemporary era, the uses of machine learning techniques are increasing rapidly in the field of medical science for detecting various diseases such as liver disease (LD). Around the globe, a large number of people die because of this deadly disease. By diagnosing the disease in a primary stage, early treatment can be helpful to cure the patient. In this research paper, a method is proposed to diagnose the LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also deployed a least absolute shrinkage and selection operator (LASSO) feature selection technique on our taken dataset to suggest the most highly correlated attributes of LD. The predictions with 10 fold cross-validation (CV) made by the algorithms are tested in terms of accuracy, sensitivity, precision and f1-score values to forecast the disease. It is observed that the decision tree algorithm has the best performance score where accuracy, precision, sensitivity and f1-score values are 94.295%, 92%, 99% and 96% respectively with the inclusion of LASSO. Furthermore, a comparison with recent studies is shown to prove the significance of the proposed system. 

Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 403
Author(s):  
Muhammad Waleed ◽  
Tai-Won Um ◽  
Tariq Kamal ◽  
Syed Muhammad Usman

In this paper, we apply the multi-class supervised machine learning techniques for classifying the agriculture farm machinery. The classification of farm machinery is important when performing the automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of a fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques—K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training of the model, we use the vibration and tilt of machinery. The vibration and tilt of machinery are recorded using the accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis on the collected data revealed that the farm machinery (when in operation) showed big variations in vibration and tilt, but observed similar means. Additionally, the accuracies of vibration-based and tilt-based classifications of farm machinery show good accuracy when used alone (with vibration showing slightly better numbers than the tilt). However, the accuracies improve further when both (the tilt and vibration) are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, but random forest was the best performing. The gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while the gradient boosting takes the most time.


Author(s):  
Sanam Aamir ◽  
Aqsa Rahim ◽  
Sajid Bashir ◽  
Muddasar Naeem

Breast cancer has made its mark as the primary cause of female deaths and disability worldwide, making it a significant health problem. However, early diagnosis of breast cancer can lead to its effective treatment. The relevant diagnostic features available in the patient’s medical data may be used in an effective way to diagnose, categorize and classify breast cancer. Considering the importance of early detection of breast cancer in its effective treatment, it is important to accurately diagnose and classify breast cancer using diagnostic features present in available data. Automated techniques based on machine learning are an effective way to classify data for diagnosis. Various machine learning based automated techniques have been proposed by researches for early prediction/diagnosis of breast cancer. However, due to the inherent criticalities and the risks coupled with wrong diagnosis, there is a dire need that the accuracy of the predicted diagnosis must be improved. In this paper, we have introduced a novel supervised machine learning based approach that embodies Random Forest, Gradient Boosting, Support Vector Machine, Artificial Neural Network and Multilayer Perception methods. Experimental results show that the proposed framework has achieved an accuracy of 99.12%. Results obtained after the process of feature selection indicate that both preprocessing methods and feature selection increase the success of the classification system.


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.


2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving high- spatial and spectral dimensionality data, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGbtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variables selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.


2019 ◽  
Vol 1 (1) ◽  
pp. 384-399 ◽  
Author(s):  
Thais de Toledo ◽  
Nunzio Torrisi

The Distributed Network Protocol (DNP3) is predominately used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic, in which a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are a important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: Decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.


Chronic Kidney Disease (CKD) is a worldwide concern that influences roughly 10% of the grown-up population on the world. For most of the people the early diagnosis of CKD is often not possible. Therefore, the utilization of present-day Computer aided supported strategies is important to help the conventional CKD finding framework to be progressively effective and precise. In this project, six modern machine learning techniques namely Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Decision Tree, Logistic regression were used and then to enhance the performance of the model Ensemble Algorithms such as ADABoost, Gradient Boosting, Random Forest, Majority Voting, Bagging and Weighted Average were used on the Chronic Kidney Disease dataset from the UCI Repository. The model was tuned finely to get the best hyper parameters to train the model. The performance metrics used to evaluate the model was measured using Accuracy, Precision, Recall, F1-score, Mathew`s Correlation Coefficient and ROC-AUC curve. The experiment was first performed on the individual classifiers and then on the Ensemble classifiers. The ensemble classifier like Random Forest and ADABoost performed better with 100% Accuracy, Precision and Recall when compared to the individual classifiers with 99.16% accuracy, 98.8% Precision and 100% Recall obtained from Decision Tree Algorithm


2021 ◽  
Vol 12 (2) ◽  
pp. 28-55
Author(s):  
Fabiano Rodrigues ◽  
Francisco Aparecido Rodrigues ◽  
Thelma Valéria Rocha Rodrigues

Este estudo analisa resultados obtidos com modelos de machine learning para predição do sucesso de startups. Como proxy de sucesso considera-se a perspectiva do investidor, na qual a aquisição da startup ou realização de IPO (Initial Public Offering) são formas de recuperação do investimento. A revisão da literatura aborda startups e veículos de financiamento, estudos anteriores sobre predição do sucesso de startups via modelos de machine learning, e trade-offs entre técnicas de machine learning. Na parte empírica, foi realizada uma pesquisa quantitativa baseada em dados secundários oriundos da plataforma americana Crunchbase, com startups de 171 países. O design de pesquisa estabeleceu como filtro startups fundadas entre junho/2010 e junho/2015, e uma janela de predição entre junho/2015 e junho/2020 para prever o sucesso das startups. A amostra utilizada, após etapa de pré-processamento dos dados, foi de 18.571 startups. Foram utilizados seis modelos de classificação binária para a predição: Regressão Logística, Decision Tree, Random Forest, Extreme Gradiente Boosting, Support Vector Machine e Rede Neural. Ao final, os modelos Random Forest e Extreme Gradient Boosting apresentaram os melhores desempenhos na tarefa de classificação. Este artigo, envolvendo machine learning e startups, contribui para áreas de pesquisa híbridas ao mesclar os campos da Administração e Ciência de Dados. Além disso, contribui para investidores com uma ferramenta de mapeamento inicial de startups na busca de targets com maior probabilidade de sucesso.   


2021 ◽  
Vol 8 (1) ◽  
pp. 30-35
Author(s):  
Jayalakshmi R ◽  
Savitha Devi M

Agriculture sector is recognized as the backbone of the Indian economy that plays a crucial role in the growth of the nation’s economy. It imparts on weather and other environmental aspects. Some of the factors on which agriculture is reliant are Soil, climate, flooding, fertilizers, temperature, precipitation, crops, insecticides, and herb. The soil fertility is dependent on these factors and hence difficult to predict. However, the Agriculture sector in India is facing the severe problem of increasing crop productivity. Farmers lack the essential knowledge of nutrient content of the soil, selection of crop best suited for the soil and they also lack efficient methods for predicting crop well in advance so that appropriate methods have been used to improve crop productivity. This paper presents different Supervised Machine Learning Algorithms such as Decision tree, K-Nearest Neighbor (KNN), Support Vector Machine (SVM) to predict the fertility of soil based on macro-nutrients and micro-nutrients status found in the dataset. Supervised Machine Learning algorithms are applied on the training dataset and are tested with the test dataset, and the implementation of these algorithms is done using R Tool. The performance analysis of these algorithms is done using different evaluation metrics like mean absolute error, cross-validation, and accuracy. Result analysis shows that the Decision tree is produced the best accuracy of 99% with a very less mean square error (MSE) rate.


Energies ◽  
2020 ◽  
Vol 13 (19) ◽  
pp. 5193
Author(s):  
Nasir Ayub ◽  
Muhammad Irfan ◽  
Muhammad Awais ◽  
Usman Ali ◽  
Tariq Ali ◽  
...  

Electrical load forecasting provides knowledge about future consumption and generation of electricity. There is a high level of fluctuation behavior between energy generation and consumption. Sometimes, the energy demand of the consumer becomes higher than the energy already generated, and vice versa. Electricity load forecasting provides a monitoring framework for future energy generation, consumption, and making a balance between them. In this paper, we propose a framework, in which deep learning and supervised machine learning techniques are implemented for electricity-load forecasting. A three-step model is proposed, which includes: feature selection, extraction, and classification. The hybrid of Random Forest (RF) and Extreme Gradient Boosting (XGB) is used to calculate features’ importance. The average feature importance of hybrid techniques selects the most relevant and high importance features in the feature selection method. The Recursive Feature Elimination (RFE) method is used to eliminate the irrelevant features in the feature extraction method. The load forecasting is performed with Support Vector Machines (SVM) and a hybrid of Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN). The meta-heuristic algorithms, i.e., Grey Wolf Optimization (GWO) and Earth Worm Optimization (EWO) are applied to tune the hyper-parameters of SVM and CNN-GRU, respectively. The accuracy of our enhanced techniques CNN-GRU-EWO and SVM-GWO is 96.33% and 90.67%, respectively. Our proposed techniques CNN-GRU-EWO and SVM-GWO perform 7% and 3% better than the State-Of-The-Art (SOTA). In the end, a comparison with SOTA techniques is performed to show the improvement of the proposed techniques. This comparison showed that the proposed technique performs well and results in the lowest performance error rates and highest accuracy rates as compared to other techniques.


Sign in / Sign up

Export Citation Format

Share Document