Machine Learning in Python

Author(s):  
Astha Baranwal ◽  
Bhagyashree R. Bagwe ◽  
Vanitha M

Diabetes is a disease of the modern world. The modern lifestyle has led to unhealthy eating habits that contribute to type 2 diabetes. Machine learning has gained considerable popularity in recent years; it has applications in various fields and has proven increasingly effective in medicine. The purpose of this chapter is to predict a person's diabetes outcome based on other factors or attributes. Various machine learning algorithms, such as logistic regression (LR), tuned and untuned random forest (RF), and the multilayer perceptron (MLP), have been used as classifiers for diabetes prediction. This chapter also presents a comparative study of these algorithms based on performance metrics such as accuracy, sensitivity, specificity, and F1 score.
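
As a rough illustration of the comparison described above, the following sketch trains LR, default and grid-tuned RF, and an MLP with scikit-learn and reports accuracy, sensitivity, specificity, and F1. The CSV file name and the "Outcome" column are assumptions (a Pima-style dataset layout), not the chapter's exact data.

```python
# Sketch of the classifier comparison described above (LR, RF, tuned RF, MLP).
# The CSV path and the "Outcome" column are assumptions, not the chapter's exact data.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score

df = pd.read_csv("diabetes.csv")                     # assumed Pima-style dataset
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF (default)": RandomForestClassifier(random_state=42),
    "RF (tuned)": GridSearchCV(RandomForestClassifier(random_state=42),
                               {"n_estimators": [100, 300], "max_depth": [5, 10, None]},
                               cv=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    specificity = recall_score(y_te, pred, pos_label=0)   # recall of the negative class
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"sens={recall_score(y_te, pred):.3f} spec={specificity:.3f} "
          f"F1={f1_score(y_te, pred):.3f}")
```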

2021 ◽  
Vol 10 (4) ◽  
pp. 58-75
Author(s):  
Vivek Sen Saxena ◽  
Prashant Johri ◽  
Avneesh Kumar

Skin lesion melanoma is the deadliest type of cancer. Artificial intelligence provides the power to classify skin lesions as melanoma or non-melanoma. The proposed system for melanoma detection and classification involves four steps: pre-processing (resizing all images and removing noise and hair from the dermoscopic images), image segmentation (identifying the lesion area), feature extraction (extracting features from the segmented lesion), and classification (categorizing the lesion as malignant, i.e. melanoma, or benign, i.e. non-melanoma). A modified GrabCut algorithm is employed to segment the skin lesion. Segmented lesions are classified using machine learning algorithms such as SVM, k-NN, ANN, and logistic regression and evaluated on performance metrics like accuracy, sensitivity, and specificity. Results are compared with existing systems, achieving a higher similarity index and accuracy.
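
The segmentation step can be sketched with OpenCV's standard GrabCut; the paper's specific modifications are not reproduced here, and the image file and rectangle initialization are assumptions.

```python
# Illustrative GrabCut-based lesion segmentation with OpenCV, plus a toy feature
# for a downstream classifier. The paper's "modified" GrabCut and its exact
# feature set are not reproduced; the file name and ROI are placeholders.
import cv2
import numpy as np

img = cv2.imread("lesion.jpg")                       # assumed dermoscopic image
mask = np.zeros(img.shape[:2], np.uint8)
bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
h, w = img.shape[:2]
rect = (int(0.1 * w), int(0.1 * h), int(0.8 * w), int(0.8 * h))  # rough lesion ROI

cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
lesion_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
segmented = img * lesion_mask[:, :, None]

# Toy feature vector (mean colour of the segmented region) for a classifier such as SVM.
features = segmented[lesion_mask == 1].mean(axis=0)
print("lesion area (px):", int(lesion_mask.sum()), "mean BGR:", features)
```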


2021 ◽  
Vol 35 (1) ◽  
pp. 11-21
Author(s):  
Himani Tyagi ◽  
Rajendra Kumar

IoT is characterized by communication between things (devices) that constantly share data, analyze it, and make decisions while connected to the internet. This interconnected architecture is attracting cyber criminals who expose IoT systems to failure. It therefore becomes imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately, and automatically differentiate benign from malicious traffic. Instead of using available feature reduction techniques like PCA, which can change the core meaning of variables, a unique feature set consisting of only seven lightweight features is developed that is IoT specific and independent of attack traffic. The results of the study demonstrate the effectiveness of the seven engineered features in detecting four broad categories of attacks, namely DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, the study also shows the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using performance metrics like accuracy, precision, recall, F-score, and ROC. Although the Decision Tree (99.9%) and Random Forest (99.9%) classifiers achieve the same accuracy, other metrics such as training and testing time show Random Forest to be comparatively better.
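
The classifier comparison, including training and testing time, might look roughly like the following scikit-learn sketch; the seven-feature matrix here is a synthetic placeholder rather than the features actually engineered from BoT-IoT.

```python
# Sketch of the six-classifier comparison (KNN, LR, SVM, MLP, DT, RF) with
# training/testing time, on an assumed seven-feature matrix X and labels y.
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.random((5000, 7))                  # placeholder for the 7 lightweight features
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # placeholder benign/malicious labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
classifiers = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
}
for name, clf in classifiers.items():
    t0 = time.time(); clf.fit(X_tr, y_tr); train_t = time.time() - t0
    t0 = time.time(); pred = clf.predict(X_te); test_t = time.time() - t0
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"prec={precision_score(y_te, pred):.3f} rec={recall_score(y_te, pred):.3f} "
          f"F1={f1_score(y_te, pred):.3f} train={train_t:.2f}s test={test_t:.3f}s")
```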


Sales forecasting is important for companies engaged in retailing, logistics, manufacturing, marketing, and wholesaling. It allows companies to allocate resources efficiently, estimate sales revenue, and plan better strategies for the company's future. In this paper, product sales at a particular store are predicted in a way that produces better performance than existing machine learning approaches. The dataset used for this project is the Big Mart Sales data for 2013. Nowadays, shopping malls and supermarkets keep track of the sales data of each individual item to predict future customer demand; these records contain a large amount of customer data and item attributes. Frequent patterns are detected by mining the data from the data warehouse, and the data can then be used to predict future sales with the help of several machine learning techniques (algorithms) for companies like Big Mart. In this project, we propose a model using the XGBoost algorithm for predicting the sales of companies like Big Mart and find that it produces better performance than other existing models. An analysis of this model against other models in terms of their performance metrics is carried out in this project. Big Mart is an online marketplace where people can buy, sell, or advertise merchandise at low cost. The goal of the paper is to make Big Mart a shopping paradise for buyers and a marketing solution for sellers, with the ultimate aim of complete customer satisfaction. The project "Supermarket Sales Prediction" builds a predictive model and finds the sales of each product at a particular store. Big Mart can use this model to understand the properties of the products that play a major role in increasing sales. This analysis can also be guided by hypotheses formulated before looking at the data.
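
A minimal sketch of an XGBoost regressor for this kind of store-item sales prediction is shown below; the file name, target column, and one-hot preprocessing are assumptions based on the public Big Mart dataset, not the paper's exact pipeline.

```python
# Minimal XGBoost regression sketch for store/item sales prediction.
# File name, target column, and preprocessing are assumptions, not the paper's exact setup.
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("bigmart_sales_2013.csv")                 # assumed dataset file
y = df["Item_Outlet_Sales"]                                 # assumed target column
X = pd.get_dummies(df.drop(columns="Item_Outlet_Sales"))    # one-hot encode categoricals

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=6)
model.fit(X_tr, y_tr)

rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.2f}")
```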


Author(s):  
Lokesh Kola

Abstract: Diabetes is one of the deadliest chronic diseases in the world. According to the World Health Organization (WHO), around 422 million people currently suffer from diabetes, particularly in low- and middle-income countries, and the number of deaths due to diabetes is close to 1.6 million. Recent research has shown that diabetes prevalence among people aged 18 and over rose from 4.7% to 8.5% between 1980 and 2014. Early diagnosis is necessary so that the disease does not progress to advanced stages, which are quite difficult to cure. Significant research has been performed on diabetes prediction, and as time passes the challenge of building a system that detects diabetes systematically keeps growing. Interest in machine learning for analysing medical data to diagnose disease is increasing day by day. Previous research has focused on identifying diabetes without specifying its type. In this paper, we predict gestational diabetes (Type-3) by comparing various supervised and semi-supervised machine learning algorithms on two datasets, binned and non-binned, and compare their performance based on evaluation metrics. Keywords: Gestational diabetes, Machine Learning, Supervised Learning, Semi-Supervised Learning, Diabetes Prediction
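
A hedged sketch of the binned vs. non-binned comparison and a semi-supervised self-training variant is given below; the dataset file, the "Outcome" column, and the choice of random forest as the base learner are assumptions rather than the paper's exact algorithms.

```python
# Sketch contrasting a supervised classifier on binned vs. non-binned features
# and a semi-supervised self-training variant; data and columns are placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("gestational_diabetes.csv")    # assumed dataset
X, y = df.drop(columns="Outcome").to_numpy(), df["Outcome"].to_numpy()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Supervised baseline: non-binned vs. binned features
binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
splits = {"non-binned": (X_tr, X_te),
          "binned": (binner.fit_transform(X_tr), binner.transform(X_te))}
for name, (tr, te) in splits.items():
    clf = RandomForestClassifier(random_state=1).fit(tr, y_tr)
    print(name, "accuracy:", round(accuracy_score(y_te, clf.predict(te)), 3))

# Semi-supervised: hide 70% of the training labels and self-train
y_semi = y_tr.copy()
mask = np.random.default_rng(1).random(len(y_semi)) < 0.7
y_semi[mask] = -1                                # -1 marks unlabelled samples
semi = SelfTrainingClassifier(RandomForestClassifier(random_state=1)).fit(X_tr, y_semi)
print("self-training accuracy:", round(accuracy_score(y_te, semi.predict(X_te)), 3))
```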


2019 ◽  
Author(s):  
Lei Zhang ◽  
Xianwen Shang ◽  
Subhashaan Sreedharan ◽  
Xixi Yan ◽  
Jianbin Liu ◽  
...  

BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people who were enrolled in the study during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that of nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models identified BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P<.001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
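
As an illustration only (not the authors' pipeline), a gradient boosting classifier for a single 10-year horizon with AUC evaluation and feature importances could be sketched as follows; the survey file and column names are assumptions.

```python
# Illustrative gradient-boosting risk model with AUC evaluation; the cohort data,
# features, and multi-horizon setup of the study are not reproduced here.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("cohort_baseline_survey.csv")       # assumed one-time survey data
y = df["t2dm_within_10y"]                             # assumed 10-year onset label
X = df.drop(columns="t2dm_within_10y")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.2, random_state=0)
gbm = GradientBoostingClassifier().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1])
print(f"10-year AUC: {auc:.2f}")

# Feature importances indicate which baseline factors (e.g., BMI) dominate the prediction.
top = pd.Series(gbm.feature_importances_, index=X.columns).sort_values(ascending=False).head(5)
print(top)
```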


2019 ◽  
Vol 9 (18) ◽  
pp. 3665 ◽  
Author(s):  
Ahmet Çağdaş Seçkin ◽  
Aysun Coşkun

Wi-Fi-based indoor positioning offers significant opportunities for numerous applications. Examining existing Wi-Fi positioning systems, it was observed that hundreds of variables were used even when variable reduction was applied. This reveals a structure that is difficult to reproduce and is far from providing a common solution for real-life applications. This study aims to create a common and standardized dataset for indoor positioning and localization and to present a system that can perform estimations using this dataset. To that end, machine learning (ML) methods are compared and the results of successful methods with hierarchical inclusion are then investigated. Further, new features are generated according to the measurement point obtained from the dataset. Subsequently, learning models are selected according to the performance metrics for the estimation of location and position. These learning models are then fused hierarchically using deductive reasoning. Using the proposed method, estimation of location and position proves more successful while using fewer variables than in current studies. This paper thus identifies a lack of applicability in the research community and addresses it with the proposed method, which yields a significant improvement in the estimation of floor and longitude.
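
The hierarchical fusion idea, predicting the floor first and feeding that estimate into the coordinate model, can be sketched as follows; the fingerprint file, the "AP_" column naming, and the choice of random forests are assumptions, not the authors' exact models.

```python
# Sketch of a hierarchical estimator: classify the floor first, then append the
# predicted floor as a feature for longitude regression. Data layout is assumed.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_absolute_error

df = pd.read_csv("wifi_fingerprints.csv")                    # assumed RSSI fingerprint dataset
rssi_cols = [c for c in df.columns if c.startswith("AP_")]    # assumed access-point columns
X = df[rssi_cols].to_numpy()
X_tr, X_te, floor_tr, floor_te, lon_tr, lon_te = train_test_split(
    X, df["floor"], df["longitude"], test_size=0.2, random_state=0)

floor_clf = RandomForestClassifier().fit(X_tr, floor_tr)
floor_pred = floor_clf.predict(X_te)
print("floor accuracy:", accuracy_score(floor_te, floor_pred))

# Hierarchical fusion: the floor estimate becomes an extra input to the position model.
lon_reg = RandomForestRegressor().fit(np.column_stack([X_tr, floor_tr]), lon_tr)
lon_hat = lon_reg.predict(np.column_stack([X_te, floor_pred]))
print("longitude MAE:", mean_absolute_error(lon_te, lon_hat))
```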


2020 ◽  
Vol 222 (1) ◽  
pp. S228
Author(s):  
Ohad Houri ◽  
Yotam Gil ◽  
Alexandra Berezowsky ◽  
Arnon Wiznitzer ◽  
Eran Hadar ◽  
...  

2020 ◽  
Vol 10 (2) ◽  
pp. 1-26
Author(s):  
Naghmeh Moradpoor Sheykhkanloo ◽  
Adam Hall

An insider threat can take on many forms and fall under different categories, including the malicious insider, the careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in the published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is hugely imbalanced. Therefore, this article addresses insider threat detection on an extremely imbalanced dataset, which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve the performance metrics, it did improve the time taken to build the model and the time taken to test it. Additionally, the authors found that running the chosen classifiers with parameters other than the default ones has an impact in both the balanced and imbalanced scenarios, but the impact is significantly stronger on the imbalanced dataset.
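
The article uses the spread subsample balancing technique (as implemented in Weka); a rough Python analogue, undersampling the majority class to a capped ratio with imbalanced-learn, is sketched below under assumed data and column names.

```python
# Rough Python analogue of spread-subsample balancing: undersample the majority
# class to a fixed ratio before training. Dataset and columns are placeholders.
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("insider_threat_features.csv")      # assumed per-user feature table
X, y = df.drop(columns="is_malicious"), df["is_malicious"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.3, random_state=7)

# Cap the majority:minority ratio at 2:1 (comparable in spirit to a max spread of 2)
rus = RandomUnderSampler(sampling_strategy=0.5, random_state=7)
X_bal, y_bal = rus.fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(random_state=7).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```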

