supervised machine learning classifiers
Recently Published Documents


TOTAL DOCUMENTS

26
(FIVE YEARS 20)

H-INDEX

4
(FIVE YEARS 2)

Author(s):  
Lokesh Kola

Abstract: Diabetes is one of the deadliest chronic diseases in the world. According to the World Health Organization (WHO), around 422 million people currently suffer from diabetes, particularly in low- and middle-income countries, and the number of deaths due to diabetes is close to 1.6 million. Recent research has shown that diabetes is likely to occur in people aged over 18, and that its prevalence rose from 4.7% to 8.5% between 1980 and 2014. Early diagnosis is necessary so that the disease does not progress to advanced stages, which are quite difficult to cure. Significant research has been performed on diabetes prediction. As time passes, the challenges of building a system that detects diabetes systematically keep increasing. Interest in using Machine Learning to analyse medical data and diagnose disease is growing day by day. Previous research has focused on identifying diabetes without specifying its type. In this paper, we predict gestational diabetes (Type-3) by comparing various supervised and semi-supervised machine learning algorithms on two datasets, i.e., binned and non-binned, and compare their performance based on evaluation metrics. Keywords: Gestational diabetes, Machine Learning, Supervised Learning, Semi-Supervised Learning, Diabetes Prediction
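The abstract does not say how the binned dataset was built. As a minimal sketch, assuming equal-width binning of a continuous feature (the readings and the bin count below are invented for illustration), a non-binned column could be turned into a binned one like this:

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical glucose readings (mg/dL); not taken from the paper's data.
glucose = [85, 99, 110, 126, 140, 155, 183, 197]
binned = equal_width_bins(glucose, n_bins=4)
print(binned)
```

The same classifier can then be trained once on the raw column and once on the bin indices to compare the two representations.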


Author(s):  
MHD RAJA ABOU HARB ◽  
Serhat Ozekes ◽  

DNS over HTTPS (DoH) is a modern protocol used as an alternative to the existing DNS protocol; it provides confidentiality and integrity for DNS functions by using protected channels. Since this kind of connection can pass through current protection systems, it can be used to spread malicious software. Defense mechanisms that can detect and prevent these forms of malicious behavior are therefore needed. In this study, we propose a method to classify malicious DoH connections using machine learning techniques, along with a feature selection process that reduced the number of features used to 27% of the total 33 features and improved the detection of malicious DoH connections. The study employs twelve different supervised machine learning classifiers, and the feature selection process uses 8 different machine learning-based feature selection methods to compute the importance of the features. The results were promising: accuracy scores were about 100% in detecting malicious DoH connections.
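The abstract does not describe how the 8 importance rankings are combined. One plausible approach, shown here purely as an assumption, is to average each feature's rank across the scorers and keep the top 27%. The feature names and scores below are hypothetical placeholders generated at random.

```python
import random


def mean_rank_selection(score_tables, keep_fraction):
    """Keep the top fraction of features by summed rank across several scorers.

    score_tables: list of dicts mapping feature name -> importance score
    (higher means more important); all dicts share the same keys.
    """
    features = sorted(score_tables[0])
    rank_sum = {f: 0 for f in features}
    for scores in score_tables:
        # Rank 0 is the most important feature for this scorer; ties break by name.
        ordered = sorted(features, key=lambda f: (-scores[f], f))
        for rank, f in enumerate(ordered):
            rank_sum[f] += rank
    n_keep = max(1, round(keep_fraction * len(features)))
    # A lower summed rank means the feature was consistently rated important.
    best = sorted(features, key=lambda f: (rank_sum[f], f))
    return best[:n_keep]


# Hypothetical setup: 33 features scored by 8 methods with random importances.
rng = random.Random(0)
feature_names = [f"feature_{i:02d}" for i in range(33)]
score_tables = [{name: rng.random() for name in feature_names} for _ in range(8)]
selected = mean_rank_selection(score_tables, keep_fraction=0.27)
print(len(selected), "of", len(feature_names), "features kept")
```

With 33 features, 27% rounds to 9 features, which matches the reduction reported in the abstract.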


Author(s):  
Yunita Nurmasari ◽  
Arie Wahyu Wijayanto

The objective of this work is to assess the capability of multispectral optical Landsat and Sentinel images to detect oil palm plantations in Rokan Hulu, Riau, one of the largest palm oil producers in Indonesia, by combining multispectral bands and composite indices. In addition to comparing two different sets of satellite images, we also ascertain which of the supervised machine learning classifiers CART Decision Tree, Random Forest, Support Vector Machine, and Naive Bayes gives the best performance. With the use of multispectral bands and derived composite indices, the best classifier achieved an overall accuracy of up to 92%. The findings and contributions of the study include: (1) insight into a set of feature combinations that provides the highest model accuracy, and (2) an extensive evaluation of machine learning-based classifiers on two different sets of optical satellite imagery. Our study could further benefit the government by providing more scalable plantation statistics.
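The abstract does not name the composite indices it derives. As an illustration only, the sketch below assumes the widely used Normalized Difference Vegetation Index (NDVI), computed as (NIR - Red) / (NIR + Red), and appends it to the raw band values as an extra feature. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel's reflectances."""
    total = nir + red
    # Guard against division by zero for empty or masked pixels.
    return 0.0 if total == 0 else (nir - red) / total


# Hypothetical (NIR, red) surface reflectances for three pixels.
pixels = [(0.45, 0.05), (0.30, 0.20), (0.0, 0.0)]

# Each feature vector combines the raw bands with the derived index.
features = [(nir, red, ndvi(nir, red)) for nir, red in pixels]
for row in features:
    print(row)
```

Dense vegetation reflects strongly in the near-infrared and weakly in red, so the first pixel scores much higher than the second.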


Author(s):  
Akram Q. M. Algaolahi ◽  
Abdullah A. Hasan ◽  
Amer Sallam ◽  
Abdullah M. Sharaf ◽  
Aseel A. Abdu ◽  
...  

2021 ◽  
Vol 10 (4) ◽  
pp. 2163-2169
Author(s):  
Tanvirul Islam ◽  
Nadim Ahmed ◽  
Subhenur Latif

The use of abusive Bangla text has accelerated with the growing use of social media. Through these platforms, hatred or negativity can spread virally. Plenty of research has been done on detecting abusive text in English, but Bangla abusive text detection has not been studied to a great extent. In this experimental study, we applied three distinct approaches to a comprehensive dataset to obtain a better outcome. In the first experiment, a large dataset collected from Facebook and YouTube was used to detect abusive texts. After extensive pre-processing and feature extraction, a set of carefully selected supervised machine learning classifiers, i.e., multinomial Naïve Bayes (MNB), multilayer perceptron (MLP), support vector machine (SVM), decision tree, random forest, stochastic gradient descent (SGD), ridge, perceptron and k-nearest neighbors (k-NN), was applied to determine the best result. The second experiment was conducted on a balanced dataset constructed by randomly undersampling the majority class. For the final experiment, a Bengali stemmer was applied to the dataset. Across all three experiments, SVM with the full dataset obtained the highest accuracy of 88%.
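Random undersampling of the majority class, as used for the balanced dataset, can be sketched roughly as follows. This is a minimal stdlib version under the assumption that every class is cut down to the size of the smallest one. The sample texts and labels are placeholders, not data from the study.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Return (sample, label) pairs with every class cut to the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        # Keep a random subset of each class, sized to match the smallest class.
        for x in rng.sample(items, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced


# Placeholder corpus: 7 non-abusive comments versus 3 abusive ones.
texts = [f"comment_{i}" for i in range(10)]
labels = ["non-abusive"] * 7 + ["abusive"] * 3
balanced = random_undersample(texts, labels)
print(Counter(y for _, y in balanced))
```

Undersampling discards majority-class examples, which may help explain why the full dataset still gave the best accuracy in the reported experiments.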


Author(s):  
Marina Azer ◽  
Mohamed Taha ◽  
Hala H. Zayed ◽  
Mahmoud Gadallah

Social media presence is a crucial part of our lives, and social media is now considered a more important source of information than traditional sources. Twitter has become one of the most prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for detecting false news. One of the problems in credibility detection is finding new features that are most predictive and thus improve classifier performance. Both content-based features and user-based features are used. The importance of the features and their impact on performance are examined, and the reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used: Naïve Bayes (NB), support vector machine (SVM), k-nearest neighbors (KNN), logistic regression (LR), random forest (RF), maximum entropy (ME), and conditional random forest (CRF). The models were trained and tested on the Pheme dataset. The feature analysis is presented and compared with the content-based features as decisive factors in determining validity. Random forest shows the highest performance both with user-based features only (accuracy 82.2%) and with a mixture of content-based and user-based features (accuracy 83.4%), which gave the highest results overall. In contrast, logistic regression was the best when using content-based features only. Performance is measured by accuracy, precision, recall, and F1 score. We compared our feature set with the features used in other studies and assessed the impact of our new features. We found that our results show a marked improvement in discovering and verifying false news compared with current results.
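The four reported measures follow directly from the counts of true and false positives and negatives. A minimal sketch for the binary case, using invented labels where 1 stands for a false-news item:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels for eight tweets; not drawn from the Pheme dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when false and true news are not equally frequent, since accuracy alone can hide poor detection of the rarer class.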


2021 ◽  
Vol 13 (2) ◽  
pp. 971
Author(s):  
Papiya Debnath ◽  
Pankaj Chittora ◽  
Tulika Chakrabarti ◽  
Prasun Chakrabarti ◽  
Zbigniew Leonowicz ◽  
...  

Earthquakes are one of the most overwhelming types of natural hazards, so successfully handling the situations they create is crucial. Earthquakes can cost many lives and have devastating impacts on the economy. The ability to forecast earthquakes is one of the biggest issues in geoscience, and machine learning technology can play a vital role in addressing it. We aim to develop a method for forecasting the magnitude range of earthquakes using machine learning classifier algorithms. Three different ranges have been categorized: fatal earthquake, moderate earthquake, and mild earthquake. To distinguish between these classes, seven different machine learning classifier algorithms have been used for building the model. To train the model, six different datasets covering India and nearby regions have been used. The Bayes Net, Random Tree, Simple Logistic, Random Forest, Logistic Model Tree (LMT), ZeroR and Logistic Regression algorithms have been applied to each dataset. All of the models have been developed using the Weka tool and the results have been noted. It was observed that the Simple Logistic and LMT classifiers performed well in each case.
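Of the seven algorithms, ZeroR is the simplest: it ignores all input features and always predicts the most frequent class in the training data, which makes it a baseline for judging the other classifiers. The study used Weka; the Python sketch below only illustrates the idea, and the labels are invented rather than taken from the study's datasets.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier that always predicts the majority training class."""

    def fit(self, labels):
        # most_common(1) returns the (label, count) pair with the highest count.
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_samples):
        return [self.prediction] * n_samples


# Invented magnitude-range labels for training and testing.
train_labels = ["mild", "mild", "moderate", "mild", "fatal", "moderate"]
test_labels = ["mild", "moderate", "mild", "fatal"]

model = ZeroR().fit(train_labels)
predicted = model.predict(len(test_labels))
accuracy = sum(p == t for p, t in zip(predicted, test_labels)) / len(test_labels)
print(model.prediction, accuracy)
```

Any classifier worth using should beat this accuracy, so the ZeroR score gives a floor against which results such as those of Simple Logistic and LMT can be read.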

