Book Genre Categorization Using Machine Learning Algorithms (K-Nearest Neighbor, Support Vector Machine and Logistic Regression) using Customized Dataset

Author(s):  
Parilkumar Shiroya

With every passing second, the social network community is growing rapidly; because of that, attackers have shown keen interest in these kinds of platforms and want to distribute mischievous content on them. With the focus on introducing a new set of characteristics and features for counteractive measures, a great deal of studies has researched the possibility of lessening malicious activities on social media networks. This research aimed to highlight features for identifying spammers on Instagram, and additional features were presented to improve the performance of different machine learning algorithms. The performance of different machine learning algorithms, namely Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), was evaluated on the machine learning tools RapidMiner and WEKA. The results of this research show that Random Forest (RF) outperformed all other selected machine learning algorithms on both selected machine learning tools. Overall, Random Forest (RF) provided the best results on RapidMiner. These results are useful for researchers who are keen to build machine learning models to detect spamming activities on social network communities.


Artificial intelligence is the technology that lets a machine mimic the thinking ability of a human being. Machine learning is a subset of AI that lets a machine exhibit human-like behavior by learning from known data, without being explicitly programmed. The health care sector has adopted this technology for developing medical procedures, maintaining large volumes of patient records, assisting physicians in the prediction, detection, and treatment of diseases, and more. In this paper, a comparative study of six supervised machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), k-Nearest Neighbor (k-NN), and Naive Bayes (NB), is made for the classification and prediction of diseases. The results show that, among the compared supervised learning algorithms, logistic regression performs best with an accuracy of 81.4%, and k-NN performs worst with an accuracy of 69.01%.


Author(s):  
Sandy C. Lauguico ◽  
Ronnie S. Concepcion II ◽  
Jonnel D. Alejandrino ◽  
Rogelio Ruzcko Tobias ◽  
...  

The growing problem of food scarcity drives innovation in urban farming. One urban farming method is smart aquaponics. However, for a smart aquaponics system to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way of implementing this is to use vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To realize this, a comparative analysis of three machine learning estimators was conducted: Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM). Each algorithm was modeled on features extracted by machine vision from images of lettuce raised in a smart aquaponics setup. Each model was optimized to increase its cross-validation and hold-out validation scores. The results showed that KNN with the tuned hyperparameters n_neighbors=24, weights='distance', algorithm='auto', leaf_size=10 was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.
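To make the reported KNN settings concrete, here is a minimal sketch of what a distance-weighted vote (weights='distance') does. The hyperparameter names in the abstract match those of scikit-learn's KNeighborsClassifier, but the abstract does not name the library, so this is a from-scratch illustration rather than the authors' code. The function name knn_predict and the toy 2-D points are hypothetical; algorithm and leaf_size are left out because, in scikit-learn at least, they only affect how neighbors are searched, not which class wins.

```python
import math
from collections import defaultdict

def knn_predict(train_X, train_y, query, n_neighbors=3):
    """Classify `query` by a distance-weighted vote of its nearest neighbors."""
    # Sort training points by Euclidean distance to the query.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for d, label in dists[:n_neighbors]:
        if d == 0.0:
            return label  # assumption of this sketch: an exact match decides the class
        votes[label] += 1.0 / d  # closer neighbors count for more
    return max(votes, key=votes.get)

# Hypothetical 2-D feature vectors (not the paper's lettuce features).
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_y = ["A", "A", "B", "B", "B"]

near_a = knn_predict(train_X, train_y, (0.5, 0.5), n_neighbors=5)
near_b = knn_predict(train_X, train_y, (5.0, 5.5), n_neighbors=5)
print(near_a, near_b)
```

With all five points voting, an unweighted majority would label the first query "B" (three votes to two); inverse-distance weighting lets the two much closer "A" points win instead, which is the practical effect of weights='distance' when n_neighbors is large.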


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two more subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and in multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
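The 5-fold cross-validation mentioned above can be sketched as an index split: the samples are partitioned into five folds, each fold serves once as the test set while the other four are used for training, and the per-fold accuracies are then averaged. The abstract does not say whether the folds were stratified by class, so this sketch assumes a plain shuffled split; the function name k_fold_indices and the sample count of 23 are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 23 samples split into 5 folds.
folds = list(k_fold_indices(23, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Every sample lands in exactly one test fold, so each image contributes to the reported accuracy once and is never used to train the model that evaluates it.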


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Bhagya M. Patil ◽  
Vishwanath Burkpalli

Cotton is one of the major crops in India, where 23% of cotton gets exported to other countries. The cotton yield depends on crop growth, and it gets affected by diseases. In this paper, cotton disease classification is performed using different machine learning algorithms. For this research, the cotton leaf image database was used, and the images were segmented from the natural background using a modified factorization-based active contour method. First, the color and texture features are extracted from the segmented images. These features are then fed to machine learning algorithms such as multilayer perceptron, support vector machine, Naïve Bayes, Random Forest, AdaBoost, and K-nearest neighbor. Four color features and eight texture features were extracted, and experimentation was done using three cases: (1) only color features, (2) only texture features, and (3) both color and texture features. The classifiers performed better with color features than with texture features. The color features are enough to classify the healthy and unhealthy cotton leaf images. The performance of the classifiers was evaluated using performance parameters such as precision, recall, F-measure, and Matthews correlation coefficient. The accuracies of classifiers such as support vector machine, Naïve Bayes, Random Forest, AdaBoost, and K-nearest neighbor are 93.38%, 90.91%, 95.86%, 92.56%, and 94.21%, respectively, whereas that of the multilayer perceptron classifier is 96.69%.
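For readers unfamiliar with the performance parameters named above, the following sketch computes precision, recall, F-measure, Matthews correlation coefficient, and accuracy from the four cells of a binary confusion matrix, using the standard textbook definitions. The function name binary_metrics and the counts are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    # MCC uses all four cells, so it stays informative on imbalanced classes.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
        "accuracy": accuracy,
    }

# Hypothetical counts for an unhealthy-vs-healthy leaf classifier.
metrics = binary_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting MCC alongside accuracy is a reasonable choice when the healthy and diseased classes may differ in size, because accuracy alone can look high for a classifier that favors the larger class.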


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
M J Espinosa Pascual ◽  
P Vaquero Martinez ◽  
V Vaquero Martinez ◽  
J Lopez Pais ◽  
B Izquierdo Coronel ◽  
...  

Abstract Introduction Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose The aim of this study was to estimate the diagnostic accuracy of several machine learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) in discriminating patients with MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods A Diagnostic Test Evaluation study was carried out applying the proposed algorithms to a database of 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features before the angiographic procedure. Finally, the diagnostic precision of each architecture was measured.
Results The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]), followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see figure). The variables that contributed the most to discriminating MINOCA from MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy in diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. Funding Acknowledgement Type of funding sources: None. Figure: ROC curves of different algorithms.
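The diagnostic measures reported above (Se, Sp, NPV, AUC) can be sketched as follows, using their standard definitions. Sensitivity, specificity, and NPV come from confusion-matrix counts; AUC is computed here as the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as half. The function names, counts, and scores are hypothetical and are not the study's data, and the abstract does not state how its AUC or confidence interval was computed.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and negative predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical test-set counts and classifier scores.
dm = diagnostic_metrics(tp=30, fp=5, fn=10, tn=55)
area = auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2, 0.4])
print(dm, area)
```

A model can combine a high NPV with a modest sensitivity when the positive class is rare, since most negative calls are then correct even if many true cases are missed.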


MATICS ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 21-27
Author(s):  
Via Ardianto Nugroho ◽  
Derry Pramono Adi ◽  
Achmad Teguh Wibowo ◽  
MY Teguh Sulistyono ◽  
Agustinus Bimo Gumelar

In the container service industry, Terminal Nilam is a customer of PT. BIMA, a company that specializes in heavy equipment repair and maintenance services. The terminal is the central site for domestic container loading and unloading and has four container cranes serving two ships. The maintenance process for heavy equipment such as the container cranes in operation has so far paid little attention to data that groups or classifies the type of maintenance the equipment needs. Over time, the equipment may perform below its potential, which can even lead to workplace accidents. In addition, neglecting container crane maintenance can inflate subsequent maintenance costs. Loading and unloading production targets may fall, and delays to ship berthing schedules become very likely. Machine Learning (ML) methods can readily eliminate these possibilities. In this study, we designed ML to identify and then classify the appropriate type of container crane maintenance, namely light or heavy. The ML methods chosen for this study are Random Forest, Support Vector Machine, k-Nearest Neighbor, Naïve Bayes, Logistic Regression, J48, and Decision Tree. This study shows that tree-based ML models succeed in learning from container crane maintenance data (numerical and categorical), with J48 performing best, reaching an accuracy and ROC-AUC of 99.1%. Classification is based on the date of last maintenance, hour meter, breakdown, shutdown, and spare parts.


Large-scale cyberattacks by encryption ransomware currently affect countries and businesses worldwide, with millions of dollars lost in ransom payments. This type of malware encrypts user files, exfiltrates them, and demands high ransoms in exchange for decryption keys. An attacker could use different types of ransomware to steal a victim's files. Examples of ransomware attacks include Scareware, Mobile ransomware, WannaCry, CryptoLocker, and Zero-Day ransomware attacks. A zero-day vulnerability is a software security flaw that is known to the software vendor but does not yet have a patch in place to fix it. Machine learning algorithms are already used to detect encryption ransomware. This work, based on the analysis of a large number of PE file samples (benign software and ransomware), uses supervised machine learning algorithms to detect zero-day attacks. The work was carried out and evaluated on the Microsoft Windows operating system (the OS most attacked by encryption ransomware). We used four supervised learning algorithms: Random Forest Classifier, K-Nearest Neighbor, Support Vector Machine and Logistic Regression. In tests, the random forest algorithm achieved 99.5% accuracy with almost no false positives.


2021 ◽  
Author(s):  
Choudhary Sobhan Shakeel ◽  
Saad Jawaid Khan ◽  
Syeda Fatima Aijaz ◽  
Umer Hassan ◽  
Beenish Chaudhry

BACKGROUND Alopecia areata is an auto-immune disorder that involves non-scarring hair loss in well-defined patches, can affect the entire scalp region, and ultimately leads to baldness. The latest worldwide statistics show that Alopecia areata affects millions of people. Furthermore, the use of conventional methods often leads to poor diagnosis of Alopecia, ultimately increasing the medical financial burden on the population. It has been reported that 85% of the individuals suffering from Alopecia areata complain about a significant financial burden along with associated costs that are beyond cosmetic concerns. Many individuals discontinue treatment owing to increased expenses and poor diagnosis. OBJECTIVE The objectives of the study comprise utilizing datasets of healthy hair and Alopecia areata, extracting color, texture and shape features from the images, and applying machine learning algorithms including support vector machine (SVM) and k-nearest neighbor (KNN). METHODS Two datasets with images of healthy hair and Alopecia areata have been utilized. A total of 200 healthy hair images were retrieved from the Figaro1k dataset. A total of 68 images of Alopecia areata were retrieved from a dataset known as Dermnet. The images initially go through pre-processing steps including enhancement and segmentation. Following image segmentation, three features (color, texture and shape) are extracted. Following feature extraction, machine learning algorithms including support vector machine (SVM) and k-nearest neighbor (KNN) are applied to classify Alopecia areata and healthy hair. RESULTS A total of 81 images are tested with support vector machine (SVM) and k-nearest neighbor (KNN), yielding accuracies of 91.4% and 88.9% respectively. The results of the paired sample T-test via SPSS analysis demonstrate p < 0.001 and indicate that the accuracies acquired from the two machine learning techniques are significantly different.
The accuracies reported will enable a hair expert to recommend a suitable diagnosis and hair treatment regimen to a patient. CONCLUSIONS The application of support vector machine (SVM) yielded an accuracy of 91.4% and that of k-nearest neighbor (KNN) yielded an accuracy of 88.9%. These accuracies indicate that the proposed classification framework is successful and robust. However, future work with deep learning techniques such as convolutional neural networks (CNN) can also be carried out and integrated with the existing system.

