Identifying the User As Genuine/Malign Based on Search Logs and Search History

2020 ◽  
Vol 9 (1) ◽  
pp. 2046-2048

One of the major challenges a developer may face is security issues and threats concerning labelled data. The labelled data comprises system logs, network traffic, or any other enriched data carrying a threat/not-threat classification. A few earlier studies categorized URLs into specific categories such as Arts or Technology. This paper instead focuses on classifying users based on their search logs (URLs). Differentiating users manually from search logs is difficult, so we train a machine learning model that takes raw data as input and classifies the user as genuine or malign. This model helps in intrusion detection and suspicious activity detection. We first gather data on past malicious URLs as a training set for the Naïve Bayes algorithm to detect malicious users. By implementing the KNN algorithm effectively, we can detect malign users with an accuracy of up to 94.28%. With machine learning algorithms such as Naïve Bayes, KNN, and Random Forest classifiers, we can classify users as malign or genuine.
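As a rough illustration of the k-nearest-neighbour step described above, the following Python sketch labels a URL by majority vote among its closest training URLs. The lexical features, the toy URLs, and the helper names `url_features` and `knn_predict` are all assumptions made for illustration; the abstract does not state which features or data the authors used.

```python
import math
from collections import Counter

def url_features(url):
    """Map a URL to a small numeric vector (hypothetical lexical features)."""
    digits = sum(ch.isdigit() for ch in url)
    specials = sum(ch in "-_@?=&%" for ch in url)
    return (len(url), digits, specials, url.count("."))

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training URLs."""
    q = url_features(query)
    nearest = sorted(train, key=lambda item: math.dist(url_features(item[0]), q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training set of (url, label) pairs, not taken from the paper.
TRAIN = [
    ("http://example.com/news", "genuine"),
    ("http://example.org/docs/index.html", "genuine"),
    ("http://example.net/about", "genuine"),
    ("http://198.51.100.7/login.php?id=8841&session=99x7%20", "malign"),
    ("http://free-prizes.example/win_now?ref=1234&u=777@abc", "malign"),
    ("http://secure-update.example/acct?verify=0098&key=55%41", "malign"),
]

print(knn_predict(TRAIN, "http://example.com/sports"))
```

In practice the features would need scaling, since URL length dominates the Euclidean distance in this unscaled sketch.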

Author(s):  
Ahmed T. Shawky ◽  
Ismail M. Hagag

Data mining and classification are among the most important techniques today, as the world is full of data generated by various sources. Extracting useful knowledge from this data is the real challenge, and this paper addresses it by using machine learning classifiers to draw meaningful results. The aim of this research is to design a model that detects diabetes in patients with high accuracy. To that end, the paper applies five machine learning classification algorithms: Decision Tree, Support Vector Machine (SVM), Random Forest, Naive Bayes, and K-Nearest Neighbor (K-NN), with the purpose of predicting diabetes at an early stage. Finally, we compare the performance of these algorithms and conclude that K-NN achieves the best accuracy (81.16%), followed by Naive Bayes (76.06%).


Web use and digitized information are growing every day, and so is the amount of data generated. At the same time, security attacks pose numerous threats to systems, websites, and the Internet. Intrusion detection in a high-speed network is a very hard task. A Hadoop implementation is used to address this challenge, namely detecting intrusion in a big data environment in real time. Machine learning approaches are used to classify abnormal packet flow. Naive Bayes performs classification over a vector of feature values drawn from some finite set. Decision Tree is another machine learning classifier and is also a supervised learning model; a decision tree is a flowchart-like tree structure. The J48 and Naïve Bayes algorithms are implemented in the Hadoop MapReduce framework for parallel processing, using records from the KDDCup Data Corrected benchmark dataset. The results obtained are an 89.9% true positive rate and a 0.04% false positive rate for the Naive Bayes algorithm, and a 98.06% true positive rate and a 0.001% false positive rate for the Decision Tree algorithm.


2020 ◽  
Vol 6 (2) ◽  
pp. 213-222
Author(s):  
Ahmad Fauzi ◽  
Fanny Fatma Wati ◽  
Indah Sulistyowati ◽  
Muhammad Faittullah Akbar ◽  
Eka Rahmawati ◽  
...  

Abstract: Competition between banks can be seen in their various attempts to find customers through marketing activities in order to gain as many customers as possible. In the past, business actors offered goods or services to consumers face to face; now, by utilizing existing and sophisticated technology, they can use long-distance communication tools such as telephone and fax, as well as other electronic media. To make customer data easier to manage, data classification is needed. Machine learning algorithms can be used to predict or classify data. One such algorithm is the Naive Bayes method. Naive Bayes is a simple probabilistic classifier that calculates a set of probabilities by counting the frequencies and combinations of values in a given dataset. This research predicts the success of a telemarketing call in selling bank products to customers. The Naive Bayes algorithm combined with Backward Elimination feature selection increases the accuracy of predicting telemarketing success in selling bank products: Naive Bayes alone yields an accuracy of 83.04%, and after applying backward elimination feature selection the accuracy rises by 6.41 percentage points to 89.45%. Keywords: Telemarketing, Machine Learning, Naive Bayes
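The frequency-counting idea behind Naive Bayes mentioned in this abstract can be sketched in a few lines of Python. This is a minimal categorical Naive Bayes with Laplace smoothing, written under assumptions: the two features (contact type, previous campaign outcome), the six records, and the function names are hypothetical, and the backward elimination step is not shown.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class (feature, value) frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict(model, row):
    """Return the class with the highest prior-times-likelihood score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | class)
            score *= (value_counts[(label, i)][value] + 1) / (n + len(feature_values[i]))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical records: (contact type, previous campaign outcome) -> subscribed?
ROWS = [("cellular", "success"), ("cellular", "success"), ("telephone", "failure"),
        ("telephone", "failure"), ("cellular", "failure"), ("telephone", "success")]
LABELS = ["yes", "yes", "no", "no", "no", "yes"]

MODEL = train_naive_bayes(ROWS, LABELS)
print(predict(MODEL, ("cellular", "success")))
```

Smoothing keeps a value that never co-occurs with a class from forcing that class's score to zero.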


2020 ◽  
Vol 1 (2) ◽  
pp. 61-66
Author(s):  
Febri Astiko ◽  
Achmad Khodar

This study aims to design a machine learning model for sentiment analysis of Indosat Ooredoo service reviews on the social media platform Twitter, using the Naive Bayes algorithm to classify reviews as positive or negative. The sentiment analysis uses machine learning to obtain patterns and a model that can be reused to predict new data.


2018 ◽  
Vol 7 (2.32) ◽  
pp. 363 ◽  
Author(s):  
N Rajesh ◽  
Maneesha T ◽  
Shaik Hafeez ◽  
Hari Krishna

Heart disease is one of the most common diseases today. We used different attributes that relate well to heart disease in order to find a better prediction method, and we applied algorithms for prediction. The Naive Bayes algorithm is analyzed on a dataset based on risk factors. We also used decision trees and a combination of algorithms to predict heart disease from these attributes. The results show that Naive Bayes gives accurate results when the dataset is small, while decision trees give accurate results when the dataset is large.


Author(s):  
Sheela Rani P ◽  
Dhivya S ◽  
Dharshini Priya M ◽  
Dharmila Chowdary A

Machine learning is a new research discipline that uses data to improve learning, optimizing the training method and developing the environment in which learning happens. There are two kinds of machine learning approaches, supervised and unsupervised, which are used to extract knowledge that helps decision-makers take the right intervention in the future. This paper introduces a prediction model for the factors that influence students' academic performance, using supervised machine learning algorithms such as support vector machine, KNN (k-nearest neighbors), Naïve Bayes, and logistic regression. The results of the various algorithms are compared, and the support vector machine and Naïve Bayes are shown to perform well, achieving improved accuracy compared to the other algorithms. The final prediction model in this paper may have fairly high prediction accuracy. The objective is not just to predict students' future performance but also to provide the best technique for finding the most impactful features that influence students while studying.


Author(s):  
Muskan Patidar

Abstract: Social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes the form of hate messages sent through social media and email. With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. We propose a possible solution to this problem: our project aims to detect cyberbullying in tweets using ML classification algorithms such as Naïve Bayes, KNN, Decision Tree, Random Forest, Support Vector Machine, etc. We also apply NLTK (Natural Language Toolkit) unigram, bigram, trigram, and n-gram features to Naïve Bayes to check its accuracy. Finally, we compare the results of the proposed and baseline features with other machine learning algorithms. The comparison indicates the significance of the proposed features in cyberbullying detection. Keywords: Cyber bullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit
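The unigram, bigram, and trigram features mentioned above are contiguous runs of tokens. A minimal sketch in plain Python follows; NLTK ships a comparable helper, `nltk.util.ngrams`, but this version avoids the dependency. The sample sentence and whitespace tokenization are assumptions for illustration only.

```python
def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical tweet text, split on whitespace for simplicity.
tokens = "nobody likes you at school".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
```

Each tuple can then be counted as a feature for a classifier such as Naïve Bayes; longer n-grams capture more context at the cost of sparser counts.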


2018 ◽  
Vol 7 (3.12) ◽  
pp. 793 ◽  
Author(s):  
B Shanthi ◽  
Mahalakshmi N ◽  
Shobana M

Structural Health Monitoring is essential in today’s world, where large amounts of money and labour are involved in building a structure. There is a need to check periodically whether the built structure is strong and flawless, how long it will remain strong and, if it is not, how badly it is damaged. This information is needed so that precautions can be taken accordingly. Otherwise, disastrous accidents may result, which may even cost human lives. There are various methods for evaluating a structure. In this paper, we apply various classification algorithms, such as J48, Naive Bayes, and many other available classifiers, to the dataset to check the accuracy of the predictions made by each, and arrive at a conclusion about the best possible classifier for determining whether a structure is damaged or not.


2020 ◽  
Vol 19 ◽  
pp. 153303382090982
Author(s):  
Melek Akcay ◽  
Durmus Etiz ◽  
Ozer Celik ◽  
Alaattin Ozen

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: In the analysis of the data set, correlation analysis and binary logistic regression analyses were applied. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Among the machine learning techniques, Gaussian Naive Bayes was determined to be the best algorithm for evaluating prognosis (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for the evaluation of prognosis of nasopharyngeal cancer.
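For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a binary confusion matrix. The counts passed in are hypothetical and are not taken from the study; the function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives correctly rejected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts for illustration only (not the study's test set).
metrics = binary_metrics(tp=3, fn=1, tn=4, fp=0)
print(metrics)
```

A specificity of 100% alongside a lower sensitivity means the classifier raised no false alarms but missed some positive cases.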

