Malicious URL Detection using Logistic Regression

Author(s): Rohit Rayala, Sashank Pasumarthi, Rohith Kuppa, S R KARTHIK

This paper presents a model built to detect malicious URLs using machine learning techniques.
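As an illustration of the general technique, the sketch below trains a binary logistic regression on a few lexical URL features using plain gradient descent. It is not the authors' model. The feature set (length, dot count, presence of '@', HTTPS scheme, digit count), the function names and the toy URLs are all hypothetical choices made only for this example.

```python
import math

# Hypothetical lexical features; the paper's actual feature set is not stated here.
def url_features(url):
    """Map a URL string to a small numeric feature vector (illustrative only)."""
    return [
        len(url) / 100.0,                         # overall length, scaled
        url.count(".") / 10.0,                    # number of dots
        1.0 if "@" in url else 0.0,               # '@' can disguise the real host
        1.0 if url.startswith("https://") else 0.0,
        sum(ch.isdigit() for ch in url) / 10.0,   # digit count, scaled
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that feature vector x is malicious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi    # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Invented toy URLs (1 = malicious, 0 = benign), for illustration only.
train_urls = [
    "https://example.com/",
    "https://docs.example.org/guide",
    "https://shop.example.net/cart",
    "http://192.168.10.25/login@secure-update.example.com/verify",
    "http://account-verify.example.biz/0093/confirm.php?id=88123",
    "http://10.0.0.7/update@bank.example.com/reset",
]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression([url_features(u) for u in train_urls], labels)
for u in train_urls:
    print(f"{predict_proba(w, b, url_features(u)):.3f}  {u}")
```

A real detector would use far more data and would normally use a library implementation, such as scikit-learn's `LogisticRegression`. The hand-written loop is here only to make the update rule visible.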

Author(s): M. Carr, V. Ravi, G. Sridharan Reddy, D. Veranna

This paper profiles mobile banking users using machine learning techniques, viz. Decision Tree, Logistic Regression, Multilayer Perceptron and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Decision Trees were used to identify the profile of the mobile banking adopter. Comparing the machine learning techniques, Decision Trees were found to outperform Logistic Regression, Multilayer Perceptron and SVM. Of all the techniques, Decision Tree is recommended for profiling studies because, besides yielding highly accurate results, it also produces ‘if–then’ classification rules. The classification rules provided here can be used to target potential mobile banking customers by offering them appropriate incentives.
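The sketch below shows how a tree split turns into an ‘if–then’ rule. It learns a single split (a depth-1 tree, or "stump") by minimising weighted Gini impurity and prints that split as a rule. A full decision tree applies the same split search recursively. The feature names and survey rows are invented for illustration and do not come from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(rows, labels, feature_names):
    """Find the single threshold split with the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                      # midpoint between adjacent values
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def as_rule(stump):
    """Render a stump as a human-readable if-then rule."""
    _, name, t, left_cls, right_cls = stump
    return f"IF {name} <= {t} THEN {left_cls} ELSE {right_cls}"

# Hypothetical survey data: [age, weekly_internet_hours].
names = ["age", "weekly_internet_hours"]
rows = [[25, 20], [30, 15], [45, 2], [52, 1], [28, 18], [60, 3]]
labels = ["adopter", "adopter", "non-adopter", "non-adopter", "adopter", "non-adopter"]

print(as_rule(best_stump(rows, labels, names)))
# IF age <= 37.5 THEN adopter ELSE non-adopter
```

Both toy features separate the classes perfectly. Ties keep the first split found, so the `age` rule is reported.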


2018
Author(s): Sandip S Panesar, Rhett N D’Souza, Fang-Cheng Yeh, Juan C Fernandez-Miranda

Background: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization or prediction. ML techniques have traditionally been applied to large, high-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently, the World Health Organization proposed a new grading system for gliomas that incorporates molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, high-dimensional database of glioma patients.

Methods: We applied three machine learning techniques, namely artificial neural networks (ANN), decision trees (DT) and support vector machines (SVM), as well as classical logistic regression (LR), to a dataset of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database versus a database in which only statistically significant features were included in the algorithmic inputs (feature selection).

Results: The raw input consisted of 21 variables and achieved the following performance (accuracy/AUC): 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR and 65%/0.70 for DT. The feature-selected input consisted of 14 variables and achieved 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR and 65.2%/0.63 for DT.

Conclusions: We demonstrate that these techniques can also be applied to small yet high-dimensional datasets. Our ML techniques achieved reasonable performance compared with similar studies in the literature. Although local databases may be small relative to larger cancer repositories, ML techniques can still be applied to their analysis, though traditional statistical methods offer similar benefit.
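The results above are reported as accuracy/AUC pairs. The two metrics measure different things. Accuracy scores hard predictions made at one threshold. AUC scores how well the model ranks positives above negatives across all thresholds. The minimal sketch below computes AUC as the fraction of positive/negative pairs ranked correctly, with ties counting one half. The labels and scores are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of hard predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5                  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Invented example: 1 = died within 2 years, 0 = survived.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.778
```

The pairwise loop is quadratic in the number of examples. Library versions such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.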


Author(s): Joshua J. Levy, A. James O’Malley

Background: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the advantage the former gain from their built-in model-building search algorithms. This has led to statistical and machine learning approaches being aligned with different types of problems, and to the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each.

Methods: We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model-building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model-building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions.

Results: Preliminary statistical analysis demonstrated that across the 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures. The acquired terms also had a clearer association with the outcome than a direct interpretation of the random forest output.

Conclusions: When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.
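The final step of the hybrid procedure described above is adding selected interaction terms to a logistic regression design matrix. The sketch below shows only that step. It does not reproduce the InteractionTransformer API or the random-forest-based selection. The candidate pairs are simply passed in and are assumed to come from some tree model. The feature names and values are hypothetical.

```python
def add_interactions(rows, feature_names, pairs):
    """Return rows and names extended with one product column x_a * x_b per (a, b) in pairs."""
    index = {name: k for k, name in enumerate(feature_names)}
    new_names = list(feature_names) + [f"{a}*{b}" for a, b in pairs]
    new_rows = [list(r) + [r[index[a]] * r[index[b]] for a, b in pairs] for r in rows]
    return new_rows, new_names

# Hypothetical features and one candidate pair assumed to come from a tree model.
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
names = ["age", "bmi", "dose"]
X_aug, names_aug = add_interactions(rows, names, [("age", "dose")])
print(names_aug)  # ['age', 'bmi', 'dose', 'age*dose']
print(X_aug)      # [[1.0, 2.0, 3.0, 3.0], [4.0, 5.0, 6.0, 24.0]]
```

The augmented matrix is then fit with an ordinary logistic regression. The model stays interpretable because each interaction term gets its own coefficient.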


2021
Author(s): Alan Lopes de Sousa Freitas, Ana Silvia Degasperi Ieker, Josiane Melchiori Pinheiro, Wilson Rinaldi, Heloise Manica Paris Teixeira

Cardiometabolic diseases that develop over a worker's life, such as hypertension, diabetes, dyslipidemia and obesity, are among the main causes of death and are associated with modifiable and controllable risk factors. The general objective of this study was to apply supervised machine learning techniques and compare their performance in predicting the risk of developing cardiometabolic disease among staff working at a teaching hospital in southern Brazil. We sought to map the characteristics of individuals who are more likely to develop cardiometabolic diseases. The machine learning models evaluated were Naive Bayes, Decision Tree, Random Forest, KNN, Logistic Regression and SVM. The experimental results showed that some supervised machine learning models produce good classifications, depending on the attributes and hyperparameters used.


2020, Vol 17 (9), pp. 4092-4097
Author(s): Inchara Yogesh, K. R. Suresh Kumar, Niveditha Candrashekaran, Dhrithi Reddy, Harshitha Sampath

Employee turnover imposes costs on the company: the departing employee must be replaced and the new employee trained. These departures may also cause critical and costly interruptions to the production process. This gives the firm a clear incentive to prevent quits or, at least, to be able to predict when and where they are likely to occur. If employees are asked to evaluate their superiors and the answers are made available to those superiors, employees will most likely give only positive feedback. The aim is therefore to use machine learning techniques to predict employee turnover. Accurate predictions help companies make the necessary decisions on employee retention or succession planning. Algorithms: One-Sample T-Test (T-Test), Decision Tree (DT), AdaBoost (AB), Logistic Regression (LR), Random Forest Classifier (RFC).
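The one-sample t-test in the list above compares a sample mean with a reference value μ0. The statistic is t = (x̄ − μ0) / (s / √n), where s is the sample standard deviation, and it has n − 1 degrees of freedom. The minimal sketch below computes the statistic with the standard library. The satisfaction scores and the reference value of 4 are invented. Turning t into a p-value requires the t distribution, for example via `scipy.stats.ttest_1samp`.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))  # stdev uses n - 1
    return t, n - 1

# Invented job-satisfaction scores compared against a reference mean of 4.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = one_sample_t(scores, 4)
print(round(t, 3), df)  # 1.323 7
```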


Algorithms, 2020, Vol 13 (8), pp. 202
Author(s): Abdul Karim, Azhari Azhari, Samir Brahim Belhaouri, Ali Adil Qureshi, Maqsood Ahmad

Android-based applications are used by almost everyone around the globe. Because the Internet is available almost everywhere at no charge, almost half of the globe is engaged in social networking, social media surfing, messaging, browsing and plugins. In the Google Play Store, one of the most popular application stores, users are encouraged to download thousands of applications and various types of software. In this study, we scraped thousands of user reviews and ratings of different applications: reviews of 148 applications from 14 different categories, 506,259 reviews in total, were collected and assessed. Based on their semantics, the reviews were classified as negative, positive or neutral. Different machine learning algorithms, such as logistic regression, random forest and naïve Bayes, were tuned and tested. We also evaluated the effect of term frequency (TF) and inverse document frequency (IDF), measured accuracy, precision, recall and F1 score (F1), and present the results as bar graphs. Comparing the outcome of each algorithm, we found that logistic regression is one of the best algorithms for review analysis of the Google Play Store in terms of accuracy. Furthermore, we showed that logistic regression performs better in terms of speed, accuracy, recall and F1. This conclusion was reached after preprocessing a number of data values from these data sets.
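The abstract evaluates term frequency (TF) and inverse document frequency (IDF) weighting. The sketch below uses one common textbook variant: tf = count / document length and idf = ln(N / df). Real libraries often use other variants. For example, scikit-learn's `TfidfVectorizer` smooths idf and normalises vectors by default. The tokenisation (lower-casing and whitespace split) and the three review snippets are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document (plain tf * ln(N/df) variant)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))                 # document frequency: count each term once per doc
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({term: (c / total) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return weights

# Invented review snippets.
reviews = ["great app great ui", "app crashes often", "great support"]
w = tf_idf(reviews)
print(round(w[0]["great"], 4), round(w[0]["ui"], 4))  # 0.2027 0.2747
```

A term that appears in every document gets idf = ln(1) = 0, so it contributes nothing to the vector. This is why TF-IDF down-weights very common words.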


2018, Vol 7 (2.7), pp. 676
Author(s): V Uma Ramya, K Thirupathi Rao

Today's online world is filled with blogs, opinions, comments and posts on various websites and social media. People habitually post every incident to blogs, mixed with comments expressing a mixed bag of emotions such as sadness, happiness, worry and crying. Analysing such data is called sentiment analysis. To analyse this unstructured data, we use algorithms from newly emerging technologies. Machine learning is an emerging technology used in almost every field, and its powerful algorithms can give more accurate results. In this paper, we analyse tweets about movie reviews using multinomial logistic regression, Naïve Bayes and SVM, comparing their scores to identify the best text-analysis algorithm.
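Multinomial logistic regression extends the binary model to several classes. It computes one linear score (logit) per class and turns the scores into probabilities with the softmax function. The sketch below shows only the prediction step. The three sentiment classes, the two count features and the weight values are hypothetical. In practice the weights would be learned from labelled tweets.

```python
import math

def softmax(z):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(z)                                 # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict_class_probs(W, b, x):
    """W holds one weight vector per class and b one bias per class."""
    logits = [sum(wk * xk for wk, xk in zip(w_c, x)) + b_c for w_c, b_c in zip(W, b)]
    return softmax(logits)

# Hypothetical weights; features = [positive_word_count, negative_word_count].
classes = ["negative", "neutral", "positive"]
W = [[-1.0, 2.0], [0.0, 0.0], [2.0, -1.0]]
b = [0.0, 0.5, 0.0]
probs = predict_class_probs(W, b, [3, 0])
print(classes[max(range(len(classes)), key=probs.__getitem__)])  # positive
```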


2018, Vol 3 (1), pp. 18
Author(s): Alfensi Faruk, Endro Setyo Cahyono

Machine learning (ML) is a subject that focuses on data analysis using various statistical tools and learning processes in order to gain more knowledge from the data. The objective of this research was to apply ML techniques to low birth weight (LBW) data in Indonesia. This research conducts two ML tasks, namely prediction and classification. The binary logistic regression model was first applied to the training and test data. Then, the random forest approach was also applied to the data set. The results showed that binary logistic regression performed well for prediction but poorly for classification. By contrast, the random forest approach performed very well for both prediction and classification of the LBW data set.
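The abstract separates prediction from classification. One common reading is that prediction means estimating a probability, while classification means turning that probability into a label at a threshold and scoring the labels with a confusion matrix. The sketch below follows that reading, which is our assumption, and uses a default threshold of 0.5. A model can produce sensible probabilities and still classify poorly at a given threshold, for example by missing most of a rare class. The labels and probabilities are invented.

```python
def confusion_metrics(y_true, probs, threshold=0.5):
    """Threshold predicted probabilities into classes, then score the classification."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Invented example: 1 = low birth weight, 0 = normal.
y_true = [1, 0, 0, 0, 1]
probs = [0.8, 0.3, 0.6, 0.2, 0.55]
m = confusion_metrics(y_true, probs)
print(m["tp"], m["tn"], m["fp"], m["fn"], round(m["f1"], 2))  # 2 2 1 0 0.8
```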

