Phishing Detection: A Case Analysis on Classifiers with Rules Using Machine Learning

2017 ◽  
Vol 16 (04) ◽  
pp. 1750034 ◽  
Author(s):  
Fadi Thabtah ◽  
Firuz Kamalov

A typical predictive approach in data mining that produces If-Then knowledge for decision making is rule-based classification. Rule-based classification includes a large number of algorithms that fall under the categories of covering, greedy, rule induction, and associative classification. These approaches have shown promising results due to the simplicity of the models generated and the user’s ability to understand and maintain them. Phishing is one of the emergent online threats in web security domains that necessitates anti-phishing models with rules so users can easily differentiate among website types. This paper critically analyses recent research studies on the use of predictive models with rules for phishing detection, and evaluates the applicability of these approaches to phishing. To accomplish our task, we experimentally evaluate four different rule-based classifiers that belong to greedy, associative classification and rule induction approaches on real phishing datasets and with respect to different evaluation measures. Moreover, we assess the classifiers derived and contrast them with known classic classification algorithms including Bayes Net and Simple Logistics. The aim of the comparison is to determine the pros and cons of predictive models with rules and reveal their actual performance when it comes to detecting phishing activities. The results clearly showed that eDRI, a recent greedy algorithm, not only generates useful models, but that these models are also highly competitive with respect to predictive accuracy as well as runtime when they are employed as anti-phishing tools.
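
To make the idea of If-Then knowledge concrete, the following is a minimal Python sketch of an ordered rule list with a default class. The feature names (`has_ip_address`, `uses_https`, `long_url`, `domain_age_over_1y`) and the rules themselves are hypothetical illustrations and are not taken from the paper or from eDRI.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]                      # binary website features
Rule = Tuple[Callable[[Record], bool], str]  # (condition, predicted class)

# Hypothetical rules over hypothetical features; purely illustrative.
RULES: List[Rule] = [
    (lambda r: r["has_ip_address"] == 1, "phishing"),
    (lambda r: r["uses_https"] == 0 and r["long_url"] == 1, "phishing"),
    (lambda r: r["uses_https"] == 1 and r["domain_age_over_1y"] == 1, "legitimate"),
]
DEFAULT_CLASS = "legitimate"  # used when no rule fires


def classify(record: Record) -> str:
    """Return the class of the first rule whose condition holds."""
    for condition, label in RULES:
        if condition(record):
            return label
    return DEFAULT_CLASS
```

Because the rules are checked in order, a reader can trace exactly which rule produced a prediction, which is the interpretability benefit the abstract refers to.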

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Zhongmei Zhou

A good classifier can correctly predict new data for which the class label is unknown, so it is important to construct a high-accuracy classifier. Hence, classification techniques are very useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches. However, the approach also has two major deficiencies. First, it generates a very large number of association classification rules, especially when the minimum support is set low, which makes it difficult to select a high-quality rule set for classification. Second, the accuracy of associative classification depends on the setting of the minimum support and the minimum confidence. In comparison with associative classification, some improved traditional rule-based classification approaches often produce a classification rule set that plays an important role in prediction. Thus, some improved traditional rule-based classification approaches not only achieve better efficiency than associative classification but also achieve higher accuracy. In this paper, we put forward a new classification approach called CMR (classification based on multiple classification rules). CMR combines the advantages of both associative classification and rule-based classification. Our experimental results show that CMR achieves higher accuracy than some traditional rule-based classification methods.
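
The role of the minimum support and minimum confidence thresholds can be illustrated with a short Python sketch. It computes both measures for a single class association rule on a toy dataset and checks them against user-chosen thresholds. The data, the item encoding (`attribute=value` strings), and the function names are assumptions made for illustration; this is not the CMR algorithm.

```python
from typing import FrozenSet, List, Tuple

Row = Tuple[FrozenSet[str], str]  # (set of attribute=value items, class label)

# Toy data, invented for this example.
TOY_ROWS: List[Row] = [
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=yes"}), "play"),
    (frozenset({"outlook=rain", "windy=yes"}), "stay"),
    (frozenset({"outlook=sunny", "windy=yes"}), "stay"),
]


def support_confidence(rows: List[Row], antecedent: FrozenSet[str],
                       label: str) -> Tuple[float, float]:
    """Support and confidence of the rule `antecedent -> label`."""
    covered = [cls for items, cls in rows if antecedent <= items]
    matches = sum(1 for cls in covered if cls == label)
    support = matches / len(rows) if rows else 0.0
    confidence = matches / len(covered) if covered else 0.0
    return support, confidence


def keep_rule(rows: List[Row], antecedent: FrozenSet[str], label: str,
              min_support: float, min_confidence: float) -> bool:
    """True if the rule passes both user-defined thresholds."""
    support, confidence = support_confidence(rows, antecedent, label)
    return support >= min_support and confidence >= min_confidence
```

Lowering `min_support` lets more candidate rules through, which is why the size of the rule set grows quickly at low thresholds.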


2014 ◽  
Vol 53 (02) ◽  
pp. 137-148 ◽  
Author(s):  
M. Sikora ◽  
Ł. Wróbel

Summary. Objectives: Rule induction is one of the major methods of machine learning. Rule-based models can be easily read and interpreted by humans, which makes them particularly useful in survival studies, as they can help clinicians to better understand analysed data and make informed decisions about patient treatment. Despite this usefulness, there is still little research on rule learning in survival analysis. In this paper we take a step towards rule-based analysis of survival data. Methods: We investigate the so-called covering or separate-and-conquer method of rule induction in combination with a weighting scheme for handling censored observations. We also focus on rule quality measures, which are one of the key elements differentiating particular implementations of separate-and-conquer rule induction algorithms. We examine 15 rule quality measures guiding the rule induction process and reflecting a wide range of different rule learning heuristics. Results: The algorithm is extensively tested on a collection of 20 real survival datasets and compared with the state-of-the-art survival trees and random survival forests algorithms. Most of the rule quality measures outperform the Kaplan-Meier estimate and perform at least as well as tree-based algorithms. Conclusions: Separate-and-conquer rule induction in combination with a weighting scheme is an effective technique for building rule-based models of survival data which, according to predictive accuracy, are competitive with tree-based representations.
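
The separate-and-conquer (covering) loop can be sketched in a few lines of Python. This sketch is for plain binary classification only: it does not handle censored observations or the weighting scheme described in the abstract, and it uses precision as the single rule quality measure, which is an assumption chosen for brevity rather than one of the paper's recommended measures. All names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], bool]  # (attributes, is_positive)
Condition = Tuple[str, str]            # attribute == value
RuleT = List[Condition]                # conjunction of conditions

# Toy data, invented for this example.
TOY_EXAMPLES: List[Example] = [
    ({"fever": "yes", "cough": "yes"}, True),
    ({"fever": "yes", "cough": "no"}, True),
    ({"fever": "no", "cough": "yes"}, False),
    ({"fever": "no", "cough": "no"}, False),
]


def covers(rule: RuleT, attrs: Dict[str, str]) -> bool:
    return all(attrs.get(a) == v for a, v in rule)


def precision(rule: RuleT, examples: List[Example]) -> float:
    """Rule quality measure: fraction of covered examples that are positive."""
    covered = [pos for attrs, pos in examples if covers(rule, attrs)]
    return sum(covered) / len(covered) if covered else 0.0


def grow_rule(examples: List[Example]) -> RuleT:
    """Conquer: greedily add the condition that most improves precision."""
    rule: RuleT = []
    candidates = {(a, v) for attrs, _ in examples for a, v in attrs.items()}
    while precision(rule, examples) < 1.0:
        remaining = sorted(candidates - set(rule))
        if not remaining:
            break
        best = max(remaining, key=lambda c: precision(rule + [c], examples))
        if precision(rule + [best], examples) <= precision(rule, examples):
            break  # no condition improves the rule any further
        rule.append(best)
    return rule


def separate_and_conquer(examples: List[Example]) -> List[RuleT]:
    """Learn rules until no positive examples remain uncovered."""
    rules: List[RuleT] = []
    remaining = list(examples)
    while any(pos for _, pos in remaining):
        rule = grow_rule(remaining)
        if not rule or not any(pos and covers(rule, a) for a, pos in remaining):
            break
        rules.append(rule)
        # Separate: drop every example the new rule covers.
        remaining = [e for e in remaining if not covers(rule, e[0])]
    return rules
```

Swapping `precision` for another quality measure changes which conditions are selected, which is the design dimension the abstract's comparison of 15 measures explores.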


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lisha Yu ◽  
Yang Zhao ◽  
Hailiang Wang ◽  
Tien-Lung Sun ◽  
Terrence E. Murphy ◽  
...  

Abstract. Background: Poor balance has been cited as one of the key causal factors of falls. Timely detection of balance impairment can help identify the elderly prone to falls and also trigger early interventions to prevent them. The goal of this study was to develop a surrogate approach for assessing the functional balance of elderly people based on the Short Form Berg Balance Scale (SFBBS) score. Methods: Data were collected from a waist-mounted tri-axial accelerometer while participants performed a timed up and go test. Clinically relevant variables were extracted from the segmented accelerometer signals for fitting SFBBS predictive models. Regularized regression together with random-shuffle-split cross-validation was used to facilitate the development of the predictive models for automatic balance estimation. Results: Eighty-five community-dwelling older adults (72.12 ± 6.99 years) participated in our study. Our results demonstrated that combined clinical and sensor-based variables, together with regularized regression and cross-validation, achieved moderate-to-high predictive accuracy of SFBBS scores (mean MAE = 2.01 and mean RMSE = 2.55). Step length, gender, gait speed, and linear acceleration variables describing motor coordination were identified as significant contributors to balance estimation. The predictive model also showed moderate-to-high discrimination in classifying the risk levels in the performance of three balance assessment motions, with AUC values of 0.72, 0.79, and 0.76, respectively. Conclusions: The study presented a feasible option for quantitatively accurate, objectively measured, and unobtrusively collected functional balance assessment at the point of care or in the home environment. It also provided clinicians and the elderly with stable and sensitive biomarkers for long-term monitoring of functional balance.
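
For readers unfamiliar with the evaluation set-up, the sketch below shows how random shuffle-split index sets and the two reported error measures (MAE and RMSE) can be computed in plain Python. The function names, the seed, and the default test fraction are assumptions for illustration; the study's actual pipeline and regularized regression model are not reproduced here.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def shuffle_split(n: int, n_splits: int, test_fraction: float,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) from independent random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n * test_fraction)))
    for _ in range(n_splits):
        indices = list(range(n))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

A "mean MAE" such as the one reported is obtained by fitting the model on each training split, scoring it on the matching test split, and averaging the per-split values.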


2021 ◽  
Vol 20 (01) ◽  
pp. 2150013
Author(s):  
Mohammed Abu-Arqoub ◽  
Wael Hadi ◽  
Abdelraouf Ishtaiwi

Associative Classification (AC) classifiers are of substantial interest due to their ability to be utilised for mining vast sets of rules. However, researchers over the decades have shown that a large number of these mined rules are trivial, irrelevant, redundant, and sometimes harmful, as they can cause decision-making bias. Accordingly, in our paper, we address these challenges and propose a novel AC approach based on the RIPPER algorithm, which we refer to as ACRIPPER. Our new approach combines the strength of the RIPPER algorithm with the classical AC method, in order to achieve: (1) a reduction in the number of rules being mined, especially those rules that are largely insignificant; (2) a high level of integration between the confidence and support of the rules on the one hand and the class imbalance level in the prediction phase on the other. Our experimental results, using 20 different well-known datasets, reveal that the proposed ACRIPPER significantly outperforms the well-known rule-based algorithms RIPPER and J48. Moreover, ACRIPPER significantly outperforms the current AC-based algorithms CBA, CMAR, ECBA, FACA, and ACPRISM. Finally, ACRIPPER is found to achieve the best average and ranking on the accuracy measure.


2021 ◽  
pp. postgradmedj-2021-140754
Author(s):  
Wei Syun Hu ◽  
Cheng Li Lin

Purpose: This is a nationwide retrospective study aiming to compare three different scoring systems (CHA2DS2-VASc, C2HEST and HAVOC scores) in the prediction of atrial fibrillation (AF) in patients with rheumatological disease. Methods: We used the Fine and Gray model to estimate the risk of AF (subhazard ratio and 95% CI). The predictive accuracy and discriminatory ability of the predictive model were evaluated by the receiver operating characteristic (ROC) curve. Results: Among the three predictive models, the model using the CHA2DS2-VASc score had the best discriminative ability, with an ROC of 0.79. The model with the C2HEST score had an ROC of 0.78. The discriminative ability of the HAVOC score was 0.77, estimated by ROC. Conclusion: We concluded that the CHA2DS2-VASc score performs better in predicting AF than the C2HEST score or the HAVOC score.
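
Discriminative ability summarised from an ROC curve is usually reported as the area under that curve. As a minimal sketch, the area can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the example scores are hypothetical; no data from the study are used.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` uses 1 for cases with the outcome and 0 for cases without it.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for values such as 0.77 to 0.79.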


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

Associative Classification (AC) or Class Association Rule (CAR) mining is a very efficient method for the classification problem. It can build comprehensible classification models in the form of a list of simple IF-THEN classification rules from the available data. In this paper, we present a new and improved discrete version of the Crow Search Algorithm (CSA), called NDCSA-CAR, to mine the Class Association Rules. The goal of this article is to improve the data classification accuracy and the simplicity of classifiers. The authors applied the proposed NDCSA-CAR algorithm to eleven benchmark datasets and compared its results with traditional algorithms and recent well-known rule-based classification algorithms. The experimental results show that the proposed algorithm outperformed other rule-based approaches in all evaluated criteria.


2021 ◽  
pp. 1197-1206
Author(s):  
Kai Zu ◽  
Kristina L. Greenwood ◽  
Joyce C. LaMori ◽  
Besa Smith ◽  
Tyler Smith ◽  
...  

PURPOSE This study evaluated risk factors predicting unplanned 30-day acute service utilization among adults subsequent to hospitalization for a new diagnosis of leukemia, lymphoma, or myeloma. This study explored the prevalence of medical complications (aligned with OP-35 measure specifications from the Centers for Medicare & Medicaid Services [CMS] Hospital Outpatient Quality Reporting Program) and the potential impact of psychosocial factors on unplanned acute care utilization. METHODS This study included 933 unique patients admitted to three acute care inpatient facilities within a nonprofit community-based health care system in southern California from 2012 to 2017. Integrated comprehensive data elements from electronic medical records and facility oncology registries were leveraged for univariate statistics, predictive models constructed using multivariable logistic regression, and further exploratory data mining, with predictive accuracy of the models measured with c-statistics. RESULTS The mean age of study participants was 65 years, and 55.1% were male. Specific diagnoses were lymphoma (48.7%), leukemia (35.2%), myeloma (14.0%), and mixed types (2.1%). Approximately one fifth of patients received unplanned acute care services within 30 days postdischarge, and over half of these patients presented with one or more symptoms associated with the CMS medical complication measure. The predictive models, with c-statistics of 0.7 and above for each type of hematologic malignancy, indicated good predictive qualities with the impact of psychosocial functioning on the use of acute care services (P values < .05), including lack of consult for social work during initial admission (lymphoma or myeloma), history of counseling or use of psychotropic medications (lymphoma), and past substance use (myeloma).
CONCLUSION This study provides insights into patient-related factors that may inform a proactive approach to improve health outcomes, such as enhanced care transition, monitoring, and support interventions.


2019 ◽  
Vol 39 (2-3) ◽  
pp. 250-265 ◽  
Author(s):  
David Fridovich-Keil ◽  
Andrea Bajcsy ◽  
Jaime F Fisac ◽  
Sylvia L Herbert ◽  
Steven Wang ◽  
...  

One of the most difficult challenges in robot motion planning is to account for the behavior of other moving agents, such as humans. Commonly, practitioners employ predictive models to reason about where other agents are going to move. Though there has been much recent work in building predictive models, no model is ever perfect: an agent can always move unexpectedly, in a way that is not predicted or not assigned sufficient probability. In such cases, the robot may plan trajectories that appear safe but, in fact, lead to collision. Rather than trust a model’s predictions blindly, we propose that the robot should use the model’s current predictive accuracy to inform the degree of confidence in its future predictions. This model confidence inference allows us to generate probabilistic motion predictions that exploit modeled structure when the structure successfully explains human motion, and degrade gracefully whenever the human moves unexpectedly. We accomplish this by maintaining a Bayesian belief over a single parameter that governs the variance of our human motion model. We couple this prediction algorithm with a recently proposed robust motion planner and controller to guide the construction of robot trajectories that are, to a good approximation, collision-free with a high, user-specified probability. We provide extensive analysis of the combined approach and its overall safety properties by establishing a connection to reachability analysis, and conclude with a hardware demonstration in which a small quadcopter operates safely in the same space as a human pedestrian.
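
The idea of maintaining a Bayesian belief over a single variance-governing parameter can be sketched as a discrete Bayes update. The sketch below assumes a one-dimensional Gaussian observation model and a small grid of candidate standard deviations; both are simplifying assumptions made for illustration and are not the human motion model used in the paper. All names are hypothetical.

```python
import math
from typing import Dict


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief: Dict[float, float], predicted: float,
                  observed: float) -> Dict[float, float]:
    """One Bayes step: posterior over candidate stds given one observation.

    `belief` maps each candidate standard deviation to its prior probability.
    A small std means "the model is trustworthy"; a large std means "expect
    the human to deviate from the model's prediction".
    """
    unnormalized = {
        std: prior * gaussian_pdf(observed, predicted, std)
        for std, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)  # numerical underflow: keep the prior unchanged
    return {std: weight / total for std, weight in unnormalized.items()}
```

When the observed motion matches the prediction, probability mass shifts to the small standard deviation and predictions stay tight; after a surprising observation, mass shifts to the large one and the predicted distribution widens, which is the graceful degradation described above.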


Author(s):  
Brook Tesfaye ◽  
Suleman Atique ◽  
Tariq Azim ◽  
Mihiretu M. Kebede

Abstract. Background: Skilled assistance during childbirth is essential to reduce maternal deaths. However, in Ethiopia, which is among the six countries contributing to more than half of global maternal deaths, the coverage of births attended by skilled health personnel remains very low. The aim of this study was to identify determinants and develop a predictive model for skilled delivery service use in Ethiopia by applying logistic regression and machine-learning techniques. Methods: Data from the 2016 Ethiopian Demographic and Health Survey (EDHS) were used for this study. Statistical Package for Social Sciences (SPSS) and Waikato Environment for Knowledge Analysis (WEKA) tools were used for logistic regression and model building, respectively. The classification algorithms J48, Naïve Bayes, Support Vector Machine (SVM), and Artificial Neural Network (ANN) were used for model development. The predictive models were validated using accuracy, sensitivity, specificity, and area under the Receiver Operating Characteristics (ROC) curve. Results: Only 27.7% of women received skilled delivery assistance in Ethiopia. First antenatal care (ANC) [AOR = 1.83, 95% CI (1.24–2.69)], birth order [AOR = 0.22, 95% CI (0.11–0.46)], television ownership [AOR = 6.83, 95% CI (2.52–18.52)], contraceptive use [AOR = 1.92, 95% CI (1.26–2.97)], cost needed for healthcare [AOR = 2.17, 95% CI (1.47–3.21)], age at first birth [AOR = 1.96, 95% CI (1.31–2.94)], and age at first sex [AOR = 2.72, 95% CI (1.55–4.76)] were determinants for utilizing skilled delivery services during childbirth. Predictive models were developed, and the J48 model had superior predictive accuracy (98%), sensitivity (96%), specificity (99%), and area under the ROC curve (98%). Conclusions: First ANC and contraceptive use were among the determinants of utilization of skilled delivery services.
A predictive model was developed to forecast the likelihood of a pregnant woman seeking skilled delivery assistance; therefore, the predictive model can help to decide targeted interventions for a pregnant woman to ensure skilled assistance at childbirth. The model developed through the J48 algorithm has better predictive accuracy. A web-based application can be built based on the results of this study.
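
The validation measures named in the abstract (accuracy, sensitivity, and specificity) all derive from the four cells of a binary confusion matrix. The following minimal Python sketch shows the arithmetic; the function name and the label encoding (1 for the outcome of interest, 0 otherwise) are assumptions, and no data from the study are used.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Reporting sensitivity and specificity alongside accuracy matters when classes are imbalanced, since a model can reach high accuracy by favouring the majority class.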

