Random Forest Bagging and X-Means Clustered Antipattern Detection from SQL Query Log for Accessing Secure Mobile Data

During the ongoing crisis, people rely on mobile phones for most of their activities, which makes query analysis and mobile data security major concerns. Several studies have addressed efficient detection of antipatterns to reduce the complexity of query analysis, but detection accuracy needs more attention. Clustering has also been used to group similar antipatterns and eliminate design errors. To address these issues and further improve antipattern detection accuracy while minimizing time and false positive rate, this work proposes the Random Forest Bagging X-means SQL Query Clustering (RFBXSQLQC) technique. Patterns (queries) are first gathered from the input SQL query log, and bootstrap samples are created. Then, for each pattern, various weak clusters are constructed via X-means clustering and used as weak learners, which categorize the input patterns into different clusters. A similarity measure based on the Bayesian information criterion evaluates the similarity between each pattern and the cluster weight, and patterns are assigned to relevant or irrelevant groups according to this similarity value. The weak learners' results are aggregated, and majority voting is used to form strong clusters with minimum time. Experiments evaluate RFBXSQLQC on the IIT Bombay dataset using antipattern detection accuracy, time complexity, false-positive rate, and computational overhead for varying numbers of queries. The results show that RFBXSQLQC outperforms existing algorithms, with 19% higher pattern detection accuracy, 34% lower time complexity, a 64% lower false-positive rate, and 31% lower computational overhead.
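The bagging-plus-voting pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RFBXSQLQC implementation: each logged query is assumed to be already encoded as a numeric feature vector (the encoding is hypothetical), the weak learner approximates X-means by trying k = 1..k_max and keeping the lowest simplified BIC instead of recursively splitting centroids, and the majority vote is taken over pairwise co-membership, a swapped-in consensus scheme. All function names are illustrative.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the centroid closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points, k, rng, iters=50):
    """Lloyd's k-means; returns (centers, labels, sse)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def bic_score(points, k, rng):
    """Simplified spherical-Gaussian BIC of a k-means fit (lower is better)."""
    centers, labels, sse = kmeans(points, k, rng)
    n, d = len(points), len(points[0])
    variance = max(sse, 1e-12) / (n * d)  # shared MLE variance, floored to avoid log(0)
    neg2_loglik = n * d * math.log(variance)  # constants common to every k are dropped
    for j in range(k):
        n_j = labels.count(j)
        if n_j:
            neg2_loglik -= 2 * n_j * math.log(n_j / n)  # mixing-weight term
    n_params = k * (d + 1)  # k centroids, k - 1 weights, 1 shared variance
    return neg2_loglik + n_params * math.log(n), centers


def xmeans_like(points, k_max, rng):
    """Weak learner: try k = 1..k_max and keep the centroids with the lowest BIC."""
    fits = [bic_score(points, k, rng) for k in range(1, min(k_max, len(points)) + 1)]
    return min(fits, key=lambda fit: fit[0])[1]


def bagged_consensus(points, n_bags=25, k_max=3, seed=0):
    """Bag weak clusterings and keep pairs that share a cluster in a strict majority."""
    rng = random.Random(seed)
    n = len(points)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_bags):
        sample = [points[rng.randrange(n)] for _ in range(n)]  # bootstrap sample
        centers = xmeans_like(sample, k_max, rng)
        labels = [nearest(p, centers) for p in points]  # label every original point
        for i in range(n):
            for j in range(i + 1, n):
                votes[i][j] += labels[i] == labels[j]
    parent = list(range(n))  # union-find over majority-vote pairs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if 2 * votes[i][j] > n_bags:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 2-D feature vectors for eight logged queries.
queries = [(1, 0), (1, 1), (2, 0), (2, 1), (9, 8), (9, 9), (10, 8), (10, 9)]
print(bagged_consensus(queries))
```

Voting on pairs sidesteps the label-alignment problem: cluster IDs from different bootstrap runs need not match, but "these two queries landed together" is comparable across runs.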

Kleinberg’s Hyper-Richness Based Fuzzy Partition Clustering for Efficient Bi-Temporal Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h7095.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 2422-2430

Keyword(s):

False Positive ◽

Time Complexity ◽

False Positive Rate ◽

Space Complexity ◽

Core Data ◽

Positive Rate ◽

Data Points ◽

Point Module ◽

Data Point ◽

Partition Clustering

Clustering partitions a dataset into classes of similar objects. Each group captures knowledge about its members and makes the structure of the dataset easier to understand. Clustering bitemporal data is a major task in data mining because bitemporal datasets are very large and have many attributes; hence, accurate clustering remains challenging. To improve clustering accuracy with less complexity, the Kleinberg’s Hyper-richness Bitemporal property based Fuzzy C-Means Partition Clustering (KHBP-FCMPC) technique is introduced. KHBP-FCMPC partitions the bitemporal dataset into a number of groups based on a distance metric, with improved performance. First, the number of clusters c is initialized. The technique uses a core data point module and an authority sector module to minimize the execution time of clustering the data points. The core data point module serves as the cluster centroid, and each cluster contains one core data point. Next, distances are computed with the membership function, and the authority sector module assigns each data point to the cluster at minimum distance. The centroids are then updated, and the process iterates until convergence. Finally, Kleinberg’s Hyper-richness Bitemporal property is applied to verify that the total dataset equals the partition of all the data points; this property is used to group all data points into clusters with higher accuracy. Experimental evaluation is carried out on a temporal dataset, measuring clustering accuracy, false positive rate, time complexity, and space complexity with respect to the number of data points. The results show that KHBP-FCMPC increases bitemporal data clustering accuracy while lowering false positive rate, time complexity, and space complexity.
Based on these observations, the KHBP-FCMPC technique is more efficient than state-of-the-art methods.
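The iterative loop described above (initialize c centroids, compute memberships from distances, update centroids, repeat until convergence) matches standard fuzzy c-means, sketched below with the fuzzifier m = 2 as an assumed default. The core data point and authority sector modules and the Kleinberg hyper-richness check are not reproduced; the final hard assignment simply takes each point's highest-membership cluster, which for fuzzy c-means is also its nearest centroid.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Standard fuzzy c-means; returns (centroids, memberships)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, c)]  # initialise c centroids
    exponent = 2.0 / (m - 1.0)
    n, dim = len(points), len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership of each point in each cluster from relative distances.
        memberships = []
        for p in points:
            d = [max(math.dist(p, cj), 1e-12) for cj in centroids]
            memberships.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(c)) for j in range(c)])
        # Move each centroid to the membership-weighted (u^m) mean of all points.
        new_centroids = []
        for j in range(c):
            w = [memberships[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centroids.append([sum(wi * p[a] for wi, p in zip(w, points)) / total for a in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # converged
            break
    return centroids, memberships


points = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,), (10.1,), (10.2,), (10.3,)]
centroids, memberships = fuzzy_c_means(points, c=2)
# Hard assignment: each point goes to its highest-membership (nearest) cluster.
labels = [row.index(max(row)) for row in memberships]
print(centroids, labels)
```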

Hybrid rule-based botnet detection approach using machine learning for analysing DNS traffic

PeerJ Computer Science ◽

10.7717/peerj-cs.640 ◽

2021 ◽

Vol 7 ◽

pp. e640

Author(s):

Saif Al-mashhadi ◽

Mohammed Anbar ◽

Iznan Hasbullah ◽

Taief Alaa Alamiedy

Keyword(s):

Machine Learning ◽

False Positive ◽

False Positive Rate ◽

Communication Protocols ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Botnet Detection ◽

Internet Service ◽

Positive Rate

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One of the most widely used is the Domain Name System (DNS), an essential Internet service. Bot-masters use Domain Generation Algorithms (DGA) and fast-flux techniques to evade static blacklists and reverse engineering while remaining flexible. However, a botnet's DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such anomalies indicate the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets through DNS traffic analysis, the problem persists and remains challenging for several reasons, such as overlooking significant features and rules that contribute to detecting DNS-based botnets. Therefore, this paper examines the abnormality of DNS traffic during the botnet life cycle to extract significant enriched features. These features are then analysed using two machine learning algorithms, and the union of their outputs forms a novel hybrid rule-based detection model. Two benchmark datasets are used to evaluate the proposed approach in terms of detection accuracy and false-positive rate. The experimental results show that the proposed approach achieves 99.96% accuracy and a 1.6% false-positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.
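The abstract does not name the two machine learning algorithms or say exactly what is unioned. As a minimal sketch, assuming the union is a logical OR over per-sample predictions, the snippet below combines two hypothetical prediction lists and computes the two reported metrics: accuracy and false-positive rate, FP / (FP + TN).

```python
def hybrid_union(pred_a, pred_b):
    """Flag a sample as botnet DNS traffic if either model flags it (logical OR)."""
    return [a or b for a, b in zip(pred_a, pred_b)]


def accuracy_and_fpr(pred, truth):
    """Detection accuracy and false-positive rate FP / (FP + TN)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum(not p and not t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    accuracy = (tp + tn) / len(truth)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, fpr


# Hypothetical labels and predictions for six DNS flows (True = botnet).
truth = [True, True, True, False, False, False]
pred_a = [True, True, False, False, False, False]
pred_b = [False, True, True, False, True, False]
union = hybrid_union(pred_a, pred_b)
print(accuracy_and_fpr(pred_a, truth), accuracy_and_fpr(union, truth))
```

With these toy labels, the union catches every botnet sample but raises the false-positive rate from 0 to 1/3, which is the trade-off an OR-combination has to manage.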

FSM based Intrusion Detection of Packet Dropping Attack using Trustworthy Watchdog Nodes

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200730223837 ◽

2020 ◽

Vol 13 ◽

Author(s):

Radha Raman Chandan ◽

P.K Mishra

Keyword(s):

Intrusion Detection ◽

False Positive ◽

False Positive Rate ◽

Detection Accuracy ◽

Context Aware ◽

Malicious Nodes ◽

Detection Scheme ◽

Packet Dropping ◽

Positive Rate ◽

Trust Calculation

Introduction: The proposed TWIST model aims to secure a MANET by detecting and mitigating packet dropping attacks with a finite state machine (FSM) based IDS model. Its objectives are:
* to determine the trust values of the nodes using context-aware trust calculation;
* to select trustworthy nodes as watchdog nodes that perform intrusion detection on the network;
* to detect packet dropping attackers and isolate them from routing activities, using an FSM-based IDS that differentiates packet dropping attackers from genuine nodes in the MANET.
Method: Instead of running an intrusion detection system (IDS) on every node, this methodology places an FSM-based IDS on trustworthy watchdog nodes to detect packet dropping attacker nodes in the network. The proposed scheme has three main steps: context-aware trust calculation, watchdog node selection, and FSM-based intrusion detection. In the first step, each node's trust is calculated from specific parameters that differ between malicious and normal nodes. The second step selects watchdog nodes based on the context-aware trust values, ensuring that trustworthy network monitors are used to detect attacker nodes. In the final step, FSM-based intrusion detection, each node moves between states according to its behavior during data routing. State transitions are driven by node behavior, and nodes that drop data packets beyond the defined threshold are moved to the malicious state and barred from further routing and services in the network.
Result: The performance of the proposed TWIST mechanism is assessed using Network Simulator 2 (NS2). The TWIST model is implemented by modifying the Ad-Hoc On-Demand Distance Vector (AODV) protocol files in NS2.
Moreover, the proposed scheme is compared with the Detection and Defense against Packet Drop attack in MANET (DDPD) scheme using performance metrics such as detection accuracy, false-positive rate, and overhead. The comparison shows that the TWIST model performs better than the existing DDPD scheme in terms of detection accuracy, false-positive rate, energy consumption, and overhead. Conclusion: The TWIST model provides an efficient FSM-based scheme that detects packet dropping attackers in the MANET. Trust is evaluated for each node in the network, and the nodes with the highest trust values are selected as watchdog nodes. The trust calculation considers parameters such as residual energy, interaction between nodes, and neighbor count when selecting watchdog nodes, so malicious nodes that drop data packets during forwarding cannot be selected as watchdogs. FSM-based intrusion detection is applied in the watchdog nodes to detect attackers accurately by monitoring neighbor nodes for malicious behavior. The performance analysis against the existing DDPD scheme confirms TWIST's advantage in detection accuracy, false-positive rate, energy consumption, and overhead. Discussion: This work may extend conventional trust measurement in MANET routing, which relies only on observing routing behavior to cope with malicious activity. In addition, the proposed work has not yet been evaluated under packet dropping attack with varying node mobility (speed).
Furthermore, additional performance metrics, such as route discovery latency and malicious discovery ratio, could be added to evaluate the protocol in the presence of malicious nodes; this may be considered in future work to extend the protocol. Future work will also focus on proactive detection of packet dropping attacker nodes in MANETs using a suitable and efficient statistical method.
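The three steps of the method (trust calculation, watchdog selection, FSM-based detection) can be sketched as below. The equal weighting of residual energy, interaction, and neighbor count, the 20% drop threshold, and the intermediate SUSPICIOUS state are all hypothetical choices for illustration; the abstract does not specify them.

```python
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"
    SUSPICIOUS = "suspicious"  # hypothetical intermediate state
    MALICIOUS = "malicious"


def trust(node, max_neighbors):
    """Hypothetical trust: equal-weight mean of parameters scaled to [0, 1]."""
    return (node["energy"] + node["interaction"] + node["neighbors"] / max_neighbors) / 3


def select_watchdogs(nodes, k):
    """Pick the k most trusted nodes as watchdogs."""
    max_neighbors = max(node["neighbors"] for node in nodes) or 1
    ranked = sorted(nodes, key=lambda node: trust(node, max_neighbors), reverse=True)
    return [node["id"] for node in ranked[:k]]


def next_state(state, forwarded, dropped, threshold=0.2):
    """One FSM transition after a watchdog observes a monitoring window."""
    if state is NodeState.MALICIOUS:
        return state  # isolated nodes stay excluded from routing
    total = forwarded + dropped
    drop_ratio = dropped / total if total else 0.0
    if drop_ratio <= threshold:
        return NodeState.NORMAL
    # Two consecutive windows above the threshold mark the node malicious.
    return NodeState.MALICIOUS if state is NodeState.SUSPICIOUS else NodeState.SUSPICIOUS


nodes = [
    {"id": "A", "energy": 0.9, "interaction": 0.8, "neighbors": 6},
    {"id": "B", "energy": 0.4, "interaction": 0.9, "neighbors": 3},
    {"id": "C", "energy": 0.7, "interaction": 0.2, "neighbors": 5},
]
print(select_watchdogs(nodes, k=2))

state = NodeState.NORMAL
for forwarded, dropped in [(95, 5), (70, 30), (60, 40)]:
    state = next_state(state, forwarded, dropped)
    print(state)
```

Making MALICIOUS an absorbing state mirrors the abstract's point that detected droppers are excluded from further routing.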

Heuristic based malicious URL detection

International Journal for Research in Engineering Application & Management ◽

10.35291/2454-9150.2020.0296 ◽

2020 ◽

pp. 267-271

Keyword(s):

False Positive ◽

Statistical Approach ◽

False Positive Rate ◽

The Body ◽

Detection Accuracy ◽

Original Algorithm ◽

Positive Rate ◽

Email Message ◽

Phishing Detection ◽

False Positive Problem

Phishing is one of the most potentially disruptive actions that can be performed on the Internet. Intellectual property and other pertinent business information may be at risk if a user falls for a phishing attack. The adversary sends an email with a link to a fraudulent site to lure users into divulging confidential information. One of the main goals of this research is to detect phishing attempts via email. The algorithm from previous work analyses the body text of an email to detect whether the message asks the user to take some action, such as clicking a link that leads to a fraudulent website. This work expands the text-analysis portion of that algorithm, which performed poorly at catching phishing emails; the original algorithm filtered out considerably fewer malicious emails than the modified algorithm. To address the false positive problem, a statistical approach was adopted, which reduced the false positive rate while minimizing the loss in phishing detection accuracy.
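A minimal sketch of the body-text heuristic described above, assuming the check is "contains a link and asks the user to act". The phrase list and threshold are hypothetical. The abstract describes neither the original lexicon nor the statistical false-positive fix, so the threshold here only illustrates the knob that trades detection against false positives.

```python
import re

# Hypothetical lexicon of action requests; the paper's own word list is not given.
ACTION_PHRASES = (
    "click here",
    "click the link",
    "verify your account",
    "confirm your",
    "update your",
    "reset your password",
)
LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def action_request_score(body):
    """Count action-request phrases in the email body (case-insensitive)."""
    text = body.lower()
    return sum(text.count(phrase) for phrase in ACTION_PHRASES)


def looks_like_phishing(body, threshold=1):
    """Flag an email that contains a link and at least `threshold` action requests."""
    has_link = LINK_RE.search(body) is not None
    return has_link and action_request_score(body) >= threshold


suspicious = "Your mailbox is locked. Click here to verify your account: http://example.com/x"
benign = "Minutes from today's meeting are at https://example.com/notes"
print(looks_like_phishing(suspicious), looks_like_phishing(benign))
```

Raising `threshold` makes the filter stricter, so fewer legitimate emails are flagged at the cost of missing phishing messages with only one action request.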

Probit Regressed Feature Selection Based Linear Programming Boost Classification for Tumor Risk Factor Identification and Disease Diagnosis

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c6087.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 8152-8160

Keyword(s):

Linear Programming ◽

Risk Factor ◽

False Positive ◽

False Positive Rate ◽

Computation Time ◽

Disease Diagnosis ◽

Patient Data ◽

Weak Learner ◽

Tumor Risk ◽

Positive Rate

Accurately diagnosing the survival rate of patients with tumors remains challenging because of increasingly complex treatment protocols and differing patient population samples, and this complexity increases the patients' risk. Therefore, reliable and well-validated prediction is needed to support automatic disease diagnosis for early tumor detection. A novel technique called Probit Regressed Feature Selection based Iterative Linear Programming Boost Classification (PRFS-ILPBC) is introduced for tumor risk factor identification and disease diagnosis from patient data with higher accuracy and less time consumption. In PRFS-ILPBC, a Probit regression model estimates the relationship between the features and the disease symptoms using the bivariate correlation coefficient. Based on the correlation results, each feature falls into one of two classes (relevant or irrelevant). Using the relevant features, an Iterative Linear Programming (LP) Boost classification model combines weak learners to identify tumor risk factors and diagnose disease. LPBoost constructs a strong classifier starting from a set of weak classifiers: the training data (patient data) are taken as input and added to the set of considered weak classifiers. A kernelized support vector machine acts as the weak learner, comparing the training features with the testing results to identify the risk factor and classify the patient data as normal or abnormal. The ensemble classifier improves disease diagnosis accuracy and reduces the false positive rate. The proposed PRFS-ILPBC technique and existing methods are evaluated on disease diagnosis accuracy, false positive rate, and computation time with respect to the number of patient records.
The results show that PRFS-ILPBC achieves higher disease diagnosis accuracy with lower computation time and false positive rate than conventional techniques.
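The relevant/irrelevant feature split can be illustrated with a plain correlation filter. This swaps in Pearson (point-biserial) correlation against a binary diagnosis label for the paper's Probit regression; the 0.5 cutoff and the feature names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def split_features(features, labels, threshold=0.5):
    """Partition feature names into (relevant, irrelevant) by |correlation|."""
    relevant, irrelevant = [], []
    for name, values in features.items():
        target = relevant if abs(pearson(values, labels)) >= threshold else irrelevant
        target.append(name)
    return relevant, irrelevant


# Hypothetical patient records: 0 = normal, 1 = abnormal.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "tumor_size": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],
    "clinic_code": [1, 2, 1, 2, 1, 2],
    "constant": [1, 1, 1, 1, 1, 1],
}
print(split_features(features, labels))
```

Only the relevant features would then be passed on to the boosted classifier.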

Improving diagnosis of acute appendicitis with atypical findings by Tc-99m HMPAO leukocyte scan

Nuklearmedizin ◽

10.1055/s-0038-1623994 ◽

2002 ◽

Vol 41 (01) ◽

pp. 37-41 ◽

Cited By ~ 3

Author(s):

S. Shung-Shung ◽

S. Yu-Chien ◽

Y. Mei-Due ◽

W. Hwei-Chung ◽

A. Kao

Keyword(s):

Acute Appendicitis ◽

False Positive ◽

False Positive Rate ◽

Accurate Method ◽

Clinical Findings ◽

Pathological Findings ◽

Lower Quadrant ◽

Predictive Values ◽

Positive Rate

Summary Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. This study therefore assesses the clinical efficacy of the Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for diagnosing acute appendicitis in patients presenting with atypical clinical findings. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images were obtained with a gamma camera at 30, 60, 120, and 240 min with 800k counts. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered a positive scan. Results: Of the 49 patients with positive TC-WBC scans, 36 underwent appendectomy, and all had positive pathological findings. Five positive scans were attributable to other pathological lesions rather than acute appendicitis. Eight patients did not undergo surgery, and clinical follow-up after one month revealed no acute abdominal condition. Of the 31 patients with negative TC-WBC scans, 3 underwent appendectomy and also had positive pathological findings. The remaining 28 patients did not undergo surgery and showed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of the TC-WBC scan for diagnosing acute appendicitis were 92, 78, 86, 82, and 90%, respectively. Conclusion: The TC-WBC scan provides a rapid and highly accurate method for diagnosing acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and shortened the time needed for clinical observation.
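The reported figures follow from the standard 2x2 definitions. The abstract does not print the table; one reading that reproduces the reported sensitivity, specificity, PPV, and NPV treats the 36 operated positive scans as true positives, the 8 unoperated positive scans as false positives, the 3 operated negative scans as false negatives, and the 28 unoperated negative scans as true negatives, leaving out the 5 positive scans explained by other lesions (an assumption). Accuracy is not recomputed here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic-test metrics."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts under the reading described above (an assumption, not a table from the paper).
metrics = diagnostic_metrics(tp=36, fp=8, fn=3, tn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

This prints 92%, 78%, 82%, and 90%, matching the sensitivity, specificity, PPV, and NPV given in the abstract.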

Predicting Fetal Chromosome Anomalies in the First Trimester Using Pregnancy Associated Plasma Protein-A: A Comparison of Statistical Methods

Methods of Information in Medicine ◽

10.1055/s-0038-1634910 ◽

1993 ◽

Vol 32 (02) ◽

pp. 175-179 ◽

Cited By ~ 7

Author(s):

B. Brambati ◽

T. Chard ◽

J. G. Grudzinskas ◽

M. C. M. Macintosh

Keyword(s):

Logistic Regression ◽

General Population ◽

Likelihood Ratio ◽

False Positive ◽

False Positive Rate ◽

Ratio Method ◽

Detection Rates ◽

Gaussian Distributions ◽

Positive Rate ◽

Likelihood Ratio Method

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. Two approaches to the statistical analysis were compared: Gaussian frequency distributions with likelihood ratios, and logistic regression. Both methods estimated that, at a 5% false-positive rate, approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. Logistic regression analysis is appropriate where the outcome variable (chromosome anomaly) is binary, and its detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population; it depends on the data, or some transformation of the data, fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormal cases (30). Varying the means and standard deviations of the fitted log Gaussian distributions to the limits of their 95% confidence intervals gave detection rates between 42% and 79% at a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method for determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.
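For a single marker, the Gaussian approach fixes the cutoff at the 5% quantile of the unaffected distribution and reads the detection rate off the affected distribution. The sketch below uses Python's `statistics.NormalDist`. The parameters are hypothetical, lower values are assumed to indicate risk, and maternal age is left out. Varying the affected mean mirrors the confidence-interval sensitivity analysis in the abstract.

```python
from statistics import NormalDist


def detection_rate(unaffected, affected, fpr=0.05):
    """Share of affected cases below the cutoff that flags `fpr` of unaffected ones.

    Assumes low marker values indicate risk. With equal SDs in both groups,
    thresholding the marker is equivalent to thresholding the likelihood ratio.
    """
    cutoff = unaffected.inv_cdf(fpr)
    return affected.cdf(cutoff)


# Hypothetical log10(MoM) parameters, not values from the study.
unaffected = NormalDist(mu=0.0, sigma=0.25)
# Sensitivity analysis: vary the affected mean across a hypothetical 95% CI.
for mu in (-0.4, -0.3, -0.2):
    print(mu, round(detection_rate(unaffected, NormalDist(mu=mu, sigma=0.25)), 3))
```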

Identification of and Correction for Publication Bias: Comment

10.31222/osf.io/dh87m ◽

2019 ◽

Author(s):

Amanda Kvarven ◽

Eirik Strømland ◽

Magnus Johannesson

Keyword(s):

Publication Bias ◽

False Positive ◽

Large Scale ◽

Meta Analysis ◽

False Positive Rate ◽

Effect Sizes ◽

Replication Studies ◽

Moderate Reduction ◽

Positive Rate ◽

Meta Analyses

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results to 15 large-scale multi-lab replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes that do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.

The false positive rate of P300-based concealed information test

Korean Journal of Cognitive and Biological Psychology ◽

10.22172/cogbio.2018.30.3.003 ◽

2018 ◽

Vol 30 (3) ◽

pp. 241-259 ◽

Cited By ~ 1

Author(s):

엄진섭 ◽

Jin-Hun Sohn ◽

Hajung Jeon

Keyword(s):

False Positive ◽

False Positive Rate ◽

Concealed Information Test ◽

Positive Rate ◽

Concealed Information ◽

Information Test

PRATD: A Phased Remote Access Trojan Detection Method with Double-Sided Features

Electronics ◽

10.3390/electronics9111894 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1894

Author(s):

Chun Guo ◽

Zihua Song ◽

Yuan Ping ◽

Guowei Shen ◽

Yuhei Cui ◽

...

Keyword(s):

False Positive ◽

Detection Method ◽

False Positive Rate ◽

True Positive Rate ◽

Remote Access ◽

Detection Methods ◽

Security Threats ◽

True Positive ◽

Trojan Detection ◽

Positive Rate

Remote Access Trojans (RATs) are among the most serious security threats that organizations face today. At present, the two major RAT detection methods are host-based and network-based detection. To combine their complementary strengths, this article proposes a phased RAT detection method with double-sided features (PRATD). In PRATD, host-side and network-side features are combined to build detection models, which helps distinguish RATs from benign programs because RATs not only generate network traffic but also leave traces on the host at run time. In addition, PRATD trains two different detection models for the two runtime states of RATs to improve the True Positive Rate (TPR). Experiments on network and host records collected from five kinds of benign programs and 20 well-known RATs show that PRATD detects RATs effectively: it achieves a TPR as high as 93.609% with a False Positive Rate (FPR) as low as 0.407% for known RATs, and a TPR of 81.928% with an FPR of 0.185% for unknown RATs, suggesting that it is a competitive candidate for RAT detection.
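The phased, double-sided idea can be sketched as one model per runtime state, each trained on concatenated host and network features. The phase names, the feature choices, and the nearest-centroid classifier are hypothetical stand-ins; the abstract does not name the paper's learners. Features are left unscaled for brevity, which real data would not tolerate.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class PhasedDetector:
    """One nearest-centroid model per RAT runtime phase (hypothetical phase names)."""

    def __init__(self):
        self.models = {}

    def fit(self, samples):
        # Each sample: (phase, host_features, network_features, is_rat).
        grouped = {}
        for phase, host, net, is_rat in samples:
            grouped.setdefault(phase, {}).setdefault(is_rat, []).append(list(host) + list(net))
        for phase, by_label in grouped.items():
            self.models[phase] = {label: centroid(vecs) for label, vecs in by_label.items()}
        return self

    def predict(self, phase, host, net):
        # Double-sided feature vector: host features followed by network features.
        x = list(host) + list(net)
        centroids = self.models[phase]
        return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical host (cpu_share, registry_writes) and network (bytes_out, connections) data.
training = [
    ("active", (0.9, 5), (800, 12), True),
    ("active", (0.8, 6), (900, 10), True),
    ("active", (0.2, 0), (100, 2), False),
    ("active", (0.1, 1), (120, 3), False),
    ("idle", (0.05, 1), (20, 1), True),
    ("idle", (0.05, 0), (22, 2), True),
    ("idle", (0.3, 0), (300, 5), False),
    ("idle", (0.4, 0), (280, 6), False),
]
detector = PhasedDetector().fit(training)
print(detector.predict("idle", (0.05, 0), (21, 1)))
```

Separate per-phase models let a quiet, heartbeat-only RAT be judged against other idle traffic instead of against busy benign programs.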
