An Ensemble Cost-Sensitive One-Class Learning Framework for Malware Detection

Author(s):  
Jia-Chen Liu ◽  
Jian-Feng Song ◽  
Qi-Guang Miao ◽  
Ying Cao ◽  
Yi-Ning Quan

Machine learning is among the most popular methods for designing unknown and variant malware detection algorithms. However, most existing methods take a single type of feature to build binary classifiers. In practice, these methods have limited ability to depict malware characteristics, and binary classification suffers from inadequate sampling of benign samples and extremely imbalanced training samples when detecting malware. In this paper, we present a malware detection Framework based on ENsemble One-Class Learning, namely FENOC. It uses hybrid features at different semantic layers to ensure a comprehensive insight into the program to be analyzed. We construct the malware detector with a novel learning algorithm called Cost-sensitive Twin One-class Classifier (CosTOC), which uses a pair of one-class classifiers to describe malware and benign programs respectively. CosTOC is more flexible and robust than conventional binary classifiers when training samples are extremely imbalanced or the benign programs are inadequately sampled. Finally, a random subspace method and a clustering-based ensemble method are developed to enhance the generalization ability of CosTOC. Experimental results show that FENOC gives a comparable detection rate and a lower false positive rate than many other binary classification algorithms, especially when the detector is trained with imbalanced data or evaluated in terms of false positive rate.

2021 ◽  
Vol 13 (1) ◽  
pp. 1-6
Author(s):  
Dimaz Arno Prasetio ◽  
Kusrini Kusrini ◽  
M. Rudyanto Arief

This study aims to measure the classification accuracy of XSS attacks by using a combination of two methods of determining feature characteristics, namely linguistic computation and feature selection. XSS attacks have a certain pattern in their character arrangement, which learners can study using n-gram modeling; however, in certain cases XSS characteristics can contain certain meta and synthetic elements, which can be learned using feature selection modeling. The results of this research show that hybrid feature modeling gives good accuracy, with an accuracy value of 99.87%, better than previous studies whose average is still below 99%. This study also analyzes the false positive rate, since the false positive rate in attack detection strongly affects the convenience of the information security team; with the proposed modeling, the false positive rate is very small, namely 0.039%.


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
A Rosier ◽  
E Crespin ◽  
A Lazarus ◽  
G Laurent ◽  
A Menet ◽  
...  

Abstract Background Implantable Loop Recorders (ILRs) are increasingly used and generate a high workload for timely adjudication of ECG recordings. In particular, the excessive false positive rate leads to a significant review burden. Purpose A novel machine learning algorithm was developed to reclassify ILR episodes in order to decrease the false positive rate by 80% while maintaining 99% sensitivity. This study aims to evaluate the impact of this algorithm in reducing the number of abnormal episodes reported in Medtronic ILRs. Methods Among 20 European centers, all Medtronic ILR patients were enrolled during the 2nd semester of 2020. Using a remote monitoring platform, every ILR transmitted episode was collected and anonymised. For every ILR detected episode with a transmitted ECG, the new algorithm reclassified it applying the same labels as the ILR (asystole, brady, AT/AF, VT, artifact, normal). We measured the number of episodes identified as false positive and reclassified as normal by the algorithm, and their proportion among all episodes. Results In 370 patients, ILRs recorded 3755 episodes including 305 patient-triggered and 629 with no ECG transmitted. 2821 episodes were analyzed by the novel algorithm, which reclassified 1227 episodes as normal rhythm. These reclassified episodes accounted for 43% of analyzed episodes and 32.6% of all episodes recorded. Conclusion A novel machine learning algorithm significantly reduces the quantity of episodes flagged as abnormal and typically reviewed by healthcare professionals. Funding Acknowledgement Type of funding sources: None. Figure 1. ILR episodes analysis


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Futai Zou ◽  
Siyu Zhang ◽  
Weixiong Rao ◽  
Ping Yi

Malware remains a major threat to today's Internet. In this paper, we propose a DNS graph mining-based malware detection approach. A DNS graph is composed of DNS nodes, which represent server IPs, client IPs, and queried domain names in the process of DNS resolution. After the graph construction, we transform the problem of malware detection into the graph mining task of inferring graph nodes' reputation scores using the belief propagation algorithm. The nodes with lower reputation scores are inferred to be those infected by malware with higher probability. For demonstration, we evaluate the proposed malware detection approach with a real-world dataset. Our real-world dataset was collected from campus DNS servers over three months, and from it we built a DNS graph consisting of 19,340,820 vertices and 24,277,564 edges. On the graph, we achieve a true positive rate of 80.63% with a false positive rate of 0.023%. With a false positive rate of 1.20%, the true positive rate was improved to 95.66%. We detected 88,592 hosts infected by malware or acting as C&C servers, accounting for 5.47% of all hosts. Meanwhile, 117,971 domains are considered to be related to malicious activities, accounting for 1.5% of all domains. The results indicate that our method is efficient and effective in detecting malware.


Author(s):  
Nur Syuhada Selamat ◽  
Fakariah Hani Mohd Ali

Currently, the volume of malware grows faster each year and poses a serious global security threat. The number of malware samples developed increased at an alarming rate in the 1990s as computers became interconnected. This scenario resulted in an increase in malware, and many protections have been built to fight it. Unfortunately, current technology is no longer effective against more advanced malware, which its authors craft to evade anti-virus detection. In current research, Machine Learning (ML) algorithm techniques have become more popular among researchers for analyzing malware detection. In this paper, the researchers propose a defense system that compares three ML algorithm techniques and selects among them based on malware detection accuracy. The result indicates that the Decision Tree algorithm achieves the best detection accuracy compared to the other classifiers, with 99% accuracy and a 0.021% False Positive Rate (FPR) on a relatively small dataset.


2002 ◽  
Vol 41 (01) ◽  
pp. 37-41 ◽  
Author(s):  
S. Shung-Shung ◽  
S. Yu-Chien ◽  
Y. Mei-Due ◽  
W. Hwei-Chung ◽  
A. Kao

Summary Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. Therefore, the clinical efficacy of Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for the diagnosis of acute appendicitis in patients presenting with atypical clinical findings is assessed. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images at 30, 60, 120 and 240 min with 800k counts were obtained with a gamma camera. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered a positive scan. Results: 36 out of 49 patients showing positive TC-WBC scans received appendectomy. They all proved to have positive pathological findings. Five positive TC-WBC scans were not related to acute appendicitis but to other pathological lesions. Eight patients were not operated on, and clinical follow-up after one month revealed no acute abdominal condition. Three of 31 patients with negative TC-WBC scans received appendectomy. They also presented positive pathological findings. The remaining 28 patients did not receive operations and revealed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive and negative predictive values for TC-WBC scan to diagnose acute appendicitis were 92, 78, 86, 82, and 90%, respectively. Conclusion: TC-WBC scan provides a rapid and highly accurate method for the diagnosis of acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and in shortening the time necessary for clinical observation.


1993 ◽  
Vol 32 (02) ◽  
pp. 175-179 ◽  
Author(s):  
B. Brambati ◽  
T. Chard ◽  
J. G. Grudzinskas ◽  
M. C. M. Macintosh

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. A comparison was made of two different approaches to the statistical analysis: the use of Gaussian frequency distributions and likelihood ratios, and logistic regression. Both methods computed that for a 5% false-positive rate approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. The logistic regression analysis is appropriate where the outcome variable (chromosome anomaly) is binary, and its detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population. The latter method depends on the data, or some transformation of the data, fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormal cases (30 cases). Varying the means and standard deviations (to the limits of their 95% confidence intervals) of the fitted log Gaussian distributions resulted in a detection rate varying between 42% and 79% for a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method for determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the result of 15 meta-analyses and compare the adjusted results to 15 large-scale multiple labs replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes, which do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1894
Author(s):  
Chun Guo ◽  
Zihua Song ◽  
Yuan Ping ◽  
Guowei Shen ◽  
Yuhei Cui ◽  
...  

Remote Access Trojan (RAT) is one of the most serious security threats that organizations face today. At present, the two major RAT detection methods are host-based and network-based detection. To complement one another's strengths, this article proposes a phased RAT detection method combining double-side features (PRATD). In PRATD, both host-side and network-side features are combined to build detection models, which helps distinguish RATs from benign programs because RATs not only generate traffic on the network but also leave traces on the host at run time. In addition, PRATD trains two different detection models for the two runtime states of RATs to improve the True Positive Rate (TPR). Experiments on the network and host records collected from five kinds of benign programs and 20 well-known RATs show that PRATD can effectively detect RATs: it achieves a TPR as high as 93.609% with a False Positive Rate (FPR) as low as 0.407% for the known RATs, and a TPR of 81.928% with an FPR of 0.185% for the unknown RATs, which suggests it is a competitive candidate for RAT detection.


2020 ◽  
Vol 154 (Supplement_1) ◽  
pp. S5-S5
Author(s):  
Ridin Balakrishnan ◽  
Daniel Casa ◽  
Morayma Reyes Gil

Abstract The diagnostic approach for ruling out suspected acute pulmonary embolism (PE) in the ED setting includes several tests: ultrasound, plasma d-dimer assays, ventilation-perfusion scans and computed tomography pulmonary angiography (CTPA). Importantly, a pretest probability scoring algorithm is highly recommended to triage high risk cases while also preventing unnecessary testing and harm to low/moderate risk patients. The d-dimer assay (both ELISA and immunoturbidimetric) has been shown to be extremely sensitive for ruling out PE in conjunction with clinical probability. In particular, d-dimer testing is recommended for low/moderate risk patients, in whom a negative d-dimer essentially rules out PE, sparing these patients from CTPA radiation exposure, longer hospital stay and anticoagulation. However, a nonspecific increase in fibrin-degradation related products has been seen with increasing age, resulting in a higher false positive rate in the older population. This study analyzed patient visits to the ED of a large academic institution over five years and looked at the relationship between d-dimer values, age and CTPA results to better understand the value of age-adjusted d-dimer cut-offs in ruling out PE in the older population. A total of 7660 ED visits had a CTPA done to rule out PE; of these, 1875 cases had a d-dimer done in conjunction with the CT and 5875 had only CTPA done. Out of the 1875 cases, 1591 had positive d-dimer results (>0.50 µg/ml (FEU)), of which 910 (57%) were from patients older than or equal to fifty years of age. In these older patients, 779 (86%) had a negative CT result. The following were the statistical measures of the d-dimer test before adjusting for age: sensitivity (98%), specificity (12%); negative predictive value (98%) and false positive rate (88%). After adjusting for age in people older than 50 years (d-dimer cut off = age/100), 138 patients eventually turned out to be d-dimer negative, and every case but four had a CT result that was also negative for a PE. The four cases included two non-diagnostic results and two with subacute/chronic/subsegmental PE on imaging. None of these four patients were prescribed anticoagulation. The statistical measures of the d-dimer test after adjusting for age showed: sensitivity (96%), specificity (20%); negative predictive value (98%) and a decrease in the false positive rate (80%). Therefore, imaging could potentially have been avoided in 138/779 (18%) of the patients in this older population who had eventual negative or not clinically significant findings on CTPA, if age-adjusted d-dimers had been used. This data strongly advocates for the clinical usefulness of an age-adjusted cut-off of d-dimer to rule out PE.

