Assisting in Auditing of Buffer Overflow Vulnerabilities via Machine Learning

2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Qingkun Meng ◽  
Chao Feng ◽  
Bin Zhang ◽  
Chaojing Tang

A buffer overflow vulnerability is a consequence of programmers’ intentions not being implemented correctly in code. In this paper, a static analysis method based on machine learning is proposed to assist in auditing buffer overflow vulnerabilities. First, an extended code property graph is constructed from the source code to extract seven kinds of static attributes that describe buffer properties. After these attributes are embedded into a vector space, five frequently used machine learning algorithms are employed to classify functions as suspicious (potentially vulnerable) or secure. The five classifiers reached an average recall of 83.5%, an average true negative rate of 85.9%, a best recall of 96.6%, and a best true negative rate of 91.4%. Due to the imbalance of the training samples, the average precision of the classifiers is 68.9% and the average F1 score is 75.2%. When applied to a new program, our method reduced false positives to 1/12 of those reported by Flawfinder.
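The reported figures follow directly from confusion-matrix counts. A minimal sketch of the metric definitions (the counts below are hypothetical, chosen only to illustrate how class imbalance can depress precision and F1 while recall and true negative rate stay high):

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall (TPR), true negative rate, precision, and F1 from confusion counts."""
    recall = tp / (tp + fn)      # fraction of vulnerable functions found
    tnr = tn / (tn + fp)         # fraction of secure functions correctly cleared
    precision = tp / (tp + fp)   # fraction of flagged functions truly vulnerable
    f1 = 2 * precision * recall / (precision + recall)
    return recall, tnr, precision, f1

# hypothetical counts: imbalance (many secure functions) inflates fp relative to tp
recall, tnr, precision, f1 = classification_metrics(tp=83, fp=38, tn=86, fn=17)
```

With these counts, recall stays at 0.83 while precision drops below 0.70, mirroring the imbalance effect the abstract describes.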

2011 ◽  
pp. 999-999
Author(s):  
William Uther ◽  
Dunja Mladenić ◽  
Massimiliano Ciaramita ◽  
Bettina Berendt ◽  
Aleksander Kołcz ◽  
...  

2017 ◽  
Vol 10 (7) ◽  
pp. 657-662 ◽  
Author(s):  
Shlomi Peretz ◽  
David Orion ◽  
David Last ◽  
Yael Mardor ◽  
Yotam Kimmel ◽  
...  

Purpose The region defined as ‘at risk’ penumbra by current CT perfusion (CTP) maps is largely overestimated. We aimed to quantify the portion of true ‘at risk’ tissue within the CTP penumbra and to determine the parameter and threshold that would optimally distinguish it from false ‘at risk’ tissue, that is, benign oligaemia. Methods Among acute stroke patients evaluated by multimodal CT (NCCT/CTA/CTP), we identified those who had not undergone endovascular/thrombolytic treatment and had follow-up NCCT. Maps of absolute and relative CBF, CBV, MTT, TTP and Tmax, as well as summary maps depicting infarcted and penumbral regions, were generated using the Intellispace Portal (Philips Healthcare, Best, The Netherlands). The follow-up CT was automatically co-registered to the CTP scan and the final infarct region was manually outlined. Perfusion parameters were systematically analysed; the parameter that yielded the highest true negative rate (ie, proportion of benign oligaemia correctly identified) at a fixed, clinically relevant false negative rate (ie, proportion of ‘missed’ infarct) of 15% was chosen as optimal. It was then re-applied to the CTP data to produce corrected perfusion maps. Results Forty-seven acute stroke patients met the selection criteria. The average portion of infarcted tissue within the CTP penumbra was 15%±2.2%. Relative CBF at a threshold of 0.65 yielded the highest average true negative rate (48%), enabling reduction of the false ‘at risk’ penumbral region by approximately half. Conclusions Applying a relative CBF threshold to relative MTT-based CTP maps can significantly reduce false ‘at risk’ penumbra. This step may help avoid unnecessary endovascular interventions.
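The correction step amounts to thresholding a relative CBF map inside the CTP-defined penumbra. A minimal sketch with a hypothetical 3×3 rCBF patch (only the 0.65 cut-off comes from the study; the values and mask are illustrative):

```python
import numpy as np

# hypothetical relative-CBF patch (lesion-side CBF divided by contralateral mean)
rcbf = np.array([[0.40, 0.60, 0.70],
                 [0.55, 0.64, 0.90],
                 [0.30, 0.66, 1.10]])
penumbra = np.ones_like(rcbf, dtype=bool)  # stand-in for the CTP penumbra mask
THRESHOLD = 0.65  # the study's optimal rCBF cut-off

true_at_risk = penumbra & (rcbf < THRESHOLD)        # retained as 'at risk'
benign_oligaemia = penumbra & (rcbf >= THRESHOLD)   # reclassified as benign
```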


Author(s):  
D. Wang ◽  
M. Hollaus ◽  
N. Pfeifer

Classification of wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI) and woody-to-total area. Laser scanning has emerged as a promising solution for this task. Intensity-based approaches are widely proposed, as different components of a tree can feature discriminatory optical properties at the operating wavelengths of a sensor system. For geometry-based methods, machine learning algorithms are often used to separate wood and leaf points, given proper training samples. However, it remains unclear how the chosen machine learning classifier and the features used influence classification results. To this end, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), and Gaussian Mixture Model (GMM), for separating wood and leaf points from terrestrial laser scanning (TLS) data. Two trees, an <i>Erythrophleum fordii</i> and a <i>Betula pendula</i> (silver birch), are used to test the impacts of classifier, feature set, and training samples. Our results showed that RF is the most accurate model and that local-density-related features are important. Experimental results confirmed the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. Note, however, that our studies are based on isolated trees; further tests should be performed on more tree species and data from more complex environments.
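A comparison of this kind can be sketched with scikit-learn stand-ins for the four classifiers, using synthetic 3-D point features in place of real TLS geometric features (all data below are illustrative; GMM is used generatively, with one mixture fitted per class):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# synthetic stand-ins for per-point geometric features (e.g. local density)
wood = rng.normal(loc=0.0, scale=1.0, size=(300, 3))
leaf = rng.normal(loc=2.0, scale=1.0, size=(300, 3))
X = np.vstack([wood, leaf])
y = np.array([0] * 300 + [1] * 300)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

scores = {}
for name, clf in [("SVM", SVC()), ("NB", GaussianNB()),
                  ("RF", RandomForestClassifier(random_state=0))]:
    scores[name] = clf.fit(Xtr, ytr).score(Xte, yte)

# GMM as a generative classifier: one mixture per class, predict by likelihood
gmms = [GaussianMixture(n_components=2, random_state=0).fit(Xtr[ytr == c])
        for c in (0, 1)]
ll = np.column_stack([g.score_samples(Xte) for g in gmms])
scores["GMM"] = float((ll.argmax(axis=1) == yte).mean())
```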


2020 ◽  
Vol 60 (2) ◽  
pp. 102-111
Author(s):  
Henrique Rodrigues ◽  
Rosa Ramos ◽  
Leoni Fagundes ◽  
Orlando Galego ◽  
David Navega ◽  
...  

Objective We aimed to evaluate whether the internal structures of the human ear have anatomical characteristics that are sufficiently distinctive to contribute to human identification and use in a forensic context. Materials and methods After data anonymisation, a dataset containing temporal bone CT scans of 100 subjects was processed by a radiologist who was not involved in the study. Four reference images were selected for each subject. Of the original sample, 10 examinations were used for visual comparison, case by case, against the dataset of 100 patients. This visual assessment was performed independently by four observers, who evaluated the anatomical agreement using a Likert scale (1–5). Inter-observer agreement, true positive rate, positive predictive value, true negative rate, negative predictive value, false positive rate, false negative rate and positive likelihood ratio (LR+) were evaluated. Results Inter-observer agreement obtained an overall Cohen’s Kappa = 99.59%. True positive rate, positive predictive value, true negative rate and negative predictive value were all 100%. Conclusion Visual assessment of the mastoid examinations was shown to be a robust and reliable approach to identify unique osseous features and contribute to human identification. The statistical analysis indicates that regardless of the examiner’s background and training, the approach has a high degree of accuracy.
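The inter-observer agreement reported here is Cohen’s kappa, which corrects raw agreement for chance. A minimal pure-Python sketch with hypothetical match/no-match decisions from two observers:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same cases (assumes the raters are
    not in trivially perfect chance agreement, so the denominator is nonzero)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# hypothetical identification decisions (1 = match, 0 = no match)
kappa = cohens_kappa([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```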


Author(s):  
Shantipriya Parida ◽  
Satchidananda Dehuri

Classification of brain states obtained through functional magnetic resonance imaging (fMRI) poses a serious challenge for the neuroimaging community: uncovering discriminating patterns of brain state activity that define independent thought processes. The challenge arises from the large number of voxels in a typical fMRI scan, which presents the classifier with a massive feature set coupled with relatively few training samples. One of the most popular research topics in recent years is the application of machine learning algorithms to mental state classification, decoding brain activation, and finding the variable of interest from fMRI data. In a classification scenario, different algorithms have different biases; consequently, performance differs across datasets, and for a particular dataset the accuracy varies from classifier to classifier. To overcome the limitations of individual techniques, hybridization or fusion of these machine learning techniques has emerged in recent years, showing promising results and opening up new directions of research. This paper reviews the machine learning techniques used in cognitive classification, ranging from individual classifiers to ensemble and hybrid techniques, with a well-balanced treatment of their applications, performance, and limitations. It also discusses many open challenges for further research.


2021 ◽  
Author(s):  
Vladimir Fonov ◽  
Mahsa Dadar ◽  
D. Louis Collins ◽  

Linear registration to stereotaxic space is a common first step in many automated image-processing tools for the analysis of human brain MRI scans, and it is crucial for the success of subsequent image-processing steps. Several well-established algorithms are commonly used in the field of neuroimaging for this task, but none has a 100% success rate, so manual assessment of the registration is commonly used as part of quality control. To reduce the burden of this time-consuming step, we propose Deep Automated Registration QC (DARQ), a fully automatic quality control method based on deep learning that can replace the human rater and accurately perform quality control assessment for stereotaxic registration of T1w brain scans. In a recently published study from our group comparing linear registration methods, we used a database of 9325 MRI scans from several publicly available datasets and applied seven linear registration tools to them. In this study, the resulting images that were assessed and labeled by a human rater are used to train a deep neural network to detect cases where registration failed. We further validated the results on an independent dataset of patients with multiple sclerosis with manual QC labels available (n=1200). In terms of agreement with a manual rater, our automated QC method achieved 89% accuracy and an 85% true negative rate (equivalently, a 15% false positive rate) in detecting scans that should pass quality control in a balanced cross-validation experiment, and 96.1% accuracy and a 95.5% true negative rate (4.5% FPR) when evaluated on a balanced independent sample, similar to a manual QC rater (test-retest accuracy of 93%). The results show that DARQ is robust, fast, accurate, and generalizable in detecting failures in linear stereotaxic registration and can substantially reduce QC time (by a factor of 20 or more) when processing large datasets.


Author(s):  
Daniel Campbell ◽  
Corey Ray-Subramanian ◽  
Winifred Schultz-Krohn ◽  
Kristen M. Powers ◽  
Renee Watling ◽  
...  

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e13040-e13040
Author(s):  
Jose Manuel Jerez ◽  
Nuria Ribelles ◽  
Pablo Rodriguez-Brazzarola ◽  
Tamara Diaz Redondo ◽  
Begoña Jimenez Rodriguez ◽  
...  

e13040 Background: The treatment of luminal MBC has undergone a substantial change with the use of cyclin-dependent kinase 4/6 inhibitors (CDKIs). Nevertheless, there is no clearly defined subgroup of patients who do not initially respond to CDKIs and show EP. Methods: MBC ER+/HER2- patients who had received at least one line of treatment were eligible. The event of interest was disease progression within 6 months of first-line treatment according to the type of therapy administered. First-line treatments were categorized as chemotherapy (CT), hormonal therapy (HT), CT plus maintenance HT, and HT plus CDKIs. Free-text data from clinical visits registered in our Electronic Health Record were obtained up to the date of first treatment in order to generate a feature vector composed of the word frequencies for each visit of every patient. Six different machine learning algorithms were evaluated to predict the event of interest and to obtain the risk of EP for every type of therapy. Area under the ROC curve (AUC), true positive rate (TPR) and true negative rate (TNR) were assessed using 10-fold cross-validation. Results: 610 ER+/HER2- MBC patients treated between November 1991 and August 2019 were included. Median follow-up for metastatic disease was 28 months. 17426 clinical visits were analyzed (per patient: range 1-173; median 30). 119 patients received CT as first-line treatment, 311 HT, 117 CT plus maintenance HT and 63 HT plus CDKIs. There were 379 patients with disease progression, of whom 126 progressed within 6 months of first-line treatment (54 events with CT, 57 with HT, 4 with CT plus maintenance HT and 11 with HT plus CDKIs). The model that yielded the best results was the GLMBoost algorithm: AUC 0.72 (95%CI 0.67-0.77), TPR 70.85% (95%CI 70.63%-71.06%), TNR 66.27% (95%CI 66.08%-66.46%). Conclusions: Our model, based on unstructured data from real-world patients, predicts EP and establishes the risk for each of the different types of treatment for ER+/HER2- MBC. Additional validation is of course needed, but a tool with these characteristics could help select the best available treatment when that decision has to be made, avoiding therapies that are unlikely to be effective.
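The pipeline described (word-frequency vectors from visit text, then a boosted classifier scored with 10-fold cross-validated AUC) can be sketched with scikit-learn; GradientBoostingClassifier stands in for GLMBoost (an R/caret model), and the visit notes below are invented stand-ins:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score

# invented stand-in visit notes; the real input was free text from EHR visits
progression = ["rapid progression with new lesions and worsening pain",
               "new metastatic lesions, progression on imaging"] * 10
stable = ["stable disease, good response, no complaints",
          "no new lesions, treatment well tolerated"] * 10
y = np.array([1] * len(progression) + [0] * len(stable))

# word-frequency feature vector per visit, as described above
X = CountVectorizer().fit_transform(progression + stable).toarray()

# GradientBoostingClassifier as a stand-in for the study's GLMBoost model
clf = GradientBoostingClassifier(random_state=0)
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
```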


Author(s):  
RONG-CHANG CHEN ◽  
TUNG-SHOU CHEN ◽  
CHIH-CHIANG LIN

Recently, a new personalized model has been developed to prevent credit card fraud. This model is promising; however, some problems remain. Existing approaches cannot reliably identify credit card fraud from small datasets with skewed distributions. This paper proposes to address the problem using a binary support vector system (BSVS). The proposed BSVS is based on the support vectors in support vector machines (SVM), and a genetic algorithm (GA) is employed to select the support vectors. To obtain a high true negative rate, self-organizing mapping (SOM) is first employed to estimate the distribution of the input data. BSVS is then trained according to the input data distribution to obtain a high detection rate. Experimental results show that the proposed BSVS is effective, especially at achieving a high true negative rate.
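BSVS itself (GA-selected support vectors plus SOM density estimation) is bespoke, but the skewed-distribution problem it targets can be illustrated with a class-weighted SVM on synthetic imbalanced transaction data (everything below is an illustrative stand-in, not the authors' method):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# synthetic skewed transactions: 950 legitimate (label 0), 50 fraudulent (label 1)
legit = rng.normal(0.0, 1.0, size=(950, 4))
fraud = rng.normal(2.5, 1.0, size=(50, 4))
X, y = np.vstack([legit, fraud]), np.array([0] * 950 + [1] * 50)

# class_weight="balanced" is a simple stand-in for BSVS's handling of skew
clf = SVC(class_weight="balanced").fit(X, y)
pred = clf.predict(X)  # in-sample evaluation, for brevity only
tnr = float(((pred == 0) & (y == 0)).sum() / (y == 0).sum())
```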


2014 ◽  
Vol 11 (1) ◽  
pp. 175-188 ◽  
Author(s):  
Nemanja Macek ◽  
Milan Milosavljevic

The KDD Cup '99 dataset is commonly used for training and testing IDS machine learning algorithms. Some of its major downsides are the distribution and proportions of U2R and R2L instances, which represent the most dangerous attack types, as well as the existence of R2L attack instances identical to normal traffic. This increases the complexity of detecting these minority categories and causes problems when building a machine learning model capable of detecting these attacks with a sufficiently low false negative rate. This paper presents a new support vector machine based intrusion detection system that classifies unknown data instances according to both the feature values and weight factors that represent the importance of each feature to the classification. An increased detection rate and a significantly decreased false negative rate for the U2R and R2L categories, which have very few instances in the training set, have been empirically demonstrated.
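The idea of weighting features by their importance before an SVM can be sketched as follows (synthetic data; mutual information stands in for whatever weighting scheme the paper derives):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# synthetic stand-in for KDD-style records: 5 features, only the first two informative
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w = mutual_info_classif(X, y, random_state=0)  # per-feature importance weights
Xw = X * w  # emphasize informative features before training the SVM
clf = SVC().fit(Xw[:300], y[:300])
acc = clf.score(Xw[300:], y[300:])
```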

