Machine Learning and Survey-based Predictors of InfoSec Non-Compliance

2022 ◽  
Vol 13 (2) ◽  
pp. 1-20
Author(s):  
Byron Marshall ◽  
Michael Curry ◽  
Robert E. Crossler ◽  
John Correia

Survey items developed in behavioral Information Security (InfoSec) research should be practically useful in identifying individuals who are likely to create risk by failing to comply with InfoSec guidance. The literature shows that attitudes, beliefs, and perceptions drive compliance behavior and has influenced the creation of a multitude of training programs focused on improving one's InfoSec behaviors. While automated controls and directly observable technical indicators are generally preferred by InfoSec practitioners, difficult-to-monitor user actions can still compromise the effectiveness of automated controls. For example, despite prohibition, doubtful or skeptical employees often increase organizational risk by using the same password to authenticate to corporate and external services. Analysis of network traffic or device configurations is unlikely to provide evidence of these vulnerabilities, but responses to well-designed surveys might. Guided by the relatively new IPAM model, this study administered 96 survey items from the behavioral InfoSec literature, across three separate points in time, to 217 respondents. Using systematic feature selection techniques, manageable subsets of 29, 20, and 15 items were identified and tested as predictors of non-compliance with security policy. The feature selection process validates IPAM's innovation in using nuanced self-efficacy and planning items across multiple time frames. Prediction models were trained using several ML algorithms. Practically useful levels of prediction accuracy were achieved; for example, ensemble tree models identified 69% of the riskiest individuals within the top 25% of the sample. The findings indicate the usefulness of psychometric items from the behavioral InfoSec literature in guiding training programs and other cybersecurity control activities, and demonstrate that such items are promising as additional inputs to AI models that monitor networks for security events.
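The headline result above (ensemble tree models identifying 69% of the riskiest individuals within the top 25% of the sample) corresponds to a "top-quartile capture" evaluation. A minimal sketch of that metric, with illustrative scores and labels rather than the study's data:

```python
def top_quartile_capture(scores, labels):
    """Fraction of positive (non-compliant) cases ranked in the top 25% by score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    cutoff = max(1, len(ranked) // 4)          # size of the top quartile
    top = ranked[:cutoff]
    positives = sum(labels)
    if positives == 0:
        return 0.0
    return sum(lab for _, lab in top) / positives

# Toy example: 8 respondents, 3 actual non-compliers (label 1).
# The model's top quartile (2 slots) contains 2 of the 3 non-compliers.
rate = top_quartile_capture(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
    [1, 1, 0, 1, 0, 0, 0, 0],
)
```

A model is scored by how many true non-compliers it concentrates in that top quartile; random ranking would capture about 25% of them on average.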

Author(s):  
Liang Zhang ◽  
Jin Wen ◽  
Yimin Chen

An accurate building energy forecasting model is a key component of real-time and advanced control of building energy systems and of building-to-grid integration. With the fast deployment and advancement of building automation systems, data are collected by hundreds and sometimes thousands of sensors every few minutes in buildings, which provides great potential for data-driven building energy forecasting. To develop building energy forecasting models from a large number of potential inputs, feature selection is a critical procedure to ensure model accuracy and computational efficiency. Though the theory of feature selection is well developed in the statistics and machine learning fields, it is not well studied in the application of building energy modeling. In this paper, a feature selection framework proposed in an earlier study is examined using a real campus building in Philadelphia. This feature selection framework combines domain knowledge and statistical methods and is developed for short-term data-driven building energy forecasting. In this case study, the feasibility of using this feature selection framework in developing a whole-building energy forecasting model and a chiller energy forecasting model is studied. Results show that, for both whole-building and chiller energy forecasting applications, the model with a systematic feature selection process presents better performance (in terms of cross-validation error of the forecasted output) than other models, including a model with conventional inputs and a model that uses only a single feature selection technique.
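Candidate feature sets in a study like this are compared by the cross-validation error of the resulting forecast. A minimal sketch of that comparison loop follows; the interleaved fold scheme and the mean-predictor placeholder model are assumptions for illustration, not the paper's actual forecaster.

```python
import math

def kfold_cv_rmse(X, y, fit_predict, k=3):
    """k-fold cross-validation RMSE for a caller-supplied forecaster."""
    n = len(y)
    sq_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))          # simple interleaved folds
        train = [(X[i], y[i]) for i in range(n) if i not in test_idx]
        for i in sorted(test_idx):
            pred = fit_predict(train, X[i])
            sq_errors.append((pred - y[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def mean_model(train, x_new):
    """Placeholder forecaster: predict the mean of the training targets."""
    return sum(target for _, target in train) / len(train)
```

A real study would plug its forecasting model in place of `mean_model` and retain the candidate feature set with the lowest `kfold_cv_rmse`.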


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Li-Hsin Cheng ◽  
Te-Cheng Hsu ◽  
Che Lin

Abstract. Breast cancer is a heterogeneous disease. To guide proper treatment decisions for each patient, robust prognostic biomarkers, which allow reliable prognosis prediction, are necessary. Gene feature selection based on microarray data is an approach to discovering potential biomarkers systematically. However, standard pure-statistical feature selection approaches often fail to incorporate prior biological knowledge and select genes that lack biological insight. In addition, due to the high dimensionality and low sample size of microarray data, selecting robust gene features is an intrinsically challenging problem. We hence combined systems biology feature selection with ensemble learning in this study, aiming to select genes with biological insight and robust prognostic predictive power. Moreover, to capture breast cancer's complex molecular processes, we adopted a multi-gene approach to predict prognosis status using deep learning classifiers. We found that all ensemble approaches could improve feature selection robustness, wherein the hybrid ensemble approach led to the most robust result. Among all prognosis prediction models, the bimodal deep neural network (DNN) achieved the highest test performance, further verified by survival analysis. In summary, this study demonstrated the potential of combining ensemble learning and bimodal DNNs in guiding precision medicine.
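The core of ensemble feature selection as described above is to run a base selector several times (e.g. on resamples of the data) and keep only features chosen by a majority of runs. A minimal sketch, with illustrative gene names and a hypothetical `min_votes` threshold rather than the paper's actual selectors:

```python
from collections import Counter

def ensemble_select(selections, min_votes):
    """Keep features chosen by at least min_votes of the independent selector runs.

    selections: list of feature sets, one per selector run.
    """
    votes = Counter(feature for sel in selections for feature in sel)
    return {feature for feature, v in votes.items() if v >= min_votes}

# Three illustrative selector runs on resampled data:
runs = [
    {"BRCA1", "ESR1", "TP53"},
    {"BRCA1", "ESR1", "MKI67"},
    {"BRCA1", "TP53", "ERBB2"},
]
stable = ensemble_select(runs, min_votes=2)
# BRCA1 (3 votes), ESR1 and TP53 (2 votes each) survive; singletons are dropped.
```

Features that survive this vote are, by construction, less sensitive to sampling noise, which is the robustness property the study measures.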


2013 ◽  
Vol 56 (3) ◽  
pp. 729-756 ◽  
Author(s):  
JEREMY NUTTALL

ABSTRACT. Observing the increasing, yet still partial exploration of pluralism, complexity and multiplicity in recent Labour party historiography, this article pursues a pluralist approach to Labour on two central, related themes of its middle-century evolution. First, it probes the plurality of Labour's different conceptions of time, specifically how it lived with the ambiguity of simultaneously viewing social progress as both immediate and rapidly achievable, yet also long term and strewn with constraints. This co-existence of multiple time-frames highlights the party's uncertainty and ideological multi-dimensionality, especially in its focus both on relatively rapid economic or structural transformation, and on much more slow-moving cultural, ethical, and educational change. It also complicates neat characterizations of particular phases in the party's history, challenging straightforwardly declinist views of the post-1945–51 period. Secondly, time connects to Labour's view of the people. Whilst historians have debated between positive and negative perceptions of the people, here the plural, split mind of Labour about the progressive potential of the citizenry is stressed, one closely intertwined with its multiple outlook on how long socialism would take. Contrasts are also suggested between the time-frames and expectations under which Labour and the Conservatives operated.


Author(s):  
G. T. Alckmin ◽  
L. Kooistra ◽  
A. Lucieer ◽  
R. Rawnsley

Abstract. Vegetation indices (VIs) have been extensively employed as features for dry matter (DM) estimation. During the past five decades, more than a hundred vegetation indices have been proposed. Inevitably, the selection of the optimal index or subset of indices is neither trivial nor obvious. This study, performed on a year-round observation of perennial ryegrass (n = 900), indicates that for this response variable (i.e. kg.DM.ha⁻¹), more than 80% of indices present a high degree of collinearity (correlation > |0.8|). Additionally, the absence of an established workflow for feature selection and modelling is a handicap when trying to establish meaningful relations between spectral data and biophysical/biochemical features. Within this case study, an unsupervised and supervised filtering process is applied to an initial dataset of 97 VIs. This research analyses the effects of the proposed filtering and feature selection process on the overall stability of the final models. Consequently, this analysis provides a straightforward framework to filter and select VIs. This approach was able to provide a reduced feature set for a robust model and to quantify trade-offs between the optimal model (i.e. lowest root mean square error, RMSE = 412.27 kg.DM.ha⁻¹) and tolerable models (with a smaller number of features, 4 VIs, and within 10% of the lowest RMSE).
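The unsupervised step the abstract describes, dropping any index whose absolute correlation with an already-kept index exceeds |0.8|, can be sketched as a greedy collinearity filter. The VI names and values below are illustrative, not the study's 97-index dataset:

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def collinearity_filter(features, threshold=0.8):
    """Greedily keep a feature only if |r| <= threshold against every kept feature."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

vis = {
    "NDVI":  [0.10, 0.20, 0.30, 0.40],
    "GNDVI": [0.12, 0.21, 0.33, 0.41],   # near-duplicate of NDVI, dropped
    "SR":    [0.30, 0.10, 0.40, 0.20],   # uncorrelated with NDVI, kept
}
kept = collinearity_filter(vis)
```

Because the filter is greedy, the result depends on the order in which indices are considered; a full workflow would follow it with the supervised selection step the study describes.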


2014 ◽  
Vol 52 ◽  
Author(s):  
Ralf C. Staudemeyer ◽  
Christian W. Omlin

This work presents a data preprocessing and feature selection framework to support data mining and network security experts in selecting a minimal feature set from intrusion detection data. This process is supported by detailed visualisation and examination of class distributions. Distribution histograms, scatter plots, and information gain are presented as supportive feature reduction tools. The feature reduction process applied is based on decision tree pruning and backward elimination. This paper starts with an analysis of the KDD Cup '99 datasets and their potential for feature reduction. The dataset consists of connection records with 41 features whose relevance for intrusion detection is not clear. All traffic is classified either as `normal' or into one of four attack types: denial-of-service, network probe, remote-to-local, or user-to-root. Using our custom feature selection process, we show how we can significantly reduce the number of features in the dataset to a few salient ones. We conclude by presenting minimal sets of 4–8 salient features for two-class and multi-class categorisation for detecting intrusions, as well as for the detection of individual attack classes; the performance using a static classifier compares favourably to the performance using all available features. The suggested process is general in nature and can be applied to any similar dataset.
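Information gain, one of the feature reduction tools named above, scores a feature by how much knowing its value reduces the entropy of the class label. A minimal sketch; the tiny normal-vs-DoS example is illustrative, not KDD Cup '99 data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Entropy reduction achieved by splitting the labels on this feature."""
    gain = entropy(labels)
    n = len(labels)
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

labels = ["normal", "normal", "dos", "dos"]
perfect = info_gain([0, 0, 1, 1], labels)   # splits the classes cleanly: 1 bit
useless = info_gain([0, 1, 0, 1], labels)   # each value mixes both classes: 0 bits
```

Ranking the 41 connection-record features by such a score, alongside histograms and scatter plots, is how low-relevance features are identified before the pruning and backward-elimination steps.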

