Predicting and Analysis of Phishing Attacks and Breaches In E-Commerce Websites

Author(s):  
N. Ram Mohan ◽  
N. Praveen Kumar

Analyzing cyber incident data sets is an important method for deepening our understanding of the evolution of the threat landscape. This is a relatively new research topic, and many studies remain to be done. In this paper, we report a statistical analysis of a breach incident data set covering 12 years (2005–2017) of cyber hacking activities, including malware attacks. We show that, in contrast to the findings reported in the literature, both hacking breach incident inter-arrival times and breach sizes should be modeled by stochastic processes rather than by distributions, because they exhibit autocorrelations. We then propose particular stochastic process models to fit, respectively, the inter-arrival times and the breach sizes, and we show that these models can predict both quantities. To gain deeper insight into the evolution of hacking breach incidents, we conduct both qualitative and quantitative trend analyses on the data set. We draw a set of cyber security insights, including that the threat of cyber hacks is indeed getting worse in terms of frequency, but not in terms of the magnitude of the damage.
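
To make the modelling idea concrete, the following is a minimal Python sketch, using synthetic placeholder data rather than the paper's breach data set, of checking inter-arrival times for autocorrelation and fitting a simple ARMA-type stochastic process with statsmodels; the model order and all parameter values are illustrative assumptions, not the authors' choices.

```python
# Minimal sketch: test inter-arrival times for autocorrelation and fit a simple
# stochastic (ARMA-type) model. The series below is synthetic, not the breach data set.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 300
noise = rng.normal(0.0, 1.0, n)
x = np.empty(n)
x[0] = noise[0]
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + noise[t]          # AR(1) latent structure
inter_arrival = np.exp(1.5 + 0.4 * x)          # positive, skewed "days between incidents"

# If the series were i.i.d., lag-1 autocorrelation would be near zero; a clearly
# non-zero value motivates a stochastic-process model rather than a plain distribution.
lag1 = np.corrcoef(inter_arrival[:-1], inter_arrival[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.2f}")

# Fit an ARMA(1,1)-style model (an assumption, not the paper's exact model) and
# produce short-horizon predictions of the next inter-arrival times.
fit = ARIMA(inter_arrival, order=(1, 0, 1)).fit()
print("next 3 predicted inter-arrival times (days):", np.round(fit.forecast(steps=3), 1))
```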

2007 ◽  
Vol 73 ◽  
pp. 169-190 ◽  
Author(s):  
Mandy Jay ◽  
Michael P. Richards

This paper presents the results of new research into British Iron Age diet. Specifically, it summarises the existing evidence and compares this with new evidence obtained from stable isotope analysis. The isotope data come from both humans and animals from ten British middle Iron Age sites at four locations in East Yorkshire, East Lothian, Hampshire, and Cornwall. These represent the only significant data set of comparative humans (n = 138) and animals (n = 212) for this period currently available for the UK. They are discussed here alongside other evidence for diet during the middle Iron Age in Britain. In particular, the question of whether fish, or other aquatic foods, were a major dietary resource during this period is examined. The isotopic data suggest similar dietary protein consumption patterns across the groups, both within local populations and between them, although outliers do exist which may indicate mobile individuals moving into the sites. The diet generally includes a high level of animal protein, with little indication of the use of marine resources at any isotopically distinguishable level, even when the sites are situated directly on the coast. The nitrogen isotopic values also show absolute variation across these locations that is indicative of environmental background differences rather than differential consumption patterns; this is discussed in the context of the difficulty of interpreting isotopic data without a complete understanding of the ‘baseline’ values for any particular time and place. This reinforces the need for significant numbers of contemporaneous animals from the same locations to be analysed when interpreting human data sets.
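
As a sketch of the baseline reasoning described above, the short Python example below, with entirely made-up δ15N values and hypothetical column names, expresses each human nitrogen isotope value as an offset from the mean of contemporaneous animals at the same location, so that locations with different environmental baselines can still be compared.

```python
# A minimal, illustrative sketch with made-up d15N values: express each human value as an
# offset from the mean of contemporaneous animals at the same location ("baseline").
import pandas as pd

samples = pd.DataFrame({
    "location": ["East Yorkshire"] * 4 + ["Hampshire"] * 4,
    "kind": ["human", "human", "animal", "animal"] * 2,
    "d15N": [9.8, 10.1, 5.9, 6.2, 11.0, 11.4, 7.3, 7.6],   # per mil, hypothetical
})

# Per-location animal baseline (this is why enough contemporaneous animals are needed).
baseline = (samples[samples["kind"] == "animal"]
            .groupby("location", as_index=False)["d15N"].mean()
            .rename(columns={"d15N": "animal_baseline"}))

humans = samples[samples["kind"] == "human"].merge(baseline, on="location")
humans["offset_from_baseline"] = humans["d15N"] - humans["animal_baseline"]

# Similar offsets at both locations would suggest similar dietary protein sources, even
# though the absolute values differ because of environmental background differences.
print(humans[["location", "d15N", "offset_from_baseline"]])
```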


2016 ◽  
Vol 12 (2) ◽  
pp. 182-203 ◽  
Author(s):  
Joke H. van Velzen

This mixed methods study had two purposes: to investigate (a) the realistic meaning of awareness and understanding as the underlying constructs of general knowledge of the learning process and (b) a procedure for data consolidation. The participants were 11th-grade high school and first-year university students. Integrated data collection and data transformation yielded positive but small correlations between awareness and understanding. A comparison of the newly created combined and integrated data sets showed that the integrated data set produced the expected statistically significant outcome, which was in line with the participants’ developmental difference. This study can contribute to mixed methods research because it proposes a procedure for data consolidation and a new research design.


2014 ◽  
Vol 52 (4) ◽  
pp. 737-754 ◽  
Author(s):  
Margit Raich ◽  
Julia Müller ◽  
Dagmar Abfalter

Purpose – The purpose of this paper is to provide insightful evidence of phenomena in organization and management theory. Textual data sets consist of two different elements, namely qualitative and quantitative aspects. Researchers often combine methods to harness both aspects. However, they frequently do this in a comparative, convergent, or sequential way. Design/methodology/approach – The paper illustrates and discusses a hybrid textual data analysis approach employing the qualitative software application GABEK-WinRelan in a case study of an Austrian retail bank. Findings – The paper argues that a hybrid analysis method, fully intertwining qualitative and quantitative analysis simultaneously on the same textual data set, can deliver new insight into more facets of a data set. Originality/value – A hybrid approach is not a universally applicable solution to approaching research and management problems. Rather, this paper aims at triggering and intensifying scientific discussion about stronger integration of qualitative and quantitative data and analysis methods in management research.


Author(s):  
James N. Mihell ◽  
Cameron Rout

Proponents of new pipeline projects are often asked by regulators to provide estimates of risk and reliability for their proposed pipeline. On existing pipelines, the availability of operating and assessment data is generally considered to be essential to the task of performing an accurate and defensible risk or reliability assessment. For proposed or new pipelines, the absence of these data presents a significant challenge to those performing the analysis. Reliance on industry incident data presents problems, since the vast majority of loss-of-containment incidents relate to older pipelines in which the design, routing criteria, material properties, material manufacturing processes, and early operating practices differ significantly from those that are characteristic of modern pipelines. As a consequence, much of the available failure incident data does not accurately reflect the threats, or the magnitudes of the threats, that are associated with modern pipelines. In order to address this problem, ‘adjustment factors’ are often applied against incident data to try to account for threat differences between the source data and the intended application. The selection of these adjustment factors can often be quite subjective and open to judgment, however, and therefore difficult to justify. With the rapidly growing practice of regular in-line inspection (ILI) on transmission pipelines, an extensive repository of ILI data has been accumulated, much of it relating to modern pipelines. Through the judicious selection of source data, ILI data sets can be mined so that an analogue data set can be created that constitutes a reasonable representation of the attributes of reliability of a specific new pipeline of interest. Key reliability properties, such as tool error distribution, feature incidence rate, feature size distribution, and apparent feature growth rate distribution, can be derived from such analogue data. By applying these reliability properties in an analysis along with known pipeline design and material properties and their associated distributions, and by taking into consideration planned inspection intervals, a reliability basis can be derived for estimating pipeline risk and reliability. Estimates of risk and reliability that are derived in this manner employ methodologies that are repeatable, defensible, transparent, and free of subjectivity. This paper outlines an approach for completing risk and reliability estimates on new pipelines, and presents the results of some sample calculations. The reliability estimates illustrated are based on an approach whereby corrosion feature size and growth rates are obtained from analogue ILI data sets and treated as random variables. In that regard, the estimates constitute the probability of exceeding a limit state that approximates the condition for failure.
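
As an illustration of the kind of reliability calculation described above, the following Python sketch runs a simple Monte Carlo estimate of the probability of exceeding a corrosion limit state. All distributions, parameter values, and the 80%-of-wall limit state are hypothetical assumptions standing in for analogue ILI-derived inputs, not the authors' actual method or data.

```python
# A minimal Monte Carlo sketch of a corrosion limit-state exceedance estimate.
# All distributions and parameter values are hypothetical assumptions standing in for
# inputs that would be derived from an analogue ILI data set.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                  # Monte Carlo trials
wall = 7.1                   # nominal wall thickness, mm (assumed)
interval = 7.0               # planned re-inspection interval, years (assumed)

# Reported feature depths plus an ILI tool sizing error (both assumed distributions).
reported_depth = rng.weibull(1.5, n) * 0.10 * wall        # mm
tool_error = rng.normal(0.0, 0.08 * wall, n)              # mm, ~8% of wall, 1 s.d.
true_depth = np.clip(reported_depth + tool_error, 0.0, wall)

# Assumed corrosion growth-rate distribution (mm/year).
growth_rate = rng.lognormal(mean=np.log(0.2), sigma=0.5, size=n)

# Simple limit state: depth at the end of the interval exceeds 80% of wall thickness.
depth_at_interval = true_depth + growth_rate * interval
prob_exceedance = np.mean(depth_at_interval > 0.8 * wall)
print(f"estimated probability of exceeding the limit state: {prob_exceedance:.1e}")
```

Treating feature size and growth rate as random variables in this way is what makes the estimate repeatable: the same analogue inputs always yield the same exceedance probability, without subjective adjustment factors.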


1988 ◽  
Vol 110 (2) ◽  
pp. 172-179 ◽  
Author(s):  
H. El-Tahan ◽  
S. Venkatesh ◽  
M. El-Tahan

This paper describes the evaluation of a model for predicting the drift of iceberg ensembles. The model was developed in preparation for providing an iceberg forecasting service off the Canadian east coast north of about 45°N. It was envisaged that 1–5 day forecasts of iceberg ensemble drift would be available. Following a critical examination of all available data, 10 data sets containing up to 404 icebergs in the Grand Banks area off Newfoundland were selected for detailed study. The winds measured in the vicinity of the study area, as well as the detailed current system developed by the International Ice Patrol, were used as inputs to the model. A discussion of the accuracy and limitations of the input data is presented. Qualitative and quantitative criteria were used to evaluate model performance. Applying these criteria to the results of the computer simulations, it is shown that the model provides good predictions; the degree of predictive success varied from one data set to another. The study demonstrated the validity of the assumption of random positioning for icebergs within a grid block, especially for ensembles with large numbers of icebergs. It was found that an “average” iceberg size can be used to represent all icebergs. The study also showed that, in order to achieve improved results, it will be necessary to account for deterioration (the complete melting of icebergs), especially during the summer months.
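
The sketch below illustrates, in Python, the general kind of ensemble drift calculation discussed above: icebergs seeded at random positions within a grid block are advected by an assumed constant current plus a wind-drift fraction. The forcing values, the wind factor, and the time stepping are placeholder assumptions, not the operational model's inputs.

```python
# A minimal kinematic sketch of ensemble iceberg drift: constant current plus a
# wind-drift fraction, with random initial positions inside one grid block.
# Forcing values, the wind factor, and the time step are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_icebergs = 100
wind_factor = 0.02                    # assumed fraction of wind speed added to drift
dt = 6 * 3600.0                       # 6-hour step, in seconds
steps = 20                            # 5-day forecast

current = np.array([0.05, -0.10])     # m/s, eastward / northward (assumed constant)
wind = np.array([8.0, 2.0])           # m/s (assumed constant)

# Random seeding within a 20 km x 20 km grid block, per the study's positioning assumption.
pos = rng.uniform(0.0, 20_000.0, size=(n_icebergs, 2))
start = pos.copy()

for _ in range(steps):
    drift = current + wind_factor * wind      # same drift velocity for every iceberg here
    pos = pos + drift * dt                    # simple forward (Euler) step

mean_disp_km = (pos - start).mean(axis=0) / 1000.0
print("mean 5-day displacement (km east, km north):", np.round(mean_disp_km, 1))
```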


2010 ◽  
Vol 18 (4) ◽  
pp. 499-505 ◽  
Author(s):  
Nathaniel Beck

The issue of how qualitative and quantitative information can be used together is critical. Brady, Collier, and Seawright (BCS) have argued that “causal process observations” can be adjoined to “data set observations.” This implies that qualitative methods can be used to add information to quantitative data sets. In a symposium in Political Analysis, I argued that such qualitative information cannot be adjoined in any meaningful way to quantitative data sets. In that symposium, the original authors offered several defenses, but, in the end, BCS can be seen as recommending good, but hopefully standard, research design practices that are normally thought of as central in the quantitative arena. It is good that BCS remind us that no amount of fancy statistics can save a bad research design.


Author(s):  
Rahul Yadav ◽  
Phalguni Pathak ◽  
Saumya Saraswat

In recent years, deep learning frameworks have been applied in various domains, including malware detection software, self-driving cars, and identity recognition cameras, and have shown promising performance; at the same time, adversarial attacks have become a crucial security threat to many deep learning applications. Deep learning techniques are now a core part of several cyber security applications such as intrusion detection, Android malware detection, spam and malware classification, binary analysis, and phishing detection. One of the major research challenges in this field is the lack of a comprehensive data set that reflects contemporary network traffic scenarios, a broad range of low-footprint intrusions, and in-depth structured information about the network traffic. Many of the benchmark data sets used to evaluate network intrusion detection systems were developed a decade ago. In this paper, we provide a focused literature survey of data sets used for network-based intrusion detection and characterize in detail the underlying packet- and flow-based network data used for intrusion detection in cyber security. Because data sets play a vital role in intrusion detection, we describe the available cyber security data sets and provide a categorization of them.
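
To show the workflow these benchmark data sets are meant to support, here is a minimal, hypothetical Python example that trains a classifier on labelled flow features; the file name network_flows.csv and the "label" column are assumptions for illustration, not a reference to any specific benchmark.

```python
# A minimal, hypothetical workflow for a flow-based intrusion detection data set:
# the file name and the "label" column are placeholders, not a specific benchmark.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

flows = pd.read_csv("network_flows.csv")          # hypothetical labelled flow features
X = flows.drop(columns=["label"])
y = flows["label"]                                 # e.g. normal vs. attack

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```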


Geophysics ◽  
2009 ◽  
Vol 74 (6) ◽  
pp. WCB71-WCB79 ◽  
Author(s):  
Stephan Husen ◽  
Tobias Diehl ◽  
Edi Kissling

Despite the increase in quality and number of seismic stations in many parts of the world, accurate timing of individual arrival times remains crucial for many tomographic applications. To achieve a data set of high quality, arrival times need to be picked with high accuracy, including a proper assessment of the uncertainty of timing and phase identification, and a high level of consistency. We have investigated the effects of data quantity and quality on the solution quality in local earthquake tomography. We have compared tomographic results obtained with synthetic and real data of two very different data sets. The first data set consisted of a large set of arrival times of low precision and unknown accuracy taken from the International Seismological Centre (ISC) Bulletin for the greater Alpine region. The second high-quality data set for the same region was seven times smaller and was obtained by automated quality-weighted repicking. During a first series of inversions, synthetic data resembling the two data sets were inverted with the same amount of Gaussian distributed noise added. Subsequently, during a second series of inversions, the noise level was increased successively for ISC data to study the effect of larger Gaussian distributed error on the solution quality. Finally, the real data for both data sets were inverted. These investigations showed that, for Gaussian distributed error, a smaller data set of high quality could achieve a similar or better solution quality than a data set seven times larger but about four times lower in quality. Our results further suggest that the quality of the ISC Bulletin is degraded significantly by inconsistencies, strongly limiting the use of this large data set for local earthquake tomography studies.
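
The quality-versus-quantity trade-off described above can be illustrated with a toy linear inverse problem in Python; the sketch below (a synthetic operator and assumed noise levels, not the authors' tomography code or data) compares a large noisy data set with one seven times smaller but four times less noisy.

```python
# A toy linear inverse problem illustrating the quality-vs-quantity trade-off:
# compare a large, noisy data set with one seven times smaller but four times cleaner.
# This is a synthetic stand-in, not the tomography code or data used in the study.
import numpy as np

rng = np.random.default_rng(3)
m_true = rng.normal(0.0, 1.0, 50)                          # "true" model parameters

def relative_model_error(n_obs, noise_std):
    G = rng.normal(0.0, 1.0, (n_obs, m_true.size))         # toy forward operator
    d = G @ m_true + rng.normal(0.0, noise_std, n_obs)     # noisy observations
    m_est, *_ = np.linalg.lstsq(G, d, rcond=None)          # least-squares "inversion"
    return np.linalg.norm(m_est - m_true) / np.linalg.norm(m_true)

print("large / noisy data set :", round(relative_model_error(n_obs=1400, noise_std=0.8), 3))
print("small / clean data set :", round(relative_model_error(n_obs=200, noise_std=0.2), 3))
```

Under Gaussian-distributed noise, the smaller but cleaner data set recovers the model parameters at least as well, which mirrors the paper's synthetic-test result.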


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser, non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform became apparent varied between 17 and 26, or the cycle was not found at all in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
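
A minimal Python sketch of the ACF diagnostic described above follows; the yearly counts are synthetic and the alternation strength is an assumption, chosen only to reproduce the qualitative signature of a damped 2-year cycle (a significant negative lag-1 value followed by a weaker positive lag-2 value).

```python
# Synthetic yearly counts with an assumed alternation between good and poor years,
# used only to show the ACF signature described above.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(4)
years = 39
counts = np.empty(years)
counts[0] = 100.0
for t in range(1, years):
    # A good year tends to be followed by a poor one (damped 2-year cycle).
    counts[t] = 100.0 - 0.6 * (counts[t - 1] - 100.0) + rng.normal(0.0, 15.0)

r, conf = acf(counts, nlags=5, alpha=0.05, fft=True)
for lag in (1, 2):
    print(f"lag {lag}: r = {r[lag]:+.2f}, 95% CI = [{conf[lag][0]:+.2f}, {conf[lag][1]:+.2f}]")
```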


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, eight variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors, and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. First, for each of the three data sets, a CoMFA model with all CoMFA descriptors was created; then, by applying each variable selection method, a new CoMFA model was developed, so that nine CoMFA models were built per data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying five variable selection approaches, namely FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS, and SPA-jackknife, increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables, while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, whereas SRD-FFD and SRD-UVE-PLS run in a few seconds. In addition, applying FFD, SRD-FFD, IVE-PLS, and SRD-UVE-PLS preserves CoMFA contour map information for both fields.
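
As a simplified illustration of why variable selection matters for models of this kind, the Python sketch below (synthetic descriptors and a generic PLS regression, not CoMFA fields or the software used in the paper) shows cross-validated predictive power dropping when many uninformative variables are added and recovering when only the informative ones are kept.

```python
# Synthetic descriptors and a generic PLS regression (not CoMFA fields): cross-validated
# predictive power (q2) drops when many uninformative variables are added and recovers
# when only the informative variables are kept.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_mol, n_informative, n_noise = 60, 20, 400

X_info = rng.normal(size=(n_mol, n_informative))
y = X_info @ rng.normal(size=n_informative) + rng.normal(scale=0.5, size=n_mol)
X_all = np.hstack([X_info, rng.normal(size=(n_mol, n_noise))])   # add uninformative columns

pls = PLSRegression(n_components=5)
q2_all = cross_val_score(pls, X_all, y, cv=5, scoring="r2").mean()
q2_selected = cross_val_score(pls, X_info, y, cv=5, scoring="r2").mean()
print(f"q2 with all variables:         {q2_all:.2f}")
print(f"q2 with informative variables: {q2_selected:.2f}")
```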

