Detection of Network Attacks using Machine Learning: A New Approach

Author(s):  
Avinash R. Sonule

Abstract: Cyber-attacks have become one of the most important security problems in today's world. With the increasing use of computing resources connected to the Internet, such as computers, mobiles, sensors, IoT devices, Big Data, web applications/servers, clouds and other computing resources, hackers and malicious users are planning new ways of network intrusion. Many techniques based on data mining and machine learning methods have been developed to detect these intrusions, and they have been applied to various IDS datasets. UNSW-NB15 is the latest dataset; it contains different modern attack types and a wide variety of real normal activities. In this paper, we compare the Naïve Bayes algorithm with a proposed probability-based supervised machine learning algorithm on a reduced UNSW-NB15 dataset.

Keywords: UNSW-NB15, Machine Learning, Naïve Bayes, All to Single (AS) features probability Algorithm

Network attacks have become one of the most important security problems in today's world. There has been a sharp increase in the use of computers, mobiles, sensors, IoT devices, Big Data, web applications/servers, clouds and other computing resources in networks. With this increase in network traffic, hackers and malicious users are planning new ways of network intrusion. Many techniques based on data mining and machine learning methods have been developed to detect these intrusions. Machine learning algorithms aim to detect anomalies using supervised and unsupervised approaches. Both kinds of detection techniques have been implemented on IDS datasets such as DARPA98, KDDCUP99, NSL-KDD, ISCX and ISOT. UNSW-NB15 is the latest dataset; it contains nine different modern attack types and a wide variety of real normal activities. In this paper, a detailed survey of various machine learning based techniques applied to the UNSW-NB15 dataset is carried out, suggesting that UNSW-NB15 is more complex than the other datasets and can be regarded as a new benchmark dataset for evaluating NIDSs.
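As a rough illustration of the Naïve Bayes baseline against which the proposed probability-based method is compared, the sketch below trains a Gaussian Naïve Bayes classifier on a reduced UNSW-NB15 split. The file paths and column names (a binary "label" column and an "attack_cat" column) are assumptions, not the authors' exact preprocessing.

```python
# Minimal sketch: Gaussian Naive Bayes baseline on a reduced UNSW-NB15 split.
# File paths and column names are assumptions, not the authors' preprocessing.
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

train = pd.read_csv("UNSW_NB15_training-set.csv")   # hypothetical path
test = pd.read_csv("UNSW_NB15_testing-set.csv")     # hypothetical path

# Keep only numeric features for the probabilistic baseline.
feature_cols = [c for c in train.columns
                if c not in ("id", "label", "attack_cat")
                and pd.api.types.is_numeric_dtype(train[c])]

model = GaussianNB()
model.fit(train[feature_cols], train["label"])

pred = model.predict(test[feature_cols])
print("Accuracy:", accuracy_score(test["label"], pred))
```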


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 444 ◽  
Author(s):  
Valerio Morfino ◽  
Salvatore Rampone

In the field of Internet of Things (IoT) infrastructures, attack and anomaly detection are rising concerns. With the increased use of IoT infrastructure in every domain, threats and attacks against these infrastructures are also growing proportionally. In this paper the performances of several machine learning algorithms in identifying cyber-attacks (namely SYN-DOS attacks) against IoT systems are compared, both in terms of application performance and in training/application times. We use supervised machine learning algorithms included in the MLlib library of Apache Spark, a fast and general engine for big data processing. We show the implementation details and the performance of those algorithms on public datasets using a training set of up to 2 million instances. We adopt a Cloud environment, emphasizing the importance of scalability and elasticity of use. Results show that all the Spark algorithms used achieve very good identification accuracy (>99%). Overall, one of them, Random Forest, achieves an accuracy of 1. We also report a very short training time (23.22 sec for Decision Tree with 2 million rows). The experiments also show a very low application time (0.13 sec for more than 600,000 instances with Random Forest) using Apache Spark in the Cloud. Furthermore, the explicit model generated by Random Forest is very easy to implement using high- or low-level programming languages. In light of the results obtained, both in terms of computation times and identification performance, a hybrid approach for the detection of SYN-DOS cyber-attacks on IoT devices is proposed: the application of an explicit Random Forest model, implemented directly on the IoT device, along with a second-level analysis (training) performed in the Cloud.
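A minimal Spark MLlib pipeline of the kind compared in this paper might look like the following sketch; the dataset path, column names, and tree count are assumptions rather than the authors' exact configuration.

```python
# Sketch: Spark MLlib Random Forest for binary SYN-DOS vs. normal traffic.
# The CSV path and the "label" column are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("syn-dos-detection").getOrCreate()
df = spark.read.csv("iot_traffic.csv", header=True, inferSchema=True)  # hypothetical file

# Assemble all non-label columns into a single feature vector.
feature_cols = [c for c in df.columns if c != "label"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
data = assembler.transform(df)

train, test = data.randomSplit([0.8, 0.2], seed=42)

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=50)
model = rf.fit(train)

auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print("Area under ROC:", auc)
```

Training in a Cloud-hosted Spark cluster and exporting the learned trees as explicit if/else rules for the IoT device is consistent with the hybrid scheme the authors propose.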


2021 ◽  
Author(s):  
Marc Raphael ◽  
Michael Robitaille ◽  
Jeff Byers ◽  
Joseph Christodoulides

Abstract: Machine learning algorithms hold the promise of greatly improving live cell image analysis by way of (1) analyzing far more imagery than can be achieved by more traditional manual approaches and (2) eliminating the subjective nature of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities - fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.
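The self-labelling idea can be sketched as follows: pixels with strong optical flow between consecutive frames are taken as cell, still pixels as background, and those provisional labels train a pixel classifier. The thresholds, per-pixel features, and classifier below are illustrative assumptions, not the authors' published pipeline.

```python
# Sketch: optical-flow self-labelling followed by a per-pixel classifier.
# Frame files, thresholds, and features are hypothetical.
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

frame0 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
frame1 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow between the two frames; motion magnitude per pixel.
flow = cv2.calcOpticalFlowFarneback(frame0, frame1, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
magnitude = np.linalg.norm(flow, axis=2)

# Self-labels: confident motion -> cell (1), near-zero motion -> background (0).
cell_mask = magnitude > np.percentile(magnitude, 95)
bg_mask = magnitude < np.percentile(magnitude, 50)

# Per-pixel feature vectors: raw intensity, smoothed intensity, local std. dev.
blur = cv2.GaussianBlur(frame1.astype(np.float32), (7, 7), 0)
local_var = cv2.GaussianBlur(frame1.astype(np.float32) ** 2, (7, 7), 0) - blur ** 2
features = np.stack([frame1.ravel(), blur.ravel(),
                     np.sqrt(np.clip(local_var, 0, None)).ravel()], axis=1)

labels = np.full(frame1.size, -1)
labels[cell_mask.ravel()] = 1
labels[bg_mask.ravel()] = 0
train_idx = labels >= 0  # only confidently self-labelled pixels are used for training

clf = RandomForestClassifier(n_estimators=50).fit(features[train_idx], labels[train_idx])
segmentation = clf.predict(features).reshape(frame1.shape)
```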



Author(s):  
David A. Huber ◽  
Steffen Lau ◽  
Martina Sonnweber ◽  
Moritz P. Günther ◽  
Johannes Kirchebner

Migrants diagnosed with schizophrenia are overrepresented in forensic-psychiatric clinics. A comprehensive characterization of this offender subgroup remains to be conducted. The present exploratory study aims at closing this research gap. In a sample of 370 inpatients with schizophrenia spectrum disorders who were detained in a Swiss forensic-psychiatric clinic, 653 different variables were analyzed to identify possible differences between native Europeans and non-European migrants. The exploratory data analysis was conducted by means of supervised machine learning. In order to minimize the multiple testing problem, the detected group differences were cross-validated by applying six different machine learning algorithms to the data set. Subsequently, the variables identified as most influential were used for machine learning algorithm building and evaluation. The combination of two childhood-related factors and three therapy-related factors made it possible to differentiate between native Europeans and non-European migrants with an accuracy of 74.5% and a predictive power of AUC = 0.75 (area under the curve). The AUC could not be enhanced by any of the investigated criminal history factors or psychiatric history factors. Overall, it was found that the migrant subgroup was quite similar to the rest of the offender patients with schizophrenia, which may help to reduce the stigmatization of migrants in forensic-psychiatric clinics. Some of the predictor variables identified may serve as starting points for studies aimed at developing crime prevention approaches in the community setting and risk management strategies tailored to subgroups of offenders with schizophrenia.
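The cross-validated comparison of several supervised classifiers by ROC-AUC, as described above, could be sketched as follows; the data loading, target column, and the particular set of six algorithms are assumptions, since the study's 653-variable data set is not public.

```python
# Sketch: score several supervised classifiers by cross-validated ROC-AUC.
# File name, target column, and the algorithm set are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("patients.csv")                 # hypothetical file
X = df.drop(columns=["migrant_status"])
y = df["migrant_status"]                         # 1 = non-European migrant, 0 = native European

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(),
    "gradient_boosting": GradientBoostingClassifier(),
    "svm": SVC(probability=True),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.2f}")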


Data science in healthcare is an innovative and promising field for industries implementing data science applications. Data analytics is a recent science used to explore medical data sets and discover disease. This work is an early attempt to identify disease with the help of large amounts of medical data. Using this data science methodology, users can identify their disease without the help of healthcare centres. Healthcare and data science are often linked through finances, as the industry attempts to reduce its expenses with the help of large amounts of data. Data science and medicine are developing rapidly, and it is important that they advance together. Healthcare information is very valuable to society. Heart disease has been increasing in day-to-day human life. Different factors in the human body are monitored in order to analyse and prevent heart disease. Classifying these factors using machine learning algorithms and predicting the disease is the major part of this work, which involves supervised machine learning algorithms such as SVM, Naïve Bayes, Decision Trees and Random Forest.
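A minimal sketch of the classification step described above, applying the four named supervised algorithms to a heart-disease data set, is given below; the file name and the UCI-style "target" column are assumptions, not the paper's exact data.

```python
# Sketch: compare SVM, Naive Bayes, Decision Tree, and Random Forest
# on a heart-disease data set with a binary "target" column (assumed layout).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("heart.csv")                    # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["target"]), df["target"], test_size=0.2, random_state=0)

models = {
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```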


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Deciding which machine learning algorithm is optimal is not an easy choice. To help future researchers, we describe in this paper the optimal among the best-performing algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.
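The kind of comparison described above can be sketched as follows: a synthetic classification data set is generated and several supervised learners are scored side by side. The data-generation parameters and the particular learners are illustrative assumptions, not the paper's setup.

```python
# Sketch: generate a synthetic data set and compare several supervised learners.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: 5000 samples, 20 features, 10 of them informative.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10, random_state=0)

for name, model in {
    "random_forest": RandomForestClassifier(),
    "gradient_boosting": GradientBoostingClassifier(),
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "logistic": LogisticRegression(max_iter=1000),
}.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```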


2020 ◽  
Vol 16 (6) ◽  
pp. 155014772091156 ◽  
Author(s):  
Asif Iqbal ◽  
Farman Ullah ◽  
Hafeez Anwar ◽  
Ata Ur Rehman ◽  
Kiran Shah ◽  
...  

We propose to perform wearable sensor-based human physical activity recognition. This is further extended to an Internet-of-Things (IoT) platform based on a web application that integrates wearable sensors, smartphones, and activity recognition. To this end, a smartphone collects the data from wearable sensors and sends it to the server for processing and recognition of the physical activity. We collect a novel data set of 13 physical activities performed both indoors and outdoors. The participants are of both genders, and their number per activity varies. During these activities, the wearable sensors measure various body parameters via accelerometers, gyroscopes, magnetometers, pressure, and temperature. These measurements and their statistical features are then represented in feature vectors that are used to train and test supervised machine learning algorithms (classifiers) for activity recognition. On the given data set, we evaluate a number of widely known classifiers, such as random forests, support vector machines, and many others, using the WEKA machine learning suite. Using the default settings of these classifiers in WEKA, we attain the highest overall classification accuracy of 90%. Consequently, such a recognition rate is encouraging, reliable, and effective enough to be used in the proposed platform.
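The feature step described above can be sketched as follows: fixed-length windows of raw sensor signals are summarized by simple statistics and fed to a classifier. The window representation, the chosen statistics, and the classifier are assumptions standing in for the WEKA-based setup used in the paper.

```python
# Sketch: statistical features from windowed wearable-sensor signals,
# classified with a Random Forest. Data files and window layout are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(window: np.ndarray) -> np.ndarray:
    """Statistical summary of one (samples x channels) sensor window."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           window.min(axis=0), window.max(axis=0)])

# Hypothetical raw data: (n_windows, samples_per_window, channels) plus activity labels.
raw = np.load("sensor_windows.npy")
labels = np.load("activity_labels.npy")

X = np.array([window_features(w) for w in raw])
print("Mean accuracy:", cross_val_score(RandomForestClassifier(), X, labels, cv=5).mean())
```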


2020 ◽  
Vol 1 (1) ◽  
pp. 94-116
Author(s):  
Dominik P. Heinisch ◽  
Johannes Koenig ◽  
Anne Otto

Only scarce information is available on doctorate recipients' career outcomes (BuWiN, 2013). With the current information base, graduate students cannot make an informed decision on whether to start a doctorate or not (Benderly, 2018; Blank et al., 2017). However, administrative labor market data, which could provide the necessary information, are incomplete in this respect. In this paper, we describe the record linkage of two data sets to close this information gap: data on doctorate recipients collected in the catalog of the German National Library (DNB), and the German labor market biographies (IEB) from the German Institute of Employment Research. We use a machine learning-based methodology, which (a) improves the record linkage of data sets without unique identifiers, and (b) evaluates the quality of the record linkage. The machine learning algorithms are trained on a synthetic training and evaluation data set. In an exemplary analysis, we compare the evolution of the employment status of female and male doctorate recipients in Germany.
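The supervised record-linkage step might be sketched as follows: candidate DNB-IEB record pairs are turned into similarity features, and a classifier trained on synthetic labelled pairs decides match versus non-match. The field names, similarity measure, and classifier are illustrative assumptions, not the authors' methodology.

```python
# Sketch: supervised record linkage on similarity features of candidate pairs.
# Field names and files are hypothetical; the real data are administrative records.
from difflib import SequenceMatcher
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def sim(a, b) -> float:
    """Simple string-similarity score between two field values."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()

def pair_features(row) -> list:
    return [sim(row["name_dnb"], row["name_ieb"]),
            sim(row["birth_year_dnb"], row["birth_year_ieb"]),
            sim(row["workplace_dnb"], row["workplace_ieb"])]

# Train on synthetic labelled pairs, then score real candidate pairs.
train_pairs = pd.read_csv("synthetic_pairs.csv")   # hypothetical synthetic training pairs
X_train = train_pairs.apply(pair_features, axis=1, result_type="expand")
clf = RandomForestClassifier().fit(X_train, train_pairs["is_match"])

candidates = pd.read_csv("candidate_pairs.csv")    # hypothetical DNB x IEB candidate pairs
X_cand = candidates.apply(pair_features, axis=1, result_type="expand")
candidates["match_probability"] = clf.predict_proba(X_cand)[:, 1]
```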


At present, networking technologies provide a better medium for people to communicate and exchange information on the internet. For this reason, the number of internet users has increased exponentially over the last ten years. The extensive use of network technology and the internet has also presented many security problems. Many intrusion detection techniques have been proposed in combination with the KDD99 and NSL-KDD datasets, but the available datasets have some limitations. Intrusion detection using machine learning algorithms makes the detection system more accurate and faster. Therefore, in this paper, a new hybrid machine learning approach combining feature selection and classification algorithms is presented. The model is examined with the UNSW-NB15 intrusion dataset. The proposed model achieves a better accuracy rate and improved attack detection while the false alarm rate is reduced. The model also successfully and accurately classifies rare cyber-attacks such as worms, backdoor, and shellcode.
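A hybrid pipeline in the spirit described above, combining a feature-selection stage with a classifier on UNSW-NB15, might be sketched as follows; the selector, classifier, and column names are assumptions, not the paper's exact combination.

```python
# Sketch: feature selection followed by classification on UNSW-NB15.
# Paths, column names, and the selector/classifier pair are hypothetical.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

train = pd.read_csv("UNSW_NB15_training-set.csv")   # hypothetical paths
test = pd.read_csv("UNSW_NB15_testing-set.csv")

numeric = [c for c in train.columns
           if c not in ("id", "label", "attack_cat")
           and pd.api.types.is_numeric_dtype(train[c])]

pipe = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=20)),   # keep the 20 most informative features
    ("clf", RandomForestClassifier(n_estimators=100)),
])
pipe.fit(train[numeric], train["attack_cat"])

# Per-class report shows how rare classes such as Worms, Backdoor, and Shellcode fare.
print(classification_report(test["attack_cat"], pipe.predict(test[numeric])))
```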

