Classification of Peer-to-Peer Traffic Using a Two-Stage Window-Based Classifier with Fast Decision Tree and IP Layer Attributes

Author(s):  
Bijan Raahemi ◽  
Ali Mumtaz

This paper presents a data mining approach to classifying Peer-to-Peer (P2P) traffic in IP networks, built on a two-stage architecture. In the first stage, traffic is filtered by standard port numbers and layer 4 port matching to label well-known P2P and NonP2P traffic. The labeled traffic produced in the first stage is then used to train a Fast Decision Tree (FDT) classifier. The remaining Unknown traffic is applied to the FDT model, which classifies it as P2P or NonP2P with high accuracy. The two-stage architecture thus classifies not only well-known P2P applications but also applications that use random or non-standard port numbers and cannot be classified otherwise. The authors captured Internet traffic at a gateway router, pre-processed the data, selected the most significant attributes, and prepared a training data set to which the new algorithm was applied. Finally, they built several models using combinations of attribute sets for different ratios of P2P to NonP2P traffic in the training data.
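The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the port lists, the flow attributes `mean_pkt_len` and `ttl`, and the sample flows are all hypothetical, and a one-level decision tree (a decision stump) stands in for the FDT classifier.

```python
from collections import Counter

# Hypothetical port lists; the abstract does not enumerate the ports used.
P2P_PORTS = {6881, 6346, 4662}
NONP2P_PORTS = {80, 443, 25, 53}

def stage1_label(flow):
    """Stage 1: label a flow by port matching, or mark it 'Unknown'."""
    ports = {flow["src_port"], flow["dst_port"]}
    if ports & P2P_PORTS:
        return "P2P"
    if ports & NONP2P_PORTS:
        return "NonP2P"
    return "Unknown"

def train_stump(rows, labels, features):
    """Fit a one-level decision tree (a stand-in for FDT, not FDT itself).

    Tries every (feature, threshold) pair and keeps the split with the
    fewest training errors.
    """
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_major = Counter(left).most_common(1)[0][0]
            r_major = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_major for y in left) + sum(y != r_major for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_major, r_major)
    if best is None:
        raise ValueError("no usable split found")
    return best[1:]

def stump_predict(stump, row):
    feature, threshold, l_major, r_major = stump
    return l_major if row[feature] <= threshold else r_major

def classify_all(flows, features):
    """Run stage 1, train on its labels, then classify the Unknown flows."""
    labels = [stage1_label(f) for f in flows]
    known = [(f, y) for f, y in zip(flows, labels) if y != "Unknown"]
    stump = train_stump([f for f, _ in known], [y for _, y in known], features)
    return [y if y != "Unknown" else stump_predict(stump, f)
            for f, y in zip(flows, labels)]

# Hypothetical flows: the last two use non-standard ports on both ends.
flows = [
    {"src_port": 51000, "dst_port": 6881, "mean_pkt_len": 1200, "ttl": 52},
    {"src_port": 6346, "dst_port": 40000, "mean_pkt_len": 1100, "ttl": 48},
    {"src_port": 52000, "dst_port": 80, "mean_pkt_len": 400, "ttl": 60},
    {"src_port": 53000, "dst_port": 443, "mean_pkt_len": 500, "ttl": 58},
    {"src_port": 50001, "dst_port": 50002, "mean_pkt_len": 1150, "ttl": 50},
    {"src_port": 50003, "dst_port": 50004, "mean_pkt_len": 450, "ttl": 61},
]
result = classify_all(flows, ["mean_pkt_len", "ttl"])
```

The point of the design is that stage 1 supplies training labels without manual annotation, so the learned model can handle flows that port matching alone leaves unlabeled.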

2010 ◽  
Vol 6 (3) ◽  
pp. 28-42 ◽  


2018 ◽  
Vol 7 (4.15) ◽  
pp. 421
Author(s):  
Erick Akhmad Fahmi Alfa’izy ◽  
Khairil Anam ◽  
Naidah Naing ◽  
Rosanita Tritias Utami ◽  
Nur Anim Jauhariyah ◽  
...  

This study designs an analysis system that predicts student graduation by comparing previous data with existing data, with the aim of overcoming errors in a college system. Data records that are already available are processed using the naïve Bayes algorithm. The research was conducted at Universitas Maarif Hasyim Latif, and its object is to analyze student data with the naïve Bayes algorithm to predict graduation. Records of previous Faculty of Law students serve as training data; all data were retrieved from records already available in the Directorate of Information Systems. The study finds that the naïve Bayes algorithm can be used to classify data in string or textual form, a conclusion based on the researchers' trials with previously worked example calculations. To evaluate the graduation classification, the naïve Bayes algorithm was tested by comparing a sample of training data against test data. The resulting accuracy is 77.78%.
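To illustrate how naïve Bayes handles string-valued attributes, here is a minimal sketch of a categorical naïve Bayes classifier with add-one (Laplace) smoothing. The attribute names, values, and labels are hypothetical and are not taken from the study's data; the smoothing choice is also an assumption.

```python
from collections import Counter, defaultdict
import math

def train_nb(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    vocab = defaultdict(set)              # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab

def predict_nb(model, row):
    """Return the class with the highest log posterior under the naive assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        logp = math.log(n_y / total)                      # log prior
        for i, v in enumerate(row):
            count = value_counts[(i, y)][v]
            # Add-one smoothing avoids zero probabilities for unseen values.
            logp += math.log((count + 1) / (n_y + len(vocab[i])))
        scores[y] = logp
    return max(scores, key=scores.get)

# Hypothetical records: (grade band, attendance) -> graduation outcome.
rows = [("high", "good"), ("high", "good"), ("medium", "good"),
        ("low", "poor"), ("low", "poor"), ("medium", "poor")]
labels = ["on_time", "on_time", "on_time", "late", "late", "late"]

model = train_nb(rows, labels)
prediction_a = predict_nb(model, ("high", "good"))
prediction_b = predict_nb(model, ("low", "poor"))
```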


2021 ◽  
Vol 7 (3) ◽  
pp. 53-60
Author(s):  
Rika Nursyahfitri ◽  
Alfanda Novebrian Maharadja ◽  
Riva Arsyad Farissa ◽  
Yuyun Umaidah

Classification is a technique that can be used for prediction, where the predicted value is a label. Drug classification aims to predict the appropriate type of drug for a patient from the available dataset. The data used in this study are patient medical records that describe disease symptoms but for which the type of medicine is not yet known. The dataset comes from kaggle.com and is presented in the form of a decision tree with a mathematical model. The study uses a data mining classification method, the decision tree, to find the relationship between a number of candidate variables and the classification target variable, dividing the data into 70% test data and 30% training data. The results are a set of rules and an accuracy rate of 96.36%, together with the recall and precision values of each type of drug obtained from a multiclass confusion matrix.
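As a rough sketch of how per-class precision and recall are read off a multiclass confusion matrix, the following computes them from lists of true and predicted labels. The drug names and predictions are hypothetical and do not reproduce the study's results.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and {class: (precision, recall)}."""
    pairs = Counter(zip(y_true, y_pred))          # confusion matrix as a sparse count
    classes = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in classes)   # column sum
        actual = sum(pairs[(c, p)] for p in classes)      # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[c] = (precision, recall)
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    return accuracy, metrics

# Hypothetical labels for three drug types.
y_true = ["drugA", "drugA", "drugB", "drugB", "drugC", "drugC"]
y_pred = ["drugA", "drugB", "drugB", "drugB", "drugC", "drugC"]

accuracy, metrics = per_class_metrics(y_true, y_pred)
```

In this made-up example one `drugA` record is predicted as `drugB`, which lowers recall for `drugA` and precision for `drugB` while leaving `drugC` unaffected.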


2020 ◽  
Vol 1 (1) ◽  
pp. 17
Author(s):  
Astia Weni Syaputri ◽  
Erno Irwandi ◽  
Mustakim Mustakim

Majors are important in determining student specialization. An error in assigning a student's major will affect that student's subsequent education. SMA Negeri 1 Kampar Timur offers two majors, Natural Sciences and Social Sciences. Majors are determined by reference to students' average grades from semester 3 to semester 5, covering Islamic religious education, Indonesian, Citizenship Education, English, Natural Sciences, Social Sciences, and Mathematics. Naive Bayes is an algorithm that can be used to classify the majors at SMA Negeri 1 Kampar Timur. For this classification, training data and test data are used at 70% and 30%, respectively. The classification is evaluated with a confusion matrix and produces a fairly high accuracy of 96.19%. With this accuracy, the Naive Bayes algorithm is well suited to determining students' majors at SMA Negeri 1 Kampar Timur.
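The 70%/30% evaluation protocol mentioned above might look like the sketch below. The student records are hypothetical, the fixed seed is an assumption made for repeatability, and a trivial comparison rule stands in for the trained Naive Bayes model so that the example stays short.

```python
import random

def split_70_30(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle indices with a fixed seed and split into training and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = [(rows[i], labels[i]) for i in train_idx]
    test = [(rows[i], labels[i]) for i in test_idx]
    return train, test

def accuracy(y_true, y_pred):
    """Share of test records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical records: (science average, social average) -> major.
rows = [(85, 70), (78, 82), (90, 65), (60, 88), (72, 75),
        (88, 80), (65, 79), (92, 71), (70, 90), (81, 77)]
labels = ["Natural Sciences" if sci > soc else "Social Sciences" for sci, soc in rows]

train, test = split_70_30(rows, labels)

# Placeholder rule standing in for a trained classifier.
predictions = ["Natural Sciences" if sci > soc else "Social Sciences"
               for (sci, soc), _ in test]
test_accuracy = accuracy([y for _, y in test], predictions)
```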


2021 ◽  
Author(s):  
Peng Cheng ◽  
James V. Krogmeier ◽  
Mark R. Bell ◽  
Joshua Li ◽  
Guangwei Yang

This research considers the detection, location, and classification of patches in concrete and asphalt-on-concrete pavements using data taken from ground penetrating radar (GPR) and the WayLink 3D Imaging System. In particular, the project seeks to develop a patching table for “inverted-T” patches. A number of deep neural net methods were investigated for patch detection from 3D elevation and image observation, but the results were inconclusive, partly because of a dearth of training data. Later, a method based on thresholding IRI values computed on a 12-foot window was used to localize pavement distress, particularly as seen by patch settling. This method was far more promising. In addition, algorithms were developed for segmentation of the GPR data and for classification of the ambient pavement and the locations and types of patches found in it. The results so far are promising but far from perfect, with a relatively high rate of false alarms. The two project parts were combined to produce a fused patching table. Several hundred miles of data were captured with the WayLink system to compare with a much more limited GPR dataset. The primary dataset was captured on I-74. A software application for MATLAB has been written to aid in automation of patch table creation.
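The window-thresholding idea can be sketched as below. This assumes the per-window IRI values have already been computed and that the 12-foot windows do not overlap; the threshold value and the sample readings are hypothetical, and the sketch is in Python rather than the project's MATLAB application.

```python
def localize_distress(iri_values, window_ft=12.0, threshold=170.0):
    """Merge consecutive above-threshold windows into (start_ft, end_ft) segments."""
    segments = []
    start = None
    for i, value in enumerate(iri_values):
        if value > threshold and start is None:
            start = i                                   # a distressed run begins
        elif value <= threshold and start is not None:
            segments.append((start * window_ft, i * window_ft))
            start = None                                # the run has ended
    if start is not None:                               # run extends to the last window
        segments.append((start * window_ft, len(iri_values) * window_ft))
    return segments

# Hypothetical per-window IRI readings along a lane.
iri = [90, 110, 250, 300, 120, 95, 400]
segments = localize_distress(iri)
```

Each returned pair gives the start and end distance of a candidate distress location, which could then be cross-checked against the GPR-based patch classification.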

