Classification of Peer-to-Peer Traffic Using A Two-Stage Window-Based Classifier With Fast Decision Tree and IP Layer Attributes

2010 ◽  
Vol 6 (3) ◽  
pp. 28-42 ◽  
Author(s):  
Bijan Raahemi ◽  
Ali Mumtaz

This paper presents a new approach using data mining techniques, in particular a two-stage architecture, for the classification of Peer-to-Peer (P2P) traffic in IP networks. In the first stage, the traffic is filtered using standard port numbers and layer-4 port matching to label well-known P2P and NonP2P traffic. The labeled traffic produced in the first stage is used to train a Fast Decision Tree (FDT) classifier with high accuracy. The unknown traffic is then applied to the FDT model, which classifies it into P2P and NonP2P with high accuracy. The two-stage architecture classifies not only well-known P2P applications, but also applications that use random or non-standard port numbers and cannot be classified otherwise. The authors captured Internet traffic at a gateway router, performed pre-processing on the data, selected the most significant attributes, and prepared a training data set to which the new algorithm was applied. Finally, the authors built several models using combinations of various attribute sets for different ratios of P2P to NonP2P traffic in the training data.
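
The two-stage idea maps naturally onto a short sketch: label flows by well-known ports first, then train a decision tree on the labeled flows and apply it to the remainder. The sketch below uses scikit-learn's standard DecisionTreeClassifier as a stand-in for the paper's FDT; the port lists, feature names, and input file are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the two-stage idea (not the authors' exact FDT implementation).
# Stage 1 labels flows by well-known ports; stage 2 trains a decision tree on the
# labeled flows and classifies the remaining "unknown" flows.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

P2P_PORTS = {4662, 4672, 6346, 6347, 6881}      # e.g. eDonkey, Gnutella, BitTorrent
NONP2P_PORTS = {25, 53, 80, 110, 143, 443}      # e.g. SMTP, DNS, HTTP(S), mail

def stage1_label(flow):
    """Port-based labeling of a single flow (layer-4 port matching)."""
    ports = {flow["src_port"], flow["dst_port"]}
    if ports & P2P_PORTS:
        return "P2P"
    if ports & NONP2P_PORTS:
        return "NonP2P"
    return "Unknown"

flows = pd.read_csv("flows.csv")                # IP-layer flow records (assumed file)
flows["label"] = flows.apply(stage1_label, axis=1)

features = ["pkt_size_avg", "inter_arrival_avg", "duration", "pkt_count"]
known = flows[flows["label"] != "Unknown"]
unknown = flows[flows["label"] == "Unknown"]

# Stage 2: train on the port-labeled flows, then classify the unknown traffic.
tree = DecisionTreeClassifier(max_depth=8)      # stand-in for the paper's FDT
tree.fit(known[features], known["label"])
unknown = unknown.assign(label=tree.predict(unknown[features]))
```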


2021 ◽  
Vol 7 (3) ◽  
pp. 53-60
Author(s):  
Rika Nursyahfitri ◽  
Alfanda Novebrian Maharadja ◽  
Riva Arsyad Farissa ◽  
Yuyun Umaidah

Classification is a technique that can be used for prediction, where the predicted value is a label. This drug classification study aims to predict the accurate type of drug for patients using the dataset that has been obtained. The data used in this study come from patients' medical records, which record the symptoms of the disease while the appropriate type of medicine is not yet known. The data set comes from kaggle.com and is presented in the form of a decision tree with a mathematical model. To complete this research, a classification method from data mining is used, namely the decision tree. The decision tree method is used to find the relationship between a number of candidate variables and the target classification variable, splitting the data into 70% testing data and 30% training data. The results obtained from this study are a set of rules, an accuracy rate of 96.36%, and the recall and precision values of each type of drug, computed from a multiclass confusion matrix.
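
A compact version of this workflow is easy to sketch with scikit-learn, assuming the well-known Kaggle drug dataset layout (Age, Sex, BP, Cholesterol, Na_to_K and a Drug label); the column names and file path are assumptions, not details from the paper:

```python
# Minimal sketch of the described workflow, assuming the common Kaggle "drug200"
# layout (Age, Sex, BP, Cholesterol, Na_to_K -> Drug). Column names and the
# file path are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("drug200.csv")
X = df.drop(columns=["Drug"])
y = df["Drug"]

# Encode the categorical symptom attributes as ordinal integers.
cat_cols = ["Sex", "BP", "Cholesterol"]
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.7, random_state=42)  # 30% training / 70% testing, as stated

tree = DecisionTreeClassifier().fit(X_train, y_train)
# Per-class precision and recall derived from the multiclass confusion matrix.
print(classification_report(y_test, tree.predict(X_test)))
```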


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ryoya Shiode ◽  
Mototaka Kabashima ◽  
Yuta Hiasa ◽  
Kunihiro Oka ◽  
Tsuyoshi Murase ◽  
...  

The purpose of the study was to develop a deep learning network for estimating and constructing highly accurate 3D bone models directly from actual X-ray images and to verify its accuracy. The data used were 173 computed tomography (CT) images and 105 actual X-ray images of healthy wrist joints. To compensate for the small size of the dataset, digitally reconstructed radiography (DRR) images generated from CT were used as training data instead of actual X-ray images. At test time, DRR-like images were generated from the actual X-ray images and fed to the network, making high-accuracy estimation of a 3D bone model from a small data set possible. The 3D shapes of the radius and ulna were estimated from actual X-ray images with accuracies of 1.05 ± 0.36 mm and 1.45 ± 0.41 mm, respectively.
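
The key trick, substituting DRR images for scarce real radiographs, can be illustrated with a minimal parallel-projection DRR generator. The HU-to-attenuation conversion, projection axis, and volume shape below are illustrative assumptions; the paper's actual DRR pipeline and network are more involved:

```python
# Minimal sketch of generating a DRR-like training image from a CT volume by
# parallel-ray projection (summed attenuation along one axis).
import numpy as np

def drr_from_ct(ct_hu: np.ndarray, axis: int = 1, mu_water: float = 0.02) -> np.ndarray:
    """Project a CT volume (Hounsfield units) into a 2D radiograph-like image."""
    mu = np.clip(mu_water * (1.0 + ct_hu / 1000.0), 0.0, None)  # HU -> attenuation (approx.)
    path = mu.sum(axis=axis)                    # line integrals (Beer-Lambert exponent)
    return (path - path.min()) / (np.ptp(path) + 1e-8)  # normalized to [0, 1]

ct = np.random.rand(128, 128, 128) * 2000 - 1000  # stand-in volume in HU
image = drr_from_ct(ct)                           # 2D DRR-like image (128 x 128)
```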


2014 ◽  
Vol 539 ◽  
pp. 181-184
Author(s):  
Wan Li Zuo ◽  
Zhi Yan Wang ◽  
Ning Ma ◽  
Hong Liang

Accurate classification of text is a basic premise for efficiently extracting various types of information on the Web and utilizing network resources properly. In this paper, a new text classification method is proposed. Consistency analysis is a type of iterative algorithm that trains different (weak) classifiers on the same training set and then gathers these classifiers to test the consistency degrees of the various classification methods on the same text, thereby capturing the knowledge of each type of classifier. It determines the weight of each sample according to whether that sample was classified accurately in each training round, as well as the accuracy of the last overall classification, and then sends the reweighted data set to the subordinate classifier for training. In the end, the classifiers obtained during training are integrated into the final decision classifier. The classifier with consistency analysis can eliminate some unnecessary training data characteristics and concentrate the weight on key training data. According to the experimental results, the average accuracy of this method is 91.0%, while the average recall is 88.1%.
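
The reweighting loop described, raising the weight of misclassified samples before training the next (subordinate) classifier and integrating the weak classifiers into a final decision classifier, closely resembles AdaBoost-style boosting. A minimal sketch in that style follows (labels assumed to be ±1); it is not the authors' exact consistency-analysis algorithm:

```python
# AdaBoost-style sample reweighting with decision stumps as weak classifiers.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, rounds=10):
    """Train weak classifiers on reweighted data; y must be +/-1 labels."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start with uniform sample weights
    models, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:                          # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        w *= np.exp(-alpha * y * pred)          # raise weight of misclassified samples
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def predict(models, alphas, X):
    """Final decision classifier: weighted vote of the weak classifiers."""
    votes = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(votes)
```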


2016 ◽  
Vol 42 (10) ◽  
pp. 980-998 ◽  
Author(s):  
Thanh Pham Thien Nguyen ◽  
Son Hong Nghiem

Purpose: The purpose of this paper is to examine the operational efficiency and the effects of market concentration and diversification on the efficiency of Chinese and Indian banks in the 1997-2011 period. Design/methodology/approach: This study employs the two-stage bootstrap procedure of Simar and Wilson (2007) to obtain valid inferences on the efficiency scores and the efficiency determinants. Findings: Using a data set for each country separately, the authors found that bias-corrected cost efficiency displays an upward trend in both Chinese and Indian banks. This trend is consistent with profit efficiency among Chinese banks, but the trend is unclear in Indian banks. Market concentration is negatively related to the cost and profit efficiencies of Chinese banks. In Indian banks, however, market concentration is positively associated with cost efficiency but unrelated to profit efficiency. In Chinese banks, diversification of revenue, earning assets, and non-lending earning assets is associated with increasing profit efficiency, but its effects on cost efficiency are not clear. In Indian banks, diversification of earning assets increases profit efficiency, while there are cost-efficiency losses from diversification of revenue and earning assets. Practical implications: Bank regulators and supervisors in China should consider establishing policies to reduce market concentration and encourage diversification of revenue, earning assets, and non-lending earning assets, while increasing concentration and diversification of earning assets should be encouraged in Indian banks. Originality/value: To the best of the authors' knowledge, this is the first study of Chinese and Indian banks to employ the double bootstrap procedure proposed by Simar and Wilson (2007), which addresses a known problem of two-stage data envelopment analysis (DEA) or SFA estimators in the efficiency literature: the efficiency scores obtained in the first stage are inter-dependent, violating the basic assumption of the regression analysis in the second stage.
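
Stage one of the Simar and Wilson (2007) procedure requires nonparametric efficiency scores. A minimal sketch of an input-oriented, constant-returns DEA score computed by linear programming is shown below; the bank inputs and outputs are illustrative stand-ins, and the bias-correcting bootstrap itself is omitted:

```python
# Input-oriented DEA efficiency score via linear programming:
# minimize theta s.t. a convex combination of peers dominates (theta*x_k, y_k).
import numpy as np
from scipy.optimize import linprog

def dea_score(X, Y, k):
    """Efficiency of decision-making unit k given inputs X (n x m) and outputs Y (n x s)."""
    n, m = X.shape
    _, s = Y.shape
    c = np.r_[1.0, np.zeros(n)]                      # variables: [theta, lambda_1..lambda_n]
    # Inputs:  sum_j lam_j * x_ij - theta * x_ik <= 0
    A_in = np.c_[-X[k].reshape(m, 1), X.T]
    # Outputs: -sum_j lam_j * y_rj <= -y_rk
    A_out = np.c_[np.zeros((s, 1)), -Y.T]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[k]],
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]

X = np.random.rand(30, 2) + 0.5   # e.g. labour, deposits (stand-in inputs)
Y = np.random.rand(30, 2) + 0.5   # e.g. loans, other income (stand-in outputs)
scores = [dea_score(X, Y, k) for k in range(30)]
```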


2005 ◽  
Vol 61 (5) ◽  
pp. 585-594 ◽  
Author(s):  
J. Pérez ◽  
K. Nolsøe ◽  
M. Kessler ◽  
L. García ◽  
E. Pérez ◽  
...  

Two methods for the classification of eight-membered rings based on a Bayesian analysis are presented. The two methods share the same probabilistic model for the measurement of torsion angles, but while the first method uses the canonical forms of cyclooctane and, given an empirical sequence of eight torsion angles, yields the probability that the associated structure corresponds to each of the ten canonical conformations, the second method does not assume previous knowledge of existing conformations and yields a clustering classification of a data set, allowing new conformations to be detected. Both methods have been tested using the conformational classification of Csp3 eight-membered rings described in the literature. The methods have also been employed to classify the solid-state conformation in Csp3 eight-membered rings using data retrieved from an updated version of the Cambridge Structural Database (CSD).
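
The first method's flavour can be sketched as follows: score an observed sequence of eight torsion angles against each canonical conformation with a per-angle circular likelihood and a uniform prior, then normalize to posterior probabilities. The von Mises likelihood, the concentration parameter, and the canonical torsion sequences below are illustrative assumptions, not the paper's fitted model:

```python
# Posterior over canonical cyclooctane conformations given eight torsion angles,
# assuming independent von Mises measurement noise per angle and a uniform prior.
import numpy as np
from scipy.stats import vonmises

CANONICAL = {                       # degrees; illustrative values only
    "crown":      [88, -88, 88, -88, 88, -88, 88, -88],
    "boat-chair": [65, -65, 102, -45, -45, 102, -65, 65],
}

def posterior(torsions_deg, kappa=8.0):
    """P(conformation | torsion angles) under a uniform prior over CANONICAL."""
    obs = np.radians(torsions_deg)
    logliks = {name: vonmises.logpdf(obs, kappa, loc=np.radians(canon)).sum()
               for name, canon in CANONICAL.items()}
    m = max(logliks.values())                     # stabilize the exponentials
    weights = {k: np.exp(v - m) for k, v in logliks.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

print(posterior([85, -90, 86, -84, 91, -87, 89, -86]))  # should favour "crown"
```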


Author(s):  
Gozde Unal ◽  
Gaurav Sharma ◽  
Reiner Eschbach

Photography, lithography, xerography, and inkjet printing are the dominant technologies for color printing. Images produced on these "different media" are often scanned, either for the purpose of copying or for creating an electronic representation. For improved color calibration during scanning, identifying the media from the scanned image data is desirable. In this paper, we propose an efficient algorithm for automated classification of input media into four major classes: photographic, lithographic, xerographic, and inkjet. Our technique exploits the strong correlation between the type of input media and the spatial statistics of the corresponding images, as observed in the scanned images. We adopt ideas from the spatial statistics literature and design two spatial statistical measures, of dispersion and periodicity, which are computed over spatial point patterns generated from blocks of the scanned image and whose distributions provide the features for making a decision. We utilized extensive training data to determine well-separated decision regions for classifying the input media, and we validated and tested our classification technique on an extensive independent data set. The results demonstrate that the proposed method is able to distinguish between the different media with high reliability.
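
One dispersion measure of the kind described is the Clark-Evans nearest-neighbour index computed over a point pattern extracted from an image block. The sketch below is an illustrative stand-in: the thresholding rule and block size are assumptions, and the paper's dispersion and periodicity measures may differ:

```python
# Clark-Evans style dispersion index over a point pattern from an image block
# (here, dark-pixel locations): ~1 for random, <1 clustered, >1 regular/periodic.
import numpy as np
from scipy.spatial import cKDTree

def dispersion_index(block: np.ndarray, thresh: float = 0.5) -> float:
    """Ratio of observed to expected mean nearest-neighbour distance."""
    ys, xs = np.nonzero(block < thresh)          # point pattern: dark pixels
    pts = np.c_[xs, ys].astype(float)
    if len(pts) < 2:
        return np.nan
    d, _ = cKDTree(pts).query(pts, k=2)          # k=2: self + nearest neighbour
    observed = d[:, 1].mean()
    density = len(pts) / block.size              # points per unit area
    expected = 0.5 / np.sqrt(density)            # expectation under complete spatial randomness
    return observed / expected

block = np.random.rand(64, 64)                   # stand-in scanned-image block
print(dispersion_index(block))
```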


Geophysics ◽  
2020 ◽  
Vol 85 (4) ◽  
pp. WA147-WA158
Author(s):  
Kaibo Zhou ◽  
Jianyu Zhang ◽  
Yusong Ren ◽  
Zhen Huang ◽  
Luanxiao Zhao

Lithology identification based on conventional well-logging data is of great importance for characterizing geologic features and evaluating reservoir quality in the exploration and production development of petroleum reservoirs. However, the traditional lithology identification process has some limitations: (1) building a model is very time consuming, so real-time lithology identification during well drilling is not possible; (2) the model must be built by experienced geologists, which consumes a lot of manpower and material resources; and (3) the imbalance of labeled data in well-log data may reduce the classification performance of the model. We have developed a gradient boosting decision tree (GBDT) algorithm combined with the synthetic minority oversampling technique (SMOTE) to realize fast and automatic lithology identification. First, the raw well-log data are normalized by a maximum-minimum normalization algorithm. Then, SMOTE is adopted to balance the number of samples in each class during training. Next, a lithology identification model is built by GBDT to fit the preprocessed training data set. Finally, the built model is verified with the testing data set. The experimental results indicate that the proposed approach improves lithology identification performance compared with other machine-learning approaches.
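
The pipeline maps directly onto standard libraries; a minimal sketch with scikit-learn and imbalanced-learn follows, where the log curve names (GR, RHOB, NPHI, DT) and file path are illustrative assumptions:

```python
# Max-min normalization -> SMOTE class balancing -> GBDT fit -> held-out verification.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

logs = pd.read_csv("well_logs.csv")
X = logs[["GR", "RHOB", "NPHI", "DT"]]           # conventional well-log curves (assumed)
y = logs["lithology"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

scaler = MinMaxScaler().fit(X_train)             # maximum-minimum normalization
X_train_n, X_test_n = scaler.transform(X_train), scaler.transform(X_test)

# Oversample the minority lithologies in the training set only.
X_bal, y_bal = SMOTE().fit_resample(X_train_n, y_train)

model = GradientBoostingClassifier().fit(X_bal, y_bal)
print(classification_report(y_test, model.predict(X_test_n)))
```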


Author(s):  
Saadia Karim

The purpose of this paper is to analyze climate changes in Pakistan, identify issues related to weather disasters, and revisit weather prediction approaches. The proposed approach is based on different algorithms and their comparison with reference to the past five years (2010-2015) of data on 12 attributes. A flow diagram identifies the steps included in the process. Results were obtained using WEKA 3.7.13 (the latest version as of 2015). The KNN and memory-based reasoning algorithms show the accuracy of predicting weather forecasts, and the BPANN algorithm is used to analyze the data set alongside them. The decision tree likewise shows the accuracy of predicting weather forecasts. KNN is used with a Bayesian approach in this research. The attributes used in this research show significant relationships, while many of them act as independent variables. Since these attributes are very important for weather prediction, we used variant factors based on time and date. The KNN algorithm using a Bayesian classifier provides accurate results compared with memory-based reasoning, the decision tree, and the BPANN trainlm and trainbr variants.
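
A KNN baseline of the kind compared in the paper, with simple time- and date-based factors added as features, can be sketched as follows (the paper itself used WEKA; the column names, file path, and k below are illustrative assumptions):

```python
# KNN weather-prediction baseline with date-derived "variant factors" as features.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

wx = pd.read_csv("pakistan_weather_2010_2015.csv", parse_dates=["date"])
wx["month"] = wx["date"].dt.month                # time/date-based variant factors
wx["dayofyear"] = wx["date"].dt.dayofyear

features = ["temp", "humidity", "pressure", "wind_speed", "month", "dayofyear"]
X, y = wx[features], wx["condition"]             # e.g. rain / clear / storm labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
scaler = StandardScaler().fit(X_train)           # KNN is sensitive to feature scale

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)
print("accuracy:", knn.score(scaler.transform(X_test), y_test))
```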

