A Micro-Cluster-Based Data Stream Clustering Method for P2P Traffic Classification

2012 ◽  
Vol 263-266 ◽  
pp. 1121-1126
Author(s):  
Guang Hui Yan ◽  
Ming Hao Ai

Many machine learning techniques have been proposed to classify P2P traffic, each with reasonable success. In real P2P network environments, however, new communities of peers frequently join and old communities frequently leave. Identification methods must therefore cope with concept drift and update their models incrementally. In this paper, we present MCStream, a concept-adapting algorithm based on data stream mining techniques, to identify P2P applications in Internet traffic. MCStream uses two micro-cluster structures, potential micro-clusters and outlier micro-clusters, to classify P2P traffic and to detect concept drift with limited memory. Our performance study on real traffic data captured at a main gateway router demonstrates the effectiveness and efficiency of the method.
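The abstract does not spell out MCStream's update rules, so the following is only a minimal sketch of the general two-level idea, modelled on DenStream-style potential/outlier micro-clusters. The class names `MicroCluster` and `TwoLevelMicroClusters`, the parameters `eps`, `beta_mu` and `lam`, and the nearest-centre merge test are all hypothetical simplifications, not the paper's method. Points that fit an existing dense cluster update it; other points seed or grow sparse outlier clusters, which are promoted once they become dense (a candidate new concept). Exponential fading plus pruning keeps memory bounded.

```python
import math


class MicroCluster:
    """Decayed micro-cluster summary: weight and linear sum of absorbed points."""

    def __init__(self, dim, now):
        self.weight = 0.0
        self.ls = [0.0] * dim
        self.last_update = now

    def decay(self, now, lam):
        # Exponential fading: older points count for less (factor 2^(-lam * dt)).
        factor = 2 ** (-lam * (now - self.last_update))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.last_update = now

    def insert(self, x, now, lam):
        self.decay(now, lam)
        self.weight += 1.0
        self.ls = [v + xi for v, xi in zip(self.ls, x)]

    def center(self):
        return [v / self.weight for v in self.ls]


class TwoLevelMicroClusters:
    """Hypothetical potential/outlier micro-cluster maintenance (DenStream-like)."""

    def __init__(self, dim, eps=1.0, beta_mu=3.0, lam=0.01):
        self.dim, self.eps, self.beta_mu, self.lam = dim, eps, beta_mu, lam
        self.potential = []  # dense clusters representing current traffic concepts
        self.outlier = []    # sparse clusters that may grow into a new concept

    def _nearest(self, clusters, x):
        best, best_d = None, math.inf
        for c in clusters:
            d = math.dist(x, c.center())
            if d < best_d:
                best, best_d = c, d
        return best, best_d

    def update(self, x, now):
        # 1) Absorb x into the nearest potential micro-cluster if it is close enough.
        c, d = self._nearest(self.potential, x)
        if c is not None and d <= self.eps:
            c.insert(x, now, self.lam)
            return "potential"
        # 2) Otherwise try the nearest outlier micro-cluster.
        c, d = self._nearest(self.outlier, x)
        if c is not None and d <= self.eps:
            c.insert(x, now, self.lam)
            if c.weight >= self.beta_mu:  # dense enough: promote (possible new concept)
                self.outlier.remove(c)
                self.potential.append(c)
                return "promoted"
            return "outlier"
        # 3) Start a new outlier micro-cluster.
        c = MicroCluster(self.dim, now)
        c.insert(x, now, self.lam)
        self.outlier.append(c)
        return "new"

    def prune(self, now, min_weight=0.5):
        # Fade every cluster to the current time, demote faded potential clusters
        # and forget faded outlier clusters, which bounds memory use.
        for c in self.potential + self.outlier:
            c.decay(now, self.lam)
        faded = [c for c in self.potential if c.weight < self.beta_mu]
        for c in faded:
            self.potential.remove(c)
            self.outlier.append(c)
        self.outlier = [c for c in self.outlier if c.weight >= min_weight]
```

With `lam=0.0` (no fading) and `beta_mu=3.0`, four identical flow-feature vectors go through the states "new", "outlier", "promoted", "potential". A real system would call `prune` periodically and map each potential micro-cluster to an application label.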

2021 ◽  
Vol 4 (1) ◽  
pp. 17
Author(s):  
Tariq Mahmood ◽  
Tatheer Fatima

The world generates an immeasurable amount of data every minute, and this data needs to be analyzed for better decision making. To meet the demand for faster analytics, businesses are adopting efficient stream processing and machine learning techniques. However, data streams are particularly challenging to handle, and one of the most prominent problems is concept drift: an unexpected change in the underlying distribution of streaming data over time. In this work, we conducted a systematic literature review of methods that deal with concept drift. We review the most frequently used supervised and unsupervised techniques and survey the commonly used, publicly available artificial and real-world datasets for studying concept drift.
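The review covers many drift-handling methods. One classic detector that is easy to state is the Page-Hinkley test, which signals when the running mean of a monitored quantity (for example a classifier's error indicator) shifts upward. The sketch below is an illustration chosen here, not a method singled out by the review. `delta` (tolerated change) and `threshold` (alarm level) are tuning parameters whose values are assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a data stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # magnitude of change tolerated without alarm
        self.threshold = threshold  # alarm when the statistic exceeds this value
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # m_T = sum of (x_t - mean_T - delta)
        self.min_cum = 0.0          # M_T = minimum of m_t seen so far

    def update(self, x):
        """Consume one value; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold
```

For example, feeding 100 zeros followed by ones with `threshold=5.0` raises the first alarm a few samples after the change. A production detector would usually reset its state after an alarm and retrain or adapt the model.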


2021 ◽  
Author(s):  
Cao Truong Tran

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully, because inadequate treatment of missing values causes large classification errors.
Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying classifiers to unseen instances, which is much more important than building the classifiers. A common approach is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data that can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially when classifying unseen instances. Another approach is to build a classifier that works directly with missing values. This approach requires no time for estimating missing values, but it often produces inaccurate and complex classifiers when faced with numerous missing values. A more recent approach, which also avoids estimating missing values, builds a set of classifiers from which applicable classifiers are selected to classify each unseen instance. However, this approach is also often inaccurate and slow to find applicable classifiers when faced with numerous missing values.
The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction and classifier construction.
The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data.
The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that can work directly with incomplete data. The methods not only improve classification accuracy, but also reduce the complexity of classifiers that work directly with incomplete data.
The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming, that is, genetic programming with a set of interval functions. The method improves classification accuracy and reduces classifier complexity.
The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data.
The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms that can work directly with incomplete data.
In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
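The abstract mentions "interval functions" without defining them. One plausible reading, assumed here purely for illustration, is to replace a missing value by the interval of values observed for that feature and to evaluate an evolved program with interval arithmetic, so that no imputation is needed at prediction time. The expression in `program` and the midpoint decision rule in `classify` are hypothetical examples, not the thesis's functions.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def to_interval(value: Optional[float], lo: float, hi: float) -> Interval:
    """Known value -> degenerate interval; missing value (None) -> observed feature range."""
    return (value, value) if value is not None else (lo, hi)


def iadd(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def isub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def imul(a: Interval, b: Interval) -> Interval:
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def feature_ranges(rows: Sequence[Sequence[Optional[float]]]) -> List[Interval]:
    """Observed (min, max) per column, ignoring missing (None) entries."""
    return [
        (min(v for v in col if v is not None), max(v for v in col if v is not None))
        for col in zip(*rows)
    ]


def program(x: Sequence[Interval]) -> Interval:
    """Hypothetical evolved expression x0 * x1 - x2, evaluated on intervals."""
    return isub(imul(x[0], x[1]), x[2])


def classify(out: Interval) -> int:
    """One simple rule: class 1 if the midpoint of the output interval is positive."""
    return 1 if (out[0] + out[1]) / 2 > 0 else 0
```

For the row `(3.0, None, 1.0)` with observed ranges `[(1, 3), (2, 4), (0.5, 1)]`, the missing second feature becomes `(2.0, 4.0)` and `program` returns the interval `(5.0, 11.0)`.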


Author(s):  
Karthick G. S. ◽  
Pankajavalli P. B.

The Internet of Things (IoT) revolution is improving the efficiency of healthcare infrastructure. This chapter analyzes applications of IoT in healthcare systems from diverse aspects such as the topological arrangement of medical devices, layered architecture, and platform services. It focuses on advances in IoT-based healthcare in order to identify the communication and sensing technologies that enable smart healthcare systems. The transformation of healthcare from doctor-centric to patient-centric through diverse IoT applications is discussed in detail. The chapter also examines the issues that must be addressed when designing an effective IoT-based healthcare system, explores security in healthcare systems, and identifies possible threats to essential security requirements. Finally, it summarizes the procedure for applying machine learning techniques to healthcare streaming data, which provides intelligence to these systems.
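As a small illustration of learning from healthcare streaming data, the sketch below keeps an online mean and variance of one vital-sign stream with Welford's algorithm and flags readings whose z-score exceeds a threshold. This is a generic technique chosen here, not the chapter's procedure; the class name, `z_threshold` and `warmup` are hypothetical. Each reading is scored before it is added, so an outlier does not inflate its own baseline.

```python
import math


class StreamingZScore:
    """Online mean/variance (Welford) with a z-score alert for one sensor stream."""

    def __init__(self, z_threshold=3.0, warmup=10):
        self.z_threshold = z_threshold
        self.warmup = max(warmup, 2)  # need at least 2 samples for a sample variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                 # sum of squared deviations from the mean

    def update(self, x):
        """Score x against the history seen so far, then add it; return True on alert."""
        alert = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            alert = std > 0 and abs(x - self.mean) / std > self.z_threshold
        # Welford's update of the running mean and squared-deviation sum.
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
        return alert
```

With ten heart-rate readings between 70 and 73 bpm as a baseline, a reading of 140 bpm triggers an alert while the baseline readings do not.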


2016 ◽  
Vol 2016 ◽  
pp. 1-13 ◽  
Author(s):  
Jeankyung Kim ◽  
Jinsoo Hwang ◽  
Kichang Kim

As Internet traffic rapidly increases, fast and accurate network traffic classification is becoming essential for quality of service (QoS) control and for the early detection of network traffic anomalies. Machine learning techniques based on statistical features of packet flows have recently become popular for traffic classification, partly because of the limitations of traditional port- and payload-based methods. In this paper, we propose a Markov model-based network traffic classification method with a Kullback-Leibler divergence criterion. Our study focuses mainly on hard-to-classify (overlapping) traffic patterns of network applications, which current techniques have difficulty handling. Simulation results indicate that the overall accuracy of the proposed method reaches around 90% with a reasonable group size of n = 100.
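The abstract names the ingredients (a Markov model per application and a Kullback-Leibler criterion) but not the exact formulation. A minimal sketch under these assumptions: each flow is a sequence of discrete states (for example binned packet sizes), a smoothed transition matrix is estimated per application and per test group of flows, and the group is assigned to the application whose chain has the smallest KL divergence, with each row weighted by the group's empirical source-state frequency. The function names and the Laplace smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def transition_matrix(sequences, n_states, alpha=1.0):
    """Estimate a Markov transition matrix with additive (Laplace) smoothing."""
    counts = [[alpha] * n_states for _ in range(n_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def state_weights(sequences, n_states):
    """Empirical frequency of each source state over all transitions in the group."""
    counts = Counter(a for seq in sequences for a in seq[:-1])
    total = sum(counts.values())
    return [counts[i] / total for i in range(n_states)]


def markov_kl(p, q, weights):
    """Weighted sum of row-wise KL divergences D(p_i || q_i)."""
    return sum(
        w * sum(pij * math.log(pij / qij) for pij, qij in zip(prow, qrow))
        for w, prow, qrow in zip(weights, p, q)
    )


def classify_group(sequences, models, n_states):
    """Assign a group of flows to the application model with minimum KL divergence."""
    p = transition_matrix(sequences, n_states)
    w = state_weights(sequences, n_states)
    return min(models, key=lambda name: markov_kl(p, models[name], w))
```

For example, with an application "A" whose state sequences alternate `0, 1, 0, 1, ...` and an application "B" whose sequences stay in one state for long runs, an alternating test group is assigned to "A" and a run-heavy group to "B". Smoothing keeps every transition probability positive, so the logarithms are always defined.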


Author(s):  
Qi Wang ◽  
Xia Zhao ◽  
Jincai Huang ◽  
Yanghe Feng ◽  
Zhong Liu ◽  
...  

The concept of ‘big data’ has been widely discussed, and its value has been demonstrated across a variety of domains. To quickly mine potential value and cope with the ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies address how to adapt machine learning techniques to big data environments, we provide a comprehensive overview of the history and evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, based on learning principles, we discuss regularization to enhance generalization. The data-quality challenges of big data are reduced to four problems (the curse of dimensionality, class imbalance, concept drift, and label noise), and the underlying causes of each and the mainstream methodologies for addressing them are introduced. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm that aggregates these factors. Over the next few decades, we believe that these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that involve both data quality considerations and human interactions.
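To make "regularization to enhance generalization" concrete, the sketch below uses ridge (L2-regularized least squares) as one standard example; the paper's own treatment may differ. It solves the normal equations $(X^\top X + \lambda I)\,w = X^\top y$ by Gaussian elimination without an intercept term, so that increasing `lam` visibly shrinks the weights. The function name `ridge_fit` is hypothetical.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination (no intercept)."""
    d = len(X[0])
    # Build the regularized normal equations A w = b.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w
```

On `X = [[1, 0], [0, 1], [1, 1]]` and `y = [1, 2, 3]`, `lam=0` recovers the least-squares weights `[1, 2]`, while `lam=1` gives the smaller `[0.875, 1.375]`: the penalty trades a little training fit for lower model complexity.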


2017 ◽  
Vol 3 (10) ◽  
Author(s):  
Anjum Khan ◽  
Anjana Nigam

As network-based applications grow rapidly, network security mechanisms need considerable attention to improve their speed and precision. Ever-evolving intrusion types pose a significant threat to network security. Although various network security tools have been developed, the rapid growth of intrusive activity remains a significant issue. Intrusion detection systems (IDSs) are used to detect intrusive activities on the network. Studies have shown that applying machine learning techniques to intrusion detection can achieve high detection rates. Machine learning and classification algorithms help to design intrusion detection models that classify network traffic as intrusive or normal. This paper discusses some commonly used machine learning techniques in intrusion detection systems and also reviews a number of existing machine learning IDSs proposed by researchers over time. In this paper, an experimental analysis is performed to compare the performance of some existing techniques so that they can be used further in developing a hybrid classifier for real data packet classification. The results show that KNN, RF, and SVM perform best on the NSL-KDD dataset.
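KNN is one of the best-performing techniques reported above. The sketch below is a plain k-nearest-neighbour classifier by Euclidean distance and majority vote; it is a generic illustration, not the paper's experimental setup. The two normalized features and the sample points are hypothetical stand-ins for the much richer NSL-KDD feature set, and in practice features should be scaled before computing distances.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized features, e.g. (connection rate, error rate).
train = [
    ((0.1, 0.1), "normal"), ((0.2, 0.1), "normal"), ((0.1, 0.2), "normal"),
    ((0.9, 0.8), "attack"), ((0.8, 0.9), "attack"), ((0.9, 0.9), "attack"),
]
```

A query close to the normal cluster, such as `(0.15, 0.15)`, is labelled "normal", and one near `(0.85, 0.85)` is labelled "attack". An odd `k` avoids ties between two classes.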


2020 ◽  
Vol 110 (11-12) ◽  
pp. 2991-3003
Author(s):  
Panagiotis Stavropoulos ◽  
Alexios Papacharalampopoulos ◽  
John Stavridis ◽  
Kyriakos Sampatakakis

Diagnosis systems for laser processing are being integrated into industry. However, their readiness level is still questionable in light of the Industry 4.0 design principles of interoperability and intuitive technical assistance. This paper presents a novel multifunctional, web-based, real-time quality diagnosis platform for a laser welding application, combined with decision support, data visualization, storage, and post-processing functionalities. The platform's core is a quality assessment module based on a three-stage method that uses feature extraction and machine learning techniques for weld defect detection and quality prediction. A multisensorial configuration streams image data from the weld pool to the module, where a statistical and geometrical method is applied to select the input features for the classification model. A Hidden Markov Model then fuses this information with earlier results so that a decision can be made on the basis of maximum likelihood. The outcome is delivered through web services to a tailored user interface. The platform's operation has been validated with real data.
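How an HMM can fuse per-frame classifier outputs with earlier results can be sketched with the normalized forward algorithm: at each frame, the previous state belief is propagated through the transition matrix and reweighted by how likely the current observation is in each state. The two states (0 = good weld, 1 = defective weld), the binary per-frame observations and all probabilities below are hypothetical values, not the paper's model. The decision picks the state with the highest filtered probability.

```python
def forward_filter(observations, start, trans, emit):
    """Normalized forward algorithm: returns P(state_t | obs_1..t) for every t."""
    n = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    history = [alpha]
    for obs in observations[1:]:
        # Predict with the transition model, then weight by the new observation.
        alpha = [emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n))
                 for s in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        history.append(alpha)
    return history


# Hypothetical parameters: state 0 = good weld, state 1 = defective weld;
# observation 0 = frame classified as good, 1 = frame classified as defective.
start = [0.9, 0.1]
trans = [[0.95, 0.05], [0.10, 0.90]]
emit = [[0.8, 0.2], [0.3, 0.7]]
history = forward_filter([0, 0, 1, 1, 1], start, trans, emit)
decision = max(range(2), key=lambda s: history[-1][s])
```

A single "defective" frame does not overturn a confident "good" belief, but three consecutive ones shift the filtered posterior towards the defective state (about 0.78 at the last frame), so `decision` becomes 1.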


Author(s):  
Qi Wang ◽  
Xia Zhao ◽  
Jincai Huang ◽  
Yanghe Feng ◽  
Jiahao Su ◽  
...  

The concept of ‘big data’ has been widely discussed, and its value has been demonstrated across a variety of domains. To quickly mine potential value and cope with the ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies address how to adapt machine learning techniques to big data environments, we provide a comprehensive overview of the history and evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, based on learning principles, we discuss regularization to enhance generalization. The data-quality challenges of big data are reduced to four problems (the curse of dimensionality, class imbalance, concept drift, and label noise), and the underlying causes of each and the mainstream methodologies for addressing them are introduced. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm that aggregates these factors. Over the next few decades, we believe that these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that involve both data quality considerations and human interactions.


2021 ◽  
Author(s):  
Lucas Evangelista de Souza ◽  
Raimundo Ghizoni Teive

The electricity distribution network supplies energy to consumers in the National Interconnected System, serving 99% of consumers in Brazil. This network has two types of losses: technical losses and non-technical (commercial) losses. Non-technical losses, the focus of this work, result in a higher tariff for all consumers, because the concessionaire must compensate for the lost revenue. Non-technical losses are usually associated with fraud (meter tampering or energy diversion). The main objective of this work is to apply machine learning techniques, using the R software, to identify possibly fraudulent behavior of commercial consumers in the state of Santa Catarina, based on typical consumer load curves and functional information from the company. Preliminary results using real consumer data indicate that the SVM classifier performed well in the cases studied, achieving precision and accuracy greater than 90%. The input variables selected for the classifier, based mainly on data and information from typical load curves, are the distinguishing feature of this work and the main reason for its success in the initial tests.
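The work used an SVM in R; the sketch below is not that code but a minimal linear SVM in Python, trained with Pegasos-style stochastic subgradient descent on the regularized hinge loss, to show how such a classifier separates suspected-fraud and regular consumers. A constant 1.0 is appended to each sample so the bias is learned as an ordinary (regularized) weight. The two standardized load-curve features, the sample data and the parameters `lam`, `epochs` and `seed` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD on the regularized hinge loss; labels y must be -1 or +1."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # append constant feature for the bias
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regularization)
            if margin < 1.0:  # hinge-loss subgradient step for margin violators
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


# Hypothetical standardized load-curve features; +1 = suspected fraud, -1 = regular.
X = [(-1.0, -1.0), (-1.0, -0.5), (-0.5, -1.0), (1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
```

After training, `predict(w, x)` reproduces the training labels on this separable toy set. Feature design (here only assumed) is what the authors identify as the main driver of their results, so the classifier itself is the simpler part.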

