Divide and Conquer Machine Learning for a Genomics Analogy Problem

Network traffic classification aims to identify the traffic category or application behind network packets or flows. The area continues to attract attention from researchers because understanding the composition of network traffic, which changes over time, is necessary to ensure network Quality of Service (QoS). Among the different methods of network traffic classification, the payload-based one (deep packet inspection, DPI) is the most accurate, but it has drawbacks: it cannot classify encrypted data, it raises concerns about users’ privacy, it has high computational costs, and it is ambiguous when multiple signatures match. Machine learning methods have been proposed to overcome these issues. This work proposes a Multi-Objective Divide and Conquer (MODC) model for network traffic classification that combines supervised and unsupervised machine learning algorithms into a hybrid model based on the divide and conquer strategy. The model is also flexible: a multi-objective optimization process produces a set of Pareto-optimal parameter settings, and network administrators can choose among them by prioritizing either flow or byte accuracy. Our method achieved an average flow accuracy of 94.14% on the analyzed dataset, outperforming the six DPI-based tools investigated, including two commercial ones, as well as other machine learning-based methods.
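The choice among Pareto-optimal settings described above can be illustrated with a minimal sketch. It is not the MODC implementation: the `Candidate` type, the function names, and every accuracy value below are hypothetical and chosen only to show how a non-dominated set is filtered and how one setting is picked by prioritizing flow or byte accuracy.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One hypothetical parameter setting with its two measured objectives."""
    name: str
    flow_accuracy: float
    byte_accuracy: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = (a.flow_accuracy >= b.flow_accuracy
                and a.byte_accuracy >= b.byte_accuracy)
    better = (a.flow_accuracy > b.flow_accuracy
              or a.byte_accuracy > b.byte_accuracy)
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def pick(front: List[Candidate], prioritize: str) -> Candidate:
    """Choose from the front by the objective the administrator prioritizes."""
    if prioritize == "flow":
        return max(front, key=lambda c: (c.flow_accuracy, c.byte_accuracy))
    if prioritize == "byte":
        return max(front, key=lambda c: (c.byte_accuracy, c.flow_accuracy))
    raise ValueError("prioritize must be 'flow' or 'byte'")


# Made-up numbers for illustration only; they are not results from the paper.
CANDIDATES = [
    Candidate("A", 0.95, 0.80),
    Candidate("B", 0.90, 0.92),
    Candidate("C", 0.88, 0.85),  # dominated by B on both objectives
    Candidate("D", 0.93, 0.88),
]

if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print([c.name for c in front])
    print(pick(front, "flow").name, pick(front, "byte").name)
```

With these made-up values, setting C drops out because B beats it on both objectives; the remaining settings trade flow accuracy against byte accuracy, which is the kind of choice the abstract leaves to the administrator.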


Medium‐Term Forecasting of Loop Current Eddy Cameron and Eddy Darwin Formation in the Gulf of Mexico With a Divide‐and‐Conquer Machine Learning Approach

Journal of Geophysical Research Oceans ◽

10.1029/2019jc015172 ◽

2019 ◽

Vol 124 (8) ◽

pp. 5586-5606

Author(s):

Justin L. Wang ◽

Hanqi Zhuang ◽

Laurent M. Chérubin ◽

Ali K. Ibrahim ◽

Ali Muhamed Ali

Keyword(s):

Machine Learning ◽

Gulf Of Mexico ◽

Divide And Conquer ◽

Learning Approach ◽

Loop Current ◽

Medium Term ◽

Machine Learning Approach


Enhancing Generalizability of Predictive Models with Synergy of Data and Physics

Measurement Science and Technology ◽

10.1088/1361-6501/ac3944 ◽

2021 ◽

Author(s):

Yingjun Shen ◽

Zhe Song ◽

Andrew Kusiak

Keyword(s):

Machine Learning ◽

Data Mining ◽

Predictive Models ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Predictive Maintenance ◽

Divide And Conquer ◽

Less Is More ◽

Industrial Big Data

Wind farms need prediction models for predictive maintenance. There is a need to predict values of non-observable parameters beyond the ranges reflected in available data. A prediction model developed for one machine may not perform well on another similar machine, usually because data-driven models lack generalizability. To increase the generalizability of predictive models, this research integrates data mining with first-principle knowledge. Physics-based principles are combined with machine learning algorithms through feature engineering, strong rules, and divide-and-conquer. The proposed synergy concept is illustrated with wind turbine blade icing prediction and achieves significant prediction accuracy across different turbines. The proposed process is widely accepted by wind energy predictive maintenance practitioners because of its simplicity and efficiency. Furthermore, the testing scores of the KNN, CART, and DNN algorithms increase by 44.78%, 32.72%, and 9.13%, respectively, with the proposed process. We demonstrate the importance of embedding physical principles within the machine learning process and highlight that the need for more complex machine learning algorithms in industrial big data mining is often much less than in other applications, making it essential to incorporate physics and follow the “Less is More” philosophy.


Machine-Learning-Based White-Hat Worm Launcher in Botnet Defense System

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.291713 ◽

2022 ◽

Vol 14 (1) ◽

pp. 0-0

Keyword(s):

Machine Learning ◽

Petri Net ◽

Cyber Security ◽

Large Scale ◽

Security System ◽

Divide And Conquer ◽

Defense System ◽

Divide And Conquer Algorithm

This article proposes a machine learning (ML)-based white-hat worm launcher, adaptable to large-scale IoT networks, for the Botnet Defense System (BDS). The BDS is a cyber-security system that uses white-hat worms to exterminate malicious botnets. White-hat worms defend an IoT system against malicious bots, and the BDS decides the number of white-hat worms, but the deployment of white-hat worms in an IoT network has not been discussed. Therefore, the authors propose a machine-learning-based launcher to launch the white-hat worms effectively, along with a divide and conquer algorithm to deploy the launcher to large-scale IoT networks. The authors then modeled the BDS and the launcher with an agent-oriented Petri net and confirmed the effect through simulation of the PN2 model. The results showed that the proposed launcher can reduce the number of infected devices by about 30-40%.


"Dividing and Conquering" and "Caching" in Molecular Modeling

10.20944/preprints202012.0081.v1 ◽

2020 ◽

Author(s):

Xiaoyong Cao ◽

Pu Tian

Keyword(s):

Machine Learning ◽

Free Energy ◽

Molecular Modeling ◽

Materials Science ◽

Divide And Conquer ◽

Machine Learning Techniques ◽

Collective Variables ◽

Spatial And Temporal Scales ◽

Wide Range ◽

State Models

Molecular modeling is widely used in subjects including, but not limited to, physics, chemistry, biology, materials science, and engineering. Impressive progress has been made in the development of theories, algorithms, and software packages. Dividing and conquering and caching intermediate results have been long-standing principles in the development of algorithms. Not surprisingly, most of the important methodological advancements in more than half a century of molecular modeling are various implementations of these two fundamental principles. To access interesting behavior of complex molecular systems across a wide range of spatial and temporal scales, the molecular modeling community has invested tremendous effort in two lines of algorithm development. The first is coarse graining, which represents multiple basic particles in higher-resolution modeling as a single larger and softer particle in the lower-resolution counterpart, yielding force fields of partial transferability at the expense of some information loss. The second is enhanced sampling, which realizes "dividing and conquering" and/or "caching" in configurational space, with a focus either on reaction coordinates and collective variables, as in Metadynamics and related algorithms, or on the transition matrix and state discretization, as in Markov state models. For this line of algorithms, spatial resolution is maintained but no transferability is available. With the introduction of machine learning techniques, many new developments, particularly those based on deep learning, have been implemented to realize more efficient and accurate ways of "dividing and conquering" and "caching" along these two lines of algorithmic research. We recently developed the generalized solvation free energy theory, which suggests a third class of algorithms that facilitate molecular modeling through partially transferable in resolution "caching" of the local free energy landscape. Connections and potential interactions among these three algorithmic directions are discussed. This brief review covers both the traditional development and the application of machine learning in molecular modeling from the perspective of "dividing and conquering" and "caching", with the hope of stimulating the development of more elegant, efficient, and reliable formulations and algorithms in this regard.


Predicting creep rupture life of Ni-based single crystal superalloys using divide-and-conquer approach based machine learning

Acta Materialia ◽

10.1016/j.actamat.2020.05.001 ◽

2020 ◽

Vol 195 ◽

pp. 454-467 ◽

Cited By ~ 2

Author(s):

Yue Liu ◽

Junming Wu ◽

Zhichao Wang ◽

Xiao-Gang Lu ◽

Maxim Avdeev ◽

...

Keyword(s):

Machine Learning ◽

Single Crystal ◽

Rupture Life ◽

Creep Rupture ◽

Divide And Conquer


A New Three-Dimensional Indoor Positioning Mechanism Based on Wireless LAN

Mathematical Problems in Engineering ◽

10.1155/2014/862347 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 5

Author(s):

Jiujun Cheng ◽

Yueqiao Cai ◽

Qingyang Zhang ◽

Junlu Cheng ◽

Chendan Yan

Keyword(s):

Machine Learning ◽

Wireless Lan ◽

Learning Algorithm ◽

Three Dimensional ◽

Indoor Positioning ◽

Divide And Conquer ◽

Machine Learning Algorithm ◽

Two Dimensional ◽

Service Areas ◽

Error Distance

Research on two-dimensional indoor positioning based on wireless LAN and location fingerprint methods has become mature, but in actual indoor positioning, users are also concerned about the height at which they stand. Because three-dimensional indoor positioning expands the range to be covered, more features are needed to describe the location fingerprint. Directly using a machine learning algorithm then results in reduced classification ability. To solve this problem, this paper adopts a “divide and conquer” strategy: first, the k-medoids algorithm clusters the three-dimensional location space into a number of service areas, and then a multicategory SVM with fewer features is created for each service area for further positioning. Our experiment shows that the error distance resolution of the approach with the k-medoids algorithm and multicategory SVM is higher than that of the approach with SVM only, and that the former can effectively decrease the “crazy prediction.”
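The two-stage strategy described above (cluster first, then classify inside each cluster) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: it clusters the fingerprint vectors directly rather than the location space, routes a new fingerprint to the nearest medoid, and uses a 1-nearest-neighbour lookup as a stand-in for the per-area multicategory SVM. It also keeps all features in every area, whereas the paper uses fewer features per service area. The class name, the function names, and the signal-strength values are hypothetical.

```python
import math
from collections import defaultdict


def farthest_first(points, k):
    """Deterministic initialisation: start at points[0], then repeatedly add
    the point farthest from the medoids chosen so far."""
    medoids = [points[0]]
    while len(medoids) < k:
        medoids.append(
            max(points, key=lambda p: min(math.dist(p, m) for m in medoids)))
    return medoids


def nearest_index(point, medoids):
    """Index of the medoid closest to the given point."""
    return min(range(len(medoids)), key=lambda i: math.dist(point, medoids[i]))


def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    medoids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = defaultdict(list)
        for p in points:
            clusters[nearest_index(p, medoids)].append(p)
        updated = []
        for i in range(k):
            members = clusters[i] or [medoids[i]]
            # The medoid is the member with the smallest total distance to the rest.
            updated.append(
                min(members, key=lambda c: sum(math.dist(c, m) for m in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids


class DivideAndConquerLocator:
    """Divide: k-medoids service areas. Conquer: one small classifier per area
    (1-nearest-neighbour here, standing in for a multicategory SVM)."""

    def __init__(self, k):
        self.k = k

    def fit(self, fingerprints, labels):
        self.medoids = k_medoids(fingerprints, self.k)
        self.areas = defaultdict(list)
        for fp, label in zip(fingerprints, labels):
            self.areas[nearest_index(fp, self.medoids)].append((fp, label))
        return self

    def predict(self, fingerprint):
        area = self.areas[nearest_index(fingerprint, self.medoids)]
        return min(area, key=lambda sample: math.dist(fingerprint, sample[0]))[1]


# Made-up signal strengths (dBm) from two access points; labels are hypothetical.
FINGERPRINTS = [(-40, -80), (-42, -78), (-45, -82), (-47, -84),
                (-80, -40), (-78, -42), (-82, -45), (-84, -47)]
LABELS = ["1A", "1A", "1B", "1B", "2A", "2A", "2B", "2B"]

if __name__ == "__main__":
    locator = DivideAndConquerLocator(k=2).fit(FINGERPRINTS, LABELS)
    print(locator.predict((-41, -79)), locator.predict((-83, -46)))
```

Because each per-area classifier only sees the samples routed to its service area, it has fewer classes to separate than a single global classifier, which is the motivation the abstract gives for dividing first.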


TaxoDaCML: Taxonomy based Divide and Conquer using machine learning approach for DDoS attack classification

International Journal of Information Management Data Insights ◽

10.1016/j.jjimei.2021.100048 ◽

2021 ◽

Vol 1 (2) ◽

pp. 100048

Author(s):

Onkar Thorat ◽

Nirali Parekh ◽

Ramchandra Mangrulkar

Keyword(s):

Machine Learning ◽

Divide And Conquer ◽

Learning Approach ◽

Ddos Attack ◽

Machine Learning Approach


Machine Learning by Data Mining REPTree and M5P for Predicating Novel Information for PM10

Cloud Computing and Data Science ◽

10.37256/ccds.112020418 ◽

2020 ◽

pp. 40-48

Author(s):

Yas Alsultanny

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Tree ◽

Meteorological Parameters ◽

Pm10 Concentration ◽

Divide And Conquer ◽

Time Processing ◽

Elapsed Time ◽

Tree Algorithms

We examined data mining as a technique for extracting knowledge from a database to predict PM10 concentration from meteorological parameters. The purpose of this paper is to compare two data mining decision tree algorithms, Reduced Error Pruning Tree (REPTree) and the divide and conquer M5P, for predicting Particulate Matter 10 (PM10) concentration from meteorological parameters. The results of the analysis showed that the M5P tree gave a higher correlation than REPTree, along with lower errors and a higher number of rules, while the elapsed processing time of REPTree was less than that of M5P. Both of these trees proved that humidity absorbed PM10. The paper recommends REPTree and M5P for predicting PM10 and other pollution gases.
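A comparison of this kind rests on a correlation coefficient and error measures computed between observed and predicted values. The abstract does not say which error measures were used, so the sketch below assumes the Pearson correlation, mean absolute error, and root mean squared error as common choices; the function names are hypothetical and no data from the paper is reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def score(actual, predicted):
    """Collect the three measures so two models can be compared side by side."""
    return {
        "correlation": pearson(actual, predicted),
        "mae": mean_absolute_error(actual, predicted),
        "rmse": root_mean_squared_error(actual, predicted),
    }
```

Calling `score` once per model on the same held-out observations gives directly comparable numbers: a better model shows a higher correlation and lower error values.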
