Divide and Conquer Machine Learning for a Genomics Analogy Problem

Author(s):  
Ming Ouyang ◽  
John Case ◽  
Joan Burnside
Information ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 233 ◽  
Author(s):  
Zuleika Nascimento ◽  
Djamel Sadok

Network traffic classification aims to identify the categories of traffic or the applications behind network packets or flows. It continues to gain attention from researchers because understanding the composition of network traffic, which changes over time, is necessary to ensure network Quality of Service (QoS). Among the different methods of network traffic classification, the payload-based one (deep packet inspection, DPI) is the most accurate, but it has drawbacks: it cannot classify encrypted data, it raises concerns about users’ privacy, it has high computational costs, and it is ambiguous when multiple signatures match. For that reason, machine learning methods have been proposed to overcome these issues. This work proposes a Multi-Objective Divide and Conquer (MODC) model for network traffic classification that combines supervised and unsupervised machine learning algorithms into a hybrid model based on the divide and conquer strategy. The model is also flexible: a multi-objective optimization process produces a set of parameters (Pareto-optimal solutions) from which network administrators can choose, prioritizing either flow or byte accuracy. Our method achieved 94.14% average flow accuracy on the analyzed dataset, outperforming the six DPI-based tools investigated, including two commercial ones, and other machine learning-based methods.
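The choice among Pareto-optimal parameter sets described in this abstract can be illustrated with a minimal sketch. It is not the MODC implementation: the `Candidate` type, the scores, and the weighted-sum `pick` rule are all assumptions made up for illustration. It only shows what it means for one setting to dominate another on flow and byte accuracy, and how an administrator's priority could select a point from the resulting front.

```python
from __future__ import annotations

from typing import NamedTuple


class Candidate(NamedTuple):
    """One hypothetical parameter setting and its two accuracy scores."""
    name: str
    flow_accuracy: float
    byte_accuracy: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.flow_accuracy >= b.flow_accuracy and a.byte_accuracy >= b.byte_accuracy
    better = a.flow_accuracy > b.flow_accuracy or a.byte_accuracy > b.byte_accuracy
    return no_worse and better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick(front: list[Candidate], flow_weight: float) -> Candidate:
    """Assumed selection rule: weighted sum, with flow_weight in [0, 1] as the priority."""
    return max(
        front,
        key=lambda c: flow_weight * c.flow_accuracy + (1 - flow_weight) * c.byte_accuracy,
    )


# Made-up scores, for illustration only (not results from the paper).
candidates = [
    Candidate("A", 0.94, 0.80),
    Candidate("B", 0.90, 0.88),
    Candidate("C", 0.89, 0.85),  # dominated by B on both objectives
    Candidate("D", 0.85, 0.91),
]
front = pareto_front(candidates)
```

With these made-up scores the front keeps A, B and D. Setting `flow_weight=1.0` selects the setting with the best flow accuracy, and `flow_weight=0.0` selects the one with the best byte accuracy.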


Author(s):  
Yingjun Shen ◽  
Zhe Song ◽  
Andrew Kusiak

Wind farms need prediction models for predictive maintenance. There is a need to predict values of non-observable parameters beyond the ranges reflected in the available data. A prediction model developed for one machine may not perform well on another, similar machine, usually because data-driven models lack generalizability. To increase the generalizability of predictive models, this research integrates data mining with first-principle knowledge. Physics-based principles are combined with machine learning algorithms through feature engineering, strong rules and divide-and-conquer. The proposed synergy concept is illustrated with wind turbine blade icing prediction and achieves significant prediction accuracy across different turbines. The proposed process is widely accepted by wind energy predictive maintenance practitioners because of its simplicity and efficiency. Furthermore, the testing scores of the KNN, CART and DNN algorithms are increased by 44.78%, 32.72% and 9.13%, respectively, with the proposed process. We demonstrate the importance of embedding physical principles within the machine learning process and highlight that the need for more complex machine learning algorithms in industrial big data mining is often much less than in other applications, making it essential to incorporate physics and follow a “Less is More” philosophy.


This article proposes a white-hat worm launcher based on machine learning (ML) that is adaptable to large-scale IoT networks for the Botnet Defense System (BDS). BDS is a cyber-security system that uses white-hat worms to exterminate malicious botnets. White-hat worms defend an IoT system against malicious bots, and the BDS decides the number of white-hat worms, but the deployment of white-hat worms in an IoT network has not been discussed. Therefore, the authors propose a machine-learning-based launcher to launch the white-hat worms effectively, along with a divide and conquer algorithm to deploy the launcher to large-scale IoT networks. The authors then modeled the BDS and the launcher with an agent-oriented Petri net and confirmed the effect through simulation of the PN2 model. The results showed that the proposed launcher can reduce the number of infected devices by about 30-40%.


Author(s):  
Xiaoyong Cao ◽  
Pu Tian

Molecular modeling is widely utilized in subjects including but not limited to physics, chemistry, biology, materials science and engineering. Impressive progress has been made in the development of theories, algorithms and software packages. To divide and conquer, and to cache intermediate results, have been long-standing principles in the development of algorithms. Not surprisingly, most of the important methodological advancements in more than half a century of molecular modeling are various implementations of these two fundamental principles. To access interesting behavior of complex molecular systems over a wide range of spatial and temporal scales, the molecular modeling community has invested tremendous effort in two lines of algorithm development. The first is coarse graining, which represents multiple basic particles of a higher-resolution model as a single larger and softer particle in its lower-resolution counterpart, yielding force fields of partial transferability at the expense of some information loss. The second is enhanced sampling, which realizes "dividing and conquering" and/or "caching" in configurational space, with a focus either on reaction coordinates and collective variables, as in Metadynamics and related algorithms, or on the transition matrix and state discretization, as in Markov state models. For this line of algorithms, spatial resolution is maintained but no transferability is available. With the introduction of machine learning techniques, many new developments, particularly those based on deep learning, have been implemented to realize more efficient and accurate ways of "dividing and conquering" and "caching" along these two lines of algorithmic research. We recently developed the generalized solvation free energy theory, which suggests a third class of algorithms that facilitate molecular modeling through partially transferable, in-resolution "caching" of the local free energy landscape. Connections and potential interactions among these three algorithmic directions are discussed. This brief review covers both the traditional development and the application of machine learning in molecular modeling from the perspective of "dividing and conquering" and "caching", with the hope of stimulating the development of more elegant, efficient and reliable formulations and algorithms in this regard.
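The coarse-graining idea mentioned in this abstract, several fine-grained particles replaced by one larger particle, can be sketched as follows. The centre-of-mass mapping used here is one common choice and is an assumption; the review does not prescribe a specific mapping. The function name `coarse_grain` and all coordinates and masses are illustrative.

```python
def coarse_grain(positions, masses, groups):
    """Map each group of particle indices to one bead at the group's centre of mass.

    positions: list of (x, y, z) tuples for the fine-grained particles
    masses:    list of particle masses, same order as positions
    groups:    list of index lists; each list becomes one coarse-grained bead
    Returns a list of (centre_of_mass, total_mass) pairs, one per bead.
    """
    beads = []
    for group in groups:
        total_mass = sum(masses[i] for i in group)
        centre = tuple(
            sum(masses[i] * positions[i][axis] for i in group) / total_mass
            for axis in range(3)
        )
        beads.append((centre, total_mass))
    return beads


# Illustrative three-particle "molecule" with made-up coordinates and masses.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
masses = [16.0, 1.0, 1.0]
beads = coarse_grain(positions, masses, groups=[[0, 1, 2]])
```

The single bead carries the total mass of the group and sits close to the heavy particle. The positions of the individual particles inside the bead are no longer recoverable, which is the information loss the abstract refers to.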


2020 ◽  
Vol 195 ◽  
pp. 454-467 ◽  
Author(s):  
Yue Liu ◽  
Junming Wu ◽  
Zhichao Wang ◽  
Xiao-Gang Lu ◽  
Maxim Avdeev ◽  
...  

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Jiujun Cheng ◽  
Yueqiao Cai ◽  
Qingyang Zhang ◽  
Junlu Cheng ◽  
Chendan Yan

Research on two-dimensional indoor positioning based on wireless LAN and location fingerprint methods has become mature, but in actual indoor positioning, users are also concerned about the height at which they stand. Because three-dimensional indoor positioning expands the range to be covered, more features are needed to describe the location fingerprint, and directly applying a machine learning algorithm results in reduced classification ability. To solve this problem, this paper adopts a “divide and conquer” strategy: first, the k-medoids algorithm clusters the three-dimensional location space into a number of service areas, and then a multicategory SVM with fewer features is created for each service area for further positioning. Our experiment shows that the error distance resolution of the approach with the k-medoids algorithm and multicategory SVM is higher than that of the approach with SVM only, and that the former can effectively decrease “crazy predictions.”
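The two-stage strategy in this abstract (cluster first, then classify within each cluster) might be sketched roughly as below. This is not the authors' implementation, and several things are assumptions: the sketch clusters the fingerprint vectors themselves rather than physical locations, it uses a deterministic farthest-first initialisation for k-medoids, it assumes all training fingerprints are distinct, and a 1-nearest-neighbour rule stands in for the per-area multicategory SVM to keep the example dependency-free. All function names, signal values and labels are made up.

```python
import math
from collections import defaultdict


def nearest(p, centres):
    """Index of the centre closest to p (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))


def k_medoids(points, k, iters=20):
    """Alternating k-medoids with farthest-first initialisation (distinct points assumed)."""
    medoids = [points[0]]
    while len(medoids) < k:
        # Next medoid: the point farthest from every medoid chosen so far.
        medoids.append(max(points, key=lambda p: min(math.dist(p, m) for m in medoids)))
    for _ in range(iters):
        clusters = defaultdict(list)
        for p in points:
            clusters[nearest(p, medoids)].append(p)
        updated = []
        for i in range(k):
            members = clusters[i] or [medoids[i]]  # guard against an empty cluster
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda c: sum(math.dist(c, q) for q in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids


def train(samples, k):
    """Divide: cluster fingerprints into k areas. Conquer: keep one local model per area."""
    medoids = k_medoids([x for x, _ in samples], k)
    local = defaultdict(list)
    for x, label in samples:
        local[nearest(x, medoids)].append((x, label))
    return medoids, local


def predict(model, x):
    """Route x to its area, then apply that area's 1-NN stand-in classifier."""
    medoids, local = model
    area = local[nearest(x, medoids)]
    return min(area, key=lambda s: math.dist(x, s[0]))[1]


# Made-up signal-strength fingerprints (three access points) with location labels.
samples = [
    ((-40.0, -70.0, -90.0), "floor1-room-a"),
    ((-42.0, -68.0, -88.0), "floor1-room-b"),
    ((-45.0, -72.0, -91.0), "floor1-room-a"),
    ((-85.0, -50.0, -45.0), "floor2-room-c"),
    ((-88.0, -52.0, -42.0), "floor2-room-d"),
    ((-86.0, -48.0, -47.0), "floor2-room-c"),
]
model = train(samples, k=2)
print(predict(model, (-40.5, -70.0, -89.5)))
```

Because each local model sees only the samples of its own area, a query cannot be assigned a label from a distant area once it has been routed. That is one plausible reading of how such a scheme could limit wildly wrong predictions.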


2020 ◽  
pp. 40-48
Author(s):  
Yas Alsultanny

We examined data mining as a technique for extracting knowledge from a database to predict PM10 concentration from meteorological parameters. The purpose of this paper is to compare two data mining decision tree algorithms, Reduced Error Pruning Tree (REPTree) and the divide and conquer M5P, for predicting Particulate Matter 10 (PM10) concentration from meteorological parameters. The results of the analysis showed that the M5P tree gave a higher correlation, lower errors, and a higher number of rules than REPTree, while the elapsed processing time of REPTree was less than that of M5P. Both trees indicated that humidity absorbed PM10. The paper recommends REPTree and M5P for predicting PM10 and other pollutant gases.

