Switching Scheme: A Novel Approach for Handling Incremental Concept Drift in Real-World Data Sets

Author(s):  
Lucas Baier ◽  
Vincent Kellner ◽  
Niklas Kühl ◽  
Gerhard Satzger
Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 507
Author(s):  
Piotr Białczak ◽  
Wojciech Mazurczyk

Malicious software uses the HTTP protocol for communication, creating network traffic that is hard to identify because it blends into the traffic generated by benign applications. To help track and identify such traffic, fingerprinting tools have been developed that provide a short representation of malicious HTTP requests. However, existing tools either do not analyze all the information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from parts of the request such as the URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. We performed an extensive experimental evaluation of the developed solution using real-world data sets and compared Hfinger with the most closely related and popular existing tools: FATT, Mercury, and p0f. The effectiveness analysis reveals that, on average, only 1.85% of requests fingerprinted by Hfinger collide between malware families, which is 8–34 times lower than for existing tools. Moreover, unlike these tools, in default mode Hfinger does not introduce collisions between malware and benign applications, and it achieves this by increasing the number of fingerprints by at most 3 times. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.
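To make the idea of a request fingerprint concrete, the following is a minimal sketch only. It is not Hfinger's actual fingerprint format or feature set, which the abstract does not specify. The function name `toy_fingerprint`, the chosen features (method, protocol version, path depth, query parameter count, header-name order, payload length), and the output layout are all hypothetical.

```python
import hashlib


def toy_fingerprint(raw_request: str) -> str:
    """Build a short, readable summary of an HTTP request (illustrative only)."""
    # Split the request head from the payload at the first blank line.
    head, _, body = raw_request.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: method, URI, protocol version.
    method, uri, version = lines[0].split(" ", 2)

    # URI features: number of path segments and number of query parameters.
    path, _, query = uri.partition("?")
    depth = len([segment for segment in path.split("/") if segment])
    n_params = len([param for param in query.split("&") if param])

    # Header feature: the order of header names, hashed to keep the output short.
    header_names = [
        line.split(":", 1)[0].strip().lower() for line in lines[1:] if ":" in line
    ]
    order_hash = hashlib.sha256(",".join(header_names).encode()).hexdigest()[:8]

    # Payload feature: length in characters.
    return "|".join(
        [method, version, str(depth), str(n_params), order_hash, str(len(body))]
    )
```

Because this sketch uses only structural features, two requests that differ in header values but share the same structure map to the same fingerprint, while a change in header order produces a different one.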


Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 349-371
Author(s):  
Hassan Mehmood ◽  
Panos Kostakos ◽  
Marta Cortes ◽  
Theodoros Anagnostopoulos ◽  
Susanna Pirttikangas ◽  
...  

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide critical analysis of results retrieved. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
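The abstract does not name the drift detectors it evaluates. As an illustration of what an active drift detector does, here is a minimal single-machine sketch of the Page-Hinkley test, a widely used detector for upward shifts in a stream's mean. The class name, the default values of `delta` and `threshold`, and the helper `first_drift` are assumptions made for this example.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed default)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed default)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # minimum of the cumulative deviation so far

    def update(self, x):
        """Consume one observation and return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_drift(stream, **kwargs):
    """Return the index of the first drift alarm in the stream, or None."""
    detector = PageHinkley(**kwargs)
    for index, value in enumerate(stream):
        if detector.update(value):
            return index
    return None
```

On a stream that stays at 0.0 and then jumps to 1.0, the detector stays silent during the stationary part and raises an alarm a few observations after the jump. The delay depends on `threshold`.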


2020 ◽  
Vol 19 (2) ◽  
pp. 21-35
Author(s):  
Ryan Beal ◽  
Timothy J. Norman ◽  
Sarvapali D. Ramchurn

This paper outlines a novel approach to optimising teams for Daily Fantasy Sports (DFS) contests. To this end, we propose a number of new models and algorithms to solve the team formation problems posed by DFS. Specifically, we focus on the National Football League (NFL) and predict the performance of real-world players to form the optimal fantasy team using mixed-integer programming. We test our solutions using real-world data sets from across four seasons (2014-2017). We highlight the advantage that can be gained from using our machine-based methods and show that our solutions outperform existing benchmarks, turning a profit in up to 81.3% of DFS game-weeks over a season.
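The team formation problem can be illustrated at toy scale. The sketch below is not the paper's mixed-integer programming model: it replaces the solver with exhaustive search over all teams of a fixed size under a salary cap, which is only feasible for very small player pools. The function name `best_team`, the `(name, salary, predicted_points)` tuple layout, and the single salary-cap constraint are assumptions for this example.

```python
from itertools import combinations


def best_team(players, team_size, salary_cap):
    """Pick the team with the highest predicted points within the salary cap.

    players: list of (name, salary, predicted_points) tuples (hypothetical layout).
    Returns (team, total_points), or (None, -inf) if no team fits the cap.
    """
    best, best_points = None, float("-inf")
    for team in combinations(players, team_size):
        cost = sum(player[1] for player in team)
        points = sum(player[2] for player in team)
        # Keep the team only if it is affordable and beats the current best.
        if cost <= salary_cap and points > best_points:
            best, best_points = team, points
    return best, best_points
```

A real DFS line-up adds positional constraints and a far larger player pool, which is where an integer programming solver becomes necessary.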


2009 ◽  
Vol 103 (1) ◽  
pp. 62-68
Author(s):  
Kathleen Cage Mittag ◽  
Sharon Taylor

Using activities to create and collect data is not a new idea. Teachers have been incorporating real-world data into their classes since at least the advent of the graphing calculator. Plenty of data collection activities and data sets exist, and the graphing calculator has made modeling data much easier. However, the authors were in search of a better physical model for a quadratic. We wanted students to see an actual parabola take shape in real time and then explore its characteristics, but we could not find such a hands-on model.


2013 ◽  
Vol 34 (3) ◽  
pp. 133-148
Author(s):  
François Pomerleau ◽  
Francis Colas ◽  
Roland Siegwart ◽  
Stéphane Magnenat

Author(s):  
Nils Finke ◽  
Tanya Braun ◽  
Marcel Gehrke ◽  
Ralf Möller

Dynamic probabilistic relational models, which are factorized w.r.t. a full joint distribution, are used to cater for uncertainty and for relational and temporal aspects in real-world data. While these models assume the underlying temporal process to be stationary, real-world data often exhibits non-stationary behavior where the full joint distribution changes over time. We propose an approach to account for non-stationary processes w.r.t. changing probability distributions over time, an effect known as concept drift. We use factorization and compact encoding of relations to efficiently detect drifts towards new probability distributions based on evidence.


Author(s):  
Lutz Oettershagen ◽  
Petra Mutzel

The closeness centrality of a vertex in a classical static graph is the reciprocal of the sum of the distances to all other vertices. However, networks are often dynamic and change over time. Temporal distances take these dynamics into account. In this work, we consider the harmonic temporal closeness with respect to the shortest duration distance. We introduce an efficient algorithm for computing the exact top-k temporal closeness values and the corresponding vertices. The algorithm can be generalized to the task of computing all closeness values. Furthermore, we derive heuristic modifications that perform well on real-world data sets and drastically reduce the running times. For the case in which edge traversal takes an equal amount of time for all edges, we lift two approximation algorithms to the temporal domain. The algorithms approximate the transitive closure of a temporal graph (which is an essential ingredient for the top-k algorithm) and the temporal closeness for all vertices, respectively, with high probability. We experimentally evaluate all our new approaches on real-world data sets and show that they lead to drastically reduced running times while keeping high quality in many cases. Moreover, we demonstrate that the top-k temporal and static closeness vertex sets differ considerably in the considered temporal networks.
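The static definitions mentioned at the start of this abstract can be sketched for an unweighted graph as follows. This covers only the classical static case and its harmonic variant. It does not implement the temporal closeness or the top-k algorithm from the paper, which use shortest-duration temporal distances instead of hop counts. The adjacency-dictionary representation and the function names are assumptions for this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in adj.get(vertex, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[vertex] + 1
                queue.append(neighbour)
    return dist


def closeness(adj, vertex):
    """Reciprocal of the sum of distances to all other reachable vertices."""
    dist = bfs_distances(adj, vertex)
    total = sum(d for other, d in dist.items() if other != vertex)
    return 1.0 / total if total > 0 else 0.0


def harmonic_closeness(adj, vertex):
    """Sum of reciprocal distances; unreachable vertices contribute zero."""
    dist = bfs_distances(adj, vertex)
    return sum(1.0 / d for other, d in dist.items() if other != vertex)
```

The harmonic form is the natural choice when some vertices are unreachable, as is common in temporal graphs, because an infinite distance simply contributes nothing to the sum.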

