Different algorithms, different models

Quality & Quantity ◽ 10.1007/s11135-021-01193-9 ◽ 2021

Author(s): Martyna Daria Swiatczak

Keyword(s): Comparative Analysis ◽ Real World ◽ Qualitative Comparative Analysis ◽ Comparative Methods ◽ Data Sets ◽ Simulation Studies ◽ Threshold Values ◽ Real World Data ◽ Software Packages ◽ Methodological Approaches

Abstract: This study assesses the extent to which the two main Configurational Comparative Methods (CCMs), i.e. Qualitative Comparative Analysis (QCA) and Coincidence Analysis (CNA), produce different models. It further explains how this non-identity is due to the different algorithms on which the two methods are based, namely QCA’s Quine–McCluskey algorithm and the CNA algorithm. I offer an overview of the fundamental differences between QCA and CNA and demonstrate both underlying algorithms on three data sets of ascending proximity to real-world data. Subsequent simulation studies in scenarios of varying sample sizes and degrees of noise in the data show high overall ratios of non-identity between the QCA parsimonious solution and the CNA atomic solution for varying analytical choices, i.e. different consistency and coverage threshold values and ways to derive QCA’s parsimonious solution. Clarity on the contrasts between the two methods is intended to enable scholars to make more informed decisions on their methodological approaches, enhance their understanding of what is happening behind the results generated by the software packages, and better navigate the interpretation of results. Clarity on the non-identity between the underlying algorithms and its consequences for the results is intended to provide a basis for a methodological discussion about which method, and which variants thereof, are more successful in deriving which search target.
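The consistency and coverage thresholds mentioned in the abstract can be illustrated with a minimal sketch. It uses the set-theoretic definitions of sufficiency consistency and coverage that are standard in the QCA literature. The data, the function names, and the 0.75 threshold are made up for illustration and are not taken from the paper or from any QCA or CNA software package.

```python
def consistency(x, y):
    """Share of the condition's membership that also shows the outcome."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)


def coverage(x, y):
    """Share of the outcome's membership accounted for by the condition."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)


# Hypothetical crisp-set data: one entry per case (1 = present, 0 = absent).
condition = [1, 1, 1, 1, 0, 0]
outcome = [1, 1, 1, 0, 1, 1]

con = consistency(condition, outcome)
cov = coverage(condition, outcome)

# Hypothetical threshold; analysts choose this value themselves.
CONSISTENCY_THRESHOLD = 0.75
passes = con >= CONSISTENCY_THRESHOLD

print(f"consistency={con:.2f} coverage={cov:.2f} passes={passes}")
```

Raising or lowering the threshold changes which conditions are admitted as sufficient, which is one of the analytical choices the simulation studies vary.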

Hfinger: Malware HTTP Request Fingerprinting

Entropy ◽ 10.3390/e23050507 ◽ 2021 ◽ Vol 23 (5) ◽ pp. 507

Author(s): Piotr Białczak ◽ Wojciech Mazurczyk

Keyword(s): Real World ◽ Network Traffic ◽ Experimental Evaluation ◽ Data Sets ◽ Real World Data ◽ Malicious Software ◽ Default Mode ◽ World Data ◽ Effectiveness Analysis ◽ Http Protocol

Malicious software uses the HTTP protocol for communication, creating network traffic that is hard to identify because it blends into the traffic generated by benign applications. To this end, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, existing tools either do not analyze all the information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from parts of the request such as the URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. For the developed solution, we performed an extensive experimental evaluation using real-world data sets, and we also compared Hfinger with the most related and popular existing tools, such as FATT, Mercury, and p0f. The effectiveness analysis reveals that on average only 1.85% of requests fingerprinted by Hfinger collide between malware families, which is 8–34 times lower than for existing tools. Moreover, unlike these tools, in default mode Hfinger does not introduce collisions between malware and benign applications, and it achieves this by increasing the number of fingerprints by at most 3 times. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.

Improving recommender systems’ performance on cold-start users and controversial items by a new similarity model

International Journal of Web Information Systems ◽ 10.1108/ijwis-07-2015-0024 ◽ 2016 ◽ Vol 12 (2) ◽ pp. 126-149 ◽ Cited By ~ 4

Author(s): Masoud Mansoury ◽ Mehdi Shajari

Keyword(s): Real World ◽ Design Methodology ◽ Cold Start ◽ Selection Function ◽ Data Sets ◽ Real World Data ◽ Content Type ◽ User Similarity ◽ Active User ◽ Similarity Model

Purpose: This paper aims to improve recommendation performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users: it uses the opinions of similar users to generate the recommendation for an active user. Because the similarity model, or neighbor selection function, is the key element for the effectiveness of CF, many variations of CF have been proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users).

Design/methodology/approach: A new user similarity model is proposed that focuses on improving recommendation performance for cold-start users and controversial items. To show the validity of their similarity model, the authors conducted experiments and showed the effectiveness of this model in calculating similarity values between users even when only a few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results.

Findings: Experiments on two real-world data sets are conducted, and the results are compared with those of other CF techniques. The results show that the authors’ approach outperforms previous CF techniques on the coverage metric while preserving accuracy for cold-start users and controversial items.

Originality/value: The proposed approach addresses the conditions in which CF is unable to generate accurate recommendations. These conditions affect CF performance adversely, especially in the cold-start users’ condition. The authors show that their similarity model overcomes CF weaknesses effectively and improves its performance even in the cold-start users’ condition.
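The user-based CF baseline described in the abstract can be sketched as follows. This is a minimal illustration that assumes cosine similarity over co-rated items and a similarity-weighted average for prediction. It is not the authors’ new similarity model, and the ratings and function names are invented for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
    "dave": {},  # cold-start user with no ratings yet
}


def cosine_similarity(u, v):
    """Cosine similarity restricted to the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(active, item, all_ratings):
    """Similarity-weighted average of the neighbors' ratings for an item."""
    numerator = 0.0
    denominator = 0.0
    for user, user_ratings in all_ratings.items():
        if user == active or item not in user_ratings:
            continue
        sim = cosine_similarity(all_ratings[active], user_ratings)
        numerator += sim * user_ratings[item]
        denominator += abs(sim)
    # With no usable neighbors there is nothing to base a prediction on.
    return numerator / denominator if denominator else None


print(predict("alice", "d", ratings))
print(predict("dave", "d", ratings))  # None: no overlap with any neighbor
```

The second call shows the cold-start weakness in its simplest form: a user with no co-rated items has zero similarity to everyone, so this baseline cannot produce a prediction at all.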

RW1 A Comparative Analysis of Recommendations for the Post-Reimbursement Collection of Real-World Data (RWD) in Oncology Appraisals Issued by Six HTA Agencies

Value in Health ◽ 10.1016/j.jval.2021.04.1200 ◽ 2021 ◽ Vol 24 ◽ pp. S239

Author(s): K. Gurjar ◽ S. Harricharan ◽ K. Nguyen ◽ A. Forsythe

Keyword(s): Comparative Analysis ◽ Real World ◽ Real World Data ◽ World Data

Boosting Instance Segmentation with Synthetic Data: A study to overcome the limits of real world data sets

10.1109/iccvw54120.2021.00110 ◽ 2021

Author(s): Florentin Poucin ◽ Andrea Kraus ◽ Martin Simon

Keyword(s): Real World ◽ Synthetic Data ◽ Data Sets ◽ Real World Data ◽ World Data ◽ Instance Segmentation

Comparison of M2M Traffic Models Against Real World Data Sets

2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD) ◽ 10.1109/camad.2018.8515000 ◽ 2018 ◽ Cited By ~ 1

Author(s): Marco Sansoni ◽ Giuseppe Ravagnani ◽ Daniel Zucchetto ◽ Chiara Pielli ◽ Andrea Zanella ◽ ...

Keyword(s): Real World ◽ Data Sets ◽ Real World Data ◽ Traffic Models ◽ World Data

Optimising Daily Fantasy Sports Teams with Artificial Intelligence

International Journal of Computer Science in Sport ◽ 10.2478/ijcss-2020-0008 ◽ 2020 ◽ Vol 19 (2) ◽ pp. 21-35

Author(s): Ryan Beal ◽ Timothy J. Norman ◽ Sarvapali D. Ramchurn

Keyword(s): Real World ◽ National Football League ◽ Mixed Integer ◽ Data Sets ◽ Fantasy Sports ◽ Real World Data ◽ Sports Teams ◽ Novel Approach ◽ Four Seasons ◽ Daily Fantasy Sports

Abstract: This paper outlines a novel approach to optimising teams for Daily Fantasy Sports (DFS) contests. To this end, we propose a number of new models and algorithms to solve the team formation problems posed by DFS. Specifically, we focus on the National Football League (NFL) and predict the performance of real-world players to form the optimal fantasy team using mixed-integer programming. We test our solutions using real-world data sets from across four seasons (2014-2017). We highlight the advantage that can be gained from using our machine-based methods and show that our solutions outperform existing benchmarks, turning a profit in up to 81.3% of DFS game-weeks over a season.
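Team formation under a salary cap can be viewed as a 0/1 selection problem: pick a fixed number of players that maximises total projected points without exceeding the budget. The sketch below solves a tiny instance by exhaustive enumeration as a stand-in for a mixed-integer programming solver. The players, salaries, projections, and cap are invented, and real DFS contests add position constraints that are omitted here.

```python
from itertools import combinations

# Hypothetical players: (name, salary, projected points).
players = [
    ("A", 9, 21),
    ("B", 7, 15),
    ("C", 5, 12),
    ("D", 4, 8),
    ("E", 3, 7),
]


def best_team(pool, team_size, salary_cap):
    """Return the feasible team with the highest total projected points."""
    best = None
    best_points = float("-inf")
    for team in combinations(pool, team_size):
        salary = sum(player[1] for player in team)
        if salary > salary_cap:
            continue  # violates the budget constraint
        points = sum(player[2] for player in team)
        if points > best_points:
            best, best_points = team, points
    return best, best_points


team, points = best_team(players, team_size=3, salary_cap=16)
print([name for name, _, _ in team], points)  # ['A', 'D', 'E'] 36
```

Enumeration only works for toy instances because the number of candidate teams grows combinatorially with the player pool, which is why a dedicated integer programming solver is the practical choice at realistic sizes.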

The failure of certain fractional calculus operators in two physical models

Fractional Calculus and Applied Analysis ◽ 10.1515/fca-2019-0017 ◽ 2019 ◽ Vol 22 (2) ◽ pp. 255-270 ◽ Cited By ~ 12

Author(s): Manuel D. Ortigueira ◽ Valeriy Martynyuk ◽ Mykola Fedula ◽ J. Tenreiro Machado

Keyword(s): Fractional Calculus ◽ Human Body ◽ Real World ◽ Fractional Derivatives ◽ Electrical Impedance ◽ Real Data ◽ Physical Models ◽ Data Sets ◽ Real World Data ◽ Fractional Calculus Operators

Abstract: The ability of the so-called Caputo-Fabrizio (CF) and Atangana-Baleanu (AB) operators to create suitable models for real data is tested with real-world data. Two alternative models based on the CF and AB operators are assessed and compared with known models for data sets obtained from electrochemical capacitors and the human body electrical impedance. The results show that the CF and AB descriptions perform poorly when compared with the classical fractional derivatives.

Metric-Based Semi-Supervised Fuzzy C-Means Clustering

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.268-270.166 ◽ 2011 ◽ Vol 268-270 ◽ pp. 166-171

Author(s): Xue Song Yin ◽ Qi Huang ◽ Liang Ming Li

Keyword(s): Real World ◽ Side Information ◽ Data Sets ◽ Membership Degree ◽ Real World Data ◽ Fuzzy C Means ◽ Fuzzy C Means Clustering ◽ Clustering And Classification ◽ Classification Tasks ◽ Fuzzy C Means Algorithm

This paper presents a metric-based semi-supervised fuzzy c-means algorithm called MSFCM. By using side information and unlabeled data together, MSFCM can be applied to both clustering and classification tasks. The resulting algorithm has the following advantages compared with semi-supervised clustering: first, membership degree is used as side information to guide the clustering of the data; second, the learned metric can greatly improve clustering accuracy. Experimental results on a collection of real-world data sets demonstrate the effectiveness of the proposed algorithm.
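For readers unfamiliar with membership degrees, the sketch below shows one iteration of the standard, unsupervised fuzzy c-means updates on one-dimensional data. It is not MSFCM: it uses plain absolute distance rather than a learned metric and no side information, and the data and function names are made up for illustration.

```python
def memberships(points, centers, m=2.0):
    """Standard FCM membership update: each row sums to 1 across clusters."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        zeros = [d == 0 for d in dists]
        if any(zeros):
            # Point coincides with one or more centers: split membership among them.
            share = 1.0 / sum(zeros)
            rows.append([share if z else 0.0 for z in zeros])
        else:
            rows.append(
                [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
            )
    return rows


def update_centers(points, u, m=2.0):
    """Each center is the mean of the points weighted by membership ** m."""
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        weighted_sum = sum(w * x for w, x in zip(weights, points))
        centers.append(weighted_sum / sum(weights))
    return centers


points = [1.0, 1.5, 3.5, 4.0]
centers = [0.0, 4.0]

u = memberships(points, centers)
new_centers = update_centers(points, u)
print(u[0])  # memberships of the first point: about 0.9 and 0.1
print(new_centers)
```

Repeating the two updates until the centers stop moving gives the usual FCM loop. A semi-supervised variant would additionally constrain or bias the memberships using the available side information.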

Structure Identification-Based Clustering According to Density Consistency

Mathematical Problems in Engineering ◽ 10.1155/2011/890901 ◽ 2011 ◽ Vol 2011 ◽ pp. 1-14 ◽ Cited By ~ 1

Author(s): Chunzhong Li ◽ Zongben Xu

Keyword(s): High Dimension ◽ Real World ◽ Clustering Algorithm ◽ Density Difference ◽ Structure Identification ◽ Data Sets ◽ Critical Importance ◽ Real World Data ◽ Data Set ◽ High Dimension Data

The structure of a data set is of critical importance in identifying clusters, especially its density-difference feature. In this paper, we present a clustering algorithm based on density consistency, a filtering process that identifies data with the same structural feature and assigns them to the same cluster. The method is not restricted by cluster shape or by high-dimensional data, and it is robust to noise and outliers. Extensive experiments on synthetic and real-world data sets validate the proposed clustering algorithm.

Activities for Students: As the Ball Rolls: A Quadratic Investigation Using Multiple Representations

Mathematics Teacher ◽ 10.5951/mt.103.1.0062 ◽ 2009 ◽ Vol 103 (1) ◽ pp. 62-68

Author(s): Kathleen Cage Mittag ◽ Sharon Taylor

Keyword(s): Data Collection ◽ Real Time ◽ Physical Model ◽ Real World ◽ Graphing Calculator ◽ Data Sets ◽ Real World Data ◽ World Data ◽ Hands On ◽ Modeling Data

Using activities to create and collect data is not a new idea. Teachers have been incorporating real-world data into their classes since at least the advent of the graphing calculator. Plenty of data collection activities and data sets exist, and the graphing calculator has made modeling data much easier. However, the authors were in search of a better physical model for a quadratic. We wanted students to see an actual parabola take shape in real time and then explore its characteristics, but we could not find such a hands-on model.
