Comparing Platform Core Features with Third-Party Complements: Machine-Learning Evidence from Apple iOS

2022 ◽  
Author(s):  
André Halckenhaeusser ◽  
Felix Mann ◽  
Jens Foerderer ◽  
Philipp Hoffmann
Data ◽  
2021 ◽  
Vol 6 (8) ◽  
pp. 87
Author(s):  
Sara Ferreira ◽  
Mário Antunes ◽  
Manuel E. Correia

Deepfake and manipulated digital photos and videos are increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, with tampered multimedia content serving as the primary vehicle of dissemination. Digital forensic analysis tools are widely used in criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need for efficient digital forensics techniques grounded in state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, such methods have not yet been widely incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial for benchmarking ML models and evaluating their suitability for real-world digital forensics applications, for example in the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos that are part of existing state-of-the-art datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository and contains 40,588 photos and 12,400 video frames. The dataset was validated and benchmarked with deep learning Convolutional Neural Network (CNN) and Support Vector Machine (SVM) methods, although a plethora of other existing methods can be applied. Overall, the results show a better F1-score for CNN than for SVM, for both photo and video processing. CNN achieved an F1-score of 0.9968 for photos and 0.8415 for videos. For SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955 for photos and videos, respectively. A set of methods written in Python is available to researchers, namely to preprocess the original photo and video files, extract the features, and build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, giving ML researchers more flexibility to use the dataset with existing ML frameworks and tools.
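A minimal sketch of the kind of pipeline the abstract describes: a compact DFT-based feature vector is computed per photo and an SVM is benchmarked with 5-fold cross-validation. The folder layout, file-naming convention, and feature length are illustrative assumptions, not the authors' released code.

```python
import glob
import numpy as np
from PIL import Image
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def dft_features(path, n_features=50):
    """Radially averaged magnitude spectrum of the 2-D DFT as a fixed-length vector."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    y, x = np.indices(spectrum.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    counts = np.maximum(np.bincount(r.ravel()), 1)      # avoid division by zero
    radial = np.bincount(r.ravel(), spectrum.ravel()) / counts
    # Resample the radial profile onto a fixed number of bins.
    return np.interp(np.linspace(0, len(radial) - 1, n_features),
                     np.arange(len(radial)), radial)

# Hypothetical folder layout: file names containing "fake" are manipulated photos.
paths = sorted(glob.glob("photos/*.jpg"))
y = np.array([1 if "fake" in p else 0 for p in paths])
X = np.vstack([dft_features(p) for p in paths])

# 5-fold cross-validated F1 for an RBF-kernel SVM, mirroring the paper's SVM benchmark.
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5, scoring="f1").mean())
```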


Author(s):  
Mohsen Kamyab ◽  
Stephen Remias ◽  
Erfan Najmi ◽  
Kerrick Hood ◽  
Mustafa Al-Akshar ◽  
...  

According to the Federal Highway Administration (FHWA), US work zones on freeways account for nearly 24% of nonrecurring freeway delays and 10% of overall congestion. Historically, there have been limited scalable datasets with which to investigate the specific causes of work zone congestion or to improve work zone planning processes to characterize its impact. In recent years, third-party data vendors have provided scalable speed data from Global Positioning System (GPS) devices and cell phones, which can be used to characterize mobility on all roadways. Each work zone has unique characteristics and varying mobility impacts, which are predicted during the planning and design phases but can in reality differ considerably from what is ultimately experienced by the traveling public. This paper uses these datasets to introduce a scalable Work Zone Mobility Audit (WZMA) template. Additionally, the paper uses metrics developed for individual work zones to characterize the impact of more than 250 work zones of varying length and duration from Southeast Michigan. The authors make recommendations to work zone engineers on useful data to collect for improving the WZMA. As more systematic work zone data are collected, improved analytical assessment techniques, such as machine learning processes, can be used to identify the factors that will predict future work zone impacts. The paper concludes by demonstrating two machine learning algorithms, Random Forest and XGBoost, which show that historical speed variation is a critical component when predicting the mobility impact of work zones.
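A minimal sketch, under assumed feature names, of the kind of models the paper demonstrates: Random Forest and XGBoost predicting a work zone's mobility impact from planning attributes plus historical speed variation. The CSV name and columns are illustrative, not the authors' dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

df = pd.read_csv("work_zones.csv")                       # hypothetical file
features = ["length_miles", "duration_days", "lanes_closed",
            "historical_speed_std"]                      # assumed columns
target = "observed_speed_drop_mph"                       # assumed column

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.25, random_state=0)

for model in (RandomForestRegressor(n_estimators=300, random_state=0),
              XGBRegressor(n_estimators=300, learning_rate=0.05)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          "MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```

Feature-importance scores from either model would then indicate how much the historical speed variation contributes relative to the planning-phase attributes.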


2020 ◽  
Vol 142 (3) ◽  
Author(s):  
Bradley Camburn ◽  
Yuejun He ◽  
Sujithra Raviselvam ◽  
Jianxi Luo ◽  
Kristin Wood

Abstract In order to develop novel solutions for complex systems and in increasingly competitive markets, it may be advantageous to generate large numbers of design concepts and then to identify the most novel and valuable ideas. However, it can be difficult to process, review, and assess thousands of design concepts. Based on this need, we develop and demonstrate an automated method for design concept assessment. In the method, machine learning technologies are first applied to extract ontological data from design concepts. Then, a filtering strategy and quantitative metrics are introduced that enable creativity rating based on the ontological data. This method is tested empirically. Design concepts are crowd-generated for a variety of actual industry design problems/opportunities. Over 4000 design concepts were generated by humans for assessment. The empirical evaluation assesses: (1) correspondence of the automated ratings with human creativity ratings; (2) whether concepts selected using the method are highly scored by another set of crowd raters; and finally (3) whether high-scoring designs are positively correlated with industrial technology development. The method provides a possible avenue to rate design concepts deterministically. A highlight is that a subset of designs selected automatically out of a large set of candidates was scored higher than a subset selected by humans when evaluated by a set of third-party raters. The results hint at bias in human design concept selection and encourage further study of this topic.
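The paper's pipeline extracts ontological data from concept texts and then rates them with a filtering strategy and quantitative metrics. As an illustrative stand-in (not the authors' method), the sketch below scores each concept's novelty as its average cosine distance to all other concepts in TF-IDF space; the example concepts are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

concepts = [
    "a drone that delivers medical supplies to remote clinics",
    "a solar-powered water purifier for disaster relief",
    "a drone delivery service for small packages",
]  # hypothetical crowd-generated concepts

tfidf = TfidfVectorizer(stop_words="english").fit_transform(concepts)
similarity = cosine_similarity(tfidf)
# Average dissimilarity to every other concept (self-similarity of 1.0 removed).
novelty = 1.0 - (similarity.sum(axis=1) - 1.0) / (len(concepts) - 1)

for text, score in sorted(zip(concepts, novelty), key=lambda t: -t[1]):
    print(f"{score:.2f}  {text}")
```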


2020 ◽  
Vol 34 (01) ◽  
pp. 865-872
Author(s):  
Soham Pal ◽  
Yash Gupta ◽  
Aditya Shukla ◽  
Aditya Kanade ◽  
Shirish Shevade ◽  
...  

Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
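A simplified sketch of the core idea (not the ActiveThief implementation): pool-based active learning for model extraction. The attacker repeatedly queries the black-box victim model on the unlabeled public samples its own substitute model is least certain about, then retrains the substitute. The victim model, data pool, and budgets below are synthetic assumptions standing in for a real MLaaS API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pool = rng.normal(size=(5000, 20))                 # unannotated "public" data (assumed)
secret_w = rng.normal(size=20)
def victim_predict(x):                             # stand-in for the MLaaS API
    return (x @ secret_w > 0).astype(int)

# Seed round: spend a small query budget on random samples.
idx = list(rng.choice(len(pool), size=50, replace=False))
substitute = LogisticRegression(max_iter=1000).fit(pool[idx], victim_predict(pool[idx]))

for _ in range(5):                                 # active-learning rounds
    remaining = np.setdiff1d(np.arange(len(pool)), idx)
    proba = substitute.predict_proba(pool[remaining])[:, 1]
    # Query the victim on the samples the substitute is least confident about.
    uncertain = remaining[np.argsort(np.abs(proba - 0.5))[:50]]
    idx.extend(uncertain)
    substitute = LogisticRegression(max_iter=1000).fit(pool[idx], victim_predict(pool[idx]))

agreement = (substitute.predict(pool) == victim_predict(pool)).mean()
print(f"substitute agrees with victim on {agreement:.1%} of the pool")
```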


2020 ◽  
Vol 10 (2) ◽  
pp. 1-26
Author(s):  
Naghmeh Moradpoor Sheykhkanloo ◽  
Adam Hall

An insider threat can take on many forms and fall under different categories, including the malicious insider, the careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in the published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is heavily imbalanced. Therefore, this article addresses insider threat detection on an extremely imbalanced dataset, employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset with this technique did not improve the performance metrics, it did reduce the time taken to build the model and the time taken to test it. Additionally, the authors found that running the chosen classifiers with parameters other than the defaults has an impact in both the balanced and imbalanced scenarios, and that the impact is significantly stronger on the imbalanced dataset.
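Spread subsample (a Weka filter) caps the majority/minority class ratio by randomly undersampling the majority class. A minimal Python equivalent is sketched below; the column names and usage are assumptions, and the insider-threat data itself is not reproduced.

```python
import numpy as np
import pandas as pd

def spread_subsample(df, label_col, max_spread=1.0, seed=0):
    """Randomly drop rows so that no class exceeds max_spread times
    the size of the smallest class."""
    rng = np.random.default_rng(seed)
    counts = df[label_col].value_counts()
    cap = int(counts.min() * max_spread)
    parts = []
    for cls, n in counts.items():
        rows = df[df[label_col] == cls]
        if n > cap:
            rows = rows.iloc[rng.choice(n, size=cap, replace=False)]
        parts.append(rows)
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle

# Hypothetical usage on an imbalanced event frame with a binary "insider" label:
# balanced = spread_subsample(events_df, label_col="insider", max_spread=1.0)
```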


The practice of moving applications, the data they consume, and the data they generate to the cloud is increasing, and the increase is due to the advantages of cloud computing. These advantages benefit application owners and application consumers, and at the same time the cloud data-centre owners or cloud service providers. Since IT operations are vital for business continuity, a data centre generally includes redundant or backup components and infrastructure for power supply, data communication connections, environmental controls, and various security devices. A large data centre is an industrial-scale operation that uses as much electricity as a small town. The primary advantages of hosting applications in cloud data centres are low infrastructure maintenance with significant cost reduction for the application owners and high profitability for the data-centre owners or cloud service providers. During application migration to cloud data centres, the data and a few components of the application become exposed to certain users. Moreover, applications hosted in cloud data centres must comply with certain standards to be accepted by various application consumers. To achieve the standard certifications, the applications and the data must be audited by various auditing companies. In some cases the auditors are hired by the data-centre owners, and in others they are engaged by the application consumers. In both situations, however, the auditors are third parties, so the risk of exposing the business logic in the applications and the data always persists. Furthermore, in a data-centre environment it is highly difficult to ensure isolation of the data from auditors who may not have the right to audit it. A significant number of studies have attempted to provide a generic solution to this problem; however, the solutions are criticized by the research community for making generic assumptions during the permission verification process. Hence, this work proposes a novel machine-learning-based algorithm to grant audit access permissions to specific auditors in arbitrary situations, without further approvals, based on the characteristics of the virtual machine in which the application and the data are deployed and of the auditing user. The results of the proposed algorithm are highly satisfactory, demonstrating nearly 99% accuracy on data characteristics analysis, nearly 98% accuracy on user characteristics analysis, and 100% accuracy on the secure auditor selection process.
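The abstract does not detail the proposed algorithm, so the following is only an illustrative sketch of the general idea: train a classifier over virtual-machine characteristics and auditing-user characteristics to decide whether an audit permission should be granted. The file, feature names, and numeric encodings are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

history = pd.read_csv("audit_decisions.csv")        # hypothetical past decisions
# Assumed numerically encoded characteristics of the VM and of the auditor.
vm_features = ["vm_sensitivity_level", "data_classification", "tenant_isolation"]
user_features = ["auditor_accreditation", "engaged_by_consumer", "prior_violations"]
X = history[vm_features + user_features]
y = history["permission_granted"]                   # 1 = grant, 0 = deny

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# A new (VM, auditor) pair would then be scored with clf.predict_proba
# before any audit access is provisioned.
```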


2017 ◽  
Vol 2017 (3) ◽  
pp. 130-146 ◽  
Author(s):  
Muhammad Haris Mughees ◽  
Zhiyun Qian ◽  
Zubair Shafiq

Abstract The rise of ad-blockers is viewed as an economic threat by online publishers who primarily rely on online advertising to monetize their services. To address this threat, publishers have started to retaliate by employing anti ad-blockers, which scout for ad-block users and react to them by pushing users to whitelist the website or disable ad-blockers altogether. The clash between ad-blockers and anti ad-blockers has resulted in a new arms race on the Web. In this paper, we present an automated machine learning based approach to identify anti ad-blockers that detect and react to ad-block users. The approach is promising with precision of 94.8% and recall of 93.1%. Our automated approach allows us to conduct a large-scale measurement study of anti ad-blockers on Alexa top-100K websites. We identify 686 websites that make visible changes to their page content in response to ad-block detection. We characterize the spectrum of different strategies used by anti ad-blockers. We find that a majority of publishers use fairly simple first-party anti ad-block scripts. However, we also note the use of third-party anti ad-block services that use more sophisticated tactics to detect and respond to ad-blockers.
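A simplified sketch of the measurement idea (not the paper's full pipeline): load each site twice, with and without an ad-blocker, and describe how the visible page content changes; a trained classifier then replaces the simple keyword heuristic used here. The crawling step, keyword list, and example strings are assumptions.

```python
import re

ADBLOCK_HINTS = re.compile(
    r"(disable your ad ?blocker|whitelist (us|this site)|ad ?block(er)? detected)",
    re.IGNORECASE)

def page_diff_features(text_plain, text_with_adblocker):
    """Features describing how the page changed when an ad-blocker was active."""
    added = set(text_with_adblocker.split()) - set(text_plain.split())
    return {
        "n_added_words": len(added),
        "length_ratio": len(text_with_adblocker) / max(len(text_plain), 1),
        "has_antiadblock_phrase": bool(ADBLOCK_HINTS.search(text_with_adblocker)),
    }

# Hypothetical usage with text extracted from two crawls of the same URL:
features = page_diff_features(
    "Latest news and articles from our editors.",
    "Ad blocker detected. Please disable your ad blocker to continue reading.")
print(features)
```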


2021 ◽  
Vol 4 (2) ◽  
pp. 34-69
Author(s):  
Dávid Burka ◽  
László Kovács ◽  
László Szepesváry

Pricing an insurance product covering motor third-party liability is a major challenge for actuaries. Comprehensive statistical modelling and modern computational power are necessary to solve this problem. The generalised linear and additive modelling approaches have long been widely used by insurance companies. Modelling with modern machine learning methods has recently started, but applying them properly with relevant features is a great challenge for pricing experts. This study analyses the claim-causing probability by fitting generalised linear models, generalised additive models, random forests, and neural networks. Several evaluation measures are used to compare these techniques. The best model is a mixture of the base methods. The authors' hypothesis about the existence of significant interactions between feature variables is confirmed by the models. A simplified classification and visualisation are performed on the final model, which can support tariff applications later.
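A minimal sketch of the baseline technique the study starts from: a generalised linear model with a logit link for the probability that a motor third-party-liability policy causes a claim. The data file and rating factors (driver age, vehicle power, region, bonus-malus) are assumptions, not the study's portfolio.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

policies = pd.read_csv("mtpl_policies.csv")      # hypothetical policy-level data
# has_claim: 1 if the policy produced at least one claim during the exposure period.
glm = smf.glm("has_claim ~ driver_age + C(vehicle_power) + C(region) + bonus_malus",
              data=policies, family=sm.families.Binomial()).fit()
print(glm.summary())

# The study then compares such a baseline with GAM, random forest, and neural
# network fits, and blends the base methods into the final pricing model.
predicted_claim_prob = glm.predict(policies)
```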

