TaintBench: Automatic real-world malware benchmarking of Android taint analyses

Abstract: Due to the lack of established real-world benchmark suites for static taint analyses of Android applications, evaluations of these analyses are often restricted and hard to compare. Even in evaluations that do use real-world apps, details about the ground truth in those apps are rarely documented, which makes it difficult to compare and reproduce the results. To push Android taint analysis research forward, this paper recommends criteria for constructing real-world benchmark suites for this specific domain, and presents TaintBench, the first real-world malware benchmark suite with documented taint flows. TaintBench benchmark apps include taint flows with complex structures, and address static challenges that are commonly agreed on by the community. Together with the TaintBench suite, we introduce the TaintBench framework, whose goal is to simplify real-world benchmarking of Android taint analyses. First, a usability test shows that the framework improves experts’ performance and perceived usability when documenting and inspecting taint flows. Second, experiments using TaintBench reveal new insights for the taint analysis tools Amandroid and FlowDroid: (i) They are less effective on real-world malware apps than on synthetic benchmark apps. (ii) Predefined lists of sources and sinks heavily impact the tools’ accuracy. (iii) Surprisingly, up-to-date versions of both tools are less accurate than their predecessors.
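The following is a minimal sketch of how a tool's reported flows could be scored against documented ground truth. It assumes, which the abstract does not state, that each taint flow is reduced to a (source, sink) pair and that the ground truth lists both flows that should be found and flows that should not be reported. The function name `score_reported_flows` and the example API names are hypothetical and are not part of the TaintBench framework.

```python
def score_reported_flows(reported, expected, unexpected):
    """Score reported (source, sink) pairs against documented ground truth.

    reported   -- set of flows a tool claims to have found
    expected   -- set of documented flows that should be found
    unexpected -- set of documented flows that should NOT be reported
    Returns (precision, recall); a metric is 0.0 when its denominator is zero.
    """
    true_positives = len(reported & expected)
    false_positives = len(reported & unexpected)
    false_negatives = len(expected - reported)

    found = true_positives + false_positives
    precision = true_positives / found if found else 0.0

    relevant = true_positives + false_negatives
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical ground truth for one app (illustrative names only).
expected = {
    ("getDeviceId", "sendTextMessage"),
    ("getSimSerialNumber", "openConnection"),
}
unexpected = {("getLastKnownLocation", "Log.i")}
reported = {
    ("getDeviceId", "sendTextMessage"),
    ("getLastKnownLocation", "Log.i"),
}

precision, recall = score_reported_flows(reported, expected, unexpected)
```

Flows reported by a tool but absent from both documented sets are left unscored here, since without documentation it is unknown whether they are real.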

CIRO: The Effects of Visually Diminished Real Objects on Human Perception in Handheld Augmented Reality

Electronics ◽

10.3390/electronics10080900 ◽

2021 ◽

Vol 10 (8) ◽

pp. 900

Author(s):

Hanseob Kim ◽

Taehyung Kim ◽

Myungho Lee ◽

Gerard Jounghyun Kim ◽

Jae-In Hwang

Keyword(s):

Augmented Reality ◽

Real World ◽

Human Perception ◽

Ground Truth ◽

Prior Work ◽

Comparative Experiment ◽

User Perception ◽

Depth Distortion ◽

Real Objects ◽

Visual Artifacts

Augmented reality (AR) scenes often inadvertently contain real-world objects that are not relevant to the main AR content, such as arbitrary passersby on the street. We refer to these real-world objects as content-irrelevant real objects (CIROs). CIROs may distract users from focusing on the AR content and bring about perceptual issues (e.g., depth distortion or physicality conflict). In prior work, we carried out a comparative experiment investigating how the degree of visual diminishment of such a CIRO affects user perception of the AR content. Our findings revealed that the diminished representation had positive impacts on human perception, such as reducing the distraction and increasing the presence of the AR objects in the real environment. However, in that work, the ground truth test was staged with perfect and artifact-free diminishment. In this work, we applied an actual real-time object diminishment algorithm on the handheld AR platform, which cannot be completely artifact-free in practice, and evaluated its performance both objectively and subjectively. We found that the imperfect diminishment and visual artifacts can negatively affect the subjective user experience.

Compositional Taint Analysis of Native Codes for Security Vetting of Android Applications

2020 10th International Conference on Computer and Knowledge Engineering (ICCKE) ◽

10.1109/iccke50421.2020.9303643 ◽

2020 ◽

Author(s):

Seyed Behnam Andarzian ◽

Behrouz Tork Ladani

Keyword(s):

Taint Analysis ◽

Android Applications

Classification of unlabeled online media

Scientific Reports ◽

10.1038/s41598-021-85608-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sakthi Kumar Arul Prakash ◽

Conrad Tucker

Keyword(s):

Social Media ◽

Real World ◽

Graphical Model ◽

Ground Truth ◽

Classification Problem ◽

Machine Learning Algorithms ◽

Social Media Networks ◽

Online Social Media ◽

Wide Range

Abstract: This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than approach the classification problem as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
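The abstract does not say how entropy is computed from the interactions. As a rough illustration only, the sketch below computes the standard Shannon entropy of an empirical distribution, here over which users liked a media item. The function name `shannon_entropy` and the example data are hypothetical, and this is not the authors' graphical model.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of `events`."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no interactions, so no uncertainty to measure
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical like logs: each entry is the user who liked the item.
likes_spread_evenly = ["u1", "u2", "u3", "u4"]  # four users, one like each
likes_concentrated = ["u1", "u1", "u1", "u1"]   # a single user repeatedly

even_entropy = shannon_entropy(likes_spread_evenly)
concentrated_entropy = shannon_entropy(likes_concentrated)
```

Likes spread evenly over many users give higher entropy than likes concentrated on one user, which is the kind of signal an entropy-based, label-free classifier could draw on.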

Motion estimation in vehicular environments based on Bayesian dynamic networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219255 ◽

2021 ◽

pp. 1-12

Author(s):

Lauro Reyes-Cocoletzi ◽

Ivan Olmos-Pineda ◽

J. Arturo Olvera-Lopez

Keyword(s):

Real World ◽

Dynamic Networks ◽

Ground Truth ◽

Change Of Direction ◽

Prediction Rate ◽

Different Types ◽

Novel Method ◽

Comparison Of The Results ◽

Multiple Obstacles ◽

Real Traffic

The cornerstone of autonomous ground driving with the lowest possible risk of collision in real traffic environments is estimating the movement of obstacles. Predicting trajectories of multiple obstacles in dynamic traffic scenarios is a major challenge, especially when different types of obstacles such as vehicles and pedestrians are involved. To address these issues, this work proposes a novel method based on Bayesian dynamic networks to infer the paths of objects of interest (IO). Environmental information is obtained through stereo video, the direction vectors of multiple obstacles are computed, and the trajectories with the highest probability of occurrence and the possibility of collision are highlighted. The proposed approach was evaluated using test environments considering different road layouts and multiple obstacles in real-world traffic scenarios. The results obtained are compared against the ground truth of the paths taken by each detected IO. According to experimental results, the proposed method obtains a prediction rate of 75% for the change of direction, taking into consideration the risk of collision. The importance of the proposal is that, in contrast with related work, it does not overlook the risk of collision.

Multiple Noisy Label Distribution Propagation for Crowdsourcing

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/204 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hao Zhang ◽

Liangxiao Jiang ◽

Wenqiang Xu

Keyword(s):

Supervised Learning ◽

Real World ◽

Effective Means ◽

Ground Truth ◽

Cost Effective ◽

Nearest Neighbors ◽

True Label ◽

Real World Datasets ◽

The Individual ◽

Label Distribution

Crowdsourcing services provide a fast, efficient, and cost-effective means of obtaining large labeled data for supervised learning. Ground truth inference, also called label integration, designs proper aggregation strategies to infer the unknown true label of each instance from the multiple noisy label set provided by ordinary crowd workers. However, to the best of our knowledge, nearly all existing label integration methods focus solely on the multiple noisy label set itself of the individual instance while totally ignoring the intercorrelation among multiple noisy label sets of different instances. To solve this problem, a multiple noisy label distribution propagation (MNLDP) method is proposed in this study. MNLDP first transforms the multiple noisy label set of each instance into its multiple noisy label distribution and then propagates its multiple noisy label distribution to its nearest neighbors. Consequently, each instance absorbs a fraction of the multiple noisy label distributions from its nearest neighbors and yet simultaneously maintains a fraction of its own original multiple noisy label distribution. Promising experimental results on simulated and real-world datasets validate the effectiveness of our proposed method.
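The two steps described above (turn each noisy label set into a distribution, then let each instance absorb part of its nearest neighbors' distributions while keeping part of its own) can be sketched as follows. This is an illustration under stated assumptions and not the authors' MNLDP implementation: Euclidean distance, the neighbor count `k`, the retention weight `alpha`, and all function names are hypothetical choices made for the example.

```python
import math
from collections import Counter


def label_distribution(noisy_labels, classes):
    """Turn one instance's noisy label set into a distribution over classes."""
    counts = Counter(noisy_labels)
    return [counts[c] / len(noisy_labels) for c in classes]


def nearest_neighbors(features, i, k):
    """Indices of the k instances closest to instance i (Euclidean, assumed)."""
    ranked = sorted(
        (math.dist(features[i], features[j]), j)
        for j in range(len(features))
        if j != i
    )
    return [j for _, j in ranked[:k]]


def propagate(features, noisy_label_sets, classes, k=2, alpha=0.5):
    """One propagation round: keep `alpha` of the own distribution and absorb
    (1 - alpha) of the mean distribution of the k nearest neighbors."""
    dists = [label_distribution(s, classes) for s in noisy_label_sets]
    updated = []
    for i, own in enumerate(dists):
        nbrs = nearest_neighbors(features, i, k)
        mean_nbr = [
            sum(dists[j][c] for j in nbrs) / len(nbrs)
            for c in range(len(classes))
        ]
        updated.append(
            [alpha * o + (1 - alpha) * m for o, m in zip(own, mean_nbr)]
        )
    # Integrated label: the class with the largest propagated probability.
    labels = [classes[max(range(len(classes)), key=d.__getitem__)] for d in updated]
    return labels, updated


# Hypothetical data: two well-separated clusters of three instances each.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
noisy_label_sets = [
    ["a", "a", "a"], ["a", "a", "b"], ["a", "b", "b"],  # third is mostly wrong
    ["b", "b", "b"], ["b", "b", "a"], ["b", "b", "b"],
]
labels, distributions = propagate(features, noisy_label_sets, ["a", "b"])
```

In this toy data the third instance would be labeled "b" by majority vote on its own label set, but after absorbing its neighbors' distributions it is labeled "a", which is the intercorrelation effect the abstract describes.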

Glean

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447703 ◽

2021 ◽

Vol 14 (6) ◽

pp. 997-1005

Author(s):

Sandeep Tata ◽

Navneet Potti ◽

James B. Wendt ◽

Lauro Beltrão Costa ◽

Marc Najork ◽

...

Keyword(s):

Machine Learning ◽

Data Management ◽

Real World ◽

Empirical Studies ◽

Ground Truth ◽

Training Data ◽

Ground Truth Data ◽

Document Type ◽

Machine Learning Model ◽

Structured Information

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in virtually infinitely different ways. A good solution to this problem is one that generalizes well not only to known templates such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean, and discuss three key data management challenges: 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without the techniques we describe. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.

Ego-zones: non-symmetric dependencies reveal network groups with large and dense overlaps

Applied Network Science ◽

10.1007/s41109-019-0192-6 ◽

2019 ◽

Vol 4 (1) ◽

Cited By ~ 1

Author(s):

Milos Kudelka ◽

Eliska Ochodkova ◽

Sarka Zehnalova ◽

Jakub Plesnik

Keyword(s):

Network Structure ◽

Real World ◽

Ground Truth ◽

Structural Similarity ◽

Global Network ◽

Overlapping Communities ◽

Detection Algorithms ◽

First Case ◽

New Perspective ◽

The Individual

Abstract: The existence of groups of nodes with common characteristics and the relationships between these groups are important factors influencing the structures of social, technological, biological, and other networks. Uncovering such groups and the relationships between them is, therefore, necessary for understanding these structures. Groups can either be found by detection algorithms based solely on structural analysis or identified on the basis of more in-depth knowledge of the processes taking place in networks. In the first case, these are mainly algorithms detecting non-overlapping communities or communities with small overlaps. The latter case is about identifying ground-truth communities, also on the basis of characteristics other than only network structure. Recent research into ground-truth communities shows that in real-world networks, there are nested communities or communities with large and dense overlaps which we are not yet able to detect satisfactorily only on the basis of structural network properties. In our approach, we present a new perspective on the problem of group detection using only the structural properties of networks. Its main contribution is pointing out the existence of large and dense overlaps of detected groups. We use the non-symmetric structural similarity between pairs of nodes, which we refer to as dependency, to detect groups that we call zones. Unlike other approaches, we are able, thanks to non-symmetry, to describe accurately the prominent nodes in the zones which are responsible for large zone overlaps and the reasons why overlaps occur. The individual zones that are detected provide new information associated in particular with the non-symmetric relationships within the group and the roles that individual nodes play in the zone. From the perspective of global network structure, because of the non-symmetric node-to-node relationships, we explore new properties of real-world networks that describe the differences between various types of networks.

VRMiner

Processing and Managing Complex Data for Decision Support ◽

10.4018/978-1-59140-655-6.ch011 ◽

2011 ◽

pp. 318-339 ◽

Cited By ~ 4

Author(s):

H. Azzag ◽

F. Picarougne ◽

C. Guinot ◽

G. Venturini

Keyword(s):

Virtual Reality ◽

Real World ◽

Web Sites ◽

Contextual Information ◽

3D Models ◽

Multimedia Data ◽

Stereoscopic Display ◽

Specific Domain ◽

Data Glove ◽

Virtual Camera

We present in this chapter VRMiner, a new 3D interactive method for visualizing multimedia data with virtual reality. We consider that an expert in a specific domain has collected a set of examples described with numeric and symbolic attributes but also with sounds, images, videos and Web sites or 3D models, and that this expert wishes to explore these data to understand their structure. We use a 3D stereoscopic display in order to let the expert easily visualize and observe the data. We add to this display contextual information such as texts and small images, voice synthesis and sound. Larger images, videos and Web sites are displayed on a second computer in order to ensure real-time display. Navigating through the data is done in a very intuitive and precise way with a 3D sensor that simulates a virtual camera. Interactive requests can be formulated by the expert with a data glove that recognizes hand gestures. We show how this tool has been successfully applied to several real-world applications.
