Static detection of silent misconfigurations with deep interaction analysis

The behavior of large systems is guided by their configurations: users set parameters in the configuration file to dictate which part of the system code is executed. However, it is often the case that parameters set in the configuration file do not influence the system’s runtime behavior, thus failing to meet the user’s intent. Moreover, such misconfigurations rarely produce an error message or raise an exception. We introduce the notion of silent misconfigurations, which are prohibitively hard to identify due to (1) lack of feedback and (2) complex interactions between configurations and code. This paper presents ConfigX, the first tool for the detection of silent misconfigurations. The main challenge is to understand the complex interactions between configurations and the code that they affect. Our goal is to derive a specification describing non-trivial interactions between the configuration parameters that lead to silent misconfigurations. To this end, ConfigX uses static analysis to determine which parts of the system code are associated with configuration parameters. ConfigX then infers the connections between configuration parameters by analyzing their associated code blocks. We design customized control- and data-flow analysis to derive a specification of configurations. Additionally, we conduct reachability analysis to eliminate spurious rules and thereby reduce false positives. In an evaluation on five real-world datasets across three widely used systems (Apache, vsftpd, and PostgreSQL), ConfigX detected more than 2200 silent misconfigurations. We additionally conducted a user study where we ran ConfigX on misconfigurations reported on user forums by real-world users. ConfigX easily detected issues and suggested repairs for those misconfigurations. Our solutions were accepted and confirmed in interactions with the users who originally posted the problems.
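To make the idea of a silent misconfiguration concrete, the following minimal Python sketch checks a configuration against a hand-written dependency rule. It is not ConfigX's analysis, which derives such rules statically from the system code. The `RULES` table, the function name, and the treatment of missing keys are assumptions made for illustration; the rule is loosely modeled on vsftpd options.

```python
# Hypothetical dependency rules: a parameter only takes effect when every
# listed prerequisite has the required value. Written by hand for this
# illustration; a tool like the one described would infer them from code.
RULES = {
    "anon_upload_enable": {"anonymous_enable": "YES", "write_enable": "YES"},
}


def find_silent_misconfigurations(config, rules=RULES):
    """Return (parameter, prerequisite, required_value) for each unmet rule."""
    findings = []
    for param, prerequisites in rules.items():
        # Only parameters the user actually switched on can be silently ignored.
        if config.get(param) != "YES":
            continue
        for dependency, required in prerequisites.items():
            # Assumption: a missing key counts as unmet. A real tool would
            # consult the system's built-in defaults instead.
            if config.get(dependency) != required:
                findings.append((param, dependency, required))
    return findings


config = {
    "anonymous_enable": "NO",
    "write_enable": "YES",
    "anon_upload_enable": "YES",
}
for param, dependency, required in find_silent_misconfigurations(config):
    print(f"{param} has no effect unless {dependency}={required}")
```

Here the user enabled anonymous uploads, but the setting is ignored without any error because anonymous logins are disabled.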


Improving the Accuracy of Integer Signedness Error Detection Using Data Flow Analysis

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015400331 ◽

2015 ◽

Vol 25 (09n10) ◽

pp. 1573-1593

Author(s):

Hao Sun ◽

Chao Su ◽

Yue Wang ◽

Qingkai Zeng

Keyword(s):

Real World ◽

Error Detection ◽

Data Flow ◽

Flow Analysis ◽

Data Flow Analysis ◽

Static Data ◽

Error Detector ◽

Important Challenge ◽

Using Data ◽

Security Checks

Integer signedness errors can be exploited by adversaries to cause severe damage to computer systems. Despite significant advances in automating the detection of integer signedness errors, accurately differentiating exploitable and harmful signedness errors from harmless ones remains an important challenge. In this paper, we present the design and implementation of SignFlow, an instrumentation-based integer signedness error detector that reduces the number of reports for harmless signedness errors. SignFlow first uses static data flow analysis to identify harmless integer sign conversions, based on where the source operands originate and whether the conversion results can propagate to security-related program points, and then inserts security checks for the remaining conversions to provide runtime protection. We evaluated SignFlow on 8 real-world harmful integer signedness bugs, the SPECint 2006 benchmarks, and 5 real-world applications. The experimental results show that SignFlow correctly detected all harmful integer signedness bugs (i.e., no false negatives) and achieved a reduction of 41% in false positives over IntFlow, the state of the art.
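As a rough illustration of the kind of error involved, the Python sketch below mimics what happens when a signed 32-bit value is reinterpreted as unsigned, and shows a runtime check of the general sort that could guard a security-sensitive use. The function names are hypothetical, and the check is only an assumption about what such instrumentation might look like, not SignFlow's actual output.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned, as a C cast would."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError("value does not fit in a signed 32-bit integer")
    # Masking keeps the low 32 bits of the two's-complement representation.
    return value & 0xFFFFFFFF


def checked_length(length):
    """Hypothetical guard placed before a security-sensitive use of `length`."""
    if length < 0:
        raise ValueError(
            f"negative length {length} would be read as {as_unsigned32(length)}"
        )
    return length


print(as_unsigned32(-1))   # a small negative number becomes a huge size
print(checked_length(16))  # non-negative values pass through unchanged
```

A negative length that reaches an allocation or copy routine as an unsigned size is the typical harmful case; a conversion whose result never reaches such a point is the kind that static analysis can dismiss.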


End-to-End Multi-Perspective Matching for Entity Resolution

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/689 ◽

2019 ◽

Cited By ~ 2

Author(s):

Cheng Fu ◽

Xianpei Han ◽

Le Sun ◽

Bo Chen ◽

Wei Zhang ◽

...

Keyword(s):

Real World ◽

Error Propagation ◽

Similarity Measures ◽

Entity Resolution ◽

Similarity Learning ◽

Entity Matching ◽

Main Challenge ◽

End To End ◽

Real World Datasets ◽

Selection Algorithms

Entity resolution (ER) aims to identify data records referring to the same real-world entity. Due to the heterogeneity of entity attributes and the diversity of similarity measures, one main challenge of ER is how to select appropriate similarity measures for different attributes. Previous ER methods usually employ heuristic similarity selection algorithms, which are highly specialized to specific ER problems and are hard to generalize to other situations. Furthermore, previous studies usually perform similarity learning and similarity selection independently, which often results in error propagation and is hard to optimize globally. To resolve the above problems, this paper proposes an end-to-end multi-perspective entity matching model, which can adaptively select optimal similarity measures for heterogeneous attributes by jointly learning and selecting similarity measures in an end-to-end way. Experiments on two real-world datasets show that our method significantly outperforms previous ER methods.


Soil/Tire Interaction Analysis Using FEM and FVM

Tire Science and Technology ◽

10.2346/1.2186786 ◽

2005 ◽

Vol 33 (1) ◽

pp. 38-62 ◽

Cited By ~ 17

Author(s):

S. Oida ◽

E. Seta ◽

H. Heguri ◽

K. Kato

Keyword(s):

Large Scale ◽

Interaction Analysis ◽

Yield Criterion ◽

Surrounding Medium ◽

Flow Analysis ◽

Measurement Data ◽

Traction Force ◽

Elastoplastic Material ◽

The Road ◽

Soil Flow

Vehicles such as agricultural tractors, construction vehicles, mobile machinery, and 4-wheel drive vehicles are often operated on unpaved ground. In many cases, the ground is deformable; therefore, the deformation should be taken into consideration in order to assess the off-the-road performance of a tire. Recent progress in computational mechanics has enabled us to simulate the large-scale coupling problem, in which the deformation of the tire structure and of the surrounding medium can be interactively considered. Using this technology, hydroplaning phenomena and tire traction on snow have been predicted. In this paper, the simulation methodology of tire/soil coupling problems is developed for pneumatic tires of arbitrary tread patterns. The Finite Element Method (FEM) and the Finite Volume Method (FVM) are used for structural and for soil-flow analysis, respectively. The soil is modeled as an elastoplastic material with a specified yield criterion and a nonlinear elasticity. The material constants are determined with reference to measurement data, so that the cone penetration resistance and the shear resistance are represented. Finally, the traction force of the tire in a cultivated field is predicted, and a good correlation with experiments is obtained.


Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology ◽

10.1145/3409265 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1-17

Author(s):

Wu Chen ◽

Yong Yu ◽

Keke Gai ◽

Jiamou Liu ◽

Kim-Kwang Raymond Choo

Keyword(s):

Ensemble Learning ◽

Real World ◽

Interaction Mechanism ◽

Training Model ◽

Edge Computing ◽

Learning Techniques ◽

Multi Agent ◽

Real World Datasets ◽

Entire Dataset ◽

Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).


A novel digital approach to describe real world outcomes among patients with constipation

npj Digital Medicine ◽

10.1038/s41746-021-00391-x ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Allison Shapiro ◽

Benjamin Bradshaw ◽

Sabine Landes ◽

Petra Kammann ◽

Beatrice Bois De Fer ◽

...

Keyword(s):

Heart Rate ◽

Prospective Study ◽

Real World ◽

Symptom Severity ◽

Interaction Analysis ◽

Behavioral Activity ◽

Patient Centered ◽

Small Magnitude ◽

Digital Devices ◽

Bowel Movements

Understanding day-to-day variations in symptoms and medication management can be important in describing patient centered outcomes for people with constipation. Patient Generated Health Data (PGHD) from digital devices is a potential solution, but its utility as a tool for describing experiences of people with frequent constipation is unknown. We conducted a virtual, 16-week prospective study of individuals with frequent constipation from an online wellness platform that connects mobile consumer digital devices including wearable monitors capable of passively collecting steps, sleep, and heart rate data. Participants wore a Fitbit monitoring device for the study duration and were administered daily and monthly surveys assessing constipation symptom severity and medication usage. A set of 38 predetermined day-level behavioral activity metrics were computed from minute-level data streams for steps, sleep and heart rate. Mixed effects regression models were used to compare activity metrics between constipation status (irregular or constipated vs. regular day), medication use (medication day vs. non-medication day) and the interaction of medication day with irregular or constipation days, as well as to model likelihood to treat with constipation medications based on daily self-reported symptom severity. Correction for multiple comparisons was performed with the Benjamini–Hochberg procedure for false discovery rate. This study analyzed 1540 enrolled participants with completed daily surveys (mean age 36.6, SD 10.0; 72.8% female; 88.8% Caucasian). Of those, 1293 completed all monthly surveys and 756 had sufficient Fitbit data density for analysis of activity metrics. At a daily level, 22 of the 38 activity metrics were significantly associated with bowel movement or medication treatment patterns for constipation. Participants were measured to have fewer steps on irregular days compared to regular days (−200 steps, 95% CI [−280, −120]), longer periods of inactivity on constipated days (9.1 min, 95% CI [5.2, 12.9]), and reduced total sleep time on irregular and constipated days (−2.4 min, 95% CI [−4.3, −0.4] and −4.0 min, 95% CI [−6.5, −1.4], respectively). Participants reported greater severity of symptoms for bloating, hard stool, difficulty passing, and painful bowel movements on irregular, constipation and medication days compared to regular days with no medication. Interaction analysis of medication days with irregular or constipation days observed small increases in severity compared to non-medication days. Participants were 4.3% (95% CI [3.2, 5.3]) more likely to treat with medication on constipated days versus regular days. No significant increase in likelihood was observed for irregular days. Daily likelihood to treat increased for each 1-point change in symptom severity of bloating (2.4%, 95% CI [2.0, 2.7]), inability to pass (2.2%, 95% CI [1.4, 3.0]) and incomplete bowel movements (1.3%, 95% CI [0.9, 1.7]). This is the first large scale virtual prospective study describing the association between passively collected PGHD and constipation symptoms and severity at a day-to-day granularity level. Constipation status, irregular or constipated, was associated with a number of activity metrics in steps and sleep, and likelihood to treat with medication increased with increasing severity for a number of constipation symptoms. Given the small magnitude of effect, further research is needed to understand the clinical relevance of these results. PGHD may be useful as a tool for describing real world patient centered experiences for people with constipation.
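For readers unfamiliar with the Benjamini–Hochberg procedure mentioned in the abstract, here is a minimal pure-Python sketch of its standard step-up form. The function name, the default level of 0.05, and the p-values in the example are assumptions made for illustration; they are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values, one per tested metric.
p_values = [0.039, 0.001, 0.6, 0.008, 0.041]
print(benjamini_hochberg(p_values))
```

Because the rule takes the largest qualifying rank, a p-value that misses its own threshold is still rejected when a larger p-value further down the ordering meets its threshold.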


OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data analysis, image processing, fraud detection, intrusion detection, and so forth. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively and in a timely manner, on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records, and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.


Overlapping Community Detection Based on Attribute Augmented Graph

Entropy ◽

10.3390/e23060680 ◽

2021 ◽

Vol 23 (6) ◽

pp. 680

Author(s):

Hanyang Lin ◽

Yongzhao Zhan ◽

Zizheng Zhao ◽

Yuzhong Chen ◽

Chen Dong

Keyword(s):

Community Detection ◽

Real World ◽

Detection Algorithm ◽

Overlapping Community Detection ◽

Overlapping Communities ◽

Adjustment Strategy ◽

Topology Information ◽

Overlapping Community ◽

Real World Datasets ◽

Community Detection Algorithm

There is a wealth of information in real-world social networks. In addition to the topology information, the vertices or edges of a social network often have attributes, with many of the overlapping vertices belonging to several communities simultaneously. It is challenging to fully utilize the additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to automatically determine the number of communities by a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters compared to the baseline methods.


Review Summary Generation in Online Systems: Frameworks for Supervised and Unsupervised Scenarios

ACM Transactions on the Web ◽

10.1145/3448015 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-33

Author(s):

Wenjun Jiang ◽

Jing Chen ◽

Xiaofei Ding ◽

Jie Wu ◽

Jiawei He ◽

...

Keyword(s):

Decision Making ◽

Real World ◽

Text Summarization ◽

Experimental Results ◽

Product Review ◽

Comprehensive Review ◽

Online Systems ◽

Real World Datasets ◽

Different Characteristics

In online systems, including e-commerce platforms, many users rely on the reviews or comments written by previous consumers for decision making, but they have limited time to read many reviews. Therefore, a review summary that contains all important features of user-generated reviews is desirable. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which has two main types: extractive and abstractive approaches. Both of these approaches can deal with both supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually can only deal with short sequences. Moreover, both approaches may neglect the sentiment information. To address the above issues, we propose comprehensive Review Summary Generation frameworks to deal with the supervised and unsupervised scenarios. We design two different preprocessing models, re-ranking and selecting, to identify the important sentences while preserving users’ sentiment in the original reviews. These sentences can be further used to generate review summaries with text summarization methods. Experimental results on seven real-world datasets (Idebate, Rotten Tomatoes, Amazon, Yelp, and three unlabelled product review datasets in Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.
