Towards plug-and-play visual graph query interfaces

Canned patterns ( i.e. , small subgraph patterns) in visual graph query interfaces (a.k.a GUI) facilitate efficient query formulation by enabling pattern-at-a-time construction mode. However, existing GUIS for querying large networks either do not expose any canned patterns or if they do then they are typically selected manually based on domain knowledge. Unfortunately, manual generation of canned patterns is not only labor intensive but may also lack diversity for supporting efficient visual formulation of a wide range of subgraph queries. In this paper, we present a novel, generic, and extensible framework called TATTOO that takes a data-driven approach to automatically select canned patterns for a GUI from large networks. Specifically, it first decomposes the underlying network into truss-infested and truss-oblivious regions. Then candidate canned patterns capturing different real-world query topologies are generated from these regions. Canned patterns based on a user-specified plug are then selected for the GUI from these candidates by maximizing coverage and diversity , and by minimizing the cognitive load of the pattern set. Experimental studies with real-world datasets demonstrate the benefits of TATTOO. Importantly, this work takes a concrete step towards realizing plug-and-play visual graph query interfaces for large networks.

Download Full-text

A Grey-Box Ensemble Model Exploiting Black-Box Accuracy and White-Box Intrinsic Interpretability

Algorithms ◽

10.3390/a13010017 ◽

2020 ◽

Vol 13 (1) ◽

pp. 17 ◽

Cited By ~ 4

Author(s):

Emmanuel Pintelas ◽

Ioannis E. Livieris ◽

Panagiotis Pintelas

Keyword(s):

Machine Learning ◽

Real World ◽

High Performance ◽

Black Box ◽

Box Model ◽

Proposed Model ◽

Wide Range ◽

Critical Issues ◽

Key Factor ◽

Real World Datasets

Machine learning has emerged as a key factor in many technological and scientific advances and applications. Much research has been devoted to developing high performance machine learning models, which are able to make very accurate predictions and decisions on a wide range of applications. Nevertheless, we still seek to understand and explain how these models work and make decisions. Explainability and interpretability in machine learning is a significant issue, since in most of real-world problems it is considered essential to understand and explain the model’s prediction mechanism in order to trust it and make decisions on critical issues. In this study, we developed a Grey-Box model based on semi-supervised methodology utilizing a self-training framework. The main objective of this work is the development of a both interpretable and accurate machine learning model, although this is a complex and challenging task. The proposed model was evaluated on a variety of real world datasets from the crucial application domains of education, finance and medicine. Our results demonstrate the efficiency of the proposed model performing comparable to a Black-Box and considerably outperforming single White-Box models, while at the same time remains as interpretable as a White-Box model.

Download Full-text

Boosting medical diagnostics by pooling independent judgments

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1601827113 ◽

2016 ◽

Vol 113 (31) ◽

pp. 8777-8782 ◽

Cited By ~ 50

Author(s):

Ralf H. J. M. Kurvers ◽

Stefan M. Herzog ◽

Ralph Hertwig ◽

Jens Krause ◽

Patricia A. Carney ◽

...

Keyword(s):

Decision Making ◽

Diagnostic Accuracy ◽

Real World ◽

Collective Intelligence ◽

Medical Diagnostics ◽

Wide Range ◽

Key Factor ◽

Systematic Effects ◽

Real World Datasets ◽

Skin Cancer Detection

Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches.

Download Full-text

Network Structure and Transfer Behaviors Embedding via Deep Prediction Model

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015041 ◽

2019 ◽

Vol 33 ◽

pp. 5041-5048 ◽

Cited By ~ 1

Author(s):

Xin Sun ◽

Zenghui Song ◽

Junyu Dong ◽

Yongbo Yu ◽

Claudia Plant ◽

...

Keyword(s):

Network Structure ◽

Real World ◽

Short Term Memory ◽

Experimental Studies ◽

Local Network ◽

Prediction Ability ◽

Feature Representations ◽

Biased Random Walk ◽

Deep Embedding ◽

Real World Datasets

Network-structured data is becoming increasingly popular in many applications. However, these data present great challenges to feature engineering due to its high non-linearity and sparsity. The issue on how to transfer the link-connected nodes of the huge network into feature representations is critical. As basic properties of the real-world networks, the local and global structure can be reflected by dynamical transfer behaviors from node to node. In this work, we propose a deep embedding framework to preserve the transfer possibilities among the network nodes. We first suggest a degree-weight biased random walk model to capture the transfer behaviors of the network. Then a deep embedding framework is introduced to preserve the transfer possibilities among the nodes. A network structure embedding layer is added into the conventional Long Short-Term Memory Network to utilize its sequence prediction ability. To keep the local network neighborhood, we further perform a Laplacian supervised space optimization on the embedding feature representations. Experimental studies are conducted on various real-world datasets including social networks and citation networks. The results show that the learned representations can be effectively used as features in a variety of tasks, such as clustering, visualization and classification, and achieve promising performance compared with state-of-the-art models.

Download Full-text

AURORA: Data-driven Construction of Visual Graph Query Interfaces for Graph Databases

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data ◽

10.1145/3318464.3384681 ◽

2020 ◽

Author(s):

Sourav S. Bhowmick ◽

Kai Huang ◽

Huey Eng Chua ◽

Zifeng Yuan ◽

Byron Choi ◽

...

Keyword(s):

Data Driven ◽

Graph Databases ◽

Graph Query ◽

Query Interfaces ◽

Visual Graph

Download Full-text

Pivot-based Maximal Biclique Enumeration

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/492 ◽

2020 ◽

Author(s):

Aman Abidi ◽

Rui Zhou ◽

Lu Chen ◽

Chengfei Liu

Keyword(s):

Real World ◽

Search Space ◽

Index Structure ◽

Performance Study ◽

Clique Enumeration ◽

Wide Range ◽

Real World Applications ◽

Real World Datasets ◽

Search Space Pruning ◽

Extensive Performance

Enumerating maximal bicliques in a bipartite graph is an important problem in data mining, with innumerable real-world applications across different domains such as web community, bioinformatics, etc. Although substantial research has been conducted on this problem, surprisingly, we find that pivot-based search space pruning, which is quite effective in clique enumeration, has not been exploited in biclique scenario. Therefore, in this paper, we explore the pivot-based pruning for biclique enumeration. We propose an algorithm for implementing the pivot-based pruning, powered by an effective index structure Containment Directed Acyclic Graph (CDAG). Meanwhile, existing literature indicates contradictory findings on the order of vertex selection in biclique enumeration. As such, we re-examine the problem and suggest an offline ordering of vertices which expedites the pivot pruning. We conduct an extensive performance study using real-world datasets from a wide range of domains. The experimental results demonstrate that our algorithm is more scalable and outperforms all the existing algorithms across all datasets and can achieve a significant speedup against the previous algorithms.

Download Full-text

MIDAS: Towards Efficient and Effective Maintenance of Canned Patterns in Visual Graph Query Interfaces

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3457251 ◽

2021 ◽

Author(s):

Kai Huang ◽

Huey Eng Chua ◽

Sourav S. Bhowmick ◽

Byron Choi ◽

Shuigeng Zhou

Keyword(s):

Graph Query ◽

Query Interfaces ◽

Visual Graph

Download Full-text

Image Statistics Preserving Encrypt-then-Compress Scheme Dedicated for JPEG Compression Standard

Entropy ◽

10.3390/e23040421 ◽

2021 ◽

Vol 23 (4) ◽

pp. 421

Author(s):

Dariusz Puchala ◽

Kamil Stokfiszewski ◽

Mykhaylo Yatsymirskyy

Keyword(s):

Statistical Analysis ◽

Image Encryption ◽

Experimental Studies ◽

Quality Measures ◽

Input Image ◽

Jpeg Compression ◽

Image Statistics ◽

Compression Stage ◽

Wide Range ◽

The Impact

In this paper, the authors analyze in more details an image encryption scheme, proposed by the authors in their earlier work, which preserves input image statistics and can be used in connection with the JPEG compression standard. The image encryption process takes advantage of fast linear transforms parametrized with private keys and is carried out prior to the compression stage in a way that does not alter those statistical characteristics of the input image that are crucial from the point of view of the subsequent compression. This feature makes the encryption process transparent to the compression stage and enables the JPEG algorithm to maintain its full compression capabilities even though it operates on the encrypted image data. The main advantage of the considered approach is the fact that the JPEG algorithm can be used without any modifications as a part of the encrypt-then-compress image processing framework. The paper includes a detailed mathematical model of the examined scheme allowing for theoretical analysis of the impact of the image encryption step on the effectiveness of the compression process. The combinatorial and statistical analysis of the encryption process is also included and it allows to evaluate its cryptographic strength. In addition, the paper considers several practical use-case scenarios with different characteristics of the compression and encryption stages. The final part of the paper contains the additional results of the experimental studies regarding general effectiveness of the presented scheme. The results show that for a wide range of compression ratios the considered scheme performs comparably to the JPEG algorithm alone, that is, without the encryption stage, in terms of the quality measures of reconstructed images. Moreover, the results of statistical analysis as well as those obtained with generally approved quality measures of image cryptographic systems, prove high strength and efficiency of the scheme’s encryption stage.

Download Full-text

Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology ◽

10.1145/3409265 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1-17

Author(s):

Wu Chen ◽

Yong Yu ◽

Keke Gai ◽

Jiamou Liu ◽

Kim-Kwang Raymond Choo

Keyword(s):

Ensemble Learning ◽

Real World ◽

Interaction Mechanism ◽

Training Model ◽

Edge Computing ◽

Learning Techniques ◽

Multi Agent ◽

Real World Datasets ◽

Entire Dataset ◽

Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).

Download Full-text

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.

Download Full-text

Overlapping Community Detection Based on Attribute Augmented Graph

Entropy ◽

10.3390/e23060680 ◽

2021 ◽

Vol 23 (6) ◽

pp. 680

Author(s):

Hanyang Lin ◽

Yongzhao Zhan ◽

Zizheng Zhao ◽

Yuzhong Chen ◽

Chen Dong

Keyword(s):

Community Detection ◽

Real World ◽

Detection Algorithm ◽

Overlapping Community Detection ◽

Overlapping Communities ◽

Adjustment Strategy ◽

Topology Information ◽

Overlapping Community ◽

Real World Datasets ◽

Community Detection Algorithm

There is a wealth of information in real-world social networks. In addition to the topology information, the vertices or edges of a social network often have attributes, with many of the overlapping vertices belonging to several communities simultaneously. It is challenging to fully utilize the additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to automatically determine the number of communities by a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters compared to the baseline methods.

Download Full-text