Large-Scale Multi-Modal Data Exploration with Human in the Loop

Author(s):  
Guangchen Ruan ◽  
Hui Zhang
2020 ◽  
Vol 60 ◽  
pp. 100546
Author(s):  
Petar Ristoski ◽  
Anna Lisa Gentile ◽  
Alfredo Alba ◽  
Daniel Gruhl ◽  
Steven Welch

2020 ◽  
Author(s):  
Victor S. Bursztyn ◽  
Jonas Dias ◽  
Marta Mattoso

One major challenge in large-scale experiments is the analytical capacity to contrast ongoing results with domain knowledge. We approach this challenge by constructing a domain-specific knowledge base, which is queried during workflow execution. We introduce K-Chiron, an integrated solution that combines a state-of-the-art automatic knowledge base construction (KBC) system with Chiron, a well-established workflow engine. In this work we experiment in the context of Political Science to show how KBC may be used to improve human-in-the-loop (HIL) support in scientific experiments. While traditional HIL supervision by domain experts is done offline, in K-Chiron it is done online, i.e. at runtime. We achieve results in less laborious ways, to the point of enabling a breed of experiments that would be infeasible with traditional HIL. Finally, we show how provenance data could be leveraged together with KBC to enable further experimentation in more dynamic settings.
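The abstract describes the architecture but includes no code. As a rough illustration of what online HIL via a knowledge-base lookup inside a workflow activity could look like, here is a minimal Python sketch; every name in it (kb_lookup, workflow_activity, flag_for_review) is hypothetical and is not K-Chiron's actual API.

```python
# Minimal sketch of a workflow activity that consults a domain knowledge
# base at runtime (online HIL). All names are hypothetical illustrations,
# not K-Chiron's published interface.

def kb_lookup(knowledge_base, entity):
    """Return domain facts for an entity, or None if unknown."""
    return knowledge_base.get(entity)

def workflow_activity(records, knowledge_base, flag_for_review):
    """Contrast ongoing results with domain knowledge during execution."""
    validated = []
    for record in records:
        facts = kb_lookup(knowledge_base, record["entity"])
        if facts is None or facts["polarity"] != record["polarity"]:
            # Result disagrees with the KB: route it to the expert at
            # runtime instead of waiting for an offline post-hoc review.
            flag_for_review(record)
        else:
            validated.append(record)
    return validated

# Toy usage: a KB of political actors and their known stance.
kb = {"senator_a": {"polarity": "pro"}, "senator_b": {"polarity": "anti"}}
results = [{"entity": "senator_a", "polarity": "anti"}]
checked = workflow_activity(results, kb, flag_for_review=print)
```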


Author(s):  
Ali Salim Rasheed ◽  
Davood Zabihzadeh ◽  
Sumia Abdulhussien Razooqi Al-Obaidi

Metric learning algorithms aim to bring conceptually related data items closer together while keeping dissimilar ones at a distance. The most common approach to metric learning is based on the Mahalanobis method. Despite its success, this method is limited to finding a linear projection and also suffers from poor scalability with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm, which yields a higher convergence rate than state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted in this paper. The proposed method is evaluated on several challenging machine vision datasets for image classification and Content-Based Image Retrieval (CBIR) tasks. The experimental results confirm that it significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency.
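As a minimal sketch of the Passive/Aggressive flavor of online Mahalanobis metric learning referenced above, the Python fragment below applies the generic PA recipe (closed-form step size from a hinge loss, followed by a PSD projection). It is not the paper's exact algorithm; the margin formulation and the threshold b are assumptions, and the DRP dimensionality-reduction step is omitted.

```python
import numpy as np

def pa_metric_update(M, b, x1, x2, y, eps=1e-12):
    """One Passive/Aggressive-style update of a Mahalanobis matrix M.

    y = +1 for a similar pair, -1 for a dissimilar pair; b is a distance
    threshold. Generic PA update, not the paper's exact formulation.
    """
    z = x1 - x2
    d = z @ M @ z                       # squared Mahalanobis distance
    loss = max(0.0, 1.0 + y * (d - b))  # margin-based hinge loss
    if loss > 0.0:
        tau = loss / (np.dot(z, z) ** 2 + eps)  # closed-form PA step size
        M = M - tau * y * np.outer(z, z)
        # Project back onto the PSD cone so M remains a valid metric.
        w, V = np.linalg.eigh(M)
        M = (V * np.clip(w, 0.0, None)) @ V.T
    return M

# Toy usage with random 2-D data and an identity starting metric.
rng = np.random.default_rng(0)
M, b = np.eye(2), 1.0
M = pa_metric_update(M, b, rng.normal(size=2), rng.normal(size=2), y=+1)
```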


2019 ◽  
Author(s):  
Joris Roels ◽  
Frank Vernaillen ◽  
Anna Kremer ◽  
Amanda Gonçalves ◽  
Jan Aelterman ◽  
...  

The recent advent of 3D Electron Microscopy (EM) has allowed for the detection of detailed sub-cellular structures at nanometer resolution. While a scientific breakthrough, this has also caused an explosion in dataset size, necessitating the development of automated workflows. Automated workflows typically benefit reproducibility and throughput compared to manual analysis. The risk of automation is that it discards the expertise the microscopy user brings to manual analysis. To mitigate this risk, this paper presents a hybrid paradigm: a 'human-in-the-loop' (HITL) approach that combines expert microscopy knowledge with the power of large-scale parallel computing to improve EM image quality through advanced image restoration algorithms. An interactive graphical user interface, publicly available as an ImageJ plugin, was developed to allow biologists to use our framework in an intuitive and user-friendly fashion. We show that this plugin improves visualization of EM ultrastructure and subsequent (semi-)automated segmentation and image analysis.
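The tool itself is an ImageJ plugin; purely to illustrate the preview-then-apply HITL pattern the abstract describes, here is a hedged Python sketch that substitutes scikit-image's total-variation denoiser for the plugin's restoration algorithms. The function names and the choice of TV denoising are illustrative assumptions, not the plugin's implementation.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def preview_denoise(crop, weights):
    """Let the expert compare candidate settings on a small ROI first."""
    return {w: denoise_tv_chambolle(crop, weight=w) for w in weights}

def apply_to_volume(volume, weight):
    """Apply the expert-approved setting to every slice.

    In a real deployment this step would be dispatched to a parallel
    backend; a plain loop keeps the sketch self-contained.
    """
    return np.stack([denoise_tv_chambolle(s, weight=weight) for s in volume])

# Toy usage: a small noisy synthetic volume standing in for EM data.
rng = np.random.default_rng(1)
vol = rng.normal(0.5, 0.1, size=(4, 64, 64)).clip(0, 1)
candidates = preview_denoise(vol[0, :32, :32], weights=[0.05, 0.1, 0.2])
restored = apply_to_volume(vol, weight=0.1)  # value chosen by the expert
```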


2020 ◽  
Author(s):  
Wilfried Yves Hamilton Adoni ◽  
Tarik Nahhal ◽  
Moez Krichen ◽  
Abdeltif El byed ◽  
Ismail Assayad

Big graphs are part of the "Not Only SQL" (NoSQL) database movement, which focuses on the relationships between data rather than the values themselves. The data is stored in vertices, while the edges model the interactions or relationships between these data. Big graphs offer flexibility in handling strongly connected data. Analyzing a big graph generally involves exploring all of its vertices, an operation that is costly in time and resources because big graphs are generally composed of millions of vertices connected through billions of edges. Consequently, graph algorithms are expensive relative to the size of the big graph and are therefore ineffective for data exploration. Partitioning the graph thus stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists of splitting the graph into a set of k sub-graphs in order to reduce the complexity of queries. Nevertheless, it presents many challenges because graph partitioning is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices), an efficient parallel and distributed heuristic for large-scale graph partitioning. An application on real-world graphs demonstrates the feasibility and reliability of our method. Experiments carried out on a 10-node Spark cluster show that the proposed methodology achieves significant gains in time and outperforms JA-BE-JA, Greedy, and DFEP.
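The abstract does not detail DPHV itself; the sketch below is a simple single-machine, greedy stand-in that conveys only the hub-first placement idea (assigning high-degree vertices before low-degree ones under a balance constraint). The names, the scoring rule, and the capacity bound are all assumptions; the real heuristic runs distributed, e.g. on Spark.

```python
from collections import defaultdict

def greedy_hub_first_partition(edges, k):
    """Partition a graph into k balanced parts, placing hub vertices first.

    A toy stand-in for hub-aware partitioning, not the DPHV algorithm.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    capacity = -(-len(adj) // k)  # ceil(n / k): keep partitions balanced
    part, sizes = {}, [0] * k
    # Visit vertices in decreasing degree so hubs anchor the partitions.
    for v in sorted(adj, key=lambda x: len(adj[x]), reverse=True):
        best, best_score = None, None
        for p in range(k):
            if sizes[p] >= capacity:
                continue
            # Prefer the part already holding most of v's neighbours,
            # which keeps cut edges (and thus query cost) low.
            score = sum(1 for u in adj[v] if part.get(u) == p)
            if best_score is None or score > best_score:
                best, best_score = p, score
        part[v] = best
        sizes[best] += 1
    return part

# Toy usage: a small graph with one obvious hub (vertex 0).
E = [(0, 1), (0, 2), (0, 3), (0, 4), (2, 3), (4, 5)]
print(greedy_hub_first_partition(E, k=2))
```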


2019 ◽  
Author(s):  
Marcelo Cicconet ◽  
Daniel R. Hochbaum

Immunostaining of brain slices is a ubiquitous technique used throughout neuroscience to understand the anatomical and molecular characteristics of brain circuits. Yet the variety of distortions introduced, and the manual nature of the preparation, prevent the generated images from being rigorously quantified; instead, most registration of brain slices is done laboriously by hand. Existing automated registration methods rarely make use of geometric shape information. When registering anterior-posterior brain slices, for example, small errors between consecutive planes accumulate, causing the symmetry axis of a plane to drift away from its starting position as depth increases. Furthermore, planes with imaging artifacts (e.g. one half of the slice is missing) can cause large errors that are difficult to fix by changing global parameters. In this work we describe a method that registers a set of consecutive brain slices while enforcing a vertical axis of symmetry in every slice, and then pairs these slices optimally with planes from the Allen Mouse Brain Atlas via dynamic programming. The pipeline offers multiple human-in-the-loop opportunities, allowing users to fix algorithmic errors at various stages, including symmetry detection and pairwise assignment, via custom graphical interfaces. This pipeline enables large-scale analysis of brain slices, allowing this common technique to be used to generate quantitative datasets.
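The optimal slice-to-plane pairing step lends itself to a compact illustration. The Python sketch below implements a standard order-preserving dynamic program over a dissimilarity matrix; the cost matrix and all names are assumptions, and the symmetry-constrained registration stage of the pipeline is omitted.

```python
import numpy as np

def match_slices_to_atlas(cost):
    """Order-preserving assignment of N slices to M atlas planes (N <= M).

    cost[i, j] is the dissimilarity between slice i and atlas plane j.
    Each slice maps to exactly one plane, with plane indices strictly
    increasing; illustrates only the optimal-pairing step.
    """
    n, m = cost.shape
    INF = float("inf")
    dp = np.full((n + 1, m + 1), INF)
    dp[0, :] = 0.0
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            dp[i, j] = min(dp[i, j - 1],                       # skip plane j
                           dp[i - 1, j - 1] + cost[i - 1, j - 1])  # match it
    # Backtrack to recover the chosen plane for each slice.
    pairing, j = [], m
    for i in range(n, 0, -1):
        while dp[i, j] == dp[i, j - 1]:  # walk left through skipped planes
            j -= 1
        pairing.append((i - 1, j - 1))
        j -= 1
    return dp[n, m], pairing[::-1]

# Toy usage: 3 slices matched against 5 atlas planes.
rng = np.random.default_rng(2)
total, pairs = match_slices_to_atlas(rng.random((3, 5)))
```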

