Facetto: Combining Unsupervised and Supervised Learning for Hierarchical Phenotype Analysis in Multi-Channel Image Data

AbstractFacetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 109or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use-cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.

Download Full-text

TMOD-29. STANDARDIZED GENERATION OF TUMOR-ORGANOIDS AS NOVEL DRUG SCREENING MODEL IN MENINGIOMA

Neuro-Oncology ◽

10.1093/neuonc/noab196.890 ◽

2021 ◽

Vol 23 (Supplement_6) ◽

pp. vi221-vi222

Author(s):

Gerhard Jungwirth ◽

Tao Yu ◽

Cao Junguo ◽

Catharina Lotsch ◽

Andreas Unterberg ◽

...

Keyword(s):

Drug Screening ◽

Cancer Biology ◽

Drug Testing ◽

Large Scale ◽

Ex Vivo ◽

Cell Types ◽

Cellular Heterogeneity ◽

Drug Responses ◽

Novel Drug ◽

Tumor Organoids

Abstract Tumor-organoids (TOs) are novel, complex three-dimensional ex vivo tissue cultures that under optimal conditions accurately reflect genotype and phenotype of the original tissue with preserved cellular heterogeneity and morphology. They may serve as a new and exciting model for studying cancer biology and directing personalized therapies. The aim of our study was to establish TOs from meningioma (MGM) and to test their usability for large-scale drug screenings. We were capable of forming several hundred TO equal in size by controlled reaggregation of freshly prepared single cell suspension of MGM tissue samples. In total, standardized TOs from 60 patients were formed, including eight grade II and three grade III MGMs. TOs reaggregated within 3 days resulting in a reducted diameter by 50%. Thereafter, TO size remained stable throughout a 14 days observation period. TOs consisted of largely viable cells, whereas dead cells were predominantly found outside of the organoid. H&E stainings confirmed the successful establishment of dense tissue-like structures. Next, we assessed the suitability and reliability of TOs for a robust large-scale drug testing by employing nine highly potent compounds, derived from a drug screening performed on several MGM cell lines. First, we tested if drug responses depend on TO size. Interestingly, drug responses to these drugs remained identical independent of their sizes. Based on a sufficient representation of low abundance cell types such as T-cells and macrophages an overall number of 25.000 cells/TO was selected for further experiments revealing FDA-approved HDAC inhibitors as highly effective drugs in most of the TOs with a mean z-AUC score of -1.33. Taken together, we developed a protocol to generate standardized TO from MGM containing low abundant cell types of the tumor microenvironment in a representative manner. Robust and reliable drug responses suggest patient-derived TOs as a novel drug testing model in meningioma research.

Download Full-text

Trajectory-User Linking via Variational AutoEncoder

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/446 ◽

2018 ◽

Cited By ~ 20

Author(s):

Fan Zhou ◽

Qiang Gao ◽

Goce Trajcevski ◽

Kunpeng Zhang ◽

Ting Zhong ◽

...

Keyword(s):

Latent Variables ◽

Large Scale ◽

Human Mobility ◽

Hierarchical Structures ◽

High Dimensional ◽

Generation Process ◽

Mobility Patterns ◽

Variational Autoencoder ◽

Hidden States ◽

Structural Semantics

Trajectory-User Linking (TUL) is an essential task in Geo-tagged social media (GTSM) applications, enabling personalized Point of Interest (POI) recommendation and activity identification. Existing works on mining mobility patterns often model trajectories using Markov Chains (MC) or recurrent neural networks (RNN) -- either assuming independence between non-adjacent locations or following a shallow generation process. However, most of them ignore the fact that human trajectories are often sparse, high-dimensional and may contain embedded hierarchical structures. We tackle the TUL problem with a semi-supervised learning framework, called TULVAE (TUL via Variational AutoEncoder), which learns the human mobility in a neural generative architecture with stochastic latent variables that span hidden states in RNN. TULVAE alleviates the data sparsity problem by leveraging large-scale unlabeled data and represents the hierarchical and structural semantics of trajectories with high-dimensional latent variables. Our experiments demonstrate that TULVAE improves efficiency and linking performance in real GTSM datasets, in comparison to existing methods.

Download Full-text

Visual Analytics for Deep Embeddings of Large Scale Molecular Dynamics Simulations

10.1101/830844 ◽

2019 ◽

Author(s):

Junghoon Chae ◽

Debsindhu Bhowmik ◽

Heng Ma ◽

Arvind Ramanathan ◽

Chad Steed

Keyword(s):

Molecular Dynamics ◽

Visual Analytics ◽

Large Scale ◽

Physical Phenomenon ◽

Living Organism ◽

Md Simulations ◽

High Dimensional ◽

Molecular Characteristics ◽

Complex Simulation ◽

Dynamics Simulations

AbstractMolecular Dynamics (MD) simulation have been emerging as an excellent candidate for understanding complex atomic and molecular scale mechanism of bio-molecules that control essential bio-physical phenomenon in a living organism. But this MD technique produces large-size and long-timescale data that are inherently high-dimensional and occupies many terabytes of data. Processing this immense amount of data in a meaningful way is becoming increasingly difficult. Therefore, specific dimensionality reduction algorithm using deep learning technique has been employed here to embed the high-dimensional data in a lower-dimension latent space that still preserves the inherent molecular characteristics i.e. retains biologically meaningful information. Subsequently, the results of the embedding models are visualized for model evaluation and analysis of the extracted underlying features. However, most of the existing visualizations for embeddings have limitations in evaluating the embedding models and understanding the complex simulation data. We propose an interactive visual analytics system for embeddings of MD simulations to not only evaluate and explain an embedding model but also analyze various characteristics of the simulations. Our system enables exploration and discovery of meaningful and semantic embedding results and supports the understanding and evaluation of results by the quantitatively described features of the MD simulations (even without specific labels).

Download Full-text

High-Dimensional Image Data Sets Retrieval: Improving Accuracy Using a Weighted Relevance Feedback

International Journal of Semantic Computing ◽

10.1142/s1793351x1540005x ◽

2015 ◽

Vol 09 (02) ◽

pp. 239-259

Author(s):

Abir Gallas ◽

Walid Barhoumi ◽

Ezzeddine Zagrouba

Keyword(s):

Relevance Feedback ◽

Large Scale ◽

Nearest Neighbor ◽

Image Data ◽

High Dimensional ◽

General Context ◽

Query Image ◽

Data Set ◽

Dimensional Image ◽

Minimum Number

The user's interaction with the retrieval engines, while seeking a particular image (or set of images) in large-scale databases, defines better his request. This interaction is essentially provided by a relevance feedback step. In fact, the semantic gap is increasing in a remarkable way due to the application of approximate nearest neighbor (ANN) algorithms aiming at resolving the curse of dimensionality. Therefore, an additional step of relevance feedback is necessary in order to get closer to the user's expectations in the next few retrieval iterations. In this context, this paper details a classification of the different relevance feedback techniques related to region-based image retrieval applications. Moreover, a technique of relevance feedback based on re-weighting regions of the query-image by selecting a set of negative examples is elaborated. Furthermore, the general context to carry out this technique which is the large-scale heterogeneous image collections indexing and retrieval is presented. In fact, the main contribution of the proposed work is affording efficient results with the minimum number of relevance feedback iterations for high dimensional image databases. Experiments and assessments are carried out within an RBIR system for "Wang" data set in order to prove the effectiveness of the proposed approaches.

Download Full-text

Biological data annotation via a human-augmenting AI-based labeling system

npj Digital Medicine ◽

10.1038/s41746-021-00520-6 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Douwe van der Wal ◽

Iny Jhun ◽

Israa Laklouk ◽

Jeff Nirschl ◽

Lara Richer ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Microscopic Analysis ◽

Image Data ◽

Cell Types ◽

Biological Data ◽

Data Sets ◽

Data Set ◽

Data Annotation ◽

Labeling System

AbstractBiology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI, which begins uninitialized and learns annotations from a human, in real-time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator, while increasing the quality of their annotations. Using a highly repetitive use-case—annotating cell types—and running experiments with seven pathologists—experts at the microscopic analysis of biological specimens—we demonstrate a manual work reduction of 90.60%, and an average data-quality boost of 4.34%, measured across four use-cases and two tissue stain types.

Download Full-text

DetecTiff©: A Novel Image Analysis Routine for High-Content Screening Microscopy

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057109339523 ◽

2009 ◽

Vol 14 (8) ◽

pp. 944-955 ◽

Cited By ~ 29

Author(s):

Daniel F. Gilbert ◽

Till Meinhof ◽

Rainer Pepperkok ◽

Heiko Runz

Keyword(s):

Image Analysis ◽

Particle Filtering ◽

Large Scale ◽

Image Data ◽

Automated Analysis ◽

Data Sets ◽

Size Dependent ◽

Structure Recognition ◽

Quantitative Image ◽

High Consistency

In this article, the authors describe the image analysis software DetecTiff©, which allows fully automated object recognition and quantification from digital images. The core module of the LabView©-based routine is an algorithm for structure recognition that employs intensity thresholding and size-dependent particle filtering from microscopic images in an iterative manner. Detected structures are converted into templates, which are used for quantitative image analysis. DetecTiff © enables processing of multiple detection channels and provides functions for template organization and fast interpretation of acquired data. The authors demonstrate the applicability of DetecTiff© for automated analysis of cellular uptake of fluorescencelabeled low-density lipoproteins as well as diverse other image data sets from a variety of biomedical applications. Moreover, the performance of DetecTiff© is compared with preexisting image analysis tools. The results show that DetecTiff© can be applied with high consistency for automated quantitative analysis of image data (e.g., from large-scale functional RNAi screening projects). ( Journal of Biomolecular Screening 2009:944-955)

Download Full-text

Exploring Flood Filling Networks for Instance Segmentation of XXL-Volumetric and Bulk Material CT Data

Journal of Nondestructive Evaluation ◽

10.1007/s10921-020-00734-w ◽

2020 ◽

Vol 40 (1) ◽

Author(s):

Roland Gruber ◽

Stefan Gerth ◽

Joelle Claußen ◽

Norbert Wörlein ◽

Norman Uhlmann ◽

...

Keyword(s):

Large Scale ◽

Bulk Material ◽

Automatic Segmentation ◽

Image Data ◽

Real Data ◽

Specific Information ◽

Volume Data ◽

Beam Hardening ◽

Exploration Process ◽

Instance Segmentation

AbstractXXL-Computed Tomography (XXL-CT) is able to produce large scale volume datasets of scanned objects such as crash tested cars, sea and aircraft containers or cultural heritage objects. The acquired image data consists of volumes of up to and above $$\hbox {10,000}^{3}$$ 10,000 3 voxels which can relate up to many terabytes in file size and can contain multiple 10,000 of different entities of depicted objects. In order to extract specific information about these entities from the scanned objects in such vast datasets, segmentation or delineation of these parts is necessary. Due to unknown and varying properties (shapes, densities, materials, compositions) of these objects, as well as interfering acquisition artefacts, classical (automatic) segmentation is usually not feasible. Contrarily, a complete manual delineation is error-prone and time-consuming, and can only be performed by trained and experienced personnel. Hence, an interactive and partial segmentation of so-called “chunks” into tightly coupled assemblies or sub-assemblies may help the assessment, exploration and understanding of such large scale volume data. In order to assist users with such an (possibly interactive) instance segmentation for the data exploration process, we propose to utilize delineation algorithms with an approach derived from flood filling networks. We present primary results of a flood filling network implementation adapted to non-destructive testing applications based on large scale CT from various test objects, as well as real data of an airplane and describe the adaptions to this domain. Furthermore, we address and discuss segmentation challenges due to acquisition artefacts such as scattered radiation or beam hardening resulting in reduced data quality, which can severely impair the interactive segmentation results.

Download Full-text

Multi-aspect visual analytics on large-scale high-dimensional cyber security data

Information Visualization ◽

10.1177/1473871613488573 ◽

2013 ◽

Vol 14 (1) ◽

pp. 62-75 ◽

Cited By ~ 6

Author(s):

Victor Y Chen ◽

Ahmad M Razip ◽

Sungahn Ko ◽

Cheryl Z Qian ◽

David S Ebert

Keyword(s):

Cyber Security ◽

Visual Analytics ◽

Large Scale ◽

Temporal Trends ◽

High Dimensional ◽

Levels Of Detail ◽

Future Improvement ◽

Semantic Zooming ◽

Different Dimensions ◽

Multiple Levels

In this article, we present a visual analytics system, SemanticPrism, which aims to analyze large-scale high-dimensional cyber security datasets containing logs of a million computers. SemanticPrism visualizes the data from three different perspectives: spatiotemporal distribution, overall temporal trends, and pixel-based IP (Internet Protocol) address blocks. With each perspective, we use semantic zooming to present more detailed information. The interlinked visualizations and multiple levels of detail allow us to detect unexpected changes taking place in different dimensions of the data and to identify potential anomalies in the network. After comparing our approach to other submissions, we outline potential paths for future improvement.

Download Full-text

A Supervised Learning Model for High-Dimensional and Large-Scale Data

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/2972957 ◽

2017 ◽

Vol 8 (2) ◽

pp. 1-23 ◽

Cited By ~ 8

Author(s):

Chong Peng ◽

Jie Cheng ◽

Qiang Cheng

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Learning Model ◽

High Dimensional ◽

Large Scale Data ◽

Scale Data

Download Full-text

treekoR: identifying cellular-to-phenotype associations by elucidating hierarchical relationships in high-dimensional cytometry data

Genome Biology ◽

10.1186/s13059-021-02526-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Adam Chan ◽

Wei Jiang ◽

Emily Blyth ◽

Jean Yang ◽

Ellis Patrick

Keyword(s):

Single Cell ◽

Case Studies ◽

High Throughput ◽

Hierarchical Structures ◽

Cell Types ◽

Unsupervised Clustering ◽

High Dimensional ◽

Cell Type ◽

Clustering Techniques ◽

Do So

AbstractHigh-throughput single-cell technologies hold the promise of discovering novel cellular relationships with disease. However, analytical workflows constructed for these technologies to associate cell proportions with disease often employ unsupervised clustering techniques that overlook the valuable hierarchical structures that have been used to define cell types. We present treekoR, a framework that empirically recapitulates these structures, facilitating multiple quantifications and comparisons of cell type proportions. Our results from twelve case studies reinforce the importance of quantifying proportions relative to parent populations in the analyses of cytometry data — as failing to do so can lead to missing important biological insights.

Download Full-text