Deep Learning of Histopathology Images at the Single Cell Level

2021 ◽  
Vol 4 ◽  
Author(s):  
Kyubum Lee ◽  
John H. Lockhart ◽  
Mengyu Xie ◽  
Ritu Chaudhary ◽  
Robbert J. C. Slebos ◽  
...  

The tumor immune microenvironment (TIME) encompasses many heterogeneous cell types that engage in extensive crosstalk among the cancer, immune, and stromal components. The spatial organization of these cell types in the TIME could serve as a biomarker for predicting drug response, prognosis, and metastasis. Recently, deep learning approaches have been widely applied to digital histopathology images for cancer diagnosis and prognosis. Furthermore, some recent approaches have attempted to integrate spatial and molecular omics data to better characterize the TIME. In this review, we focus on machine learning-based digital histopathology image analysis methods for characterizing the tumor ecosystem. We consider three scales of histopathological analysis at which machine learning can operate: whole slide image (WSI)-level, region of interest (ROI)-level, and cell-level. We systematically review the machine learning methods at these three scales, with a focus on cell-level analysis. We provide a perspective on a workflow for generating cell-level training data sets using immunohistochemistry markers to “weakly-label” the cell types. We describe some common steps in the workflow of preparing the data, as well as some limitations of this approach. Finally, we discuss future opportunities for integrating molecular omics data with digital histopathology images to characterize the tumor ecosystem.

2021 ◽  
Author(s):  
Raul Rodriguez-Esteban ◽  
José Duarte ◽  
Priscila C. Teixeira ◽  
Fabien Richard ◽  
Svetlana Koltsova ◽  
...  

Background: A key step in clinical flow cytometry data analysis is gating, which involves the identification of cell populations. The process of gating produces a set of reportable results, which are typically described by gating definitions. The non-standardized, non-interpreted nature of gating definitions represents a hurdle for data interpretation and data sharing across and within organizations. Interpreting and standardizing gating definitions for subsequent analysis of gating results requires a curation effort from experts. Machine learning approaches have the potential to help in this process by predicting expert annotations associated with gating definitions. Methods: We created a gold-standard dataset by manually annotating thousands of gating definitions with cell type and functional marker annotations. We used this dataset to train and test a machine learning pipeline able to predict standard cell types and functional marker genes associated with gating definitions. Results: The machine learning pipeline predicted annotations with high accuracy for both cell types and functional marker genes. Accuracy was lower for gating definitions from assays belonging to laboratories from which limited or no prior data was available in the training. Manual error review ensured that resulting predicted annotations could be reused subsequently as additional gold-standard training data. Conclusions: Machine learning methods are able to consistently predict annotations associated with gating definitions from flow cytometry assays. However, a hybrid automatic and manual annotation workflow would be recommended to achieve optimal results.


2019 ◽  
Author(s):  
Olivier Poirion ◽  
Kumardeep Chaudhary ◽  
Sijia Huang ◽  
Lana X. Garmire

Background: Prognosis (survival) prediction of patients is important for disease management. Multi-omics data are good resources for survival prediction; however, they are difficult to integrate computationally. Results: We introduce DeepProg, a new computational framework that robustly predicts patient survival subtypes based on multiple types of omic data. It employs an ensemble of deep-learning and machine-learning approaches to achieve high performance. We apply DeepProg to 32 cancer datasets from TCGA and discover that most cancers have two optimal survival subtypes. Patient survival risk-stratification using DeepProg is significantly better than another multi-omics data integration method called Similarity Network Fusion (p-value=7.9e-7). DeepProg shows excellent predictive accuracy in external validation cohorts, exemplified by two liver cancer (C-index 0.73 and 0.80) and five breast cancer datasets (C-index 0.68-0.73). Further comprehensive pan-cancer analysis unveils the genomic signatures common among all the poorest survival subtypes, with genes enriched in extracellular matrix modeling, immune deregulation, and mitosis processes. Conclusions: DeepProg is a powerful and generic computational framework to predict patient survival risks. DeepProg is freely available for non-commercial use at: http://garmiregroup.org/DeepProg
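The C-index values reported in this abstract measure how well a predicted risk score orders patients by observed survival. The sketch below shows one common form of that statistic (Harrell's concordance index) in plain Python. It is an illustration only, not DeepProg's implementation: the function name and the toy data are hypothetical, and pairs with tied survival times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (hypothetical helper, not from DeepProg).

    times  : observed follow-up time per patient
    events : 1/True if death was observed, 0/False if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Assumption: pairs with tied times are skipped for simplicity.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # risk ordering agrees with survival ordering
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third one is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the 0.68 to 0.80 range quoted for the validation cohorts.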


2019 ◽  
Author(s):  
Niklas D. Köhler ◽  
Maren Büttner ◽  
Fabian J. Theis

Deep learning has revolutionized image analysis and natural language processing with remarkable accuracies in prediction tasks, such as image labeling or word identification. The origin of this revolution was arguably the deep learning approach by the Hinton lab in 2012, which halved the error rate of existing classifiers in the then 2-year-old ImageNet database [1]. In hindsight, the combination of algorithmic and hardware advances with the appearance of large and well-labeled datasets led up to this seminal contribution. The emergence of large amounts of data from single-cell RNA-seq and the recent global effort to chart all cell types in the Human Cell Atlas have attracted interest in deep-learning applications. However, all current approaches are unsupervised, i.e., they learn latent spaces without using any cell labels, even though supervised learning approaches are often more powerful in feature learning and are by far the most popular approach in the current AI revolution. Here, we ask why this is the case. In particular, we ask whether supervised deep learning can be used for cell annotation, i.e., to predict cell-type labels from single-cell gene expression profiles. After evaluating 6 classification methods across 14 datasets, we notably find that deep learning does not outperform classical machine-learning methods in the task. Thus, cell-type prediction based on gene-signature derived cell-type labels is potentially too simplistic a task for complex non-linear methods, which demands better labels of functional single-cell readouts. We, therefore, are still waiting for the “ImageNet moment” in single-cell genomics.


2021 ◽  
Author(s):  
Christoph Flamm ◽  
Julia Wielach ◽  
Michael T. Wolfinger ◽  
Stefan Badelt ◽  
Ronny Lorenz ◽  
...  

Machine learning (ML), and in particular deep learning techniques, have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established biophysics-based methods exist. These methods even yield exact solutions under certain simplifying assumptions. Nevertheless, the accuracy of these classical methods is limited and has seen little improvement over the last decade. This makes it an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. In this contribution we discuss limitations of current approaches, in particular due to biases in the training data. Furthermore, we propose to study capabilities and limitations of ML models by first applying them to synthetic data that can not only be generated in arbitrary amounts, but are also guaranteed to be free of biases. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
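The linear bound mentioned at the end of this abstract follows from each nucleotide taking part in at most one base pair, so a sequence of length n can hold at most floor(n/2) pairs. The sketch below counts pairs in a structure written in dot-bracket notation and checks that bound. It assumes pseudoknot-free dot-bracket input; the function name and the example structure are hypothetical and not taken from the paper.

```python
def base_pairs(dot_bracket):
    """Return the (i, j) base pairs encoded by a dot-bracket string.

    Assumption: plain pseudoknot-free notation using only '(', ')' and '.'.
    """
    stack = []   # positions of currently unmatched '('
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


structure = "((..((...))..))"   # hypothetical 15-nt structure
pairs = base_pairs(structure)

# Every position appears in at most one pair, so pairs <= floor(n / 2).
assert len(pairs) <= len(structure) // 2
print(len(pairs), "pairs out of at most", len(structure) // 2)
```

A model that outputs an unconstrained n-by-n matrix of pairing scores can violate this bound, which is one way a quadratic number of predicted pairs can arise.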


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Olivier B. Poirion ◽  
Zheng Jing ◽  
Kumardeep Chaudhary ◽  
Sijia Huang ◽  
Lana X. Garmire

Multi-omics data are good resources for prognosis and survival prediction; however, these are difficult to integrate computationally. We introduce DeepProg, a novel ensemble framework of deep-learning and machine-learning approaches that robustly predicts patient survival subtypes using multi-omics data. It identifies two optimal survival subtypes in most cancers and yields significantly better risk-stratification than other multi-omics integration methods. DeepProg is highly predictive, exemplified by two liver cancer (C-index 0.73–0.80) and five breast cancer datasets (C-index 0.68–0.73). Pan-cancer analysis associates common genomic signatures in poor survival subtypes with extracellular matrix modeling, immune deregulation, and mitosis processes. DeepProg is freely available at https://github.com/lanagarmire/DeepProg


2019 ◽  
Vol 11 (3) ◽  
pp. 284 ◽  
Author(s):  
Linglin Zeng ◽  
Shun Hu ◽  
Daxiang Xiang ◽  
Xiang Zhang ◽  
Deren Li ◽  
...  

Soil moisture mapping at a regional scale is commonplace since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for the estimation of deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map the 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, which is one of the machine-learning approaches. In order to investigate the estimation accuracy of the RF method at both a spatial and a temporal scale, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m³/m³) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m³/m³). Then, the data requirements, importance factors, and spatial and temporal variations in estimation accuracy were discussed based on the results using the training data selected by iterated random sampling. The highly accurate estimations of both the surface and the deep soil moisture for the study area reveal the potential of RF methods when mapping soil moisture at a regional scale, especially when considering the high heterogeneity of land-cover types and topography in the study area.
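The accuracy figures in this abstract are root mean square errors between estimated and ground-measured volumetric soil moisture. As a reminder of how that metric is computed, here is a minimal sketch; the function name and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical volumetric soil moisture values in m^3/m^3.
station_measurements = [0.20, 0.30, 0.25, 0.18]
model_estimates = [0.23, 0.26, 0.25, 0.21]
print(round(rmse(station_measurements, model_estimates), 4))
```

Because the RMSE carries the unit of the measured quantity, the reported 0.038 to 0.057 range can be read directly as a typical error in volumetric water content.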


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2514
Author(s):  
Tharindu Kaluarachchi ◽  
Andrew Reis ◽  
Suranga Nanayakkara

After Deep Learning (DL) regained popularity recently, the Artificial Intelligence (AI) or Machine Learning (ML) field is undergoing rapid growth concerning research and real-world application development. Deep Learning has generated complexities in algorithms, and researchers and users have raised concerns regarding the usability and adoptability of Deep Learning systems. These concerns, coupled with the increasing human-AI interactions, have created the emerging field that is Human-Centered Machine Learning (HCML). We present this review paper as an overview and analysis of existing work in HCML related to DL. Firstly, we collaborated with field domain experts to develop a working definition for HCML. Secondly, through a systematic literature review, we analyze and classify 162 publications that fall within HCML. Our classification is based on aspects including contribution type, application area, and focused human categories. Finally, we analyze the topology of the HCML landscape by identifying research gaps, highlighting conflicting interpretations, addressing current challenges, and presenting future HCML research opportunities.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1694
Author(s):  
Mathew Ashik ◽  
A. Jyothish ◽  
S. Anandaram ◽  
P. Vinod ◽  
Francesco Mercaldo ◽  
...  

Malware is one of the most significant threats in today’s computing world since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. This software exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge, and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. Because the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, in this paper we investigate the relevance of these features of unpacked malicious and benign executables to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, namely a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
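The F1-scores quoted in this abstract combine precision and recall into a single number. The following sketch shows the standard binary definition; the function name, the label encoding (1 for malware, 0 for benign), and the toy predictions are assumptions made for illustration and do not come from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0   # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))
```

F1 is often preferred over plain accuracy when the malicious and benign classes are imbalanced, since it ignores true negatives.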


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3068
Author(s):  
Soumaya Dghim ◽  
Carlos M. Travieso-González ◽  
Radim Burget

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other objects present in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied for detecting and classifying the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.


Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2764
Author(s):  
Xin Yu Liew ◽  
Nazia Hameed ◽  
Jeremie Clos

A computer-aided diagnosis (CAD) expert system is a powerful tool to efficiently assist a pathologist in achieving an early diagnosis of breast cancer. This process identifies the presence of cancer in breast tissue samples and the distinct cancer types and stages. In a standard CAD system, the main process involves image pre-processing, segmentation, feature extraction, feature selection, classification, and performance evaluation. In this review paper, we review the existing state-of-the-art machine learning approaches applied at each stage, covering both conventional methods and deep learning methods, compare the methods, and provide technical details with advantages and disadvantages. The aims are to investigate the impact of CAD systems using histopathology images, investigate deep learning methods that outperform conventional methods, and provide a summary for future researchers to analyse and improve the existing techniques used. Lastly, we discuss the research gaps of existing machine learning approaches for implementation and propose future direction guidelines for upcoming researchers.

