Image model embeddings for digital pathology and drug development via self-supervised learning

2021 ◽  
Author(s):  
Khan Baykaner ◽  
Mona Xu ◽  
Lucas Bordeaux ◽  
Feng Gu ◽  
Balaji Selvaraj ◽  
...  

ABSTRACT: Whole slide images (WSIs) contain rich pathology information which can be used to diagnose cancer, characterize the tumour microenvironment (TME), assess patient prognosis, and provide insights into the likelihood that a patient will respond to a given treatment. However, since WSIs are generally scarce during early-stage clinical trials, the applicability of deep learning models to new and ongoing drug development in early stages is typically limited. WSIs available in public repositories, such as The Cancer Genome Atlas (TCGA), enable an unsupervised pretraining approach to help alleviate data scarcity. Pretrained models can also be utilised for a range of downstream applications such as automated annotation, quality control (QC), and similar image search. In this work we present DIME (Drug-development Image Model Embeddings), a pipeline for training image patch embeddings for WSIs via self-supervised learning. We compare inpainting and contrastive learning approaches for embedding training in the DIME pipeline, and demonstrate state-of-the-art performance at image patch clustering. In addition, we show that the resultant embeddings allow for training effective downstream patch classifiers with relatively few WSIs, and apply this to an AstraZeneca-sponsored phase III clinical trial. We also highlight the importance of effective colour normalisation for implementing histopathology analysis pipelines, regardless of the core learning algorithm. Finally, we show via subjective exploration of embedding spaces that the DIME pipeline clusters interesting histopathological artefacts, suggesting a possible role for the method in QC pipelines. By clustering image patches according to underlying morphopathologic features, DIME supports subsequent qualitative exploration by pathologists and has the potential to inform and expedite biomarker discovery and drug development.

Author(s):  
Klym Yamkovyi

This paper develops and experimentally compares semi-supervised learning approaches, based on a mix of unsupervised and supervised methods, for classifying datasets with a small amount of labeled data; that is, identifying which of a set of categories a new observation belongs to, using a training set of observations whose category membership is known. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Unlabeled data, when used in combination with a small quantity of labeled data, can produce a significant improvement in learning accuracy. The goal is to develop and analyze semi-supervised methods and to compare their accuracy and robustness on different synthetic datasets. The first proposed approach is based on the unsupervised K-medoids method, also known as the Partitioning Around Medoids algorithm; unlike K-medoids, however, the proposed algorithm first calculates medoids using only the labeled data and then processes the unlabeled observations, assigning each one the label of its nearest medoid. The second proposed approach is a mix of the supervised K-nearest neighbor method and unsupervised K-means. Thus, the proposed learning algorithm uses information about both the nearest points and the class centers of mass. The methods were implemented in Python and experimentally investigated on classification problems using datasets with different distributions and spatial characteristics. The datasets were generated using the scikit-learn library. The developed approaches were compared by their average accuracy across all these datasets. It was shown that even small amounts of labeled data make semi-supervised learning usable, and that the proposed modifications improve accuracy and algorithm performance, as demonstrated in the experiments. Accuracy grows as the amount of available label information increases. Thus, the developed algorithms use a distance metric that takes the available label information into account. Keywords: unsupervised learning, supervised learning, semi-supervised learning, clustering, distance, distance function, nearest neighbor, medoid, center of mass.
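The labeled-medoid idea described in this abstract can be sketched roughly as follows. This is a minimal illustration rather than the author's implementation: the Euclidean distance, the function names (`medoid`, `fit_labeled_medoids`, `assign_labels`) and the toy data are all assumptions made for the example.

```python
import math

def medoid(points):
    """Return the point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def fit_labeled_medoids(labeled):
    """labeled: list of (point, label) pairs. Returns {label: medoid point}."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {label: medoid(pts) for label, pts in by_class.items()}

def assign_labels(medoids, unlabeled):
    """Give each unlabeled point the label of its nearest medoid."""
    return [min(medoids, key=lambda lab: math.dist(p, medoids[lab]))
            for p in unlabeled]

# Hypothetical toy data: two well-separated classes, three labeled points each.
labeled = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
           ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.3), "b")]
unlabeled = [(0.4, 0.2), (4.5, 5.1), (1.0, 1.0)]

medoids = fit_labeled_medoids(labeled)
predicted = assign_labels(medoids, unlabeled)
print(medoids)    # one medoid per labeled class
print(predicted)  # label of the nearest medoid for each unlabeled point
```

Because the medoids are computed from labeled observations only, each one is an actual labeled data point, which keeps the assignment step a plain nearest-neighbor lookup over as many candidates as there are classes.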


Author(s):  
Rene Grzeszick ◽  
Gernot A. Fink

Labeling images is tedious and costly work that is required for many applications, for example, tagging, grouping and exploring image collections. It is also necessary for training visual classifiers that recognize scenes or objects. It is therefore desirable to either reduce the human effort or infer additional knowledge by addressing this task with algorithms that allow for learning image annotations in a semi-supervised manner. In this paper, a semi-supervised annotation learning algorithm is introduced that is based on partitioning the data in a multi-view approach. The method is applied to large, diverse image collections of natural scene images. Experiments are performed on the 15 Scenes and SUN databases. It is shown that for sparsely labeled datasets the proposed annotation learning algorithm is able to infer additional knowledge from the unlabeled samples and therefore improve the performance of visual classifiers in comparison to supervised learning. Furthermore, the proposed algorithm outperforms other related semi-supervised learning approaches.


2018 ◽  
Vol 25 ◽  
pp. 57-66
Author(s):  
MM Rana ◽  
MN Hasan ◽  
MS Ahmed ◽  
MNH Mollah

In the early stage of the drug development process, it is urgent to judge the toxic effects of common chemical compounds (CCs) that are not yet well investigated. Biomarker genes (BGs) and the doses of CCs can help in drawing conclusions about a drug's safety. Classical toxicology methods use a large number of samples to extract clinical results, which is both time-consuming and costly. Conventional molecular methods, however, can identify only BGs and fail to detect the source factors influencing these BGs. The aim of this study is to propose a suitable algorithm that can identify more promising and essential toxicity biomarkers related to some common CCs for the safety assessment of new drugs. Glutathione is an effective metabolite in the liver's detoxification process, and glutathione depletion analysis is one of the key research areas in the drug development pipeline. In this paper, we studied glutathione depletion for three previously reported CCs (acetaminophen, methapyrilene and nitrofurazone). We develop an algorithm that combines ANOVA and principal component analysis (PCA) with a visualization technique to find biomarker genes, the associated glutathione-depleting CCs, and their corresponding doses. Numerous genes in the glutathione metabolism pathway are differentially expressed (DE) owing to the toxic effect of these CCs; the proposed algorithm identifies only five genes (Mgst2, Gclc, G6pd, Gsr and Srm), which are also foremost genes in the glutathione metabolism pathway. The proposed algorithm indicates that a high dose of each of the CCs is responsible for glutathione depletion, and that a middle dose of acetaminophen or nitrofurazone also causes glutathione depletion. The proposed algorithm has an additional benefit over the conventional method in discovering the toxicity of new chemical entities.
J. bio-sci. 25: 57-66, 2017


2020 ◽  
Vol 21 (7) ◽  
pp. 647-656
Author(s):  
Steven L. Gonias ◽  
Carlotta Zampieri

The major proteases that constitute the fibrinolysis system are tightly regulated. Protease inhibitors target plasmin, the protease responsible for fibrin degradation, and the proteases that convert plasminogen into plasmin, including tissue-type plasminogen activator (tPA) and urokinase-type plasminogen activator (uPA). A second mechanism by which fibrinolysis is regulated involves exosite interactions, which localize plasminogen and its activators to fibrin, extracellular matrix (ECM) proteins, and cell surfaces. Once plasmin is generated in association with cell surfaces, it may cleave transmembrane proteins, activate growth factors, release growth factors from ECM proteins, remodel ECM, activate metalloproteases, and trigger cell-signaling by cleaving receptors in the Protease-activated Receptor (PAR) family. These processes are all implicated in cancer. It is thus not surprising that a family of structurally diverse but functionally similar cell-surface proteins, called Plasminogen Receptors (PlgRs), which increase the catalytic efficiency of plasminogen activation, has received attention for its possible function in cancer and as a target for anticancer drug development. In this review, we consider four previously described PlgRs in human cancer: α-enolase, annexin-A2, Plg-RKT, and cytokeratin-8. To compare the PlgRs, we mined transcriptome profiling data from The Cancer Genome Atlas (TCGA) and searched for correlations between PlgR expression and patient survival. In glioma, the expression of specific PlgRs correlates with tumor grade. In a number of malignancies, including glioblastoma and liver cancer, increased expression of α-enolase or annexin-A2 is associated with an unfavorable prognosis. Whether these correlations reflect the function of PlgRs as receptors for plasminogen or other activities is discussed.


Author(s):  
Tanay Dalvi ◽  
Bhaskar Dewangan ◽  
Rudradip Das ◽  
Jyoti Rani ◽  
Suchita Dattatray Shinde ◽  
...  

Alzheimer’s disease (AD) is the most common cause of dementia and is predicted to be the third life-threatening disease, apart from stroke and cancer, for the geriatric population. To date, only four drugs are available on the market for symptomatic relief. The complex nature of the disease pathophysiology and the lack of concrete evidence for molecular targets are the major hurdles in developing new drugs to treat AD. The rate of attrition of many advanced drugs at clinical stages makes the de novo discovery process very expensive. Alternatively, Drug Repurposing (DR) is an attractive tool for developing drugs for AD in a less tedious and more economical way. Therefore, continuous efforts are being made to develop a new drug for AD by repurposing old drugs through screening and data mining. For example, a survey of the drug pipeline for Phase III clinical trials (up to February 2019) lists 27 candidates, around half of which are drugs already approved for other indications. Although the drug repurposing process for AD has previously been reviewed in the context of disease areas and molecular targets, there is no systematic review of repurposed drugs for AD from the recent drug development pipeline (2019-2020). In this manuscript, we review the clinical candidates for AD with emphasis on their development history, including their molecular targets and the relevance of each target to AD.


Author(s):  
Dan Luo

Background: The semi-supervised algorithm is known to be a classical algorithm in semi-supervised learning. Methods: This paper proposes an improved cooperative semi-supervised learning algorithm, presents the algorithm process in detail, and applies it to make predictions on unlabeled electronic component images. Results: Experiments on the classification and recognition of electronic components show that the proposed algorithm can significantly improve accuracy in electronic device image recognition, and that the improved algorithm can be used in the actual recognition process. Conclusion: With the continuous development of science and technology, machine vision and deep learning will play a more important role in people's lives in the future. Research on identifying the number of components is bound to develop toward high precision and multiple dimensions, which will greatly improve the production efficiency of the electronic components industry.


2019 ◽  
Vol 20 (22) ◽  
pp. 5697 ◽  
Author(s):  
Michelle E. Pewarchuk ◽  
Mateus C. Barros-Filho ◽  
Brenda C. Minatel ◽  
David E. Cohn ◽  
Florian Guisier ◽  
...  

Recent studies have uncovered microRNAs (miRNAs) that have been overlooked in early genomic explorations, which show remarkable tissue- and context-specific expression. Here, we aim to identify and characterize previously unannotated miRNAs expressed in gastric adenocarcinoma (GA). Raw small RNA-sequencing data were analyzed using the miRMaster platform to predict and quantify previously unannotated miRNAs. A discovery cohort of 475 gastric samples (434 GA and 41 adjacent nonmalignant samples), collected by The Cancer Genome Atlas (TCGA), were evaluated. Candidate miRNAs were similarly assessed in an independent cohort of 25 gastric samples. We discovered 170 previously unannotated miRNA candidates expressed in gastric tissues. The expression of these novel miRNAs was highly specific to the gastric samples, 143 of which were significantly deregulated between tumor and nonmalignant contexts (p-adjusted < 0.05; fold change > 1.5). Multivariate survival analyses showed that the combined expression of one previously annotated miRNA and two novel miRNA candidates was significantly predictive of patient outcome. Further, the expression of these three miRNAs was able to stratify patients into three distinct prognostic groups (p = 0.00003). These novel miRNAs were also present in the independent cohort (43 sequences detected in both cohorts). Our findings uncover novel miRNA transcripts in gastric tissues that may have implications in the biology and management of gastric adenocarcinoma.
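The deregulation criterion quoted in this abstract (p-adjusted < 0.05; fold change > 1.5) can be illustrated with a small filter. The abstract does not say which multiple-testing correction was used, so the Benjamini-Hochberg procedure below is an assumption, as are the symmetric fold-change definition, the function names and the made-up candidate values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def deregulated(candidates, alpha=0.05, min_fold=1.5):
    """candidates: list of (name, tumour_mean, normal_mean, raw_p) tuples."""
    adj = benjamini_hochberg([c[3] for c in candidates])
    hits = []
    for (name, tumour, normal, _), p_adj in zip(candidates, adj):
        # Symmetric fold change: up- and down-regulation both count.
        fold = max(tumour, normal) / min(tumour, normal)
        if p_adj < alpha and fold > min_fold:
            hits.append(name)
    return hits

# Hypothetical expression means and raw p-values, for illustration only.
candidates = [("cand-1", 30.0, 10.0, 0.001),
              ("cand-2", 12.0, 10.0, 0.002),
              ("cand-3", 5.0, 20.0, 0.04),
              ("cand-4", 40.0, 10.0, 0.5)]
hits = deregulated(candidates)
print(hits)
```

In this toy input, one candidate is rejected for a small fold change, one because its adjusted p-value rises above 0.05 after correction, and one for a large raw p-value, which shows why both conditions are applied together.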


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
J-J Stelmes ◽  
E. Vu ◽  
V. Grégoire ◽  
C. Simon ◽  
E. Clementel ◽  
...  

Introduction: The current phase III EORTC 1420 Best-of trial (NCT02984410) compares swallowing function after transoral surgery versus intensity-modulated radiotherapy (RT) in patients with early-stage carcinoma of the oropharynx, supraglottis and hypopharynx. We report the analysis of the Benchmark Case (BC) procedures before patient recruitment, with special attention to dysphagia/aspiration related structures (DARS). Materials and methods: Submitted RT volumes and plans from participating centers were analyzed and compared against the gold-standard expert delineations and dose distributions. A descriptive analysis of protocol deviations was conducted. The mean Sørensen-Dice similarity index (mDSI) and mean Hausdorff distance (mHD) were used to evaluate inter-observer variability (IOV). Results: 65% (23/35) of the institutions needed more than one submission to achieve RT quality assurance (RTQA) clearance. OAR volume delineations were the cause for rejection in 53% (40/76) of cases. IOV could be improved in 5 out of 12 OARs by more than 10 mm after resubmission (mHD). Despite this, with an aleatory threshold of 0.7 (mDSI) and 15 mm (mHD), the final IOV in the delineation of critical OARs remained significant among DARS. Conclusions: This is, to our knowledge, the largest BC analysis among head and neck RTQA programs performed in the framework of a prospective trial. Benchmarking identified non-common OARs and target delineation errors as the main sources of deviations, and IOV could be reduced in a significant number of cases after this process. Due to the substantial resources involved in benchmarking, future benchmark analyses should fully assess the impact on patients’ clinical outcome.
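The two agreement metrics named in this abstract can be sketched for a single pair of delineations as follows. This is an illustrative sketch under stated assumptions, not the trial's RTQA software: contours are represented as sets of integer pixel coordinates, distances are in pixel units rather than millimetres, and the function names and toy contours are hypothetical. The trial reports means of these values across submissions.

```python
import math

def dice_index(a, b):
    """Sørensen-Dice similarity of two sets of pixel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))

# Hypothetical 4x4 reference contour and a submission shifted by one pixel.
reference = {(x, y) for x in range(4) for y in range(4)}
submitted = {(x, y) for x in range(1, 5) for y in range(4)}

dsi = dice_index(reference, submitted)
hd = hausdorff_distance(reference, submitted)
print(dsi, hd)
```

The Dice index rewards overlap in volume, while the Hausdorff distance reports the single worst boundary disagreement, so the two metrics can disagree about which delineation is closer to the reference.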

