Are radiomics features universally applicable to different organs?

Abstract Background Many studies have identified radiomics features that reflect macroscale tumor characteristics and the tumor microenvironment in various organs, and there is growing interest in applying the features found in one organ to other organs. Here, we explored whether common radiomics features could be identified across target organs in vastly different environments. Methods Four datasets covering three organs were analyzed. One radiomics model was constructed from the training set (lungs, n = 401) and evaluated in three independent test sets spanning three organs (lungs, n = 59; kidneys, n = 48; brains, n = 43). Intensity histograms derived from the whole organ were compared to establish organ-level differences. A total of 143 features were computed for each tumor, and we constructed a radiomics score from selected features using the tumor regions of the training lung data. We adopted a feature selection approach that favored stable features that also capture survival. The radiomics score was applied to the three independent test sets of lung, kidney, and brain tumors, and we evaluated whether it could separate high- and low-risk groups. Results Each organ showed a distinct pattern in the histogram and in the derived parameters (mean and median) at the organ level. The radiomics score trained on the tumor regions of the lung data included seven features, and it effectively stratified survival only in the other lung dataset, not in other organs such as the kidney and brain. Removing the lung-specific feature (2.5th percentile) from the radiomics score led to similar results. There were no features in common between the training and test sets, but a common feature category (texture) was identified.
Conclusion Although the possibility of a generally applicable model cannot be excluded, our results suggest that radiomics score models for survival are largely specific to a given organ; applying them to other organs requires careful consideration of organ-specific properties.
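
The radiomics score in this abstract combines selected features into one number and then splits patients into high- and low-risk groups. A minimal sketch of that idea, assuming a linear weighted sum with a median cutoff fixed on the training cohort, is shown below. The feature values, the weights, and the median rule are hypothetical placeholders, not the study's fitted model; only the 2.5th-percentile intensity feature is named in the abstract.

```python
import numpy as np

def intensity_percentile(voxels, q=2.5):
    """First-order feature: the q-th percentile of tumor voxel intensities."""
    return float(np.percentile(np.asarray(voxels, dtype=float), q))

def radiomics_score(features, weights):
    """Linear score: sum of weight * feature for each patient.

    `features` is an (n_patients, n_features) array; `weights` holds one
    hypothetical coefficient per selected feature (e.g. from a Cox model).
    """
    return np.asarray(features, dtype=float) @ np.asarray(weights, dtype=float)

def stratify(scores, cutoff):
    """Label each patient 'high' or 'low' risk relative to a fixed cutoff."""
    return ["high" if s > cutoff else "low" for s in scores]

# Hypothetical training cohort: 4 patients x 2 selected features.
train = np.array([[1.0, 0.5], [2.0, 1.0], [0.5, 0.2], [3.0, 0.1]])
w = np.array([0.8, -1.2])                 # hypothetical coefficients
train_scores = radiomics_score(train, w)
cutoff = float(np.median(train_scores))   # cutoff is fixed on training data only

# Apply the frozen weights and cutoff, unchanged, to an independent cohort.
test = np.array([[2.5, 0.3], [0.4, 0.9]])
print(stratify(radiomics_score(test, w), cutoff))
```

Freezing both the weights and the cutoff on the training data reflects the design described here, in which a lung-trained score is applied without refitting to kidney and brain cohorts; in practice the separation of the two groups would then be tested with survival analysis.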

Deep learning to predict subtypes of poorly differentiated lung cancer from biopsy whole slide images.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.8536 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. 8536-8536

Author(s):

Gouji Toyokawa ◽

Fahdi Kanavati ◽

Seiya Momosaki ◽

Kengo Tateishi ◽

Hiroaki Takeoka ◽

...

Keyword(s):

Lung Cancer ◽

Deep Learning ◽

Learning Model ◽

Test Set ◽

Cancer Subtypes ◽

Independent Test ◽

Poorly Differentiated ◽

Test Sets ◽

Deep Learning Model ◽

Whole Slide Images

8536 Background: Lung cancer is the leading cause of cancer-related death in many countries, and its prognosis remains unsatisfactory. Because treatment approaches differ substantially by subtype, such as adenocarcinoma (ADC), squamous cell carcinoma (SCC) and small cell lung cancer (SCLC), an accurate histopathological diagnosis is of great importance. However, if the specimen is composed solely of poorly differentiated cancer cells, distinguishing between histological subtypes can be difficult. The present study developed a deep learning model to classify lung cancer subtypes from whole slide images (WSIs) of transbronchial lung biopsy (TBLB) specimens, with the particular aim of evaluating a challenging test set of indeterminate cases. Methods: Our deep learning model consisted of two separately trained components: a convolutional neural network tile classifier and a recurrent neural network tile aggregator for the WSI diagnosis. We used a training set of 638 WSIs of TBLB specimens to train the model to classify lung cancer subtypes (ADC, SCC and SCLC) and non-neoplastic lesions. The training set consisted of 593 WSIs whose diagnoses had been determined by pathologists based on visual inspection of hematoxylin and eosin (HE)-stained slides and 45 WSIs of indeterminate cases (64 ADCs and 19 SCCs). We then evaluated the model using five independent test sets, computing the area under the receiver operating characteristic curve (ROC AUC) for each. Results: We applied the model to an indeterminate test set of WSIs obtained from TBLB specimens that pathologists had not been able to diagnose conclusively by examining the HE-stained specimens alone. Overall, the model achieved ROC AUCs of 0.993 (confidence interval [CI] 0.971–1.0) and 0.996 (CI 0.981–1.0) for ADC and SCC, respectively.
We further evaluated the model using five independent test sets consisting of both TBLB and surgically resected lung specimens (2490 WSIs in total) and obtained ROC AUCs ranging from 0.94 to 0.99. Conclusions: In this study, we demonstrated that a deep learning model could be trained to predict lung cancer subtypes in indeterminate TBLB specimens. These results suggest that, if deployed in clinical practice, a deep learning model that aids pathologists in diagnosing indeterminate cases could be highly beneficial, allowing a diagnosis to be reached sooner and reducing the costs of further investigations.
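
The ROC AUC reported for each subtype can be read as the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one. The sketch below computes it that way, one class against the rest, from slide-level probabilities. The probabilities are made up, only three of the four classes are shown for brevity, and the tile classifier and RNN aggregator are not reproduced.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win. This pairwise
    form equals the area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(probs, labels, classes):
    """Per-class AUC for a multi-class slide classifier (one class vs. all others)."""
    return {
        c: roc_auc([row[i] for row in probs], [1 if y == c else 0 for y in labels])
        for i, c in enumerate(classes)
    }

# Hypothetical slide-level probabilities for four slides.
classes = ["ADC", "SCC", "SCLC"]
probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.25, 0.65, 0.1], [0.1, 0.2, 0.7]]
labels = ["ADC", "SCC", "ADC", "SCLC"]
print(one_vs_rest_auc(probs, labels, classes))
```

The pairwise loop is O(P·N) and is meant only to make the definition concrete; for thousands of slides a rank-based computation gives the same value faster.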

FANCY: Fast Estimation of Privacy Risk in Functional Genomics Data

10.1101/775338 ◽

2019 ◽

Cited By ~ 1

Author(s):

Gamze Gürsoy ◽

Charlotte M. Brannon ◽

Fabio C.P. Navarro ◽

Mark Gerstein

Keyword(s):

Functional Genomics ◽

Cumulative Number ◽

RNA-Seq ◽

Privacy Risk ◽

Privacy Concerns ◽

Link Type ◽

Privacy Leakage ◽

Independent Test ◽

Matlab Implementation ◽

Test Sets

Abstract Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. We therefore present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. FANCY can predict the cumulative number of leaking SNVs with an average R² of 0.95 across all independent test sets. Because accurate prediction matters even when the number of leaked variants is low, we developed a special version of the model that makes more accurate predictions when only a few variants leak. A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters of the regression model on their own data. An easy-to-use web server that takes inputs and displays results can be found at fancy.gersteinlab.org.
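
FANCY is described as supervised regression from overall sequencing statistics to the number of leaking SNVs, evaluated by R² on held-out test sets. The sketch below illustrates that general setup with ordinary least squares in NumPy on synthetic data. The two features, the synthetic targets, and the choice of a plain linear model are assumptions for illustration only; they are not FANCY's actual features or regressor, which are available in the linked repository.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
# Hypothetical per-sample statistics (e.g. scaled read depth, mapped fraction).
X_train = rng.uniform(0, 1, size=(50, 2))
y_train = 120 * X_train[:, 0] + 30 * X_train[:, 1] + rng.normal(0, 2, 50)
# A separate synthetic "independent test set" drawn the same way.
X_test = rng.uniform(0, 1, size=(20, 2))
y_test = 120 * X_test[:, 0] + 30 * X_test[:, 1] + rng.normal(0, 2, 20)

coef = fit_linear(X_train, y_train)
print(round(r_squared(y_test, predict(coef, X_test)), 3))
```

The key design point is that R² is computed only on samples never used for fitting, which is what makes the reported value a test-set estimate rather than a goodness-of-fit figure.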

Value of Radiomics Features From Adrenal Gland and Periadrenal Fat in CT Images for Predicting COVID-19 Prognosis

10.21203/rs.3.rs-989736/v1 ◽

2021 ◽

Author(s):

Mudan Zhang ◽

Xuntao Yin ◽

Wuchao Li ◽

Yan Zha ◽

Xianchun Zeng ◽

...

Keyword(s):

Adrenal Gland ◽

Adrenal Glands ◽

Endocrine System ◽

CT Images ◽

Threshold Probability ◽

Test Set ◽

Clinical Model ◽

Disease Prognosis ◽

Independent Test ◽

Test Sets

Abstract Background: The endocrine system plays an important role in the prognosis of infectious diseases. Our goal was to assess the value of radiomics features extracted from CT images of the adrenal glands and periadrenal fat for predicting disease prognosis in patients with COVID-19. Methods: A total of 1,325 patients (765 moderate and 560 severe) from three centers were enrolled in this retrospective study. We proposed a 3D cascade V-Net to automatically segment the adrenal glands in onset CT images. Periadrenal fat areas were obtained using inflation (morphological dilation) operations. Radiomics features were then automatically extracted. Five models were established to predict disease prognosis in patients with COVID-19: a clinical model (CM), three radiomics models (adrenal gland model [AM], periadrenal fat model [PM], and a fusion of the adrenal gland and periadrenal fat models [FM]), and a radiomics nomogram model (RN). Data from one center (1,183 patients) were used as the training and validation sets, and data from the remaining two centers (36 and 106 patients) served as two independent test sets to evaluate the models' performance. Results: The auto-segmentation framework achieved an average Dice coefficient of 0.79 on the test set. CM, AM, PM, FM, and RN obtained AUCs of 0.716, 0.755, 0.796, 0.828, and 0.825, respectively, in the training set, and mean AUCs of 0.754, 0.709, 0.672, 0.706, and 0.778 on the two independent test sets. Decision curve analysis showed that when the threshold probability exceeded 0.3, 0.5, and 0.1 in the validation set, independent test set 1, and independent test set 2, respectively, RN yielded more net benefit than FM and CM. Conclusion: Radiomics features extracted from CT images of the adrenal glands and periadrenal fat are related to disease prognosis in patients with COVID-19 and show great potential for predicting its severity.
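
Two evaluation quantities in this abstract have standard definitions that can be sketched directly: the Dice coefficient used to score the auto-segmentation, and the net benefit plotted in decision curve analysis, NB(p_t) = TP/n − (FP/n)·p_t/(1 − p_t). The masks, predicted probabilities, and threshold below are made-up values, and the 3D cascade V-Net itself is not reproduced.

```python
import numpy as np

def dice(pred_mask, true_mask):
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for binary masks."""
    a = np.asarray(pred_mask, dtype=bool)
    b = np.asarray(true_mask, dtype=bool)
    denom = a.sum() + b.sum()
    # Two empty masks agree perfectly by convention.
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def net_benefit(probs, outcomes, threshold):
    """Decision-curve net benefit at threshold probability p_t.

    Patients with predicted probability >= p_t are treated as positive; false
    positives are penalized by the odds p_t / (1 - p_t).
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=bool)
    treat = probs >= threshold
    n = len(probs)
    tp = np.sum(treat & outcomes)
    fp = np.sum(treat & ~outcomes)
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Hypothetical 2x3 slice masks and a hypothetical four-patient cohort.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(dice(pred, truth))
print(net_benefit([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0], threshold=0.3))
```

A decision curve repeats the net-benefit calculation over a range of thresholds for each model and compares it with the "treat all" and "treat none" (net benefit 0) strategies; a model is preferred over the thresholds where its curve lies highest.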

FANCY: fast estimation of privacy risk in functional genomics data

Bioinformatics ◽

10.1093/bioinformatics/btaa661 ◽

2020 ◽

Author(s):

Gamze Gürsoy ◽

Charlotte M Brannon ◽

Fabio C P Navarro ◽

Mark Gerstein

Keyword(s):

Functional Genomics ◽

Supplementary Information ◽

Cumulative Number ◽

RNA-Seq ◽

Privacy Risk ◽

Privacy Concerns ◽

Privacy Leakage ◽

Independent Test ◽

Matlab Implementation ◽

Test Sets

Abstract Motivation Functional genomics data are becoming clinically actionable, raising privacy concerns. However, quantifying privacy leakage via genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. Results FANCY can predict the cumulative number of leaking SNVs with an average R² of 0.95 across all independent test sets. Recognizing the importance of accurate prediction when the number of leaked variants is low, we developed a special version of the model that makes more accurate predictions in that regime. Availability and implementation A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters of the regression model on their own data. An easy-to-use web server that takes inputs and displays results can be found at fancy.gersteinlab.org. Supplementary information Supplementary data are available at Bioinformatics online.

Self-Affirmation

Oxford Research Encyclopedia of Communication ◽

10.1093/acrefore/9780190228613.013.536 ◽

2017 ◽

Author(s):

Xiaoquan Zhao

Keyword(s):

Health Communication ◽

Risk Communication ◽

Sense Of Self ◽

Risk Groups ◽

Careful Consideration ◽

The Self ◽

Defensive Response ◽

Psychological Mechanisms ◽

And Behavior ◽

Message Acceptance

Self-affirmation theory posits that people are motivated to maintain an adequate sense of self-integrity. It further posits that the self-system is highly flexible, such that threats to one domain of the self can be better endured if the global sense of self-integrity is protected and reinforced by self-resources in other, unrelated domains. Health and risk communication messages are often threatening to the self because they convey information that highlights inadequacies in one's health attitudes and behaviors. This tends to provoke defensive responses, particularly among the high-risk groups to whom the messages are typically targeted and for whom they are most relevant. However, self-affirmation theory suggests that such defensive reactions can be effectively reduced if people are given opportunities to reinforce their sense of self-integrity in unrelated domains. This hypothesis has generated substantial research over the past two decades. Empirical evidence so far has provided relatively consistent support for a positive effect of self-affirmation on message acceptance, intention, and behavior. These findings encourage careful consideration of the theoretical and practical implications of self-affirmation theory for the genesis and reduction of defensive responses in health and risk communication. At the same time, important gaps and nuances in the literature should be noted, such as the boundary conditions of the effects of self-affirmation, the lack of clarity about the psychological mechanisms underlying the observed effects, and the fact that self-affirmation can be easily implemented in some health communication contexts but not in others.
Moreover, the research program may also benefit from greater attention to variables and questions of more direct interest to communication researchers, such as the role of varying message attributes and audience characteristics, the potential to integrate self-affirmation theory with health communication theories, and the spontaneous occurrence of positive self-affirmation in natural health communication settings.

Agricultural Greenhouses Detection in High-Resolution Satellite Images Based on Convolutional Neural Networks: Comparison of Faster R-CNN, YOLO v3 and SSD

Sensors ◽

10.3390/s20174938 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4938

Author(s):

Min Li ◽

Zhijie Zhang ◽

Liping Lei ◽

Xiaofan Wang ◽

Xudong Guo

Keyword(s):

High Resolution ◽

Visual Inspection ◽

Satellite Images ◽

High Spatial Resolution ◽

Fine Tuning ◽

Single Shot ◽

Modern Agriculture ◽

High Resolution Satellite Images ◽

Independent Test ◽

Test Sets

Agricultural greenhouses (AGs) are an important facility for the development of modern agriculture, and detecting them accurately and efficiently is necessary for its strategic planning. With the advent of deep learning algorithms, various convolutional neural network (CNN)-based models have been proposed for object detection in high spatial resolution images. In this paper, we conducted a comparative assessment of three well-established CNN-based models, Faster R-CNN, You Only Look Once v3 (YOLO v3), and Single Shot MultiBox Detector (SSD), for detecting AGs. Transfer learning and fine-tuning were used to train the models. Accuracy and efficiency evaluations show that YOLO v3 achieved the best performance in terms of mean average precision (mAP), frames per second (FPS), and visual inspection. SSD showed an advantage in detection speed, with an FPS twice that of Faster R-CNN, although their mAPs on the test set were close. The trained models were also applied to two independent test sets, which showed that these models have a degree of transferability and that higher-resolution images contribute significantly to improved accuracy. Our study suggests that YOLO v3, which is superior in both accuracy and computational efficiency, can be applied operationally to detect AGs in high-resolution satellite images.
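
The mAP metric used to rank the three detectors rests on two pieces: intersection over union (IoU) between a predicted and a ground-truth box, and average precision (AP) over detections sorted by confidence. The sketch below computes all-point interpolated AP for a single class in a single image, using greedy matching of each detection to the best still-unmatched ground-truth box at IoU ≥ 0.5. The boxes and scores are hypothetical, and full benchmark protocols (PASCAL VOC, COCO) pool detections across images and differ in matching details.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """All-point interpolated AP for one class.

    detections: list of (confidence, box); gt_boxes: list of boxes.
    Each ground-truth box can be matched at most once, highest confidence first.
    """
    if not gt_boxes:
        return 0.0
    matched = [False] * len(gt_boxes)
    precisions, recalls, tp = [], [], 0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for k, (_, box) in enumerate(ranked, start=1):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if not matched[j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1  # true positive; otherwise the detection is a false positive
        precisions.append(tp / k)
        recalls.append(tp / len(gt_boxes))
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Hypothetical greenhouse boxes in one image tile.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 30, 30))]
print(average_precision(dets, gts))
```

With a single object class (greenhouses), mAP reduces to this AP; with several classes it is the mean of the per-class values.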

Machine learning of genomic features in organotropic metastases stratifies progression risk of primary tumors

10.21203/rs.3.rs-73390/v1 ◽

2020 ◽

Author(s):

Jiguang Wang ◽

Biaobin Jiang ◽

Quanhua Mu ◽

Fufang Qiu ◽

Weiqi Xu

Keyword(s):

Early Stage ◽

Metastatic Cancer ◽

Risk Groups ◽

Sequencing Data ◽

Computational Framework ◽

Primary Tumors ◽

Genomic Features ◽

Organ Specific ◽

Spatiotemporal Behavior ◽

Prostate Cancers

Abstract Metastasis causes most cancer deaths, but its spatiotemporal behavior remains unpredictable at an early stage. Here, we developed MetaNet, a computational framework that integrates clinical and sequencing data from 32,176 primary and metastatic cancer cases to assess the metastatic risk of primary tumors. MetaNet achieved high accuracy in distinguishing metastases from primary tumors in breast and prostate cancers. Based on these predictions, we identified Metastasis-Featuring Primary (MFP) tumors, a subset of primary tumors with genomic features enriched in metastases, and demonstrated their high metastatic risk through significantly shorter disease-free survival and higher migratory potential. In addition, we identified genomic alterations associated with organ-specific metastases and used them to stratify patients into risk groups with propensities toward different metastatic organs. Remarkably, this organotropic stratification achieved better prognostic value than the standard histological grading system in prostate cancer, especially between the Bone-MFP and Liver-MFP subtypes, providing organotropic insights that can inform organ-specific examinations during follow-up.
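
The abstract does not describe MetaNet's classifier in enough detail to reproduce it, but the disease-free survival comparison it reports is typically made with Kaplan-Meier curves for each risk group. The sketch below is a plain Kaplan-Meier estimator, offered as a related illustration rather than the authors' pipeline; the follow-up times and the grouping are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if the event (e.g. recurrence) occurred,
    0 if the patient was censored. Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = removed = 0
        # Group all patients sharing this time point (events and censorings).
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            removed += 1
            i += 1
        if events_at_t:
            surv *= 1.0 - events_at_t / at_risk
            curve.append((t, surv))
        at_risk -= removed  # everyone observed at t leaves the risk set afterwards
    return curve

# Hypothetical follow-up (months) for two risk groups.
high_risk = kaplan_meier([3, 5, 5, 8, 12], [1, 1, 0, 1, 0])
low_risk = kaplan_meier([10, 14, 20, 24, 30], [0, 1, 0, 1, 0])
print(high_risk)
print(low_risk)
```

Censored patients still count in the risk set up to their last follow-up time, which is why the estimator drops by 1/4 rather than 1/5 at the second event in the high-risk example. A formal comparison of two such curves would usually use a log-rank test.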

DeepCrystal: a deep learning framework for sequence-based protein crystallization prediction

Bioinformatics ◽

10.1093/bioinformatics/bty953 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2216-2225 ◽

Cited By ~ 10

Author(s):

Abdurrahman Elbasir ◽

Balasubramanian Moovarkumudalvan ◽

Khalid Kunji ◽

Prasanna R Kolatkar ◽

Raghvendra Mall ◽

...

Keyword(s):

Deep Learning ◽

Protein Crystallization ◽

Protein Sequences ◽

Structural Features ◽

Attrition Rate ◽

Supplementary Information ◽

Learning Framework ◽

Average Improvement ◽

Independent Test ◽

Test Sets

Abstract Motivation Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, most of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from the sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers in protein sequences to distinguish proteins that will yield diffraction-quality crystals from those that will not. Results Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves average improvements in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, relative to Crysalis II and Crysf, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC on the independent test sets. Availability and implementation The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web server is also available at https://deeplearning-protein.qcri.org. Supplementary information Supplementary data are available at Bioinformatics online.
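
The four metrics used to compare DeepCrystal with Crysalis II and Crysf all derive from the binary confusion matrix (crystallizable vs. not). The sketch below computes them from hypothetical counts; it illustrates the standard definitions, not the paper's evaluation code.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Recall, precision, F-score, accuracy and Matthews correlation coefficient."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}

# Hypothetical counts on a small test set.
print(binary_metrics(tp=6, fp=1, tn=3, fn=2))
```

Reporting MCC alongside accuracy is useful here because crystallization datasets are often imbalanced, and accuracy alone can look high for a predictor that favors the majority class.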

Designing a general method for predicting the regulatory relationships between long noncoding RNAs and protein-coding genes based on multi-omics characteristics

Bioinformatics ◽

10.1093/bioinformatics/btz886 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2025-2032

Author(s):

Yuwei Zhang ◽

Tianfei Yi ◽

Huihui Ji ◽

Guofang Zhao ◽

Yang Xi ◽

...

Keyword(s):

Noncoding RNA ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Supplementary Information ◽

Protein Coding ◽

Protein Coding Genes ◽

Negative Results ◽

Independent Test ◽

Multiple Characteristics ◽

Test Sets

Abstract Motivation Long noncoding RNAs (lncRNAs) have been shown to interact with other biomolecules, especially protein-coding genes (PCGs), thereby playing essential regulatory roles in biological processes and disease development. However, the mechanisms underlying most lncRNA–PCG relationships remain unclear. Our study investigated the characteristics of true lncRNA–PCG relationships and constructed a novel predictor using machine learning algorithms. Results We obtained 307 true lncRNA–PCG pairs from a database and found significant differences in multiple characteristics between the true and random lncRNA–PCG sets. In addition, 3-fold cross-validation and predictions on independent test sets showed high AUC values for logistic regression (LR), support vector machine (SVM) and random forest (RF), among which RF performed best, with an average AUC of 0.818 in cross-validation and AUCs of 0.823 and 0.853 on the two independent test sets, respectively. In a case study, several candidate lncRNA–PCG relationships in colorectal cancer were identified, and the HOTAIR–COMP interaction was highlighted as an example. The proportion of previously reported pairs among the predicted positives was significantly higher than among the predicted negatives (P < 0.05). Supplementary information Supplementary data are available at Bioinformatics online.
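
The 3-fold cross-validation mentioned here can be sketched as an index splitter: shuffle once, cut into k nearly equal folds, and hold out each fold exactly once while a model (LR, SVM or RF) is trained on the rest; the k held-out AUCs are then averaged. The seed, the sample count, and the absence of class stratification are assumptions, since the abstract does not state those details of the protocol.

```python
import random

def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then split into k nearly
    equal folds; each fold serves as the held-out set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

# Hypothetical: 10 labeled lncRNA-PCG pairs split into 3 folds.
for train, test in k_fold_indices(10, k=3):
    print(len(train), len(test))
```

Because every sample is held out exactly once, the averaged AUC uses all labeled pairs for evaluation while never scoring a pair with a model that saw it during training; the separate independent test sets then check generalization beyond this data.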

Semantic Segmentation of Sorghum Using Hyperspectral Data Identifies Genetic Associations

Plant Phenomics ◽

10.34133/2020/4216373 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

Chenyong Miao ◽

Alejandro Pages ◽

Zheng Xu ◽

Eric Rodene ◽

Jinliang Yang ◽

...

Keyword(s):

Semantic Segmentation ◽

Reproductive Organs ◽

Hyperspectral Data ◽

Training Data ◽

Hyperspectral Images ◽

Genetic Associations ◽

Wide Range ◽

Maize Leaves ◽

Organ Specific ◽

Organ Level

This study describes the evaluation of a range of approaches to semantic segmentation of hyperspectral images of sorghum plants, classifying each pixel as either nonplant or one of three organ types (leaf, stalk, panicle). While many current segmentation methods focus on separating plant pixels from the background, organ-specific segmentation makes it feasible to measure a wider range of plant properties. Manually scored training data from a set of hyperspectral images collected from a sorghum association population were used to train and evaluate a set of supervised classification models. Many algorithms show acceptable accuracy for this classification task. Algorithms trained on sorghum data accurately classify maize leaves and stalks but fail to accurately classify maize reproductive organs, which are not directly equivalent to sorghum panicles. Trait measurements extracted from semantic segmentation of sorghum organs can be used both to identify genes known to control variation in previously measured phenotypes (e.g., panicle size and plant height) and to identify signals for genes controlling traits not previously quantified in this population (e.g., stalk/leaf ratio). Organ-level semantic segmentation provides opportunities to identify genes controlling variation in a wide range of morphological phenotypes in sorghum, maize, and other related grain crops.
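
Per-pixel organ segmentation of a hyperspectral cube, followed by a trait measurement such as the stalk/leaf ratio, can be sketched in NumPy. A nearest-centroid classifier is used here purely as a simple stand-in; the abstract evaluates "a range of approaches" and this is not claimed to be one of them. The band count, class order, and spectra are hypothetical.

```python
import numpy as np

CLASSES = ["nonplant", "leaf", "stalk", "panicle"]  # hypothetical label order

def fit_centroids(spectra, labels, n_classes=len(CLASSES)):
    """Mean spectrum per class from labeled training pixels (n_pixels, n_bands)."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    return np.stack([spectra[labels == c].mean(axis=0) for c in range(n_classes)])

def classify_pixels(cube, centroids):
    """Assign each pixel of an (H, W, bands) cube to the nearest class centroid."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)
    # Squared Euclidean distance from every pixel to every centroid.
    dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).reshape(h, w)

def stalk_leaf_ratio(label_map):
    """Trait sketch: stalk pixel count divided by leaf pixel count."""
    leaf = np.sum(label_map == CLASSES.index("leaf"))
    stalk = np.sum(label_map == CLASSES.index("stalk"))
    return float(stalk) / leaf if leaf else float("nan")

# Hypothetical 3-band training pixels, two per class.
train_spectra = np.array([[0, 0, 0], [0, 0, 0.1], [1, 0, 0], [0.9, 0, 0],
                          [0, 1, 0], [0, 0.9, 0], [0, 0, 1], [0, 0.1, 0.9]])
train_labels = [0, 0, 1, 1, 2, 2, 3, 3]
cube = np.array([[[0.95, 0, 0], [0, 0.95, 0]], [[0.9, 0.05, 0], [0, 0, 0.05]]])

label_map = classify_pixels(cube, fit_centroids(train_spectra, train_labels))
print(label_map)
print(stalk_leaf_ratio(label_map))
```

Separating segmentation from trait extraction means any better per-pixel classifier can be swapped in without changing how traits such as the stalk/leaf ratio are computed for downstream genetic association.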
