Predicting and characterizing a cancer dependency map of tumors with deep learning

2021 ◽  
Vol 7 (34) ◽  
pp. eabh1275
Author(s):  
Yu-Chiao Chiu ◽  
Siyuan Zheng ◽  
Li-Ju Wang ◽  
Brian S. Iskra ◽  
Manjeet K. Rao ◽  
...  

Genome-wide loss-of-function screens have revealed genes essential for cancer cell proliferation, called cancer dependencies. It remains challenging to link cancer dependencies to the molecular compositions of cancer cells or to unscreened cell lines and further to tumors. Here, we present DeepDEP, a deep learning model that predicts cancer dependencies using integrative genomic profiles. It uses a unique unsupervised pretraining that captures unlabeled tumor genomic representations to improve the learning of cancer dependencies. We demonstrated DeepDEP’s improvement over conventional machine learning methods and validated the performance with three independent datasets. By systematic model interpretations, we extended the current dependency maps with functional characterizations of dependencies and a proof-of-concept in silico assay of synthetic essentiality. We applied DeepDEP to pan-cancer tumor genomics and built the first pan-cancer synthetic dependency map of 8000 tumors with clinical relevance. In summary, DeepDEP is a novel tool for investigating cancer dependency with rapidly growing genomic resources.

2020 ◽  
Vol 12 (12) ◽  
pp. 5074
Author(s):  
Jiyoung Woo ◽  
Jaeseok Yun

Spam posts in web forum discussions inconvenience users and lower the value of the web forum as an open source of user opinion. Because the importance of a web post is evaluated in terms of the number of involved authors, such noise distorts the analysis results by adding unnecessary data to the opinion analysis. In this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to recognize spam posts in advance. To construct the machine learning-based model, text features were extracted from posted content using text mining techniques from a linguistic perspective, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text including and excluding special characters was utilized. A comparison of two recurrent neural network (RNN) models, the simple RNN and the long short-term memory (LSTM) network, was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model significantly improves on the accuracy of conventional machine learning with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.
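The per-class precision and recall reported above can be computed directly from true and predicted labels. The following is a minimal sketch, not the authors' evaluation code; the function name `precision_recall`, the label strings, and the toy data are all assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually another class.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: actually positive but predicted as another class.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, with "spam" playing the role of the noise class.
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
precision, recall = precision_recall(y_true, y_pred, positive="spam")
```

Precision answers "of the posts flagged as spam, how many were spam", while recall answers "of the spam posts, how many were flagged".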


2021 ◽  
Author(s):  
Canbiao Wu ◽  
Xiaofang Guo ◽  
Mengyuan Li ◽  
Xiayu Fu ◽  
Zeliang Hou ◽  
...  

Hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. Previous studies showed that HBV can integrate into the host genome and further promote malignant transformation. In this study, we developed an attention-based deep learning model, DeepHBV, to predict HBV integration sites by learning local genomic features automatically. We trained and tested DeepHBV using the HBV integration site data from the dsVIS database. Initially, DeepHBV showed an AUROC of 0.6363 and an AUPR of 0.5471 on the dataset. Adding repeat peaks and TCGA Pan Cancer peaks significantly improved model performance, with AUROCs of 0.8378 and 0.9430 and AUPRs of 0.7535 and 0.9310, respectively. On an independent validation dataset of HBV integration sites from VISDB, DeepHBV with HBV integration sequences plus TCGA Pan Cancer peaks (AUROC of 0.7603 and AUPR of 0.6189) performed better than with HBV integration sequences plus repeat peaks (AUROC of 0.6657 and AUPR of 0.5737). Next, we found that transcription factor binding sites (TFBS) were significantly enriched near genomic positions that received attention from the convolutional neural network. The binding sites of AR-halfsite, Arnt, Atf1, bHLHE40, bHLHE41, BMAL1, CLOCK, c-Myc, COUP-TFII, E2A, EBF1, Erra and Foxo3 were highlighted by the DeepHBV attention mechanism in both the dsVIS and VISDB datasets, revealing the HBV integration preference. In summary, DeepHBV is a robust and explainable deep learning model not only for the prediction of HBV integration sites but also for further mechanistic study of HBV-induced cancer.
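AUROC, the headline metric above, can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it by counting pairs, which is a standard equivalent formulation and not the authors' code; the function name `auroc` and the toy scores are assumptions for illustration.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positives and 0 for negatives; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four candidate sites.
value = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pair loop is fine for small examples; a rank-based implementation would be preferable on large datasets.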


BMC Genomics ◽  
2019 ◽  
Vol 20 (S2) ◽  
Author(s):  
Qi Tian ◽  
Jianxiao Zou ◽  
Jianxiong Tang ◽  
Yuan Fang ◽  
Zhongli Yu ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7078
Author(s):  
Yueting Wang ◽  
Minzan Li ◽  
Ronghua Ji ◽  
Minjuan Wang ◽  
Lihua Zheng

Visible-near-infrared (Vis-NIR) spectroscopy is one of the most important methods for non-destructive and rapid detection of soil total nitrogen (STN) content. To find a practical way to build an STN content prediction model, three conventional machine learning methods and one deep learning approach are investigated, and their predictive performances are compared and analyzed using a public dataset called LUCAS Soil (19,019 samples). The three conventional machine learning methods are ordinary least square estimation (OLSE), random forest (RF), and extreme learning machine (ELM), while for the deep learning method, three different structures of convolutional neural network (CNN) incorporating the Inception module are constructed and investigated. To clarify the effectiveness of different pre-treatments on predicting STN content, the three conventional machine learning methods are combined with four pre-processing approaches (baseline correction, smoothing, dimensional reduction, and feature selection), and the combinations are investigated, compared, and analyzed. The results indicate that the baseline-corrected and smoothed ELM model reaches practical precision (coefficient of determination (R2) = 0.89, root mean square error of prediction (RMSEP) = 1.60 g/kg, and residual prediction deviation (RPD) = 2.34). Among the three differently structured CNN models, the one with more 1 × 1 convolutions performs better (R2 = 0.93; RMSEP = 0.95 g/kg; and RPD = 3.85 in the optimal case). In addition, to evaluate the influence of dataset characteristics on the model, the LUCAS dataset was divided into subsets according to dataset size, organic carbon (OC) content, and country. The results show that the deep learning method is more effective and practical than conventional machine learning methods and that, given enough data samples, it can be used to build a robust, high-accuracy STN content prediction model for the same type of soil with similar agricultural treatment.
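The three figures of merit quoted above (R2, RMSEP, and RPD) are related, and a short sketch makes the relationship concrete. It assumes the common chemometrics definitions: RMSEP as the root mean squared prediction error on the test set, and RPD as the sample standard deviation of the reference values divided by RMSEP. The abstract does not state which variants the authors used, so the function `regression_metrics` and the toy values are illustrative only.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSEP, RPD) for paired reference and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    # Assumption: RPD uses the sample standard deviation (n - 1 denominator).
    rpd = statistics.stdev(y_true) / rmsep
    return r2, rmsep, rpd


# Hypothetical STN values in g/kg, not taken from LUCAS Soil.
r2, rmsep, rpd = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because RPD divides the spread of the reference values by the prediction error, a lower RMSEP on the same test set raises RPD, which matches the direction of the reported ELM and CNN results.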


2021 ◽  
Author(s):  
Javad Noorbakhsh ◽  
Saman Farahmand ◽  
Ali Foroughi pour ◽  
Sandeep Namburi ◽  
Dennis Caruana ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3085 ◽  
Author(s):  
Raluca Brehar ◽  
Delia-Alexandrina Mitrea ◽  
Flaviu Vancea ◽  
Tiberiu Marita ◽  
Sergiu Nedevschi ◽  
...  

Deep-learning methods have been shown to increase detection, recognition, and segmentation accuracy in various computer vision tasks when large annotated image datasets are available. In medical image processing and computer-aided diagnosis within ultrasound images, where the amount of available annotated data is smaller, a natural question arises: are deep-learning methods better than conventional machine-learning methods? How do the conventional machine-learning methods behave in comparison with deep-learning methods on the same dataset? Based on the study of various deep-learning architectures, a lightweight multi-resolution Convolutional Neural Network (CNN) architecture is proposed. It is suitable for differentiating, within ultrasound images, between hepatocellular carcinoma (HCC) and the cirrhotic parenchyma (PAR) on which HCC had evolved. The proposed deep-learning model is compared with other CNN architectures that have been adapted by transfer learning for the ultrasound binary classification task, and also with conventional machine-learning (ML) solutions trained on textural features. The achieved results show that the deep-learning approach outperforms classical machine-learning solutions by providing higher classification performance.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5114 ◽  
Author(s):  
Qin Ni ◽  
Zhuo Fan ◽  
Lei Zhang ◽  
Chris D. Nugent ◽  
Ian Cleland ◽  
...  

Activity recognition has received considerable attention in many research fields, such as the industrial and healthcare fields. However, much of the current literature on activity recognition has focused on static and dynamic activities, while transitional activities, such as stand-to-sit and sit-to-stand, are more difficult to recognize than either. These transitional activities may be important in real applications. Thus, a novel framework is proposed in this paper to recognize static, dynamic, and transitional activities by utilizing stacked denoising autoencoders (SDAE), which, as a deep learning model, extract features automatically rather than relying on the manual features used by conventional machine learning methods. Moreover, a resampling technique (random oversampling) is used to mitigate the problem of unbalanced samples caused by the relatively short duration of transitional activities. The experimental protocol is designed to collect twelve daily activities (of three types) using wearable sensors from 10 adults in the smart lab of Ulster University. The experimental results show strong performance on transitional activity recognition and an overall accuracy of 94.88% on the three types of activities. Comparisons with other methods and results on three other public datasets verify the feasibility and superiority of our framework. This paper also explores the effect of multiple sensors (accelerometer and gyroscope) to determine the optimal combination for activity recognition.
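Random oversampling, mentioned above as the remedy for the short-lived transitional classes, duplicates randomly chosen minority-class samples until every class is as large as the largest one. The sketch below shows the generic technique under that assumption and is not the authors' implementation; the function name `random_oversample`, the `seed` parameter, and the toy window labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical sensor windows: transitional activities are the rare class.
windows = ["w1", "w2", "w3", "w4", "w5", "w6"]
kinds = ["static", "static", "static", "dynamic", "dynamic", "transitional"]
balanced_x, balanced_y = random_oversample(windows, kinds)
class_counts = Counter(balanced_y)
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.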


2021 ◽  
Author(s):  
Azadeh Mozhdehfarahbakhsh ◽  
Saman Chitsazian ◽  
Prasun Chakrabarti ◽  
Tulika Chakrabarti ◽  
Babak Kateb ◽  
...  

Parkinson’s disease (PD) is among the relatively prevalent neurodegenerative disorders, with its course of progression classified as prodromal, stage 1, 2, 3 and severe conditions. Given the shortcomings of the clinical setting, it is often challenging to identify the stage of PD severity and predict its course of progression. Therefore, there appears to be an ever-growing need to use supervised and unsupervised artificial intelligence and machine learning methods on clinical and paraclinical datasets to accurately diagnose PD, identify its stage and predict its course. In today’s neuro-medicine practices, MRI-related data are regarded as beneficial in detecting various pathologies in the brain. In addition, the field has recently witnessed a growing application of deep learning methods in image processing, often with outstanding results. Here, we applied Convolutional Neural Networks (CNN) to propose a model that helps distinguish different stages of PD. The results showed that our current MRI-based CNN model may potentially be employed as a suitable method for distinguishing PD stages at a high accuracy rate (0.94).


2021 ◽  
Vol 22 (16) ◽  
pp. 9054
Author(s):  
Wei Du ◽  
Xuan Zhao ◽  
Yu Sun ◽  
Lei Zheng ◽  
Ying Li ◽  
...  

Identifying secretory proteins from blood, saliva or other body fluids has become an effective method of diagnosing diseases. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and are highly dependent on the feature set from the protein. In this article, we propose a deep learning model based on the capsule network and transformer architecture, SecProCT, to predict secretory proteins using only amino acid sequences. The proposed model was validated using cross-validation and achieved 0.921 and 0.892 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively. The proposed model was also validated on an independent test set and achieved 0.917 and 0.905 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively, which is better than conventional machine learning methods and other deep learning methods for biological sequence analysis. The main contributions of this article are as follows: (1) a deep learning model based on a capsule network and transformer architecture is proposed for predicting secretory proteins. The results of this model are better than those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; (2) only amino acid sequences are used in the proposed model, which overcomes the high dependence of existing methods on the annotated protein features; (3) the proposed model can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.

