A Machine Learning Framework that Integrates Multi-omics Data Predicts Cancer-related LncRNAs

Author(s):  
Lin Yuan ◽  
Jing Zhao ◽  
Tao Sun ◽  
Zhen Shen

Abstract Background: LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers in cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanism between lncRNAs and complex diseases. The unprecedented enrichment of multi-omics data and the rapid development of machine learning technology provide us with the opportunity to design a machine learning framework to study the relationship between lncRNAs and complex diseases. Results: In this article, we proposed a new machine learning approach, namely LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), for predicting disease-related lncRNA associations based on multi-omics data, machine learning methods and neural network neighborhood information aggregation. Firstly, LGDLDA calculates the similarity matrices of lncRNAs, genes and diseases. It calculates the similarity between lncRNAs from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. It obtains the gene similarity matrix from the lncRNA-gene association matrix and the gene-disease association matrix. It obtains the disease similarity matrix from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Secondly, LGDLDA integrates the neighborhood information in the similarity matrices by using the nonlinear feature learning of a neural network. Thirdly, LGDLDA uses embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and then selects potential disease-related lncRNAs. Conclusions: Compared with existing lncRNA-disease prediction methods, LGDLDA takes into account more critical information and achieves improved performance in cancer-related lncRNA prediction.
Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. The results on different simulation data sets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied LGDLDA to three real cancer data sets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs.
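
The abstract names Gaussian interaction profile (GIP) kernel similarity as one ingredient of the disease similarity matrix but does not give its parameterization. The following is a minimal sketch of the commonly used form of that kernel, assuming a binary association matrix whose rows are interaction profiles and a bandwidth normalised by the mean squared profile norm. The function name `gip_kernel`, the parameter `gamma_prime` and the toy matrix are illustrative assumptions, not taken from the paper.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of an association matrix.

    Assumed form: K(i, j) = exp(-gamma * ||row_i - row_j||^2), where
    gamma = gamma_prime / mean(||row_k||^2).
    """
    n = len(assoc)
    # Squared norm of each interaction profile (row).
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Toy matrix: 3 diseases (rows) x 4 lncRNAs (columns); the values are made up.
toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
similarity = gip_kernel(toy)
```

Rows with identical profiles receive similarity 1, and the matrix is symmetric, so it can be combined with other similarity sources by any averaging scheme the method prescribes.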

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lin Yuan ◽  
Jing Zhao ◽  
Tao Sun ◽  
Zhen Shen

Abstract Background LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers in cancer diagnosis and prognosis. However, it is difficult to discover the true association mechanism between lncRNAs and complex diseases. The unprecedented enrichment of multi-omics data and the rapid development of machine learning technology provide us with the opportunity to design a machine learning framework to study the relationship between lncRNAs and complex diseases. Results In this article, we proposed a new machine learning approach, namely LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), for predicting disease-related lncRNA associations based on multi-omics data, machine learning methods and neural network neighborhood information aggregation. Firstly, LGDLDA calculates the similarity matrices of lncRNAs, genes and diseases; it calculates the similarity between lncRNAs from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix. We obtain the gene similarity matrix from the lncRNA-gene association matrix and the gene-disease association matrix, and we obtain the disease similarity matrix from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Secondly, LGDLDA integrates the neighborhood information in the similarity matrices by using the nonlinear feature learning of a neural network. Thirdly, LGDLDA uses embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and then selects potential disease-related lncRNAs. Conclusions Compared with existing lncRNA-disease prediction methods, our proposed method takes into account more critical information and achieves improved performance in cancer-related lncRNA prediction.
Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. The results on different simulation data sets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer data sets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs.
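
The last two steps in the abstract (approximating observed matrices from node embeddings, then ranking candidate pairs) can be illustrated with a small sketch. It assumes, purely for illustration, that a pair is scored by the sigmoid of the dot product of its two embeddings; the abstract does not state the scoring function, and the names `rank_candidates`, `lnc_emb`, `dis_emb` and all numbers are hypothetical.

```python
import math


def rank_candidates(lnc_emb, dis_emb, known, disease):
    """Rank lncRNAs for one disease, skipping pairs that are already known.

    Scoring by sigmoid(dot product) is an assumption made for this sketch.
    """
    def score(u, v):
        return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, v))))

    d_vec = dis_emb[disease]
    scored = [
        (name, score(vec, d_vec))
        for name, vec in lnc_emb.items()
        if (name, disease) not in known
    ]
    # Highest predicted association first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 2-dimensional embeddings; names and numbers are invented.
lnc_emb = {"lncA": [0.9, 0.1], "lncB": [-0.5, 0.4], "lncC": [0.3, 0.8]}
dis_emb = {"diseaseX": [1.0, 0.5]}
known = {("lncA", "diseaseX")}
ranking = rank_candidates(lnc_emb, dis_emb, known, "diseaseX")
```

Excluding known pairs before sorting keeps the ranked list limited to candidates, which is what a prediction step needs to report.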


2017 ◽  
Author(s):  
Luís Dias ◽  
Rosalvo Neto

In November 2015, Google released TensorFlow, an open source machine learning framework that can be used to implement Deep Neural Network algorithms, a class of algorithms that shows great potential in solving complex problems. Considering the importance of usability in software success, this research aims to perform a usability analysis of TensorFlow and to compare it with another widely used framework, R. The evaluation was performed through usability tests with university students. The study indicated that TensorFlow's usability is equal to or better than that of traditional frameworks used by the scientific community.


Cancers ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 3047
Author(s):  
Xiaoyu Zhang ◽  
Yuting Xing ◽  
Kai Sun ◽  
Yike Guo

High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known as “the curse of dimensionality” in machine learning. To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with the deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy than with training the tasks individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.


10.2196/14502 ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. e14502
Author(s):  
Po-Ting Lai ◽  
Wei-Liang Lu ◽  
Ting-Rung Kuo ◽  
Chia-Ru Chung ◽  
Jen-Chieh Han ◽  
...  

Background Research on disease-disease association (DDA), like comorbidity and complication, provides important insights into disease treatment and drug discovery, and a large body of literature has been published in the field. However, using current search tools, it is not easy for researchers to retrieve information on the latest DDA findings. First, comorbidity and complication keywords pull up large numbers of PubMed studies. Second, disease is not highlighted in search results. Finally, DDA is not identified, as currently no disease-disease association extraction (DDAE) dataset or tools are available. Objective As there are no available DDAE datasets or tools, this study aimed to develop (1) a DDAE dataset and (2) a neural network model for extracting DDA from the literature. Methods In this study, we formulated DDAE as a supervised machine learning classification problem. To develop the system, we first built a DDAE dataset. We then employed two machine learning models, support vector machine and convolutional neural network, to extract DDA. Furthermore, we evaluated the effect of using the output layer as features of the support vector machine-based model. Finally, we implemented a large margin context-aware convolutional neural network architecture to integrate context features and convolutional neural networks through the large margin function. Results Our DDAE dataset consisted of 521 PubMed abstracts. Experiment results showed that the support vector machine-based approach achieved an F1 measure of 80.32%, which is higher than that of the convolutional neural network-based approach (73.32%). Using the output layer of the convolutional neural network as a feature for the support vector machine did not further improve the performance of the support vector machine.
However, our large margin context-aware convolutional neural network achieved the highest F1 measure of 84.18% and demonstrated that combining the hinge loss function of the support vector machine with a convolutional neural network into a single neural network architecture outperforms other approaches. Conclusions To facilitate the development of text-mining research for DDAE, we developed the first publicly available DDAE dataset consisting of disease mentions, Medical Subject Heading IDs, and relation annotations. We developed different conventional machine learning models and neural network architectures and evaluated their effects on our DDAE dataset. To further improve DDAE performance, we propose a large margin context-aware convolutional neural network model for DDAE that outperforms other approaches.
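
The hinge loss mentioned above is the large margin objective of a support vector machine. A minimal sketch of the binary form follows, assuming labels of +1 or -1 and raw (unsquashed) model scores; how the authors attach this loss to their network is not described in the abstract, and the function name `hinge_loss` and the sample values are illustrative.

```python
def hinge_loss(scores, labels, margin=1.0):
    """Mean binary hinge loss: max(0, margin - y * s) averaged over samples.

    labels are +1 or -1; scores are raw outputs of any model, for example
    the final linear layer of a neural network.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    losses = [max(0.0, margin - y * s) for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


# Invented scores: two positives and two negatives.
scores = [2.0, 0.3, -1.5, 0.2]
labels = [1, 1, -1, -1]
loss = hinge_loss(scores, labels)
```

Samples that are classified correctly with a margin of at least 1 contribute zero loss, so only the second and fourth samples above are penalised.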


2019 ◽  
Author(s):  
Po-Ting Lai ◽  
Wei-Liang Lu ◽  
Ting-Rung Kuo ◽  
Chia-Ru Chung ◽  
Jen-Chieh Han ◽  
...  

BACKGROUND Research on disease-disease association, like comorbidity and complication, provides important insights into disease treatment and drug discovery, and a large body of literature has been published in the field. However, using current search tools, it is not easy for researchers to retrieve information on the latest disease association findings. First, comorbidity and complication keywords pull up large numbers of PubMed studies. Second, disease is not highlighted in search results. Third, disease-disease association (DDA) is not identified, as currently no DDA extraction dataset or tools are available. OBJECTIVE Since there are no available disease-disease association extraction (DDAE) datasets or tools, we aim to develop (1) a DDAE dataset and (2) a neural network model for extracting DDAs from literature. METHODS In this study, we formulate DDAE as a supervised machine learning classification problem. To develop the system, we first build a DDAE dataset. We then employ two machine-learning models, support vector machine (SVM) and convolutional neural network (CNN), to extract DDAs. Furthermore, we evaluate the effect of using the output layer as features of the SVM-based model. Finally, we implement a large margin context-aware convolutional neural network (LC-CNN) architecture to integrate context features and CNN through the large margin function. RESULTS Our DDAE dataset consists of 521 PubMed abstracts. Experiment results show that the SVM-based approach achieves an F1-measure of 80.32%, which is higher than that of the CNN-based approach (73.32%). Using the output layer of CNN as a feature for SVM does not further improve the performance of SVM. However, our LC-CNN achieves the highest F1-measure of 84.18%, and demonstrates that combining the hinge loss function of SVM with CNN into a single NN architecture outperforms other approaches.
CONCLUSIONS To facilitate the development of text-mining research for DDAE, we develop the first publicly available DDAE dataset consisting of disease mentions, MeSH IDs and relation annotations. We develop different conventional ML models and NN architectures, and evaluate their effects on our DDAE dataset. To further improve DDAE performance, we propose an LC-CNN model for DDAE that outperforms other approaches.
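
The results above are reported as F1-measures, the harmonic mean of precision and recall. A short sketch of that computation from raw counts follows; the counts in the usage line are invented and do not correspond to the reported 80.32%, 73.32% or 84.18%.

```python
def f1_measure(tp, fp, fn):
    """F1 = 2 * P * R / (P + R), computed from true/false positive counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts: 80 correct extractions, 20 spurious, 20 missed.
f1 = f1_measure(tp=80, fp=20, fn=20)
```

Returning 0.0 when there are no positive predictions or no positive labels avoids a division by zero; other conventions exist for that edge case.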


2021 ◽  
Vol 14 (2) ◽  
pp. 28-34
Author(s):  
Sergey Pobeda ◽  
M. Chernyh ◽  
F. Makarenko ◽  
Konstantin Zolnikov

The article deals with the creation of a behavioral model of lateral metal-oxide transistors (LDMOS) based on a neural network of the multilayer perceptron type. The model is identified using a backpropagation algorithm. The article demonstrates the process of creating an ANN model using PyTorch, a machine learning framework for the Python language, with subsequent transfer to the standard analog circuit modeling language Verilog-A.
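
To make the idea of identifying a multilayer perceptron by backpropagation concrete, here is a minimal sketch written with the Python standard library only rather than PyTorch, so that it runs without extra dependencies. It fits a one-hidden-layer network to a toy curve; the curve, the layer size, the learning rate and all function names are assumptions for illustration and are unrelated to the LDMOS data or the model in the article.

```python
import math
import random


def init_mlp(n_hidden, rng):
    """One input, n_hidden tanh units, one linear output."""
    return {
        "w1": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    hidden = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    out = sum(w * h for w, h in zip(p["w2"], hidden)) + p["b2"]
    return hidden, out


def mse(p, data):
    return sum((forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(p, x, y, lr):
    """One stochastic gradient step on the squared error for a single sample."""
    hidden, out = forward(p, x)
    d_out = 2.0 * (out - y)  # derivative of (out - y)^2 w.r.t. out
    for k, h in enumerate(hidden):
        # Gradient reaching hidden unit k, using w2 before it is updated;
        # (1 - h^2) is the derivative of tanh.
        d_hidden = d_out * p["w2"][k] * (1.0 - h * h)
        p["w2"][k] -= lr * d_out * h
        p["w1"][k] -= lr * d_hidden * x
        p["b1"][k] -= lr * d_hidden
    p["b2"] -= lr * d_out


rng = random.Random(0)  # fixed seed for reproducibility
params = init_mlp(8, rng)
# Toy target y = x^2 on [-1, 1]; a stand-in, not transistor measurements.
data = [(i / 10, (i / 10) ** 2) for i in range(-10, 11)]
initial_mse = mse(params, data)
for _ in range(500):
    for x, y in data:
        train_step(params, x, y, lr=0.02)
final_mse = mse(params, data)
```

Because the trained network is just a set of weights combined with `tanh`, sums and products, the same expression can in principle be written out in another language, which is the kind of transfer the article describes for Verilog-A.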



