Repositioning Traditional Chinese Medicine to PI3K Pathway Proteins Based on Deep Learning Method

Author(s):  
Yue Zhong ◽  
Xun Wang ◽  
Qingyu Tian ◽  
Dayan Liu ◽  
Ying Zhang ◽  
...  
2021 ◽  
Author(s):  
Yongmei Lu ◽  
Li Chen ◽  
Tingting Zhang ◽  
Lingmei Bu ◽  
Ying Ye ◽  
...  

Abstract Background: The ancient literature of Traditional Chinese Medicine (TCM) contains a large body of clinical experience that is an important part of TCM knowledge and remains valuable for present-day clinical practice. However, this experience is difficult for TCM professionals to access because it is voluminous and scattered widely across the literature. In addition, ancient Chinese differs from the modern language, which makes the literature harder to analyze whether the analysis is done manually or automatically with a software toolkit. Methods: To address these challenges, we formalize a novel information extraction task for the ancient literature of TCM, in which the entities to be extracted are Disease-Specific Clinical Experiences (DSCEs). For this purpose, we collected two corpora from the ancient literature of TCM and manually annotated them with DSCE occurrences for abdominalgia and colporrhagia in pregnancy (妊娠腹痛及下血) and for jaundice (黄疸), respectively. We further propose a deep learning and CRF-based framework that encodes ancient Chinese at the character level, thereby avoiding the difficulty of word segmentation in ancient Chinese texts. We investigate the framework with different methods for contextual encoding of the characters in a sentence, including CNN, Bi-LSTM and BERT, and with different approaches to aggregating the characters' contextual information into a sentence encoding, such as max-pooling and an attention mechanism. The encoded sentences of each section of the literature are then passed through a Bi-LSTM-based sequence labelling model with CRF inference on top, which yields an optimal label sequence for the sentences in the section.
Results: We conduct a series of experiments on the two corpora and evaluate the framework at two granularities of labelling: accuracy and F1-value for sentence-level labelling, and precision, recall and F1-value for correct recognition of whole DSCEs. Conclusion: The experimental results show that the deep learning and CRF-based framework with character encoding of ancient Chinese achieves an accuracy of 80.40%±1.64% and an F1-value of 76.73%±1.59% for sentence labelling, and a recall of 44.97%±2.16% and a precision of 51.13%±2.64% for recognition of whole DSCEs, which makes it a promising baseline for further work on this novel information extraction task for TCM.
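The two evaluation granularities named in the abstract can be illustrated with a minimal sketch. It assumes a BIO labelling scheme over sentences and exact-match scoring of spans; the abstract does not state either, so both are assumptions, and the function names and toy label sequences are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) sentence indices, end exclusive


def extract_spans(labels: List[str]) -> Set[Span]:
    """Collect maximal B/I runs from a BIO label sequence as (start, end) spans."""
    spans: Set[Span] = set()
    start = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            # A new span begins here; close any span that is still open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(labels)))
    return spans


def sentence_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of sentences whose label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def span_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a span counts only if both ends match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example (hypothetical labels): one wrong sentence label breaks a whole span.
gold = ["O", "B", "I", "I", "O", "B", "I", "O"]
pred = ["O", "B", "I", "O", "O", "B", "I", "O"]
print(sentence_accuracy(gold, pred))  # 7 of 8 sentence labels agree
print(span_prf(gold, pred))           # only 1 of 2 spans matches exactly
```

Under these assumptions a single mislabelled sentence costs one point of sentence accuracy but an entire span, which is one plausible reason whole-DSCE scores sit well below sentence-level scores.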


2019 ◽  
Vol 26 (12) ◽  
pp. 1632-1636 ◽  
Author(s):  
Liang Yao ◽  
Zhe Jin ◽  
Chengsheng Mao ◽  
Yin Zhang ◽  
Yuan Luo

Abstract Traditional Chinese Medicine (TCM) has been developed over several thousand years and plays a significant role in health care for Chinese people. This paper studies the problem of classifying TCM clinical records into 5 main TCM disease categories. We explored a number of state-of-the-art deep learning models and found that the recent Bidirectional Encoder Representations from Transformers (BERT) achieves better results than other deep learning models and other state-of-the-art methods. We further used an unlabeled clinical corpus to fine-tune the BERT language model before training the text classifier. The method uses only the Chinese characters of the clinical text as input, without preprocessing or feature engineering. We evaluated deep learning models and traditional text classifiers on a benchmark data set. Our method achieves a state-of-the-art accuracy of 89.39% ± 0.35%, a Macro F1 score of 88.64% ± 0.40% and a Micro F1 score of 89.39% ± 0.35%. We also visualized the attention weights in our method, which can reveal indicative characters in clinical text.
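The reported accuracy and Micro F1 are identical (89.39%), which is expected when every record receives exactly one label: each error is one false positive for the predicted class and one false negative for the true class. A minimal sketch of the two averages follows; the class names and toy predictions are hypothetical and are not taken from the benchmark.

```python
from typing import Dict, List, Tuple


def per_class_counts(gold: List[str], pred: List[str]) -> Dict[str, Tuple[int, int, int]]:
    """Return (tp, fp, fn) for every class seen in gold or pred."""
    counts = {}
    for cls in sorted(set(gold) | set(pred)):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        counts[cls] = (tp, fp, fn)
    return counts


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    counts = per_class_counts(gold, pred)
    return sum(f1_from_counts(*c) for c in counts.values()) / len(counts)


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """F1 computed from counts pooled over all classes."""
    counts = per_class_counts(gold, pred).values()
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1_from_counts(tp, fp, fn)


# Hypothetical single-label predictions over three classes.
gold = ["a", "a", "a", "b", "b", "c"]
pred = ["a", "a", "b", "b", "b", "c"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy, micro_f1(gold, pred), macro_f1(gold, pred))
```

Macro F1 weights every class equally, so it drops below Micro F1 when smaller classes are classified less well.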


2021 ◽  
Vol 12 ◽  
Author(s):  
Ning Cheng ◽  
Yue Chen ◽  
Wanqing Gao ◽  
Jiajun Liu ◽  
Qunfu Huang ◽  
...  

Purpose: This study proposes an S-TextBLCNN model for classifying the efficacy of traditional Chinese medicine (TCM) formulae. The model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which helps in further exploring the internal rules of formula combination. Methods: First, for the TCM herbs extracted from the Chinese Pharmacopoeia, natural language processing (NLP) is used to learn a quantitative representation of each herb. Three features (herb name, herb properties, and herb efficacy) are selected to encode herbs and to construct the formula-vector and herb-vector. Then, based on 2,664 formulae for stroke collected from the TCM literature and 19 formula efficacy categories extracted from Yifang Jijie, an improved deep learning model, TextBLCNN, is proposed; it consists of a bidirectional long short-term memory (Bi-LSTM) neural network and a convolutional neural network (CNN). Binary classifiers are established for the 19 formula efficacy categories to classify the TCM formulae. Finally, to address the imbalance of the formula data, the over-sampling method SMOTE is applied, yielding the S-TextBLCNN model. Results: The formula-vector composed of herb efficacy gives the best classification performance, which suggests a strong relationship between herb efficacy and formula efficacy. The TextBLCNN model has an accuracy of 0.858 and an F1-score of 0.762, both higher than those of the logistic regression (acc = 0.561, F1-score = 0.567), SVM (acc = 0.703, F1-score = 0.591), LSTM (acc = 0.723, F1-score = 0.621), and TextCNN (acc = 0.745, F1-score = 0.644) models. In addition, using SMOTE to tackle data imbalance improves the F1-score by an average of 47.1% across the 19 models. Conclusion: The combination of formula feature representation and the S-TextBLCNN model improves the accuracy of formula efficacy classification and provides a new research idea for the study of TCM formula compatibility.
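The over-sampling step can be sketched as follows. This is a minimal pure-Python illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation or any library's API; the function name, parameters and toy vectors are assumptions.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical 2-D feature vectors for the rare (positive) class of one classifier.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority, n_new=3, k=2, seed=1):
    print(point)
```

In a one-classifier-per-category setup like the one described, such over-sampling would be applied separately to the training data of each binary classifier, and only to the training split so that evaluation data stays untouched.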


Author(s):  
Zeheng Wang ◽  
Liang Li ◽  
Jing Yan ◽  
Yuanzhe Yao

Ethnopharmacological relevance: The outbreak of novel coronavirus disease (COVID-19) in Wuhan has had a huge impact on society in terms of public health and the economy. However, so far, no effective drugs or vaccines have been developed. Traditional Chinese Medicine (TCM) has been considered a promising supplementary treatment for the disease owing to its clinically proven performance on many diseases, including severe acute respiratory syndrome (SARS). Meanwhile, many side-effect (SE) reports suggest that the SE of TCM prescriptions cannot be ignored in treating COVID-19, especially because COVID-19 always simultaneously leads to a dramatic degradation of the patients' physical condition. How to evaluate TCM prescriptions with regard to their latent SE is an urgent challenge. Aim of the study: In this study, we use an ontology-based side-effect prediction framework (OSPF) developed in our previous work and Artificial Neural Network (ANN)-based deep learning to evaluate the TCM prescriptions that are officially recommended in China for COVID-19. Materials and methods: First, we adopted the OSPF developed in our previous work, in which an ontology-based model separates all the ingredients of a TCM prescription into two categories: hot and cold. We then built a database by converting each TCM prescription into a vector containing the ingredient dosages and the corresponding hot/cold attributes, together with the safe/unsafe label. We trained the ANN model on this database, after which a safety indicator (SI), defined as the complement of the side-effect (SE) probability, is given for each TCM prescription. We re-organize the recommended prescription list in descending order of the proposed SI. Second, using this method, we also evaluate the safety indicators of some other well-known TCM prescriptions that are not in the recommended list but are traditionally used to treat flu-like diseases, in order to extend the range of potential treatments. Results: Based on the SI generated by the ANN model, FTS, PMSP, and SF are the safest prescriptions in the recommended list, each with an SI above 0.8, whereas JHQG, LHQW, SFJD, XBJ, and SHL are the prescriptions most likely to be unsafe, each with an SI below 0.2. In the extra list, the indicators of XC, XQRS, CC, and CHBX are all above 0.8, while those of XZXS, SJ, QW, and KBD are all below 0.2. Conclusions: In total, seven TCM prescriptions have indicators above 0.8, suggesting that these prescriptions should be considered first in treating COVID-19, if suitable. We believe this work provides a reasonable suggestion for choosing a proper TCM prescription as a supplementary treatment for COVID-19. In addition, this work introduces a pilot method for creating more reasonable TCM recommendation lists for other diseases.
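The vector encoding and the SI-based ranking described above can be sketched as follows. The layout of the vector (dosage followed by +1 for hot or -1 for cold per ingredient), the ingredient and prescription names, and the probabilities are all hypothetical; the abstract only states that the vector holds dosages and hot/cold attributes and that the SI is the complement of the side-effect probability. In the described pipeline the probability would come from the trained ANN, which is not reproduced here.

```python
from typing import Dict, List, Tuple


def encode_prescription(
    dosages: Dict[str, float],
    nature: Dict[str, str],
    vocabulary: List[str],
) -> List[float]:
    """Two slots per vocabulary ingredient: dosage, then +1 (hot) or -1 (cold).

    Ingredients absent from the prescription get 0.0 in both slots.
    This layout is an assumption made for illustration.
    """
    vector: List[float] = []
    for name in vocabulary:
        dose = dosages.get(name, 0.0)
        if dose > 0:
            sign = 1.0 if nature[name] == "hot" else -1.0
        else:
            sign = 0.0
        vector.extend([dose, sign])
    return vector


def safety_indicator(p_side_effect: float) -> float:
    """SI as the complement of the predicted side-effect probability."""
    if not 0.0 <= p_side_effect <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - p_side_effect


def rank_by_safety(probabilities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort prescriptions from highest to lowest SI."""
    scored = [(name, safety_indicator(p)) for name, p in probabilities.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical ingredients, prescriptions and model outputs.
vocabulary = ["herb_a", "herb_b", "herb_c"]
nature = {"herb_a": "hot", "herb_b": "cold", "herb_c": "hot"}
print(encode_prescription({"herb_a": 9.0, "herb_b": 6.0}, nature, vocabulary))
print(rank_by_safety({"rx_1": 0.75, "rx_2": 0.125, "rx_3": 0.5}))
```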

