Deep Learning Based Biomedical Literature Classification Using Criteria of Scientific Rigor

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1253
Author(s):  
Muhammad Afzal ◽  
Beom Joo Park ◽  
Maqbool Hussain ◽  
Sungyoung Lee

A major obstacle to supporting evidence-based clinical decision-making is accurately and efficiently recognizing appropriate and scientifically rigorous studies in the biomedical literature. We trained a multi-layer perceptron (MLP) model on a dataset with two textual features, title and abstract. The dataset consists of 7958 PubMed citations, each labeled with one of two classes: scientific rigor or non-rigor. We compared our model with other promising machine learning models: Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosted Tree (GBT). Because it achieved the higher cumulative score, the deep learning model was chosen and then tested on datasets obtained by running a set of domain-specific queries. On the training dataset, the proposed deep learning model obtained significantly higher accuracy (97.3%) and AUC (0.993) than the competitors, although its recall (95.1%) was slightly lower than that of GBT. The trained model sustained this performance on the testing datasets. Unlike previous approaches, the proposed model does not require a human expert to create fresh annotated data; instead, we used studies cited in Cochrane reviews as a surrogate for quality studies in a clinical topic. We conclude that deep learning methods are beneficial for biomedical literature classification. Not only do such methods reduce the feature engineering workload, but they also perform better on large and noisy data.
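
The abstract above reports accuracy, recall, and AUC for a two-class (rigor vs. non-rigor) classifier. As a minimal sketch of how these three figures are conventionally computed, the snippet below evaluates them on a small set of labels and scores. The data are hypothetical toy values, the 0.5 decision threshold is an assumption, and none of it reproduces the authors' pipeline.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of citations whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly rigorous citations (label 1) predicted as rigorous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this pairwise form equals the area under
    the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = scientifically rigorous, 0 = non-rigorous.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]          # model confidence for class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), recall(y_true, y_pred), auc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank the citations. That difference is one way a model can lead on AUC yet trail slightly on recall.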

Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1364
Author(s):  
Beomjoo Park ◽  
Muhammad Afzal ◽  
Jamil Hussain ◽  
Asim Abbas ◽  
Sungyoung Lee

To support evidence-based precision medicine and clinical decision-making, we need to identify accurate, appropriate, and clinically relevant studies from the voluminous biomedical literature. To address the accurate identification of high-impact relevant articles, we propose a novel attention-based deep learning approach for finding and ranking relevant studies against a topic of interest. To train the proposed model, we collected 240,324 clinical articles from the 2018 Precision Medicine track of the Text REtrieval Conference (TREC), with the goal of identifying and ranking documents relevant to a user query. We built a BERT (Bidirectional Encoder Representations from Transformers) based classification model to classify high- and low-impact articles. We used contextualized word embeddings to create vectors for the documents and for user queries combined with genetic information, and used the contextual similarity between them as the relevancy score for ranking the articles. Compared with existing approaches, our proposed model obtains a higher accuracy of 95.44% versus 94.57% (the next best performer), and a precision that is higher by about 14% at P@5 (precision at 5) and about 12% at P@10 (precision at 10). The contextually viable and competitive outcomes confirm the suitability of the proposed model for use in domains like evidence-based precision medicine.
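
The ranking step described above (a similarity score between query and document vectors, evaluated with P@k) can be sketched as follows. Cosine similarity is an assumed choice of similarity measure, and the two-dimensional vectors are hypothetical stand-ins for BERT embeddings; the abstract does not state the exact scoring function.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec: Sequence[float],
                       doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document ids sorted by descending similarity to the query."""
    return sorted(doc_vecs,
                  key=lambda d: cosine(query_vec, doc_vecs[d]),
                  reverse=True)


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


# Hypothetical toy embeddings (real contextual embeddings have hundreds of dimensions).
query = [1.0, 0.0]
docs = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7], "d": [0.9, 0.5]}
relevant = {"a", "c"}  # hypothetical relevance judgments

ranking = rank_by_similarity(query, docs)
print(ranking, precision_at_k(ranking, relevant, 2), precision_at_k(ranking, relevant, 3))
```

P@k looks only at the first k results, so it rewards placing relevant articles at the very top of the list rather than merely retrieving them somewhere.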


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Steven A. Hicks ◽  
Jonas L. Isaksen ◽  
Vajira Thambawita ◽  
Jonas Ghouse ◽  
Gustav Ahlberg ◽  
...  

Deep learning-based tools may annotate and interpret medical data more quickly, consistently, and accurately than medical doctors. However, as medical doctors are ultimately responsible for clinical decision-making, any deep learning-based prediction should be accompanied by an explanation that a human can understand. We present an approach called electrocardiogram gradient class activation map (ECGradCAM), which is used to generate attention maps and explain the reasoning behind deep learning-based decision-making in ECG analysis. Attention maps may be used in the clinic to aid diagnosis, discover new medical knowledge, and identify novel features and characteristics of medical tests. In this paper, we showcase how ECGradCAM attention maps can unmask how a novel deep learning model measures both amplitudes and intervals in 12-lead electrocardiograms, and we show an example of how attention maps may be used to develop novel ECG features.
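
The abstract does not spell out how ECGradCAM is computed. As a rough illustration only, the sketch below applies the general Grad-CAM recipe to a one-dimensional signal: each feature channel is weighted by its time-averaged gradient, the weighted channels are summed, and negative values are clipped. The function name, the hand-written activations and gradients, and the final rescaling to [0, 1] (a common visualization step) are all assumptions; in practice the gradients come from backpropagation through a trained network.

```python
from typing import List


def grad_cam_1d(activations: List[List[float]],
                gradients: List[List[float]]) -> List[float]:
    """Generic Grad-CAM over time for K channels of T time steps each.

    activations[k][t]: feature map of channel k at time step t.
    gradients[k][t]:   gradient of the target output w.r.t. activations[k][t].
    """
    # Channel importance: gradient averaged over all time steps.
    weights = [sum(g) / len(g) for g in gradients]
    steps = len(activations[0])
    cam = []
    for t in range(steps):
        value = sum(w * a[t] for w, a in zip(weights, activations))
        cam.append(max(0.0, value))  # ReLU keeps only positive evidence
    peak = max(cam)
    # Rescale to [0, 1] for plotting (visualization convenience, assumed).
    return [c / peak for c in cam] if peak > 0 else cam


# Hypothetical values for two channels and four time steps.
activations = [[0.0, 1.0, 2.0, 1.0],
               [1.0, 0.0, 0.0, 1.0]]
gradients = [[0.5, 0.5, 0.5, 0.5],
             [-1.0, -1.0, -1.0, -1.0]]

attention = grad_cam_1d(activations, gradients)
print(attention)
```

Overlaying such a map on the ECG trace highlights the time steps that contributed most to a prediction.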


2018 ◽  
Vol 16 (1) ◽  
Author(s):  
David Benrimoh ◽  
Robert Fratila ◽  
Sonia Israel ◽  
Kelly Perlman

Globally, depression affects 300 million people and is projected to be the leading cause of disability by 2030. While different patients are known to benefit from different therapies, there is no principled way for clinicians to predict individual patient responses or side effect profiles. Deep learning, a form of machine learning based on artificial neural networks, might be useful for generating a predictive model that could aid in clinical decision making. Such a model’s primary outcomes would be to help clinicians select the most effective treatment plans and mitigate adverse side effects, allowing doctors to provide more personalized care to a larger number of patients. In this commentary, we discuss the need for personalization of depression treatment and how a deep learning model might be used to construct a clinical decision aid.


2021 ◽  
Vol 11 (1) ◽  
pp. 491-508
Author(s):  
Monika Lamba ◽  
Yogita Gigras ◽  
Anuradha Dhull

Detection of plant disease has a crucial role in better understanding the economy of India in terms of agricultural productivity. Early recognition and categorization of plant diseases is crucial, as disease can adversely affect the growth and development of species. Numerous machine learning methods such as SVM (support vector machine), random forest, KNN (k-nearest neighbor), Naïve Bayes, and decision tree have been used for recognition, discovery, and categorization of plant diseases; however, the advancement of machine learning through DL (deep learning) is expected to have tremendous potential for improving accuracy. This paper proposes a model comprising an Auto-Color Correlogram as image filter and DL classifiers with different activation functions for plant disease. The proposed model is applied to four different datasets to solve binary and multiclass subcategories of plant diseases. The proposed model achieves better results, obtaining 99.4% accuracy and 99.9% sensitivity for the binary class and 99.2% accuracy for multiclass. The results show that the proposed model outperforms other approaches, namely LibSVM, SMO (sequential minimal optimization), and DL with the activation functions softmax and softsign, in terms of F-measure, recall, MCC (Matthews correlation coefficient), specificity and sensitivity.
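
Two of the activation functions named above, softmax and softsign, have standard textbook definitions, sketched below. The abstract does not say which activation the proposed model itself uses, so this illustrates only the two named baselines.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Map raw scores to probabilities that sum to 1."""
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softsign(x: float) -> float:
    """Squash a real number into (-1, 1) via x / (1 + |x|)."""
    return x / (1.0 + abs(x))


print(softmax([1.0, 2.0, 3.0]))
print(softsign(1.0), softsign(-3.0))
```

Softmax acts on a whole vector of class scores, which makes it a natural output layer for multiclass problems, whereas softsign acts on one value at a time and saturates more gently than tanh.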


2021 ◽  
Author(s):  
Adrian Ahne ◽  
Guy Fagherazzi ◽  
Xavier Tannier ◽  
Thomas Czernichow ◽  
Francisco Orchard

BACKGROUND The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, and it is becoming increasingly challenging for health professionals to properly summarise those data and, in consequence, to practice evidence-based clinical decision making. Moreover, exploring large unstructured health text data is very challenging for non-experts due to limited time, resources and skills. Current tools for exploring text data lack ease of use, require high computational effort, and have difficulty incorporating domain knowledge and focusing on topics of interest.
OBJECTIVE We developed a methodology that can explore and target topics of interest via an interactive user interface for experts and non-experts. We aim to reach near state-of-the-art performance while reducing memory consumption, increasing scalability and minimizing user interaction effort, in order to improve the clinical decision making process. The performance is evaluated on diabetes-related abstracts from PubMed.
METHODS The methodology consists of four parts: 1) a novel interpretable hierarchical clustering of documents where each node is defined by headwords (the words that best describe the documents in that node); 2) an efficient classification system to target topics; 3) minimized user interaction effort through active learning; 4) a visual user interface through which a user interacts. We evaluated our approach on 50,911 diabetes-related abstracts from PubMed, which provide a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against three other strategies: random selection of training instances, uncertainty sampling, which chooses the instances the model is most uncertain about, and an expected gradient length strategy based on convolutional neural networks (CNN).
RESULTS For hierarchical clustering, we achieved an F1-score of 0.73 compared to 0.76 for scikit-learn. For active learning, after 200 training samples had been chosen by each strategy, the weighted F1-score over all MeSH codes was a satisfactory 0.62 for our approach, compared to 0.61 for the uncertainty strategy, 0.61 for the CNN strategy and 0.45 for the random strategy. Moreover, our methodology showed constantly low memory use as the number of documents increased, but increased execution time.
CONCLUSIONS We proposed an easy-to-use tool for experts and non-experts that combines domain knowledge with topic exploration and targets specific topics of interest while improving transparency. Furthermore, our approach is very memory efficient and highly parallelizable, making it attractive for large data sets. This approach can be used by health professionals to rapidly gain deep insights into biomedical literature and ultimately improve the evidence-based clinical decision making process.
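
Uncertainty sampling, one of the baseline strategies mentioned above, can be sketched in a few lines: score every unlabeled document by the entropy of the model's predicted class probabilities and ask the user to label the highest-entropy ones. Entropy is an assumed uncertainty measure here (margin and least-confidence are common alternatives), and the document ids and probabilities are hypothetical. This is not the authors' own active learning strategy.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sampling(pool_probs: Dict[str, Sequence[float]],
                         batch_size: int) -> List[str]:
    """Return the ids of the batch_size documents the model is least sure about."""
    ranked = sorted(pool_probs,
                    key=lambda doc_id: entropy(pool_probs[doc_id]),
                    reverse=True)
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabeled abstracts.
pool = {
    "d1": [0.95, 0.05],
    "d2": [0.50, 0.50],
    "d3": [0.70, 0.30],
    "d4": [0.60, 0.40],
}

to_label = uncertainty_sampling(pool, batch_size=2)
print(to_label)
```

After the user labels the selected documents, the classifier is retrained and the pool is rescored, so each round concentrates annotation effort where the model is weakest.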


2017 ◽  
Vol 15 (03) ◽  
pp. 1750010 ◽  
Author(s):  
Ze Liu ◽  
Hongqiang Lv ◽  
Jiuqiang Han ◽  
Ruiling Liu

The transmembrane region (TR) is a conserved region of the transmembrane (TM) subunit in the envelope (env) glycoprotein of retroviruses. Evidence has shown that the TR is responsible for anchoring the env glycoprotein on the lipid bilayer, and that substitution of the TR for a covalently linked lipid anchor abrogates fusion. However, general-purpose software cannot achieve sufficient accuracy, because the TM subunit in env also has several motifs, such as the signal peptide, fusion peptide and immunosuppressive domain, that are composed largely of hydrophobic residues. In this paper, a support vector machine (SVM) based model is proposed to identify TRs in retroviruses. First, physicochemical and evolutionary information properties were extracted as the original features. Then, feature importance was analyzed using the minimum Redundancy Maximum Relevance (mRMR) feature selection criterion. Our model achieved an Sn of 0.955, Sp of 0.998, ACC of 0.995, and MCC of 0.954 using 10-fold cross-validation on the training dataset. These results suggest that the proposed model can be used to predict TRs in unannotated retroviruses; 11917, 3344, 2, 289 and 6 new putative TRs were found in HERV, HIV, HTLV, SIV and MLV, respectively.
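
The four figures quoted above (Sn, Sp, ACC, MCC) all derive from the four cells of a binary confusion matrix. The sketch below uses their standard definitions; the counts are hypothetical and chosen only for easy arithmetic, not taken from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts: 100 true TR windows, 900 non-TR windows.
metrics = binary_metrics(tp=90, fp=20, tn=880, fn=10)
print(metrics)
```

Because negatives typically far outnumber positives in this kind of sequence scanning, accuracy alone can look high even when many positives are missed, which is why MCC is usually reported alongside it.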


2021 ◽  
Author(s):  
Nanditha Mallesh ◽  
Max Zhao ◽  
Lisa Meintker ◽  
Alexander Höllein ◽  
Franz Elsner ◽  
...  

Multi-parameter flow cytometry (MFC) is a cornerstone in clinical decision making for hematological disorders such as leukemia or lymphoma. MFC data analysis requires trained experts to manually gate cell populations of interest, which is time-consuming and subjective. Manual gating is often limited to a two-dimensional space. In recent years, deep learning models have been developed to analyze the data in high-dimensional space and are highly accurate. Such models have been used successfully in histology, cytopathology, image flow cytometry, and conventional MFC analysis. However, current AI models used for subtype classification based on MFC data are limited to the antibody (flow cytometry) panel they were trained on. Thus, a key challenge in deploying AI models into routine diagnostics is the robustness and adaptability of such models. In this study, we present a workflow to extend our previous model to four additional MFC panels. We employ knowledge transfer to adapt the model to smaller data sets. We trained models for each of the data sets by transferring the features learned from our base model. With our workflow, we could increase the model’s overall performance and more prominently, increase the learning rate for very small training sizes.


2020 ◽  
Vol 25 (3) ◽  
pp. 373-382
Author(s):  
He Yu ◽  
Zaike Tian ◽  
Hongru Li ◽  
Baohua Xu ◽  
Guoqing An

Residual Useful Life (RUL) prediction is a key step of Condition-Based Maintenance (CBM). Deep learning-based techniques have shown great promise for RUL prediction, although their performance depends on heavy model structures and parameter tuning strategies. In this paper, we propose a novel Deep Belief Network (DBN) model constructed from improved conditional Restricted Boltzmann Machines (RBMs) and apply it to RUL prediction for hydraulic pumps. A DBN is a deep probabilistic digraph neural network that consists of multiple layers of RBMs. Since an RBM is an undirected graph model and there is no communication among the nodes of the same layer, the deep feature extraction capability of the original DBN model can hardly ensure accuracy when modeling continuous data. To address this issue, the DBN model is improved by replacing the RBM with the Improved Conditional RBM (ICRBM), which adds timing linkage factors and constraint variables among the nodes of the same layer on the basis of the RBM. The proposed model is applied to RUL prediction of hydraulic pumps, and the results show that it has higher prediction accuracy than traditional DBNs, BP networks, support vector machines and modified DBNs such as DEBN and GC-DBN.
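
The remark above that nodes within one RBM layer do not communicate can be made concrete with the conditional distribution of a standard binary RBM: given the visible layer, each hidden unit switches on independently with probability sigmoid(b_j + sum_i v_i * W_ij). The sketch below covers only this plain RBM case; it does not implement the ICRBM's timing linkage factors or constraint variables, and the weights are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(v: Sequence[float],
                         W: Sequence[Sequence[float]],
                         b: Sequence[float]) -> List[float]:
    """P(h_j = 1 | v) for each hidden unit j of a standard binary RBM.

    W[i][j] is the weight between visible unit i and hidden unit j;
    b[j] is the bias of hidden unit j. No term couples h_j to another
    hidden unit, which is the "no communication within a layer" property.
    """
    return [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]


# Hypothetical parameters: two visible units, two hidden units.
v = [1.0, 0.0]
W = [[0.0, 2.0],
     [5.0, -1.0]]
b = [0.0, 0.0]

probs = hidden_probabilities(v, W, b)
print(probs)
```

Each probability depends on the visible vector alone, so a plain RBM carries no information from one time step to the next. The ICRBM described above addresses that gap by adding timing linkage factors and constraint variables.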

