A Multidimensional Data Fusion Model Based on Deep Learning for a Patient Similarity Network (Preprint)

BACKGROUND Precision medicine is a novel approach to patient care. It enables the appropriate drug and suitable treatments to be prescribed to the right patient at the right time. It can be envisioned as comparing a new patient with existing patients who have similar characteristics, a notion referred to as patient similarity. Several statistical, data mining, and deep learning models have been used to build and apply patient similarity networks (PSNs) for various purposes. However, data heterogeneity and dimensionality make it difficult for a single model both to reduce data dimensionality and to capture features of diverse data types, including contextual and longitudinal data. Furthermore, when multiple models are applied, an additional challenge arises: developing an optimal aggregation scheme that maintains high accuracy and preserves data veracity. OBJECTIVE In this study, we propose a multi-model PSN that considers heterogeneous data with static and dynamic characteristics to improve prediction accuracy in disease diagnosis. The static data model manages data obtained from patient profiles, whereas the dynamic data model manages longitudinal data from patient treatment pathways and clinical data. METHODS We propose combining deep learning models with a PSN to obtain abundant clinical evidence and extract relevant information with which similar patients can be explored and compared, thereby yielding more accurate and comprehensive diagnoses and recommendations. We use bidirectional encoder representations from transformers (BERT) to process and analyze the contextual data and generate word embeddings, from which semantic features are captured using a convolutional neural network (CNN). Dynamic data are analyzed using a long short-term memory (LSTM)-based autoencoder, which reduces data dimensionality while preserving the temporal features of the data. Furthermore, we propose an aggregation-based fusion approach in which temporal data and clinical narrative data are combined to estimate patient similarity. RESULTS We evaluated the proposed method through a series of experiments. The results showed that our deep learning-based PSN fusion model achieves higher classification accuracy in determining various patient health outcomes than traditional classification algorithms. CONCLUSIONS Our multi-model approach captures the strength of similarity between pairs of patients, thereby enabling precise diagnosis and recommendations for a new patient.
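
The abstract does not give the aggregation formula for the fused similarity. As a minimal sketch, assuming each patient has already been reduced to a text embedding (standing in for the BERT+CNN features) and a temporal embedding (standing in for the LSTM-autoencoder bottleneck), one simple aggregation is a weighted sum of per-modality cosine similarities. The weight `alpha` and the helper names `fused_similarity` and `most_similar` are hypothetical, not the authors' scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(p, q, alpha=0.5):
    """Weighted aggregation of two per-modality similarities (hypothetical scheme).

    p and q are dicts holding a "text" embedding (stand-in for BERT+CNN features
    of clinical narratives) and a "temporal" embedding (stand-in for an
    LSTM-autoencoder bottleneck). alpha weights the text similarity.
    """
    s_text = cosine(p["text"], q["text"])
    s_temporal = cosine(p["temporal"], q["temporal"])
    return alpha * s_text + (1.0 - alpha) * s_temporal


def most_similar(new_patient, cohort, alpha=0.5, k=3):
    """Return the k existing patients with the highest fused similarity to new_patient."""
    scored = [
        (patient_id, fused_similarity(new_patient, embeddings, alpha))
        for patient_id, embeddings in cohort.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Invented toy embeddings for two existing patients and one new patient.
cohort = {
    "A": {"text": [1.0, 0.0, 0.0], "temporal": [0.9, 0.1]},
    "B": {"text": [0.0, 1.0, 0.0], "temporal": [0.1, 0.9]},
}
new_patient = {"text": [1.0, 0.0, 0.0], "temporal": [1.0, 0.0]}
print(most_similar(new_patient, cohort, k=1))
```

In a PSN, these pairwise scores would become edge weights between patient nodes; `alpha` could be tuned on validation data instead of fixed at 0.5.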

Sentiment Analysis of Film Reviews Based on BI-GRU+Attention+Capsule Fusion

10.36227/techrxiv.14863401.v1 ◽

2021 ◽

Author(s):

Zhifei Hu

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Film Review ◽

The Other ◽

Learning Models ◽

Analysis Model ◽

Fusion Model ◽

Data Set ◽

Analysis Task ◽

Film Reviews

In this paper, a sentiment analysis model that fuses a bidirectional GRU, an attention mechanism, and a capsule network (BI-GRU+Attention+Capsule) was designed and implemented for the sentiment analysis task on the open IMDB film review dataset. It is compared with six deep learning models: LSTM, CNN, GRU, BI-GRU, CNN+GRU, and GRU+CNN. The experimental results show that the BI-GRU model combined with Attention and Capsule achieves higher accuracy than the other six models, that the GRU+CNN model is more accurate than the CNN+GRU model, and that the CNN+GRU model is more accurate than the CNN model. Below the CNN model, accuracy decreases successively through the LSTM, BI-GRU, and GRU models. In conclusion, the BI-GRU+Attention+Capsule fusion model, which has the highest accuracy of all the models tested, effectively improves the accuracy of text sentiment classification.
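
The abstract names the three components but not how they are wired together. The following Python sketch shows two of the building blocks in isolation, under stated assumptions: dot-product attention pooling over per-time-step BI-GRU hidden states (here with a fixed, hypothetical query vector instead of a learned one) and the standard capsule "squash" non-linearity. It is not the paper's architecture and omits the GRU itself and capsule routing.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Dot-product attention over per-time-step hidden states.

    states: list of hidden vectors (e.g. concatenated forward/backward GRU outputs).
    query: a context vector (learned in practice; fixed here for illustration).
    Returns the attention-weighted sum of the states.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


def squash(s):
    """Capsule squash: keeps the direction of s, maps its length to |s|^2 / (1 + |s|^2)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


# Invented hidden states for a three-token review.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sentence_vector = attention_pool(states, query=[2.0, 0.0])
capsule_output = squash(sentence_vector)
print(sentence_vector, capsule_output)
```

Squashing keeps every capsule vector's length below 1, so that length can be read as the strength of the detected feature (for example, a sentiment class).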

Performance Evaluation of DLSARS Framework in Intelligent Product Recommendation Systems

International Journal For Innovative Engineering and Management Research ◽

10.48047/ijiemr/v09/i11/34 ◽

2020 ◽

pp. 165-175

Keyword(s):

Performance Evaluation ◽

Deep Learning ◽

Recommendation System ◽

The Other ◽

Program Model ◽

Learning Models ◽

Technical Approach ◽

Huge Number ◽

Intelligent Product ◽

The Right

A recommendation framework is a vital tool for efficient e-commerce interactions between customers and retailers. Efficient, user-friendly interactions that help customers find the right product have a large effect on sales. In terms of technical approach, four program-model guidelines are commonly distinguished, among them collaborative filtering, content-based filtering, and demographic filtering. Collaborative filtering is considered superior to the other methods because it offers advantages in serendipity, novelty, and precision. DLSARS is a deep learning-based sentiment analysis framework for recommendation systems that uses deep learning models. The dataset selected for this research is a synthetic dataset consisting of a large number of reviews for each product. The proposed models are compared with existing models and show superior results. The proposed DLSARS framework with the bigram approach outperforms the other models in the e-commerce domain.
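
The abstract credits a "bigram approach" but does not describe it. As a hedged illustration of what bigram features add over single words, the Python sketch below extracts and counts adjacent word pairs from review text, which lets phrases such as "not good" be captured as a unit. The tokenizer and function names are assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a simplistic tokenizer, assumed)."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Adjacent token pairs: ['not', 'good', 'at'] -> [('not', 'good'), ('good', 'at')]."""
    return list(zip(tokens, tokens[1:]))


def bigram_counts(reviews):
    """Count bigrams over a collection of review texts."""
    counts = Counter()
    for review in reviews:
        counts.update(bigrams(tokenize(review)))
    return counts


# Invented example reviews.
reviews = ["Not good at all.", "Really good value, not good packaging."]
print(bigram_counts(reviews).most_common(3))
```

Counts like these could serve as input features to a sentiment model; how DLSARS actually feeds bigrams into its deep learning models is not stated in the abstract.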

Analysing Predictive Coding Algorithms for Document Review

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39076 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1679-1681

Author(s):

Aditi Wikhe

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Text Classification ◽

Unscented Kalman Filter ◽

Predictive Coding ◽

Machine Learning Techniques ◽

Learning Models ◽

Legal Domain ◽

The Right ◽

Document Review

Abstract: Lawsuits and regulatory investigations in today's legal environment require corporations to engage in increasingly intense data-focused efforts to find, acquire, and evaluate vast amounts of data. In recent years, technology-assisted review (TAR) has become an increasingly crucial part of the document review process in legal discovery. Attorneys now use machine learning techniques such as text classification to identify responsive information. In the legal domain, text classification is referred to as predictive coding or technology-assisted review (TAR). Predictive coding is used to increase the number of relevant documents identified while reducing human labelling effort and manual document review. In recent years, deep learning models combined with word embeddings have proven more effective for predictive coding. Deep learning models, on the other hand, have many tunable settings, which makes it difficult and time-consuming for legal professionals to choose the right configuration. In this paper, we examine several predictive coding algorithms and discuss which of them is the most efficient. Keywords: Technology-assisted review, predictive coding, machine learning, text classification, deep learning, CNN, Unscented Kalman Filter, Logistic Regression, SVM
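
Logistic regression is one of the algorithms listed in the keywords. The following Python sketch shows predictive coding as binary text classification under simple assumptions: documents become bag-of-words count vectors, and a logistic regression trained by full-batch gradient descent scores each document's probability of being responsive. The toy documents, learning rate, and epoch count are invented for illustration and are not drawn from the paper.

```python
import math
from collections import Counter


def bag_of_words(doc, vocab):
    """Term counts for a fixed vocabulary (whitespace tokenization, lowercased)."""
    counts = Counter(doc.lower().split())
    return [float(counts[w]) for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=300):
    """Full-batch gradient descent on the mean logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Estimated probability that a document is responsive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy review set: 1 = responsive, 0 = non-responsive (invented examples).
docs = ["contract breach damages", "breach of contract claim",
        "lunch menu friday", "office party friday"]
labels = [1, 1, 0, 0]
vocab = sorted({t for doc in docs for t in doc.lower().split()})
X = [bag_of_words(doc, vocab) for doc in docs]
w, b = train_logreg(X, labels)
print(predict_proba(bag_of_words("contract breach notice", vocab), w, b))
```

In a TAR workflow, reviewers would label a seed set, the model would rank the remaining documents by this probability, and the highest-ranked documents would be reviewed first.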

Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks

International Journal of Neural Systems ◽

10.1142/s0129065720500124 ◽

2020 ◽

Vol 30 (07) ◽

pp. 2050012

Author(s):

Matthew Leming ◽

Juan Manuel Górriz ◽

John Suckling

Keyword(s):

Deep Learning ◽

Autism Spectrum ◽

Black Box ◽

Typically Developing ◽

Learning Models ◽

Cross Sectional ◽

Functional Connections ◽

Independent Variable ◽

The Right ◽

And Task

Deep learning models for MRI classification face two recurring problems: they are typically limited by low sample size, and they are made opaque by their own complexity (the “black box problem”). In this paper, we train a convolutional neural network (CNN) with the largest multi-source, functional MRI (fMRI) connectomic dataset ever compiled, consisting of 43,858 datapoints. We apply this model to a cross-sectional comparison of autism spectrum disorder (ASD) versus typically developing (TD) controls that has proved difficult to characterize with inferential statistics. To contextualize these findings, we additionally perform classifications of gender and of task versus rest. Using class balancing to build a training set, we trained [Formula: see text] modified CNNs in an ensemble model to classify fMRI connectivity matrices, with overall AUROCs of 0.6774, 0.7680, and 0.9222 for ASD versus TD, gender, and task versus rest, respectively. Additionally, we aim to address the black box problem in this context using two visualization methods. First, class activation maps show which functional connections of the brain our models focus on when performing classification. Second, by analyzing maximal activations of the hidden layers, we were also able to explore how the model organizes a large and mixed-center dataset, finding that it dedicates specific areas of its hidden layers to processing different covariates of the data (depending on the independent variable analyzed), and other areas to mixing data from different sources. Our study finds that deep learning models that distinguish ASD from TD controls focus broadly on temporal and cerebellar connections, with a particularly high focus on the right caudate nucleus and paracentral sulcus.
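
Three steps in this abstract are generic and can be sketched briefly in Python: balancing classes before training, averaging the predicted probabilities of ensemble members, and scoring the result by AUROC. The sketch below assumes deterministic undersampling and simple probability averaging. The paper may balance or combine its models differently, and the helper names are hypothetical. AUROC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def balance_by_undersampling(samples, labels):
    """Keep an equal number of items per class by truncating larger classes (assumed scheme)."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n = min(len(items) for items in by_class.values())
    return [(s, label) for label, items in by_class.items() for s in items[:n]]


def ensemble_average(prob_lists):
    """Average per-sample probabilities across ensemble members."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    return [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]


def auroc(labels, scores):
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented outputs from three ensemble members on four test samples.
member_probs = [[0.7, 0.4, 0.6, 0.2],
                [0.6, 0.5, 0.4, 0.3],
                [0.8, 0.3, 0.7, 0.4]]
labels = [1, 0, 1, 0]
print(auroc(labels, ensemble_average(member_probs)))
```

This pairwise AUROC is quadratic in the number of samples. For large test sets, a rank-based formula gives the same value more efficiently.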

Sentiment Analysis of Film Reviews Based on BI-GRU+Attention+Capsule Fusion

10.36227/techrxiv.14863401 ◽

2021 ◽

Author(s):

Zhifei Hu

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Film Review ◽

The Other ◽

Learning Models ◽

Analysis Model ◽

Fusion Model ◽

Data Set ◽

Analysis Task ◽

Film Reviews

Using deep learning models for learning semantic text similarity of Arabic questions

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i4.pp3519-3528 ◽

2021 ◽

Vol 11 (4) ◽

pp. 3519-3528

Author(s):

Mahmoud Hammad ◽

Mohammed Al-Smadi ◽

Qanita Bani Baker ◽

Sa’ad A. Al-Zboon

Keyword(s):

Deep Learning ◽

Question Answering ◽

Supervised Machine Learning ◽

Learning Models ◽

Text Similarity ◽

Baseline Model ◽

Life Problems ◽

Machine Learning Model ◽

Recurrent Architecture ◽

The Right

Question-answering platforms serve millions of users seeking knowledge and solutions for their daily life problems. However, many knowledge seekers face the challenge of finding the right answer among similar answered questions, and writers who respond to questions feel that they must repeat the same answers many times for similar questions. This research tackles the problem of learning the semantic text similarity among different questions by using deep learning. Three models are implemented to address this problem: i) a supervised machine learning model using XGBoost trained with pre-defined features, ii) an adapted Siamese-based deep learning recurrent architecture trained with pre-defined features, and iii) a pre-trained deep bidirectional transformer based on the BERT model. The proposed models were evaluated using a reference Arabic dataset from the mawdoo3.com company. Evaluation results show that the BERT-based model outperforms the other two models with an F1 of 92.99%, the Siamese-based model comes second with an F1 of 89.048%, and the XGBoost baseline model achieves the lowest result, with an F1 of 86.086%.
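
What defines a Siamese architecture is that both questions pass through the same encoder with shared weights, and a similarity score is computed between the two encodings. The Python sketch below keeps that structure but replaces the recurrent or BERT encoder with a bag-of-words stand-in, and it uses a hypothetical 0.5 decision threshold. It also shows the binary F1 score used to report the results. The example questions are invented English placeholders, not data from the mawdoo3.com dataset.

```python
import math
from collections import Counter


def encode(question, vocab):
    """Shared encoder: a bag-of-words stand-in for the recurrent or BERT encoder."""
    counts = Counter(question.split())
    return [float(counts[w]) for w in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(q1, q2, vocab):
    """Both branches call the same encode(); only the inputs differ (weight sharing)."""
    return cosine(encode(q1, vocab), encode(q2, vocab))


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = semantically similar pair)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


pairs = [("how to cook rice", "how do i cook rice"),
         ("how to cook rice", "best football team")]
vocab = sorted({w for q1, q2 in pairs for w in (q1 + " " + q2).split()})
predictions = [1 if siamese_similarity(q1, q2, vocab) >= 0.5 else 0 for q1, q2 in pairs]
print(predictions, f1_score([1, 0], predictions))
```

In a trained Siamese network, the shared encoder's weights are learned so that paraphrased questions land close together, which a fixed bag-of-words encoder cannot do for synonyms.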

Development Of A Machine Learning Model Using Electrocardiogram Signals To Improve Acute Pulmonary Embolism Screening

European Heart Journal - Digital Health ◽

10.1093/ehjdh/ztab101 ◽

2021 ◽

Author(s):

Sulaiman S Somani ◽

Hossein Honarvar ◽

Sukrit Narula ◽

Isotta Landi ◽

Shawn Lee ◽

...

Keyword(s):

Machine Learning ◽

Pulmonary Embolism ◽

Deep Learning ◽

Clinical Data ◽

Scoring Systems ◽

Sensitivity Analyses ◽

Learning Models ◽

Fusion Model ◽

Waveform Data ◽

Pulmonary Angiogram

Abstract Aims Clinical scoring systems for pulmonary embolism (PE) screening have low specificity and contribute to CT pulmonary angiogram (CTPA) overuse. We assessed whether deep learning models using an existing and routinely collected data modality, electrocardiogram (ECG) waveforms, can increase specificity for PE detection. Methods and Results We create a retrospective cohort of 21,183 patients with moderate to high suspicion of PE and associate 23,793 CTPAs (10.0% PE-positive) with 320,746 ECGs and encounter-level clinical data (demographics, comorbidities, vital signs, and labs). We develop three machine learning models to predict PE likelihood: an ECG model using only ECG waveform data, an EHR model using tabular clinical data, and a Fusion model integrating clinical data and an embedded representation of the ECG waveform. We find that the Fusion model (area under the receiver operating characteristic curve [AUROC] 0.81 ± 0.01) outperforms both the ECG model (AUROC 0.59 ± 0.01) and the EHR model (AUROC 0.65 ± 0.01). On a sample of 100 patients from the test set, the Fusion model also achieves greater specificity (0.18) and performance (AUROC 0.84 ± 0.01) than four commonly evaluated clinical scores: Wells' Criteria, Revised Geneva Score, Pulmonary Embolism Rule-Out Criteria, and 4-Level Pulmonary Embolism Clinical Probability Score (AUROC 0.50-0.58, specificity 0.00-0.05). The model is superior to these scores on feature sensitivity analyses (AUROC 0.66 to 0.84) and achieves comparable performance across sex (AUROC 0.81) and racial/ethnic (AUROC 0.77 to 0.84) subgroups. Conclusion Synergistic deep learning of electrocardiogram waveforms with traditional clinical variables can increase the specificity of PE detection in patients with at least moderate suspicion of PE.
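
The Fusion model integrates clinical data with an embedded representation of the ECG waveform, and the comparison against clinical scores hinges on specificity. The following Python sketch assumes simple feature-level fusion by concatenation. It also assumes that specificity is read off at the highest threshold that still reaches a target sensitivity, a common convention for rule-out screening. The abstract does not say how its specificity operating point was chosen, and all names here are hypothetical.

```python
def fuse(ecg_embedding, clinical_features):
    """Feature-level fusion: concatenate an ECG waveform embedding with tabular EHR features."""
    return list(ecg_embedding) + list(clinical_features)


def sensitivity_specificity(labels, scores, threshold):
    """Treat score >= threshold as a positive (PE) call; assumes both classes are present."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Highest threshold reaching the target sensitivity, and the specificity there."""
    for threshold in sorted(set(scores), reverse=True):
        sensitivity, specificity = sensitivity_specificity(labels, scores, threshold)
        if sensitivity >= target_sensitivity:
            return threshold, specificity
    return None


# Invented model scores for five patients (1 = PE on CTPA).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(specificity_at_sensitivity(labels, scores, target_sensitivity=1.0))
```

Requiring full sensitivity means no PE case is missed. Under that constraint, any specificity above zero translates into CTPAs that could be avoided.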

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared with non-augmented baselines. Here, we propose a novel data augmentation method, which we call “Levenshtein augmentation”, that considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation increased performance over both non-augmented data and data augmented with conventional SMILES randomization when used to train baseline models. Furthermore, Levenshtein augmentation appears to produce what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network for molecular motifs.
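
The method builds on the Levenshtein (edit) distance. The Python sketch below implements that distance with a rolling dynamic-programming row. It then applies a simplified, hypothetical pairing rule: among candidate reactant and product SMILES strings, keep the pair with the smallest edit distance. The paper's method works on local sub-sequence similarity and may differ. Generating the randomized SMILES variants requires a cheminformatics toolkit and is not shown, so the candidates here are written by hand.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def closest_pair(reactant_variants, product_variants):
    """Pick the (reactant, product) pair with the smallest edit distance (hypothetical rule)."""
    candidates = [(r, p) for r in reactant_variants for p in product_variants]
    return min(candidates, key=lambda pair: levenshtein(pair[0], pair[1]))


# Hand-written SMILES variants: ethanol -> acetaldehyde.
print(closest_pair(["CCO", "OCC"], ["CC=O", "O=CC"]))
```

Pairing textually similar reactant and product strings means the model mostly has to learn the local edit (here, inserting "="), which is the intuition behind the reported attentional gain.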

Improving the Accuracy of Protein-Ligand Binding Affinity Prediction by Deep Learning Models: Benchmark and Model

10.26434/chemrxiv.9866912 ◽

2019 ◽

Author(s):

Mohammad Rezaei ◽

Yanjun Li ◽

Xiaolin Li ◽

Chenglong Li

Keyword(s):

Deep Learning ◽

Drug Design ◽

Binding Affinity ◽

Benchmark Dataset ◽

Rational Drug Design ◽

Learning Models ◽

Structure Based Drug Design ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Rational Drug

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results. Objectives: The primary objectives were to identify the features most relevant to binding affinity prediction, to minimize manual feature engineering, and to improve the reliability of binding affinity prediction by tuning the hyperparameters of efficient deep learning models. Methods: The binding site of each target protein was represented as a grid box around its bound ligand. Both binary and distance-dependent occupancies were examined to model how an atom affects its neighboring voxels in this grid. A combination of different features, including ANOLEA, ligand elements, and Arpeggio atom types, was used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained, and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models. Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson's R = 0.83), outperforming recent state-of-the-art models in this field. In addition, when the DeepAtom model was trained on our proposed benchmark dataset, it yielded a higher correlation than the baseline, which confirms the value of our model. Conclusions: The promising results for the predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design.
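
The Methods contrast binary with distance-dependent atom occupancy on a voxel grid. The Python sketch below illustrates both under stated assumptions. It uses a cubic grid centred on the ligand, a binary occupancy that is 1 when a voxel centre lies within the atom's radius, and the smooth form 1 - exp(-(r/d)^12), which is used in some 3D-CNN featurizations but is assumed here rather than taken from DeepAtom. Each channel takes the maximum occupancy over its atoms. Box size, resolution, and radii are illustrative values.

```python
import math


def voxel_centers(center, box_size, resolution):
    """Centres of a cubic grid of side box_size (Å) at the given voxel resolution."""
    n = int(round(box_size / resolution))
    half = box_size / 2.0
    cx, cy, cz = center
    return [
        (cx - half + (i + 0.5) * resolution,
         cy - half + (j + 0.5) * resolution,
         cz - half + (k + 0.5) * resolution)
        for i in range(n) for j in range(n) for k in range(n)
    ]


def binary_occupancy(atom_xyz, radius, voxel_xyz):
    """1.0 if the voxel centre lies within the atom's radius, else 0.0."""
    return 1.0 if math.dist(atom_xyz, voxel_xyz) <= radius else 0.0


def smooth_occupancy(atom_xyz, radius, voxel_xyz):
    """Distance-dependent occupancy 1 - exp(-(r/d)^12) (assumed functional form)."""
    d = math.dist(atom_xyz, voxel_xyz)
    if d == 0.0:
        return 1.0
    return 1.0 - math.exp(-((radius / d) ** 12))


def grid_channel(atoms, center, box_size, resolution, occupancy):
    """One feature channel: per-voxel max occupancy over the (xyz, radius) atoms given."""
    centers = voxel_centers(center, box_size, resolution)
    return [max((occupancy(xyz, r, v) for xyz, r in atoms), default=0.0) for v in centers]


# One invented atom of radius 1.7 Å at the box centre, 4 Å box, 1 Å voxels.
atoms = [((0.0, 0.0, 0.0), 1.7)]
binary = grid_channel(atoms, (0.0, 0.0, 0.0), 4.0, 1.0, binary_occupancy)
smooth = grid_channel(atoms, (0.0, 0.0, 0.0), 4.0, 1.0, smooth_occupancy)
print(sum(binary), round(sum(smooth), 3))
```

The binary form gives a hard edge at the atomic radius. The smooth form decays with distance, so voxels just outside the radius still carry some signal about nearby atoms.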

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covered novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains span a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which outperform other learning algorithms on the accuracy metric. The trends are further expected to converge toward sophisticated hybrid deep learning models.