Upon Improving the Performance of Localized Healthcare Virtual Assistants

Virtual assistants are becoming popular in a variety of domains, responsible for automating repetitive tasks or allowing users to seamlessly access useful information. With the advances in Machine Learning and Natural Language Processing, there has been an increasing interest in applying such assistants in new areas and with new capabilities. In particular, their application in e-healthcare is becoming attractive and is driven by the need to access medically-related knowledge, as well as providing first-level assistance in an efficient manner. In such types of virtual assistants, localization is of utmost importance, since the general population (especially the aging population) is not familiar with the needed “healthcare vocabulary” to communicate facts properly; and state-of-practice proves relatively poor in performance when it comes to specialized virtual assistants for less frequently spoken languages. In this context, we present a Greek ML-based virtual assistant specifically designed to address some commonly occurring tasks in the healthcare domain, such as doctor’s appointments or distress (panic situations) management. We build on top of an existing open-source framework, discuss the necessary modifications needed to address the language-specific characteristics and evaluate various combinations of word embeddings and machine learning models to enhance the assistant’s behaviour. Results show that we are able to build an efficient Greek-speaking virtual assistant to support e-healthcare, while the NLP pipeline proposed can be applied in other (less frequently spoken) languages, without loss of generality.

Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397, 2021

Author(s): Abul Hasan, Mark Levene, David Weston, Renate Fromson, Nicolas Koslover, ...

Keyword(s): Machine Learning, Social Media, Natural Language Processing, Natural Language, Language Processing, Learning Models, Rule Based, Additional Information, Processing Pipeline, Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19 when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. We also highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
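
The results above are reported as macro- and micro-averaged F1 scores. As a rough illustration of how the two averages differ, the following minimal sketch computes both for single-label multi-class predictions. The function name `f1_scores` and the three class labels are hypothetical and invented for this example; they are not taken from the study.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels for a three-category task (illustration only).
y_true = ["class_a", "class_a", "class_a", "class_b", "class_c"]
y_pred = ["class_a", "class_a", "class_b", "class_b", "class_a"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
```

In this toy case the rare class is never predicted correctly, which pulls the macro average well below the micro average; this is one reason the two figures can differ for imbalanced categories.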

Predicting Onset of Dementia Using Clinical Notes and Machine Learning: Case-Control Study (Preprint)

10.2196/preprints.17819, 2020

Author(s): Christopher A Hane, Vijay S Nori, William H Crown, Darshak M Sanghavi, Paul Bleicher

Keyword(s): Machine Learning, Language Processing, Disease Onset, Area Under The Curve, Learning Models, Term Care, Clinical Notes, Patients At Risk, Hospital Systems, Machine Learning Models

BACKGROUND Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer disease and related dementias (ADRD). Early detection can also assist patients with financial planning for long-term care. Clinical notes are an important, underutilized source of information in machine learning models because of the cost of collection and complexity of analysis. OBJECTIVE This study aimed to investigate the use of deidentified clinical notes from multiple hospital systems collected over 10 years to augment retrospective machine learning models of the risk of developing ADRD. METHODS We used 2 years of data to predict the future outcome of ADRD onset. Clinical notes are provided in a deidentified format with specific terms and sentiments. Terms in clinical notes are embedded into a 100-dimensional vector space to identify clusters of related terms and abbreviations that differ across hospital systems and individual clinicians. RESULTS When using clinical notes, the area under the curve (AUC) improved from 0.85 to 0.94, and positive predictive value (PPV) increased from 45.07% (25,245/56,018) to 68.32% (14,153/20,717) in the model at disease onset. Models with clinical notes improved in both AUC and PPV in years 3-6 when notes’ volume was largest; results are mixed in years 7 and 8 with the smallest cohorts. CONCLUSIONS Although clinical notes helped in the short term, the presence of ADRD symptomatic terms years earlier than onset adds evidence to other studies that clinicians undercode diagnoses of ADRD. De-identified clinical notes increase the accuracy of risk models. Clinical notes collected across multiple hospital systems via natural language processing can be merged using postprocessing techniques to aid model accuracy.
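
The positive predictive values quoted above can be checked directly from the counts given in parentheses. The short sketch below assumes that each fraction is read as true positives over all predicted positives, which is the usual definition of PPV; the function name is hypothetical.

```python
def positive_predictive_value(true_positives, predicted_positives):
    """PPV = true positives / all cases the model flagged as positive."""
    if predicted_positives <= 0:
        raise ValueError("predicted_positives must be positive")
    return true_positives / predicted_positives


# Counts copied from the abstract (model at disease onset).
ppv_without_notes = positive_predictive_value(25_245, 56_018)
ppv_with_notes = positive_predictive_value(14_153, 20_717)

print(f"without clinical notes: {ppv_without_notes:.2%}")
print(f"with clinical notes:    {ppv_with_notes:.2%}")
```

Under that reading, the model with clinical notes flags fewer patients (20,717 versus 56,018), and a larger share of those flagged are true cases.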

AUTOMATIC KEYWORD EXTRACTION USING ARTIFICIAL NEURAL NETWORK AND FEATURE EXTRACTION

Journal of Military Science and Technology, 10.54939/1859-1043.j.mst.69a.2020.63-74, 2020, pp. 63-74

Author(s): Son

Keyword(s): Neural Network, Machine Learning, Artificial Neural Network, Language Processing, Extraction Methods, Keyword Extraction, Learning Models, New Approach, Word Level, Machine Learning Models

Extracting keywords from documents is an essential task in natural language processing. A challenge of this task is to define a reasonable set of keywords from which all relevant documents can be found. This paper proposes a new approach that exploits word-level handcrafted features and machine learning models to select the most important keywords of a single document. To evaluate the proposed solution, we compare our results with the latest supervised and unsupervised automatic keyword extraction methods. Experimental results show that our model achieves the best results on 9 of the 20 data corpora, which indicates that the proposed approach is promising.

Estimating Nonfatal Gunshot Injury Locations With Natural Language Processing and Machine Learning Models

JAMA Network Open, 10.1001/jamanetworkopen.2020.20664, 2020, Vol 3 (10), pp. e2020664

Author(s): Susan T. Parker

Keyword(s): Machine Learning, Natural Language Processing, Natural Language, Language Processing, Gunshot Injury, Learning Models, Machine Learning Models

Permutation-based identification of important biomarkers for complex diseases via machine learning models

Nature Communications, 10.1038/s41467-021-22756-2, 2021, Vol 12 (1)

Author(s): Xinlei Mi, Baiming Zou, Fei Zou, Jianhua Hu

Keyword(s): Machine Learning, Human Disease, Molecular Mechanisms, The Cancer Genome Atlas, Support Vector, Individual Feature, Learning Models, Efficient Manner, Feature Importance, Machine Learning Models

Study of human disease remains challenging due to convoluted disease etiologies and complex molecular mechanisms at genetic, genomic, and proteomic levels. Many machine learning-based methods have been developed and widely used to alleviate some analytic challenges in complex human disease studies. While enjoying modeling flexibility and robustness, these model frameworks suffer from non-transparency and difficulty in interpreting each individual feature due to their sophisticated algorithms. However, identifying important biomarkers is a critical pursuit towards assisting researchers to establish novel hypotheses regarding prevention, diagnosis and treatment of complex human diseases. Herein, we propose a Permutation-based Feature Importance Test (PermFIT) for estimating and testing the feature importance, and for assisting interpretation of individual features in complex frameworks, including deep neural networks, random forests, and support vector machines. PermFIT (available at https://github.com/SkadiEye/deepTL) is implemented in a computationally efficient manner, without model refitting. We conduct extensive numerical studies under various scenarios, and show that PermFIT not only yields valid statistical inference, but also improves the prediction accuracy of machine learning models. With the application to the Cancer Genome Atlas kidney tumor data and the HITChip atlas data, PermFIT demonstrates its practical usage in identifying important biomarkers and boosting model prediction performance.
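
The general idea behind permutation-based feature importance can be sketched in a few lines: shuffle one feature column at a time, keep the already fitted model fixed, and measure how much the prediction error grows. The sketch below is a generic illustration of that idea only; it does not reproduce PermFIT or its statistical test, and the toy data, the stand-in `fitted_model`, and all function names are assumptions made for this example.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a fitted model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    The model is never refitted; only its inputs are perturbed.
    """
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (hypothetical): the target depends on feature 0 only.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
targets = [3.0 * x0 for x0, _ in rows]


def fitted_model(row):
    """Stand-in for an already trained model; it ignores feature 1."""
    return 3.0 * row[0]


importances = permutation_importance(fitted_model, rows, targets)
print(importances)
```

Shuffling the informative feature raises the error substantially, while shuffling the ignored feature leaves it unchanged, so the importance scores separate the two.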

An open-source framework for fast-yet-accurate calculation of quantum mechanical features

10.26434/chemrxiv-2021-8gthw, 2021

Author(s): Eike Caldeweyher, Christoph Bauer, Ali Soltani Tehrani

Keyword(s): Machine Learning, Open Source, Predictive Power, Medium Size, Quantum Mechanical, Learning Models, Molecular Fingerprints, Open Source Framework, Molecular Polarizabilities, Machine Learning Models

We present the open-source framework kallisto that enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities, the predictive power of the presented method competes against second-order perturbation theory in a converged atomic-orbital basis set at a fraction of its computational costs. Robustness tests within a diverse validation set of more than 80,000 molecules show that the calculation of isotropic molecular polarizabilities has a low failure rate of only 0.3 %. Furthermore, we present a generally applicable van der Waals radius model that is rooted in atomic static polarizabilities. Efficiency tests show that such radii can even be calculated for small- to medium-size proteins where the largest system (SARS-CoV-2 spike protein) has 42,539 atoms. Following the work of Domingo-Alemenara et al. [Domingo-Alemenara et al., Nat. Comm., 2019, 10, 5811], we present computational predictions for retention times for different chromatographic methods and describe how physicochemical features improve the predictive power of machine-learning models that otherwise only rely on two-dimensional features like molecular fingerprints. Additionally, we developed an internal benchmark set of experimental super-critical fluid chromatography retention times. For those methods, improvements of up to 17 % are obtained when combining molecular fingerprints with physicochemical descriptors. Shapley additive explanation values furthermore show that the physical nature of the applied features can be retained within the final machine-learning models. We generally recommend the kallisto framework as a robust, low-cost, and physically motivated featurizer for upcoming state-of-the-art machine-learning studies.

Standardization of Featureless Variables for Machine Learning Models Using Natural Language Processing

Lecture Notes in Computer Science - Computational Science – ICCS 2018, 10.1007/978-3-319-93701-4_18, 2018, pp. 234-246

Author(s): Kourosh Modarresi, Abdurrahman Munir

Keyword(s): Machine Learning, Natural Language Processing, Natural Language, Language Processing, Learning Models, Machine Learning Models

Twitter Data Sentimental Analysis Using Multiple Classifications

Journal of Computational and Theoretical Nanoscience, 10.1166/jctn.2020.9319, 2020, Vol 17 (8), pp. 3776-3781

Author(s): M. Adimoolam, Raghav Sharma, A. John, M. Suresh Kumar, K. Ashok Kumar

Keyword(s): Machine Learning, Language Processing, Machine Learning Algorithms, Machine Learning Techniques, Human Beings, Learning Models, The Past, Twitter Data, Learning Techniques, Machine Learning Models

In the past few decades, human beings have experienced a tremendous intensification of online interaction, in particular on microblogging websites and other social media. Many kinds of data are in use, and classifying data in order to group and store it is challenging in real-world scenarios. Various machine learning and natural language processing (NLP) techniques have been applied to analyse sentiment. This work concentrates on using several machine learning algorithms to perform sentiment analysis and on comparing various machine learning models for sentiment classification. It analyses various sentiments using multiple classification methods. From the evaluation of this experiment, it can be concluded that NLP and machine learning techniques are efficient for sentiment analysis.

Neural Activity Classification with Machine Learning Models Trained on Interspike Interval Time-Series Data

10.1101/2021.03.24.436765, 2021

Author(s): Ivan Lazarevich, Ilya Prokin, Boris Gutkin, Victor Kazantsev

Keyword(s): Machine Learning, Time Series, Language Processing, Neural Activity, Time Series Data, Series Data, Neural Decoding, Learning Models, Wide Range, Machine Learning Models

Modern well-performing approaches to neural decoding are based on machine learning models such as decision tree ensembles and deep neural networks. The wide range of algorithms that can be utilized to learn from neural spike trains, which are essentially time-series data, results in the need for diverse and challenging benchmarks for neural decoding, similar to the ones in the fields of computer vision and natural language processing. In this work, we propose a spike train classification benchmark, based on open-access neural activity datasets and consisting of several learning tasks such as stimulus type classification, animal’s behavioral state prediction and neuron type identification. We demonstrate that an approach based on hand-crafted time-series feature engineering establishes a strong baseline performing on par with state-of-the-art deep learning based models for neural decoding. We release the code needed to reproduce the reported results.
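
As a rough illustration of hand-crafted features on interspike interval data, the sketch below turns a list of spike times into intervals and a few summary statistics that a classifier could consume. The particular features (mean, standard deviation, coefficient of variation, firing rate) and the function names are assumptions chosen for this example; the abstract does not specify which features the authors use.

```python
from statistics import mean, pstdev


def interspike_intervals(spike_times):
    """Differences between consecutive spike times (same unit as the input)."""
    ordered = sorted(spike_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def isi_features(spike_times):
    """Summary statistics of the interspike interval series."""
    isis = interspike_intervals(spike_times)
    if len(isis) < 2:
        raise ValueError("need at least three spikes to summarise intervals")
    mu = mean(isis)
    if mu == 0:
        raise ValueError("all spikes share the same timestamp")
    sigma = pstdev(isis)
    return {
        "mean_isi": mu,
        "std_isi": sigma,
        "cv_isi": sigma / mu,      # 0 for a perfectly regular spike train
        "firing_rate": 1.0 / mu,   # spikes per time unit
    }


# Hypothetical spike times in milliseconds (illustration only).
regular_train = [0, 100, 200, 300, 400]
bursty_train = [0, 10, 110, 120, 220]

regular_features = isi_features(regular_train)
bursty_features = isi_features(bursty_train)
print(regular_features)
print(bursty_features)
```

The coefficient of variation separates the regular train from the bursty one even though both contain the same number of spikes, which hints at why simple interval statistics can be informative for classification.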