Natural Language Processing Based Instrument for Classification of Free Text Medical Records

According to the Ministry of Labor, Health and Social Affairs of Georgia a new health management system has to be introduced in the nearest future. In this context arises the problem of structuring and classifying documents containing all the history of medical services provided. The present work introduces the instrument for classification of medical records based on the Georgian language. It is the first attempt of such classification of the Georgian language based medical records. On the whole 24.855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) andK-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with a little supremacy of SVM. In the process of classification a “shrink” method, based on features selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, on the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful.

Download Full-text

The use of a Domain Ontology for the Management of Essential Hypertension (Preprint)

10.2196/preprints.25427 ◽

2020 ◽

Author(s):

Emma Chavez ◽

Vanessa Perez ◽

Angélica Urrutia

Keyword(s):

Decision Making ◽

Essential Hypertension ◽

Medical History ◽

Medical Records ◽

Medical Training ◽

Relevant Information ◽

Domain Ontology ◽

Free Text ◽

Snomed Ct ◽

History Of

BACKGROUND : Currently, hypertension is one of the diseases with greater risk of mortality in the world. Particularly in Chile, 90% of the population with this disease has idiopathic or essential hypertension. Essential hypertension is characterized by high blood pressure rates and it´s cause is unknown, which means that every patient might requires a different treatment, depending on their history and symptoms. Different data, such as history, symptoms, exams, etc., are generated for each patient suffering from the disease. This data is presented in the patient’s medical record, in no order, making it difficult to search for relevant information. Therefore, there is a need for a common, unified vocabulary of the terms that adequately represent the diseased, making searching within the domain more effective. OBJECTIVE The objective of this study is to develop a domain ontology for essential hypertension , therefore arranging the more significant data within the domain as tool for medical training or to support physicians’ decision making will be provided. METHODS The terms used for the ontology were extracted from the medical history of de-identified medical records, of patients with essential hypertension. The Snomed-CT’ collection of medical terms, and clinical guidelines to control the disease were also used. Methontology was used for the design, classes definition and their hierarchy, as well as relationships between concepts and instances. Three criteria were used to validate the ontology, which also helped to measure its quality. Tests were run with a dataset to verify that the tool was created according to the requirements. RESULTS An ontology of 310 instances classified into 37 classes was developed. From these, 4 super classes and 30 relationships were obtained. In the dataset tests, 100% correct and coherent answers were obtained for quality tests (3). CONCLUSIONS The development of this ontology provides a tool for physicians, specialists, and students, among others, that can be incorporated into clinical systems to support decision making regarding essential hypertension. Nevertheless, more instances should be incorporated into the ontology by carrying out further searched in the medical history or free text sections of the medical records of patients with this disease.

Download Full-text

Automatic Extraction and Classification of Patients’ Smoking Status from Free Text Using Natural Language Processing

Value in Health ◽

10.1016/j.jval.2016.09.158 ◽

2016 ◽

Vol 19 (7) ◽

pp. A373

Author(s):

A Caccamisi ◽

L Jörgensen ◽

H Dalianis ◽

M Rosenlund

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Free Text ◽

Automatic Extraction

Download Full-text

Classification of exhaled air IR spectra using combination support vector machine, decision tree, and k-nearest neighbor

Fourth International Conference on Terahertz and Microwave Radiation: Generation, Detection, and Applications ◽

10.1117/12.2581563 ◽

2020 ◽

Author(s):

Viktor V. Nikolaev ◽

Dmitry D. Kuzmin ◽

Viacheslav V. Zasedatel

Keyword(s):

Support Vector Machine ◽

Decision Tree ◽

Ir Spectra ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Exhaled Air

Download Full-text

Natural language processing and machine learning to enable automatic extraction and classification of patients’ smoking status from electronic medical records

Upsala Journal of Medical Sciences ◽

10.1080/03009734.2020.1792010 ◽

2020 ◽

Vol 125 (4) ◽

pp. 316-324

Author(s):

Andrea Caccamisi ◽

Leif Jørgensen ◽

Hercules Dalianis ◽

Mats Rosenlund

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Electronic Medical Records ◽

Language Processing ◽

Medical Records ◽

Smoking Status ◽

Automatic Extraction

Download Full-text

Expert guided natural language processing using one-class classification

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocv010 ◽

2015 ◽

Vol 22 (5) ◽

pp. 962-966 ◽

Cited By ~ 5

Author(s):

Erel Joffe ◽

Emily J Pettigrew ◽

Jorge R Herskovic ◽

Charles F Bearden ◽

Elmer V Bernstam

Keyword(s):

Breast Cancer ◽

Language Processing ◽

Text Processing ◽

Binary Classification ◽

Model Performance ◽

Imbalanced Data ◽

Superior Performance ◽

Support Vector ◽

Free Text ◽

One Class Classification

Abstract Introduction Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing. Objectives To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respects to irrelevant surrounding text. Methods The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10 000 records with a breast cancer prevalence of 1.4%. Results When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had comparable results to 2C-SVMs trained on whole notes (F = 0.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs had a considerably superior performance (F = 0.61 vs. F = 0.17 for the best performing model) attributable mainly to improved precision (p = .88 vs. p = .09 for the best performing model). Conclusions 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVMs classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with low prevalence of the positive class.

Download Full-text

Efficient Reuse of Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: A Phenotype Embedding Approach (Preprint)

10.2196/preprints.14782 ◽

2019 ◽

Author(s):

Honghan Wu ◽

Karen Hodgson ◽

Sue Dyson ◽

Katherine I Morley ◽

Zina M Ibrahim ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Medical Records ◽

Free Text ◽

Model Adaptation ◽

Model Reuse ◽

Language Patterns ◽

Patient Profiles ◽

Health Registry

BACKGROUND Much effort has been put into the use of automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records in order to construct comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, as it requires validation and retraining on new data iteratively to achieve convergent results. OBJECTIVE The aim of this work is to minimize the effort involved in reusing NLP models on free-text medical records. METHODS We formally define and analyze the model adaptation problem in phenotype-mention identification tasks. We identify “duplicate waste” and “imbalance waste,” which collectively impede efficient model reuse. We propose a phenotype embedding–based approach to minimize these sources of waste without the need for labelled data from new settings. RESULTS We conduct experiments on data from a large mental health registry to reuse NLP models in four phenotype-mention identification tasks. The proposed approach can choose the best model for a new task, identifying up to 76% waste (duplicate waste), that is, phenotype mentions without the need for validation and model retraining and with very good performance (93%-97% accuracy). It can also provide guidance for validating and retraining the selected model for novel language patterns in new tasks, saving around 80% waste (imbalance waste), that is, the effort required in “blind” model-adaptation approaches. CONCLUSIONS Adapting pretrained NLP models for new tasks can be more efficient and effective if the language pattern landscapes of old settings and new settings can be made explicit and comparable. Our experiments show that the phenotype-mention embedding approach is an effective way to model language patterns for phenotype-mention identification tasks and that its use can guide efficient NLP model reuse.

Download Full-text

The Effects of Missing Data Characteristics on the Choice of Imputation Techniques

Vietnam Journal of Computer Science ◽

10.1142/s2196888820500098 ◽

2020 ◽

Vol 07 (02) ◽

pp. 161-177

Author(s):

Oyekale Abel Alade ◽

Ali Selamat ◽

Roselina Sallehuddin

Keyword(s):

Missing Data ◽

Missing Values ◽

Health Management ◽

Support Vector ◽

Multiple Imputations ◽

Original Dataset ◽

Learning Machine ◽

Elm Classifier ◽

The Right

One major characteristic of data is completeness. Missing data is a significant problem in medical datasets. It leads to incorrect classification of patients and is dangerous to the health management of patients. Many factors lead to the missingness of values in databases in medical datasets. In this paper, we propose the need to examine the causes of missing data in a medical dataset to ensure that the right imputation method is used in solving the problem. The mechanism of missingness in datasets was studied to know the missing pattern of datasets and determine a suitable imputation technique to generate complete datasets. The pattern shows that the missingness of the dataset used in this study is not a monotone missing pattern. Also, single imputation techniques underestimate variance and ignore relationships among the variables; therefore, we used multiple imputations technique that runs in five iterations for the imputation of each missing value. The whole missing values in the dataset were 100% regenerated. The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show improvement in the accuracy of the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets with the original dataset with different classifiers like support vector machine (SVM), radial basis function (RBF), and ELMs.

Download Full-text

Measles outbreak in Gothenburg urban area, Sweden, 2017 to 2018: low viral load in breakthrough infections

Eurosurveillance ◽

10.2807/1560-7917.es.2019.24.17.1900114 ◽

2019 ◽

Vol 24 (17) ◽

Cited By ~ 5

Author(s):

Nicklas Sundell ◽

Leif Dotevall ◽

Martina Sansone ◽

Maria Andersson ◽

Magnus Lindh ◽

...

Keyword(s):

Viral Load ◽

High Risk ◽

Medical Records ◽

Clinical Symptoms ◽

Previous History ◽

Contact Tracing ◽

Measles Outbreak ◽

History Of ◽

Measles Vaccination

In an outbreak of measles in Gothenburg, Sweden, breakthrough infections (i.e. infections in individuals with a history of vaccination) were common. The objective of this study was to compare measles RNA levels between naïve (i.e. primary) and breakthrough infections. We also propose a fast provisional classification of breakthrough infections. Medical records were reviewed and real-time PCR-positive samples genotyped. Cases were classified as naïve, breakthrough or vaccine infections. We compared clinical symptoms and measles RNA cycle threshold (Ct) values between breakthrough and naïve infections. Sixteen of 28 confirmed cases of measles in this outbreak were breakthrough infections. A fast provisional classification, based on previous history of measles vaccination and detectable levels of measles IgG in acute serum, correctly identified 14 of the 16 breakthrough infections, confirmed by IgG avidity testing. Measles viral load was significantly lower in nasopharyngeal samples from individuals with breakthrough compared with naïve infections (median Ct-values: 32 and 19, respectively, p < 0.0001). No onward transmission from breakthrough infections was identified. Our results indicate that a high risk of onward transmission is limited to naïve infections. We propose a fast provisional classification of breakthrough measles that can guide contact tracing in outbreak settings.

Download Full-text

Ensemble of Classifiers and Term Weighting Schemes for Sentiment Analysis in Turkish

10.52460/src.2021.004 ◽

2021 ◽

Vol 1 (1) ◽

pp. 1-12

Author(s):

Aytuğ Onan ◽

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Text Messages ◽

Support Vector ◽

K Nearest Neighbor ◽

Term Weighting ◽

Text Documents ◽

Weighting Schemes ◽

Short Text

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals can express their opinions, grievances, feelings, and attitudes about a variety of topics. Through microblogging platforms, they can express their opinions on current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to define the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users daily publishing over one hundred million text messages (referred to as tweets). Choosing an appropriate term representation scheme for short text messages is critical. Term weighting schemes are critical representation schemes for text documents in the vector space model. We present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes in this paper. The predictive efficiency of term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical evidence suggests that supervised term weighting models can outperform unsupervised term weighting models.

Download Full-text

Classification of fruits and vegetables using ResNet model.

10.31220/agrirxiv.2021.00075 ◽

2021 ◽

Author(s):

P. Sukhetha ◽

N. Hemalatha ◽

Raji Sukumar

Keyword(s):

Deep Learning ◽

Fruits And Vegetables ◽

Nearest Neighbor ◽

Support Vector ◽

Agricultural Field ◽

K Nearest Neighbor ◽

New Techniques ◽

Learning Techniques ◽

A Current

Abstract Agriculture is one of the important parts of Indian economy. Agricultural field has more contribution towards growth and stability of the nation. Therefore, a current technologies and innovations can help in order to experiment new techniques and methods in the agricultural field. At Present Artificial Intelligence (AI) is one of the main, effective, and widely used technology. Especially, Deep Learning (DL) has numerous functions due to its capability to learn robust interpretations from images. Convolutional Neural Networks (CNN) is the major Deep Learning architecture for image classification. This paper is mainly focus on the deep learning techniques to classify Fruits and Vegetables, the model creation and implementation to identify Fruits and Vegetables on the fruit360 dataset. The models created are Support Vector Machine (SVM), K Nearest Neighbor (KNN), Decision Tree (DT), ResNet Pretrained Model, Convolutional Neural Network (CNN), Multilayer Perceptron (MLP). Among the different models ResNet pretrained Model performed the best with an accuracy of 95.83%.

Download Full-text