Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers (Preprint)

2021 ◽  
Author(s):  
Bruna Stella Zanotto ◽  
Ana Paula Beck da Silva Etges ◽  
Avner dal Bosco ◽  
Eduardo Gabriel Cortes ◽  
Renata Ruschel ◽  
...  

BACKGROUND With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. OBJECTIVE This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. METHODS Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. RESULTS The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. 
The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. CONCLUSIONS Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.
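The subject-wise sampling in the protocol above keeps every sentence from a given patient inside a single fold, so that no patient contributes to both training and validation data. A minimal sketch of such a split, using hypothetical toy data (the function and sample names are illustrative, not taken from the paper's code):

```python
from collections import defaultdict

def subject_wise_folds(samples, n_folds=5):
    """Split (subject_id, sentence) samples into folds such that all
    sentences from one subject land in the same fold, so a patient never
    appears in both training and validation data."""
    by_subject = defaultdict(list)
    for idx, (subject, _) in enumerate(samples):
        by_subject[subject].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Greedily assign the largest subjects first to keep folds balanced.
    for subject, idxs in sorted(by_subject.items(), key=lambda kv: -len(kv[1])):
        min(folds, key=len).extend(idxs)
    return folds

# Toy example: three patients, two sentences each (hypothetical data).
samples = [("p1", "s1"), ("p1", "s2"), ("p2", "s3"),
           ("p2", "s4"), ("p3", "s5"), ("p3", "s6")]
folds = subject_wise_folds(samples, n_folds=3)
```

Each fold then serves once as validation data while the remaining folds train the model; repeating the whole procedure with reshuffled subjects gives the repeated cross-validation used in the protocol.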

10.2196/29120 ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. e29120


2019 ◽  
Vol 11 (16) ◽  
pp. 1943 ◽  
Author(s):  
Omid Rahmati ◽  
Saleh Yousefi ◽  
Zahra Kalantari ◽  
Evelyn Uuemaa ◽  
Teimur Teimurian ◽  
...  

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set for each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under the receiver operating characteristic curve (AUC) criteria. Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both evaluation criteria, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can be helpful in supporting decision making on mountain social-ecological systems facing multiple hazards.
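The true skill score used above combines sensitivity and specificity into a single number, TSS = sensitivity + specificity − 1, so a random classifier scores 0 and a perfect one scores 1. A minimal sketch with toy binary labels (illustrative only, not the study's data):

```python
def true_skill_score(y_true, y_pred):
    """True skill statistic: sensitivity + specificity - 1.
    Ranges from -1 to 1; 0 means no better than random."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # hit rate on true hazard locations
    specificity = tn / (tn + fp)  # hit rate on non-hazard locations
    return sensitivity + specificity - 1

# Perfect predictions give TSS = 1.0.
print(true_skill_score([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```

Unlike plain accuracy, TSS is insensitive to the ratio of hazard to non-hazard cells in the validation set, which is why it is a common choice alongside AUC in hazard susceptibility studies.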


10.2196/16970 ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. e16970 ◽  
Author(s):  
Hayao Nakatani ◽  
Masatoshi Nakao ◽  
Hidefumi Uchiyama ◽  
Hiroyoshi Toyoshiba ◽  
Chikayuki Ochiai

Background Falls in hospitals are the most common risk factor that affects the safety of inpatients and can result in severe harm. Therefore, preventing falls is one of the most important areas of risk management for health care organizations. However, existing methods for predicting falls are laborious and costly. Objective The objective of this study is to verify whether hospital inpatient falls can be predicted through the analysis of a single input—unstructured nursing records obtained from Japanese electronic medical records (EMRs)—using a natural language processing (NLP) algorithm and machine learning. Methods The nursing records of 335 fallers and 408 nonfallers for a 12-month period were extracted from the EMRs of an acute care hospital and randomly divided into a learning data set and test data set. The former data set was subjected to NLP and machine learning to extract morphemes that contributed to separating fallers from nonfallers to construct a model for predicting falls. Then, the latter data set was used to determine the predictive value of the model using receiver operating characteristic (ROC) analysis. Results The prediction of falls using the test data set showed high accuracy, with an area under the ROC curve, sensitivity, specificity, and odds ratio of mean 0.834 (SD 0.005), mean 0.769 (SD 0.013), mean 0.785 (SD 0.020), and mean 12.27 (SD 1.11) for five independent experiments, respectively. The morphemes incorporated into the final model included many words closely related to known risk factors for falls, such as the use of psychotropic drugs, state of consciousness, and mobility, thereby demonstrating that an NLP algorithm combined with machine learning can effectively extract risk factors for falls from nursing records. Conclusions We successfully established that falls among hospital inpatients can be predicted by analyzing nursing records using an NLP algorithm and machine learning. 
Therefore, it may be possible to develop a fall risk monitoring system that analyzes nursing records daily and alerts health care professionals when the fall risk of an inpatient is increased.
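The ROC analysis above rates each patient by predicted fall risk; the area under the ROC curve can be read as the probability that a randomly chosen faller receives a higher risk score than a randomly chosen nonfaller. A minimal, self-contained sketch of that rank-based computation (toy scores, not the study's data):

```python
def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive (faller) outranks a randomly chosen
    negative (nonfaller); ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: fallers (label 1) tend to get higher risk scores.
labels = [1, 1, 0, 0]
risk_scores = [0.9, 0.7, 0.6, 0.2]
print(roc_auc(labels, risk_scores))  # 1.0: every faller outranks every nonfaller
```

An AUC of about 0.83, as reported in the study, means a faller's nursing records score above a nonfaller's roughly five times out of six.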


With the explosion of information on the internet, users are often overwhelmed and struggle to choose from the massive volume of content. Manually organizing a huge set of documents is not only time-consuming and laborious but also yields unsatisfactory results. Automatic text classification frees users from tedious document processing, makes it easier to recognize and distinguish different document contents, brings order and structure to large collections of documents, and greatly improves the utilization of information. This paper adopts a term-based model to extract web semantic features that represent each document. The extracted web semantic features are then used to train a reduced support vector machine. The experimental results show that the proposed method correctly identifies most writing styles.


2013 ◽  
Vol 20 (3) ◽  
pp. 130 ◽  
Author(s):  
Celso Antonio Alves Kaestner

This work presents kernel functions that can be used in conjunction with the Support Vector Machine (SVM) learning algorithm to solve the automatic text classification task. Initially, the Vector Space Model for text processing is presented. According to this model, a text is seen as a set of vectors in a high-dimensional space; then extensions and alternative models are derived, and some preprocessing procedures are discussed. The SVM learning algorithm, largely employed for text classification, is outlined: its decision procedure is obtained as the solution of an optimization problem. The "kernel trick", which allows the algorithm to be applied in non-linearly separable cases, is presented, as well as some kernel functions that are currently used in text applications. Finally, some text classification experiments employing the SVM classifier are conducted in order to illustrate some text preprocessing techniques and the presented kernel functions.
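As a rough illustration of the ideas in this abstract, the sketch below builds Vector Space Model representations as bag-of-words vectors and evaluates two kernel functions commonly plugged into the SVM decision procedure. The vocabulary and documents are hypothetical toys, and a real system would run a full SVM solver on top of these kernels:

```python
import math
from collections import Counter

def bow_vector(text, vocabulary):
    """Vector Space Model: represent a document as term counts over a
    fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocabulary]

def linear_kernel(x, y):
    """Plain dot product: the default choice for sparse text vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, exp(-gamma * ||x - y||^2): implicitly maps
    documents into an infinite-dimensional feature space, enabling the
    SVM to separate non-linearly separable classes."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical vocabulary and documents.
vocab = ["stroke", "risk", "text", "kernel"]
d1 = bow_vector("kernel methods for text text", vocab)  # [0, 0, 2, 1]
d2 = bow_vector("text kernel", vocab)                   # [0, 0, 1, 1]
print(linear_kernel(d1, d2))  # 3
```

In practice the counts would be replaced by TF-IDF weights and normalized, but the kernel functions themselves are unchanged.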



Multifold growth of internet users due to the penetration of information and communication technology has resulted in a huge amount of soft content on the internet. Although most of it is available in English, other languages, including Indian languages, are rapidly catching up. With the exponential growth of internet users in India, ordinary users now post moderate amounts of data on the web, so e-content in Indian languages is growing in size. The high dimensionality of this e-content is a curse for information retrieval; hence, automatic classification and structuring of this e-content has become the need of the day. Automatic text classification is the process of assigning one or more predefined categories to a new test document according to its contents. Text classification work on 14 Indian languages has been reported in the literature. Marathi is one of the officially recognized languages of the Indian union, yet little work has been done on Marathi text classification. This paper investigates Marathi text classification using popular machine learning methods such as Naïve Bayes, k-nearest neighbor (KNN), support vector machine, centroid-based classification, and modified KNN (MKNN) on manually extracted newspaper data from the sports domain. Our experimental results show that Naïve Bayes and the centroid-based method give the best performance, with 99.166% micro- and macro-averaged F-score, while modified KNN gives the lowest performance, with 97.16% micro-averaged and 96.997% macro-averaged F-score. The proposed work will be helpful for the proper organization of Marathi text documents and for many applications in Marathi information retrieval.
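The micro- and macro-averaged F-scores reported above differ in how they aggregate results: macro-averaging weights every category equally, while for single-label classification micro-averaged F1 reduces to plain accuracy. A minimal sketch with hypothetical sport-category labels (not the paper's data):

```python
def f1_per_class(y_true, y_pred, label):
    """Precision/recall/F1 for one class in a single-label setting."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every category counts equally."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)

def micro_f1(y_true, y_pred):
    """Micro-averaged F1; equals accuracy when each document has one label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical gold and predicted categories for four news articles.
gold = ["cricket", "cricket", "hockey", "hockey"]
pred = ["cricket", "hockey", "hockey", "hockey"]
```

On this toy data micro F1 is 0.75 while macro F1 is lower, because the misclassified cricket article drags down that class's F1 and macro-averaging does not let the larger class compensate; the same effect explains why the paper reports the two averages separately.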

