JMIR Medical Informatics
Latest Publications


TOTAL DOCUMENTS: 978 (five years: 736)

H-INDEX: 26 (five years: 11)

Published by JMIR Publications Inc.

ISSN: 2291-9694

10.2196/25157 · 2022 · Vol 10 (1) · pp. e25157
Author(s): Zhen Yang, Chloé Pou-Prom, Ashley Jones, Michaelia Banning, David Dai, ...

Background: The Expanded Disability Status Scale (EDSS) score is a widely used measure for monitoring disability progression in people with multiple sclerosis (MS). However, extracting and deriving the EDSS score from unstructured electronic health records can be time-consuming.

Objective: We aimed to compare rule-based and deep learning natural language processing algorithms for detecting and predicting the total EDSS score and EDSS functional system subscores from the electronic health records of patients with MS.

Methods: We studied 17,452 electronic health records of 4906 patients with MS followed at one of Canada's largest MS clinics between June 2015 and July 2019. We randomly divided the records into training (80%) and test (20%) data sets and compared the performance characteristics of 3 natural language processing models. First, we applied a rule-based approach, extracting the EDSS score from sentences containing the keyword "EDSS." Next, we trained a convolutional neural network (CNN) model to predict the 19 half-step increments of the EDSS score. Finally, we used a combined rule-based-CNN model. For each approach, we determined the accuracy, precision, recall, and F-score against the reference standard, which consisted of manually labeled EDSS scores in the clinic database.

Results: Overall, the combined keyword-CNN model demonstrated the best performance, with an accuracy, precision, recall, and F-score of 0.90, 0.83, 0.83, and 0.83, respectively. The respective figures were 0.57, 0.91, 0.65, and 0.70 for the rule-based model and 0.86, 0.70, 0.70, and 0.70 for the CNN model. Because of missing data, model performance for the EDSS subscores was lower than that for the total EDSS score; performance improved when only notes with known values of the EDSS subscores were considered.

Conclusions: A combined keyword-CNN natural language processing model can extract and accurately predict EDSS scores from patient records. This approach can be automated for efficient information extraction in clinical and research settings.
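
As a rough, hypothetical illustration of the rule-based component described in the Methods, the sketch below pulls candidate EDSS values from sentences containing the keyword "EDSS" using a regular expression. The pattern and the extract_edss helper are simplifications for illustration, not the authors' implementation.

```python
import re

# Hypothetical sketch of a keyword-based EDSS extractor: find text near the
# keyword "EDSS" and capture a numeric value in half-step increments (0-10).
EDSS_PATTERN = re.compile(r"\bEDSS\b[^.\d]{0,40}?(\d{1,2}(?:\.[05])?)", re.IGNORECASE)

def extract_edss(note_text: str) -> list[float]:
    """Return candidate EDSS scores found near the keyword 'EDSS'."""
    scores = []
    for match in EDSS_PATTERN.finditer(note_text):
        value = float(match.group(1))
        if 0.0 <= value <= 10.0:  # EDSS is defined on 0-10 in half steps
            scores.append(value)
    return scores

print(extract_edss("Neuro exam stable. EDSS score today is 3.5, unchanged."))  # [3.5]
```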


10.2196/32724 · 2022 · Vol 10 (1) · pp. e32724
Author(s): Moritz Kraus, Maximilian Michael Saller, Sebastian Felix Baumbach, Carl Neuerburg, Ulla Cordula Stumpf, ...

Background: Assessment of physical frailty in older patients is of great importance across many medical disciplines so that individualized therapies can be implemented. In physical tests, time is usually the only objective measure. To capture additional objective factors, modern wearables offer great potential for generating valid data and integrating them into medical decision-making.

Objective: The aim of this study was to compare the predictive value of insole data collected during the Timed-Up-and-Go (TUG) test with the benchmark standard questionnaire for sarcopenia (SARC-F: strength, assistance with walking, rising from a chair, climbing stairs, and falls) and physical assessment (TUG test) for evaluating physical frailty, defined by the Short Physical Performance Battery (SPPB), using machine learning algorithms.

Methods: This cross-sectional study included patients aged >60 years with independent ambulation and no mental or neurological impairment. A comprehensive set of parameters associated with physical frailty was assessed, including body composition, questionnaires (European Quality of Life 5-Dimension [EQ-5D-5L], SARC-F), and physical performance tests (SPPB, TUG), along with digital sensor insole gait parameters collected during the TUG test. Physical frailty was defined as an SPPB score ≤8. Advanced statistics, including random forest (RF) feature selection and machine learning algorithms (k-nearest neighbor [KNN] and RF), were used to compare the diagnostic value of these parameters for identifying patients with physical frailty.

Results: Classified by the SPPB, 23 of the 57 eligible patients were identified as having physical frailty. Several gait parameters differed significantly between the two groups (with and without physical frailty). The area under the receiver operating characteristic curve (AUROC) of the TUG test was superior to that of the SARC-F (0.862 vs 0.639). The recursive feature elimination algorithm identified 9 parameters, 8 of which were digital insole gait parameters. Both the KNN and RF algorithms trained with these parameters yielded excellent results (AUROC of 0.801 and 0.919, respectively).

Conclusions: A gait analysis based on machine learning algorithms using sensor insoles is superior to the SARC-F and the TUG test for identifying physical frailty in orthogeriatric patients.
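
The following is a minimal sketch of the modeling approach outlined above (random forest-based recursive feature elimination followed by KNN and RF classifiers evaluated by AUROC), using scikit-learn on synthetic data; the variable names, split, and hyperparameters are assumptions, not the study's code.

```python
# Sketch: RF-based recursive feature elimination, then KNN and RF evaluated by AUROC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(57, 20))        # 57 patients, 20 candidate parameters (synthetic)
y = rng.integers(0, 2, size=57)      # 1 = physical frailty (SPPB score <=8)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Select 9 features with random forest-based recursive feature elimination
selector = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
               n_features_to_select=9)
selector.fit(X_train, y_train)

for name, model in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                    ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    model.fit(selector.transform(X_train), y_train)
    scores = model.predict_proba(selector.transform(X_test))[:, 1]
    print(name, "AUROC:", round(roc_auc_score(y_test, scores), 3))
```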


10.2196/27386 · 2021 · Vol 9 (12) · pp. e27386
Author(s): Qingyu Chen, Alex Rankine, Yifan Peng, Elaheh Aghaarabi, Zhiyong Lu

Background: Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and issued a call for participation through the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 on the official test set during the 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and ranked second.

Objective: Although our models correlate strongly with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious about the potential use of DL models in production systems and argue that it is more critical to evaluate the models in depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models, quantifying their robustness and inference times to validate their usefulness in real-time applications.

Methods: We benchmarked 5 DL models that are top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times using the official training and testing sets. We reported the 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (the official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures.

Results: Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). BioSentVec also had the highest results in 3 of the 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varied significantly. In particular, the BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs; they cannot capture highly similar sentence pairs effectively when the pairs have different negation terms or word orders. In addition, time efficiency differed dramatically from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively, which poses challenges for real-time applications.

Conclusions: Despite the excitement of further improving Pearson correlations on this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In the future, we suggest more evaluations of the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence relatedness.
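
Below is a minimal sketch of the evaluation protocol described above: Pearson and Spearman correlations, R², and mean squared error for predicted versus gold similarity scores, plus a Wilcoxon rank-sum test comparing two systems across repeated runs. The data are synthetic and the benchmarked models themselves are not reproduced.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, ranksums
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)
gold = rng.uniform(0, 5, size=400)          # annotated similarity scores (0-5 scale)
pred = gold + rng.normal(0, 0.6, size=400)  # one model's predictions (synthetic)

print("Pearson r:", round(pearsonr(gold, pred)[0], 4))
print("Spearman rho:", round(spearmanr(gold, pred)[0], 4))
print("R^2:", round(r2_score(gold, pred), 4))
print("MSE:", round(mean_squared_error(gold, pred), 4))

# Compare per-run Pearson correlations of two systems across 10 repetitions
model_a_runs = rng.normal(0.85, 0.01, size=10)
model_b_runs = rng.normal(0.83, 0.01, size=10)
stat, p = ranksums(model_a_runs, model_b_runs)
print("Wilcoxon rank-sum p-value:", round(p, 4))
```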


10.2196/28632 · 2021 · Vol 9 (12) · pp. e28632
Author(s): Daphne Chopard, Matthias S Treder, Padraig Corcoran, Nagheen Ahmed, Claire Johnson, ...

Background: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in identifying previously unrecognized adverse events or changes in the patterns of adverse events.

Objective: This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of serious adverse event report forms to enable statistical analysis of the aforementioned patterns.

Methods: We used the Unified Medical Language System (UMLS) as the coding scheme; it integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases, 10th Revision, the Medical Dictionary for Regulatory Activities, and the Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify mentions of UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of UMLS concepts that represented adverse events and those that did not.

Results: The model achieved a high F1 score of 0.8080 despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach.

Conclusions: These results confirm that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.
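
The snippet below is a hedged sketch of the mention-classification step: a BERT-based binary classifier scores a candidate concept mention together with its sentence context as an adverse event or not. The checkpoint name (bert-base-uncased), the sentence-pair input format, and the example text are illustrative assumptions, not the authors' exact configuration, and the untrained classification head here produces arbitrary probabilities.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumption; any BERT-style checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

sentence = "Patient developed severe nausea two days after the second dose."
mention = "nausea"  # candidate concept found by dictionary lookup (e.g., MetaMap)

# Encode the mention together with its sentence context as a sentence pair
inputs = tokenizer(mention, sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print("P(adverse event):", round(probs[0, 1].item(), 3))  # illustrative only (untrained head)
```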


10.2196/29212 · 2021 · Vol 9 (12) · pp. e29212
Author(s): Dohyun Park, Soo Jin Cho, Kyunga Kim, Hyunki Woo, Jee Eun Kim, ...

Background: Pulse transit time and pulse wave velocity (PWV) are related to blood pressure (BP), and there have been continuous attempts to use them to predict BP through wearable devices. However, previous studies were conducted at a small scale and could not confirm the relative importance of each variable in predicting BP.

Objective: This study aims to predict systolic and diastolic blood pressure based on PWV and to evaluate the relative importance of each clinical variable used in the BP prediction models.

Methods: This study was conducted on 1362 healthy men older than 18 years who visited the Samsung Medical Center. Systolic and diastolic blood pressure were estimated using multiple linear regression. Models were divided into two groups based on age (younger than 60 years vs 60 years or older), and the analysis was repeated over 200 random seeds to account for partition bias. Mean error, mean absolute error, and root mean square error were used as performance metrics.

Results: The model divided into two age groups (younger than 60 years and 60 years or older) performed better than the model without this division. The performance difference between the model using only three variables (PWV, BMI, and age) and the model using 17 variables was not significant. Our final model using PWV, BMI, and age met the criteria presented by the Association for the Advancement of Medical Instrumentation. The prediction errors were within the range of approximately 9 to 12 mmHg that can occur with a gold standard mercury sphygmomanometer.

Conclusions: Dividing the models at the age of 60 years improved BP prediction performance, and good performance was achieved even when only the PWV, BMI, and age variables were included. Our final model with this minimal set of variables (PWV, BMI, and age) would be efficient and feasible for predicting BP.
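
Below is a minimal sketch of the regression setup described above: separate multiple linear regression models for patients younger than 60 years and those 60 years or older, using PWV, BMI, and age to predict systolic BP, with mean error, mean absolute error, and root mean square error reported. The data frame, column names, and coefficients are synthetic assumptions, not the study's data or code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
n = 1362
df = pd.DataFrame({
    "age": rng.integers(18, 85, n),
    "bmi": rng.normal(24, 3, n),
    "pwv": rng.normal(8, 2, n),
})
# Synthetic systolic BP loosely driven by age, BMI, and PWV plus noise
df["sbp"] = 90 + 0.4 * df["age"] + 1.2 * df["bmi"] + 3.0 * df["pwv"] + rng.normal(0, 9, n)

for is_older, group in df.groupby(df["age"] >= 60):
    X, y = group[["pwv", "bmi", "age"]], group["sbp"]
    pred = LinearRegression().fit(X, y).predict(X)
    print("age >= 60:" if is_older else "age < 60:",
          "mean error", round(float(np.mean(y - pred)), 2),
          "| MAE", round(mean_absolute_error(y, pred), 2),
          "| RMSE", round(float(np.sqrt(mean_squared_error(y, pred))), 2))
```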


10.2196/25022 · 2021 · Vol 9 (12) · pp. e25022
Author(s): Janmajay Singh, Masahiro Sato, Tomoko Ohkuma

Background: Missing data in electronic health records are inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient's health and advocate for their inclusion in clinical prediction models, but their effectiveness has not been comprehensively evaluated.

Objective: The goal of this research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and to explore the robustness of these features across patient subgroups and task settings.

Methods: A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis were chosen as outcomes. The latter data set was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and to classify or predict the labels of interest. Models were evaluated on various criteria and across population subgroups, assessing discriminative ability and calibration.

Results: Including missingness features generally improved model performance in retrospective tasks. The extent of improvement depended on the outcome of interest (the area under the receiver operating characteristic curve [AUROC] improved by 1.2% to 7.7%) and even on the patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (by a 0.9% difference in AUROC) by the model relying only on pathological features. This was despite the missingness features leading to earlier detection of disease (true positives), because including them also led to a concomitant rise in false positive detections.

Conclusions: This study comprehensively evaluated the effectiveness of missingness features in machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks such as length of stay prediction, where they present the greatest benefit. While missingness features, which are representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.
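
The sketch below illustrates the core idea of missingness features in a hedged, simplified form: binary indicators marking which measurements were observed are concatenated with the (zero-imputed) values and fed to a GRU classifier. The tensor shapes, imputation choice, and model size are assumptions, not the study's architecture.

```python
import torch
import torch.nn as nn

batch, steps, n_features = 8, 48, 12            # e.g., 48 hourly steps, 12 vitals/labs
raw = torch.randn(batch, steps, n_features)
raw[torch.rand_like(raw) < 0.4] = float("nan")  # simulate ~40% missing measurements

mask = (~torch.isnan(raw)).float()              # 1 = observed, 0 = missing ("missingness" features)
values = torch.nan_to_num(raw, nan=0.0)         # simple zero imputation for the sketch
x = torch.cat([values, mask], dim=-1)           # observed values + missingness indicators

class GRUClassifier(nn.Module):
    def __init__(self, input_size: int, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        _, h_n = self.gru(seq)                     # final hidden state summarizes the sequence
        return torch.sigmoid(self.head(h_n[-1]))   # probability of the outcome (e.g., mortality)

model = GRUClassifier(input_size=2 * n_features)
print(model(x).shape)                            # torch.Size([8, 1])
```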


10.2196/27072 · 2021 · Vol 9 (12) · pp. e27072
Author(s): Frederick North, Elissa M Nelson, Rebecca J Buss, Rebecca J Majerus, Matthew C Thompson, ...

Background: Screening mammography is recommended for the early detection of breast cancer. The processes for ordering screening mammography often rely on a health care provider order and a scheduler to arrange the time and location of breast imaging. Self-scheduling after automated ordering of screening mammograms may offer a more efficient and convenient way to schedule screening mammograms.

Objective: The aim of this study was to determine the use, outcomes, and efficiency of an automated mammogram ordering and invitation process paired with self-scheduling.

Methods: We examined appointment data from 12 months of scheduled mammogram appointments, starting in September 2019, when a web and mobile app self-scheduling process for screening mammograms was made available for the Mayo Clinic primary care practice. Patients registered to the Mayo Clinic Patient Online Services could view the schedules and book their mammogram appointment via the web or a mobile app. Self-scheduling required no telephone calls or staff appointment schedulers. We examined uptake (count and percentage of patients using self-scheduling), the number of appointment actions taken by self-schedulers and by those using staff schedulers, no-show outcomes, scheduling efficiency, and weekend and after-hours use of self-scheduling.

Results: Of the patients who were registered to Patient Online Services and had screening mammogram appointment activity, 15.3% (14,387/93,901) used the web or mobile app for at least some mammogram self-scheduling or self-cancelling appointment actions. Approximately 24.4% (3285/13,454) of self-scheduling occurred after normal business hours or on weekends, and approximately 9.3% (8736/93,901) of the patients used self-scheduling/cancelling exclusively. Self-scheduled mammograms had 5.7% (536/9433) no-shows compared with 4.6% (3590/77,531) for staff-scheduled mammograms (unadjusted odds ratio 1.24, 95% CI 1.13-1.36; P<.001). The odds ratio of no-shows for self-scheduled versus staff-scheduled mammograms decreased to 1.12 (95% CI 1.02-1.23; P=.02) when adjusted for age, race, and ethnicity. On average, there were only 0.197 staff-scheduler actions for each finalized self-scheduled appointment, so staff schedulers were rarely needed to redo or "clean up" self-scheduled appointments. Exclusively self-scheduled appointments were significantly more efficient than staff-scheduled appointments: self-schedulers experienced a single-step appointment process ("one and done") for 93.5% (7553/8079) of their finalized appointments, whereas only 74.5% (52,804/70,839) of staff-scheduled finalized appointments had a similar one-step process (P<.001). For staff-scheduled appointments, 25.5% (18,035/70,839) of the finalized appointments took multiple appointment steps; for finalized appointments that were exclusively self-scheduled, only 6.5% (526/8079) took multiple steps. The staff-scheduled to self-scheduled odds ratio of taking multiple steps for a finalized screening mammogram appointment was 4.9 (95% CI 4.48-5.37; P<.001).

Conclusions: Screening mammograms can be efficiently self-scheduled but may be associated with a slight increase in no-shows. Self-scheduling can decrease staff scheduler work and can be convenient for patients who want to manage their appointment scheduling activity after business hours or on weekends.
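
As a worked check of the unadjusted odds ratio reported above, the snippet below recomputes it from the published counts (536/9433 self-scheduled vs 3590/77,531 staff-scheduled no-shows) using the standard Wald confidence interval on the log odds ratio; this is not the authors' code.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a/b = events/non-events in group 1, c/d = events/non-events in group 2."""
    or_ = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)          # SE of the log odds ratio
    lo, hi = (math.exp(math.log(or_) + s * z * se) for s in (-1, 1))
    return or_, lo, hi

self_noshow, self_show = 536, 9433 - 536
staff_noshow, staff_show = 3590, 77531 - 3590
or_, lo, hi = odds_ratio_ci(self_noshow, self_show, staff_noshow, staff_show)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # ~ OR 1.24 (95% CI 1.13-1.36)
```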


10.2196/29768 · 2021 · Vol 9 (11) · pp. e29768
Author(s): Vishal Dey, Peter Krasniak, Minh Nguyen, Clara Lee, Xia Ning

Background: A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature.

Objective: The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII.

Methods: We constructed a pipeline for social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites.

Results: Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. It also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants.

Conclusions: Our study could inspire future studies on the suggested symptoms and factors of BII. It provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.
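
A hedged sketch of the topic-modeling step described above: bags of medical concept mentions (toy strings standing in for cTAKES output mapped to standard concepts) are summarized into topics with latent Dirichlet allocation. The example posts, vectorizer settings, and number of topics are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each "document" is the bag of medical concepts mentioned in one post (illustrative data)
posts_as_concepts = [
    "fatigue joint_pain rupture silicone_toxicity",
    "anxiety depression brain_fog fatigue",
    "capsular_contracture infection pain rupture",
    "autoimmune_disorder fatigue hair_loss joint_pain",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(posts_as_concepts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top concepts per topic
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {topic_idx}:", ", ".join(top))
```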


10.2196/30308 · 2021 · Vol 9 (11) · pp. e30308
Author(s): Mark R Stöhr, Andreas Günther, Raphael W Majeed

Background: In the field of medicine and medical informatics, the importance of comprehensive metadata has long been recognized, and the composition of metadata has become its own field of profession and research. To ensure that sustainable and meaningful metadata are maintained, standards and guidelines such as the FAIR (Findability, Accessibility, Interoperability, Reusability) principles have been published. The compilation and maintenance of metadata are performed by field experts supported by metadata management apps. The usability of these apps, for example, in terms of ease of use, efficiency, and error tolerance, crucially determines their benefit to those interested in the data.

Objective: This study aims to provide a metadata management app with high usability that assists scientists in compiling and using rich metadata. We aim to evaluate our recently developed interactive web app for our collaborative metadata repository (CoMetaR). This study reflects how real users perceive the app by assessing usability scores and explicit usability issues.

Methods: We evaluated the CoMetaR web app by measuring the usability of 3 modules: the core module, the provenance module, and the data integration module. We defined 10 tasks in which users had to acquire information specific to their user role. The participants were asked to complete the tasks in a live web meeting. We used the System Usability Scale questionnaire to measure the usability of the app. For the qualitative analysis, we applied a modified think-aloud method with subsequent thematic analysis and categorization into the ISO 9241-110 usability categories.

Results: A total of 12 individuals participated in the study. We found that over 97% (85/88) of all the tasks were completed successfully. We measured usability scores of 81, 81, and 72 for the 3 evaluated modules. The qualitative analysis yielded 24 issues with the app.

Conclusions: A usability score of 81 implies very good usability for the first 2 modules, whereas a score of 72 still indicates acceptable usability for the third module. We identified 24 issues that serve as starting points for further development. Our method proved to be effective and efficient in terms of effort and outcome, and it can be adapted to evaluate apps within the medical informatics field and potentially beyond.
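
For readers unfamiliar with the System Usability Scale used above, the snippet below shows the standard SUS scoring rule applied to one hypothetical participant's 10 item responses; the response values are made up, and this is not the study's analysis code.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring: odd items contribute (response - 1), even items (5 - response),
    and the 0-40 sum is scaled to 0-100 by multiplying by 2.5."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,7,9 are positively worded
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

print(sus_score([5, 2, 4, 1, 5, 2, 4, 2, 4, 1]))  # 85.0 for this hypothetical participant
```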


10.2196/34529 · 2021 · Vol 9 (11) · pp. e34529
Author(s): Artin Entezarjou, Susanna Calling, Tapomita Bhattacharyya, Veronica Milos Nymberg, Lina Vigren, ...

