Text Mining and Machine Learning Algorithms to Identifying Diseases and Providing Repair Action Using ICD-10 Codes

Author(s):
Ashish P. Ramdasi, S. Sathyalakshmi

2020, Vol 9 (1)
Author(s):
E. Popoff, M. Besada, J. P. Jansen, S. Cope, S. Kanters

Abstract

Background: Despite existing research on text mining and machine learning for title and abstract screening, the role of machine learning within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given the lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion (aligning with current practices), and to determine optimal parameter settings for feature-set generation and ML algorithms.

Methods: We used abstract and full-text selection data from five large SLRs (n = 3089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to categorize each citation as included, excluded for a specific PICOS criterion, or unclassified. We used a bag-of-words approach for feature-set generation and compared machine learning algorithms using support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) for classification. We also compared alternative training-set strategies: using the full data versus downsampling (i.e., reducing excludes to balance includes and excludes, because machine learning algorithms perform better with balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance was compared in terms of specificity, sensitivity, accuracy, and matching the reason for exclusion.

Results: The best-fitting model (optimized for sensitivity and specificity) was based on the SVM algorithm, using training data based on full-text decisions, downsampling, and exclusion of words occurring fewer than five times. The sensitivity and specificity of this model ranged from 94 to 100% and 54 to 89%, respectively, across the five SLRs. On average, 75% of excluded citations were excluded with a reason, and 83% of these matched the reviewers' original reason for exclusion. Sensitivity improved significantly when both downsampling and abstract decisions were used.

Conclusions: ML algorithms can improve the efficiency of the SLR process, and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and the improvements from using full-text exclusions have implications for a learn-as-you-go approach.
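The paper does not include code, but the pipeline it describes maps naturally onto standard tooling. The sketch below is a minimal, illustrative reconstruction using scikit-learn: bag-of-words features, downsampling of excluded citations, and a linear SVM that assigns an include label or an exclude label carrying a PICOS reason. The citations, labels, and the relaxed min_df setting are assumptions for the toy example, not the authors' data or configuration.

```python
# Minimal sketch of the screening pipeline described above (not the authors'
# code): bag-of-words features, downsampled training data, and a linear SVM
# that labels citations "include" or "exclude:<PICOS reason>".
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def downsample(texts, labels, seed=0):
    """Drop excludes at random until includes/excludes are balanced."""
    rng = np.random.default_rng(seed)
    labels = np.array(labels)
    inc = np.flatnonzero(labels == "include")
    exc = np.flatnonzero(labels != "include")
    keep = np.concatenate(
        [inc, rng.choice(exc, size=min(len(inc), len(exc)), replace=False)])
    return [texts[i] for i in keep], labels[keep].tolist()

# Hypothetical citations labeled from full-text screening decisions.
train_texts = [
    "randomised controlled trial of drug X in adults with condition Y",
    "case report describing a single patient treated with drug X",
    "open-label extension study of drug X safety in adults",
    "narrative review of treatment options for condition Y",
]
train_labels = ["include", "exclude:study design",
                "include", "exclude:study design"]

texts, labels = downsample(train_texts, train_labels)
# The paper drops words occurring fewer than five times (min_df=5);
# min_df=1 is used here only so the toy corpus is not emptied.
vec = CountVectorizer(min_df=1)
clf = LinearSVC().fit(vec.fit_transform(texts), labels)

print(clf.predict(vec.transform(["phase III trial of drug X in adults"])))
```

In the authors' setup, citations the model cannot confidently place remain unclassified for manual review; thresholding the SVM's decision_function margin would be one way to implement that third category.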


2021
Author(s):
Burak Kolukisa, Bilge Kagan Dedeturk, Beyhan Adanur Dedeturk, Abdulkadir Gulsen, Gokhan Bakal

Author(s):
Durmuş Özkan Şahin, Erdal Kılıç

In this study, the authors provide both theoretical and experimental background on text mining, one of the topics of natural language processing. Three text classification problems are addressed for Turkish: news classification, sentiment analysis, and author recognition. The aims are to reduce running time and to improve the performance of machine learning algorithms. Four machine learning algorithms and two feature selection metrics are used to solve these problems. The classification algorithms are random forest (RF), logistic regression (LR), naive Bayes (NB), and sequential minimal optimization (SMO); chi-square and information gain are used as the feature selection metrics. The highest classification performance achieved in this study is an F-measure of 0.895, obtained with the SMO classifier and the information gain metric on the news classification task. The study is notable for its comparison of the performance of classification algorithms and feature selection methods.
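As a rough illustration of the approach described above (not the authors' implementation), the following scikit-learn sketch applies chi-square and an information-gain-style metric (mutual information) for feature selection before training an SVM; libsvm's SVC uses an SMO-type solver, standing in for the SMO classifier named in the study. The Turkish documents and labels are invented placeholders.

```python
# Illustrative sketch: feature selection (chi-square vs. mutual information)
# followed by SVM text classification. Corpus and labels are toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.svm import SVC  # libsvm's SVC is trained with an SMO-type solver

docs = ["ekonomi haberleri borsa yükseldi",
        "takım maçı kazandı gol attı",
        "merkez bankası faiz kararı açıkladı",
        "futbol ligi şampiyonu belli oldu"]
labels = ["economy", "sports", "economy", "sports"]

X = CountVectorizer().fit_transform(docs)

# Compare the two feature selection metrics on the same classifier.
for name, score_fn in [("chi-square", chi2),
                       ("information gain", mutual_info_classif)]:
    X_sel = SelectKBest(score_fn, k=5).fit_transform(X, labels)
    clf = SVC(kernel="linear").fit(X_sel, labels)
    print(name, clf.score(X_sel, labels))
```

Reducing the feature set this way is what drives the running-time gains the study reports: the classifier sees only the k most informative terms rather than the full vocabulary.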


Author(s):  
Hadj Ahmed Bouarara

A recent British study of people aged 14 to 35 found that social media has a negative impact on mental health. The purpose of this paper is to detect the behaviour of people with mental disorders on social media, in order to help Twitter users overcome mental health problems such as anxiety, phobia, depression, and paranoia. To this end, the author applied text mining and machine learning algorithms (naïve Bayes, k-nearest neighbours) to analyse tweets. The results were validated using evaluation measures such as F-measure, recall, precision, and entropy.
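A minimal sketch of this kind of analysis, assuming scikit-learn; the tweets and labels below are invented for illustration and are not the paper's data.

```python
# Illustrative sketch: classifying tweets with naive Bayes and k-nearest
# neighbours, reporting precision, recall, and F-measure. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

tweets = ["i feel hopeless and cannot sleep at night",
          "great day at the beach with friends",
          "everything feels pointless lately",
          "excited about the new movie release"]
labels = ["at-risk", "healthy", "at-risk", "healthy"]

X = TfidfVectorizer().fit_transform(tweets)
for clf in [MultinomialNB(), KNeighborsClassifier(n_neighbors=1)]:
    preds = clf.fit(X, labels).predict(X)
    print(type(clf).__name__)
    print(classification_report(labels, preds))  # precision, recall, F-measure
```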


2020, Vol 24 (5), pp. 300-312
Author(s):
Jian-qiang Guo, Shu-hen Chiang, Min Liu, Chi-Chun Yang, Kai-yi Guo

Housing frenzies in China have attracted widespread global attention over the past few years, and the key question is how to forecast housing prices more accurately in order to establish effective real estate policy. Exploiting the ubiquity and immediacy of Internet data, this research adopts a broad form of text mining to search for keywords related to housing prices and then evaluates their predictive ability using machine learning algorithms. Our findings indicate that this new method, especially with random forest, not only detects turning points but also offers predictive accuracy that clearly outperforms traditional regression analysis. Overall, prediction based on online search data through a machine learning mechanism helps us better understand the trend of house prices in China.
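As a hedged sketch of the comparison the authors report (not their actual model or data), the code below forecasts a synthetic house-price index from keyword search volumes with a random forest and an ordinary linear regression, comparing out-of-sample error while preserving time order. All inputs are placeholders.

```python
# Illustrative sketch: random forest vs. linear regression for forecasting
# a house-price index from keyword search volumes. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Rows = months; columns = search volumes for housing-related keywords
# (e.g. "mortgage rate", "home purchase"); target is a price index.
X = rng.uniform(0, 100, size=(120, 5))
y = (50 + 0.4 * X[:, 0] - 0.2 * X[:, 1]
     + 5 * np.sin(X[:, 2] / 10) + rng.normal(0, 2, 120))

train, test = slice(0, 96), slice(96, 120)  # keep time order for forecasting
for model in [LinearRegression(),
              RandomForestRegressor(n_estimators=200, random_state=0)]:
    model.fit(X[train], y[train])
    err = mean_absolute_error(y[test], model.predict(X[test]))
    print(type(model).__name__, round(err, 2))
```

The random forest's advantage in this setting comes from capturing nonlinear responses to search activity, which is consistent with its reported ability to detect turning points that a linear regression smooths over.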

