A Tree Based Approach for Multi-class Classification of Surgical Procedures Using Structured and Unstructured Data


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Tannaz Khaleghi ◽  
Alper Murat ◽  
Suzan Arslanturk

Abstract Background In surgical departments, CPT code assignment has been a complicated manual effort that requires significant domain knowledge and experience. While several studies use CPTs to make predictions in surgical services, the literature on predicting CPTs in surgical and other services using text features is sparse. This study improves the prediction of CPTs by means of informative features and a novel re-prioritization algorithm. Methods The input data used in this study are composed of both structured and unstructured data. The ground-truth labels (CPTs) are obtained from medical coding databases using relative value units, which indicate the major operational procedure in each surgery case. In the modeling process, we first utilize a Random Forest multi-class classification model to predict the CPT codes. Second, we extract key information such as label probabilities, feature importance measures, and medical term frequency. These factors are then used in a novel algorithm to rearrange the alternative CPT codes in the list of potential candidates based on the calculated weights. Results To evaluate the performance of both phases, prediction and complementary improvement, we report the accuracy scores of multi-class CPT prediction for datasets of 5 key surgery specialties. The Random Forest model performs the classification task with 74–76% accuracy when predicting the primary CPT (accuracy@1) versus the CPT set (accuracy@2) under two filtering conditions on CPT codes. The complementary algorithm improves the results of the initial step by 8% on average. Furthermore, the incorporated text features enhance the quality of the output by 20–35%. The model outperforms a state-of-the-art neural network model with respect to accuracy, precision, and recall. Conclusions We have established a robust framework based on a decision tree predictive model. We predict the surgical codes more accurately and robustly than state-of-the-art deep neural architectures, which can help immensely with both surgery billing and scheduling in such units.
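The two-phase approach described above can be sketched as follows: a Random Forest produces CPT-label probabilities, which are then re-ranked by per-class weights. The data, the four-class setup, and the weight vector are synthetic stand-ins for the paper's CPT datasets and its feature-importance/medical-term-frequency factors.

```python
# Phase 1: Random Forest predicts class probabilities for each surgery case.
# Phase 2: candidates are re-prioritized by combining those probabilities
# with per-class weights (stand-ins for the paper's re-ranking factors).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # structured + text-derived features (synthetic)
y = rng.integers(0, 4, size=200)     # 4 hypothetical CPT classes

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:1])[0]  # label probabilities for one case

term_freq_weight = np.array([1.0, 1.2, 0.8, 1.0])  # assumed per-class weights
rescored = proba * term_freq_weight
ranked = np.argsort(rescored)[::-1]  # candidate CPT codes, best first
```

The re-ranking step is intentionally simple here; the paper's algorithm combines several weighting factors rather than a single vector.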


Crime rates are increasing over the years, and tracking committed crimes remains a great challenge for the government. Each area exhibits a pattern in the types of crime that occur, and knowledge of these patterns is essential for prevention. Crimes occur in sequences that leave hidden patterns, so the crime data must be processed to uncover them; this project finds those patterns and insights in the data. Because the data is largely unstructured, it is preprocessed before prediction. Crime records collected from a particular area (Indore in our case) are used to make predictions with multi-class classification algorithms such as random forest, the type of crime likely to occur next in an area is predicted, and visualizations are produced accordingly. Several classification algorithms, including support vector machines, decision trees, and random forest, are used, and random forest shows the best accuracy. The input and output features are selected by visualizing the data with graphs and plots.
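The classifier comparison described above can be sketched like this; synthetic data stands in for the Indore crime records, which are not available here.

```python
# Fit SVM, decision tree, and random forest on the same data and compare
# held-out accuracy, mirroring the comparison reported in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)  # stand-in for crime records
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
          for name, model in [("svm", SVC()),
                              ("tree", DecisionTreeClassifier(random_state=0)),
                              ("forest", RandomForestClassifier(random_state=0))]}
```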


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4625
Author(s):  
Brian Reilly ◽  
Oliver Morgan ◽  
Gabriela Czanner ◽  
Mark A. Robinson

Changes of direction (COD) are an important aspect of soccer match play. Understanding the physiological and biomechanical demands on players in games allows sports scientists to effectively train and rehabilitate soccer players. COD are conventionally recorded using manually annotated time-motion video analysis, which is highly time consuming, so more time-efficient approaches are required. The aim was to develop an automated classification model based on multi-sensor player tracking device data to detect COD > 45°. Video analysis data and individual multi-sensor player tracking data (GPS, accelerometer, gyroscopic) for 23 academy-level soccer players were used. A novel ‘GPS-COD Angle’ variable was developed and used in model training, along with 24 GPS-derived, gyroscope and accelerometer variables. Video annotation was the ground truth indicator of occurrence of COD > 45°. The random forest classifier using the full set of features demonstrated the highest accuracy (AUROC = 0.957, 95% CI = 0.956–0.958, Sensitivity = 0.941, Specificity = 0.772). To balance sensitivity and specificity, model parameters were optimised, resulting in a value of 0.889 for both metrics. Similarly high levels of accuracy were observed for random forest models trained using a reduced set of features, accelerometer-derived variables only, and gyroscope-derived variables only. These results point to the potential effectiveness of the novel methodology in automatically identifying COD in soccer players.
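One common way to balance sensitivity and specificity, as the abstract describes, is to sweep the decision threshold on the classifier's predicted probabilities. The sketch below uses synthetic data; the paper tuned model parameters rather than necessarily this exact threshold sweep.

```python
# Sweep the probability threshold of a random-forest classifier and pick
# the point where sensitivity and specificity are closest to each other.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=1)
clf = RandomForestClassifier(random_state=1).fit(X, y)
p = clf.predict_proba(X)[:, 1]  # probability of the positive class (a COD event)

def sens_spec(threshold):
    pred = (p >= threshold).astype(int)
    sens = ((pred == 1) & (y == 1)).sum() / (y == 1).sum()
    spec = ((pred == 0) & (y == 0)).sum() / (y == 0).sum()
    return sens, spec

# threshold minimizing the sensitivity/specificity gap
best = min(np.linspace(0.05, 0.95, 19),
           key=lambda t: abs(np.subtract(*sens_spec(t))))
```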


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on several aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be maintained in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved not only through the application of the proposed algorithm but also through the data-preprocessing approach. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were retransformed by principal component analysis and passed through the CFsSubset filter again. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
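The preprocessing chain described above (k-means transform, correlation-based filtering, then PCA) can be sketched as a scikit-learn pipeline. Weka's CFsSubset filter has no direct scikit-learn equivalent, so `SelectKBest` with an ANOVA score is used as a stand-in, and the data is synthetic.

```python
# k-means transform (distances to centroids) -> feature filter -> PCA ->
# random forest, roughly mirroring the abstract's preprocessing chain.
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=120, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)  # 4 oil groups

pipe = make_pipeline(
    KMeans(n_clusters=8, n_init=10, random_state=0),  # distances to 8 centroids
    SelectKBest(f_classif, k=5),                      # CFsSubset stand-in
    PCA(n_components=3),
    RandomForestClassifier(random_state=0),
)
accuracy = pipe.fit(X, y).score(X, y)
```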


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of arecanut planting area and the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were determined as 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization was improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification model. The kappa coefficient also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. 
Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
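The feature-space optimization step can be sketched as follows: rank the initial spectral/textural features by random-forest importance, keep the top ones, and retrain on the reduced space. The data is synthetic and the number of retained features is arbitrary.

```python
# Use random-forest feature importances to prune the initial feature space,
# then retrain a classifier on the reduced set of features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           random_state=0)  # stand-in spectral/textural features
ranker = RandomForestClassifier(random_state=0).fit(X, y)
top = np.argsort(ranker.feature_importances_)[::-1][:4]  # keep 4 best features

optimized = RandomForestClassifier(random_state=0).fit(X[:, top], y)
accuracy = optimized.score(X[:, top], y)
```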


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3553
Author(s):  
Jeremy Watts ◽  
Anahita Khojandi ◽  
Rama Vasudevan ◽  
Fatta B. Nahab ◽  
Ritesh A. Ramdhani

Parkinson’s disease medication treatment planning is generally based on subjective data obtained through clinical, physician-patient interactions. The Personal KinetiGraph™ (PKG) and similar wearable sensors have shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to cluster patients based on levodopa regimens and response. The resulting clusters are then used to enhance treatment planning by providing improved initial treatment estimates to supplement a physician’s initial assessment. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes—clinically assessed by the MDS-Unified Parkinson’s Disease Rating Scale-III (MDS-UPDRS-III) and the PKG sensor for movement staging. A random forest classification model was then used to predict patients’ cluster allocation based on their respective demographic information, MDS-UPDRS-III scores, and PKG time-series data. Clinically relevant clusters were partitioned by levodopa dose, medication administration frequency, and total levodopa equivalent daily dose—with the PKG providing similar symptomatic assessments to physician MDS-UPDRS-III scores. A random forest classifier trained on demographic information, MDS-UPDRS-III scores, and PKG time-series data was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 86.9%, an F1 score of 90.7%, and an AUC of 0.871. A model that relied solely on demographic information and PKG time-series data provided the next best performance with an accuracy of 83.8%, an F1 score of 88.5%, and an AUC of 0.831, hence further enabling fully remote assessments. 
These computational methods demonstrate the feasibility of using sensor-based data to cluster patients based on their medication responses with further potential to assist with medication recommendations.
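The two-stage idea above, clustering patients by medication response and then predicting cluster membership from demographic and sensor features, can be sketched as follows. All data here is synthetic; the real study used MDS-UPDRS-III scores and PKG time series.

```python
# Stage 1: k-means clusters patients by (stand-in) levodopa-response features.
# Stage 2: a random forest predicts cluster membership from other features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
response = rng.normal(size=(60, 3))  # stand-in levodopa-response features
features = np.hstack([response + rng.normal(scale=0.1, size=(60, 3)),
                      rng.normal(size=(60, 4))])  # demographics + sensor series

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(response)
acc = cross_val_score(RandomForestClassifier(random_state=0),
                      features, clusters, cv=3).mean()
```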


2021 ◽  
Vol 11 (13) ◽  
pp. 6237
Author(s):  
Azharul Islam ◽  
KyungHi Chang

Unstructured data from the internet constitute large sources of information, which need to be formatted in a user-friendly way. This research develops a model that classifies unstructured data from data mining into labeled data, and builds an informational and decision-making support system (DMSS). We often have assortments of information collected by mining data from various sources, where the key challenge is to extract valuable information. We observe substantial classification accuracy enhancement for our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a Covid corpus processed using a long short-term memory (LSTM) network. Furthermore, we conducted tests on large datasets relevant to the Disaster corpus, with an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, provides a reasonable 84% accuracy. The main objective of this research is to increase the application's robustness by integrating intelligence into the developed DMSS, which provides insight into the user's intent despite dealing with a noisy dataset. Our designed model compares the random forest and stochastic gradient descent (SGD) algorithms by F1 score, where the RF method outperforms SGD, improving accuracy by 2% (from 81% to 83%) compared with the conventional method.
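The RF-versus-SGD comparison can be sketched as follows; a tiny toy corpus stands in for the Covid and Disaster datasets, and TF-IDF is an assumed featurization (the abstract does not specify one).

```python
# TF-IDF features feed a random forest and an SGD classifier; both are
# scored by F1, mirroring the comparison described in the abstract.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

texts = ["storm flood damage", "flood rescue effort", "market stocks rise",
         "stocks fall sharply", "flood warning issued", "stocks rally again"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = disaster-related, 0 = not

X = TfidfVectorizer().fit_transform(texts)
f1 = {name: f1_score(labels, model.fit(X, labels).predict(X))
      for name, model in [("rf", RandomForestClassifier(random_state=0)),
                          ("sgd", SGDClassifier(random_state=0))]}
```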


2021 ◽  
Author(s):  
Jeremy Watts ◽  
Anahita Khojandi ◽  
Rama Vasudevan ◽  
Fatta B. Nahab ◽  
Ritesh Ramdhani

Abstract Parkinson’s disease (PD) medication treatment planning is generally based on subjective data gathered through in-office, physician-patient interactions. The Personal KinetiGraph™ (PKG) has shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to subtype patients based on levodopa regimens and response. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes—clinically assessed by the PKG and Hoehn & Yahr (H&Y) staging. A random forest classification model was then used to predict patients’ cluster allocation based on their respective PKG data and demographic information. Clinically relevant clusters were developed based on longitudinal dopaminergic regimens—partitioned by levodopa dose, administration frequency, and total levodopa equivalent daily dose—with the PKG increasing cluster granularity compared to the H&Y staging. A random forest classifier was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 87.9 ± 1.3%.


Author(s):  
Ming Hao ◽  
Weijing Wang ◽  
Fang Zhou

Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts remain ambiguous and hard to classify in multi-class settings, especially short texts whose context length is limited. The mainstream method improves the distinction of ambiguous text by adding context information. However, these methods rely only on the text representation and ignore that the categories overlap and are not completely independent of each other. In this paper, we establish a new general method to solve the problem of ambiguous text classification by introducing label embeddings to represent each category, which makes the difference between categories measurable. Further, a new compositional loss function is proposed to train the model, which pulls the text representation closer to the ground-truth label and pushes it farther away from the others. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings. Errors caused by ambiguous text can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. Experiments show that our method can effectively improve the classification accuracy of ambiguous texts. In addition, combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
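The similarity constraint between a text representation and the label embeddings can be sketched in NumPy as follows. The label vectors and the encoder output here are invented for illustration; the paper learns both during training.

```python
# Each category gets an embedding vector; a text representation is scored
# by cosine similarity to every label vector, and the best-matching label
# acts as the constraint on the model's output.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

labels = {"sports": np.array([1.0, 0.1, 0.0]),
          "finance": np.array([0.0, 1.0, 0.2]),
          "tech": np.array([0.1, 0.2, 1.0])}   # hypothetical label embeddings

text_repr = np.array([0.2, 0.9, 0.3])          # assumed encoder output
scores = {k: cosine(text_repr, v) for k, v in labels.items()}
best = max(scores, key=scores.get)             # similarity-constrained label
```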

