A Tree Based Approach for Multi-class Classification of Surgical Procedures Using Structured and Unstructured Data


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Tannaz Khaleghi ◽  
Alper Murat ◽  
Suzan Arslanturk

Abstract Background In surgical departments, CPT code assignment has been a complicated manual effort that requires significant domain knowledge and experience. While several studies use CPTs to make predictions in surgical services, the literature on predicting CPTs in surgical and other services using text features is sparse. This study improves the prediction of CPTs by means of informative features and a novel re-prioritization algorithm. Methods The input data used in this study are composed of both structured and unstructured data. The ground-truth labels (CPTs) are obtained from medical coding databases using relative value units, which indicate the major operational procedure in each surgery case. In the modeling process, we first utilize a Random Forest multi-class classification model to predict the CPT codes. Second, we extract key information such as label probabilities, feature importance measures, and medical term frequency. These factors are then used in a novel algorithm to rearrange the alternative CPT codes in the list of potential candidates based on the calculated weights. Results To evaluate the performance of both phases, prediction and complementary improvement, we report the accuracy scores of multi-class CPT prediction for datasets of 5 key surgery specialties. The Random Forest model performs the classification task with 74–76% accuracy when predicting the primary CPT (accuracy@1) versus the CPT set (accuracy@2) under two filtering conditions on CPT codes. The complementary algorithm improves the results of the initial step by 8% on average. Furthermore, the incorporated text features enhance the quality of the output by 20–35%. The model outperforms a state-of-the-art neural network model with respect to accuracy, precision, and recall. Conclusions We have established a robust framework based on a decision tree predictive model. We predict the surgical codes more accurately and robustly than state-of-the-art deep neural architectures, which can help immensely with both surgery billing and scheduling in such units.
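The two-phase approach described above can be sketched as follows: a Random Forest produces CPT-label probabilities, which are then re-ranked by per-class weights. The data, the four-class setup, and the weight vector are synthetic stand-ins for the paper's CPT datasets and its feature-importance/medical-term-frequency factors.

```python
# Phase 1: Random Forest predicts class probabilities for each surgery case.
# Phase 2: candidates are re-prioritized by combining those probabilities
# with per-class weights (stand-ins for the paper's re-ranking factors).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # structured + text-derived features (synthetic)
y = rng.integers(0, 4, size=200)     # 4 hypothetical CPT classes

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:1])[0]  # label probabilities for one case

term_freq_weight = np.array([1.0, 1.2, 0.8, 1.0])  # assumed per-class weights
rescored = proba * term_freq_weight
ranked = np.argsort(rescored)[::-1]  # candidate CPT codes, best first
```

The re-ranking step is intentionally simple here; the paper's algorithm combines several weighting factors rather than a single vector.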


Crime rates are increasing over the years, and tracking committed crimes remains a great challenge for the government. Each area exhibits a pattern in the types of crime that occur, and knowledge of these patterns is essential for prevention. Crimes occur in sequences that leave hidden patterns, so the crime data must be processed to uncover them; this project finds those patterns and insights in the data. Because the data is largely unstructured, it is preprocessed before prediction. Crime records collected from a particular area (Indore in our case) are used to make predictions with multi-class classification algorithms such as random forest, the type of crime likely to occur next in an area is predicted, and visualizations are produced accordingly. Several classification algorithms, including support vector machines, decision trees, and random forest, are used, and random forest shows the best accuracy. The input and output features are selected by visualizing the data with graphs and plots.
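The classifier comparison described above can be sketched like this; synthetic data stands in for the Indore crime records, which are not available here.

```python
# Fit SVM, decision tree, and random forest on the same data and compare
# held-out accuracy, mirroring the comparison reported in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)  # stand-in for crime records
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
          for name, model in [("svm", SVC()),
                              ("tree", DecisionTreeClassifier(random_state=0)),
                              ("forest", RandomForestClassifier(random_state=0))]}
```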


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4625
Author(s):  
Brian Reilly ◽  
Oliver Morgan ◽  
Gabriela Czanner ◽  
Mark A. Robinson

Changes of direction (COD) are an important aspect of soccer match play. Understanding the physiological and biomechanical demands on players in games allows sports scientists to effectively train and rehabilitate soccer players. COD are conventionally recorded using manually annotated time-motion video analysis, which is highly time consuming, so more time-efficient approaches are required. The aim was to develop an automated classification model based on multi-sensor player tracking device data to detect COD > 45°. Video analysis data and individual multi-sensor player tracking data (GPS, accelerometer, gyroscopic) for 23 academy-level soccer players were used. A novel ‘GPS-COD Angle’ variable was developed and used in model training, along with 24 GPS-derived, gyroscope and accelerometer variables. Video annotation was the ground truth indicator of occurrence of COD > 45°. The random forest classifier using the full set of features demonstrated the highest accuracy (AUROC = 0.957, 95% CI = 0.956–0.958, Sensitivity = 0.941, Specificity = 0.772). To balance sensitivity and specificity, model parameters were optimised, resulting in a value of 0.889 for both metrics. Similarly high levels of accuracy were observed for random forest models trained using a reduced set of features, accelerometer-derived variables only, and gyroscope-derived variables only. These results point to the potential effectiveness of the novel methodology in automatically identifying COD in soccer players.
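One common way to balance sensitivity and specificity, as the abstract describes, is to sweep the decision threshold on the classifier's predicted probabilities. The sketch below uses synthetic data; the paper tuned model parameters rather than necessarily this exact threshold sweep.

```python
# Sweep the probability threshold of a random-forest classifier and pick
# the point where sensitivity and specificity are closest to each other.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=1)
clf = RandomForestClassifier(random_state=1).fit(X, y)
p = clf.predict_proba(X)[:, 1]  # probability of the positive class (a COD event)

def sens_spec(threshold):
    pred = (p >= threshold).astype(int)
    sens = ((pred == 1) & (y == 1)).sum() / (y == 1).sum()
    spec = ((pred == 0) & (y == 0)).sum() / (y == 0).sum()
    return sens, spec

# threshold minimizing the sensitivity/specificity gap
best = min(np.linspace(0.05, 0.95, 19),
           key=lambda t: abs(np.subtract(*sens_spec(t))))
```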


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on several aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be maintained in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved not only through the application of the proposed algorithm but also through the data-preprocessing approach. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were retransformed by principal component analysis and passed through the CFsSubset filter again. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
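The preprocessing chain described above (k-means transform, correlation-based filtering, then PCA) can be sketched as a scikit-learn pipeline. Weka's CFsSubset filter has no direct scikit-learn equivalent, so `SelectKBest` with an ANOVA score is used as a stand-in, and the data is synthetic.

```python
# k-means transform (distances to centroids) -> feature filter -> PCA ->
# random forest, roughly mirroring the abstract's preprocessing chain.
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=120, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)  # 4 oil groups

pipe = make_pipeline(
    KMeans(n_clusters=8, n_init=10, random_state=0),  # distances to 8 centroids
    SelectKBest(f_classif, k=5),                      # CFsSubset stand-in
    PCA(n_components=3),
    RandomForestClassifier(random_state=0),
)
accuracy = pipe.fit(X, y).score(X, y)
```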


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of arecanut planting area and the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were determined as 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization was improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification model. The kappa coefficient also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. 
Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
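The feature-space optimization step can be sketched as follows: rank the initial spectral/textural features by random-forest importance, keep the top ones, and retrain on the reduced space. The data is synthetic and the number of retained features is arbitrary.

```python
# Use random-forest feature importances to prune the initial feature space,
# then retrain a classifier on the reduced set of features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           random_state=0)  # stand-in spectral/textural features
ranker = RandomForestClassifier(random_state=0).fit(X, y)
top = np.argsort(ranker.feature_importances_)[::-1][:4]  # keep 4 best features

optimized = RandomForestClassifier(random_state=0).fit(X[:, top], y)
accuracy = optimized.score(X[:, top], y)
```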


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3553
Author(s):  
Jeremy Watts ◽  
Anahita Khojandi ◽  
Rama Vasudevan ◽  
Fatta B. Nahab ◽  
Ritesh A. Ramdhani

Parkinson’s disease medication treatment planning is generally based on subjective data obtained through clinical, physician-patient interactions. The Personal KinetiGraph™ (PKG) and similar wearable sensors have shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to cluster patients based on levodopa regimens and response. The resulting clusters are then used to enhance treatment planning by providing improved initial treatment estimates to supplement a physician’s initial assessment. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes—clinically assessed by the MDS-Unified Parkinson’s Disease Rating Scale-III (MDS-UPDRS-III) and the PKG sensor for movement staging. A random forest classification model was then used to predict patients’ cluster allocation based on their respective demographic information, MDS-UPDRS-III scores, and PKG time-series data. Clinically relevant clusters were partitioned by levodopa dose, medication administration frequency, and total levodopa equivalent daily dose—with the PKG providing similar symptomatic assessments to physician MDS-UPDRS-III scores. A random forest classifier trained on demographic information, MDS-UPDRS-III scores, and PKG time-series data was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 86.9%, an F1 score of 90.7%, and an AUC of 0.871. A model that relied solely on demographic information and PKG time-series data provided the next best performance with an accuracy of 83.8%, an F1 score of 88.5%, and an AUC of 0.831, hence further enabling fully remote assessments. 
These computational methods demonstrate the feasibility of using sensor-based data to cluster patients based on their medication responses with further potential to assist with medication recommendations.
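The two-stage idea above, clustering patients by medication response and then predicting cluster membership from demographic and sensor features, can be sketched as follows. All data here is synthetic; the real study used MDS-UPDRS-III scores and PKG time series.

```python
# Stage 1: k-means clusters patients by (stand-in) levodopa-response features.
# Stage 2: a random forest predicts cluster membership from other features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
response = rng.normal(size=(60, 3))  # stand-in levodopa-response features
features = np.hstack([response + rng.normal(scale=0.1, size=(60, 3)),
                      rng.normal(size=(60, 4))])  # demographics + sensor series

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(response)
acc = cross_val_score(RandomForestClassifier(random_state=0),
                      features, clusters, cv=3).mean()
```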


2021 ◽  
Vol 11 (13) ◽  
pp. 6237
Author(s):  
Azharul Islam ◽  
KyungHi Chang

Unstructured data from the internet constitute large sources of information, which need to be formatted in a user-friendly way. This research develops a model that classifies unstructured data from data mining into labeled data, and builds an informational and decision-making support system (DMSS). We often have assortments of information collected by mining data from various sources, where the key challenge is to extract valuable information. We observe substantial classification accuracy enhancement for our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a Covid corpus processed using a long short-term memory (LSTM) network. Furthermore, we conducted tests on large datasets relevant to the Disaster corpus, with an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, provides a reasonable 84% accuracy. The main objective of this research is to increase the application's robustness by integrating intelligence into the developed DMSS, which provides insight into the user's intent despite dealing with a noisy dataset. Our designed model compares the random forest and stochastic gradient descent (SGD) algorithms by F1 score, where the RF method outperforms SGD, improving accuracy by 2% (from 81% to 83%) compared with the conventional method.
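The RF-versus-SGD comparison can be sketched as follows; a tiny toy corpus stands in for the Covid and Disaster datasets, and TF-IDF is an assumed featurization (the abstract does not specify one).

```python
# TF-IDF features feed a random forest and an SGD classifier; both are
# scored by F1, mirroring the comparison described in the abstract.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

texts = ["storm flood damage", "flood rescue effort", "market stocks rise",
         "stocks fall sharply", "flood warning issued", "stocks rally again"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = disaster-related, 0 = not

X = TfidfVectorizer().fit_transform(texts)
f1 = {name: f1_score(labels, model.fit(X, labels).predict(X))
      for name, model in [("rf", RandomForestClassifier(random_state=0)),
                          ("sgd", SGDClassifier(random_state=0))]}
```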


2021 ◽  
Author(s):  
Jeremy Watts ◽  
Anahita Khojandi ◽  
Rama Vasudevan ◽  
Fatta B. Nahab ◽  
Ritesh Ramdhani

Abstract Parkinson’s disease (PD) medication treatment planning is generally based on subjective data gathered through in-office, physician-patient interactions. The Personal KinetiGraph™ (PKG) has shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to subtype patients based on levodopa regimens and response. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes—clinically assessed by the PKG and Hoehn & Yahr (H&Y) staging. A random forest classification model was then used to predict patients’ cluster allocation based on their respective PKG data and demographic information. Clinically relevant clusters were developed based on longitudinal dopaminergic regimens—partitioned by levodopa dose, administration frequency, and total levodopa equivalent daily dose—with the PKG increasing cluster granularity compared to the H&Y staging. A random forest classifier was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 87.9 ± 1.3%.


Author(s):  
Ming Hao ◽  
Weijing Wang ◽  
Fang Zhou

Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts remain ambiguous and hard to classify in multi-class settings, especially short texts whose context length is limited. The mainstream method improves the distinction of ambiguous text by adding context information. However, these methods rely only on the text representation and ignore that the categories overlap and are not completely independent of each other. In this paper, we establish a new general method to solve the problem of ambiguous text classification by introducing label embeddings to represent each category, which makes the difference between categories measurable. Further, a new compositional loss function is proposed to train the model, which pulls the text representation closer to the ground-truth label and pushes it farther away from the others. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings. Errors caused by ambiguous text can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. Experiments show that our method can effectively improve the classification accuracy of ambiguous texts. In addition, combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
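The similarity constraint between a text representation and the label embeddings can be sketched in NumPy as follows. The label vectors and the encoder output here are invented for illustration; the paper learns both during training.

```python
# Each category gets an embedding vector; a text representation is scored
# by cosine similarity to every label vector, and the best-matching label
# acts as the constraint on the model's output.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

labels = {"sports": np.array([1.0, 0.1, 0.0]),
          "finance": np.array([0.0, 1.0, 0.2]),
          "tech": np.array([0.1, 0.2, 1.0])}   # hypothetical label embeddings

text_repr = np.array([0.2, 0.9, 0.3])          # assumed encoder output
scores = {k: cosine(text_repr, v) for k, v in labels.items()}
best = max(scores, key=scores.get)             # similarity-constrained label
```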

