Comparison of different feature extraction methods for applicable automated ICD coding

Abstract Background Automated ICD coding of medical texts via machine learning has been a hot topic. Related studies in the medical field rely heavily on the conventional bag-of-words (BoW) feature extraction method and rarely use more complicated methods, such as word2vec (W2V) and large pretrained models like BERT. This study aimed to identify the most effective feature extraction methods for coding models by comparing BoW, W2V and BERT variants. Methods We experimented with a Chinese dataset from Fuwai Hospital, which contains 6947 records and 1532 unique ICD codes, and a public Spanish dataset, which contains 1000 records and 2557 unique ICD codes. We designed coding tasks with different code frequency thresholds (denoted as $f_s$), with a lower threshold indicating a more complex task. Using traditional classifiers, we compared BoW, W2V and BERT variants on these coding tasks. Results When $f_s$ was equal to or greater than 140 for the Fuwai dataset and 60 for the Spanish dataset, BERT variants with the whole network fine-tuned were the best method, achieving a Micro-F1 of 93.9% on the Fuwai dataset when $f_s = 200$ and a Micro-F1 of 85.41% on the Spanish dataset when $f_s = 180$. When $f_s$ fell below 140 for the Fuwai dataset and 60 for the Spanish dataset, BoW turned out to be the best, achieving a Micro-F1 of 83% on the Fuwai dataset when $f_s = 20$ and a Micro-F1 of 39.1% on the Spanish dataset when $f_s = 20$. Our experiments also showed that both the BERT variants and BoW possessed good interpretability, which is important for medical applications of coding models. Conclusions This study sheds light on building promising machine learning models for automated ICD coding by revealing the most effective feature extraction methods.
Concretely, our results indicated that fine-tuning the whole network of the BERT variants was the optimal method for tasks covering only frequent codes, especially codes that represented unspecified diseases, while BoW was the best for tasks involving both frequent and infrequent codes. The frequency threshold at which the best-performing method changed differed between datasets, owing to factors such as language and code set.
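The abstract defines tasks by a code frequency threshold $f_s$ and reports Micro-F1, but does not spell out either step. The sketch below is a minimal illustration under one assumption: that a task with threshold $f_s$ keeps only codes occurring in at least $f_s$ records (so a lower $f_s$ admits more, rarer codes, matching "a lower threshold indicating a more complex task"). Micro-F1 pools true positives, false positives and false negatives over all records and codes before computing F1. The function names and the toy codes are hypothetical.

```python
from collections import Counter


def filter_by_code_frequency(records, f_s):
    """Keep codes that occur in at least f_s records (assumed reading of f_s).

    records: list of (text, set_of_codes). Records left with no codes are dropped.
    Returns (filtered_records, kept_codes).
    """
    counts = Counter(code for _, codes in records for code in codes)
    kept = {code for code, n in counts.items() if n >= f_s}
    filtered = []
    for text, codes in records:
        remaining = codes & kept
        if remaining:
            filtered.append((text, remaining))
    return filtered, kept


def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-label predictions (lists of code sets)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes predicted and correct
        fp += len(p - g)  # codes predicted but wrong
        fn += len(g - p)  # codes missed
    if tp == 0:
        return 0.0
    # Equivalent to 2PR / (P + R) with pooled precision P and recall R.
    return 2 * tp / (2 * tp + fp + fn)
```

For example, with three toy records and `f_s = 2`, only a code appearing in two records survives; the classifier (BoW, W2V or BERT features in the study) would then be trained and scored with `micro_f1` on the filtered task.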

How to Utilize My App Reviews? A Novel Topics Extraction Machine Learning Schema for Strategic Business Purposes

Entropy ◽

10.3390/e22111310 ◽

2020 ◽

Vol 22 (11) ◽

pp. 1310

Author(s):

Ioannis Triantafyllou ◽

Ioannis C. Drivas ◽

Georgios Giannakopoulos

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Software Maintenance ◽

Extraction Methods ◽

Machine Learning Algorithms ◽

Classification Problems ◽

Case Scenario ◽

Feature Extraction Method ◽

Term Extraction ◽

One Step

Acquiring knowledge about users’ opinions and what they say about specific features of an app is a solid stepping stone toward understanding their needs and concerns. Utilizing app reviews helps project management teams identify threats and opportunities for app software maintenance, optimization and strategic marketing purposes. Nevertheless, classifying app user reviews to identify valuable gems of information for app software improvement is a complex and multidimensional problem. It requires foresight and multiple combinations of sophisticated text pre-processing, feature extraction and machine learning methods to classify app reviews efficiently into specific topics. Against this backdrop, we propose a novel feature engineering classification schema that identifies, more efficiently and earlier, the terms (words) within reviews that can be classified into specific topics. To this end, we present a novel feature extraction method, DEVMAX.DF, combined with different machine learning algorithms, as a solution to app review classification problems. Going one step further, we simulate a real case scenario to validate the effectiveness of the proposed classification schema across different apps. After multiple experiments, the results indicate that the proposed schema outperforms other term extraction methods, such as TF.IDF and χ², in classifying app reviews into topics. Overall, the paper expands the knowledge of researchers and practitioners, with the aim of reinforcing their decision-making within the realm of app review utilization.
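DEVMAX.DF itself is not defined in this abstract, so it is not reproduced here. As a point of reference, the sketch below shows the χ² baseline it is compared against, in its standard term-selection form: for a term $t$ and topic $c$, build the 2×2 contingency counts $A$ (reviews with $t$ in $c$), $B$ ($t$, not $c$), $C$ (no $t$, in $c$), $D$ (neither), and score $\chi^2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))$. The function names and toy reviews are hypothetical.

```python
def chi_square(term, topic, docs):
    """Chi-square association between a term and a topic.

    docs: list of (set_of_terms, topic_label) pairs.
    """
    a = b = c = d = 0
    for terms, label in docs:
        has_term = term in terms
        in_topic = label == topic
        if has_term and in_topic:
            a += 1
        elif has_term:
            b += 1
        elif in_topic:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def rank_terms(topic, docs, k):
    """Top-k terms for a topic by chi-square (ties broken alphabetically)."""
    vocab = set().union(*(terms for terms, _ in docs))
    return sorted(vocab, key=lambda w: (-chi_square(w, topic, docs), w))[:k]
```

Note that χ² scores negative association as highly as positive association: in a toy set where "crash" appears only in bug reports and "design" only in praise, both get the same score for the topic "bug". That symmetry is one reason alternative term-extraction scores are proposed for topic classification.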

Neural Decoding of EEG Signals with Machine Learning: A Systematic Review

Brain Sciences ◽

10.3390/brainsci11111525 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1525

Author(s):

Maham Saeidi ◽

Waldemar Karwowski ◽

Farzad V. Farahani ◽

Krzysztof Fiok ◽

Redha Taiar ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Feature Extraction ◽

Mental Workload ◽

Extraction Methods ◽

Invasive Technique ◽

Support Vector ◽

Eeg Signals ◽

Feature Extraction Method ◽

Analysis Group

Electroencephalography (EEG) is a non-invasive technique used to record the brain’s evoked and induced electrical activity from the scalp. Artificial intelligence methods, particularly machine learning (ML) and deep learning (DL) algorithms, are increasingly being applied to EEG data for pattern analysis, group membership classification, and brain-computer interface purposes. This study aimed to systematically review recent advances in supervised ML and DL models for decoding and classifying EEG signals. Moreover, this article provides a comprehensive review of state-of-the-art techniques for EEG signal preprocessing and feature extraction. To this end, several academic databases were searched for relevant studies published from 2000 to the present. Our results showed that the application of ML and DL to both mental workload and motor imagery tasks has received substantial attention in recent years. In total, 75% of DL studies applied convolutional neural networks with various learning algorithms, and 36% of ML studies achieved competitive accuracy using a support vector machine algorithm. The wavelet transform was the most common feature extraction method across all types of tasks. We further examined the specific feature extraction methods and end-classifier recommendations identified in this systematic review.
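Because the review finds the wavelet transform to be the most common EEG feature extraction method, here is a minimal sketch of one common way it becomes a feature vector: the energy of each sub-band of a multilevel discrete wavelet transform. The Haar wavelet is used only because it fits in a few lines (an assumption; EEG studies often use Daubechies wavelets via libraries such as PyWavelets), and the function names are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every decomposition level")
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_energy_features(signal, levels):
    """Energies of detail bands D1..D_levels, followed by the final approximation band."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append(sum(x * x for x in detail))
    features.append(sum(x * x for x in approx))
    return features
```

Since the Haar transform used here is orthonormal, the band energies sum to the energy of the input epoch, so each feature can be read as the share of signal power in a frequency band; a classifier such as an SVM would then be trained on these vectors.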

Evaluation of Fine Tuning and Feature Extraction methods in Biometric Periocular Recognition

10.5753/wvc.2019.7626 ◽

2019 ◽

Author(s):

William Barcellos ◽

Nicolas Hiroaki Shitara ◽

Carolina Toledo Ferraz ◽

Raissa Tavares Vieira Queiroga ◽

Jose Hiroki Saito ◽

...

Keyword(s):

Feature Extraction ◽

Transfer Learning ◽

Class Imbalance ◽

Extraction Methods ◽

Fine Tuning ◽

Feature Extraction Method ◽

Network Training ◽

Learning Techniques ◽

Lower Accuracy ◽

Different Characteristics

The aim of this paper is to evaluate the performance of Transfer Learning techniques applied to Convolutional Neural Networks (CNNs) for biometric periocular classification. Two Transfer Learning techniques were evaluated: Fine Tuning and Feature Extraction. Two CNN architectures, AlexNet and VGG-16, were evaluated on two image databases. These databases differ in acquisition method, number of classes, class balance, and number of elements per class. Three experiments were conducted to evaluate the performance of the CNNs. In the first experiment, we measured Feature Extraction accuracy; in the second, we evaluated Fine Tuning performance. In the third experiment, we fine-tuned AlexNet on one database and then used the FC7 layer of this trained CNN for Feature Extraction on the other database. We concluded that data quality (whether or not samples of each class are present in the training set), class imbalance (different numbers of elements per class) and the method used to select training and test sets directly influence CNN accuracy. The Feature Extraction method, being simpler and requiring no network training, has lower accuracy than Fine Tuning. Furthermore, fine-tuning a CNN with periocular images from one database does not increase the accuracy of this CNN in Feature Extraction mode on another periocular database; the accuracy is quite similar to that obtained with the original pre-trained network.

A Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2012010102 ◽

2012 ◽

Vol 2 (1) ◽

pp. 16-41 ◽

Cited By ~ 1

Author(s):

Htwe Pa Pa Win ◽

Phyo Thu Thu Khine ◽

Khin Nwe Ni Tun

Keyword(s):

Feature Extraction ◽

Structural Analysis ◽

Character Recognition ◽

Optical Character Recognition ◽

Extraction Method ◽

Recognition Performance ◽

Extraction Methods ◽

Support Vector ◽

Svm Classifier ◽

Feature Extraction Method

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of feature extraction methods. Existing OCR systems use various feature extraction methods because scripts differ in nature. One major contribution of this paper is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, this paper assumes that the documents have been successfully segmented into characters and extracts features from these isolated Myanmar characters using structural analysis of the Myanmar script. Experiments were carried out with a Support Vector Machine (SVM) classifier, and the results were compared with those of a previously proposed feature extraction method.

Machine learning, waveform preprocessing and feature extraction methods for classification of acoustic startle waveforms

MethodsX ◽

10.1016/j.mex.2020.101166 ◽

2021 ◽

Vol 8 ◽

pp. 101166

Author(s):

Timothy J. Fawcett ◽

Chad S. Cooper ◽

Ryan J. Longenecker ◽

Joseph P. Walton

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Acoustic Startle ◽

Extraction Methods

Performance Analysis of Machine Learning Algorithms and Feature Extraction Methods for Sentiment Analysis

10.1109/icses52305.2021.9633882 ◽

2021 ◽

Author(s):

Anshumaan Chauhan ◽

Ayushi Agarwal ◽

Razia Sulthana

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Performance Analysis ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms

Headnote Prediction Using Machine Learning

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/7 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Sarmad Mahar ◽

Sahar Zafar ◽

Kamran Nishat

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Active Learning ◽

Text Classification ◽

Extraction Methods ◽

Text Summarization ◽

Training Data ◽

Second Step ◽

Support Vector ◽

Classification Algorithms

Headnotes are precise explanations and summaries of the legal points in an issued judgment. Law journals hire experienced lawyers to write these headnotes, which help readers quickly determine the issue discussed in a case. A headnote comprises two parts: the first states the topic discussed in the judgment, and the second contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without human involvement. We divided this task into a two-step process. In the first step, we predict the law points used in the judgment with text classification algorithms. In the second step, we generate a summary of the judgment with text summarization techniques. To achieve this, we created a Databank by extracting data from different law sources in Pakistan, and we labelled training data generated from Pakistani law websites. We tested different feature extraction methods on judiciary data to improve our system, and using these methods we developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy using Linear Support Vector Classification with tri-gram features and without a stemmer. Using active learning, our system can continuously improve its accuracy as users of the system provide more labelled examples.
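The best configuration reported above uses tri-gram features without stemming. The abstract does not say whether "tri-gram" means trigrams only or all n-grams up to length three; the sketch below assumes the latter (a common setup) and exposes it as a hypothetical `n_max` parameter. Tokens are lower-cased words left unstemmed, so "dismissed" and "dismiss" remain distinct features.

```python
import re
from collections import Counter


def ngram_features(text, n_max=3):
    """Count word n-grams of length 1..n_max; no stemming is applied."""
    tokens = re.findall(r"\w+", text.lower())
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats
```

For example, `ngram_features("The appeal is dismissed. The appeal")` counts "the appeal" twice and yields 15 n-grams in total (6 unigrams, 5 bigrams, 4 trigrams). Vectors built from such counts, usually TF-IDF weighted, would then be fed to a linear SVM.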

Research on Feature Extraction Method Based on Rub-Impact Acoustic Emission Signals of Mechanical Seal End Faces

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.36.68 ◽

2010 ◽

Vol 36 ◽

pp. 68-74

Author(s):

Chuan Jun Liao ◽

Shuang Fu Suo ◽

Wei Feng Huang

Keyword(s):

Acoustic Emission ◽

Feature Extraction ◽

Extraction Methods ◽

Mechanical Seal ◽

Mechanical Seals ◽

Feature Extraction Method ◽

Different Types ◽

Ae Signal ◽

Feature Information ◽

Ae Signals

This paper proposes acoustic emission (AE) techniques for monitoring rub-impacts between the rotating and stationary rings of mechanical seals. By analyzing feature extraction methods for typical rub-impact AE signals, a method combining the wavelet scalogram and the power spectrum is found to be useful and can be used to characterize the feature information contained in rub-impact AE signals of mechanical seal end faces. Both simulations and experiments show that the method is effective, and it was used successfully to identify the typical features of different types of rub-impacts of mechanical seal end faces.
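One half of the method above is the power spectrum of the AE signal. As a minimal, self-contained sketch (not the authors' implementation), the power at each non-negative frequency bin can be computed as $|X_k|^2 / N$ from the discrete Fourier transform $X_k$; a naive $O(N^2)$ DFT is used here for clarity, whereas practical code would use an FFT. The function name and the sampling-rate parameter `fs` are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Return (frequency_hz, |X_k|^2 / N) for the non-negative DFT bins k = 0..N/2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs / n, abs(x_k) ** 2 / n))
    return spectrum
```

For a 4 Hz sine sampled at 32 Hz for one second, the largest value falls in the 4.0 Hz bin; in an AE setting, the location and spread of such peaks are the kind of spectral features that distinguish rub-impact types.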

A REVIEW OF FEATURE EXTRACTION METHODS ON MACHINE LEARNING

Journal of Information System and Technology Management ◽

10.35631/jistm.622005 ◽

2021 ◽

Vol 6 (22) ◽

pp. 51-59

Author(s):

Mustazzihim Suhaidi ◽

Rabiah Abdul Kadir ◽

Sabrina Tiun

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Feature Selection ◽

Input Data ◽

Feature Vector ◽

Learning Algorithm ◽

Extraction Methods ◽

Machine Learning Algorithm ◽

Learning Tasks ◽

Low Dimensional

Extracting features from input data is vital for successful classification and machine learning tasks. Classification is the process of assigning an object to one of several predefined categories. Many different feature selection and feature extraction methods exist and are widely used. Feature extraction is the transformation of large input data into a low-dimensional feature vector, which serves as input to a classification or machine learning algorithm. Feature extraction poses major challenges, which are discussed in this paper; the central challenge is to learn and extract knowledge from text datasets in order to make correct decisions. The objective of this paper is to give an overview of the methods used for feature extraction in various applications, using a dataset of texts collected from social media.
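TF-IDF is one of the most widely used ways to turn a collection of texts into fixed-length feature vectors (of vocabulary size, so a dimensionality-reduction step often follows). The sketch below uses one textbook variant, term frequency normalized by document length times $\log(N / \mathrm{df}(t))$; libraries differ in details such as idf smoothing, so treat the exact weighting as an assumption. The function name is hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document) using tf/len * log(N/df)."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        vectors.append([tf[tok] / total * idf[tok] if total else 0.0 for tok in vocab])
    return vocab, vectors
```

For `["good app", "bad app"]`, the shared word "app" gets weight 0 (it appears in every document), while "good" and "bad" each get $0.5 \log 2$ in their own document, which is exactly the discriminative behavior that makes TF-IDF a strong baseline for short social-media texts.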

Lung Disease Classification by Novel Shape-Based Feature Extraction and New Hybrid Genetic Approach

Emerging Technologies in Intelligent Applications for Image and Video Processing - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-4666-9685-3.ch013 ◽

2016 ◽

pp. 321-346

Author(s):

Bhuvaneswari Chandran ◽

P. Aruna ◽

D. Loganathan

Keyword(s):

Feature Extraction ◽

Extraction Method ◽

Lung Diseases ◽

Hybrid Genetic Algorithm ◽

Extraction Methods ◽

Disease Classification ◽

Filter Method ◽

Svm Classifier ◽

Genetic Approach ◽

Feature Extraction Method

The purpose of this chapter is to present a novel method for classifying lung diseases from computed tomography (CT) images to assist physicians in diagnosing lung diseases. The method is based on a new approach that combines a proposed M2 feature extraction method and a novel hybrid genetic approach with different types of classifiers. The feature extraction methods used in this work are moment invariants, a proposed multiscale filter method and the proposed M2 feature extraction method. The essential features produced by these feature extraction techniques are then selected by the novel hybrid genetic algorithm (GA) for feature selection. Classification is performed by support vector machine (SVM), multilayer perceptron neural network and Bayes Net classifiers. The results show that the proposed technique is efficient and robust. The combination of the proposed M2 feature extraction, the proposed hybrid GA and the SVM classifier achieves the highest classification accuracy.
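Moment invariants are one of the feature extraction methods listed above. Assuming they are the classic Hu moment invariants (the chapter's exact set is not given here), the sketch below computes the first one, $\phi_1 = \eta_{20} + \eta_{02}$, where $\eta_{pq} = \mu_{pq} / \mu_{00}^{1 + (p+q)/2}$ normalizes the central moments $\mu_{pq}$. Because central moments are taken about the centroid, $\phi_1$ does not change when a region is translated. The function name is hypothetical; the M2 method itself is not reproduced.

```python
def hu_phi1(image):
    """First Hu moment invariant of a 2-D grayscale or binary image (list of rows)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    if m00 == 0:
        raise ValueError("image has no foreground mass")
    xc, yc = m10 / m00, m01 / m00  # centroid
    mu20 = mu02 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            mu20 += (x - xc) ** 2 * v
            mu02 += (y - yc) ** 2 * v
    # For p + q = 2 the normalization exponent 1 + (p + q) / 2 equals 2.
    return (mu20 + mu02) / m00 ** 2
```

A 2×2 block of foreground pixels gives $\phi_1 = 0.125$ wherever it sits in the image; in a CT pipeline this value, with the remaining invariants, would be computed on a segmented lung region and passed to feature selection.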
