EP11 Using machine learning to recover unrecorded prehospital data

Background: Electronic patient record practices for ambulance crews are continuously developing. In 2019, South Central Ambulance Service (SCAS) adapted the common AVPU scale (Alert, Voice, Pain, Unresponsive) to include an option for ‘New Confusion’. The move to this new AVCPU scale made direct comparisons with older data impossible. We demonstrate a method to retrospectively classify patients into the alertness levels most influenced by this update. Methods: SCAS provided ~1.6 million electronic patient records, including vital signs, demographics, and presenting-complaint free text. These were split into training, validation, and testing datasets (80%, 10%, and 10%, respectively) and undersampled to the minority class. The data were used to train and validate predictions of the classes most affected by the modification of the scale (Alert, New Confusion, Voice). A transfer-learning natural language processing (NLP) classifier, built on a language model described by Smerity et al. (2017), was used to classify the presenting-complaint free text. A second approach used vital signs, demographics, conveyance, and assessments (30 metrics) for classification. Categorical data were binary encoded and continuous variables were normalised. Twenty machine learning algorithms were empirically tested, and the best three vital-sign-based algorithms (Random Forest, Extra Tree Classifier, Decision Tree) were combined with the NLP classifier into a voting ensemble using a Random Forest output layer. Results: The ensemble method achieved a weighted F1 of 0.78 on the test set. The sensitivities/specificities for each class were 84%/90% (Alert), 73%/89% (Newly Confused), and 68%/93% (Voice). Conclusions: The ensemble combining free text and vital signs achieved high sensitivity and specificity when reclassifying the alertness levels of prehospital patients. This study demonstrates the capability of machine learning classifiers to recover missing data, allowing the comparison of data collected under different recording standards.
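
The abstract reports a sensitivity and a specificity for each of three classes. For a multiclass problem these figures are usually computed one-vs-rest: each class in turn is treated as "positive" and all others as "negative". The sketch below shows that calculation in plain Python. The function name and the toy labels are invented for illustration and are not taken from the study, and it is an assumption that the authors computed their figures this way.

```python
from collections import Counter


def per_class_sensitivity_specificity(y_true, y_pred, labels):
    """Return {label: (sensitivity, specificity)} using a one-vs-rest view."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    total = len(y_true)
    results = {}
    for label in labels:
        tp = pairs[(label, label)]
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        fp = sum(n for (t, p), n in pairs.items() if t != label and p == label)
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = (sensitivity, specificity)
    return results


# Toy labels, invented for illustration only (not data from the study).
y_true = ["Alert", "Alert", "Alert", "Confusion", "Confusion", "Voice", "Voice", "Voice"]
y_pred = ["Alert", "Alert", "Confusion", "Confusion", "Voice", "Voice", "Voice", "Alert"]

metrics = per_class_sensitivity_specificity(
    y_true, y_pred, ["Alert", "Confusion", "Voice"]
)
for label, (sens, spec) in metrics.items():
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Specificity is often the higher of the two numbers for each class in a three-class problem, partly because the "negative" group pools the other two classes.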


An Improved Random Forest Algorithm for Class-Imbalanced Data Classification and its Application in PAD Risk Factors Analysis

The Open Electrical & Electronic Engineering Journal ◽

10.2174/1874129001307010062 ◽

2013 ◽

Vol 7 (1) ◽

pp. 62-70 ◽

Cited By ~ 9

Author(s):

Dengju Yao ◽

Jing Yang ◽

Xiaojuan Zhan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Majority Voting ◽

Training Dataset ◽

Random Forest Algorithm ◽

Research Subjects ◽

Minority Class ◽

Imbalanced Data Classification

Classification is one of the important research subjects in machine learning. However, most machine learning algorithms train a classifier on the assumption that each class has roughly the same number of training examples. When a classifier is trained on imbalanced data, its performance declines markedly. To address the class-imbalance problem, an improved random forest algorithm based on sampling with replacement was proposed. We randomly extracted multiple example subsets, with replacement, from the majority class, each containing the same number of examples as the minority class dataset. Multiple new training datasets were then constructed by combining each extracted majority subset with the minority class dataset, and a random forest classifier was trained on each of these training datasets. For a prediction example, the class was determined by majority voting among the random forest classifiers. Experimental results on five groups of UCI datasets and a real clinical dataset show that the proposed method can handle class-imbalanced data and that the improved random forest algorithm outperformed the original random forest and other methods in the literature.
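
The resampling scheme described in this abstract can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: a simple nearest-centroid learner stands in for the random forest so that the example needs no external libraries, and all function names and toy data are hypothetical.

```python
import random
from collections import Counter


def nearest_centroid_fit(rows, labels):
    """Stand-in base learner (NOT a random forest): per-class mean vectors."""
    counts = Counter(labels)
    sums = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(model, row):
    """Predict the class whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def fit_balanced_ensemble(majority, minority, n_models=5, seed=0):
    """Train one base learner per balanced dataset.

    Each dataset is the full minority class plus an equally sized
    majority subset drawn with replacement.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        subset = rng.choices(majority, k=len(minority))  # with replacement
        rows = subset + minority
        labels = ["majority"] * len(subset) + ["minority"] * len(minority)
        models.append(nearest_centroid_fit(rows, labels))
    return models


def predict_majority_vote(models, row):
    """Final class is the most common prediction across the ensemble."""
    votes = Counter(nearest_centroid_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data, invented for illustration: 20 majority rows, 3 minority rows.
majority = [[float(x), float(x % 3)] for x in range(20)]
minority = [[50.0, 1.0], [52.0, 0.0], [55.0, 2.0]]

models = fit_balanced_ensemble(majority, minority, n_models=5)
prediction_far = predict_majority_vote(models, [53.0, 1.0])
prediction_near = predict_majority_vote(models, [5.0, 1.0])
print(prediction_far, prediction_near)
```

Using an odd number of models avoids tied votes in the two-class case. Swapping the stand-in learner for a real random forest would only change the fit and predict functions; the sampling and voting logic would stay the same.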


Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches

10.21203/rs.3.rs-39653/v3 ◽

2020 ◽

Author(s):

Yuanren Tong ◽

Keming Lu ◽

Yingyun Yang ◽

Ji Li ◽

Yucong Lin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Language Processing ◽

Intestinal Tuberculosis ◽

Machine Learning Algorithms ◽

Free Text ◽

Intestinal Diseases ◽

Specificity And Sensitivity

Abstract Background: Differentiating between ulcerative colitis (UC), Crohn’s disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms. Methods: A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in the Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the description of the endoscopic image in the form of free text. Word segmentation and key word filtering were conducted as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, and CD and ITB) and a three-class classifier (UC, CD and ITB) were built. Results: The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB, and CD-ITB were 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the values for the CNN of CD-ITB were 0.90/0.77. The precisions/recalls of UC-CD-ITB when employing RF were 0.97/0.97, 0.65/0.53, and 0.68/0.76, respectively, and when employing the CNN were 0.99/0.97, 0.87/0.83, and 0.52/0.81, respectively. Conclusions: Classifiers built by RF and CNN approaches had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were achieved as well. Artificial intelligence through machine learning is very promising in helping inexperienced endoscopists differentiate inflammatory intestinal diseases.


Can Natural Language Processing Help Differentiate Inflammatory Intestinal Diseases in China? Models Applying Random Forest and Convolutional Neural Network Approaches

10.21203/rs.3.rs-39653/v1 ◽

2020 ◽

Author(s):

Yuanren Tong ◽

Keming Lu ◽

Yingyun Yang ◽

Ji Li ◽

Yucong Lin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Language Processing ◽

Intestinal Tuberculosis ◽

Machine Learning Algorithms ◽

Free Text ◽

Intestinal Diseases ◽

Specificity And Sensitivity

Abstract Background: Differentiating between ulcerative colitis (UC), Crohn’s disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms. Methods: A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in the Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the description of the endoscopic image in the form of free text. Word segmentation and key word filtering were conducted as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, and CD and ITB) and a three-class classifier (UC, CD and ITB) were built. Results: The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB, and CD-ITB were 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the values for the CNN of CD-ITB were 0.90/0.77. The precisions/recalls of UC-CD-ITB when employing RF were 0.97/0.97, 0.65/0.53, and 0.68/0.76, respectively, and when employing the CNN were 0.99/0.97, 0.87/0.83, and 0.52/0.81, respectively. Conclusions: Classifiers built by RF and CNN approaches had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were achieved as well. Artificial intelligence through machine learning is very promising in helping inexperienced endoscopists differentiate inflammatory intestinal diseases.


A Study of Machine Learning Algorithms for DDoS Detection

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.34922 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 174-178

Author(s):

Sheikh Shehzad Ahmed

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Random Forest Classifier ◽

Attack Detection ◽

Machine Learning Algorithms ◽

The Internet ◽

Ddos Attacks ◽

Decision Tree Classifier ◽

Tree Classifier

The Internet is used practically everywhere in today's digital environment. With the increased use of the Internet comes an increase in the number of threats. DDoS attacks are one of the most popular types of cyber-attacks nowadays. With the fast advancement of technology, the harm caused by DDoS attacks has grown increasingly severe. Because DDoS attacks may readily modify the ports/protocols utilized or how they function, the basic features of these attacks must be examined. Machine learning approaches have also been used extensively in intrusion detection research. Still, it is unclear what features are applicable and which approach would be better suited for detection. With this in mind, the research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four Machine Learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is more accurate in recognizing attacks.


Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01277-w ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Yuanren Tong ◽

Keming Lu ◽

Yingyun Yang ◽

Ji Li ◽

Yucong Lin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Language Processing ◽

Intestinal Tuberculosis ◽

Machine Learning Algorithms ◽

Free Text ◽

Intestinal Diseases ◽

Specificity And Sensitivity

Abstract Background Differentiating between ulcerative colitis (UC), Crohn’s disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms. Methods A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in the Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the description of the endoscopic image in the form of free text. Word segmentation and key word filtering were conducted as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, and CD and ITB) and a three-class classifier (UC, CD and ITB) were built. Results The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB, and CD-ITB were 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the values for the CNN of CD-ITB were 0.90/0.77. The precisions/recalls of UC-CD-ITB when employing RF were 0.97/0.97, 0.65/0.53, and 0.68/0.76, respectively, and when employing the CNN were 0.99/0.97, 0.87/0.83, and 0.52/0.81, respectively. Conclusions Classifiers built by RF and CNN approaches had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were achieved as well. Artificial intelligence through machine learning is very promising in helping inexperienced endoscopists differentiate inflammatory intestinal diseases. Conference The abstract of this article won the first prize of the Young Investigator Award during the Asian Pacific Digestive Week (APDW) 2019 held in Kolkata, India.


P213 Can artificial intelligence help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neuron network

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjz203.342 ◽

2020 ◽

Vol 14 (Supplement_1) ◽

pp. S247-S248

Author(s):

Y Li ◽

Y Tong ◽

K Lu ◽

S Yu ◽

J Qian

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Random Forest ◽

Intestinal Tuberculosis ◽

Machine Learning Algorithms ◽

Free Text ◽

Neuron Network ◽

College Hospital ◽

Intestinal Diseases ◽

Specificity And Sensitivity

Abstract Background Differentiating between ulcerative colitis (UC), Crohn’s disease (CD) and intestinal tuberculosis (ITB) is challenging under endoscopy. We aimed to realise automatic differential diagnosis among these diseases through machine learning algorithms. Methods A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the description of the endoscopic image in the form of free text. Word segmentation and key word filtering were conducted as data pre-processing. Random forest (RF) and convolutional neural network (CNN) were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, CD and ITB) and a three-class classifier (UC, CD and ITB) were built. Sensitivity/specificity and precision/recall were used to evaluate the performance of the two-class classifiers and the three-class classifier, respectively. Results The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB and CD-ITB were 0.89/0.84, 0.83/0.82 and 0.72/0.77, while those of the CNN for CD-ITB were 0.90/0.77. The precision/recall of UC-CD-ITB was 0.97/0.97, 0.65/0.53 and 0.68/0.76 by RF, respectively, and 0.99/0.97, 0.87/0.83 and 0.52/0.81 by CNN, respectively. Conclusion Classifiers built by RF and CNN had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were reached as well. Artificial intelligence through machine learning is very promising in helping inexperienced endoscopists differentiate inflammatory intestinal diseases.


Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches

10.21203/rs.3.rs-39653/v2 ◽

2020 ◽

Author(s):

Yuanren Tong ◽

Keming Lu ◽

Yingyun Yang ◽

Ji Li ◽

Yucong Lin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

Language Processing ◽

Intestinal Tuberculosis ◽

Machine Learning Algorithms ◽

Free Text ◽

Intestinal Diseases ◽

Specificity And Sensitivity


Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

10.2196/preprints.11728 ◽

2018 ◽

Author(s):

Liyan Pan ◽

Guangjian Liu ◽

Xiaojian Mao ◽

Huixian Li ◽

Jiexin Zhang ◽

...

Keyword(s):

Machine Learning ◽

Retrospective Study ◽

Random Forest ◽

Precocious Puberty ◽

Prediction Models ◽

Central Precocious Puberty ◽

Machine Learning Algorithms ◽

Stimulation Test ◽

Gnrh Analogue ◽

Prediction Probability

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
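
The abstract evaluates its models by AUC, sensitivity, and specificity. As a rough illustration of what those numbers measure, the sketch below computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count one half), and then sensitivity and specificity at a single cut-off. The function names, scores, labels, and the 0.5 threshold are all invented for illustration; none of them come from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity_at(y_true, scores, threshold):
    """Classify as positive when score >= threshold, then compute both rates."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity_at(y_true, scores, threshold=0.5)
print(f"AUC={auc:.3f}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

AUC summarises ranking quality across all thresholds, whereas sensitivity and specificity describe one operating point, so a model can have a single AUC but many sensitivity/specificity pairs.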


Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning within Google Earth Engine

Remote Sensing ◽

10.3390/rs13010010 ◽

2020 ◽

Vol 13 (1) ◽

pp. 10

Author(s):

Andrea Sulova ◽

Jamal Jokar Arsanjani

Keyword(s):

Climate Change ◽

Machine Learning ◽

Random Forest ◽

Google Earth ◽

Summer Season ◽

Driving Factors ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Training Dataset ◽

Google Earth Engine

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and continues to grow. The recent massive wildfires that hit Australia during the 2019–2020 summer season raised questions about the extent to which the risk of wildfires can be linked to various climate, environmental, topographical, and social factors, and about how to predict fire occurrences so that preventive measures can be taken. Hence, the main objective of this study was to develop an automated, cloud-based workflow for generating a training dataset of fire events at a continental level using freely available remote sensing data, at a reasonable computational expense, for input into machine learning models. As a result, a data-driven model was set up on the Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia and identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodological approach, results, and conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.


A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

Abstract Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller that leverages machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). The data were collected during ADL tasks from ten individuals wearing a wrist brace that emulated the absence of wrist function. Using these data, a neural network classifies the movement and a random forest regression then computes the desired velocity of the prosthetic wrist. The models were trained and tested on ADLs, and their robustness was assessed using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework to develop an intuitive control for multi-DoF prosthetic devices.
