Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings

Abstract: Computer-assisted reporting (CAR) tools have been suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. The target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no: 154/52). We focused on the probabilistic outputs of ML algorithms including tree-based methods, elastic net, support vector machines (SVMs) and fastText (linear classifier), which were evaluated in the same 5 × 5-fold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, Brier score, log loss) and calibration plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies on both human-extracted (87%) and RadLex features (findings: 82.5%; impressions: 85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in a fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.
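The Brier score and log loss named in this abstract both score predicted probabilities rather than hard labels. The following is a minimal sketch of the two formulas in plain Python; the labels and probabilities are made-up illustration values, not data from the study.

```python
import math

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels (1 = ASPECTS should have been provided) and predicted probabilities.
y_true = [1, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.1]
print(brier_score(y_true, y_prob))
print(log_loss(y_true, y_prob))
```

Lower is better for both metrics; a classifier that always predicts 0.5 gets a Brier score of 0.25.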

Comparative Analysis of Machine Learning Algorithms for Computer-Assisted Reporting Based on Fully Automated Cross-Lingual RadLex® Mappings

10.20944/preprints202004.0354.v1 ◽

2020 ◽

Author(s):

Máté E. Maros ◽

Chang Gyu Cho ◽

Andreas G. Junge ◽

Benedikt Kämpgen ◽

Victor Saase ◽

...

Keyword(s):

Machine Learning ◽

Language Processing ◽

Confusion Matrix ◽

Imbalanced Data ◽

Machine Learning Algorithms ◽

Imaging Biomarkers ◽

Brier Score ◽

Support Vector ◽

Computer Assisted ◽

Cross Lingual

Objectives: Studies evaluating machine learning (ML) algorithms on cross-lingual RadLex® mappings for developing context-sensitive radiological reporting tools are lacking. Therefore, we investigated whether ML-based approaches can be utilized to assist radiologists in providing key imaging biomarkers – such as the Alberta Stroke Programme Early CT Score (ASPECTS). Material and Methods: A stratified random sample (age, gender, year) of CT reports (n=206) with suspected ischemic stroke was generated out of 3997 reports signed off between 2015 and 2019. Three independent, blinded readers assessed these reports and manually annotated clinico-radiologically relevant key features. The primary outcome was whether ASPECTS should have been provided (yes/no: 154/52). For all reports, both the findings and impressions underwent cross-lingual (German to English) RadLex® mappings using natural language processing. Well-established ML algorithms including classification trees, random forests, elastic net, support vector machines (SVMs) and boosted trees were evaluated in a 5 × 5-fold nested cross-validation framework. Further, a linear classifier (fastText) was directly fitted on the German reports. Ensemble learning was used to provide robust importance rankings of these ML algorithms. Performance was evaluated using derivatives of the confusion matrix and metrics of calibration including AUC, Brier score and log loss, as well as visually by calibration plots. Results: On this imbalanced classification task, SVMs showed the highest accuracies on both human-extracted (87%) and fully automated RadLex® features (findings: 82.5%; impressions: 85.4%). FastText without a pre-trained language model showed the highest accuracy (89.3%) and AUC (92%) on the impressions. The ensemble learner revealed that boosted trees, fastText and SVMs are the most important ML classifiers. Boosted trees fitted on the findings showed the best overall calibration curve. Conclusions: Contextual ML-based assistance suggesting ASPECTS while reporting neuroradiological emergencies is feasible, even if ML models are restricted to being developed on limited and highly imbalanced data sets.
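To make the nested cross-validation setup concrete, here is a minimal sketch of how the index splits could be generated. It assumes the "5 × 5-fold" design means 5 outer folds with 5 inner folds each, and it uses a simple round-robin stratification; the function names are hypothetical and this is not the authors' code.

```python
import random

def stratified_folds(indices, labels, k, rng):
    """Split indices into k folds while keeping class proportions roughly equal."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal class members out round-robin
    return folds

def nested_cv_splits(labels, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold."""
    rng = random.Random(seed)
    all_idx = list(range(len(labels)))
    for test in stratified_folds(all_idx, labels, outer_k, rng):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        inner_splits = []
        # Inner folds are built from the outer training data only.
        for val in stratified_folds(train, labels, inner_k, rng):
            val_set = set(val)
            inner_splits.append(([i for i in train if i not in val_set], val))
        yield train, test, inner_splits

# Class counts taken from the abstract: 154 "yes" and 52 "no" reports.
labels = [1] * 154 + [0] * 52
splits = list(nested_cv_splits(labels))
print(len(splits), [len(test) for _, test, _ in splits])
```

Hyperparameters would be tuned on the inner splits, and the outer test fold would be used only once to estimate performance, so the held-out reports never influence model selection.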

Drill-Core Mineral Abundance Estimation Using Hyperspectral and High-Resolution Mineralogical Data

Remote Sensing ◽

10.3390/rs12071218 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1218

Author(s):

Laura Tuşa ◽

Mahdi Khodadadzadeh ◽

Cecilia Contreras ◽

Kasra Rafiezadeh Shahi ◽

Margret Fuchs ◽

...

Keyword(s):

Machine Learning ◽

High Resolution ◽

Ore Deposits ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Drill Core ◽

Data Types ◽

Mineralogical Characterization ◽

Core Samples

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time-consuming and depends on the observer and their individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. This way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physically based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on 5 drill-core samples with varying training data using random forests, support vector machines and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.

Automatic Grading of Stroke Symptoms for Rapid Assessment Using Optimized Machine Learning and 4-Limb Kinematics: Clinical Validation Study (Preprint)

10.2196/preprints.20641 ◽

2020 ◽

Author(s):

Eunjeong Park ◽

Kijeong Lee ◽

Taehwa Han ◽

Hyo Suk Nam

Keyword(s):

Machine Learning ◽

Medical Staff ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Grading System ◽

Operating Characteristics ◽

Stroke Patients ◽

Motor Weakness ◽

Automatic Grading

BACKGROUND Subtle abnormal motor signs are indications of serious neurological diseases. Although neurological deficits require fast initiation of treatment within a restricted time, it is difficult for nonspecialists to detect and objectively assess the symptoms. In the clinical environment, diagnoses and decisions are based on clinical grading methods, including the National Institutes of Health Stroke Scale (NIHSS) score or the Medical Research Council (MRC) score, which have been used to measure motor weakness. Objective grading in various environments is needed to ensure consistent agreement among patients, caregivers, paramedics, and medical staff and to facilitate rapid diagnoses and dispatches to appropriate medical centers. OBJECTIVE In this study, we aimed to develop an autonomous grading system for stroke patients. We investigated the feasibility of our new system to assess motor weakness and grade NIHSS and MRC scores of 4 limbs, similar to the clinical examinations performed by medical staff. METHODS We implemented an automatic grading system composed of a measuring unit with wearable sensors and a grading unit with optimized machine learning. Inertial sensors were attached to measure subtle weaknesses caused by paralysis of the upper and lower limbs. We collected 60 instances of data with kinematic features of motor disorders from neurological examination and demographic information of stroke patients with NIHSS 0 or 1 and MRC 7, 8, or 9 grades in a stroke unit. Training data with 240 instances were generated using a synthetic minority oversampling technique to compensate for the class imbalance and the small amount of training data. We trained 2 representative machine learning algorithms, an ensemble and a support vector machine (SVM), to implement auto-NIHSS and auto-MRC grading. The algorithms were evaluated with 5-fold cross-validation and tuned by Bayesian optimization over 30 trials. The trained model was tested with the 60 original hold-out instances to evaluate accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). RESULTS The proposed system can grade NIHSS scores with an accuracy of 83.3% and an AUC of 0.912 using an optimized ensemble algorithm, and it can grade with an accuracy of 80.0% and an AUC of 0.860 using an optimized SVM algorithm. The auto-MRC grading achieved an accuracy of 76.7% and a mean AUC of 0.870 in SVM classification and an accuracy of 78.3% and a mean AUC of 0.877 in ensemble classification. CONCLUSIONS The automatic grading system quantifies proximal weakness in real time and assesses symptoms through automatic grading. The pilot outcomes demonstrated the feasibility of remote monitoring of motor weakness caused by stroke. The system can facilitate consistent grading with instant assessment and expedite dispatches to appropriate hospitals and treatment initiation by sharing auto-MRC and auto-NIHSS scores between prehospital and hospital responses as an objective observation.
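The synthetic minority oversampling step mentioned in the methods can be sketched as follows. This is a simplified, plain-Python illustration of the basic SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours); the feature vectors and the choice of k are hypothetical, and the study's actual implementation is not described in the abstract.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points between minority samples and their k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature kinematic measurements for an under-represented grade.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=12)
print(len(new_points))
```

Because each synthetic point lies on a line segment between two real minority samples, the generated data stay inside the region already covered by that class.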

Android Malware Detection using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1011.0982s1219 ◽

2020 ◽

Vol 8 (2S12) ◽

pp. 65-70

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

User Interest ◽

Android Malware ◽

Android Malware Detection

Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology takes inputs such as images to find new observations or show items based on user interest. The main focus here is on supervised machine learning techniques, in which the computer learns from input (training) data and predicts results based on that experience. We also discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees and Support Vector Machine, and use these classifiers on the Malgenome and Drebin Android malware datasets. Android is an operating system that is gaining popularity, and with the rise in demand for these devices comes a rise in Android malware. The traditional techniques used to detect malware were unable to detect unknown applications. We ran these datasets through different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.

A Comparison of Machine Learning Algorithms for the Segmentation and Classification of Snow Micro Penetrometer Profiles on Arctic Sea Ice

10.5194/egusphere-egu21-15637 ◽

2021 ◽

Author(s):

Julia Kaltenborn ◽

Viviane Clay ◽

Amy R. Macfarlane ◽

Joshua Michael Lloyd King ◽

Martin Schneebeli

Keyword(s):

Machine Learning ◽

Sea Ice ◽

Arctic Sea Ice ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Snow Layer ◽

Arctic Sea ◽

Execution Speed

Snow-layer classification is an essential diagnostic task for a wide variety of cryospheric science and climate research applications. Traditionally, these measurements are made in snow pits, requiring trained operators and a substantial time commitment. The SnowMicroPen (SMP), a portable high-resolution snow penetrometer, has been demonstrated as a capable tool for rapid snow grain classification and layer type segmentation through statistical inversion of its mechanical signal. The manual classification of the SMP profiles requires time and training and becomes infeasible for large datasets. Here, we introduce a novel set of SMP measurements collected during the MOSAiC expedition and apply Machine Learning (ML) algorithms to automatically classify and segment SMP profiles of snow on Arctic sea ice. To this end, different supervised and unsupervised ML methods, including Random Forests, Support Vector Machines, Artificial Neural Networks, and k-means Clustering, are compared. A subsequent segmentation of the classified data results in distinct layers and snow grain markers for the SMP profiles. The models are trained with the dataset by King et al. (2020) and the MOSAiC SMP dataset. The MOSAiC dataset is a unique and extensive dataset characterizing seasonal and spatial variation of snow on the central Arctic sea-ice. We will test and compare the different algorithms and evaluate the algorithms' effectiveness based on the need for initial dataset labeling, execution speed, and ease of implementation. In particular, we will compare supervised to unsupervised methods, which are distinguished by their need for labeled training data. The implementation of different ML algorithms for SMP profile classification could provide a fast and automatic grain type classification and snow layer segmentation. Based on the gained knowledge from the algorithms' comparison, a tool can be built to provide scientists from different fields with an immediate SMP profile classification and segmentation.
King, J., Howell, S., Brady, M., Toose, P., Derksen, C., Haas, C., & Beckers, J. (2020). Local-scale variability of snow density on Arctic sea ice. The Cryosphere, 14(12), 4323-4339, https://doi.org/10.5194/tc-14-4323-2020.
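The step from per-measurement classification to distinct layers can be illustrated with a short sketch. It assumes that a classifier has already assigned one grain-type label per depth sample and simply merges consecutive identical labels; the label names and depths are invented for illustration and the abstract does not specify the segmentation method actually used.

```python
from itertools import groupby

def segment_layers(depths_mm, labels):
    """Merge consecutive samples with the same predicted label into (label, top, bottom) layers."""
    layers = []
    pos = 0
    for label, run in groupby(labels):
        n = len(list(run))
        layers.append((label, depths_mm[pos], depths_mm[pos + n - 1]))
        pos += n
    return layers

# Hypothetical depth samples (mm) and predicted grain types for one SMP profile.
depths = [0, 1, 2, 3, 4, 5]
predicted = ["new snow", "new snow", "rounded", "rounded", "rounded", "depth hoar"]
print(segment_layers(depths, predicted))
```

In practice, a smoothing step before merging might be needed so that single misclassified samples do not create spurious thin layers.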

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two more subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for the diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results of our experiments showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
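Data augmentation by image transformation can be sketched in a few lines. The abstract does not list the seven transformations used, so the flips and rotation below are assumptions chosen as common examples, and the tiny nested-list "image" is purely illustrative.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Reverse the order of the rows."""
    return [row[:] for row in img[::-1]]

def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus three transformed copies."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90_clockwise(img)]

# A hypothetical 2x2 grayscale image.
image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
```

Each transformed copy keeps the label of the original image, which is how a small labeled set can be expanded without new annotation.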

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽

10.11114/set.v8i1.5213 ◽

2021 ◽

Vol 8 (1) ◽

pp. 28

Author(s):

S. L. Ávila ◽

H. M. Schaberle ◽

S. Youssef ◽

F. S. Pacheco ◽

C. A. Penz

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Signature Analysis ◽

Data Set ◽

Learning Techniques ◽

Environmental Variations ◽

Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier it becomes to diagnose the machine's operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifiers, and the One-vs-One strategy to identify the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier so that it deals better with changes produced by environmental variations and natural machinery usage. The adaptative update means updating the training data set with an amount of recent data while retaining the entire original historical data. This is relevant for engineering maintenance. Our results show that current signature analysis is appropriate for identifying the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.
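The One-vs-One strategy reduces a multiclass problem to one binary decision per pair of classes and predicts the class that wins the most pairwise votes. The sketch below shows only the voting logic; the class names, the single feature, and the nearest-centroid rule standing in for a trained binary classifier are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, pairwise_decide, x):
    """Run every pairwise decision and return the class with the most votes."""
    votes = Counter(pairwise_decide(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Hypothetical class centroids on a single current-harmonic feature.
centroids = {"balanced": 0.0, "mild": 1.0, "severe": 2.0}

def nearest_centroid(a, b, x):
    """Stand-in for a trained binary classifier separating class a from class b."""
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

classes = list(centroids)
print(one_vs_one_predict(classes, nearest_centroid, 1.8))
print(one_vs_one_predict(classes, nearest_centroid, 0.1))
```

With k classes this scheme needs k(k-1)/2 binary classifiers, which is 3 for the three hypothetical classes above.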

Predicting CoVID-19 community mortality risk using machine learning and development of an online prognostic tool

PeerJ ◽

10.7717/peerj.10083 ◽

2020 ◽

Vol 8 ◽

pp. e10083 ◽

Cited By ~ 1

Author(s):

Ashis Kumar Das ◽

Shiba Mishra ◽

Saji Saraswathy Gopalan

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Open Source ◽

Mortality Risk ◽

Machine Learning Algorithms ◽

Brier Score ◽

Gradient Boosting ◽

Support Vector ◽

Prediction Tool ◽

Online Prediction

Background: The recent pandemic of CoVID-19 has emerged as a threat to global health security. There are very few prognostic models on CoVID-19 using machine learning. Objectives: To predict mortality among confirmed CoVID-19 patients in South Korea using machine learning and deploy the best performing algorithm as an open-source online prediction tool for decision-making. Materials and Methods: Mortality for confirmed CoVID-19 patients (n = 3,524) between January 20, 2020 and May 30, 2020 was predicted using five machine learning algorithms (logistic regression, support vector machine, K nearest neighbor, random forest and gradient boosting). The performance of the algorithms was compared, and the best performing algorithm was deployed as an online prediction tool. Results: The logistic regression algorithm was the best performer in terms of discrimination (area under ROC curve = 0.830) and calibration (Matthews Correlation Coefficient = 0.433; Brier Score = 0.036). The best performing algorithm (logistic regression) was deployed as the online CoVID-19 Community Mortality Risk Prediction tool named CoCoMoRP (https://ashis-das.shinyapps.io/CoCoMoRP/). Conclusions: We describe the development and deployment of an open-source machine learning tool to predict mortality risk among CoVID-19 confirmed patients using publicly available surveillance data. This tool can be utilized by potential stakeholders such as health providers and policymakers to triage patients at the community level in addition to other approaches.
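The Matthews Correlation Coefficient reported in the results is computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula; the counts in the example are made up and do not come from the study.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for a rare outcome such as mortality.
print(matthews_corrcoef(tp=20, tn=900, fp=30, fn=50))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it stays informative when one class is much rarer than the other.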

A Novel GIS-Based Random Forest Machine Algorithm for the Spatial Prediction of Shallow Landslide Susceptibility

Forests ◽

10.3390/f11010118 ◽

2020 ◽

Vol 11 (1) ◽

pp. 118 ◽

Cited By ~ 6

Author(s):

Viet-Hung Dang ◽

Nhat-Duc Hoang ◽

Le-Mai-Duyen Nguyen ◽

Dieu Tien Bui ◽

Pijush Samui

Keyword(s):

Machine Learning ◽

Random Forest ◽

Landslide Susceptibility ◽

Spatial Prediction ◽

Shallow Landslide ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Conditioning Factors ◽

Susceptibility Modeling

This study developed and verified a new hybrid machine learning model, named random forest machine (RFM), for the spatial prediction of shallow landslides. RFM is a hybridization of two state-of-the-art machine learning algorithms, random forest classifier (RFC) and support vector machine (SVM), in which RFC is used to generate subsets from training data and SVM is used to build decision functions for these subsets. To construct and verify the hybrid RFM model, a shallow landslide database of the Lang Son area (northern Vietnam) was prepared. The database consisted of 101 shallow landslide polygons and 14 conditioning factors. The relevance of these factors for shallow landslide susceptibility modeling was assessed using the ReliefF method. Experimental results showed that the proposed RFM achieves an F1 score of roughly 0.96. The performance of the RFM was better than that of the benchmark approaches, including the SVM, RFC, and logistic regression. Thus, the newly developed RFM is a promising tool to help local authorities in shallow landslide hazard mitigation.
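The F1 score used to report performance is the harmonic mean of precision and recall. A minimal sketch of the calculation from confusion-matrix counts follows; the counts are illustrative only.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, with 0.0 returned for undefined cases."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 landslide cells found, 2 false alarms, 2 missed.
print(f1_score(tp=8, fp=2, fn=2))
```

Because true negatives do not enter the formula, the score reflects only how well the positive (landslide) class is detected.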

Spatial–Temporal Analysis of Land Cover Change at the Bento Rodrigues Dam Disaster Area Using Machine Learning Techniques

Remote Sensing ◽

10.3390/rs11212548 ◽

2019 ◽

Vol 11 (21) ◽

pp. 2548

Author(s):

Dong Luo ◽

Douglas G. Goodin ◽

Marcellus M. Caldas

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Land Cover ◽

Decision Tree ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Disaster Area ◽

Mine Sites

Disasters are an unpredictable way to change land use and land cover. Improving the accuracy of mapping a disaster area at different times is an essential step in analyzing the relationship between human activity and the environment. The goals of this study were to test the performance of different processing procedures and examine the effect of adding the normalized difference vegetation index (NDVI) as an additional classification feature for mapping land cover changes due to a disaster. Using Landsat ETM+ and OLI images of the Bento Rodrigues mine tailing disaster area, we created two datasets, one with six bands, and the other with six bands plus the NDVI. We used support vector machine (SVM) and decision tree (DT) algorithms to build classifier models and validated model performance using 10-fold cross-validation, resulting in accuracies higher than 90%. The processed results indicated that the accuracy could reach or exceed 80%, and the support vector machine performed better than the decision tree. We also calculated each land cover type's sensitivity (true positive rate) and found that Agriculture, Forest and Mine sites had higher values but Bareland and Water had lower values. Then, we visualized land cover maps in 2000 and 2017 and found that the Mine sites areas had expanded to about twice their size, while Forest decreased by 12.43%. Our findings showed that it is feasible to create a training data pool and use machine learning algorithms to classify a different year's Landsat products, and that NDVI can improve the classification of vegetation-covered land. Furthermore, this approach can provide an avenue to analyze land pattern change in a disaster area over time.
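The extra NDVI feature is derived from the near-infrared and red bands as NDVI = (NIR - Red) / (NIR + Red). The sketch below shows how a six-band pixel vector could be extended with it; the band ordering, index positions and reflectance values are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index; returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def add_ndvi_feature(pixel_bands, nir_index, red_index):
    """Append NDVI to a list of band reflectances, giving six bands plus NDVI."""
    return pixel_bands + [ndvi(pixel_bands[nir_index], pixel_bands[red_index])]

# Hypothetical reflectances for one pixel; here index 2 is red and index 3 is near-infrared.
pixel = [0.05, 0.08, 0.10, 0.50, 0.30, 0.20]
print(add_ndvi_feature(pixel, nir_index=3, red_index=2))
```

Dense vegetation reflects strongly in the near-infrared and weakly in the red, so its NDVI approaches 1, while water and bare surfaces give values near or below 0.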
