Integrating Label Uncertainty in Ultrasound Image Classification using Weighted Support Vector Machines

Abstract Inference from medical image data using machine learning still suffers from the disregard of label uncertainty. Medical images are usually labeled by multiple experts. However, the uncertainty of this training data, assessable as the unity of opinions of the observers, is neglected because training is commonly performed on binary decision labels. In this work, we present a novel method to incorporate this label uncertainty into the learning problem using weighted Support Vector Machines (wSVM). The idea is to assign an uncertainty score u to each data point. The score lies between 0 and 1 and is calculated from the unity of opinions of all observers: u = 1 if all observers have the same opinion and u = 0 if the observers’ opinions are split exactly 50/50, with linear interpolation in between. This score is integrated into the Support Vector Machine (SVM) optimization as a weighting of the errors made for the corresponding data point. For evaluation, we asked 15 observers to label 48 2D ultrasound images of aortic roots as showing either a healthy or a pathologically dilated anatomy; the ground truth was known. As the observers were not trained experts, a high diversity of opinions was present in the data set. We performed image classification using both approaches, i.e. classical SVM and wSVM with integrated uncertainty weighting, each evaluated with 10-fold cross-validation (linear kernel, C = 7). By incorporating the observer uncertainty, the classification accuracy improved by 3.1 percentage points (SVM: 83.5%, wSVM: 86.6%). This indicates that integrating information on the observers’ unity of opinions increases the generalization performance of the classifier and that uncertainty-weighted wSVM could be a promising method for machine learning in the medical domain.
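
The score described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `agreement_score` is hypothetical, and the formula u = |2p - 1| (with p the fraction of observers voting for one class) is one way to realize "1 for full agreement, 0 for a 50/50 split, linear in between"; the abstract does not give the authors' exact expression.

```python
def agreement_score(votes_positive: int, n_observers: int) -> float:
    """Return a score in [0, 1]: 1 for unanimous votes, 0 for an exact 50/50 split.

    Assumption: linear interpolation is realized as |2p - 1|, where p is the
    fraction of observers who chose the positive class.
    """
    if n_observers <= 0:
        raise ValueError("n_observers must be positive")
    if not 0 <= votes_positive <= n_observers:
        raise ValueError("votes_positive must lie between 0 and n_observers")
    p = votes_positive / n_observers
    return abs(2.0 * p - 1.0)


# Hypothetical vote counts for four images, each rated by 15 observers.
votes = [15, 12, 8, 0]
scores = [agreement_score(v, 15) for v in votes]
print(scores)
```

Such scores could then serve as per-sample weights on the error term, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`; whether the authors used that library is not stated in the abstract.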

Prediction of Heart Disease using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1081.0982s1019 ◽

2019 ◽

Vol 8 (2S10) ◽

pp. 474-477

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Support Vector Machines ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Data Set ◽

Vector Machines ◽

Naive Bayes Classification ◽

Naïve Bayes Classification

Machine learning is one of the fastest-growing fields today. Machine learning (ML) and artificial neural networks (ANN) are helpful in the detection and diagnosis of various heart diseases. Naïve Bayes classification is an important classification approach in machine learning. Heart disease comprises a range of disorders affecting the heart, including blood vessel problems, irregular heartbeat, weak heart muscle, congenital heart defects, cardiovascular disease and coronary artery disease. Coronary heart disease is a common type of heart disease. It reduces the blood flow to the heart, which can lead to a heart attack. In this paper, the UCI Machine Learning Repository data set of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines, and the accuracy with which each method classifies these patients is evaluated. The implementation is done in R.

Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning (Preprint)

10.2196/preprints.19133 ◽

2020 ◽

Author(s):

Andrea Ferrario ◽

Burcu Demiray ◽

Kristina Yordanova ◽

Minxia Luo ◽

Mike Martin

Keyword(s):

Machine Learning ◽

Older Adults ◽

Support Vector Machines ◽

Learning Strategies ◽

Support Vector ◽

Bag Of Words ◽

Word Embeddings ◽

Data Set ◽

Extreme Gradient Boosting ◽

Vector Machines

BACKGROUND Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations. OBJECTIVE The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts. METHODS The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies. 
RESULTS Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs. CONCLUSIONS This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.
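
To make the class imbalance concrete, the sketch below computes inverse-frequency class weights from the counts reported in the abstract (2214 transcripts, 109 with reminiscence). It is an illustration only: the weighting rule n_samples / (n_classes * n_class_samples) is the common "balanced" heuristic, assumed here because the abstract does not say which weighting the authors applied, and the function name is hypothetical.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: total / (number_of_classes * class_count).

    Assumption: this "balanced" heuristic stands in for the unspecified
    class-weighting scheme of the study.
    """
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs at least one sample")
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Counts taken from the abstract: 109 of 2214 transcripts contain reminiscence.
counts = {"reminiscence": 109, "other": 2214 - 109}
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this heuristic an error on a reminiscence transcript costs roughly 19 times as much as an error on any other transcript, which mirrors the ratio of the two class sizes.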

Feature Reduction for Computationally Efficient Damage State Classification Using Binary Tree Support Vector Machines

Smart Materials, Adaptive Structures and Intelligent Systems, Volume 2 ◽

10.1115/smasis2008-640 ◽

2008 ◽

Author(s):

Clyde Coelho ◽

Aditi Chattopadhyay

Keyword(s):

Support Vector Machines ◽

Binary Tree ◽

Feature Reduction ◽

Training Data ◽

Support Vector ◽

Computationally Efficient ◽

Damage State ◽

Data Set ◽

Linear Discriminant ◽

Vector Machines

This paper proposes a computationally efficient methodology for classifying damage in structural hotspots. Data collected from a sensor-instrumented lug joint subjected to fatigue loading were preprocessed using linear discriminant analysis (LDA) to extract features that are relevant for classification and to reduce the dimensionality of the data. The data are then reduced in the feature space by analyzing the structure of the mapped clusters and removing the data points that do not affect the construction of the interclass separating hyperplanes. The reduced data set is used to train a support vector machine (SVM) based classifier, and the classification results are compared with those obtained when the entire data set is used for training. To further improve the efficiency of the classification scheme, the SVM classifiers are arranged in a binary tree to reduce the number of comparisons required. The experimental results show that the data reduction does not impair the classifier's ability to distinguish between classes while providing a nearly fourfold decrease in the amount of training data processed.
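
The saving from a tree arrangement can be illustrated with a small count. The sketch below compares the number of binary classifiers evaluated per sample in a one-vs-one scheme with the depth of a balanced binary tree. Both the baseline (one-vs-one) and the assumption of a balanced tree are choices made for illustration; the abstract does not state which scheme the tree is compared against or how the tree is shaped.

```python
def one_vs_one_evaluations(n_classes: int) -> int:
    """Binary classifiers evaluated per sample when every class pair is compared."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    return n_classes * (n_classes - 1) // 2


def balanced_tree_evaluations(n_classes: int) -> int:
    """Worst-case classifiers evaluated per sample in a balanced binary tree.

    Each internal node splits the remaining classes in two, so the depth is
    ceil(log2(n_classes)), computed here with integer arithmetic.
    """
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    return (n_classes - 1).bit_length()


for k in (2, 4, 8, 16):
    print(k, one_vs_one_evaluations(k), balanced_tree_evaluations(k))
```

The gap widens quickly with the number of damage classes: pairwise comparison grows quadratically, while the tree depth grows logarithmically.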

BALANCED VS IMBALANCED TRAINING DATA: CLASSIFYING RAPIDEYE DATA WITH SUPPORT VECTOR MACHINES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xli-b7-379-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 379-384 ◽

Cited By ~ 1

Author(s):

M. Ustuner ◽

F. B. Sanli ◽

S. Abdikan

Keyword(s):

Support Vector Machines ◽

Image Classification ◽

Landscape Heterogeneity ◽

Sample Selection ◽

Training Data ◽

Support Vector ◽

Training Set ◽

Learning Stage ◽

Vector Machines ◽

Imbalanced Training Data

The accuracy of supervised image classification depends strongly on several factors, such as the design of the training set (sample selection, composition, purity and size), the resolution of the input imagery and landscape heterogeneity. The design of the training set remains a challenging issue because the sensitivity of classifier algorithms at the learning stage differs for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping crop types is addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented to classify the data. To evaluate the influence of balanced and imbalanced training data on the image classification algorithms, three different training datasets were created: two balanced datasets with 70 and 100 pixels for each class of interest, and one imbalanced dataset in which each class has a different number of pixels. Results demonstrate that the ML and ANN classifications are affected by imbalanced training data, which reduces their accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for ANN), while SVM is not significantly affected and even improves slightly (from 94.38% to 94.69%). Our results highlight that SVM is a very robust, consistent and effective classifier, as it performs very well with both balanced and imbalanced training data. Furthermore, the training stage should be designed carefully to suit the adopted classifier.

An Algebraic Approach to Clustering and Classification with Support Vector Machines

Mathematics ◽

10.3390/math10010128 ◽

2022 ◽

Vol 10 (1) ◽

pp. 128

Author(s):

Güvenç Arslan ◽

Uğur Madran ◽

Duygu Soyoğlu

Keyword(s):

Support Vector Machines ◽

Clustering Algorithm ◽

Algebraic Approach ◽

Real Data ◽

Training Data ◽

Support Vector ◽

Intermediate Step ◽

Data Set ◽

Vector Machines ◽

Clustering And Classification

In this note, we propose a novel classification approach by introducing a new clustering method, which is used as an intermediate step to discover the structure of a data set. The proposed clustering algorithm uses similarities and the concept of a clique to obtain clusters, which can be used with different strategies for classification. This approach also reduces the size of the training data set. In this study, we apply support vector machines (SVMs) after obtaining clusters with the proposed clustering algorithm. The proposed clustering algorithm is applied with different strategies for applying SVMs. The results for several real data sets show that the performance is comparable with the standard SVM while reducing the size of the training data set and also the number of support vectors.

Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning

Journal of Medical Internet Research ◽

10.2196/19133 ◽

2020 ◽

Vol 22 (9) ◽

pp. e19133 ◽

Cited By ~ 1

Author(s):

Andrea Ferrario ◽

Burcu Demiray ◽

Kristina Yordanova ◽

Minxia Luo ◽

Mike Martin

Keyword(s):

Machine Learning ◽

Older Adults ◽

Support Vector Machines ◽

Learning Strategies ◽

Support Vector ◽

Bag Of Words ◽

Word Embeddings ◽

Data Set ◽

Extreme Gradient Boosting ◽

Vector Machines

Background Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations. Objective The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts. Methods The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies. 
Results Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs. Conclusions This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.
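
As a quick consistency check on the reported figures, F1 can be recomputed as the harmonic mean of precision and recall. The sketch below does this for the two best configurations; the helper name `f1_score` is chosen for illustration. Small last-digit differences from the published values are to be expected, since the precision and recall quoted in the abstract are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract (already rounded).
reported = {
    "class-weighted SVM, bag-of-words": (0.5, 0.45),
    "SVM, SMOTE + word embeddings": (0.35, 0.59),
}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```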

A comparison study: Support vector machines for binary classification in machine learning

2011 4th International Conference on Biomedical Engineering and Informatics (BMEI) ◽

10.1109/bmei.2011.6098517 ◽

2011 ◽

Cited By ~ 4

Author(s):

Wencai Zeng ◽

Jiong Jia ◽

Zhonglong Zheng ◽

Chenmao Xie ◽

Li Guo

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Binary Classification ◽

Support Vector ◽

Comparison Study ◽

Vector Machines ◽

Study Support

A machine learning based method for classification of fractal features of forearm sEMG using Twin Support vector machines

2010 Annual International Conference of the IEEE Engineering in Medicine and Biology ◽

10.1109/iembs.2010.5627902 ◽

2010 ◽

Cited By ~ 12

Author(s):

S P Arjunan ◽

D K Kumar ◽

G R Naik

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Support Vector ◽

Twin Support Vector Machines ◽

Vector Machines

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

Abstract In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations when performing the sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets; SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, it can be applied for other cultural or social benefits, such as predicting or countering riots.
