scholarly journals Integrating Label Uncertainty in Ultrasound Image Classification using Weighted Support Vector Machines

2019 ◽  
Vol 5 (1) ◽  
pp. 285-287
Author(s):  
Jannis Hagenah ◽  
Sascha Leymann ◽  
Floris Ernst

AbstractInference from medical image data using machine learning still suffers from the disregard of label uncertainty. Usually, medical images are labeled by multiple experts. However, the uncertainty of this training data, assessible as the unity of opinions of observers, is neglected as training is commonly performed on binary decision labels. In this work, we present a novel method to incorporate this label uncertainty into the learning problem using weighted Support Vector Machines (wSVM). The idea is to assign an uncertainty score to each data point. The score is between 0 and 1 and is calculated based on the unity of opinions of all observers, where u = 1 if all observers have the same opinion and u = 0 if the observers opinions are exactly 50/50, with linear interpolation in between. This score is integrated in the Support Vector Machine (SVM) optimization as a weighting of errors made for the corresponding data point. For evaluation, we asked 15 observers to label 48 2D ultrasound images of aortic roots addressing whether the images show a healthy or a pathologically dilated anatomy, where the ground truth was known. As the observers were not trained experts, a high diversity of opinions was present in the data set. We performed image classification using both approaches, i.e. classical SVM and wSVM with integrated uncertainty weighting, utilizing 10-fold Cross Validation, respectively (linear kernel, C = 7). By incorporating the observer uncertainty, the classification accuracy could be improved by 3.1 percentage points (SVM: 83.5%, wSVM: 86.6%). This indicates that integrating information on the observers’ unity of opinions increases the generalization performance of the classifier and that uncertainty weighted wSVM could present a promising method for machine learning in the medical domain.

Machine learning is one of the fast growing aspect in current world. Machine learning (ML) and Artificial Neural Network (ANN) are helpful in detection and diagnosis of various heart diseases. Naïve Bayes Classification is a vital approach of classification in machine learning. The heart disease consists of set of range disorders affecting the heart. It includes blood vessel problems such as irregular heart beat issues, weak heart muscles, congenital heart defects, cardio vascular disease and coronary artery disease. Coronary heart disorder is a familiar type of heart disease. It reduces the blood flow to the heart leading to a heart attack. In this paper the UCI machine learning repository data set consisting of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines. The classification accuracy of the patients suffering from heart disease is predicted using Naïve Bayes classification and support vector machines. Implementation is done using R language.


2020 ◽  
Author(s):  
Andrea Ferrario ◽  
Burcu Demiray ◽  
Kristina Yordanova ◽  
Minxia Luo ◽  
Mike Martin

BACKGROUND Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations. OBJECTIVE The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts. METHODS The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies. RESULTS Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs. CONCLUSIONS This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.


Author(s):  
Clyde Coelho ◽  
Aditi Chattopadhyay

This paper proposes a computationally efficient methodology for classifying damage in structural hotspots. Data collected from a sensor instrumented lug joint subjected to fatigue loading was preprocessed using a linear discriminant analysis (LDA) to extract features that are relevant for classification and reduce the dimensionality of the data. The data is then reduced in the feature space by analyzing the structure of the mapped clusters and removing the data points that do not affect the construction of interclass separating hyperplanes. The reduced data set is used to train a support vector machines (SVM) based classifier and the results of the classification problem are compared to those when the entire data set is used for training. To further improve the efficiency of the classification scheme, the SVM classifiers are arranged in a binary tree format to reduce the number of comparisons that are necessary. The experimental results show that the data reduction does not reduce the ability of the classifier to distinguish between classes while providing a nearly fourfold decrease in the amount of training data processed.


Author(s):  
M. Ustuner ◽  
F. B. Sanli ◽  
S. Abdikan

The accuracy of supervised image classification is highly dependent upon several factors such as the design of training set (sample selection, composition, purity and size), resolution of input imagery and landscape heterogeneity. The design of training set is still a challenging issue since the sensitivity of classifier algorithm at learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping the crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. For evaluating the influence of the balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two different balanced datasets which have 70 and 100 pixels for each class of interest and one imbalanced dataset in which each class has different number of pixels were used in classification stage. Results demonstrate that ML and NN classifications are affected by imbalanced training data in resulting a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for NN) while SVM is not affected significantly (from 94.38% to 94.69%) and slightly improved. Our results highlighted that SVM is proven to be a very robust, consistent and effective classifier as it can perform very well under balanced and imbalanced training data situations. Furthermore, the training stage should be precisely and carefully designed for the need of adopted classifier.


Author(s):  
M. Ustuner ◽  
F. B. Sanli ◽  
S. Abdikan

The accuracy of supervised image classification is highly dependent upon several factors such as the design of training set (sample selection, composition, purity and size), resolution of input imagery and landscape heterogeneity. The design of training set is still a challenging issue since the sensitivity of classifier algorithm at learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping the crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. For evaluating the influence of the balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two different balanced datasets which have 70 and 100 pixels for each class of interest and one imbalanced dataset in which each class has different number of pixels were used in classification stage. Results demonstrate that ML and NN classifications are affected by imbalanced training data in resulting a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for NN) while SVM is not affected significantly (from 94.38% to 94.69%) and slightly improved. Our results highlighted that SVM is proven to be a very robust, consistent and effective classifier as it can perform very well under balanced and imbalanced training data situations. Furthermore, the training stage should be precisely and carefully designed for the need of adopted classifier.


Mathematics ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 128
Author(s):  
Güvenç Arslan ◽  
Uğur Madran ◽  
Duygu Soyoğlu

In this note, we propose a novel classification approach by introducing a new clustering method, which is used as an intermediate step to discover the structure of a data set. The proposed clustering algorithm uses similarities and the concept of a clique to obtain clusters, which can be used with different strategies for classification. This approach also reduces the size of the training data set. In this study, we apply support vector machines (SVMs) after obtaining clusters with the proposed clustering algorithm. The proposed clustering algorithm is applied with different strategies for applying SVMs. The results for several real data sets show that the performance is comparable with the standard SVM while reducing the size of the training data set and also the number of support vectors.


10.2196/19133 ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. e19133 ◽  
Author(s):  
Andrea Ferrario ◽  
Burcu Demiray ◽  
Kristina Yordanova ◽  
Minxia Luo ◽  
Mike Martin

Background Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations. Objective The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts. Methods The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies. Results Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs. Conclusions This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


Sign in / Sign up

Export Citation Format

Share Document