The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Zina Z. R. Al-Shamaa ◽  
Sefer Kurnaz ◽  
Adil Deniz Duru ◽  
Nadia Peppa ◽  
Alex H. Mirnezami ◽  
...  

Imbalanced class distribution in medical datasets is a challenging problem that hinders the correct classification of disease. It arises when the number of healthy class instances is much larger than the number of disease class instances. To address this problem, we propose undersampling the healthy class instances to improve disease class classification. The model is named Hellinger Distance Undersampling (HDUS). It employs the Hellinger distance to measure the resemblance between a majority class instance and its neighbouring minority class instances, in order to separate the classes effectively and boost the discrimination power for each class. An extensive experiment was conducted on four imbalanced medical datasets using three classifiers to compare HDUS with a baseline model and three state-of-the-art undersampling models. The results show that HDUS can perform better than the other models in terms of sensitivity, F1 measure, and balanced accuracy.
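
As a rough illustration of the quantities named in this abstract, the sketch below computes the standard Hellinger distance between two discrete probability distributions, together with sensitivity, F1 measure, and balanced accuracy from confusion-matrix counts. It is not the HDUS procedure itself: the abstract does not say how instances are turned into distributions or how neighbours are selected, so the function names and the example counts here are assumptions made only for illustration.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    Assumes p and q have the same length and each sums to 1.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def imbalance_metrics(tp, fn, fp, tn):
    """Sensitivity, F1 measure, and balanced accuracy from confusion counts.

    The disease class is treated as the positive class (an assumption).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, f1, balanced_accuracy


# Hypothetical counts: 10 disease cases among 100 instances.
sens, f1, bal_acc = imbalance_metrics(tp=8, fn=2, fp=4, tn=86)
```

Balanced accuracy averages the recall of both classes, which is why it is usually preferred over plain accuracy when the healthy class dominates.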

2017 ◽  
Vol 56 (05) ◽  
pp. 370-376 ◽  
Author(s):  
Roberto Pérez-Rodríguez ◽  
Luis E. Anido-Rifón ◽  
Marcos A. Mouriño-García

Summary. Objectives: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents, represented as vectors in spaces of Wikipedia concepts, and analyses its suitability for classification of Spanish biomedical documents when only English documents are available for training. We propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish to the English space. Methods: The performance of the classifier is compared to several baselines: a classifier based on machine translation, a classifier that represents documents after performing Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study, and it is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts. Results: The performance of our approach is superior to that of any other state-of-the-art classifier in the benchmark, with performance increases of up to 124% over classical machine translation, 332% over MetaMap, and 60 times over the classifier based on ESA. The results are statistically significant, with p-values < 0.0001. Conclusion: Using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts and translating vectors between language-specific concept spaces, a cross-language classifier can be built, and it performs better than several state-of-the-art classifiers.
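
The idea of converting a concept vector from the Spanish to the English space through interlanguage links can be sketched as a dictionary lookup. This is only a minimal sketch under stated assumptions: the sparse-dictionary representation, the function name, the sample link table, and the choice to drop concepts without a link are all hypothetical and are not taken from the paper.

```python
def map_concept_vector(source_vector, interlanguage_links):
    """Map a sparse concept vector from one language space to another.

    source_vector: dict of {source_concept: weight}.
    interlanguage_links: dict of {source_concept: target_concept}.
    Concepts with no link are dropped (an assumption); weights of
    concepts that map to the same target are summed.
    """
    target_vector = {}
    for concept, weight in source_vector.items():
        target = interlanguage_links.get(concept)
        if target is None:
            continue
        target_vector[target] = target_vector.get(target, 0.0) + weight
    return target_vector


# Hypothetical link table and document vector, for illustration only.
links_es_to_en = {"Corazón": "Heart", "Diabetes mellitus": "Diabetes"}
spanish_doc = {"Corazón": 0.7, "Diabetes mellitus": 0.2, "Concepto sin enlace": 0.1}
english_doc = map_concept_vector(spanish_doc, links_es_to_en)
```

Once the Spanish document is expressed in the English concept space, a classifier trained only on English documents can in principle be applied to it directly.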


Author(s):  
Scott Blunsden ◽  
Robert Fisher

This chapter presents a way to classify interactions between people. Examples of the interactions we investigate are people meeting one another, walking together, and fighting. A new feature set is proposed along with a corresponding classification method. Results are presented which show the new method performing significantly better than the previous state-of-the-art method proposed by Oliver et al. (2000).


Author(s):  
Li Rui ◽  
Zheng Shunyi ◽  
Duan Chenxi ◽  
Yang Yang ◽  
Wang Xiqi

In recent years, Hyperspectral Image (HSI) classification has attracted growing attention from researchers. An important question is how to use the rich spectral and spatial information of HSIs to its fullest potential. To capture spectral and spatial features, we propose a Double-Branch Dual-Attention mechanism network (DBDA) for HSI classification. Two branches are designed to extract spectral and spatial features separately, reducing the interference between these two kinds of features. Moreover, because the two branches have distinct characteristics, a different type of attention mechanism is applied in each branch, so that spectral and spatial features are exploited more discriminatively. Finally, the extracted features are fused for classification. A series of empirical studies has been conducted on four hyperspectral datasets, and the results show that the proposed method performs better than the state-of-the-art method.


Plants ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1319
Author(s):  
Muhammad Hammad Saleem ◽  
Johan Potgieter ◽  
Khalid Mahmood Arif

Recently, plant disease classification has been performed by various state-of-the-art deep learning (DL) architectures on publicly available or author-generated datasets. This research presents a deep learning-based comparative evaluation for the classification of plant disease in two steps. First, the best convolutional neural network (CNN) was identified through a comparative analysis of well-known CNN architectures, along with modified and cascaded/hybrid versions of some DL models proposed in recent research. Second, an attempt was made to improve the performance of the best model by training it with various deep learning optimizers. The comparison between CNNs was based on performance metrics such as validation accuracy/loss, F1-score, and the required number of epochs. All the selected DL architectures were trained on the PlantVillage dataset, which contains 26 different diseases belonging to 14 plant species. Keras with the TensorFlow backend was used to train the deep learning architectures. It is concluded that the Xception architecture trained with the Adam optimizer attained the highest validation accuracy and F1-score, 99.81% and 0.9978 respectively, which is better than previous approaches and demonstrates the novelty of the work. Therefore, the method proposed in this research can be applied to other agricultural applications for transparent detection and classification purposes.


2021 ◽  
Author(s):  
Sravya Sravya ◽  
Andriy Miranskyy ◽  
Ayse Bener

Software bug localization involves a significant amount of time and effort on the part of the software developer. Many state-of-the-art bug localization models have been proposed to help developers localize bugs easily. However, none of these models meets the adoption thresholds of software practitioners. Recently, some deep learning-based models have been proposed that have been shown to perform better than the state-of-the-art models. With this motivation, we experiment with Convolutional Neural Networks (CNNs) to examine their effectiveness in localizing bugs. We also train a SimpleLogistic model as a baseline for our experiments. We train both models on five open-source Java projects and compare their performance across the projects. Our experiments show that the CNN models perform better than the SimpleLogistic models in most cases, but do not meet the adoption criteria set by practitioners.


2020 ◽  
Vol 17 (1) ◽  
pp. 94-104
Author(s):  
Antonio F. Mottese ◽  
Maria R. Fede ◽  
Francesco Caridi ◽  
Giuseppe Sabatino ◽  
Giuseppe Marcianò ◽  
...  

Background and Objectives: In this work, yellow and green varieties of Cucumis melo fruits belonging to different cultivars were studied. In detail, three Sicilian cultivars of winter melons protected by TAP (Traditional Agro-alimentary Products) labels were considered, and the Calabrian winter melon was also studied as an unprotected cultivar. To compare the selective uptake of inorganic elements between winter and summer fruits, the “PGI Melone Mantovano” was investigated. The purpose of this work was to apply the obtained results i) to guarantee the quality and healthiness of the fruits, ii) to defend producers, and iii) to help customers purchase safe food. Method: All samples were analyzed by ICP-MS, and the results were subsequently subjected to Cluster analysis (CA), Principal component analysis (PCA) and Canonical discriminant analysis (CDA). Results: The CA results were generally in agreement with the origin of the samples, whereas the PCA confirmed the presence of a strong relation between fruit origin and trace element content. In particular, two principal components explained 57.32% of the total variance (PC1 = 40.95%, PC2 = 16.37%). Finally, the CDA approach provided several functions with high discrimination power, confirmed by the correct classification of all samples (100%). Conclusions: CA, PCA and CDA could represent an integrated tool, alongside labels, to discriminate the origin of agri-food products and, thus, protect and guarantee their healthiness.
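
The share of total variance attributed to each principal component can be computed as sketched below. This is a minimal NumPy sketch, not the authors' analysis: the data are synthetic random numbers standing in for element concentrations, and the function name is hypothetical. The last lines only re-add the two percentages reported in the abstract.

```python
import numpy as np


def explained_variance_ratio(data):
    """Fraction of total variance carried by each principal component.

    data: 2-D array with samples in rows and variables in columns.
    Components are returned in decreasing order of variance.
    """
    centered = data - data.mean(axis=0)
    # Singular values of the centered data are returned in descending
    # order; their squares are proportional to the component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in data: 30 samples, 5 variables with different spreads.
rng = np.random.default_rng(0)
samples = rng.normal(size=(30, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
ratios = explained_variance_ratio(samples)
first_two_share = ratios[:2].sum()

# Consistency check on the figures reported in the abstract (PC1 + PC2).
reported_total = round(40.95 + 16.37, 2)
```

In practice variables measured on very different scales are often standardized before PCA; whether that was done here is not stated in the abstract.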


Author(s):  
M. Selvi ◽  
K. Thangaramya ◽  
M. S. Saranya ◽  
K. Kulothungan ◽  
S. Ganapathy ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2503
Author(s):  
Taro Suzuki ◽  
Yoshiharu Amano

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in a global navigation satellite system (GNSS). We use GNSS signal correlation output, which is the most primitive GNSS signal processing output, to detect NLOS multipath based on machine learning. The shape of the multi-correlator outputs is distorted by NLOS multipath, and features of this shape are used to discriminate NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. In addition, we propose an automated method of collecting LOS and NLOS training data for machine learning. The evaluation of the proposed NLOS detection method in an urban environment confirmed that the NN was better than the SVM, and 97.7% of NLOS signals were correctly discriminated.


2021 ◽  
Vol 13 (9) ◽  
pp. 1623
Author(s):  
João E. Batista ◽  
Ana I. R. Cabral ◽  
Maria J. P. Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
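
For context, the three spectral indices used as a comparison baseline are all normalized differences of two bands. The sketch below uses commonly cited formulations and is only illustrative: it does not reproduce the M3GP hyperfeatures, the abstract does not state which NDWI variant was used (the green/NIR form is assumed here), and the band values and the zero-denominator convention are assumptions.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b); 0.0 when both bands are zero (a convention)."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index, green/NIR form (assumed variant)."""
    return normalized_difference(green, nir)


def nbr(nir, swir):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return normalized_difference(nir, swir)


# Hypothetical surface reflectance values for a single pixel.
pixel = {"green": 0.08, "red": 0.06, "nir": 0.45, "swir": 0.20}
features = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "NDWI": ndwi(pixel["green"], pixel["nir"]),
    "NBR": nbr(pixel["nir"], pixel["swir"]),
}
```

Each index is a fixed, hand-designed combination of two bands, whereas evolved hyperfeatures can combine any of the available bands.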

