data classification
Recently Published Documents


TOTAL DOCUMENTS

2947
(FIVE YEARS 1001)

H-INDEX

58
(FIVE YEARS 15)

2022 ◽  
pp. 016555152110695
Author(s):  
Ahmed Hamed ◽  
Mohamed Tahoun ◽  
Hamed Nassar

The original K-nearest neighbour (KNN) algorithm was meant to classify homogeneous complete data, that is, data with only numerical features whose values exist completely. Thus, it faces problems when used with heterogeneous incomplete (HI) data, which also has categorical features and is plagued with missing values. Many solutions have been proposed over the years but most have pitfalls. For example, some solve heterogeneity by converting categorical features into numerical ones, inflicting structural damage. Others solve incompleteness by imputation or elimination, causing semantic disturbance. Almost all use the same K for all query objects, leading to misclassification. In the present work, we introduce KNNHI, a KNN-based algorithm for HI data classification that avoids all these pitfalls. Leveraging rough set theory, KNNHI preserves both categorical and numerical features, leaves missing values untouched and uses a different K for each query. The end result is an accurate classifier, as demonstrated by extensive experimentation on nine datasets mostly from the University of California Irvine repository, using a 10-fold cross-validation technique. We show that KNNHI outperforms six recently published KNN-based algorithms, in terms of precision, recall, accuracy and F-Score. In addition to its function as a mighty classifier, KNNHI can also serve as a K calculator, helping KNN-based algorithms that use a single K value for all queries to find the best such value. Sure enough, we show how four such algorithms improve their performance using the K obtained by KNNHI. Finally, KNNHI exhibits impressive resilience to the degree of incompleteness, degree of heterogeneity and the metric used to measure distance.
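To make the problem setting concrete, the following is a minimal sketch of a KNN vote over records that mix numerical and categorical features and contain missing values. It is not KNNHI: it uses no rough set theory and a fixed K. The names `hetero_distance`, `knn_predict` and `numeric_ranges`, and the use of `None` to mark a missing value, are assumptions made for illustration only.

```python
from collections import Counter
import math


def hetero_distance(a, b, numeric_ranges):
    """Distance over the features that both records actually have.

    numeric_ranges maps the index of each numerical feature to its value
    range (hypothetical input); every other index is treated as categorical.
    None marks a missing value and is skipped rather than imputed.
    """
    total, used = 0.0, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # missing on either side: ignore this feature
        if i in numeric_ranges:
            span = numeric_ranges[i] or 1.0  # guard against a zero range
            d = abs(x - y) / span            # range-normalised difference
        else:
            d = 0.0 if x == y else 1.0       # overlap distance for categories
        total += d * d
        used += 1
    if used == 0:
        return math.inf  # no shared features: treat as maximally distant
    return math.sqrt(total / used)


def knn_predict(query, records, labels, numeric_ranges, k=3):
    """Majority vote among the k records closest to the query."""
    ranked = sorted(
        zip(records, labels),
        key=lambda pair: hetero_distance(query, pair[0], numeric_ranges),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


records = [(1.0, "red"), (1.2, None), (9.0, "blue"), (None, "blue")]
labels = ["a", "a", "b", "b"]
ranges = {0: 8.0}  # feature 0 is numerical with range 8.0

print(knn_predict((1.1, "red"), records, labels, ranges, k=3))
print(knn_predict((None, "blue"), records, labels, ranges, k=3))
```

Skipping unavailable features keeps categorical values as categories and avoids imputation, which are two of the pitfalls the abstract names; choosing a different K per query, as KNNHI does, is outside the scope of this sketch.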


2022 ◽  
pp. 1-29
Author(s):  
Yancheng Lv ◽  
Lin Lin ◽  
Jie Liu ◽  
Hao Guo ◽  
Changsheng Tong

Most of the research on machine learning classification methods is based on balanced data; the research on imbalanced data classification needs improvement. Generative adversarial networks (GANs) are able to learn high-dimensional complex data distributions without relying on a prior hypothesis, and have become a hot technology in artificial intelligence. In this letter, we propose a new structure, classroom-like generative adversarial networks (CLGANs), to construct a model with multiple generators. Taking inspiration from the fact that teachers arrange teaching activities according to students' learning situation, we propose a weight allocation function to adaptively adjust the influence weight of each generator's loss function on the discriminator's loss function. All the generators work together to improve the degree of discriminator and training sample space, so that a discriminator with excellent performance is trained and applied to the tasks of imbalanced data classification. Experimental results on the Case Western Reserve University data set and 2.4 GHz Indoor Channel Measurements data set show that the data classification ability of the discriminator trained by CLGANs with multiple generators is superior to that of other imbalanced data classification models, and the optimal discriminator can be obtained by selecting the right matching scheme of the generator models.


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 476
Author(s):  
S. Manimurugan ◽  
Saad Almutairi ◽  
Majed Mohammed Aborokbah ◽  
C. Narmatha ◽  
Subramaniam Ganesan ◽  
...  

Internet of Things (IoT) technology has recently been applied in healthcare systems as an Internet of Medical Things (IoMT) to collect sensor information for the diagnosis and prognosis of heart disease. The main objective of the proposed research is to classify data and predict heart disease using medical data and medical images. The proposed model is a medical data classification and prediction model that operates in two stages. If the result from the first stage is efficient in predicting heart disease, there is no need for stage two. In the first stage, data gathered from medical sensors affixed to the patient’s body were classified; then, in stage two, echocardiogram image classification was performed for heart disease prediction. A hybrid linear discriminant analysis with the modified ant lion optimization (HLDA-MALO) technique was used for sensor data classification, while a hybrid Faster R-CNN with SE-ResNet-101 model was used for echocardiogram image classification. Both classification methods were carried out, and the classification findings were consolidated and validated to predict heart disease. The HLDA-MALO method obtained 96.85% accuracy in detecting normal sensor data, and 98.31% accuracy in detecting abnormal sensor data. The proposed hybrid Faster R-CNN with SE-ResNeXt-101 transfer learning model performed better in classifying echocardiogram images, with 98.06% precision, 98.95% recall, 96.32% specificity, a 99.02% F-score, and maximum accuracy of 99.15%.
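The two-stage control flow described above can be sketched as a simple cascade. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how stage one is judged sufficient, so the confidence `threshold` is hypothetical, and `stage_one` and `stage_two` are stand-in callables rather than HLDA-MALO or Faster R-CNN models.

```python
from typing import Any, Callable, Tuple

# A prediction is (label, confidence in [0, 1]); this shape is an assumption.
Prediction = Tuple[str, float]


def two_stage_predict(
    sensor_data: Any,
    image: Any,
    stage_one: Callable[[Any], Prediction],
    stage_two: Callable[[Any], Prediction],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> Tuple[str, float, int]:
    """Return (label, confidence, stage_used)."""
    label, confidence = stage_one(sensor_data)
    if confidence >= threshold:
        # Stage one is considered sufficient, so the image model is skipped.
        return label, confidence, 1
    # Otherwise fall back to the image classifier.
    label, confidence = stage_two(image)
    return label, confidence, 2


# Stand-in classifiers that return fixed answers, for demonstration only.
confident_sensor_model = lambda data: ("abnormal", 0.97)
unsure_sensor_model = lambda data: ("normal", 0.55)
image_model = lambda img: ("abnormal", 0.93)

print(two_stage_predict([72, 118], "echo.png", confident_sensor_model, image_model))
print(two_stage_predict([72, 118], "echo.png", unsure_sensor_model, image_model))
```

Because the second stage runs only when the first is inconclusive, the more expensive image model is skipped for clear-cut sensor readings.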


2022 ◽  
Vol 14 (1) ◽  
pp. 217
Author(s):  
Bishwas Praveen ◽  
Vineetha Menon

Hyperspectral remote sensing presents a unique big data research paradigm through its rich information captured across hundreds of spectral bands, which embodies vital spatial and temporal information about the underlying land cover. Deep-learning-based hyperspectral data analysis methodologies have made significant advancements over the past few years. Despite their success, most deep learning frameworks for hyperspectral data classification tend to suffer in terms of computational and classification efficacy as the data size increases. This is largely because they place equal emphasis on all of the rich spectral information present in the data, although not all of it is essential for hyperspectral data analysis. On the contrary, the redundant information present in the spectral bands can degrade the performance of hyperspectral data analysis techniques. Therefore, in this work, we propose a novel bidirectional spectral attention mechanism, which is computationally efficient and capable of adaptive spectral information diversification through selective emphasis on spectral bands that carry more information and suppression of those that carry less. The concept of 3D-convolutions in tandem with bidirectional long short-term memory (LSTM) is used in the proposed architecture as the spectral attention mechanism. A feedforward neural network (FNN)-based supervised classification is then performed to validate the performance of our proposed approach. Experimental results reveal that the proposed hyperspectral data analysis model with spectral attention mechanism outperforms the other spatial- and spectral-information-extraction-based hyperspectral data analysis techniques compared.
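The idea of emphasising informative bands and suppressing the rest can be illustrated with a much simpler stand-in than the proposed architecture. The sketch below does not use 3D convolutions or a bidirectional LSTM; it scores each band by its variance across pixels (an assumption chosen only for illustration), turns the scores into weights with a softmax, and rescales every spectrum. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def band_variances(pixels):
    """Population variance of each spectral band across all pixels."""
    count = len(pixels)
    variances = []
    for band in range(len(pixels[0])):
        values = [pixel[band] for pixel in pixels]
        mean = sum(values) / count
        variances.append(sum((v - mean) ** 2 for v in values) / count)
    return variances


def apply_band_attention(pixels):
    """Weight every band by a softmax over the per-band variance scores."""
    weights = softmax(band_variances(pixels))
    weighted = [[v * w for v, w in zip(pixel, weights)] for pixel in pixels]
    return weighted, weights


# Two pixels with three bands each; only the middle band varies.
pixels = [[0.0, 1.0, 5.0], [0.0, 3.0, 5.0]]
weighted, weights = apply_band_attention(pixels)
print(weights)
```

In a learned attention mechanism the scores would come from trained layers rather than a fixed statistic, but the reweighting step has the same shape: one weight per band, applied to every spectrum.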


2022 ◽  
pp. 147-172
Author(s):  
Saumendra Kumar Mohapatra ◽  
Mihir Narayan Mohanty
