An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
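
The O(N²) memory cost mentioned above can be made concrete with a little arithmetic. The sketch below is illustrative only: the point count, the landmark count, and the assumption that a landmark-based method stores only landmark-to-point distances are all hypothetical and are not taken from the article.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold a dense rows x cols matrix (8 bytes = one float64)."""
    return rows * cols * bytes_per_entry


n_points = 100_000    # hypothetical number of data points N
n_landmarks = 1_000   # hypothetical number of landmarks m

# Full N x N similarity matrix versus an m x N landmark-to-point matrix.
full = dense_matrix_bytes(n_points, n_points)
landmark = dense_matrix_bytes(n_landmarks, n_points)

print(f"full N x N matrix:     {full / 1e9:.1f} GB")
print(f"landmark m x N matrix: {landmark / 1e9:.1f} GB")
```

Under these assumed sizes the landmark matrix is smaller by a factor of N/m, which is the kind of saving that motivates sampling landmarks in the first place.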

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽ 10.3390/diagnostics9030104 ◽ 2019 ◽ Vol 9 (3) ◽ pp. 104 ◽ Cited By ~ 11

Author(s): Ahmed ◽ Yigit ◽ Isik ◽ Alpkocak

Keyword(s): Machine Learning ◽ Data Augmentation ◽ Nearest Neighbor ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Training Data ◽ Support Vector ◽ K Nearest Neighbor ◽ Data Set ◽ Leukemia Data

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two more subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. Our experiments showed that the CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
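
The 5-fold cross-validation protocol mentioned in the abstract can be sketched as an index split. This is a minimal illustration, not the authors' evaluation code; the function name `k_fold_indices`, the sample count, and the fixed seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, test) index pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(n_samples=23, k=5)  # 23 is an arbitrary toy size
for fold_no, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} training samples, {len(test)} test samples")
```

A model would be trained on each `train` list and scored on the matching `test` list, with the five scores averaged. If augmentation is used, it would normally be applied to the training portion only, so that transformed copies of a test image do not leak into training.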

Scattering Transform Framework for Unmixing of Hyperspectral Data

Remote Sensing ◽ 10.3390/rs11232868 ◽ 2019 ◽ Vol 11 (23) ◽ pp. 2868 ◽ Cited By ~ 1

Author(s): Zeng ◽ Ritz ◽ Zhao ◽ Lan

Keyword(s): Nearest Neighbor ◽ Hyperspectral Image ◽ Three Dimensional ◽ Synthetic Data ◽ Low Frequency ◽ Hyperspectral Data ◽ Training Data ◽ K Nearest Neighbor ◽ Scattering Transform ◽ Scattering Transforms

The scattering transform, which applies multiple convolutions using known filters targeting different scales of time or frequency, has a strong similarity to the structure of convolutional neural networks (CNNs), without requiring training to learn the convolution filters, and has been used for hyperspectral image classification in recent research. This paper investigates the application of the scattering transform framework to hyperspectral unmixing (STFHU). While state-of-the-art research on unmixing hyperspectral data utilizing scattering transforms is limited, the proposed end-to-end method applies pixel-based scattering transforms and preliminary three-dimensional (3D) scattering transforms to hyperspectral images in the remote sensing scenario to extract feature vectors, which are then used to train a regression model based on the k-nearest neighbor (k-NN) to estimate the abundance maps of endmembers. Experiments compare the performance of the proposed algorithm with a series of existing methods in quantitative terms based on both synthetic data and real-world hyperspectral datasets. Results indicate that the proposed approach is more robust to additive noise, which is suppressed by utilizing the rich information in both high-frequency and low-frequency components represented by the scattering transform. Furthermore, the proposed method achieves higher unmixing accuracy than all compared approaches when using the same amount of training data, while achieving performance equivalent to the best-performing CNN method with much less training data.
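
The regression step described above, k-NN applied to feature vectors to estimate abundances, can be sketched in a few lines. This is a generic k-NN regressor under stated assumptions, not the STFHU implementation: the two-dimensional toy features stand in for scattering-transform features, and the function name and data are hypothetical.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Average the target vectors of the k training points closest to query."""
    # Rank training points by Euclidean distance to the query feature vector.
    ranked = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    nearest = [i for _, i in ranked[:k]]
    n_targets = len(train_y[0])
    return [sum(train_y[i][t] for i in nearest) / len(nearest)
            for t in range(n_targets)]


# Toy feature vectors (stand-ins for scattering features) and, for each one,
# the abundance fractions of two hypothetical endmembers.
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
train_y = [(1.0, 0.0), (0.8, 0.2), (0.0, 1.0), (0.2, 0.8)]

estimate = knn_regress(train_x, train_y, query=(0.05, 0.0), k=2)
print(estimate)
```

Because the estimate is a plain average of training abundance vectors, it inherits their sum-to-one property whenever every training vector sums to one.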

Learning from Imbalanced Multi-label Data Sets by Using Ensemble Strategies

Computer Engineering and Applications Journal ◽ 10.18495/comengapp.v4i1.109 ◽ 2015 ◽ Vol 4 (1) ◽ pp. 61-81

Author(s): Mohammad Masoud Javidi

Keyword(s): Logistic Regression ◽ Ensemble Learning ◽ Nearest Neighbor ◽ Imbalanced Data ◽ Classification Performance ◽ Training Data ◽ Data Sets ◽ K Nearest Neighbor ◽ Data Set ◽ Stable Algorithm

Multi-label classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Problems of this type are ubiquitous in everyday life. For example, a movie can be categorized as action, crime, and thriller. Most multi-label classification algorithms are designed for balanced data and don't work well on imbalanced data. In real applications, however, most datasets are imbalanced. Therefore, we focus on improving multi-label classification performance on imbalanced datasets. In this paper, a state-of-the-art multi-label classification algorithm called IBLR_ML is employed. This algorithm is a combination of the k-nearest neighbor and logistic regression algorithms. The logistic regression part of this algorithm is combined with two ensemble learning algorithms, Bagging and Boosting. The resulting approach is called IB-ELR. In this paper, for the first time, the ensemble bagging method with stable learning as the base learner and imbalanced data sets as the training data is examined. Finally, to evaluate the proposed methods, they are implemented in Java. Experimental results show the effectiveness of the proposed methods. Keywords: Multi-label classification, Imbalanced data set, Ensemble learning, Stable algorithm, Logistic regression, Bagging, Boosting

GRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprs-archives-xlii-4-w4-91-2017 ◽ 2017 ◽ Vol XLII-4/W4 ◽ pp. 91-96

Author(s): N. Jamshidpour ◽ S. Homayouni ◽ A. Safari

Keyword(s): Image Classification ◽ Supervised Classification ◽ Spatial Information ◽ Hyperspectral Image ◽ Main Idea ◽ Hyperspectral Data ◽ Training Data ◽ Data Sets ◽ Hyperspectral Image Classification ◽ Data Set

Hyperspectral image classification has been one of the most popular research areas in the remote sensing community in the past decades. However, there are still some problems that need specific attention. For example, the lack of enough labeled samples and the high dimensionality problem are two of the most important issues, and they degrade the performance of supervised classification dramatically. The main idea of semi-supervised learning is to overcome these issues by drawing on unlabeled samples, which are available in enormous amounts. In this paper, we propose a graph-based semi-supervised classification method, which uses both spectral and spatial information for hyperspectral image classification. More specifically, two graphs were designed and constructed in order to exploit the relationship among pixels in spectral and spatial spaces respectively. Then, the Laplacians of both graphs were merged to form a weighted joint graph. The experiments were carried out on two different benchmark hyperspectral data sets. The proposed method performed significantly better than well-known supervised classification methods, such as SVM. The assessments consisted of both accuracy and homogeneity analyses of the produced classification maps. The proposed spectral-spatial SSL method considerably increased the classification accuracy when the labeled training data set was very scarce. When there were only five labeled samples for each class, the performance improved by 5.92% and 10.76% compared to spatial graph-based SSL, for the AVIRIS Indian Pine and Pavia University data sets respectively.
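
The merging of two graph Laplacians into a weighted joint graph can be illustrated as follows. The abstract does not state the merging rule, so the convex combination with a mixing weight `alpha` is an assumption made here for illustration, as are the unnormalized Laplacian and the three-pixel toy graphs.

```python
def laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [[(degrees[i] if i == j else 0.0) - weights[i][j] for j in range(n)]
            for i in range(n)]


def joint_laplacian(l_spectral, l_spatial, alpha=0.5):
    """Assumed merging rule: alpha * L_spectral + (1 - alpha) * L_spatial."""
    n = len(l_spectral)
    return [[alpha * l_spectral[i][j] + (1.0 - alpha) * l_spatial[i][j]
             for j in range(n)] for i in range(n)]


# Toy graphs over three pixels: spectral similarity and spatial adjacency.
w_spectral = [[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]]
w_spatial = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]

l_joint = joint_laplacian(laplacian(w_spectral), laplacian(w_spatial), alpha=0.7)
for row in l_joint:
    print([round(value, 3) for value in row])
```

Any weighted sum of Laplacians is again symmetric with zero row sums, so the joint matrix can be used wherever a single-graph Laplacian would be.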

Comparison of Support Vector Machine and Random Forest Algorithms for Invasive and Expansive Species Classification Using Airborne Hyperspectral Data

Remote Sensing ◽ 10.3390/rs12030516 ◽ 2020 ◽ Vol 12 (3) ◽ pp. 516 ◽ Cited By ~ 7

Author(s): Anita Sabat-Tomala ◽ Edwin Raczko ◽ Bogdan Zagajewski

Keyword(s): Support Vector Machine ◽ Random Forest ◽ Hyperspectral Data ◽ Machine Learning Algorithms ◽ Training Data ◽ Support Vector ◽ Data Sets ◽ Natural Ecosystem ◽ Data Set ◽ Calamagrostis Epigejos

Invasive and expansive plant species are considered a threat to natural biodiversity because of their high adaptability and low habitat requirements. Species investigated in this research, including Solidago spp., Calamagrostis epigejos, and Rubus spp., are successfully displacing native vegetation and claiming new areas, which in turn severely decreases natural ecosystem richness, as they rapidly encroach on protected areas (e.g., Natura 2000 habitats). Because of the damage caused, the European Union (EU) has committed all its member countries to monitor biodiversity. In this paper we compared two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), to identify Solidago spp., Calamagrostis epigejos, and Rubus spp. on HySpex hyperspectral aerial images. SVM and RF are reliable and well-known classifiers that achieve satisfactory results in the literature. Data sets containing 30, 50, 100, 200, and 300 pixels per class in the training data set were used to train SVM and RF classifiers. The classifications were performed on 430 spectral bands and on the most informative 30 bands extracted using the Minimum Noise Fraction (MNF) transformation. As a result, maps of the spatial distribution of the analyzed species were obtained; high accuracies were observed for all data sets and classifiers (an average F1 score above 0.78). The highest accuracies were obtained using 30 MNF bands and 300 sample pixels per class in the training data set (average F1 score > 0.9). Smaller training data sets resulted in lower average F1 scores, by up to 13 percentage points in the case of 30-pixel samples per class.
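
The average F1 score reported above can be computed per class and then averaged. The sketch below assumes an unweighted mean over classes (macro averaging), which the abstract does not specify; the labels and predictions are made-up toy data, not results from the study.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 = 2TP / (2TP + FP + FN) for one class treated as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (an assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Toy reference labels and classifier output for six pixels.
y_true = ["solidago", "solidago", "rubus", "rubus", "calamagrostis", "calamagrostis"]
y_pred = ["solidago", "rubus", "rubus", "rubus", "calamagrostis", "solidago"]

print(f"average F1: {average_f1(y_true, y_pred):.3f}")
```

A drop of 13 percentage points in this measure means, for example, going from an average F1 of 0.90 to 0.77.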

Combined Clustering Methods for Microarray Data Analysis

Advanced Engineering Forum ◽ 10.4028/www.scientific.net/aef.8-9.508 ◽ 2013 ◽ Vol 8-9 ◽ pp. 508-515

Author(s): Raul Malutan ◽ Pedro Gómez Vilda ◽ Monica Borda

Keyword(s): Supervised Classification ◽ Nearest Neighbor ◽ Training Data ◽ Microarray Data Analysis ◽ Support Vector ◽ Data Sets ◽ Clustering Methods ◽ K Nearest Neighbor ◽ Gene Shaving

Data classification plays an important role in analyzing high-dimensional data. In this paper, the Gene Shaving algorithm was used for an initial supervised classification; once the cluster information was obtained, the data were classified again with supervised algorithms such as Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN) for an optimal clustering. These algorithms have proven to be useful when the classes of the training data and the attributes of each class are well established. The algorithms were run on several data sets, and the quality of the obtained clusters was observed to depend on the number of clusters specified.

Determination of Reactivity Ratios from Binary Copolymerization Using the k-Nearest Neighbor Non-Parametric Regression

Polymers ◽ 10.3390/polym13213811 ◽ 2021 ◽ Vol 13 (21) ◽ pp. 3811

Author(s): Iosif Sorin Fazakas-Anca ◽ Arina Modrea ◽ Sorin Vlase

Keyword(s): Experimental Data ◽ Nearest Neighbor ◽ Optimization Method ◽ Reactivity Ratios ◽ Data Sets ◽ K Nearest Neighbor ◽ Integration Algorithm ◽ Data Set ◽ Parametric Regression ◽ Non Parametric

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method involves a numerical integration algorithm and an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets, at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, the copolymerization of isoprene with glycidyl methacrylate for medium conversion and the copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. The possibility of estimating experimental errors from a single experimental data set of n data points is also shown.

Android Malware Detection using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.b1011.0982s1219 ◽ 2020 ◽ Vol 8 (2S12) ◽ pp. 65-70

Keyword(s): Machine Learning ◽ Nearest Neighbor ◽ Machine Learning Algorithms ◽ Training Data ◽ Machine Learning Techniques ◽ Support Vector ◽ K Nearest Neighbor ◽ User Interest ◽ Android Malware ◽ Android Malware Detection

Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products that we may be looking for. This technology focuses on taking objects as image input to find new observations or show items based on user interest. The main topic here is supervised machine learning, in which the computer learns from the input (training) data and predicts results based on experience. We also discuss the machine learning algorithms Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to the Malgenome and Drebin Android malware datasets. Android is an operating system that is gaining popularity, and with the rise in demand for these devices comes a rise in Android malware. The traditional techniques used to detect malware were unable to detect unknown applications. We have run these datasets through different machine learning classifiers and recorded the results. The experimental results provide a comparative analysis based on performance, accuracy, and cost.

Detection and Characterization of Physical Activity and Psychological Stress from Wristband Data

Signals ◽ 10.3390/signals1020011 ◽ 2020 ◽ Vol 1 (2) ◽ pp. 188-208

Author(s): Mert Sevil ◽ Mudassir Rashid ◽ Mohammad Reza Askari ◽ Zacharie Maloney ◽ Iman Hajizadeh ◽ ...

Keyword(s): Physical Activity ◽ Signal Processing ◽ Feature Extraction ◽ Psychological Stress ◽ Training Data ◽ Support Vector ◽ K Nearest Neighbor ◽ Data Set ◽ Linear Discriminant ◽ Physiological Variables

Wearable devices continuously measure multiple physiological variables to inform users of health and behavior indicators. The computed health indicators must rely on informative signals obtained by processing the raw physiological variables with powerful noise- and artifacts-filtering algorithms. In this study, we aimed to elucidate the effects of signal processing techniques on the accuracy of detecting and discriminating physical activity (PA) and acute psychological stress (APS) using physiological measurements (blood volume pulse, heart rate, skin temperature, galvanic skin response, and accelerometer) collected from a wristband. Data from 207 experiments involving 24 subjects were used to develop signal processing, feature extraction, and machine learning (ML) algorithms that can detect and discriminate PA and APS when they occur individually or concurrently, classify different types of PA and APS, and estimate energy expenditure (EE). Training data were used to generate feature variables from the physiological variables and develop ML models (naïve Bayes, decision tree, k-nearest neighbor, linear discriminant, ensemble learning, and support vector machine). Results from an independent labeled testing data set demonstrate that PA was detected and classified with an accuracy of 99.3%, and APS was detected and classified with an accuracy of 92.7%, whereas the simultaneous occurrences of both PA and APS were detected and classified with an accuracy of 89.9% (relative to actual class labels), and EE was estimated with a low mean absolute error of 0.02 metabolic equivalent of task (MET). The data filtering and adaptive noise cancellation techniques used to mitigate the effects of noise and artifacts on the classification results increased the detection and discrimination accuracy by 0.7% and 3.0% for PA and APS, respectively, and by 18% for EE estimation. The results demonstrate that the physiological measurements from wristband devices are susceptible to noise and artifacts, and elucidate the effects of signal processing and feature extraction on the accuracy of detection, classification, and estimation of PA and APS.

Feature Selection Algorithm for Hyperlipidemia Classification

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.701-702.110 ◽ 2014 ◽ Vol 701-702 ◽ pp. 110-113

Author(s): Qi Rui Zhang ◽ He Xian Wang ◽ Jiang Wei Qin

Keyword(s): Feature Selection ◽ Nearest Neighbor ◽ Information Gain ◽ Classification Systems ◽ Support Vector ◽ K Nearest Neighbor ◽ Data Set ◽ Document Frequency ◽ Selection Algorithms ◽ Term Weights

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three methods of feature selection were evaluated: document frequency (DF), information gain (IG) and a χ² statistic (CHI). The classification systems use a vector to represent a document and use tfidfie (term frequency, inverted document frequency, and inverted entropy) to compute term weights. In order to compare the effectiveness of feature selection, we used three classification methods: Naïve Bayes (NB), k Nearest Neighbor (kNN) and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for the task of large text classification.
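
Two of the feature selection scores named above, document frequency and the χ² statistic, can be sketched for a single term. The χ² formula used here is the common 2×2 contingency form from the text categorization literature; whether the paper uses exactly this variant is an assumption, and the toy documents and labels are invented for illustration.

```python
def document_frequency(docs, term):
    """Number of documents that contain the term."""
    return sum(term in tokens for tokens in docs)


def chi_square(docs, labels, term, cls):
    """Chi-square score of a term for one class from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1      # term present, document in class
        elif has_term:
            b += 1      # term present, document in another class
        elif in_class:
            c += 1      # term absent, document in class
        else:
            d += 1      # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


# Toy corpus: each document is a set of tokens with a class label.
docs = [{"cholesterol", "high"}, {"cholesterol", "diet"},
        {"diet", "exercise"}, {"exercise", "sleep"}]
labels = ["pos", "pos", "neg", "neg"]

for term in ("cholesterol", "diet"):
    print(term, document_frequency(docs, term), chi_square(docs, labels, term, "pos"))
```

In this toy corpus both terms have the same document frequency, yet only one of them separates the classes, which illustrates why a class-aware score such as CHI or IG can rank features differently from DF.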
