Analysis of Wheat Samples Using the Calculation of Multifractal Spectrum

The computational analysis of wheat images to identify wheat varieties and quality has wide applications in agriculture and production. This paper presents an approach to the analysis and classification of images of wheat samples obtained by the method of crystallization with additives. The tests used 3 concentrations and 4 times for each concentration, so that each type of wheat was characterized by 12 images. We used the images obtained for 5 classes. All the images have similar visual characteristics, which makes it difficult to use statistical methods of analysis. The multifractal spectrum obtained by calculating the local density function was used as a classifying feature. The classification was performed on a set of 60 wheat images corresponding to 5 different samples (classes) by various machine learning methods such as linear regression, naive Bayesian classifier, support vector machine, and random forest. In some cases, principal component analysis was applied to reduce the dimension of the feature space. To identify the relationships between wheat samples obtained at different concentrations, 3 different clustering methods were used. The classification results showed that using the multifractal spectrum as the classifying feature, together with the random forest method in combination with principal component analysis, allows identification of wheat samples obtained by crystallization with additives; the highest average classification accuracy is 74%.
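The abstract does not give the formula for the local density function. One common formulation, assumed here, treats the summed intensity in a window around a pixel as a measure mu and estimates the exponent d(x) in mu(B(x, r)) ~ r^d(x) from a log-log fit. The sketch below is illustrative only: the function name, the square windows, and the radii are hypothetical choices, and the step from d(x) to the full multifractal spectrum (fractal dimensions of the level sets of d) is not shown.

```python
import math

def local_density_exponent(image, row, col, radii=(1, 2, 3, 4)):
    """Estimate the local density exponent d(x) at one pixel.

    Assumes mu(B(x, r)) ~ r**d(x), where mu is the sum of intensities in a
    square window of side 2r + 1 centred on the pixel. The estimate is the
    least-squares slope of log(mu) against log(2r + 1). The caller must make
    sure every window fits inside the image and has a positive total.
    """
    xs, ys = [], []
    for r in radii:
        mass = sum(
            image[i][j]
            for i in range(row - r, row + r + 1)
            for j in range(col - r, col + r + 1)
        )
        xs.append(math.log(2 * r + 1))
        ys.append(math.log(mass))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical input: a uniform 11x11 image, whose window mass grows as the
# window area, so the fitted exponent equals the plane dimension.
flat = [[1.0] * 11 for _ in range(11)]
exponent = local_density_exponent(flat, 5, 5)
```

On a textured image the exponent varies from pixel to pixel, and it is this variation that a multifractal description summarizes.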

Extraction of Arecanut Planting Distribution Based on the Feature Space Optimization of PlanetScope Imagery

Agriculture ◽

10.3390/agriculture11040371 ◽

2021 ◽

Vol 11 (4) ◽

pp. 371

Author(s):

Yu Jin ◽

Jiawei Guo ◽

Huichun Ye ◽

Jinling Zhao ◽

Wenjiang Huang ◽

...

Keyword(s):

Random Forest ◽

Satellite Imagery ◽

Feature Space ◽

Kappa Coefficient ◽

Classification Model ◽

Support Vector ◽

Textural Feature ◽

Monitoring Accuracy ◽

Areca Catechu ◽

High Level

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of arecanut planting area and the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were determined as 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization was improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification model. The kappa coefficient also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. Furthermore, the RF is proven to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
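Overall accuracy and the kappa coefficient reported above are both derived from a confusion matrix. As a minimal sketch, assuming Cohen's kappa in its standard form (observed agreement corrected for chance agreement), the calculation can be written as follows. The function name and the 2x2 matrix are hypothetical and are not data from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class example (arecanut vs. other), 100 validation pixels.
example = [[45, 5],
           [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(example)
```

Because kappa discounts agreement expected by chance, it is always lower than the overall accuracy for an imperfect classifier, which matches the pattern of the figures quoted in the abstract.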

A Methodology Based on FT-IR Data Combined with Random Forest Model to Generate Spectralprints for the Characterization of High-Quality Vinegars

Foods ◽

10.3390/foods10061411 ◽

2021 ◽

Vol 10 (6) ◽

pp. 1411

Author(s):

José Luis P. Calle ◽

Marta Ferreiro-González ◽

Ana Ruiz-Rodríguez ◽

Gerardo F. Barbero ◽

José Á. Álvarez ◽

...

Keyword(s):

Random Forest ◽

Raw Materials ◽

Principal Component ◽

Hierarchical Cluster ◽

Raw Material ◽

Support Vector ◽

Protected Designation Of Origin ◽

Ft Ir

Sherry wine vinegar is a Spanish gourmet product under Protected Designation of Origin (PDO). Before a vinegar can be labeled as Sherry vinegar, the product must meet certain requirements as established by its PDO, which, in this case, means that it has been produced following the traditional solera and criadera ageing system. The quality of the vinegar is determined by many factors such as the raw material, the acetification process or the aging system. For this reason, mainly producers, but also consumers, would benefit from the employment of effective analytical tools that allow precisely determining the origin and quality of vinegar. In the present study, a total of 48 Sherry vinegar samples manufactured from three different starting wines (Palomino Fino, Moscatel, and Pedro Ximénez wine) were analyzed by Fourier-transform infrared (FT-IR) spectroscopy. The spectroscopic data were combined with unsupervised exploratory techniques such as hierarchical cluster analysis (HCA) and principal component analysis (PCA), as well as other nonparametric supervised techniques, namely, support vector machine (SVM) and random forest (RF), for the characterization of the samples. The HCA and PCA results present a clear grouping trend of the vinegar samples according to their raw materials. SVM in combination with leave-one-out cross-validation (LOOCV) successfully classified 100% of the samples, according to the type of wine used for their production. The RF method allowed selecting the most important variables to develop the characteristic fingerprint (“spectralprint”) of the vinegar samples according to their starting wine. Furthermore, the RF model reached 100% accuracy for both LOOCV and out-of-bag (OOB) sets.
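Leave-one-out cross-validation (LOOCV) holds out each sample in turn, trains on the rest, and reports the fraction of held-out samples predicted correctly. The sketch below illustrates the procedure only: it swaps in a simple nearest-centroid classifier rather than the SVM or RF used in the study, and the two-feature "spectra" and labels are made-up placeholders.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest to the sample."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1

    def sq_dist(label):
        return sum(
            (s / counts[label] - v) ** 2 for s, v in zip(sums[label], sample)
        )

    return min(sums, key=sq_dist)

def loocv_accuracy(xs, ys):
    """Hold out each sample once and score the prediction made without it."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

# Hypothetical, well-separated data standing in for FT-IR features.
spectra = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
           [0.90, 0.10], [0.80, 0.20], [0.85, 0.15]]
wines = ["A", "A", "A", "B", "B", "B"]
score = loocv_accuracy(spectra, wines)
```

With only 48 samples, as in the study, LOOCV uses almost all the data for training in every round, which is why it is a common choice for small spectroscopic datasets.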

Multiscale Supervised Classification of Point Clouds with Urban and Forest Applications

Sensors ◽

10.3390/s19204523 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4523 ◽

Cited By ~ 1

Author(s):

Carlos Cabo ◽

Celestino Ordóñez ◽

Fernando Sánchez-Lasheras ◽

Javier Roca-Pardiñas ◽

and Javier de Cos-Juez

Keyword(s):

Random Forest ◽

Laser Scanning ◽

Supervised Classification ◽

Computing Time ◽

Principal Component ◽

Point Clouds ◽

Support Vector ◽

Linear Discriminant ◽

Vector Machines ◽

Input Variables

We analyze the utility of multiscale supervised classification algorithms for object detection and extraction from laser scanning or photogrammetric point clouds. Only the geometric information (the point coordinates) was considered, thus making the method independent of the systems used to collect the data. A maximum of five features (input variables) was used, four of them related to the eigenvalues obtained from a principal component analysis (PCA). PCA was carried out at six scales, defined by the diameter of a sphere around each observation. Four multiclass supervised classification models were tested (linear discriminant analysis, logistic regression, support vector machines, and random forest) in two different scenarios, urban and forest, formed by artificial and natural objects, respectively. The results obtained were accurate (overall accuracy over 80% for the urban dataset, and over 93% for the forest dataset), in the range of the best results found in the literature, regardless of the classification method. For both datasets, the random forest algorithm provided the best solution/results when discrimination capacity, computing time, and the ability to estimate the relative importance of each variable are considered together.

Learning Target Class Feature Subspace (LTC-FS) Using Eigenspace Analysis and N-ary Search-Based Autonomous Hyperparameter Tuning for OCSVM

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421510150 ◽

2021 ◽

Author(s):

Sanjay Kumar Sonbhadra ◽

Sonali Agarwal ◽

P. Nagabhushan

Keyword(s):

Principal Component ◽

Feature Space ◽

Support Vector ◽

Feature Subset ◽

Target Class ◽

Significant Information ◽

Feature Extraction Method ◽

Specificity And Sensitivity ◽

Feature Subspace ◽

Novel Target

Existing dimensionality reduction (DR) techniques such as principal component analysis (PCA) and its variants are not suitable for target class mining because they neglect the unique statistical properties of class-of-interest (CoI) samples. Conventionally, these approaches utilize higher or lower eigenvalued principal components (PCs) for data transformation; however, the higher eigenvalued PCs may split the target class, whereas lower eigenvalued PCs do not contribute significant information, and a wrong selection of PCs leads to performance degradation. Considering these facts, the present research offers a novel target class-guided feature extraction method. In this approach, eigendecomposition is first performed on the variance–covariance matrix of only the target class samples; the eigenvectors with the highest and lowest eigenvalues are rejected via statistical analysis, and the selected eigenvectors are used to extract the most promising feature subspace. The extracted feature subset gives a tighter description of the CoI, with enhanced associativity among target class samples, and ensures strong separation from nontarget class samples. A one-class support vector machine (OCSVM) is used to validate the performance of the learned features. To obtain optimized values of the OCSVM hyperparameters, a novel N-ary search-based autonomous method is also proposed. Exhaustive experiments with a wide variety of datasets are performed in feature space (original and reduced) and eigenspace (obtained from original and reduced features) to validate the performance of the proposed approach in terms of accuracy, precision, specificity and sensitivity.

Weed recognition by SVM texture feature classification in outdoor vegetable crops images

Ingeniería e Investigación ◽

10.15446/ing.investig.v37n1.54703 ◽

2017 ◽

Vol 37 (1) ◽

pp. 68 ◽

Cited By ~ 13

Author(s):

Camilo Pulido Rojas ◽

Leonardo Solaque Guzmán ◽

Nelson Velasco Toledo

Keyword(s):

Scale Parameter ◽

Texture Feature ◽

Principal Component ◽

Feature Space ◽

Support Vector ◽

Gray Level ◽

Vegetable Crops ◽

Nonlinear Case ◽

Classifier Performance ◽

Weed Recognition

This paper presents a classification system for weeds and vegetables from outdoor crop images. The classifier is based on support vector machine (SVM) with its extension to nonlinear case using radial basis function (RBF) and optimizing its scale parameter σ to smooth the decision boundary. The feature space is the result of principal component analysis (PCA) for 10 texture measurements calculated from gray level co-occurrence matrices (GLCM). The results indicate that classifier performance is above 90%, validated with specificity, sensitivity and precision calculations.
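A gray level co-occurrence matrix (GLCM) counts how often a pixel with gray level i has a neighbor with gray level j at a fixed offset; texture measurements are then computed from the normalized matrix. The sketch below is a minimal illustration under stated assumptions: the abstract does not list its 10 measurements, so contrast and energy are shown only as typical examples, and the function names, the one-pixel horizontal offset, and the 4-level patch are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized (non-symmetric) co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose neighbor lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]

def contrast(p):
    """Weights each co-occurrence by the squared gray level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))

def energy(p):
    """Sum of squared probabilities; high for uniform textures."""
    return sum(v * v for row in p for v in row)

# Hypothetical 4x4 patch quantized to 4 gray levels.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(patch, levels=4)
texture_features = [contrast(p), energy(p)]
```

A feature vector of such measurements, computed per image region, is the kind of input that PCA would then compress before the SVM is trained.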

Arrhythmia Classification Based on Multiple Features Fusion and Random Forest Using ECG

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2019.2798 ◽

2019 ◽

Vol 9 (8) ◽

pp. 1645-1654

Author(s):

Zhizhong Wang ◽

Hongyi Li ◽

Chuang Han ◽

Songwei Wang ◽

Li Shi

Keyword(s):

Random Forest ◽

Wavelet Packet ◽

Back Propagation ◽

Principal Component ◽

Support Vector ◽

Features Fusion ◽

Specificity And Sensitivity ◽

Average Accuracy ◽

Skewness Coefficient ◽

Novel Method

Cardiovascular diseases have become increasingly prominent in recent years and have proven to be a major threat to people's health. Accurate detection of arrhythmia in patients has important implications for clinical treatment. The aim of this study was to propose a novel automatic classification method for arrhythmia in order to improve classification accuracy. The electrocardiogram (ECG) signal was first denoised using a wavelet transform. Then, the local and global characteristics of the beat were extracted and fused: RR interval features in accordance with the clinical diagnosis criterion, morphology features based on wavelet packet decomposition, and statistical features (kurtosis coefficient, skewness coefficient and variance). Meanwhile, the dimensionality of the wavelet packet coefficients was reduced via principal component analysis (PCA). Finally, these features were used as the input of the random forest classifier to train the model, which was then compared with support vector machine (SVM) and back propagation (BP) neural network classifiers. Two classification schemes, namely the intra-patient and inter-patient schemes, were validated. Based on 100,647 beats from the MIT-BIH database, the proposed method achieved an average accuracy, specificity and sensitivity of 99.08%, 99.00% and 89.31%, respectively, using the intra-patient beats, and 92.31%, 89.98% and 37.47%, respectively, using the inter-patient beats. Compared with the other methods referred to in this paper, the novel method yielded better results.
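The inter-patient figures combine high accuracy with low sensitivity, a pattern that typically arises when abnormal beats are rare. The sketch below shows how the three metrics are computed from binary confusion counts and reproduces that pattern with made-up numbers. The counts are hypothetical, and the paper reports averages over several beat classes, which this two-class example does not attempt to replicate.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from binary confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
    }

# Hypothetical imbalanced case: 100 abnormal beats among 1000 in total.
m = binary_metrics(tp=30, fn=70, fp=20, tn=880)
```

Here accuracy is 0.91 even though only 30% of the abnormal beats are found, which is why sensitivity and specificity are reported alongside accuracy.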

Somatic Cells Recognition by Application of Gabor Feature-Based (2D)2PCA

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417570099 ◽

2017 ◽

Vol 31 (12) ◽

pp. 1757009 ◽

Cited By ~ 2

Author(s):

Xiaojing Gao ◽

Heru Xue ◽

Xin Pan ◽

Xinhua Jiang ◽

Yanqing Zhou ◽

...

Keyword(s):

Gabor Filter ◽

Somatic Cells ◽

Bovine Mastitis ◽

Principal Component ◽

Feature Space ◽

Support Vector ◽

Large Set ◽

Novel Approach ◽

Gabor Feature ◽

Feature Based

In this paper, we propose a novel approach to somatic cell recognition that applies bi-directional two-dimensional principal component analysis ((2D)2PCA) to Gabor features. First, Gabor features at different orientations and scales are extracted by convolution with a Gabor filter bank. Second, the dimensionality of the feature space is reduced by applying (2D)2PCA in both the row and column directions. Finally, a support vector machine (SVM) is used as the classifier. The experimental results are obtained using a large set of images from different sources. The proposed method is not only accurate and fast but also robust to illumination in bovine mastitis detection via optical microscopy.

Discrimination between Alzheimer's Disease and Mild Cognitive Impairment Using SOM and PSO-SVM

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/253670 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 11

Author(s):

Shih-Ting Yang ◽

Jiann-Der Lee ◽

Tzyh-Chyang Chang ◽

Chung-Hsien Huang ◽

Jiun-Jie Wang ◽

...

Keyword(s):

A Priori ◽

Principal Component ◽

Feature Space ◽

Support Vector ◽

Patient Classification ◽

Self Organizing Map ◽

Multiple Features ◽

Classification Framework ◽

Volumetric Features ◽

Processing Steps

In this study, an MRI-based classification framework was proposed to distinguish the patients with AD and MCI from normal participants by using multiple features and different classifiers. First, we extracted features (volume and shape) from MRI data by using a series of image processing steps. Subsequently, we applied principal component analysis (PCA) to convert a set of features of possibly correlated variables into a smaller set of values of linearly uncorrelated variables, decreasing the dimensions of feature space. Finally, we developed a novel data mining framework in combination with support vector machine (SVM) and particle swarm optimization (PSO) for the AD/MCI classification. In order to compare the hybrid method with traditional classifier, two kinds of classifiers, that is, SVM and a self-organizing map (SOM), were trained for patient classification. With the proposed framework, the classification accuracy is improved up to 82.35% and 77.78% in patients with AD and MCI. The result achieved up to 94.12% and 88.89% in AD and MCI by combining the volumetric features and shape features and using PCA. The present results suggest that novel multivariate methods of pattern matching reach a clinically relevant accuracy for the a priori prediction of the progression from MCI to AD.

Towards a software defect proneness model: feature selection

Applied Aspects of Information Technology ◽

10.15276/aait.04.2021.5 ◽

2021 ◽

Vol 4 (4) ◽

pp. 354-365

Author(s):

Vitaliy S. Yakovyna ◽

Ivan I. Symets

Keyword(s):

Principal Component Analysis ◽

Feature Selection ◽

Random Forest ◽

Software Reliability ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Tree Classifier ◽

Code Metrics ◽

Software Code

This article is focused on improving static models of software reliability by using machine learning methods to select the software code metrics that most strongly affect reliability. The study used a merged dataset from the PROMISE Software Engineering repository, which contained data on testing software modules of five programs and twenty-one code metrics. For the prepared sample, the most important features that affect the quality of software code were selected using the following feature selection methods: Boruta, Stepwise selection, Exhaustive Feature Selection, Random Forest Importance, LightGBM Importance, Genetic Algorithms, Principal Component Analysis, and Xverse (Python). Based on voting over the results of the feature selection methods, a static (deterministic) model of software reliability was built, which establishes the relationship between the probability of a defect in a software module and the metrics of its code. It is shown that this model includes such code metrics as branch count of a program, McCabe’s lines of code and cyclomatic complexity, and Halstead’s total number of operators and operands, intelligence, volume, and effort value. The effectiveness of the different feature selection methods was compared, in particular by studying the effect of the feature selection method on classification accuracy using the following classifiers: Random Forest, Support Vector Machine, k-Nearest Neighbors, Decision Tree classifier, AdaBoost classifier, and Gradient Boosting for classification. It is shown that the use of any feature selection method increases classification accuracy by at least ten percent compared to the original dataset, which confirms the importance of this procedure for predicting software defects based on metric datasets that contain a significant number of highly correlated software code metrics. It was found that the best forecast accuracy for most classifiers was reached using the set of features obtained from the proposed static model of software reliability. In addition, it is shown that separate methods, such as Autoencoder, Exhaustive Feature Selection and Principal Component Analysis, can also be used with an insignificant loss of classification and prediction accuracy.
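Voting over feature selection methods can be sketched as counting, for each metric, how many methods selected it and keeping those above a threshold. The abstract does not state the voting rule, so the threshold below is an assumption, and the function name, method names, and metric names are hypothetical placeholders rather than the study's actual outputs.

```python
from collections import Counter

def vote_features(selections, min_votes):
    """Keep features chosen by at least min_votes selection methods.

    selections maps a method name to the list of features it selected.
    """
    # set() ensures a method cannot vote twice for the same feature.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)

# Hypothetical outputs of three selection methods.
selections = {
    "boruta": ["branch_count", "loc", "cyclomatic_complexity"],
    "stepwise": ["loc", "halstead_volume"],
    "rf_importance": ["branch_count", "loc", "halstead_volume"],
}
kept = vote_features(selections, min_votes=2)
```

Raising `min_votes` yields a smaller, more conservative metric set; lowering it approaches the union of all methods.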

Reliable Identification of Oolong Tea Species: Nondestructive Testing Classification Based on Fluorescence Hyperspectral Technology and Machine Learning

Agriculture ◽

10.3390/agriculture11111106 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1106

Author(s):

Yan Hu ◽

Lijia Xu ◽

Peng Huang ◽

Xiong Luo ◽

Peng Wang ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Principal Component ◽

Classification Model ◽

Recursive Feature Elimination ◽

Support Vector ◽

K Nearest Neighbor ◽

Oolong Tea ◽

The Impact ◽

T Distribution

A rapid and nondestructive tea classification method is of great significance in today’s research. This study uses fluorescence hyperspectral technology and machine learning to distinguish Oolong tea by analyzing the spectral features of tea in the wavelength range from 475 to 1100 nm. The spectral data are preprocessed by multiplicative scatter correction (MSC) and standard normal variate (SNV), which can effectively reduce the impact of baseline drift and tilt. Then principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are adopted for feature dimensionality reduction and visual display. Random Forest-Recursive Feature Elimination (RF-RFE) is used for feature selection. Decision Tree (DT), Random Forest Classification (RFC), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are used to establish the classification model. The results show that MSC-RF-RFE-SVM is the best model for the classification of Oolong tea, with an accuracy of 100% on the training set and 98.73% on the test set. It can be concluded that fluorescence hyperspectral technology and machine learning are feasible for classifying Oolong tea.
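SNV preprocessing standardizes each spectrum on its own: subtract the spectrum's mean and divide by its standard deviation, so that an additive offset or a multiplicative scaling of the whole spectrum cancels out. The sketch below assumes the usual sample standard deviation (n - 1 denominator); the function name and the three-point spectra are hypothetical.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own stats."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

# Hypothetical spectrum and a copy with a gain of 3 and an offset of 10,
# mimicking a scatter-induced scaling and baseline shift.
raw = [2.0, 4.0, 6.0]
shifted = [v * 3 + 10 for v in raw]
corrected_raw = snv(raw)
corrected_shifted = snv(shifted)
```

Both inputs map to the same corrected spectrum, which is the property that makes SNV useful before PCA or a classifier.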
