Investigation of Algorithms for Converting Dimension of Feature Space in Retail Data Analysis Problems

Author(s):  
Nikita V. Popov ◽  
Natalya V. Razmochaeva ◽  
Dmitry M. Klionskiy
Author(s):  
Ricardo G. Villar ◽  
Jigg L. Pelayo ◽  
Ray Mari N. Mozo ◽  
James B. Salig Jr. ◽  
Jojemar Bantugan

Building on results from the Central Mindanao University Phil-LiDAR 2.B.11 Image Processing Component, this paper applies Light Detection and Ranging (LiDAR) derived products to produce quality land-cover classification, using data analysis principles to minimize common problems in image classification. These problems are the misclassification of objects and the ambiguous interpretation of pixelated features, which confuses object classes that closely resemble each other spectrally; unbalanced saturation of RGB information is a further challenge. Only low-density LiDAR point cloud data, at 2 pts/m², is used in the research, yielding essential derived information such as textures and matrices (number of returns, intensity textures, nDSM, etc.) for establishing the conditions for feature selection. A novel approach is presented that combines object-based image analysis with the principle of allometric relation between two or more observables, which are aggregated for each dataset acquisition to establish a proportionality function for data partitioning. To separate two or more data sets into distinct regions of a feature space of distributions, non-trivial distribution-fitting computations were employed to formulate the ideal hyperplane. Once the distributions were computed, allometric relations were evaluated and matched with the necessary rotation, scaling and transformation techniques to find applicable border conditions. Thus, a customized hybrid feature was developed and embedded in every object class feature to serve as a classifier, with a hierarchical clustering strategy employed for cross-examining and filtering features. These features are boosted using machine learning algorithms as trainable sets of information for more competent feature detection.
The resulting classification was compared with one based on a conventional object-oriented approach using the straightforward functionalities of the eCognition software. A marked improvement in overall accuracy (74.4% to 93.4%) and kappa index of agreement (70.5% to 91.7%) is noticeable based on the initial process. Nevertheless, a low-density LiDAR dataset could be enough to generate a substantial increase in classification accuracy.
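
The two agreement figures quoted in this abstract, overall accuracy and the kappa index of agreement, are both computed from a confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name and the 3-class matrix are hypothetical and are not data from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 3-class confusion matrix (illustration only).
example = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 5, 39],
]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is why the abstract reports both.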


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1316
Author(s):  
Kuiyong Song ◽  
Lianke Zhou ◽  
Hongbin Wang

Vigilance estimation of drivers is an active research area in traffic safety. Wearable devices can monitor information regarding the driver’s state in real time, which is then analyzed by a data analysis model to provide an estimation of vigilance. The accuracy of the data analysis model directly affects the quality of vigilance estimation. In this paper, we propose a deep coupling recurrent auto-encoder (DCRA) that combines electroencephalography (EEG) and electrooculography (EOG). This model uses a coupling layer to connect two single-modal auto-encoders and construct a joint objective loss function, which consists of a single-modal loss and a multi-modal loss. The single-modal loss is measured by Euclidean distance, and the multi-modal loss is measured by a Mahalanobis distance obtained through metric learning, so that the distance between data of different modalities can be described more accurately in the new feature space based on the metric matrix. In order to ensure gradient stability in the long sequence learning process, a multi-layer gated recurrent unit (GRU) auto-encoder model was adopted. The DCRA integrates data feature extraction and feature fusion. Comparative experiments show that the DCRA outperforms the single-modal method and the latest multi-modal fusion methods, with a lower root mean square error (RMSE) and a higher Pearson correlation coefficient (PCC).
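
The combination of a Euclidean single-modal loss and a Mahalanobis multi-modal loss described in this abstract can be illustrated with a small NumPy sketch. This is not the DCRA implementation: the function names, the factorization of the metric matrix as M = LᵀL, the weight `lam`, and the simple additive form of the joint loss are all assumptions made for illustration.

```python
import numpy as np


def euclidean_sq(x, x_rec):
    """Squared Euclidean distance between an input and its reconstruction."""
    return float(np.sum((x - x_rec) ** 2))


def mahalanobis_sq(a, b, M):
    """Squared Mahalanobis distance (a - b)^T M (a - b) under metric matrix M."""
    d = a - b
    return float(d @ M @ d)


def joint_loss(eeg, eeg_rec, eog, eog_rec, h_eeg, h_eog, L, lam=1.0):
    """Hypothetical joint objective: single-modal plus weighted multi-modal loss.

    L is a factor of the metric matrix (assumption); M = L^T L is positive
    semi-definite by construction, so the distance is never negative.
    """
    M = L.T @ L
    single = euclidean_sq(eeg, eeg_rec) + euclidean_sq(eog, eog_rec)
    multi = mahalanobis_sq(h_eeg, h_eog, M)
    return single + lam * multi


# With the identity as metric matrix, Mahalanobis reduces to Euclidean.
a = np.array([1.0, 2.0])
b = np.zeros(2)
print(mahalanobis_sq(a, b, np.eye(2)))  # 1^2 + 2^2 = 5.0
```

In a trainable model the entries of `L` would be learned together with the auto-encoder weights; here they are fixed inputs.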


Author(s):  
Seyyed Ali Ahmadi ◽  
Nasser Mehrshad ◽  
Seyyed Mohammad Razavi

Containing hundreds of spectral bands (features), hyperspectral images (HSIs) are highly capable of discriminating land cover classes. Traditional HSI data processing methods assign the same importance to all bands in the original feature space (OFS), although different spectral bands play different roles in identifying samples of different classes. In order to explore the relative importance of each feature, we learn a weighting matrix and obtain the relative weighted feature space (RWFS) as an enriched feature space for HSI data analysis. To overcome the difficulty of limited labeled samples, which is a common case in HSI data analysis, we extend our method to a semisupervised framework. To transfer available knowledge to unlabeled samples, we employ graph-based clustering in which low rank representation (LRR) is used to define the similarity function for the graph. After constructing the RWFS, any dimension reduction method and classification algorithm can be employed in it. The experimental results on two well-known HSI data sets show that some dimension reduction algorithms perform better in the new weighted feature space.
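
The idea of weighting the bands first and then applying an arbitrary dimension reduction method can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the paper learns its weighting matrix, whereas `W` here is a hand-set diagonal matrix, and PCA (computed via SVD) stands in for "any dimension reduction method". All names and values are hypothetical.

```python
import numpy as np


def to_weighted_space(X, W):
    """Map samples (rows of X) from the original to the weighted feature space."""
    return X @ W


def pca_reduce(X, n_components):
    """Project X onto its top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Synthetic stand-in for 20 pixels with 6 spectral bands (not real HSI data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))

# Hypothetical band weights; a zero weight removes a band entirely.
weights = np.array([1.0, 0.5, 0.0, 2.0, 1.0, 0.1])
W = np.diag(weights)

X_w = to_weighted_space(X, W)
Z = pca_reduce(X_w, n_components=2)
print(Z.shape)  # (20, 2)
```

Because the reduction step only sees `X_w`, swapping PCA for another reducer requires no change to the weighting step.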


2020 ◽  
Vol 52 (3) ◽  
pp. 2583-2605
Author(s):  
Ludwig Lausser ◽  
Lisa M. Schäfer ◽  
Silke D. Kühlwein ◽  
Angelika M. R. Kestler ◽  
Hans A. Kestler

Ordinal classifier cascades are constrained by a hypothesised order of the semantic class labels of a dataset. This order determines the overall structure of the decision regions in feature space. Assuming the correct order on these class labels will allow a high generalisation performance, while an incorrect one will lead to diminished results. In this way, ordinal classifier systems can facilitate explorative data analysis by allowing screening for potential candidate orders of the class labels. Previously, we have shown that screening is possible for total orders of all class labels. However, as datasets might comprise samples of ordinal as well as non-ordinal classes, the assumption of a total ordering might not be appropriate. An analysis of subsets of classes is required to detect such hidden ordinal substructures. In this work, we devise a novel screening procedure for exhaustive evaluations of all order permutations of all subsets of classes by bounding the number of enumerations we have to examine. Experiments with multi-class data from diverse applications revealed ordinal substructures that generate new relations and support known ones.
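
To give a sense of the search space, the candidate orders over all subsets of classes can be enumerated by brute force. The sketch below does only that; it does not implement the bounding procedure the abstract refers to. The function names are hypothetical, and the choice to start at subsets of size 2 (since an order on a single class is trivial) is an assumption.

```python
from itertools import combinations, permutations
from math import perm


def candidate_orders(classes, min_size=2):
    """Yield every ordering of every subset of classes with at least min_size elements."""
    for size in range(min_size, len(classes) + 1):
        for subset in combinations(classes, size):
            yield from permutations(subset)


def count_candidate_orders(n_classes, min_size=2):
    """Count the orderings without enumerating them: sum of n! / (n - k)! over k."""
    return sum(perm(n_classes, size) for size in range(min_size, n_classes + 1))


orders = list(candidate_orders(["A", "B", "C"]))
print(len(orders))  # 6 orders of pairs + 6 orders of the full set = 12

for n in (3, 5, 8, 10):
    print(n, count_candidate_orders(n))
```

The count grows faster than n!, which is why an exhaustive screen needs some way of limiting the number of cascades that are actually evaluated.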


Author(s):  
Muhammad Amjad

Advances in manifold learning have proven to be of great benefit in reducing the dimensionality of large complex datasets. Elements in an intricate dataset will typically belong in high-dimensional space as the number of individual features or independent variables will be extensive. However, these elements can be integrated into a low-dimensional manifold with well-defined parameters. By constructing a low-dimensional manifold and embedding it into high-dimensional feature space, the dataset can be simplified for easier interpretation. In spite of this elemental dimensionality reduction, the dataset’s constituents do not lose any information, but rather filter it with the hopes of elucidating the appropriate knowledge. This paper will explore the importance of this method of data analysis, its applications, and its extensions into topological data analysis.


2020 ◽  
Vol 5 (1) ◽  
pp. 1-10
Author(s):  
Ting Xie ◽  
Ruihua Liu ◽  
Zhengyuan Wei

Clustering, a fundamental unsupervised learning technique, is considered an important method of data analysis, and K-means is demonstrably the most popular clustering algorithm. In this paper, we consider clustering on feature space to address the low efficiency of K-means in Big Data clustering. Unlike traditional methods, the algorithm guarantees consistent clustering accuracy before and after dimension reduction, accelerates K-means when the clustering centers and distance functions satisfy certain conditions, matches the preprocessing step completely with the clustering step, and improves both efficiency and accuracy. Experimental results have demonstrated the effectiveness of the proposed algorithm.
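
The general pattern of reducing dimension first and running K-means afterwards can be sketched as below. This is not the algorithm proposed in the paper, whose reduction step and conditions are not given in the abstract: PCA is used here as a stand-in, the initialisation is a simple farthest-first heuristic, and all names and the synthetic data are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X onto its top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def farthest_first_centers(X, k):
    """Deterministic initialisation: start at X[0], then add the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)


def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations; returns the last assignment and the centers."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Three well-separated synthetic blobs in 10 dimensions (illustration only).
rng = np.random.default_rng(0)
means = 20.0 * np.eye(3, 10)
X = np.vstack([m + rng.normal(size=(30, 10)) for m in means])

labels_full, _ = kmeans(X, k=3)
labels_low, _ = kmeans(pca_project(X, 2), k=3)
```

On data this cleanly separated, clustering in the 2-dimensional projection recovers the same three groups as clustering in the full space, while each distance computation touches 2 coordinates instead of 10.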

