Unsupervised random forest for affinity estimation

Yunai Yi; Diya Sun; Peixin Li; Tae-Kyun Kim; Tianmin Xu; Yuru Pei

doi:10.1007/s41095-021-0241-9

Unsupervised random forest for affinity estimation

Computational Visual Media ◽

10.1007/s41095-021-0241-9 ◽

2021 ◽

Vol 8 (2) ◽

pp. 257-272

Author(s):

Yunai Yi ◽

Diya Sun ◽

Peixin Li ◽

Tae-Kyun Kim ◽

Tianmin Xu ◽

...

Keyword(s):

Random Forest ◽

Decision Trees ◽

State Of The Art ◽

High Dimensional ◽

Spatial Relationships ◽

Rank Deficiency ◽

Node Splitting ◽

The Common ◽

Cluster Compactness ◽

Parent Node

AbstractThis paper presents an unsupervised clustering random-forest-based metric for affinity estimation in large and high-dimensional data. The criterion used for node splitting during forest construction can handle rank-deficiency when measuring cluster compactness. The binary forest-based metric is extended to continuous metrics by exploiting both the common traversal path and the smallest shared parent node.The proposed forest-based metric efficiently estimates affinity by passing down data pairs in the forest using a limited number of decision trees. A pseudo-leaf-splitting (PLS) algorithm is introduced to account for spatial relationships, which regularizes affinity measures and overcomes inconsistent leaf assign-ments. The random-forest-based metric with PLS facilitates the establishment of consistent and point-wise correspondences. The proposed method has been applied to automatic phrase recognition using color and depth videos and point-wise correspondence. Extensive experiments demonstrate the effectiveness of the proposed method in affinity estimation in a comparison with the state-of-the-art.

Download Full-text

Classifying many-class high-dimensional fingerprint datasets using random forest of oblique decision trees

Vietnam Journal of Computer Science ◽

10.1007/s40595-014-0024-7 ◽

2014 ◽

Vol 2 (1) ◽

pp. 3-12 ◽

Cited By ~ 14

Author(s):

Thanh-Nghi Do ◽

Philippe Lenca ◽

Stéphane Lallich

Keyword(s):

Random Forest ◽

Decision Trees ◽

High Dimensional

Download Full-text

Review of the Applications of Deep Learning in Bioinformatics

Current Bioinformatics ◽

10.2174/1574893615999200711165743 ◽

2021 ◽

Vol 15 (8) ◽

pp. 898-911

Author(s):

Yongqing Zhang ◽

Jianrong Yan ◽

Siyu Chen ◽

Meiqin Gong ◽

Dongrui Gao ◽

...

Keyword(s):

Deep Learning ◽

Drug Discovery ◽

Biomedical Imaging ◽

State Of The Art ◽

Black Box ◽

Medical Data ◽

Biological Data ◽

High Dimensional ◽

Biological Research ◽

Process Data

Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully utilized to process data in this field, and they have exhibited state-of-the-art performances even on high-dimensional, nonstructural, and black-box biological data. The aim of the current study is to provide an overview of the deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review the success of applying such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field, and outline possible directions for further research.

Download Full-text

Caged-electron States and Split-electron States in Endohedral Alkali C60

Physical Chemistry Chemical Physics ◽

10.1039/d1cp01341f ◽

2021 ◽

Author(s):

yifan yang ◽

Lorenz S Cederbaum

Keyword(s):

State Of The Art ◽

Electronic States ◽

Electron States ◽

The Common ◽

High Level

The low-lying electronic states of neutral X@C60(X=Li, Na, K, Rb) have been computed and analyzed by employing state-of-the-art high level many-electron methods. Apart from the common charge-separated states, well known...

Download Full-text

Efficient Rank-Based Diffusion Process with Assured Convergence

Journal of Imaging ◽

10.3390/jimaging7030049 ◽

2021 ◽

Vol 7 (3) ◽

pp. 49

Author(s):

Daniel Carlos Guimarães Pedronette ◽

Lucas Pascotti Valem ◽

Longin Jan Latecki

Keyword(s):

Diffusion Process ◽

Learning Strategies ◽

State Of The Art ◽

Representation Learning ◽

Theoretical Background ◽

High Dimensional ◽

Visual Features ◽

Learning Approaches ◽

Previous Decade ◽

Asymptotic Complexity

Visual features and representation learning strategies experienced huge advances in the previous decade, mainly supported by deep learning approaches. However, retrieval tasks are still performed mainly based on traditional pairwise dissimilarity measures, while the learned representations lie on high dimensional manifolds. With the aim of going beyond pairwise analysis, post-processing methods have been proposed to replace pairwise measures by globally defined measures, capable of analyzing collections in terms of the underlying data manifold. The most representative approaches are diffusion and ranked-based methods. While the diffusion approaches can be computationally expensive, the rank-based methods lack theoretical background. In this paper, we propose an efficient Rank-based Diffusion Process which combines both approaches and avoids the drawbacks of each one. The obtained method is capable of efficiently approximating a diffusion process by exploiting rank-based information, while assuring its convergence. The algorithm exhibits very low asymptotic complexity and can be computed regionally, being suitable to outside of dataset queries. An experimental evaluation conducted for image retrieval and person re-ID tasks on diverse datasets demonstrates the effectiveness of the proposed approach with results comparable to the state-of-the-art.

Download Full-text

Investigation of Improved Cooperative Coevolution for Large-Scale Global Optimization Problems

Algorithms ◽

10.3390/a14050146 ◽

2021 ◽

Vol 14 (5) ◽

pp. 146

Author(s):

Aleksei Vakhnin ◽

Evgenii Sopov

Keyword(s):

Global Optimization ◽

Evolutionary Algorithms ◽

Numerical Experiments ◽

Large Scale ◽

Optimization Problems ◽

State Of The Art ◽

Fixed Number ◽

High Dimensional ◽

Cooperative Coevolution ◽

Large Scale Problems

Modern real-valued optimization problems are complex and high-dimensional, and they are known as “large-scale global optimization (LSGO)” problems. Classic evolutionary algorithms (EAs) perform poorly on this class of problems because of the curse of dimensionality. Cooperative Coevolution (CC) is a high-performed framework for performing the decomposition of large-scale problems into smaller and easier subproblems by grouping objective variables. The efficiency of CC strongly depends on the size of groups and the grouping approach. In this study, an improved CC (iCC) approach for solving LSGO problems has been proposed and investigated. iCC changes the number of variables in subcomponents dynamically during the optimization process. The SHADE algorithm is used as a subcomponent optimizer. We have investigated the performance of iCC-SHADE and CC-SHADE on fifteen problems from the LSGO CEC’13 benchmark set provided by the IEEE Congress of Evolutionary Computation. The results of numerical experiments have shown that iCC-SHADE outperforms, on average, CC-SHADE with a fixed number of subcomponents. Also, we have compared iCC-SHADE with some state-of-the-art LSGO metaheuristics. The experimental results have shown that the proposed algorithm is competitive with other efficient metaheuristics.

Download Full-text

Research of Medical High-Dimensional Imbalanced Data Classification Ensemble Feature Selection Algorithm with Random Forest

2017 International Conference on Smart Grid and Electrical Automation (ICSGEA) ◽

10.1109/icsgea.2017.158 ◽

2017 ◽

Cited By ~ 2

Author(s):

Min Zhu ◽

Bo Su ◽

Gangmin Ning

Keyword(s):

Feature Selection ◽

Random Forest ◽

Imbalanced Data ◽

Data Classification ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Imbalanced Data Classification

Download Full-text

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2012040103 ◽

2012 ◽

Vol 8 (2) ◽

pp. 44-63 ◽

Cited By ~ 30

Author(s):

Baoxun Xu ◽

Joshua Zhexue Huang ◽

Graham Williams ◽

Qiang Wang ◽

Yunming Ye

Keyword(s):

Random Forest ◽

High Dimensional Data ◽

Real Life ◽

Classification Performance ◽

Feature Weighting ◽

Random Forest Model ◽

High Dimensional ◽

Forest Model ◽

Forest Models ◽

Random Forest Models

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach using randomly sampling a few features in the subspace is not suitable for high dimensional data consisting of thousands of features, because such data often contains many features which are uninformative to classification, and the random sampling often doesn’t include informative features in the selected subspaces. Consequently, classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method which uses a novel feature weighting method for subspace selection and therefore enhances classification performance over high-dimensional data. A series of experiments on 9 real life high dimensional datasets demonstrated that using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.

Download Full-text

Canopy Height Estimation at Landsat Resolution Using Convolutional Neural Networks

Machine Learning and Knowledge Extraction ◽

10.3390/make2010003 ◽

2020 ◽

Vol 2 (1) ◽

pp. 23-36

Author(s):

Syed Aamir Ali Shah ◽

Muhammad Asif Manzoor ◽

Abdul Bais

Keyword(s):

Neural Networks ◽

Random Forest ◽

Convolutional Neural Networks ◽

Forest Structure ◽

State Of The Art ◽

Spatial Association ◽

Canopy Height ◽

Landsat Images ◽

Structure Estimation ◽

Vegetation Height

Forest structure estimation is very important in geological, ecological and environmental studies. It provides the basis for the carbon stock estimation and effective means of sequestration of carbon sources and sinks. Multiple parameters are used to estimate the forest structure like above ground biomass, leaf area index and diameter at breast height. Among all these parameters, vegetation height has unique standing. In addition to forest structure estimation it provides the insight into long term historical changes and the estimates of stand age of the forests as well. There are multiple techniques available to estimate the canopy height. Light detection and ranging (LiDAR) based methods, being the accurate and useful ones, are very expensive to obtain and have no global coverage. There is a need to establish a mechanism to estimate the canopy height using freely available satellite imagery like Landsat images. Multiple studies are available which contribute in this area. The majority use Landsat images with random forest models. Although random forest based models are widely used in remote sensing applications, they lack the ability to utilize the spatial association of neighboring pixels in modeling process. In this research work, we define Convolutional Neural Network based model and analyze that model for three test configurations. We replicate the random forest based setup of Grant et al., which is a similar state-of-the-art study, and compare our results and show that the convolutional neural networks (CNN) based models not only capture the spatial association of neighboring pixels but also outperform the state-of-the-art.

Download Full-text

Random Forest with Adaptive Local Template for Pedestrian Detection

Mathematical Problems in Engineering ◽

10.1155/2015/767423 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Tao Xiang ◽

Tao Li ◽

Mao Ye ◽

Zijian Liu

Keyword(s):

Computer Vision ◽

Random Forest ◽

Classification Accuracy ◽

Template Matching ◽

Detection Method ◽

State Of The Art ◽

Pedestrian Detection ◽

Sliding Window ◽

Experimental Results ◽

Training Samples

Pedestrian detection with large intraclass variations is still a challenging task in computer vision. In this paper, we propose a novel pedestrian detection method based on Random Forest. Firstly, we generate a few local templates with different sizes and different locations in positive exemplars. Then, the Random Forest is built whose splitting functions are optimized by maximizing class purity of matching the local templates to the training samples, respectively. To improve the classification accuracy, we adopt a boosting-like algorithm to update the weights of the training samples in a layer-wise fashion. During detection, the trained Random Forest will vote the category when a sliding window is input. Our contributions are the splitting functions based on local template matching with adaptive size and location and iteratively weight updating method. We evaluate the proposed method on 2 well-known challenging datasets: TUD pedestrians and INRIA pedestrians. The experimental results demonstrate that our method achieves state-of-the-art or competitive performance.

Download Full-text

A hybrid feature selection algorithm combining ReliefF and Particle swarm optimization for high-dimensional medical data

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202948 ◽

2021 ◽

pp. 1-15

Author(s):

Zhaozhao Xu ◽

Derong Shen ◽

Yue Kou ◽

Tiezheng Nie

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Random Forest ◽

Classification Accuracy ◽

Particle Swarm ◽

Medical Data ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Swarm Optimization

Due to high-dimensional feature and strong correlation of features, the classification accuracy of medical data is not as good enough as expected. feature selection is a common algorithm to solve this problem, and selects effective features by reducing the dimensionality of high-dimensional data. However, traditional feature selection algorithms have the blindness of threshold setting and the search algorithms are liable to fall into a local optimal solution. Based on it, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle swarm optimization. The algorithm is mainly divided into three parts: Firstly, the ReliefF is used to calculate the feature weight, and the features are ranked by the weight. Then ranking feature is grouped according to the density equalization, where the density of features in each group is the same. Finally, the Particle Swarm Optimization algorithm is used to search the ranking feature groups, and the feature selection is performed according to a new fitness function. Experimental results show that the random forest has the highest classification accuracy on the features selected. More importantly, it has the least number of features. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which proves that the hybrid algorithm has a certain application value.

Download Full-text