scholarly journals Uniform Manifold Approximation and Projection for Clustering Taxa through Vocalizations in a Neotropical Passerine (Rough-Legged Tyrannulet, Phyllomyias burmeisteri)

Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1406
Author(s):  
Ronald M. Parra-Hernández ◽  
Jorge I. Posada-Quintero ◽  
Orlando Acevedo-Charry ◽  
Hugo F. Posada-Quintero

Vocalizations from birds are a fruitful source of information for the classification of species. However, currently used analyses are ineffective to determine the taxonomic status of some groups. To provide a clearer grouping of taxa for such bird species from the analysis of vocalizations, more sensitive techniques are required. In this study, we have evaluated the sensitivity of the Uniform Manifold Approximation and Projection (UMAP) technique for grouping the vocalizations of individuals of the Rough-legged Tyrannulet Phyllomyias burmeisteri complex. Although the existence of two taxonomic groups has been suggested by some studies, the species has presented taxonomic difficulties in classification in previous studies. UMAP exhibited a clearer separation of groups than previously used dimensionality-reduction techniques (i.e., principal component analysis), as it was able to effectively identify the two taxa groups. The results achieved with UMAP in this study suggest that the technique can be useful in the analysis of species with complex in taxonomy through vocalizations data as a complementary tool including behavioral traits such as acoustic communication.

2021 ◽  
Vol 54 (4) ◽  
pp. 1-34
Author(s):  
Felipe L. Gewers ◽  
Gustavo R. Ferreira ◽  
Henrique F. De Arruda ◽  
Filipi N. Silva ◽  
Cesar H. Comin ◽  
...  

Principal component analysis (PCA) is often applied for analyzing data in the most diverse areas. This work reports, in an accessible and integrated manner, several theoretical and practical aspects of PCA. The basic principles underlying PCA, data standardization, possible visualizations of the PCA results, and outlier detection are subsequently addressed. Next, the potential of using PCA for dimensionality reduction is illustrated on several real-world datasets. Finally, we summarize PCA-related approaches and other dimensionality reduction techniques. All in all, the objective of this work is to assist researchers from the most diverse areas in using and interpreting PCA.


2019 ◽  
Vol 8 (S3) ◽  
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is one of the major areas of research. Clustering is one of the main functionalities of datamining. High dimensionality is one of the main issues of clustering and Dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of dimensionality reduction techniques such as t-distributed stochastic neighbour embedding and probabilistic principal component analysis in the context of clustering. High dimensional data have been reduced to low dimensional data using dimensionality reduction techniques such as t-distributed stochastic neighbour embedding and probabilistic principal component analysis. Cluster analysis has been performed on the high dimensional data as well as the low dimensional data sets obtained through t-distributed stochastic neighbour embedding and Probabilistic principal component analysis with varying number of clusters. Mean squared error; time and space have been considered as parameters for comparison. The results obtained show that time taken to convert the high dimensional data into low dimensional data using probabilistic principal component analysis is higher than the time taken to convert the high dimensional data into low dimensional data using t-distributed stochastic neighbour embedding.The space required by the data set reduced through Probabilistic principal component analysis is less than the storage space required by the data set reduced through t-distributed stochastic neighbour embedding.


2014 ◽  
Vol 55 ◽  
Author(s):  
Kotryna Paulauskienė ◽  
Olga Kurasova

In this paper, the projection evaluation measures such as stress function, Spearman’s rho, Konig’s topology preservation, silhouette and Renyi entropy have been analyzed. The principal component analysis (PCA) and part–linear multidimensional projection (PLMP) techniques are used to reduce the dimensionality of the initial data set. The experiments have been carried out with seven real and artificial datasets. The experimental investigation has shown that several quality evaluationmeasures have to be used when dimension reduction problem is solved.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Alexandra-Maria Tăuţan ◽  
Alessandro C. Rossi ◽  
Ruben de Francisco ◽  
Bogdan Ionescu

AbstractMethods developed for automatic sleep stage detection make use of large amounts of data in the form of polysomnographic (PSG) recordings to build predictive models. In this study, we investigate the effect of several dimensionality reduction techniques, i.e., principal component analysis (PCA), factor analysis (FA), and autoencoders (AE) on common classifiers, e.g., random forests (RF), multilayer perceptron (MLP), long-short term memory (LSTM) networks, for automated sleep stage detection. Experimental testing is carried out on the MGH Dataset provided in the “You Snooze, You Win: The PhysioNet/Computing in Cardiology Challenge 2018”. The signals used as input are the six available (EEG) electoencephalographic channels and combinations with the other PSG signals provided: ECG – electrocardiogram, EMG – electromyogram, respiration based signals – respiratory efforts and airflow. We observe that a similar or improved accuracy is obtained in most cases when using all dimensionality reduction techniques, which is a promising result as it allows to reduce the computational load while maintaining performance and in some cases also improves the accuracy of automated sleep stage detection. In our study, using autoencoders for dimensionality reduction maintains the performance of the model, while using PCA and FA the accuracy of the models is in most cases improved.


Author(s):  
Hyeuk Kim

Unsupervised learning in machine learning divides data into several groups. The observations in the same group have similar characteristics and the observations in the different groups have the different characteristics. In the paper, we classify data by partitioning around medoids which have some advantages over the k-means clustering. We apply it to baseball players in Korea Baseball League. We also apply the principal component analysis to data and draw the graph using two components for axis. We interpret the meaning of the clustering graphically through the procedure. The combination of the partitioning around medoids and the principal component analysis can be used to any other data and the approach makes us to figure out the characteristics easily.


Sign in / Sign up

Export Citation Format

Share Document