Adjusted Rand Index
Recently Published Documents


TOTAL DOCUMENTS

76
(FIVE YEARS 48)

H-INDEX

8
(FIVE YEARS 3)

Computers ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 13
Author(s):  
Imran Zualkernan ◽  
Salam Dhou ◽  
Jacky Judas ◽  
Ali Reza Sajun ◽  
Brylle Ryan Gomez ◽  
...  

Camera traps deployed in remote locations provide an effective method for ecologists to monitor and study wildlife non-invasively. However, current camera traps suffer from two problems. First, the images are manually classified and counted, which is expensive. Second, due to manual coding, the results are often stale by the time they reach the ecologists. The Internet of Things (IoT) combined with deep learning addresses both problems: the images can be classified automatically, and the results made immediately available to ecologists. This paper proposes an IoT architecture that uses deep learning on edge devices to convey animal classification results to a mobile app over the LoRaWAN low-power, wide-area network. The primary goal of the proposed approach is to reduce the cost of the wildlife monitoring process for ecologists and to provide real-time animal sightings data from camera traps in the field. Camera trap image data consisting of 66,400 images were used to train the InceptionV3, MobileNetV2, ResNet18, EfficientNetB1, DenseNet121, and Xception neural network models. While the performance of the trained models was statistically different (Kruskal–Wallis: Accuracy H(5) = 22.34, p < 0.05; F1-score H(5) = 13.82, p = 0.0168), there was only a 3% difference in the F1-score between the worst (MobileNetV2) and the best model (Xception). Moreover, the models made similar errors (Adjusted Rand Index (ARI) > 0.88 and Adjusted Mutual Information (AMI) > 0.82). Subsequently, the best model, Xception (Accuracy = 96.1%; F1-score = 0.87; F1-score = 0.97 with oversampling), was optimized and deployed on the Raspberry Pi, Google Coral, and Nvidia Jetson edge devices using the TensorFlow Lite and TensorRT frameworks. Optimizing the models to run on edge devices reduced the average macro F1-score to 0.7 and adversely affected the minority classes, reducing their F1-score to as low as 0.18.
In a stress test processing 1000 images consecutively, the Jetson Nano running a TensorRT model outperformed the others with a latency of 0.276 s/image (s.d. = 0.002) while consuming an average current of 1665.21 mA. The Raspberry Pi consumed the least average current (838.99 mA) but with roughly ten times the latency, at 2.83 s/image (s.d. = 0.036). The Jetson Nano was the only reasonable option as an edge device because it could capture most animals whose maximum speeds are below 80 km/h, including goats, lions, and ostriches. While the proposed architecture is viable, unbalanced data remain a challenge, and the results can potentially be improved by using object detection to reduce imbalances and by exploring semi-supervised learning.
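The Adjusted Rand Index used above to quantify how similarly the models err is a chance-corrected agreement score between two labelings of the same items. A minimal pure-Python sketch of the standard formula (illustrative only, not the authors' code):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same items."""
    n = len(labels_a)
    # contingency counts: how many items share each (a, b) label pair
    contingency = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)
    index = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)
```

Identical partitions score 1.0 even when the label names are permuted (e.g. `[0,0,1,1]` vs `[1,1,0,0]`), which is what makes ARI suitable for comparing the error patterns of independently trained models.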


Author(s):  
Mateusz Tomal ◽  
Marco Helbich

How the COVID-19 pandemic has altered the segmentation of residential rental markets is largely unknown. We therefore assessed rental housing submarkets before and during the pandemic in Cracow, Poland. We used geographically and temporally weighted regression to investigate the marginal prices of housing attributes over space–time. The marginal prices were further reduced to a few principal components per time period and spatially clustered to identify housing submarkets. Finally, we applied the adjusted Rand index to evaluate the spatiotemporal stability of the housing submarkets. The results revealed that the pandemic outbreak significantly lowered rents and modified the relevance of some housing characteristics for rental prices. Proximity to the university was no longer among the residential amenities during the pandemic. Similarly, the virus outbreak diminished the effect of a housing unit’s proximity to the city center. The market partitioning showed that the number of Cracow’s residential rental submarkets increased significantly as a result of the COVID-19 pandemic, as it enhanced the spatial variation in the marginal prices of covariates. Our findings suggest that the emergence of the coronavirus reshaped the residential rental market in three ways: Rents were decreased, the underlying rental price-determining factors changed, and the spatiotemporal submarket structure was altered.
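The submarket-stability step described above can be illustrated with a toy pipeline: reduce attribute marginal prices to principal components, cluster each period, and compare the two partitions with the adjusted Rand index. The data below are synthetic and the component/cluster counts are arbitrary assumptions, not the paper's; the sketch assumes scikit-learn and NumPy:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# hypothetical marginal prices of 8 housing attributes for 200 listings
pre = rng.normal(size=(200, 8))                       # pre-pandemic period
post = pre + rng.normal(scale=0.3, size=(200, 8))     # pandemic shifts prices

def submarkets(X, k=4, seed=0):
    """Reduce marginal prices to principal components, then cluster."""
    comps = PCA(n_components=3).fit_transform(X)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(comps)

# ARI near 1 means the submarket structure was stable across periods
stability = adjusted_rand_score(submarkets(pre), submarkets(post))
```

A low `stability` value would correspond to the paper's finding that the spatiotemporal submarket structure was altered by the pandemic.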


Author(s):  
K Laskhmaiah ◽  
S Murali Krishna ◽  
B Eswara Reddy

Spatial data mining extracts useful information and knowledge from massive and complex spatial databases. Efficient clustering algorithms for spatial databases are used to manage this complexity, and clustering methods are applied in many applications to discover geographic areas containing spatial points. Many approaches have been designed for the spatial clustering problem with spatial attributes, but non-overlapping constraints are not considered, and most existing data mining algorithms suffer in high dimensions. To solve this problem, the proposed system designs a multidimensional optimization clustering method with non-overlapping constraints, named Non-Overlapping Constraint based Optimized K-Means with Density and Distance-based Clustering (NOC-OKMDDC), which quickly finds clusters with diverse shapes and densities in spatial databases. The proposed method consists of three main phases. In the first phase, attributes are reduced from the multidimensional dataset using weighted convolutional neural networks (Weighted CNN). In the second phase, Optimized K-Means with Density and Distance-based Clustering (OKMDD) uses a partition-based algorithm (K-means) to cluster the dataset into several relatively small spherical or ball-shaped sub-clusters; the optimal sub-cluster count is determined with the help of the Adaptive Adjustment Factor based Glowworm Swarm Optimization algorithm (AAFGSO). The proposed system then designs an Enhanced Penalized Spatial Distance (EPSD) measure to satisfy the non-overlapping condition, adjusting the spatial distance between two points according to their spatial attribute values. In the third phase, the proposed system merges sub-clusters using density-based clustering with a relative-distance scheme.
In terms of the adjusted Rand index, Rand index, Mirkin's index and Hubert's index, the experimental results show that the proposed system achieves better performance than the existing systems.
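The oversegment-then-merge idea of the second and third phases can be sketched in simplified form. This toy version merges K-means sub-cluster centroids that lie within a fixed radius, standing in for the paper's EPSD measure and density-based merge; the data, cluster count, and merge radius are illustrative assumptions (requires scikit-learn and NumPy):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# two well-separated blobs of 2-D spatial points
pts = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))])

# Phase 2 (simplified): oversegment into many small ball-shaped sub-clusters
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(pts)

# Phase 3 (simplified): merge sub-clusters whose centroids are close,
# via union-find over centroid pairs within a merge radius
parent = list(range(10))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i
for i in range(10):
    for j in range(i + 1, 10):
        if np.linalg.norm(km.cluster_centers_[i] - km.cluster_centers_[j]) < 2.0:
            parent[find(i)] = find(j)

final_labels = np.array([find(l) for l in km.labels_])
n_clusters = len(set(final_labels.tolist()))
```

On this synthetic data the ten sub-clusters collapse back into the two underlying blobs, mirroring how the merge phase recovers clusters of arbitrary shape from spherical sub-clusters.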


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chenyang Xu ◽  
Lei Cai ◽  
Jingyang Gao

Abstract Background Single-cell sequencing technology can process large amounts of single-cell library data at once and reveal the heterogeneity of different cells. However, analyzing single-cell data is a computationally challenging problem. Because counts in the gene expression matrix are low, non-zero entries have a high chance of being recorded as zero; these are called dropout events. At present, mainstream dropout imputation methods such as DCA, MAGIC, scVI, scImpute and SAVER cannot effectively recover the true expression of cells from dropout noise. Results In this paper, we propose an autoencoder network, named GNNImpute. GNNImpute uses graph attention convolution to aggregate information from multiple levels of similar cells and implements convolution operations in non-Euclidean space on scRNA-seq data. Distinct from current imputation tools, GNNImpute can accurately and effectively impute dropouts and reduce dropout noise. We use mean square error (MSE), mean absolute error (MAE), the Pearson correlation coefficient (PCC) and cosine similarity (CS) to compare the performance of different methods with that of GNNImpute. We analyze four real datasets, and our results show that GNNImpute achieves 3.0130 MSE, 0.6781 MAE, 0.9073 PCC and 0.9134 CS. Furthermore, we use the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) to measure the clustering effect; GNNImpute achieves 0.8199 (ARI) and 0.8368 (NMI), respectively. Conclusions In this investigation, we propose a single-cell dropout imputation method (GNNImpute) that effectively utilizes shared information to impute dropouts in scRNA-seq data. We test it on different real datasets and evaluate its effectiveness in terms of MSE, MAE, PCC and CS. The results show that graph attention convolution and the autoencoder structure have great potential for single-cell dropout imputation.
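The four reconstruction metrics reported above (MSE, MAE, PCC, CS) are standard; a small NumPy sketch of how they might be computed between an imputed expression matrix and a reference (illustrative only, not the paper's evaluation code):

```python
import numpy as np

def imputation_metrics(truth, imputed):
    """Compare an imputed matrix against a reference, element-wise."""
    t, p = truth.ravel(), imputed.ravel()
    mse = float(np.mean((t - p) ** 2))                           # mean square error
    mae = float(np.mean(np.abs(t - p)))                          # mean absolute error
    pcc = float(np.corrcoef(t, p)[0, 1])                         # Pearson correlation
    cs = float(t @ p / (np.linalg.norm(t) * np.linalg.norm(p)))  # cosine similarity
    return mse, mae, pcc, cs
```

A perfect imputation yields MSE = MAE = 0 and PCC = CS = 1; lower error and higher correlation/similarity indicate better recovery of the true expression.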


2021 ◽  
Vol 28 (4) ◽  
pp. 255-267
Author(s):  
Ruizhe Ma ◽  
Xiaoping Zhu ◽  
Li Yan

Information uncertainty exists extensively in real-world applications, and the processing and analysis of uncertain data have become a crucial issue in data and knowledge engineering. In this paper, we concentrate on clustering uncertain time series data, in which the uncertain values at time points are represented by probability density functions. We propose a hybrid clustering approach for uncertain time series: it first partitions the data into a set of micro-clusters and then merges the micro-clusters following the idea of hierarchical clustering. We evaluate our approach with experiments. The results show that, compared with the traditional UK-means clustering algorithm, our clustering results achieve a clearly higher Adjusted Rand Index (ARI). In addition, the time efficiency of our approach is significantly improved.


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1249
Author(s):  
Jinwon Heo ◽  
Jangsun Baek

Along with advances in technology, matrix data, such as medical and industrial images, have emerged in many practical fields. These data usually have high dimensions and are not easy to cluster due to the intrinsic correlated structure among their rows and columns. Most approaches convert matrix data to multidimensional vectors and apply conventional clustering methods to them, and thus suffer from an extreme high-dimensionality problem as well as a lack of interpretability of the correlated structure among row/column variables. Recently, a regularized model was proposed for clustering matrix-valued data by imposing a sparsity structure on the mean signal of each cluster. We extend this approach by further regularizing the covariance to cope better with the curse of dimensionality for large images. A penalized matrix normal mixture model with lasso-type penalty terms on both the mean and covariance matrices is proposed, and an expectation-maximization algorithm is developed to estimate the parameters. The proposed method combines parsimonious modeling with a proper conditional correlation structure. The estimators are consistent, and their limiting distributions are derived. We applied the proposed method to simulated data as well as real datasets and measured its clustering performance with clustering accuracy (ACC) and the adjusted Rand index (ARI). The experimental results show that the proposed method performed better, with higher ACC and ARI, than conventional methods.


Author(s):  
Vivek Mehta ◽  
Seema Bawa ◽  
Jasmeet Singh

Abstract A massive amount of textual data now exists in digital repositories in the form of research articles, news articles, reviews, Wikipedia articles, books, etc. Text clustering is a fundamental data mining technique for categorization, topic extraction, and information retrieval. Textual datasets, especially those containing a large number of documents, are sparse and high-dimensional, so traditional clustering techniques such as K-means, agglomerative clustering, and DBSCAN cannot perform well. In this paper, a clustering technique especially suitable for large text datasets, named WEClustering, is proposed to overcome these limitations. The proposed technique is based on word embeddings derived from a recent deep learning model, Bidirectional Encoder Representations from Transformers (BERT). WEClustering deals effectively with the problem of high dimensionality, so more accurate clusters are formed. The technique is validated on several datasets of varying sizes, and its performance is compared with other widely used and state-of-the-art clustering techniques. The experimental comparison shows that the proposed technique gives a significant improvement over the others as measured by metrics such as purity and the Adjusted Rand Index.
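Purity, one of the evaluation metrics mentioned above, assigns each cluster to its most frequent true class and measures the fraction of documents that land in a cluster whose majority class matches their own. A minimal sketch (illustrative, not the paper's code):

```python
from collections import Counter

def purity(true_labels, cluster_labels):
    """Fraction of items whose cluster's majority class matches their true class."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    # each cluster contributes the size of its largest true-class group
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / len(true_labels)
```

Unlike ARI, purity is not chance-corrected: putting every document in its own cluster trivially yields purity 1.0, which is why the two metrics are usually reported together.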


2021 ◽  
pp. 1-13
Author(s):  
Li Yihong ◽  
Wang Yunpeng ◽  
Li Tao ◽  
Lan Xiaolong ◽  
Song Han

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms: it can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, its performance is significantly limited because it is quite sensitive to its parameters eps, the radius of the eps-neighborhood, and MinPts, the minimum number of points within it. Additionally, a dataset with large variations in density will probably trap DBSCAN because these parameters are fixed. To overcome these limitations, we propose a new density-based clustering algorithm called GNN-DBSCAN, which uses an adaptive Grid to divide the dataset and defines local core samples using the Nearest Neighbor. With the help of the grid, the dataset space is divided into a finite number of cells. The nearest neighbors lying in each filled cell and its adjacent filled cells are then defined as the local core samples, and GNN-DBSCAN obtains global core samples by enhancing and screening the local ones. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, a dynamic radius based on k-nearest neighbors is used to cluster the dataset; the dynamic radius overcomes the problems caused by DBSCAN's fixed parameter eps, so our method performs better on datasets with large variations in density. Experiments on synthetic and real-world datasets indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI) and V-measure of the proposed algorithm outperform those of the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN.
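The grid step of GNN-DBSCAN can be illustrated with a simplified sketch that bins 2-D points into cells and picks, per filled cell, the point nearest the cell's mean as a local core sample. This is a toy rendering of the idea (the paper's definition also involves adjacent cells and further screening), not the authors' implementation:

```python
from collections import defaultdict
from math import floor, dist

def local_core_samples(points, cell_size):
    """Bin 2-D points into grid cells; return the point nearest each cell mean."""
    cells = defaultdict(list)
    for p in points:
        key = (floor(p[0] / cell_size), floor(p[1] / cell_size))
        cells[key].append(p)
    cores = {}
    for key, members in cells.items():
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        cores[key] = min(members, key=lambda p: dist(p, (cx, cy)))
    return cores
```

Because each filled cell yields at most one local core sample, the grid bounds the number of candidates independently of dataset size, which is what makes the subsequent screening cheap.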


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sana Akbar ◽  
Sri Khetwat Saritha

Abstract Community detection remains little explored in the analysis of biodiversity change, even as the challenges linked with global biodiversity change have multiplied manifold in the past few decades. Moreover, most studies concerning biodiversity change lack the quantitative treatment central to species distribution modeling; empirical analysis of species distribution and abundance is thus integral to the study of biodiversity loss and alteration. Community detection is therefore expected to model efficiently the topological aspect of biodiversity change driven by land-use conversion and climate change, given that it has already proven superior for diverse problems in social network analysis and subgroup discovery in complex systems. Thus, quantum-inspired community detection is proposed as a novel technique to predict biodiversity change, considering the tiger population in eighteen states of India and leading to the benchmarking of two novel datasets. Elements of land-use conversion and climate change are used to design these datasets, namely a landscape-based distribution and a distribution based on the number of tiger reserves, for predicting regions expected to maximize tiger population growth. Furthermore, the proposed framework is validated on these datasets using standard community detection metrics: modularity, Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), degree distribution, degree centrality and edge-betweenness centrality. Quantum-inspired community detection also demonstrates an association between biodiversity change, land-use conversion and climate change, validated statistically by Pearson's correlation coefficient and the p-value test.
Finally, a modularity distribution based on parameter tuning establishes the superiority of the second dataset, based on the number of tiger reserves, in predicting regions that maximize tiger population growth and foster species distribution and abundance, apart from showing a stronger correlation of biodiversity change with land-use conversion.


2021 ◽  
Author(s):  
Daniel J Nieves ◽  
Jeremy A. Pike ◽  
Florian Levet ◽  
Juliette Griffié ◽  
Daniel Sage ◽  
...  

Single molecule localisation microscopy (SMLM) generates data in the form of Cartesian coordinates of localised fluorophores. Cluster analysis is an attractive route for extracting biologically meaningful information from such data and has been widely applied. Despite the range of cluster analysis algorithms developed, there exists no consensus framework for evaluating their performance. Here, we use a systematic approach based on two metrics, the Adjusted Rand Index (ARI) and Intersection over Union (IoU), to score the success of clustering algorithms in diverse simulated clustering scenarios mimicking experimental data. We demonstrate the framework using three analysis algorithms, DBSCAN, ToMATo and KDE, showing how to deduce optimal analysis parameters and how these are affected by multiple blinking of fluorophores. We propose that these standard conditions and metrics become the basis for future development and evaluation of analysis algorithms.
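The IoU half of the scoring can be sketched as the overlap between a detected cluster's set of localisations and a ground-truth cluster's set; the exact cluster-matching scheme in the paper may differ, so this is only the core metric:

```python
def iou(detected, ground_truth):
    """Intersection over Union between two sets of localisation indices."""
    detected, ground_truth = set(detected), set(ground_truth)
    union = detected | ground_truth
    # two empty clusters are treated as a perfect match
    return len(detected & ground_truth) / len(union) if union else 1.0
```

ARI scores the whole partition at once, while IoU scores each detected cluster against its best-matching ground-truth cluster, so the two metrics catch complementary failure modes (global mis-partitioning vs. poorly delineated individual clusters).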

