Automatically Discovering Mechanical Functions From Physical Behaviors via Clustering

2021 ◽  
Author(s):  
Kevin N. Chiu ◽  
David Anderson ◽  
Mark D. Fuge

Abstract: Computational design methods provide opportunities to discover novel and diverse designs that traditional optimization approaches cannot find or that use physical phenomena in ways that engineers have overlooked. However, existing methods require supervised objectives to search or optimize for explicit behaviors or functions — e.g., optimizing aerodynamic lift. In contrast, this paper unpacks what it means to discover interesting behaviors or functions we do not know about a priori using data from experiments or simulation in a fully unsupervised way. Doing so enables computers to invent or re-invent new or existing mechanical functions given only measurements of physical fields (e.g., pressure or electromagnetic fields) without directly specifying a set of objectives to optimize. This paper explores this approach via two related parts. First, we study clustering algorithms that can detect novel device families from simulation data. Specifically, we contribute a modification to the Hierarchical Density-Based Spatial Clustering of Applications with Noise algorithm via the use of the silhouette score to reduce excessively granular clusters. Second, we study multiple ways by which we preprocess simulation data to increase its discriminatory power in the context of clustering device behavior. This leads to an insight regarding the important role that a design’s representation has in compactly encoding its behavior. We test our contributions via the task of discovering designs that function as fluidic logic gates. We generate synthetic data that mimics fluidic devices and show that our proposed contributions better discover logic gates, as measured by adjusted Rand score. Specifically, combining our Resolution Selection preprocessing and principal component analysis resulted in the highest and tightest spread of adjusted Rand scores on our tested datasets.
This opens up new avenues of research wherein computers can automatically explore different types of physics and then derive new device functions, behaviors, and structures without the need for human labels or guidance.
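The silhouette-guided granularity selection described above can be sketched with standard tools. This is a minimal illustration, not the authors' HDBSCAN modification: it sweeps cluster counts with agglomerative clustering on synthetic blob data, keeps the granularity with the best silhouette score, and evaluates the result with the adjusted Rand score, mirroring the paper's evaluation metric. The data and cluster centers are toy assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Synthetic stand-in for simulated device behaviors: 4 well-separated families.
X, y_true = make_blobs(n_samples=200, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                       cluster_std=0.5, random_state=0)

# Sweep granularity and keep the partition with the best silhouette score,
# i.e. use silhouette to suppress excessively fine clusterings.
best_k, best_sil, best_labels = None, -1.0, None
for k in range(2, 9):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    sil = silhouette_score(X, labels)
    if sil > best_sil:
        best_k, best_sil, best_labels = k, sil, labels

# Adjusted Rand score against the known families, as in the paper's evaluation.
ari = adjusted_rand_score(y_true, best_labels)
```

The silhouette score penalizes partitions whose clusters are close together relative to their internal spread, which is why it discourages over-segmentation.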


Data Mining ◽  
2013 ◽  
pp. 435-444
Author(s):  
Gabriella Schoier

The rapid development of the availability of, and access to, spatially referenced information in a variety of areas has induced the need for better analysis techniques to understand the various phenomena. In particular, spatial clustering algorithms, which group similar spatial objects into classes, can be used to identify areas sharing common characteristics. The aim of this chapter is to present a density-based algorithm for the discovery of clusters of units in large spatial data sets (MDBSCAN). This algorithm is a modification of the DBSCAN algorithm (see Ester (1996)). The modifications concern the consideration of spatial and non-spatial variables and the use of a Lagrange-Chebyshev metric instead of the usual Euclidean one. The applications concern a synthetic data set and a data set of satellite images.
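The metric-swapping idea can be sketched directly with scikit-learn's DBSCAN, which accepts a Chebyshev (L-infinity) metric. This is only an illustration of replacing the Euclidean metric, not the full MDBSCAN algorithm with its joint handling of spatial and non-spatial variables; the data are toy assumptions.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three synthetic spatial clusters at fixed, well-separated centers.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [5, 5], [10, 0]],
                  cluster_std=0.4, random_state=1)

# Swap the default Euclidean metric for the Chebyshev (L-infinity) metric.
labels = DBSCAN(eps=0.8, min_samples=5, metric="chebyshev").fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
```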


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays an imperative role in clustering for predicting churn with better accuracy on industrial data, so the linear Euclidean distance in traditional SC is replaced by the non-linear S-distance (Sd), which is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial, and one telecommunications database related to customer churn. Three existing clustering algorithms—k-means, density-based spatial clustering of applications with noise, and conventional SC—are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing algorithms in terms of Jaccard index, f-score, recall, precision, and accuracy. Finally, the significance of the clustering results is tested using Wilcoxon's signed-rank test, Wilcoxon's rank-sum test, and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.
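Replacing the similarity measure inside spectral clustering can be sketched with scikit-learn's precomputed-affinity interface. The Manhattan-based kernel below is only a placeholder standing in for the paper's S-distance, whose exact formula is defined in the paper; the data and kernel scale are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, adjusted_rand_score

X, y_true = make_blobs(n_samples=150, centers=[[0, 0], [6, 6], [12, 0]],
                       cluster_std=0.6, random_state=2)

# Placeholder non-Euclidean distance (Manhattan) standing in for the S-distance,
# converted to a similarity matrix for the precomputed-affinity interface.
D = pairwise_distances(X, metric="manhattan")
affinity = np.exp(-D / (0.2 * D.mean()))

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
ari = adjusted_rand_score(y_true, labels)
```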


2021 ◽  
pp. 1-10
Author(s):  
Shiyuan Zhou ◽  
Xiaoqin Yang ◽  
Qianli Chang

By combining principal component analysis, a spatial autocorrelation algorithm, and a two-dimensional graph-theoretic clustering algorithm, a comprehensive evaluation model of the regional green economy is established. Based on an evaluation index system for the regional green economy, this paper comprehensively evaluates regional green-economy development using principal component analysis, then evaluates the competitive advantage of the green economy and analyzes spatial autocorrelation based on the evaluation results. Finally, taking the green-economy and local index scores as observed values, spatial clustering is performed using two-dimensional graph-theoretic clustering analysis. To address the shortcoming that the fuzzy k-modes membership measure ignores the spatial distribution of objects, a combined distance-and-density measure is introduced into the fuzzy k-modes algorithm, so that object membership degrees are updated in a more reasonable way. The Vote, Mushroom, and Zoo data sets from the UCI machine learning repository were used for testing, and the F value of the improved algorithm was better than that of the original, indicating that the improved algorithm has a good clustering effect. Finally, the improved algorithm is applied to spatial data collected from Baidu Map, where it also obtains good clustering results, demonstrating its feasibility and effectiveness on spatial data.
The results show that analyzing green-economy development with a combination of quantitative and qualitative methods, and exploring the character of the green economy through a spatial evaluation model, is feasible. The approach compensates for the purely qualitative analyses of the green economy used in the past, objectively and systematically reflects the level of regional green economic development, and can help policy makers formulate regional economic development strategy scientifically and promote the integrated development of the regional green economy from a macroscopic perspective.
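The spatial-autocorrelation step can be illustrated with global Moran's I, a standard statistic for this purpose; the regions, values, and rook-style weight matrix below are toy assumptions, not the paper's data.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: spatial autocorrelation of values x under weight matrix w."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return len(z) / w.sum() * (z[:, None] * z[None, :] * w).sum() / (z ** 2).sum()

# Toy regions on a line with rook-style neighbour weights; the values form two
# spatial regimes, so positive spatial autocorrelation is expected.
values = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
n = len(values)
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

moran = morans_i(values, w)
```

Values of Moran's I near +1 indicate strong positive spatial clustering of similar values, near 0 a random pattern, and negative values a dispersed pattern.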


2018 ◽  
Vol 8 (10) ◽  
pp. 1766 ◽  
Author(s):  
Arthur Leroy ◽  
Andy Marc ◽  
Olivier Dupas ◽  
Jean Lionel Rey ◽  
Servane Gey

Many data collected in sport science come from time-dependent phenomena. This article focuses on Functional Data Analysis (FDA), which studies longitudinal data by modelling them as continuous functions. After a brief review of several FDA methods, some useful practical tools such as Functional Principal Component Analysis (FPCA) and functional clustering algorithms are presented and compared on simulated data. Finally, the problem of detecting promising young swimmers is addressed through a curve clustering procedure on a real data set of performance progression curves. This study reveals that the fastest improvement of young swimmers generally appears before the age of 16. Moreover, several patterns of improvement are identified, and the functional clustering procedure provides a useful detection tool.
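Discretized FPCA is commonly implemented as ordinary PCA on curves sampled on a common grid, followed by clustering of the component scores. The logistic progression curves below are synthetic assumptions standing in for performance progression data, not the swimmers' data set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
ages = np.linspace(10, 20, 50)  # common evaluation grid for all curves

# Two synthetic progression patterns: early vs late improvement (logistic shapes).
def progression(midpoint):
    return 1.0 / (1.0 + np.exp(-(ages - midpoint)))

curves = np.vstack(
    [progression(13) + 0.05 * rng.standard_normal(50) for _ in range(20)]
    + [progression(17) + 0.05 * rng.standard_normal(50) for _ in range(20)]
)

# Discretized FPCA: ordinary PCA on the sampled curves, then cluster the scores.
scores = PCA(n_components=2).fit_transform(curves)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
```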


2015 ◽  
Vol 28 (3) ◽  
pp. 1016-1030 ◽  
Author(s):  
Erik Swenson

Abstract: Various multivariate statistical methods exist for analyzing covariance and isolating linear relationships between datasets. The most popular linear methods are based on singular value decomposition (SVD) and include canonical correlation analysis (CCA), maximum covariance analysis (MCA), and redundancy analysis (RDA). In this study, continuum power CCA (CPCCA) is introduced as one extension of continuum power regression for isolating pairs of coupled patterns whose temporal variation maximizes the squared covariance between partially whitened variables. Similar to the whitening transformation, the partial whitening transformation acts to decorrelate individual variables but only to a partial degree with the added benefit of preconditioning sample covariance matrices prior to inversion, providing a more accurate estimate of the population covariance. CPCCA is a unified approach in the sense that the full range of solutions bridges CCA, MCA, RDA, and principal component regression (PCR). Recommended CPCCA solutions include a regularization for CCA, a variance bias correction for MCA, and a regularization for RDA. Applied to synthetic data samples, such solutions yield relatively higher skill in isolating known coupled modes embedded in noise. Provided with some crude prior expectation of the signal-to-noise ratio, the use of asymmetric CPCCA solutions may be justifiable and beneficial. An objective parameter choice is offered for regularization with CPCCA based on the covariance estimate of O. Ledoit and M. Wolf, and the results are quite robust. CPCCA is encouraged for a range of applications.
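Maximum covariance analysis, one end of the solution range mentioned above, reduces to an SVD of the sample cross-covariance matrix. The sketch below recovers a shared signal planted in two noisy synthetic fields; the fields, patterns, and noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 500  # time samples

# A shared signal planted in two noisy multivariate "fields".
signal = rng.standard_normal(t)
X = np.outer(signal, [1.0, 0.5, -0.5]) + 0.3 * rng.standard_normal((t, 3))
Y = np.outer(signal, [0.8, -0.2]) + 0.3 * rng.standard_normal((t, 2))

# Maximum covariance analysis: SVD of the sample cross-covariance matrix.
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
C = Xc.T @ Yc / (t - 1)
U, s, Vt = np.linalg.svd(C)

# Expansion coefficients of the leading pair of coupled patterns.
a, b = Xc @ U[:, 0], Yc @ Vt[0]
corr = float(np.corrcoef(a, b)[0, 1])
```

The leading singular value equals the covariance between the two expansion coefficient series, so the leading pair of patterns maximizes squared covariance, which is the defining property of MCA.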


2021 ◽  
Vol 10 (4) ◽  
pp. 2170-2180
Author(s):  
Untari N. Wisesty ◽  
Tati Rajab Mengko

This paper analyzes SARS-CoV-2 genome variation by comparing the results of genome clustering using several clustering algorithms and the distribution of sequences in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, clustering algorithms struggle to group data with very high dimensionality, such as genome data, so a dimensionality-reduction step is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and an autoencoder method with three models that produce 2, 10, and 50 features. The main contributions are the dimensionality-reduction and clustering scheme for SARS-CoV-2 sequence data and the performance analysis of each experiment for each scheme and the hyperparameters of each method. Based on the experiments conducted, the combination of PCA and the DBSCAN algorithm achieves the highest silhouette score of 0.8770 with three clusters when using two features. However, dimensionality reduction using the autoencoder needs more iterations to converge. In testing on Indonesian sequence data, more than half of the sequences fall into one cluster, and the rest are distributed across the other two clusters.
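The best-performing scheme reported above, PCA followed by DBSCAN with silhouette scoring, can be sketched on synthetic high-dimensional data. The blob data below is a stand-in for encoded genome features, and the eps value is an assumption tuned to this toy data, not the paper's setting.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# High-dimensional stand-in for encoded genome features: 3 groups in 100 dims.
X, _ = make_blobs(n_samples=300, n_features=100, centers=3,
                  cluster_std=1.0, random_state=3)

Z = PCA(n_components=2).fit_transform(X)   # reduce dimensionality first
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(Z)

mask = labels != -1                        # exclude DBSCAN noise points
sil = silhouette_score(Z[mask], labels[mask])
n_clusters = len(set(labels[mask]))
```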


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e6078 ◽  
Author(s):  
Nayan Bhatt ◽  
Varadhan SKM

Background: The human hand can perform a range of manipulation tasks, from holding a pen to holding a hammer. The central nervous system (CNS) uses different strategies in different manipulation tasks based on task requirements. Attempts to compare postures of the hand have been made for use in the robotics and animation industries. In this study, we developed an index, the posture similarity index, to quantify the similarity between two human hand postures. Methods: Twelve right-handed volunteers performed 70 postures and lifted and held 30 objects (a total of 100 different postures, each performed five times). A 16-sensor electromagnetic tracking system captured the kinematics of individual finger phalanges (segments). We modeled the hand as a 21-DoF system and computed the corresponding joint angles. We used principal component analysis to extract kinematic synergies from these 21-DoF data. We developed a posture similarity index (PSI) that represents the similarity between postures in the synergy (principal component) space. First, we tested the performance of this index using a synthetic dataset. After confirming that it performs well with the synthetic dataset, we used it to analyze the experimental data. Further, we used PSI to identify postures that are “representative” in the sense that they have a greater overlap (in synergy space) with a large number of postures. Results: Our results confirmed that PSI is a relatively accurate index of similarity in synergy space, both with synthetic data and real experimental data. Also, more special postures than common postures were found among the “representative” postures. Conclusion: We developed an index for comparing posture similarity in synergy space and demonstrated its utility using a synthetic dataset and an experimental dataset. In addition, we found that “special” postures are indeed “special” in the sense that there are more of them among the “representative” postures identified by our posture similarity index.
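A similarity index in synergy space can be sketched as follows. The cosine similarity over PCA scores is a hypothetical stand-in, since the paper's PSI has its own definition, and the joint-angle data here are random rather than recorded hand postures.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical joint-angle data: 100 postures x 21 degrees of freedom.
postures = rng.standard_normal((100, 21))

# Kinematic synergies: principal components of the joint-angle data.
scores = PCA(n_components=5).fit_transform(postures)

def posture_similarity(i, j):
    """Cosine similarity in synergy space, a stand-in for the paper's PSI."""
    a, b = scores[i], scores[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

self_sim = posture_similarity(0, 0)    # identical postures -> similarity 1
cross_sim = posture_similarity(0, 1)   # lies in [-1, 1]
```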


Geophysics ◽  
2010 ◽  
Vol 75 (1) ◽  
pp. H1-H6
Author(s):  
Bruno Goutorbe ◽  
Violaine Combier

In the context of 3D seismic acquisition, reconstructing the shape of the streamer(s) for each shot is an essential step prior to data processing. Depending on the survey, several kinds of constraints help achieve this purpose: local azimuths given by compasses, absolute positions recorded by global positioning system (GPS) devices, and distances calculated between pairs of acoustic ranging devices. Most reconstruction methods are restricted to work on a particular type of constraint and do not estimate the final uncertainties. The generalized inversion formalism using the least-squares criterion can provide a robust framework to solve such a problem — handling several kinds of constraints together, not requiring an a priori parameterization of the streamer shape, naturally extending to any configuration of streamer(s), and giving rigorous uncertainties. We explicitly derive the equations governing the algorithm corresponding to a marine seismic survey using a single streamer with compasses distributed all along it and GPS devices located on the tail buoy and on the vessel. Reconstruction tests conducted on several synthetic examples show that the algorithm performs well, with a mean error of a few meters in realistic cases. The accuracy logically degrades if higher random errors are added to the synthetic data or if deformations of the streamer occur at a short length scale.
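The least-squares treatment of mixed absolute and relative constraints can be sketched in one dimension. The geometry, noise levels, and weights below are toy assumptions for illustration, not the marine-survey equations derived in the paper.

```python
import numpy as np

# Toy 1-D inversion: estimate node positions along a streamer from noisy
# relative spacings (acoustic-ranging-like) plus absolute fixes at both ends
# (GPS-like), each constraint weighted by its assumed measurement uncertainty.
true_pos = np.array([0.0, 100.0, 200.0, 300.0])
n = len(true_pos)

rows, data, sigmas = [], [], []
for idx, sigma in [(0, 1.0), (n - 1, 1.0)]:          # absolute constraints
    r = np.zeros(n); r[idx] = 1.0
    rows.append(r); data.append(true_pos[idx] + 0.5); sigmas.append(sigma)
for i in range(n - 1):                               # relative constraints
    r = np.zeros(n); r[i], r[i + 1] = -1.0, 1.0
    rows.append(r); data.append(true_pos[i + 1] - true_pos[i] + 1.0); sigmas.append(2.0)

# Weighted least squares: scale each row and datum by its standard deviation.
G = np.array(rows) / np.array(sigmas)[:, None]
d = np.array(data) / np.array(sigmas)
m, *_ = np.linalg.lstsq(G, d, rcond=None)
rms_error = float(np.sqrt(np.mean((m - true_pos) ** 2)))
```

In the weighted formulation, the posterior covariance of the model, and hence the rigorous uncertainties the abstract mentions, would follow from the inverse of the normal matrix.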

