Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

With the development of modern science and technology, it has become easy to obtain large numbers of high-dimensional datasets that are related but different. Classical unimodal analysis is unlikely to capture the potential links between these datasets. A collaborative regression model based on the least squares (LS) method has recently been proposed for this problem. In this paper, we propose a robust collaborative regression based on the least absolute deviation (LAD). We give statistical interpretations of both LS-collaborative regression and LAD-collaborative regression. We then design an efficient symmetric Gauss–Seidel-based alternating direction method of multipliers (ADMM) algorithm to solve the two models; the algorithm is globally convergent with a Q-linear rate of convergence. Finally, we report numerical experiments that illustrate the efficiency of the proposed methods.
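
To make the LS/LAD contrast concrete, here is a minimal Python sketch that only evaluates (does not minimize) a collaborative objective. It assumes a two-view form with hypothetical views `X` and `Z`, coefficient vectors `theta_x` and `theta_z`, and weights `b`: one term fits `y` from each view and a third term makes the two predictions agree. Penalty terms and the symmetric Gauss–Seidel ADMM solver are omitted. The LS version uses squared residuals and the LAD version absolute residuals (in the standard view, Gaussian versus Laplace errors); the exact form used in the paper is not stated here.

```python
def residual_loss(r, kind):
    """LS: half the sum of squared residuals; LAD: sum of absolute residuals."""
    if kind == "ls":
        return 0.5 * sum(v * v for v in r)
    if kind == "lad":
        return sum(abs(v) for v in r)
    raise ValueError(f"unknown loss kind: {kind!r}")


def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def collaborative_objective(y, X, Z, theta_x, theta_z, b=(1.0, 1.0, 1.0), kind="ls"):
    """Assumed two-view collaborative objective without penalty terms.

    b = (b_xy, b_zy, b_xz) are hypothetical weights for: fitting y from X,
    fitting y from Z, and agreement between the two predictions.
    """
    px, pz = matvec(X, theta_x), matvec(Z, theta_z)
    b_xy, b_zy, b_xz = b
    return (b_xy * residual_loss([yi - p for yi, p in zip(y, px)], kind)
            + b_zy * residual_loss([yi - p for yi, p in zip(y, pz)], kind)
            + b_xz * residual_loss([p - q for p, q in zip(px, pz)], kind))


# Toy data: both views predict y exactly, except for one corrupted response.
X = [[1.0], [2.0], [3.0]]
Z = [[1.0], [2.0], [3.0]]
y_outlier = [1.0, 2.0, 13.0]
for kind in ("ls", "lad"):
    print(kind, collaborative_objective(y_outlier, X, Z, [1.0], [1.0], kind=kind))
# ls 100.0
# lad 20.0
```

The single corrupted response adds 100 to the LS objective but only 20 to the LAD objective: the squared loss grows quadratically with the size of the outlier while the absolute loss grows linearly, which is the usual motivation for a LAD variant.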

Hierarchical Clustering of High-Dimensional Data Without Global Dimensionality Reduction

Lecture Notes in Computer Science - Foundations of Intelligent Systems

10.1007/978-3-030-01851-1_23

2018

pp. 236-246

Author(s):

Ilari Kampman

Tapio Elomaa

Keyword(s):

Dimensionality Reduction

Hierarchical Clustering

High Dimensional Data

High Dimensional

Scalable hierarchical clustering by composition rank vector encoding and tree structure

10.1101/2020.04.12.038026

2020

Author(s):

Xiao Lai

Pu Tian

Keyword(s):

Machine Learning

Hierarchical Clustering

Clustering Algorithm

High Dimensional Data

Machine Learning Algorithms

Tree Structure

Supervised Machine Learning

High Dimensional

Rank Vector

Nonlinear Correlations

Supervised machine learning, especially deep learning based on a wide variety of neural network architectures, has contributed tremendously to fields such as marketing, computer vision and natural language processing. However, the development of unsupervised machine learning algorithms has been a bottleneck for artificial intelligence. Clustering is a fundamental unsupervised task in many different subjects. Unfortunately, no existing algorithm is satisfactory for clustering high-dimensional data with strong nonlinear correlations. In this work, we propose a simple and highly efficient hierarchical clustering algorithm based on encoding by composition rank vectors and a tree structure, and demonstrate its utility by clustering protein structural domains. The algorithm involves no record comparison, an expensive step that is common and essential to all existing clustering algorithms. Consequently, it performs hierarchical clustering with linear time and space complexity and is therefore applicable to arbitrarily large datasets. The key factor in this algorithm is the definition of composition, which depends on the physical nature of the target data and therefore needs to be constructed case by case. Nonetheless, the algorithm is general and applicable to any high-dimensional data with strong nonlinear correlations. We hope this algorithm will inspire a rich research field of encoding-based clustering well beyond composition rank vector trees.
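
The abstract does not spell out the encoding, so the following is only a minimal sketch of the general idea under stated assumptions. Each record is assumed to be already reduced to a "composition", here a hypothetical list of counts per category standing in for the data-specific definition the authors construct case by case. Its rank vector is taken to be the category indices ordered by decreasing count, and records are inserted into a prefix tree keyed by successive rank-vector entries. Records that share a node at depth k form one cluster at level k, so the tree yields a hierarchy without any pairwise record comparison, and the work grows linearly with the number of records for a fixed number of categories and tree depth.

```python
def rank_vector(composition):
    """Category indices ordered by decreasing count; ties broken by index."""
    return tuple(sorted(range(len(composition)), key=lambda i: (-composition[i], i)))


def new_node():
    """A trie node: the ids of records passing through it, plus its children."""
    return {"ids": [], "children": {}}


def build_tree(records, depth):
    """Insert each record's rank-vector prefix (length <= depth) into a trie.

    Each record is visited once, with no comparisons against other records.
    """
    root = new_node()
    for rid, composition in enumerate(records):
        node = root
        node["ids"].append(rid)
        for key in rank_vector(composition)[:depth]:
            node = node["children"].setdefault(key, new_node())
            node["ids"].append(rid)
    return root


def clusters_at_level(root, level):
    """Return the record-id groups held by all trie nodes at the given depth."""
    nodes = [root]
    for _ in range(level):
        nodes = [child for node in nodes for child in node["children"].values()]
    return [node["ids"] for node in nodes]


# Hypothetical compositions over three categories.
records = [[5, 1, 0], [4, 2, 1], [0, 3, 6], [1, 0, 7]]
tree = build_tree(records, depth=2)
print(clusters_at_level(tree, 1))  # [[0, 1], [2, 3]]
print(clusters_at_level(tree, 2))  # [[0, 1], [2], [3]]
```

Deeper levels split clusters further, giving the hierarchy directly from the tree; how many rank-vector entries to use, and how ties are handled, are design choices this sketch fixes arbitrarily.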

(Auto)Biographical reflections on the contributions of William F. Loomis (1940-2016) to Dictyostelium biology

The International Journal of Developmental Biology

10.1387/ijdb.190224ak

2019

Vol 63 (8-9-10)

pp. 343-357

Author(s):

Adam Kuspa

Gad Shaulsky

Keyword(s):

Cell Differentiation

Molecular Biology

Genetic Control

Dictyostelium Discoideum

University Of California

High Dimensional

Social Amoeba

The Social

The University

High Dimensional Datasets

William Farnsworth Loomis studied the social amoeba Dictyostelium discoideum for more than fifty years as a professor of biology at the University of California, San Diego, USA. This biographical reflection describes Dr. Loomis’ major scientific contributions to the field within a career arc that spanned the early days of molecular biology up to the present day, when the acquisition of high-dimensional datasets drives research. Dr. Loomis explored the genetic control of social amoeba development, delineated mechanisms of cell differentiation, and significantly advanced genetic and genomic technology for the field. The details of Dr. Loomis’ multifaceted career are drawn from his published work, from an autobiographical essay that he wrote near the end of his career and from extensive conversations between him and the two authors, many of which took place on the deck of his beachfront home in Del Mar, California.

Probing Multiscale Disorder in Pyrochlore and Related Complex Oxides in the Transmission Electron Microscope: A Review

Frontiers in Chemistry

10.3389/fchem.2021.743025

2021

Vol 9

Author(s):

Jenna L. Wardini

Hasti Vahidi

Huiming Guo

William J. Bowman

Keyword(s):

Complex Oxides

Atomic Scale

High Dimensional

Detection Systems

Spatially Resolved

Chemical Ordering

Transmission Electron

The Many

High Dimensional Datasets

Transmission electron microscopy (TEM), and its counterpart, scanning TEM (STEM), are powerful materials characterization tools capable of probing crystal structure, composition, charge distribution, electronic structure, and bonding down to the atomic scale. Recent (S)TEM instrumentation developments, such as electron beam aberration correction as well as faster and more efficient signal detection systems, have given rise to new and more powerful experimental methods. Some of these methods (e.g., 4D-STEM, spectrum imaging, in situ/operando (S)TEM) facilitate the capture of high-dimensional datasets that contain spatially resolved structural, spectroscopic, time- and/or stimulus-dependent information across length scales from sub-angstrom to several micrometers. Thus, through the variety of analysis methods available in the modern (S)TEM and its continual development towards high-dimensional data capture, it is well suited to the challenge of characterizing isometric mixed-metal oxides such as pyrochlores, fluorites, and other complex oxides that reside on a continuum of chemical and spatial ordering. In this review, we present a suite of imaging and diffraction (S)TEM techniques that are uniquely suited to probing the many types, length scales, and degrees of disorder in complex oxides, with a focus on disorder common to pyrochlores, fluorites and the expansive library of intermediate structures they may adopt. The application of these techniques to various complex oxides will be reviewed to demonstrate their capabilities and limitations in resolving the continuum of structural and chemical ordering in these systems.

Diabetes and its Complication Prediction using Multi-Task Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue

10.35940/ijitee.e2821.039520

2020

Vol 9 (5)

pp. 1426-1430

Keyword(s):

Risk Factors

Diabetes Patient

Prediction Performance

Multitask Learning

High Dimensional

Disease Prediction

Healthcare Applications

Future Health

High Dimensional Datasets

Diabetes is a chronic disease that leads to multiple complications. It has become a silent killer because it often shows no symptoms until it is too late. It causes complications affecting other organs and systems, such as the kidneys, the cardiovascular system and the liver, and it affects blood pressure [1]. This work applies multitask learning [2] to jointly model the relationships among multiple complications, where each task corresponds to modelling the risk of one complication [3]. It also uses feature selection to reduce the set of risk factors drawn from high-dimensional datasets. Then, using correlation, it measures the degree of relatedness among the various complications. The proposed method can identify possible future health hazards for a diabetes patient. This can help explain medical conditions and improve healthcare applications by improving disease prediction performance.
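
The abstract names feature selection and correlation analysis but not the specific methods, so the following minimal sketch assumes a simple filter approach: rank candidate risk factors by absolute Pearson correlation with one outcome and keep the top k, then compute pairwise Pearson correlations between binary complication indicators. The feature names, complication names and toy values are hypothetical, and the multitask model itself is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(features, target, k):
    """Keep the k features with the largest |correlation| to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def complication_correlations(labels):
    """Pairwise Pearson correlation between complication indicator columns."""
    return {(a, b): pearson(labels[a], labels[b]) for a, b in combinations(labels, 2)}


# Hypothetical toy data for four patients.
features = {"glucose": [1, 2, 3, 4], "bmi": [0, 0, 1, 1], "noise": [1, 0, 0, 1]}
kidney = [0, 0, 1, 1]
print(select_features(features, kidney, k=2))  # ['bmi', 'glucose']

labels = {"kidney": kidney, "cardio": [0, 0, 1, 1], "liver": [1, 1, 0, 0]}
print(complication_correlations(labels))
```

A filter like this is cheap on high-dimensional data because each feature is scored independently; it ignores interactions between risk factors, which a model-based selection step would capture at higher cost.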

DPM: Fast and scalable clustering algorithm for large scale high dimensional datasets

2014 10th International Computer Engineering Conference (ICENCO)

10.1109/icenco.2014.7050427

2014

Author(s):

Tamer F. Ghanem

Wail S. Elkilani

Hatem S. Ahmed

Mohiy M. Hadhoud

Keyword(s):

Large Scale

Clustering Algorithm

High Dimensional

Scalable Clustering

High Dimensional Datasets
