Common Information Components Analysis

Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 151
Author(s):  
Erixhen Sula ◽  
Michael C. Gastpar

Wyner’s common information is a measure that quantifies the commonality between two random variables. Building on this measure, we introduce a novel two-step procedure to construct features from data, referred to as Common Information Components Analysis (CICA). The first step can be interpreted as extracting Wyner’s common information. The second step is a form of back-projection of the common information onto the original variables, leading to the extracted features. A free parameter γ controls the complexity of the extracted features. We establish that, in the case of Gaussian statistics, CICA precisely reduces to Canonical Correlation Analysis (CCA), where the parameter γ determines the number of CCA components that are extracted. In this sense, we establish a novel rigorous connection between information measures and CCA, with CICA a strict generalization of the latter. CICA is shown to have several desirable features, including a natural extension to more than two data sets.
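For concreteness, a minimal Python sketch of the Gaussian special case described in the abstract follows, in which CICA reduces to CCA: the canonical directions are obtained from an SVD of the whitened cross-covariance. The function and variable names (cca_directions, n_components) are illustrative assumptions rather than the authors' code, and n_components only loosely stands in for the role of the parameter γ.

import numpy as np

def cca_directions(X, Y, n_components=2, eps=1e-10):
    # Center both data blocks.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each block, then take the SVD of the whitened cross-covariance.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    A = Wx @ U[:, :n_components]      # canonical directions for X
    B = Wy @ Vt.T[:, :n_components]   # canonical directions for Y
    return A, B, s[:n_components]     # singular values = canonical correlations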

1987 ◽  
Vol 28 (1) ◽  
pp. 100-118 ◽  
Author(s):  
Joël Guiot

Abstract: In regions like southern France, where the usual analytical techniques are unsatisfactory because of the heavy influence of human activity and the complexity of the climate, the quantitative reconstruction of climate from pollen is particularly difficult. For this reason an original method has been developed. The first step of the method, based on best-analog estimation and multiple regression, is to calculate a relationship between climate (monthly temperature and precipitation) and modern pollen spectra for data from 182 sites. The result is called the “analog climate.” The second step, an original combination of canonical correlation and principal components analyses, extracts the common information from several fossil pollen sequences to produce a so-called “paleobioclimate.” This step removes the effects of local differences in the vegetation among nearby sites and also reduces the effects of human disturbance. In the third step, the signals obtained from the first two steps are merged by Kalman filter into a final reconstruction in which the noise is reduced by around 37%. The results, presented with a 95% confidence level, suggest that the Pleni-Würm was 7° to 13°C colder than present and had 20 to 60% less precipitation than today (this wide interval takes into account the climatic diversity of the region). At 13,000 yr B.P., precipitation reached the modern level but temperature remained somewhat below present. The subsequent decrease in precipitation was less pronounced than that in temperature, which may explain the advance of alpine glaciers during the Younger Dryas. Preboreal warming was abrupt (3° to 4°C per 500 yr), and precipitation increased more slowly.
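As an illustration of the first step only, here is a minimal Python sketch of best-analog climate estimation: the climate of a fossil pollen spectrum is estimated from its closest modern analogs. The squared-chord distance, the number of analogs k, the synthetic data, and the array names are assumptions for illustration, not the paper's exact procedure (which also involves multiple regression).

import numpy as np

def best_analog_climate(fossil, modern_pollen, modern_climate, k=5):
    # fossil: (n_taxa,) pollen percentages of one fossil sample.
    # modern_pollen: (n_sites, n_taxa); modern_climate: (n_sites, n_vars).
    # Squared-chord distance, a common choice for pollen percentage data.
    d = np.sum((np.sqrt(fossil) - np.sqrt(modern_pollen)) ** 2, axis=1)
    nearest = np.argsort(d)[:k]
    # Weight the analogs by inverse distance (small constant avoids divide-by-zero).
    w = 1.0 / (d[nearest] + 1e-9)
    return (w[:, None] * modern_climate[nearest]).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
modern_pollen = rng.dirichlet(np.ones(20), size=182) * 100   # 182 sites, 20 taxa (synthetic)
modern_climate = rng.normal(size=(182, 24))                  # 12 monthly T + 12 monthly P (synthetic)
print(best_analog_climate(modern_pollen[0], modern_pollen, modern_climate))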


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive X-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea-salt ageing, and halogen chemistry.

Aerosol particle data sets pose a number of difficulties for pattern recognition by cluster analysis. There is a great disparity in the number of observations per cluster and in the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters overlap considerably because of natural variability, agglomeration, and chemical reactivity.
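A minimal Python sketch of the kind of multivariate pipeline described above, assuming scikit-learn is available: standardize the per-particle elemental fractions, reduce them with PCA, then cluster. The array name X, the random example data, and the choice of k-means are illustrative and do not reproduce the study's cluster and discriminant analysis settings.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# X: (n_particles, n_elements) matrix of EDS-derived element fractions (synthetic here).
rng = np.random.default_rng(0)
X = rng.random((1000, 10))

Z = StandardScaler().fit_transform(X)            # equalize the ranges of the elements
scores = PCA(n_components=3).fit_transform(Z)    # principal components analysis
labels = KMeans(n_clusters=5, n_init=10).fit_predict(scores)  # cluster analysis
print(np.bincount(labels))                       # cluster sizes (real data show great disparity)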


Materials ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 3816
Author(s):  
Haidong He ◽  
Risheng Hua ◽  
Xuan Li ◽  
Chunju Wang ◽  
Xuezhong Ning ◽  
...  

Laser irradiation is a popular method of producing microtextures on metal surfaces. However, commonly laser-produced microtextures are hierarchical (multiscale), which may limit their applicability. In this paper, a two-step laser irradiation method, combining a first step of strong ablation with a sequential second step of gentle ablation, is presented to produce micron-rough surfaces with single-scale microtextures. The effect of laser fluence on the Ti–6Al–4V surface morphology and wettability was investigated in detail. The morphology results revealed that the microtextures produced by this method gradually evolved from multiscale to single-scale, and from microprotrusions to microholes, as the second-step laser fluence increased from 0.0 to 2.4 J/cm². The wettability and EDS/XPS results indicated that, owing to the rich TiO2 content and micron-scale roughness produced by laser irradiation, all the two-step laser-irradiated surfaces exhibited superhydrophilicity. In addition, after silanization, all these superhydrophilic surfaces immediately became superhydrophobic, with similar water contact angles of 155–162°. However, due to the absence of nanotextures, the water-rolling angles on the superhydrophobic surfaces with single-scale microtextures were distinctly larger than on those with multiscale microtextures. Finally, using the two-step laser-irradiation method combined with silanization, multifunctional superhydrophobic Ti–6Al–4V surfaces were achieved, exhibiting self-cleaning, guiding of the water-rolling direction, and anisotropic water-rolling angles (like a rice leaf).


Author(s):  
VLADIMIR NIKULIN ◽  
TIAN-HSIANG HUANG ◽  
GEOFFREY J. MCLACHLAN

The method presented in this paper is novel in being a natural combination of two mutually dependent steps. Feature selection is the key element (first step) of our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine, or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
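A minimal Python sketch of the two-step idea, assuming SciPy and scikit-learn: rank features with a Wilcoxon rank-sum statistic, keep the highest-ranked ones, and evaluate a standard classifier with leave-one-out cross-validation. It uses a plain rank-sum ranking rather than the authors' separation-type criterion, and the data are synthetic placeholders.

import numpy as np
from scipy.stats import ranksums
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def select_features(X, y, n_keep=100):
    # Score each feature by the absolute Wilcoxon rank-sum statistic between the two classes.
    scores = np.array([abs(ranksums(X[y == 0, j], X[y == 1, j])[0]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n_keep]

# Synthetic two-class, high-dimensional data standing in for the challenge sets.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 5000)), rng.integers(0, 2, 60)

# Note: selecting features on the full data before LOO is optimistic; a rigorous
# evaluation would nest the selection inside each LOO fold. Kept outside for brevity.
cols = select_features(X, y)
acc = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.3f}")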


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Mustafa Yuksel ◽  
Suat Gonul ◽  
Gokce Banu Laleci Erturkmen ◽  
Ali Anil Sinaci ◽  
Paolo Invernizzi ◽  
...  

Because they depend mostly on voluntarily submitted spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources for collecting deidentified medical data sets of selected patient populations and for tracing reported incidents back to the original EHRs. We have developed an ontological framework in which EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model at the core of this set acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, namely the Case Series Characterization Tool, which has been deployed on top of the regional EHR Data Warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.
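A deliberately simplified, hypothetical Python sketch of the mediation idea: a source-specific EHR record is translated into a common-model representation via a terminology mapping and a structural rule. None of the field names, codes, or mappings below come from the SALUS resources, which rely on formal ontology representations and rule-based reasoning rather than hard-coded dictionaries.

# Illustrative-only terminology mapping from a local code to a common code system.
LOCAL_TO_SNOMED = {"ICD10:I21.9": "SNOMED:22298006"}

def to_common_model(local_record):
    # Map a source-specific record onto a hypothetical common-model structure.
    return {
        "patientId": local_record["pid"],                                          # structural mapping
        "condition": LOCAL_TO_SNOMED.get(local_record["dx"], local_record["dx"]),  # semantic mapping
        "onset": local_record["date"],
    }

print(to_common_model({"pid": "P001", "dx": "ICD10:I21.9", "date": "2013-05-02"}))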


2018 ◽  
Vol 8 (12) ◽  
pp. 2421 ◽  
Author(s):  
Chongya Song ◽  
Alexander Pons ◽  
Kang Yen

In the field of network intrusion, malware usually evades anomaly detection by disguising malicious behavior as legitimate access. Detecting these attacks from network traffic has therefore become a challenge in this adversarial setting. In this paper, an enhanced Hidden Markov Model, called the Anti-Adversarial Hidden Markov Model (AA-HMM), is proposed to effectively detect evasion patterns, using Dynamic Window and Threshold techniques to achieve adaptive, anti-adversarial, and online-learning abilities. In addition, a concept called Pattern Entropy is defined and acts as the foundation of AA-HMM. We evaluate the effectiveness of our approach on two well-known benchmark data sets, NSL-KDD and CTU-13, in terms of common performance metrics and the algorithm’s adaptation and anti-adversarial abilities.
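A minimal, illustrative Python sketch of the sliding-window and adaptive-threshold idea: each window of discretized traffic events is scored with a standard HMM forward log-likelihood, and a window is flagged when its score falls below a threshold adapted from recent scores. This is not the authors' AA-HMM (there is no Pattern Entropy and no dynamic window resizing); the parameters and names are assumptions for illustration.

import numpy as np

def forward_loglik(obs, pi, A, B):
    # Scaled forward algorithm: log P(obs) under a discrete HMM (pi, A, B).
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        logp += np.log(s)
        alpha = alpha / s
    return logp

def flag_windows(stream, pi, A, B, window=20, quantile=0.05, history=50):
    # Score fixed-size windows and flag those below an adaptively updated threshold.
    scores, flags = [], []
    for start in range(0, len(stream) - window + 1, window):
        score = forward_loglik(stream[start:start + window], pi, A, B)
        recent = scores[-history:] or [score]      # fall back to the current score at start-up
        flags.append(score < np.quantile(recent, quantile))
        scores.append(score)
    return flags

# Tiny illustrative 2-state model over a binary event alphabet, with synthetic traffic.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
stream = np.random.default_rng(0).integers(0, 2, 400)
print(sum(flag_windows(stream, pi, A, B)))         # number of flagged windows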


Koedoe ◽  
1995 ◽  
Vol 38 (1) ◽  
Author(s):  
G.J. Bredenkamp ◽  
H. Bezuidenhout

A procedure for the effective classification of large phytosociological data sets, and for the combination of many data sets from various parts of the South African grasslands, is demonstrated. The procedure suggests a region-by-region or project-by-project treatment of the data. The analyses are performed step by step to effectively bring together all relevés of similar or related plant communities. The first step involves a separate numerical classification of each subset (region), with subsequent refinement by Braun-Blanquet procedures. The resulting plant communities are summarised in a single synoptic table by calculating a synoptic value for each species in each community. In the second step, all communities in the synoptic table are classified by numerical analysis to bring related communities from different regions or studies together in a single cluster. After refinement of these clusters by Braun-Blanquet procedures, broad vegetation types are identified. As a third step, phytosociological tables are compiled for each identified broad vegetation type, and a comprehensive abstract hierarchy is constructed.
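A minimal Python sketch of the second step described above, assuming SciPy is available: each first-step community is summarised by a synoptic value per species (here a simple percentage frequency across its relevés), and the resulting synoptic table is then classified numerically so that related communities from different regions cluster together. The array names, the random example data, and the use of Ward linkage are illustrative choices, not the paper's exact protocol.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def synoptic_table(releves, community_ids):
    # releves: (n_releves, n_species) presence/absence; returns the percentage
    # frequency of each species within each community (one row per community).
    communities = np.unique(community_ids)
    return np.vstack([100.0 * releves[community_ids == c].mean(axis=0) for c in communities])

# Synthetic presence/absence data for 12 relevés, 6 species, 3 first-step communities.
rng = np.random.default_rng(1)
releves = (rng.random((12, 6)) > 0.5).astype(float)
community_ids = np.repeat([0, 1, 2], 4)

syn = synoptic_table(releves, community_ids)
clusters = fcluster(linkage(syn, method="ward"), t=2, criterion="maxclust")
print(clusters)   # cluster membership of each community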

