Common Information Components Analysis

Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 151
Author(s):  
Erixhen Sula ◽  
Michael C. Gastpar

Wyner’s common information is a measure that quantifies the commonality between two random variables. Building on this measure, we introduce a novel two-step procedure to construct features from data, referred to as Common Information Components Analysis (CICA). The first step can be interpreted as extracting Wyner’s common information. The second step is a form of back-projection of the common information onto the original variables, leading to the extracted features. A free parameter γ controls the complexity of the extracted features. We establish that, in the case of Gaussian statistics, CICA precisely reduces to Canonical Correlation Analysis (CCA), where the parameter γ determines the number of CCA components that are extracted. In this sense, we establish a novel rigorous connection between information measures and CCA, with CICA a strict generalization of the latter. CICA is shown to have several desirable features, including a natural extension to more than two data sets.
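For concreteness, a minimal Python sketch of the Gaussian special case described in the abstract follows, in which CICA reduces to CCA: the canonical directions are obtained from an SVD of the whitened cross-covariance. The function and variable names (cca_directions, n_components) are illustrative assumptions rather than the authors' code, and n_components only loosely stands in for the role of the parameter γ.

import numpy as np

def cca_directions(X, Y, n_components=2, eps=1e-10):
    # Center both data blocks.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each block, then take the SVD of the whitened cross-covariance.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    A = Wx @ U[:, :n_components]      # canonical directions for X
    B = Wy @ Vt.T[:, :n_components]   # canonical directions for Y
    return A, B, s[:n_components]     # singular values = canonical correlations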

1987 ◽  
Vol 28 (1) ◽  
pp. 100-118 ◽  
Author(s):  
Joël Guiot

Abstract: In regions like southern France, where the usual analytical techniques are unsatisfactory because of the heavy influence of human activity and the complexity of the climate, the quantitative reconstruction of climate from pollen is particularly difficult. For this reason an original method has been developed. The first step of the method, based on best-analog estimation and multiple regression, is to calculate a relationship between climate (monthly temperature and precipitation) and modern pollen spectra for data from 182 sites. The result is called the “analog climate.” The second step, an original combination of canonical correlation and principal components analyses, extracts the common information from several fossil pollen sequences to produce a so-called “paleobioclimate.” This step removes the effects of local differences in the vegetation among nearby sites and also reduces the effects of human disturbance. In the third step, the signals obtained from the first two steps are merged by Kalman filter into a final reconstruction in which the noise is reduced by around 37%. The results, presented with a 95% confidence level, suggest that the Pleni-Würm was 7° to 13°C colder than present and had 20 to 60% less precipitation than today (this wide interval takes into account the climatic diversity of the region). At 13,000 yr B.P., precipitation reached the modern level but temperature remained somewhat below present. The subsequent decrease in precipitation was less pronounced than that in temperature, which may explain the advance of alpine glaciers during the Younger Dryas. Preboreal warming was abrupt (3° to 4°C per 500 yr), and precipitation increased more slowly.
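As an illustration of the first step only, here is a minimal Python sketch of best-analog climate estimation: the climate of a fossil pollen spectrum is estimated from its closest modern analogs. The squared-chord distance, the number of analogs k, the synthetic data, and the array names are assumptions for illustration, not the paper's exact procedure (which also involves multiple regression).

import numpy as np

def best_analog_climate(fossil, modern_pollen, modern_climate, k=5):
    # fossil: (n_taxa,) pollen percentages of one fossil sample.
    # modern_pollen: (n_sites, n_taxa); modern_climate: (n_sites, n_vars).
    # Squared-chord distance, a common choice for pollen percentage data.
    d = np.sum((np.sqrt(fossil) - np.sqrt(modern_pollen)) ** 2, axis=1)
    nearest = np.argsort(d)[:k]
    # Weight the analogs by inverse distance (small constant avoids divide-by-zero).
    w = 1.0 / (d[nearest] + 1e-9)
    return (w[:, None] * modern_climate[nearest]).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
modern_pollen = rng.dirichlet(np.ones(20), size=182) * 100   # 182 sites, 20 taxa (synthetic)
modern_climate = rng.normal(size=(182, 24))                  # 12 monthly T + 12 monthly P (synthetic)
print(best_analog_climate(modern_pollen[0], modern_pollen, modern_climate))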


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive X-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea-salt ageing, and halogen chemistry.

Aerosol particle data sets pose a number of difficulties for pattern recognition by cluster analysis. There is a great disparity in the number of observations per cluster and in the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters overlap considerably because of natural variability, agglomeration, and chemical reactivity.
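A minimal Python sketch of the kind of multivariate pipeline described above, assuming scikit-learn is available: standardize the per-particle elemental fractions, reduce them with PCA, then cluster. The array name X, the random example data, and the choice of k-means are illustrative and do not reproduce the study's cluster and discriminant analysis settings.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# X: (n_particles, n_elements) matrix of EDS-derived element fractions (synthetic here).
rng = np.random.default_rng(0)
X = rng.random((1000, 10))

Z = StandardScaler().fit_transform(X)            # equalize the ranges of the elements
scores = PCA(n_components=3).fit_transform(Z)    # principal components analysis
labels = KMeans(n_clusters=5, n_init=10).fit_predict(scores)  # cluster analysis
print(np.bincount(labels))                       # cluster sizes (real data show great disparity)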


Materials ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 3816
Author(s):  
Haidong He ◽  
Risheng Hua ◽  
Xuan Li ◽  
Chunju Wang ◽  
Xuezhong Ning ◽  
...  

Laser irradiation is a popular method of producing microtextures on metal surfaces. However, commonly laser-produced microtextures are hierarchical (multiscale), which may limit their applicability. In this paper, a two-step laser irradiation method, combining a first step of strong ablation with a sequential second step of gentle ablation, is presented to produce micron-rough surfaces with single-scale microtextures. The effect of laser fluence on the Ti–6Al–4V surface morphology and wettability was investigated in detail. The morphology results revealed that the microtextures produced by this method gradually evolved from multiscale to single-scale, and from microprotrusions to microholes, as the second-step laser fluence increased from 0.0 to 2.4 J/cm². The wettability and EDS/XPS results indicated that, owing to the rich TiO2 content and micron-scale roughness produced by laser irradiation, all the two-step laser-irradiated surfaces exhibited superhydrophilicity. In addition, after silanization, all these superhydrophilic surfaces immediately became superhydrophobic, with similar water contact angles of 155–162°. However, due to the absence of nanotextures, the water-rolling angles on the superhydrophobic surfaces with single-scale microtextures were distinctly larger than on those with multiscale microtextures. Finally, using the two-step laser-irradiation method combined with silanization, multifunctional superhydrophobic Ti–6Al–4V surfaces were achieved, exhibiting self-cleaning, guiding of the water-rolling direction, and anisotropic water-rolling angles (like a rice leaf).


Author(s):  
VLADIMIR NIKULIN ◽  
TIAN-HSIANG HUANG ◽  
GEOFFREY J. MCLACHLAN

The method presented in this paper is novel in being a natural combination of two mutually dependent steps. Feature selection is the key element (first step) of our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine, or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
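A minimal Python sketch of the two-step idea, assuming SciPy and scikit-learn: rank features with a Wilcoxon rank-sum statistic, keep the highest-ranked ones, and evaluate a standard classifier with leave-one-out cross-validation. It uses a plain rank-sum ranking rather than the authors' separation-type criterion, and the data are synthetic placeholders.

import numpy as np
from scipy.stats import ranksums
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def select_features(X, y, n_keep=100):
    # Score each feature by the absolute Wilcoxon rank-sum statistic between the two classes.
    scores = np.array([abs(ranksums(X[y == 0, j], X[y == 1, j])[0]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n_keep]

# Synthetic two-class, high-dimensional data standing in for the challenge sets.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 5000)), rng.integers(0, 2, 60)

# Note: selecting features on the full data before LOO is optimistic; a rigorous
# evaluation would nest the selection inside each LOO fold. Kept outside for brevity.
cols = select_features(X, y)
acc = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.3f}")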


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Mustafa Yuksel ◽  
Suat Gonul ◽  
Gokce Banu Laleci Erturkmen ◽  
Ali Anil Sinaci ◽  
Paolo Invernizzi ◽  
...  

Because they depend mostly on voluntarily submitted spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources for collecting deidentified medical data sets of selected patient populations and for tracing reported incidents back to the original EHRs. We have developed an ontological framework in which EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model at the core of this set acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, namely the Case Series Characterization Tool, which has been deployed on top of the regional EHR Data Warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.
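A deliberately simplified, hypothetical Python sketch of the mediation idea: a source-specific EHR record is translated into a common-model representation via a terminology mapping and a structural rule. None of the field names, codes, or mappings below come from the SALUS resources, which rely on formal ontology representations and rule-based reasoning rather than hard-coded dictionaries.

# Illustrative-only terminology mapping from a local code to a common code system.
LOCAL_TO_SNOMED = {"ICD10:I21.9": "SNOMED:22298006"}

def to_common_model(local_record):
    # Map a source-specific record onto a hypothetical common-model structure.
    return {
        "patientId": local_record["pid"],                                          # structural mapping
        "condition": LOCAL_TO_SNOMED.get(local_record["dx"], local_record["dx"]),  # semantic mapping
        "onset": local_record["date"],
    }

print(to_common_model({"pid": "P001", "dx": "ICD10:I21.9", "date": "2013-05-02"}))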


2018 ◽  
Vol 8 (12) ◽  
pp. 2421 ◽  
Author(s):  
Chongya Song ◽  
Alexander Pons ◽  
Kang Yen

In the field of network intrusion, malware usually evades anomaly detection by disguising malicious behavior as legitimate access. Detecting these attacks from network traffic has therefore become a challenge in this adversarial setting. In this paper, an enhanced Hidden Markov Model, called the Anti-Adversarial Hidden Markov Model (AA-HMM), is proposed to effectively detect evasion patterns, using Dynamic Window and Threshold techniques to achieve adaptive, anti-adversarial, and online-learning abilities. In addition, a concept called Pattern Entropy is defined and acts as the foundation of AA-HMM. We evaluate the effectiveness of our approach on two well-known benchmark data sets, NSL-KDD and CTU-13, in terms of common performance metrics and the algorithm’s adaptation and anti-adversarial abilities.
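A minimal, illustrative Python sketch of the sliding-window and adaptive-threshold idea: each window of discretized traffic events is scored with a standard HMM forward log-likelihood, and a window is flagged when its score falls below a threshold adapted from recent scores. This is not the authors' AA-HMM (there is no Pattern Entropy and no dynamic window resizing); the parameters and names are assumptions for illustration.

import numpy as np

def forward_loglik(obs, pi, A, B):
    # Scaled forward algorithm: log P(obs) under a discrete HMM (pi, A, B).
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        logp += np.log(s)
        alpha = alpha / s
    return logp

def flag_windows(stream, pi, A, B, window=20, quantile=0.05, history=50):
    # Score fixed-size windows and flag those below an adaptively updated threshold.
    scores, flags = [], []
    for start in range(0, len(stream) - window + 1, window):
        score = forward_loglik(stream[start:start + window], pi, A, B)
        recent = scores[-history:] or [score]      # fall back to the current score at start-up
        flags.append(score < np.quantile(recent, quantile))
        scores.append(score)
    return flags

# Tiny illustrative 2-state model over a binary event alphabet, with synthetic traffic.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
stream = np.random.default_rng(0).integers(0, 2, 400)
print(sum(flag_windows(stream, pi, A, B)))         # number of flagged windows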


Koedoe ◽  
1995 ◽  
Vol 38 (1) ◽  
Author(s):  
G.J. Bredenkamp ◽  
H. Bezuidenhout

A procedure for the effective classification of large phytosociological data sets, and for the combination of many data sets from various parts of the South African grasslands, is demonstrated. The procedure suggests a region-by-region or project-by-project treatment of the data. The analyses are performed step by step to effectively bring together all relevés of similar or related plant communities. The first step involves a separate numerical classification of each subset (region), with subsequent refinement by Braun-Blanquet procedures. The resulting plant communities are summarised in a single synoptic table by calculating a synoptic value for each species in each community. In the second step, all communities in the synoptic table are classified by numerical analysis to bring related communities from different regions or studies together in a single cluster. After refinement of these clusters by Braun-Blanquet procedures, broad vegetation types are identified. As a third step, phytosociological tables are compiled for each identified broad vegetation type, and a comprehensive abstract hierarchy is constructed.
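A minimal Python sketch of the second step described above, assuming SciPy is available: each first-step community is summarised by a synoptic value per species (here a simple percentage frequency across its relevés), and the resulting synoptic table is then classified numerically so that related communities from different regions cluster together. The array names, the random example data, and the use of Ward linkage are illustrative choices, not the paper's exact protocol.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def synoptic_table(releves, community_ids):
    # releves: (n_releves, n_species) presence/absence; returns the percentage
    # frequency of each species within each community (one row per community).
    communities = np.unique(community_ids)
    return np.vstack([100.0 * releves[community_ids == c].mean(axis=0) for c in communities])

# Synthetic presence/absence data for 12 relevés, 6 species, 3 first-step communities.
rng = np.random.default_rng(1)
releves = (rng.random((12, 6)) > 0.5).astype(float)
community_ids = np.repeat([0, 1, 2], 4)

syn = synoptic_table(releves, community_ids)
clusters = fcluster(linkage(syn, method="ward"), t=2, criterion="maxclust")
print(clusters)   # cluster membership of each community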

