Projections of Tropical Fermat-Weber Points

Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3102
Author(s):  
Weiyi Ding ◽  
Xiaoxian Tang

This paper is motivated by the following difference between classical principal component analysis (PCA) in a Euclidean space and tropical PCA in the tropical projective torus. In a Euclidean space, the projection of the mean point of a data set onto a principal component is the mean point of the projection of the data set. In the tropical projective torus, however, the projection of a Fermat-Weber point of a data set onto a tropical polytope is not guaranteed to be a Fermat-Weber point of the projection of the data set. This discrepancy is caused by the difference between the Euclidean metric and the tropical metric. In this paper, we focus on the projection onto a tropical triangle (the three-point tropical convex hull), and we develop an algorithm and an improved version of it such that, for a given data set in the tropical projective torus, these algorithms output a tropical triangle on which the projection of a Fermat-Weber point of the data set is a Fermat-Weber point of the projection of the data set. We implement these algorithms in R and test them on random data sets. The experimental results show that the algorithms are stable and efficient, with a high success rate.
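The two ingredients this abstract relies on, the tropical metric and the projection onto a tropical polytope, can be sketched in a few lines. This is a Python illustration rather than the authors' R implementation, and the sample triangle is made up; it is not their algorithm for choosing the triangle.

```python
import numpy as np

def tropical_distance(x, y):
    """Tropical (generalized Hilbert projective) metric on the tropical projective torus."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return d.max() - d.min()

def tropical_projection(x, vertices):
    """Project x onto the tropical convex hull of the given vertices
    (the standard max-plus projection formula)."""
    x = np.asarray(x, float)
    V = np.asarray(vertices, float)           # one vertex per row
    lam = (x - V).min(axis=1)                 # best tropical multiplier per vertex
    return (lam[:, None] + V).max(axis=0)     # coordinatewise max-plus combination

# A made-up tropical triangle (three-point tropical convex hull).
triangle = np.array([[0.0, 0.0, 0.0], [0.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
p = tropical_projection([0.0, 5.0, 5.0], triangle)   # lands on the triangle
```

The projection is idempotent and fixes points already on the polytope, which is why a Fermat-Weber point of the projected data can be sought directly on the triangle.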

Author(s):  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
Alexander Gray ◽  
...  

With the dramatic increase in data available from a new generation of astronomical telescopes and instruments, many analyses must address the question of the complexity as well as size of the data set. This chapter deals with how we can learn which measurements, properties, or combinations thereof carry the most information within a data set. It describes techniques that are related to concepts discussed when describing Gaussian distributions, density estimation, and the concepts of information content. The chapter begins with an exploration of the problems posed by high-dimensional data. It then describes the data sets used in this chapter, and introduces perhaps the most important and widely used dimensionality reduction technique, principal component analysis (PCA). The remainder of the chapter discusses several alternative techniques which address some of the weaknesses of PCA.


2020 ◽  
Author(s):  
Anna Morozova ◽  
Rania Rebbah ◽  
M. Alexandra Pais

<p>Geomagnetic field (GMF) variations from external sources are classified as regular diurnal or occurring during periods of disturbances. The most significant regular variations are the quiet solar daily variation (Sq) and the disturbance daily variation (SD). These variations have well recognized daily cycles and need to be accounted for before the analysis of the disturbed field. Preliminary analysis of the GMF variations shows that the principal component analysis (PCA) is a useful tool for extraction of regular variations of GMF; however the requirements to the data set length, geomagnetic activity level etc. need to be established.</p><p>Here we present preliminary results of the PCA-based Sq extraction procedure based on the analysis of the Coimbra Geomagnetic Observatory (COI) measurements of the geomagnetic field components H, X, Y and Z between 2007 and 2015. The PCA-based Sq curves are compared with the standard ones obtained using 5 IQD per month. PCA was applied to data sets of different length: either 1 month-long data set for one of 2007-2015 years or data series for the same month but from different years (2007-2015) combined together. For most of the analyzed years the first PCA mode (PC1) was identified as SD variation and the second mode (PC2) was identified as Sq variation.</p>


2008 ◽  
Vol 57 (10) ◽  
pp. 1659-1666 ◽  
Author(s):  
Kris Villez ◽  
Magda Ruiz ◽  
Gürkan Sin ◽  
Joan Colomer ◽  
Christian Rosén ◽  
...  

A methodology based on Principal Component Analysis (PCA) and clustering is evaluated for process monitoring and process analysis of a pilot-scale SBR removing nitrogen and phosphorus. The first step of this method is to build a multi-way PCA (MPCA) model using the historical process data. In the second step, the principal scores and the Q-statistics resulting from the MPCA model are fed to the LAMDA clustering algorithm. This procedure is iterated twice. The first iteration provides an efficient and effective discrimination between normal and abnormal operational conditions. The second iteration of the procedure allowed a clear-cut discrimination of applied operational changes in the SBR history. Important to add is that this procedure helped identifying some changes in the process behaviour, which would not have been possible, had we only relied on visually inspecting this online data set of the SBR (which is traditionally the case in practice). Hence the PCA based clustering methodology is a promising tool to efficiently interpret and analyse the SBR process behaviour using large historical online data sets.


Author(s):  
Shofiqul Islam ◽  
Sonia Anand ◽  
Jemila Hamid ◽  
Lehana Thabane ◽  
Joseph Beyene

AbstractLinear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods towards data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. First few kernel principal components show poor performance compared to the linear principal components in this occasion. Reducing dimensions using linear PCA and a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to an improved classification accuracy for the outcome.


2017 ◽  
Vol 33 (1) ◽  
pp. 15-41 ◽  
Author(s):  
Aida Calviño

Abstract In this article we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose to alter the principal components, as they contain the same information but are uncorrelated, which permits working on each component separately, reducing processing times. The number and weight of the perturbed components determine the level of protection and distortion of the masked data. The method provides preservation of the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can provide masked, hybrid or fully synthetic data sets. Some examples of application and comparison with other methods previously proposed in the literature (in terms of disclosure risk and data utility) are also included.


2020 ◽  
Author(s):  
Grace Cox ◽  
Will Brown ◽  
Ciaran Beggan ◽  
Magnus Hammer ◽  
Chris Finlay

<p>Geomagnetic Virtual Observatories (GVOs) use satellite measurements to provide estimates of the mean internally-generated magnetic field (MF) over a specified period (usually one or four months) at a fixed location in space, mimicking the mean values obtained at ground-based observatories (GOs). These permit secular variation (SV) estimates anywhere on the globe, thereby mitigating the effects of uneven GO coverage. Current GVO estimates suffer from two key contamination sources: first, local time sampling biases due to satellite orbital dynamics, and second, MFs generated in regions external to the Earth such as the magnetosphere and ionosphere. Current methods to alleviate this contamination have drawbacks:Averaging over four months removes the local time sampling bias at the cost of reduced temporal resolution</p><ol><li>Stringent data selection criteria such as night-time, quiet-time only data greatly reduce, but do not entirely remove, external MF contamination and result in a small subset (<5%) of the available data being used</li> <li>Removing model predictions for external MFs from the measurements also reduces noise, however such parameterisations cannot fully describe these physical systems and some of their signal remains in the data.</li> </ol><p>Here we present an alternative approach to denoising GVOs that uses principal component analysis (PCA). This method retains monthly resolution, uses all available vector satellite data and removes contamination from orbital effects and external MFs. We present an application of PCA, implemented in an open-source Python package called MagPySV, to new GVOs calculated as part of a Swarm DISC project.  The denoised data will be incorporated into a new GVO data set that will be available to the geomagnetism community as an official Swarm product.  </p>


2016 ◽  
Vol 35 (2) ◽  
pp. 173-190 ◽  
Author(s):  
S. Shahid Shaukat ◽  
Toqeer Ahmed Rao ◽  
Moazzam A. Khan

AbstractIn this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from environmental data matrix pertaining to water quality variables (p = 22) of a small data set comprising of 55 samples (stations from where water samples were collected). Because in ecology and environmental sciences the data sets are invariably small owing to high cost of collection and analysis of samples, we restricted our study to relatively small sample sizes. We focused attention on comparison of first 6 eigenvectors and first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis using Ward’s method that does not require any stringent distributional assumptions.


2017 ◽  
Vol 727 ◽  
pp. 447-449 ◽  
Author(s):  
Jun Dai ◽  
Hua Yan ◽  
Jian Jian Yang ◽  
Jun Jun Guo

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.


Cancers ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2342
Author(s):  
Corentin Martens ◽  
Olivier Debeir ◽  
Christine Decaestecker ◽  
Thierry Metens ◽  
Laetitia Lebrun ◽  
...  

Recent works have demonstrated the added value of dynamic amino acid positron emission tomography (PET) for glioma grading and genotyping, biopsy targeting, and recurrence diagnosis. However, most of these studies are based on hand-crafted qualitative or semi-quantitative features extracted from the mean time activity curve within predefined volumes. Voxelwise dynamic PET data analysis could instead provide a better insight into intra-tumor heterogeneity of gliomas. In this work, we investigate the ability of principal component analysis (PCA) to extract relevant quantitative features from a large number of motion-corrected [S-methyl-11C]methionine ([11C]MET) PET frames. We first demonstrate the robustness of our methodology to noise by means of numerical simulations. We then build a PCA model from dynamic [11C]MET acquisitions of 20 glioma patients. In a distinct cohort of 13 glioma patients, we compare the parametric maps derived from our PCA model to these provided by the classical one-compartment pharmacokinetic model (1TCM). We show that our PCA model outperforms the 1TCM to distinguish characteristic dynamic uptake behaviors within the tumor while being less computationally expensive and not requiring arterial sampling. Such methodology could be valuable to assess the tumor aggressiveness locally with applications for treatment planning and response evaluation. This work further supports the added value of dynamic over static [11C]MET PET in gliomas.


1998 ◽  
Vol 30 (2) ◽  
pp. 227-243
Author(s):  
K. N. S. YADAVA ◽  
S. K. JAIN

This paper calculates the mean duration of the postpartum amenorrhoea (PPA) and examines its demographic, and socioeconomic correlates in rural north India, using data collected through 'retrospective' (last but one child) as well as 'current status' (last child) reporting of the duration of PPA.The mean duration of PPA was higher in the current status than in the retrospective data;n the difference being statistically significant. However, for the same mothers who gave PPA information in both the data sets, the difference in mean duration of PPA was not statistically significant. The correlates were identical in both the data sets. The current status data were more complete in terms of the coverage, and perhaps less distorted by reporting errors caused by recall lapse.A positive relationship of the mean duration of PPA was found with longer breast-feeding, higher parity and age of mother at the birth of the child, and the survival status of the child. An inverse relationship was found with higher education of a woman, higher education of her husband and higher socioeconomic status of her household, these variables possibly acting as proxies for women's better nutritional status.


Sign in / Sign up

Export Citation Format

Share Document