Projections of Tropical Fermat-Weber Points

Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3102
Author(s):  
Weiyi Ding ◽  
Xiaoxian Tang

This paper is motivated by the following difference between classical principal component analysis (PCA) in a Euclidean space and tropical PCA in the tropical projective torus. In a Euclidean space, the projection of the mean point of a data set onto a principal component is the mean point of the projection of the data set. In the tropical projective torus, however, the projection of a Fermat-Weber point of a data set onto a tropical polytope is not guaranteed to be a Fermat-Weber point of the projection of the data set. This discrepancy is caused by the difference between the Euclidean metric and the tropical metric. In this paper, we focus on the projection onto a tropical triangle (the three-point tropical convex hull), and we develop an algorithm and an improved version of it such that, for a given data set in the tropical projective torus, these algorithms output a tropical triangle on which the projection of a Fermat-Weber point of the data set is a Fermat-Weber point of the projection of the data set. We implement these algorithms in R and test them on random data sets. The experimental results show that the algorithms are stable and efficient, with a high success rate.
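The two ingredients this abstract relies on, the tropical metric and the projection onto a tropical polytope, can be sketched in a few lines. This is a Python illustration rather than the authors' R implementation, and the sample triangle is made up; it is not their algorithm for choosing the triangle.

```python
import numpy as np

def tropical_distance(x, y):
    """Tropical (generalized Hilbert projective) metric on the tropical projective torus."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return d.max() - d.min()

def tropical_projection(x, vertices):
    """Project x onto the tropical convex hull of the given vertices
    (the standard max-plus projection formula)."""
    x = np.asarray(x, float)
    V = np.asarray(vertices, float)           # one vertex per row
    lam = (x - V).min(axis=1)                 # best tropical multiplier per vertex
    return (lam[:, None] + V).max(axis=0)     # coordinatewise max-plus combination

# A made-up tropical triangle (three-point tropical convex hull).
triangle = np.array([[0.0, 0.0, 0.0], [0.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
p = tropical_projection([0.0, 5.0, 5.0], triangle)   # lands on the triangle
```

The projection is idempotent and fixes points already on the polytope, which is why a Fermat-Weber point of the projected data can be sought directly on the triangle.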

Author(s):  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
Alexander Gray ◽  
...  

With the dramatic increase in data available from a new generation of astronomical telescopes and instruments, many analyses must address the question of the complexity as well as size of the data set. This chapter deals with how we can learn which measurements, properties, or combinations thereof carry the most information within a data set. It describes techniques that are related to concepts discussed when describing Gaussian distributions, density estimation, and the concepts of information content. The chapter begins with an exploration of the problems posed by high-dimensional data. It then describes the data sets used in this chapter, and introduces perhaps the most important and widely used dimensionality reduction technique, principal component analysis (PCA). The remainder of the chapter discusses several alternative techniques which address some of the weaknesses of PCA.


2020 ◽  
Author(s):  
Anna Morozova ◽  
Rania Rebbah ◽  
M. Alexandra Pais

<p>Geomagnetic field (GMF) variations from external sources are classified as regular diurnal or occurring during periods of disturbances. The most significant regular variations are the quiet solar daily variation (Sq) and the disturbance daily variation (SD). These variations have well recognized daily cycles and need to be accounted for before the analysis of the disturbed field. Preliminary analysis of the GMF variations shows that the principal component analysis (PCA) is a useful tool for extraction of regular variations of GMF; however the requirements to the data set length, geomagnetic activity level etc. need to be established.</p><p>Here we present preliminary results of the PCA-based Sq extraction procedure based on the analysis of the Coimbra Geomagnetic Observatory (COI) measurements of the geomagnetic field components H, X, Y and Z between 2007 and 2015. The PCA-based Sq curves are compared with the standard ones obtained using 5 IQD per month. PCA was applied to data sets of different length: either 1 month-long data set for one of 2007-2015 years or data series for the same month but from different years (2007-2015) combined together. For most of the analyzed years the first PCA mode (PC1) was identified as SD variation and the second mode (PC2) was identified as Sq variation.</p>


2008 ◽  
Vol 57 (10) ◽  
pp. 1659-1666 ◽  
Author(s):  
Kris Villez ◽  
Magda Ruiz ◽  
Gürkan Sin ◽  
Joan Colomer ◽  
Christian Rosén ◽  
...  

A methodology based on Principal Component Analysis (PCA) and clustering is evaluated for process monitoring and process analysis of a pilot-scale SBR removing nitrogen and phosphorus. The first step of this method is to build a multi-way PCA (MPCA) model using the historical process data. In the second step, the principal scores and the Q-statistics resulting from the MPCA model are fed to the LAMDA clustering algorithm. This procedure is iterated twice. The first iteration provides an efficient and effective discrimination between normal and abnormal operational conditions. The second iteration of the procedure allowed a clear-cut discrimination of applied operational changes in the SBR history. Important to add is that this procedure helped identifying some changes in the process behaviour, which would not have been possible, had we only relied on visually inspecting this online data set of the SBR (which is traditionally the case in practice). Hence the PCA based clustering methodology is a promising tool to efficiently interpret and analyse the SBR process behaviour using large historical online data sets.


Author(s):  
Shofiqul Islam ◽  
Sonia Anand ◽  
Jemila Hamid ◽  
Lehana Thabane ◽  
Joseph Beyene

AbstractLinear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods towards data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. First few kernel principal components show poor performance compared to the linear principal components in this occasion. Reducing dimensions using linear PCA and a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to an improved classification accuracy for the outcome.


2017 ◽  
Vol 33 (1) ◽  
pp. 15-41 ◽  
Author(s):  
Aida Calviño

Abstract In this article we propose a simple and versatile method for limiting disclosure in continuous microdata based on Principal Component Analysis (PCA). Instead of perturbing the original variables, we propose to alter the principal components, as they contain the same information but are uncorrelated, which permits working on each component separately, reducing processing times. The number and weight of the perturbed components determine the level of protection and distortion of the masked data. The method provides preservation of the mean vector and the variance-covariance matrix. Furthermore, depending on the technique chosen to perturb the principal components, the proposed method can provide masked, hybrid or fully synthetic data sets. Some examples of application and comparison with other methods previously proposed in the literature (in terms of disclosure risk and data utility) are also included.


2020 ◽  
Author(s):  
Grace Cox ◽  
Will Brown ◽  
Ciaran Beggan ◽  
Magnus Hammer ◽  
Chris Finlay

<p>Geomagnetic Virtual Observatories (GVOs) use satellite measurements to provide estimates of the mean internally-generated magnetic field (MF) over a specified period (usually one or four months) at a fixed location in space, mimicking the mean values obtained at ground-based observatories (GOs). These permit secular variation (SV) estimates anywhere on the globe, thereby mitigating the effects of uneven GO coverage. Current GVO estimates suffer from two key contamination sources: first, local time sampling biases due to satellite orbital dynamics, and second, MFs generated in regions external to the Earth such as the magnetosphere and ionosphere. Current methods to alleviate this contamination have drawbacks:Averaging over four months removes the local time sampling bias at the cost of reduced temporal resolution</p><ol><li>Stringent data selection criteria such as night-time, quiet-time only data greatly reduce, but do not entirely remove, external MF contamination and result in a small subset (<5%) of the available data being used</li> <li>Removing model predictions for external MFs from the measurements also reduces noise, however such parameterisations cannot fully describe these physical systems and some of their signal remains in the data.</li> </ol><p>Here we present an alternative approach to denoising GVOs that uses principal component analysis (PCA). This method retains monthly resolution, uses all available vector satellite data and removes contamination from orbital effects and external MFs. We present an application of PCA, implemented in an open-source Python package called MagPySV, to new GVOs calculated as part of a Swarm DISC project.  The denoised data will be incorporated into a new GVO data set that will be available to the geomagnetism community as an official Swarm product.  </p>


2016 ◽  
Vol 35 (2) ◽  
pp. 173-190 ◽  
Author(s):  
S. Shahid Shaukat ◽  
Toqeer Ahmed Rao ◽  
Moazzam A. Khan

AbstractIn this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from environmental data matrix pertaining to water quality variables (p = 22) of a small data set comprising of 55 samples (stations from where water samples were collected). Because in ecology and environmental sciences the data sets are invariably small owing to high cost of collection and analysis of samples, we restricted our study to relatively small sample sizes. We focused attention on comparison of first 6 eigenvectors and first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis using Ward’s method that does not require any stringent distributional assumptions.


2017 ◽  
Vol 727 ◽  
pp. 447-449 ◽  
Author(s):  
Jun Dai ◽  
Hua Yan ◽  
Jian Jian Yang ◽  
Jun Jun Guo

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.


Cancers ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2342
Author(s):  
Corentin Martens ◽  
Olivier Debeir ◽  
Christine Decaestecker ◽  
Thierry Metens ◽  
Laetitia Lebrun ◽  
...  

Recent works have demonstrated the added value of dynamic amino acid positron emission tomography (PET) for glioma grading and genotyping, biopsy targeting, and recurrence diagnosis. However, most of these studies are based on hand-crafted qualitative or semi-quantitative features extracted from the mean time activity curve within predefined volumes. Voxelwise dynamic PET data analysis could instead provide a better insight into intra-tumor heterogeneity of gliomas. In this work, we investigate the ability of principal component analysis (PCA) to extract relevant quantitative features from a large number of motion-corrected [S-methyl-11C]methionine ([11C]MET) PET frames. We first demonstrate the robustness of our methodology to noise by means of numerical simulations. We then build a PCA model from dynamic [11C]MET acquisitions of 20 glioma patients. In a distinct cohort of 13 glioma patients, we compare the parametric maps derived from our PCA model to these provided by the classical one-compartment pharmacokinetic model (1TCM). We show that our PCA model outperforms the 1TCM to distinguish characteristic dynamic uptake behaviors within the tumor while being less computationally expensive and not requiring arterial sampling. Such methodology could be valuable to assess the tumor aggressiveness locally with applications for treatment planning and response evaluation. This work further supports the added value of dynamic over static [11C]MET PET in gliomas.


1998 ◽  
Vol 30 (2) ◽  
pp. 227-243
Author(s):  
K. N. S. YADAVA ◽  
S. K. JAIN

This paper calculates the mean duration of the postpartum amenorrhoea (PPA) and examines its demographic, and socioeconomic correlates in rural north India, using data collected through 'retrospective' (last but one child) as well as 'current status' (last child) reporting of the duration of PPA.The mean duration of PPA was higher in the current status than in the retrospective data;n the difference being statistically significant. However, for the same mothers who gave PPA information in both the data sets, the difference in mean duration of PPA was not statistically significant. The correlates were identical in both the data sets. The current status data were more complete in terms of the coverage, and perhaps less distorted by reporting errors caused by recall lapse.A positive relationship of the mean duration of PPA was found with longer breast-feeding, higher parity and age of mother at the birth of the child, and the survival status of the child. An inverse relationship was found with higher education of a woman, higher education of her husband and higher socioeconomic status of her household, these variables possibly acting as proxies for women's better nutritional status.


Sign in / Sign up

Export Citation Format

Share Document