Dimension Reduction and Analysis of a 10-Year Physicochemical and Biological Water Database Applied to Water Resources Intended for Human Consumption in the Provence-Alpes-Côte d’Azur Region, France

Water ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 525 ◽  
Author(s):  
Abdessamad Tiouiouine ◽  
Suzanne Yameogo ◽  
Vincent Valles ◽  
Laurent Barbiero ◽  
Fabrice Dassonville ◽  
...  

The SISE-Eaux database of water intended for human consumption, archived by the French Regional Health Agency (ARS) since 1990, is a rich source of information. However, more or less regular monitoring over almost 30 years and the multiplication of parameters lead to a sparse matrix (observations × parameters) and a high-dimensional data hyperspace. These characteristics make it difficult to exploit this database for a synthetic mapping of water quality and to identify the processes responsible for its diversity in a complex geological context and anthropized environment. A 10-year period (2006–2016) was selected from the Provence-Alpes-Côte d’Azur region database (PACA, southeastern France). We extracted 5,295 water samples, each with 15 parameters. A treatment by principal component analysis (PCA) followed by orthomax rotation identifies and ranks six principal components (PCs) totaling 75% of the initial information. The association of the parameters with the principal components and the regional distribution of the PCs make it possible to identify water-rock interactions, bacteriological contamination, redox processes and arsenic occurrence as the main sources of variability. However, the results also highlight a decrease in useful information, a constraint linked to the vast size and diversity of the study area. Developing a relevant tool for protecting and managing water resources will require identifying subsets based on functional landscape units or the grouping of groundwater bodies.
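The notion of a few components "totaling 75% of the initial information" can be illustrated with a minimal sketch. It assumes standardized variables (a correlation-matrix PCA) and uses power iteration with deflation as a stand-in for a full eigendecomposition; neither choice is stated in the abstract, and the orthomax rotation step is omitted. All function names and the toy data are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (list of equal-length lists)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)] for a in range(p)]


def leading_eigenpair(m, iters=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    p = len(m)
    v = [rng.uniform(-1, 1) for _ in range(p)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix is (numerically) zero in this direction
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def explained_variance(rows, k):
    """Share of total variance carried by each of the first k principal components."""
    m = correlation_matrix(rows)
    p = len(m)
    ratios = []
    for _ in range(k):
        lam, v = leading_eigenpair(m)
        ratios.append(lam / p)  # trace of a correlation matrix equals p
        # Deflation: remove the component just found before extracting the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios


# Hypothetical toy data: 4 samples, 3 parameters (column 2 is twice column 1).
toy = [[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8, 0]]
ratios = explained_variance(toy, 2)
print([round(r, 3) for r in ratios], "cumulative:", round(sum(ratios), 3))
```

In practice one would keep adding components until the cumulative share reaches a chosen threshold, then rotate the retained loadings for interpretability.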

2016 ◽  
Vol 16 (4) ◽  
pp. 1102-1109
Author(s):  
Xinghua Fan ◽  
Huihui Xu ◽  
Cheng Ning ◽  
Liangjie Wu

Water use intensity (WUI) reveals water withdrawals with respect to economic output. Decomposing WUI into factors provides inner-system information affecting the indicator. The present study investigates variability in WUI among provinces in China by clustering the principal components of the decomposed factors. Motivated by the index decomposition method, the authors decomposed WUI into seven factors: water use in agricultural, industrial, household and ecological sectors, exploitation rate of water resources, per capita water resources and population intensity. Those seven factors condense into four principal components under principal component analysis. Comprehensive WUI is calculated from these four components. Cluster analysis is then applied to identify different patterns in WUI, with the principal components and the comprehensive intensity taken as cluster variables. The number of clusters is determined to be three by applying the k-means clustering method and the F-statistic value. Variability in WUI is detected by implementing three clustering algorithms, namely k-means, fuzzy c-means and the Gaussian mixture model. WUI in China is grouped into three clusters by the k-means clustering method. Characteristics of each cluster are analyzed.
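The clustering step can be sketched as plain k-means (Lloyd's algorithm) applied to component scores. This is a minimal illustration, not the authors' implementation: the farthest-point initialization is an assumption chosen to keep the example deterministic, and the scores below are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_point_init(points, k)
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-component scores for nine provinces.
scores = [(0, 0), (0.1, 0.2), (0.2, 0.1),
          (5, 5), (5.1, 4.9), (4.9, 5.2),
          (10, 0), (10.2, 0.1), (9.9, -0.1)]
labels, centroids = kmeans(scores, 3)
print(labels)
```

Fuzzy c-means and Gaussian mixture models, also named in the abstract, replace the hard assignment step with soft memberships.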


Author(s):  
Janja Jerebic ◽  
Špela Kajzer ◽  
Anja Goričan ◽  
Drago Bokal

The management of fishing fleets is an important factor in the sustainable exploitation of marine organisms for human consumption. Therefore, regulatory services monitor catches and limit them based on data. In this paper, we analyze Northwest Atlantic Fisheries Organization (NAFO) data on North Atlantic catches to help guide the effectiveness of fishing stakeholders. Data on fishing time (month and year), equipment, location, type of catch, and, most interesting for us, fishing effort are given, and their quality is analyzed. In the last part, principal component analysis is performed on a selected data sample for the individual activities among which fishing stakeholders can choose. The complexity of the connections between the observed activities is explained by new uncorrelated variables (principal components) that are important for achieving the expected fishing catch. We find that the proportions of variance explained by the individual principal components are low, which indicates the high complexity of the topic discussed.


2006 ◽  
Vol 27 (2) ◽  
pp. 87-92 ◽  
Author(s):  
Willem K.B. Hofstee ◽  
Dick P.H. Barelds ◽  
Jos M.F. Ten Berge

Hofstee and Ten Berge (2004a) have proposed a new look at personality assessment data, based on a bipolar proportional (-1, ..., 0, ..., +1) scale, a corresponding coefficient of raw-scores likeness L = ΣXY/N, and raw-scores principal component analysis. In a normal sample, the approach resulted in a structure dominated by a first principal component, according to which most people are faintly to mildly socially desirable. We hypothesized that a more differentiated structure would arise in a clinical sample. We analyzed the scores of 775 psychiatric clients on the 132 items of the Dutch Personality Questionnaire (NPV). In comparison to a normative sample (N = 3140), the eigenvalue for the first principal component appeared to be 1.7 times as small, indicating that such clients have less personality (social desirability) in common. Still, the match between the structures in the two samples was excellent after oblique rotation of the loadings. We applied the abridged m-dimensional circumplex design, by which persons are typed by their two highest scores on the principal components, to the scores on the first four principal components. We identified five types: Indignant (1-), Resilient (1-2+), Nervous (1-2-), Obsessive-Compulsive (1-3-), and Introverted (1-4-), covering 40% of the psychiatric sample. Some 26% of the individuals had negligible scores on all type vectors. We discuss the potential and the limitations of our approach in a clinical context.
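As a small worked example of the likeness coefficient L = ΣXY/N, the sketch below takes X and Y to be two sets of raw scores on the bipolar (-1 to +1) scale and N to be the number of paired scores. That reading of N, the function name and the example values are assumptions made for illustration.

```python
def likeness(x, y):
    """Raw-scores likeness L = sum(X*Y) / N for paired bipolar scores.

    Unlike a correlation, the scores are not centered or rescaled, so the
    scale midpoint 0 keeps its meaning as a neutral response.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(a * b for a, b in zip(x, y)) / len(x)


# Hypothetical scores of two respondents on four items.
person_a = [1, -1, 0.5, 0]
person_b = [1, 1, 0.5, -1]
print(likeness(person_a, person_b))  # (1 - 1 + 0.25 + 0) / 4 = 0.0625
```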


Methodology ◽  
2016 ◽  
Vol 12 (1) ◽  
pp. 11-20 ◽  
Author(s):  
Gregor Sočan

Abstract. When principal component solutions are compared across two groups, a question arises whether the extracted components have the same interpretation in both populations. The problem can be approached by testing null hypotheses stating that the congruence coefficients between pairs of vectors of component loadings are equal to 1. Chan, Leung, Chan, Ho, and Yung (1999) proposed a bootstrap procedure for testing the hypothesis of perfect congruence between vectors of common factor loadings. We demonstrate that the procedure by Chan et al. is both theoretically and empirically inadequate for application to principal components. We propose a modification of their procedure, which constructs the resampling space according to the characteristics of the principal component model. The results of a simulation study show satisfactory empirical properties of the modified procedure.
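For orientation, the congruence coefficient between two loading vectors is commonly computed as Tucker's coefficient, the uncentered cosine between the vectors. The sketch below assumes that definition, which the abstract does not spell out, and does not attempt the bootstrap test itself; the loadings are invented.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have equal length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Hypothetical loadings of the same component in two groups.
group_1 = [0.8, 0.6, 0.1]
group_2 = [0.7, 0.7, 0.2]
print(round(congruence(group_1, group_2), 3))
```

A value of 1 means the two vectors are proportional, which is the null hypothesis of perfect congruence described above.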


2006 ◽  
Vol 1 (1) ◽  
Author(s):  
K. Katayama ◽  
K. Kimijima ◽  
O. Yamanaka ◽  
A. Nagaiwa ◽  
Y. Ono

This paper proposes a method of stormwater inflow prediction using radar rainfall data as the input of a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. In this paper, Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using the radar rainfall data and the inflow data acquired in a combined sewer system. This study reveals that a few principal components of radar rainfall data are appropriate as the input variables to a stormwater inflow prediction model. Consequently, we have established a procedure for stormwater prediction using a few principal components of radar rainfall data.


2017 ◽  
Vol 921 (3) ◽  
pp. 24-29 ◽  
Author(s):  
S.I. Lesnykh ◽  
A.K. Cherkashin

The proposed integral mapping procedure is based on calculating evaluation functions from integral indicators (II), taking into account the features of the local geographical environment, where geosystems in the same state but in different surroundings receive different estimates. The II is calculated by applying principal component analysis to a forest database, which allows the weight of each indicator (attribute) to be considered in the II. The final value of the II equals the difference between the first (condition of the geosystem) and the second (condition of the environmental background) principal components. The evaluation functions are calculated from this value for various integral mapping problems. Because the environmental factors of variability are excluded from the final value of the II, it is possible to find an invariant evaluation function and to determine its coefficients. Concepts and functions from reliability theory are used to produce evaluation maps of the hazard of functioning and the stability of geosystems.


2021 ◽  
pp. 000370282098784
Author(s):  
James Renwick Beattie ◽  
Francis Esmonde-White

Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal Components Analysis (PCA) is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning PCA is not well understood by many applied analytical scientists and spectroscopists who use PCA. The meaning of features identified through PCA is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind PCA, with each step illustrated by simulated spectra. PCA relies solely on the information within the spectra; consequently, the mathematical model is dependent on the nature of the data itself. The direct links between model and spectra allow concrete spectroscopic explanation of PCA, such as the scores representing ‘concentration’ or ‘weights’. The principal components (loadings) are by definition hidden, repeated and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between PC reconstructed data and the original data. Understanding the data-led development of a PCA model shows how to interpret application-specific chemical meaning of the PCA loadings and how to analyze scores. A critical benefit of PCA is its simplicity and the succinctness of its description of a dataset, making it powerful and flexible.


Molecules ◽  
2021 ◽  
Vol 26 (7) ◽  
pp. 1879
Author(s):  
Oladipupo Q. Adiamo ◽  
Yasmina Sultanbawa ◽  
Daniel Cozzolino

In recent times, the popularity of adding value to under-utilized legumes has increased to enhance their use for human consumption. Acacia seed (AS) is an underutilized legume with over 40 edible species found in Australia. The study aimed to qualitatively characterize the chemical composition of 14 common edible AS species from 27 regions in Australia using mid-infrared (MIR) spectroscopy as a rapid tool. Raw and roasted (180 °C, 5, 7, and 9 min) AS flour were analysed using MIR spectroscopy. The wavenumbers (1045 cm−1, 1641 cm−1, and 2852–2926 cm−1) in the MIR spectra show the main components in the AS samples. Principal component analysis (PCA) of the MIR data displayed the clustering of samples according to species and roasting treatment. However, regional differences within the same AS species have less of an effect on the components, as shown in the PCA plot. Statistical analysis of absorbance at specific wavenumbers showed that roasting significantly (p < 0.05) reduced the compositions of some of the AS species. The results provided a foundation for hypothesizing the compositional similarity and/or differences among AS species before and after roasting.


1988 ◽  
Vol 18 (1) ◽  
pp. 211-218 ◽  
Author(s):  
J. L. Vazquez-Barquero ◽  
P. Williams ◽  
J. F. Diez-Manrique ◽  
J. Lequerica ◽  
A. Arenal

Synopsis. The factor structure of the 60-item version of the General Health Questionnaire was explored, using data collected in a community study in a rural area of northern Spain. Six principal components, similar to those previously reported with this instrument, were found to provide a good description of the data structure. The 30-item and 12-item versions of the GHQ were then disembedded from the parent version, and further principal components analyses carried out. Again, the results were similar to previous studies: in each of the three versions analysed here, the two most important components represented a disturbance of mood (‘general dysphoria’), including aspects of anxiety, depression and irritability, and a disturbance of social performance (‘social function/optimism’). The principal component structure of the GHQ-60 was then utilized to calculate factor scores, and these were compared with PSE ratings using Relative Operating Characteristic (ROC) analysis. While four of the six factors discriminated well (area under the ROC curve 0.75 or more) between PSE ‘cases’ and ‘non-cases’, only one, depressive thoughts, was a good discriminator between depressed and non-depressed PSE ‘cases’.
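The area under the ROC curve used as the discrimination criterion can be computed without drawing the curve: it equals the proportion of (case, non-case) pairs in which the case has the higher score, with ties counted as one half. The sketch below uses that pairwise formulation; the function name and the factor scores are invented for illustration.

```python
def roc_auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts 1 for every (case, non-case) pair where the case scores higher
    and 0.5 for ties, then divides by the number of pairs.
    """
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical factor scores for 'cases' and 'non-cases'.
cases = [3, 4, 5]
noncases = [1, 2, 3]
auc = roc_auc(cases, noncases)
print(round(auc, 3), "discriminates well" if auc >= 0.75 else "weak discriminator")
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.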


1995 ◽  
Vol 7 (6) ◽  
pp. 1191-1205 ◽  
Author(s):  
Colin Fyfe

A review is given of a new artificial neural network architecture in which the weights converge to the principal component subspace. The weights learn by only simple Hebbian learning yet require no clipping, normalization or weight decay. The net self-organizes using negative feedback of activation from a set of "interneurons" to the input neurons. By allowing this negative feedback from the interneurons to act on other interneurons we can introduce the necessary asymmetry to cause convergence to the actual principal components. Simulations and analysis confirm such convergence.
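A minimal sketch of the idea follows, under stated assumptions: activation is fed forward as y = Wx, the interneuron outputs are fed back and subtracted from the input to give a residual e = x - Wᵀy, and the weights are updated by the simple Hebbian product Δw = η·y·e. The exact equations are not given in the abstract, so this form, the learning rate, and the synthetic data are all assumptions. It covers only the subspace-learning variant, not the asymmetric feedback between interneurons.

```python
import random


def train_negative_feedback(data, n_out, eta=0.01, epochs=20, seed=0):
    """Train an n_out x n_in weight matrix with feedback-based Hebbian learning."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    for _ in range(epochs):
        for x in data:
            # Feedforward activation of the interneurons: y = W x.
            y = [sum(w[i][j] * x[j] for j in range(n_in)) for i in range(n_out)]
            # Negative feedback to the inputs: e = x - W^T y.
            e = [x[j] - sum(w[i][j] * y[i] for i in range(n_out)) for j in range(n_in)]
            # Simple Hebbian update on the residual; no explicit
            # normalization, clipping or weight decay is applied.
            for i in range(n_out):
                for j in range(n_in):
                    w[i][j] += eta * y[i] * e[j]
    return w


def make_data(n=500, seed=1):
    """Synthetic zero-mean data with most variance in the first two axes."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1.0), rng.gauss(0, 0.7), rng.gauss(0, 0.1)] for _ in range(n)]


weights = train_negative_feedback(make_data(), n_out=2)
for row in weights:
    print([round(v, 2) for v in row])
```

With this synthetic data the rows of the learned matrix are expected to have a near-zero third entry, meaning they span the plane of the two high-variance axes without necessarily aligning with the individual principal components.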

