Sparse STATIS-Dual via Elastic Net

Multi-set multivariate data analysis methods provide a way to analyze a series of tables together. In particular, the STATIS-dual method applies to data tables where the individuals can vary from one table to another but the variables analyzed remain fixed. However, when the number of variables or indicators is large, interpretation through traditional multi-set methods becomes complex. For this reason, this paper proposes a new methodology, called Sparse STATIS-dual. It implements the elastic net penalty, which seeks to retain the most important variables of the model and to obtain more precise and interpretable results. To complement the new methodology and make it applicable to data tables with fixed variables, a package was created in the R programming language under the name Sparse STATIS-dual. Finally, an application to real data is presented, and the results of STATIS-dual and Sparse STATIS-dual are compared. The proposed method improves the informative capacity of the data and offers more easily interpretable solutions.
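To illustrate how an elastic net penalty can retain only the most important variables, here is a minimal sketch in Python (not R, and not the Sparse STATIS-dual package itself). It applies the standard element-wise shrinkage step for a penalty of the form l1*|w| + (l2/2)*w^2 to a vector of loadings. The function name `elastic_net_shrink`, the example loadings, and the penalty values are all hypothetical and chosen only for illustration; the abstract does not say how the authors solve their optimization problem.

```python
def elastic_net_shrink(loadings, l1, l2):
    """Element-wise proximal step for the penalty l1*|w| + (l2/2)*w**2.

    Illustrative only: names and penalty values are assumptions,
    not taken from the Sparse STATIS-dual package.
    """
    shrunk = []
    for w in loadings:
        # Lasso part: soft-thresholding sets small loadings exactly to zero.
        magnitude = max(abs(w) - l1, 0.0)
        if magnitude == 0.0:
            shrunk.append(0.0)
            continue
        # Ridge part: the surviving loadings are rescaled toward zero.
        sign = 1.0 if w > 0 else -1.0
        shrunk.append(sign * magnitude / (1.0 + l2))
    return shrunk


# Hypothetical loadings of five variables on one component.
loadings = [0.80, -0.05, 0.30, 0.02, -0.60]
sparse = elastic_net_shrink(loadings, l1=0.10, l2=0.50)
kept = [i for i, w in enumerate(sparse) if w != 0.0]
print(sparse)
print("variables kept:", kept)
```

In this toy example the two loadings with absolute value below the l1 threshold are dropped, which is the kind of variable selection that makes a component easier to interpret.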

Data Visualization as Helping Technique for Data Analysis, Trend Detection and Correlation of Variables Using R Programming Language

2019 8th Mediterranean Conference on Embedded Computing (MECO) ◽

10.1109/meco.2019.8760004 ◽

2019 ◽

Cited By ~ 2

Author(s):

Ilir Keka ◽

Betim Cico

Keyword(s):

Data Analysis ◽

Data Visualization ◽

Programming Language ◽

Trend Detection ◽

R Programming Language ◽

R Programming

Problem of data analysis and forecasting using decision trees method

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2016.02-03.220 ◽

2016 ◽

pp. 220-226

Author(s):

T.I. Lytvynenko ◽

Keyword(s):

Data Analysis ◽

Decision Tree ◽

Data Processing ◽

Programming Language ◽

Statistical Computing ◽

Software Environment ◽

R Programming Language ◽

Improve Accuracy ◽

R Programming ◽

Tree Approach

This study describes an application of the decision tree approach to the problem of data analysis and forecasting. Data processing is based on real observations of sales levels in the period between 2006 and 2009. R (a programming language and software environment) is used as the tool for statistical computing. The paper compares the method with well-known approaches and solutions in order to improve the accuracy of the results obtained.

FuzzyR: An Extended Fuzzy Logic Toolbox for the R Programming Language

2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz48607.2020.9177780 ◽

2020 ◽

Author(s):

Chao Chen ◽

Tajul Rosli Razak ◽

Jonathan M. Garibaldi

Keyword(s):

Fuzzy Logic ◽

Programming Language ◽

R Programming Language ◽

R Programming

Understanding the Behavior of Zadeh’s Extension Principle for One-to-One Functions by R Programming Language

Advances in Intelligent Systems and Computing - Intelligent and Fuzzy Techniques: Smart and Innovative Solutions ◽

10.1007/978-3-030-51156-2_153 ◽

2020 ◽

pp. 1309-1315

Author(s):

Abbas Parchami ◽

Parisa Khalilpoor

Keyword(s):

Programming Language ◽

Extension Principle ◽

R Programming Language ◽

One To One ◽

R Programming

Introduction to the R Programming Language

Handbook of Educational Measurement and Psychometrics Using R ◽

10.1201/b20498-1 ◽

2018 ◽

pp. 1-29

Author(s):

Christopher D. Desjardins ◽

Okan Bulut

Keyword(s):

Programming Language ◽

R Programming Language ◽

R Programming

Developing codes for validation of PM10, PM2.5, and O3 datasets using R programming language

Journal of Air Pollution and Health ◽

10.18502/japh.v4i1.604 ◽

2019 ◽

Author(s):

Ramin Nabizadeh ◽

Mostafa Hadei

Keyword(s):

Air Pollution ◽

Programming Language ◽

Assessment Method ◽

Daily Maximum ◽

Data Handling ◽

R Programming Language ◽

Wide Range ◽

US EPA ◽

R Programming ◽

PM10

Introduction: The wide range of studies on air pollution requires accurate and reliable datasets. However, for many reasons, the measured concentrations may be incomplete or biased, so researchers need an easy-to-use and reproducible exposure assessment method. Therefore, in this article, we describe and present a series of codes written in the R programming language for handling, validating, and averaging PM10, PM2.5, and O3 datasets. Findings: These codes can be used in any type of air pollution study that seeks PM and ozone concentrations that are indicative of real concentrations. We used and combined criteria from several guidelines proposed by the US EPA and the APHEKOM project to obtain an acceptable methodology. Separate .csv files for PM10, PM2.5, and O3 should be prepared as input files. After a file is imported into R, negative and zero concentration values are first removed from the whole dataset. Then, only monitors that have at least 75% of hourly concentrations are selected. Next, 24-h averages and daily maxima of 8-h moving averages are calculated for PM and ozone, respectively. For output, the codes create two different datasets. One contains the hourly concentrations of the pollutant of interest (PM10, PM2.5, or O3) at valid stations and their city-level average. The other contains the final city-level 24-h averages for PM10 and PM2.5, or the final city-level daily maximum 8-h averages for O3. Conclusion: These validated codes use a reliable and valid methodology and eliminate the possibility of mistaken data handling and averaging. The codes are free to use without any limitation, provided this article is cited.
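The validation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration in Python rather than the authors' R codes, and every function name here is hypothetical. Two details are assumptions, because the abstract does not specify them: the 75% completeness check is applied to whatever list of hourly values is passed in, and an 8-h window is averaged over whichever of its values are valid.

```python
def clean_hourly(values):
    """Mark missing, zero, and negative hourly readings as None."""
    return [v if v is not None and v > 0 else None for v in values]


def is_valid_monitor(values, min_fraction=0.75):
    """True if at least min_fraction of the hourly readings are valid."""
    if not values:
        return False
    valid = sum(v is not None for v in values)
    return valid / len(values) >= min_fraction


def mean_of_valid(values):
    """Average of the non-missing readings, or None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def daily_max_8h(values, window=8):
    """Largest 8-h moving average within one day of hourly readings.

    Assumption: each window is averaged over its valid values only.
    """
    means = [mean_of_valid(values[i:i + window])
             for i in range(len(values) - window + 1)]
    means = [m for m in means if m is not None]
    return max(means) if means else None


# Hypothetical one-day PM series with three invalid readings.
pm_day = clean_hourly([10.0] * 20 + [-1.0, 0.0, None, 30.0])
pm_ok = is_valid_monitor(pm_day)        # 21 of 24 hours are valid
pm_24h = mean_of_valid(pm_day)          # 24-h average over valid hours

# Hypothetical one-day ozone series rising from 1 to 24.
o3_day = clean_hourly([float(h) for h in range(1, 25)])
o3_max8 = daily_max_8h(o3_day)          # mean of the last eight hours

# A monitor with too few valid hours is rejected.
sparse_ok = is_valid_monitor(clean_hourly([5.0] * 10 + [0.0] * 14))
```

Averaging the valid stations to obtain city-level values, as the abstract describes, would be one further `mean_of_valid` call per hour or per day across stations.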

Spatial variation of physicochemical parameters in a constructed wetland for wastewater treatment: An example of the use of the R programming language

UNED Research Journal ◽

10.22458/urj.v13i1.3294 ◽

2021 ◽

Vol 13 (1) ◽

pp. 15

Author(s):

Junior Pastor Pérez-Molina ◽

Carola Scholz ◽

Roy Pérez-Salazar ◽

Carolina Alfaro-Chinchilla ◽

Ana Abarca Méndez ◽

...

Keyword(s):

Wastewater Treatment ◽

Spatial Variation ◽

Water Flow ◽

Programming Language ◽

Constructed Wetland ◽

Physicochemical Parameters ◽

Preferential Flow ◽

Oxygen Demand ◽

R Programming Language ◽

R Programming

Introduction: Wastewater treatment systems such as constructed wetlands have attracted growing interest over the last decade because of their low cost and high effectiveness in treating industrial and residential wastewater. Objective: To evaluate the spatial variation of physicochemical parameters in a subsurface-flow constructed wetland system planted with Pennisetum alopecuroides (Pennisetum) and in an unplanted Control. The purpose is to provide an analysis of the spatial dynamics of physicochemical parameters using the R programming language. Methods: Each of the cells (Pennisetum and Control) had 12 piezometers, organized in three columns and four rows with separation distances of 3.25 m and 4.35 m, respectively. Turbidity, biochemical oxygen demand (BOD), chemical oxygen demand (COD), total Kjeldahl nitrogen (TKN), ammoniacal nitrogen (N-NH4), organic nitrogen (N-org.), and phosphorus (P-PO4-3) were measured in the inflow and outflow water of both the Control and Pennisetum cells (n = 8). Additionally, oxidation-reduction potential (ORP), dissolved oxygen (DO), conductivity, pH, and water temperature were measured in the piezometers (n = 167). Results: No statistically significant differences between cells were found for TKN, N-NH4, conductivity, turbidity, BOD, and COD, but both the Control and Pennisetum cells showed a significant reduction in these parameters (P<0.05). Overall, TKN and N-NH4 removal ranged from 65.8 to 84.1% and from 67.5 to 90.8%, respectively, and the decreases in turbidity, conductivity, BOD, and COD were 95.1-95.4%, 15-22.4%, 65.2-77.9%, and 57.4-60.3%, respectively. Both cells showed an increasing ORP gradient along the water-flow direction, in contrast to conductivity (P<0.05). However, DO, pH, and temperature showed no consistent trend along the direction of water flow in either cell.
Conclusions: Pennisetum demonstrated pollutant removal efficiency, but its results were similar to those of the Control cell; it therefore remains unclear whether it is a superior option. Spatial variation analysis did not reflect any obstruction of flow along the CWs, but some preferential flow paths can be distinguished. An open-source R repository was provided.

A UNIFIED STUDY OF MULTIVARIATE DATA ANALYSIS METHODS BY NONLINEAR FORMULATIONS AND UNDERLYING PROBABILISTIC STRUCTURES

Recent Developments in Clustering and Data Analysis ◽

10.1016/b978-0-12-215485-0.50012-0 ◽

1988 ◽

pp. 97-102 ◽

Cited By ~ 2

Author(s):

Nobuyuki Otsu ◽

Takio Kurita ◽

Hideki Asoh

Keyword(s):

Data Analysis ◽

Multivariate Data Analysis ◽

Multivariate Data ◽

Analysis Methods ◽

Unified Study ◽

Data Analysis Methods

The challenge raised by Gaia

Proceedings of the International Astronomical Union ◽

10.1017/s1743921310008586 ◽

2009 ◽

Vol 5 (H15) ◽

pp. 174-175

Author(s):

Annie C. Robin

Keyword(s):

Data Analysis ◽

Multivariate Data Analysis ◽

Milky Way ◽

Dynamical Evolution ◽

High Quality ◽

3D Kinematics ◽

Galactic Potential ◽

The Galaxy ◽

High Level ◽

Data Analysis Methods

Gaia will perform an unprecedented high quality survey of the Milky Way. Distances, 3D kinematics, ages and abundances will be obtained, giving access to the overall mass distribution and to the Galactic potential. Gaia data analysis will involve a high level of complexity requiring new and efficient multivariate data analysis methods, improved modelling of the stellar populations and dynamical approaches to the interpretation of the data in terms of the chemical and dynamical evolution of the Galaxy.

Progress in the R ecosystem for representing and handling spatial data

Journal of Geographical Systems ◽

10.1007/s10109-020-00336-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Roger S. Bivand

Keyword(s):

New York ◽

Data Analysis ◽

Open Source ◽

Spatial Data ◽

Spatial Data Analysis ◽

Data Handling ◽

Good Match ◽

R Programming Language ◽

R Packages ◽

R Programming

Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.
