Problem of data analysis and forecasting using decision trees method

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2016.02-03.220 ◽

2016 ◽

pp. 220-226

Author(s):

T.I. Lytvynenko ◽

Keyword(s):

Data Analysis ◽

Decision Tree ◽

Data Processing ◽

Programming Language ◽

Statistical Computing ◽

Software Environment ◽

R Programming Language ◽

Improve Accuracy ◽

R Programming ◽

Tree Approach

This study describes an application of the decision tree approach to the problem of data analysis and forecasting. Data processing is based on real observations that represent sales levels in the period between 2006 and 2009. R (a programming language and software environment) is used as the tool for statistical computing. The paper compares the method with well-known approaches and solutions in order to improve the accuracy of the results obtained.


Ensuring Scalability of a Cognitive Multiple-Choice Test through the Mokken Package in R Programming Language

Education Sciences ◽

10.3390/educsci11120794 ◽

2021 ◽

Vol 11 (12) ◽

pp. 794

Author(s):

Musa Adekunle Ayanwale ◽

Mdutshekelwa Ndlovu

Keyword(s):

Programming Language ◽

Multiple Choice ◽

West African ◽

Choice Test ◽

Statistical Computing ◽

Multiple Choice Test ◽

R Programming Language ◽

Lagos State ◽

R Programming ◽

K-12

This study investigated the scalability of a cognitive multiple-choice test through the Mokken package in the R programming language for statistical computing. A 2019 mathematics West African Examinations Council (WAEC) instrument was used to gather data from randomly drawn K-12 participants (N = 2866; Male = 1232; Female = 1634; Mean age = 16.5 years) in Education District I, Lagos State, Nigeria. The results showed that the monotone homogeneity model (MHM) was consistent with the empirical dataset. However, it was observed that the test could not be scaled unidimensionally due to the low scalability of some items. In addition, the test discriminated well and had low accuracy for item-invariant ordering (IIO). Thus, items seriously violated the IIO property and scalability criteria when the HT coefficient was estimated. Consequently, the test requires modification in order to provide monotonic characteristics. This has implications for public examining bodies when endeavouring to assess the IIO assumption of their items in order to boost the validity of testing.
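The scalability criterion mentioned in this abstract is usually expressed through Loevinger's H coefficient, which compares the observed covariance between item pairs with the largest covariance their marginal proportions allow. The following is a minimal sketch for dichotomous (0/1) items, written in plain Python for illustration; it is not the Mokken package's implementation, and the function name `scale_h` and the toy data are assumptions.

```python
from itertools import combinations


def scale_h(responses):
    """Loevinger's scale H for a list of respondents' 0/1 item scores.

    H = sum of observed pairwise covariances
        / sum of maximum covariances attainable given the item marginals.
    Raises ZeroDivisionError if every item is answered identically by all
    respondents (no variance to scale).
    """
    n = len(responses)
    k = len(responses[0])
    # Proportion of positive responses per item.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in responses) / n
        observed += p_both - p[i] * p[j]
        # With fixed marginals, the joint proportion is at most min(p_i, p_j).
        maximum += min(p[i], p[j]) - p[i] * p[j]
    return observed / maximum


# Hypothetical data: a perfect Guttman pattern (no scaling errors).
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(scale_h(guttman))
```

A perfect Guttman pattern yields H = 1, while items that contradict each other push H toward negative values; low item-level values are what the abstract refers to as low scalability.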


Sparse STATIS-Dual via Elastic Net

Mathematics ◽

10.3390/math9172094 ◽

2021 ◽

Vol 9 (17) ◽

pp. 2094

Author(s):

Carmen C. Rodríguez-Martínez ◽

Mitzi Cubilla-Montilla ◽

Purificación Vicente-Galindo ◽

Purificación Galindo-Villardón

Keyword(s):

Data Analysis ◽

Programming Language ◽

Multivariate Data Analysis ◽

Real Data ◽

Elastic Net ◽

R Programming Language ◽

Data Tables ◽

R Programming ◽

Penalty Technique ◽

Data Analysis Methods

Multi-set multivariate data analysis methods provide a way to analyze a series of tables together. In particular, the STATIS-dual method is applied in data tables where individuals can vary from one table to another, but the variables that are analyzed remain fixed. However, when you have a large number of variables or indicators, interpretation through traditional multiple-set methods is complex. For this reason, in this paper, a new methodology is proposed, which we have called Sparse STATIS-dual. This implements the elastic net penalty technique which seeks to retain the most important variables of the model and obtain more precise and interpretable results. As a complement to the new methodology and to materialize its application to data tables with fixed variables, a package is created in the R programming language, under the name Sparse STATIS-dual. Finally, an application to real data is presented and a comparison of results is made between the STATIS-dual and the Sparse STATIS-dual. The proposed method improves the informative capacity of the data and offers more easily interpretable solutions.
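To illustrate how an elastic net penalty retains important variables and drops the rest, the sketch below applies the closed-form elastic net update to a single coefficient. It is only the scalar building block of elastic net penalization, not the Sparse STATIS-dual algorithm or the authors' package; the objective used, the function names, and the example loadings are assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def elastic_net_shrink(z, l1, l2):
    """Minimizer of 0.5*(z - b)**2 + l1*abs(b) + 0.5*l2*b**2 over b.

    The L1 term (l1) sets small coefficients exactly to zero;
    the L2 term (l2) additionally scales the survivors down.
    """
    return soft_threshold(z, l1) / (1.0 + l2)


# Hypothetical unpenalized loadings of four variables.
loadings = [0.9, -0.05, 0.3, 0.02]
sparse = [elastic_net_shrink(z, l1=0.1, l2=0.5) for z in loadings]
print(sparse)
```

In this toy run the two loadings smaller than `l1` in absolute value are set exactly to zero, which is the mechanism behind the more interpretable solutions the abstract describes.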


Data Visualization as Helping Technique for Data Analysis, Trend Detection and Correlation of Variables Using R Programming Language

2019 8th Mediterranean Conference on Embedded Computing (MECO) ◽

10.1109/meco.2019.8760004 ◽

2019 ◽

Cited By ~ 2

Author(s):

Ilir Keka ◽

Betim Cico

Keyword(s):

Data Analysis ◽

Data Visualization ◽

Programming Language ◽

Trend Detection ◽

R Programming Language ◽

R Programming


Appendix B: R (Programming Language and Software Environment)

Mathematical Modeling and Simulation ◽

10.1002/9783527627608.app2 ◽

2009 ◽

pp. 321-322

Keyword(s):

Programming Language ◽

Software Environment ◽

R Programming Language ◽

R Programming


FuzzyR: An Extended Fuzzy Logic Toolbox for the R Programming Language

2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz48607.2020.9177780 ◽

2020 ◽

Author(s):

Chao Chen ◽

Tajul Rosli Razak ◽

Jonathan M. Garibaldi

Keyword(s):

Fuzzy Logic ◽

Programming Language ◽

R Programming Language ◽

R Programming


Understanding the Behavior of Zadeh’s Extension Principle for One-to-One Functions by R Programming Language

Advances in Intelligent Systems and Computing - Intelligent and Fuzzy Techniques: Smart and Innovative Solutions ◽

10.1007/978-3-030-51156-2_153 ◽

2020 ◽

pp. 1309-1315

Author(s):

Abbas Parchami ◽

Parisa Khalilpoor

Keyword(s):

Programming Language ◽

Extension Principle ◽

R Programming Language ◽

One To One ◽

R Programming


Introduction to the R Programming Language

Handbook of Educational Measurement and Psychometrics Using R ◽

10.1201/b20498-1 ◽

2018 ◽

pp. 1-29

Author(s):

Christopher D. Desjardins ◽

Okan Bulut

Keyword(s):

Programming Language ◽

R Programming Language ◽

R Programming


Developing codes for validation of PM10, PM2.5, and O3 datasets using R programming language

Journal of Air Pollution and Health ◽

10.18502/japh.v4i1.604 ◽

2019 ◽

Author(s):

Ramin Nabizadeh ◽

Mostafa Hadei

Keyword(s):

Air Pollution ◽

Programming Language ◽

Assessment Method ◽

Daily Maximum ◽

Data Handling ◽

R Programming Language ◽

Wide Range ◽

US EPA ◽

R Programming ◽

PM10

Introduction: The wide range of studies on air pollution requires accurate and reliable datasets. However, for many reasons, the measured concentrations may be incomplete or biased. Researchers need an easy-to-use and reproducible exposure assessment method. Therefore, in this article, we describe and present a series of codes written in the R programming language for handling, validating and averaging PM10, PM2.5, and O3 datasets. Findings: These codes can be used in any type of air pollution study that requires PM and ozone concentrations that are indicative of real concentrations. We used and combined criteria from several guidelines proposed by the US EPA and the APHEKOM project to obtain an acceptable methodology. Separate .csv files for PM10, PM2.5 and O3 should be prepared as input files. After a file is imported into R, negative and zero concentration values are first removed from the whole dataset. Next, only monitors that have at least 75% of hourly concentrations are selected. Then, 24-h averages and daily maxima of 8-h moving averages are calculated for PM and ozone, respectively. As output, the codes create two different sets of data. One contains the hourly concentrations of the pollutant of interest (PM10, PM2.5, or O3) at valid stations and their city-level average. The other contains the final city-level 24-h averages for PM10 and PM2.5, or the final city-level daily maximum 8-h averages for O3. Conclusion: These validated codes use a reliable and valid methodology and eliminate the possibility of incorrect data handling and averaging. The codes are free to use without limitation, provided that this article is cited.
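The validation steps listed in this abstract can be sketched as follows. This is an illustrative outline in plain Python, not the authors' R code. All function names are hypothetical. It assumes that completeness is measured over the whole list of hourly slots passed in, and that an 8-h window counts if it holds at least one valid reading; the published criteria may be stricter.

```python
from statistics import mean


def drop_non_positive(hourly):
    """Mark negative and zero readings as missing (None)."""
    return [v if v is not None and v > 0 else None for v in hourly]


def is_valid_monitor(hourly, min_fraction=0.75):
    """True if at least min_fraction of the hourly slots hold a reading."""
    valid = sum(v is not None for v in hourly)
    return bool(hourly) and valid / len(hourly) >= min_fraction


def daily_mean(day):
    """24-h average of the valid readings in one day (None if no data)."""
    values = [v for v in day if v is not None]
    return mean(values) if values else None


def daily_max_8h(day, window=8):
    """Daily maximum of the 8-h moving averages within one day."""
    averages = []
    for start in range(len(day) - window + 1):
        values = [v for v in day[start:start + window] if v is not None]
        if values:  # assumption: any window with data is accepted
            averages.append(mean(values))
    return max(averages) if averages else None


# Hypothetical single-day series for two monitors (24 hourly values each).
monitor_a = drop_non_positive([10.0] * 20 + [-1.0, 0.0, 30.0, 30.0])
monitor_b = drop_non_positive([5.0] * 12 + [0.0] * 12)

print(is_valid_monitor(monitor_a), is_valid_monitor(monitor_b))
print(daily_mean(monitor_a), daily_max_8h(monitor_a))
```

Here `monitor_a` keeps 22 of 24 readings and passes the 75% rule, while `monitor_b` keeps only half and would be excluded before any averaging.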


Spatial variation of physicochemical parameters in a constructed wetland for wastewater treatment: An example of the use of the R programming language

UNED Research Journal ◽

10.22458/urj.v13i1.3294 ◽

2021 ◽

Vol 13 (1) ◽

pp. 15

Author(s):

Junior Pastor Pérez-Molina ◽

Carola Scholz ◽

Roy Pérez-Salazar ◽

Carolina Alfaro-Chinchilla ◽

Ana Abarca Méndez ◽

...

Keyword(s):

Wastewater Treatment ◽

Spatial Variation ◽

Water Flow ◽

Programming Language ◽

Constructed Wetland ◽

Physicochemical Parameters ◽

Preferential Flow ◽

Oxygen Demand ◽

R Programming Language ◽

R Programming

Introduction: The implementation of wastewater treatment systems such as constructed wetlands has attracted growing interest over the last decade because of their low cost and high effectiveness in treating industrial and residential wastewater. Objective: To evaluate the spatial variation of physicochemical parameters in a subsurface-flow constructed wetland system planted with Pennisetum alopecuroides (Pennisetum) and in an unplanted Control. The purpose is to provide an analysis of the spatial dynamics of physicochemical parameters using the R programming language. Methods: Each of the cells (Pennisetum and Control) had 12 piezometers, organized in three columns and four rows with separation distances of 3.25 m and 4.35 m, respectively. Turbidity, biochemical oxygen demand (BOD), chemical oxygen demand (COD), total Kjeldahl nitrogen (TKN), ammoniacal nitrogen (N-NH4), organic nitrogen (N-org.) and phosphorus (P-PO4-3) were measured in the inflow and outflow water of both conditions, Control and Pennisetum (n = 8). Additionally, the oxidation-reduction potential (ORP), dissolved oxygen (DO), conductivity, pH and water temperature were measured (n = 167) in the piezometers. Results: No statistically significant differences between cells were found for TKN, N-NH4, conductivity, turbidity, BOD, and COD, but both Control and Pennisetum cells showed a significant reduction in these parameters (P < 0.05). Overall, TKN and N-NH4 removal ranged from 65.8 to 84.1% and from 67.5 to 90.8%, respectively; the decreases in turbidity, conductivity, BOD, and COD were 95.1-95.4%, 15-22.4%, 65.2-77.9%, and 57.4-60.3%, respectively. Both cells showed an increasing ORP gradient along the water-flow direction, contrary to conductivity (P < 0.05). However, DO, pH and temperature showed no consistent trend along the direction of water flow in either cell.
Conclusions: Pennisetum demonstrated pollutant removal efficiency but gave results similar to those of the control cells; it therefore remains unclear whether it is a superior option. Spatial variation analysis did not reveal any obstruction of flow along the constructed wetlands (CWs), although some preferential flow paths can be distinguished. An open-source R repository was provided.


Progress in the R ecosystem for representing and handling spatial data

Journal of Geographical Systems ◽

10.1007/s10109-020-00336-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Roger S. Bivand

Keyword(s):

New York ◽

Data Analysis ◽

Open Source ◽

Spatial Data ◽

Spatial Data Analysis ◽

Data Handling ◽

Good Match ◽

R Programming Language ◽

R Packages ◽

R Programming

Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008, Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.
