Comparative Approaches to Using R and Python for Statistical Data Analysis - Advances in Systems Analysis, Software Engineering, and High Performance Computing
Latest Publications


Total documents: 9 (five years: 0)
H-index: 0 (five years: 0)

Published By IGI Global

9781683180166, 9781522519898

Descriptive statistics is the initial stage of analysis, used to describe and summarize data. The availability of large amounts of data and efficient computational methods has strengthened this area of statistics. In this chapter, we introduce the main concepts of descriptive analysis, which provide the grounding needed to perform a high-quality descriptive analysis.
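As a minimal sketch of what describing and summarizing data can look like in Python, the example below computes a few common summary statistics with the standard library. The `scores` sample and the `describe` helper are hypothetical, invented for illustration; they are not taken from the book.

```python
import statistics

# Hypothetical sample: scores invented purely for illustration.
scores = [52, 61, 61, 68, 70, 74, 79, 83, 90, 97]


def describe(values):
    """Return common summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Measures of location (mean, median, mode) and of spread (standard deviation, interquartile range) are reported together because either kind alone can give a misleading picture of a sample.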


Cluster analysis, which we approach in this chapter, is the task of grouping a set of objects so that objects in the same group, or cluster, are more similar to each other than to those in other clusters. It is a common technique for statistical data analysis. Cluster analysis can be carried out by various algorithms that may differ significantly, so it is not a trivial task: it is an interactive multi-objective optimization that involves trial and error. Subjects or variables are clustered using similarity or dissimilarity (distance) measures, first between pairs of subjects and later between pairs of clusters. The grouping can be done with hierarchical or non-hierarchical techniques.
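The idea of measuring distance first between subjects and later between clusters can be sketched with a small hierarchical (agglomerative) procedure. The example below is an illustration only, assuming Euclidean distance and single linkage; the `single_linkage` function and the sample points are hypothetical and are not the book's implementation.

```python
import math


def single_linkage(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]  # start with one cluster per point

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest dissimilarity.
        i, j = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # safe because j > i
    return clusters


# Hypothetical 2-D observations forming two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
groups = single_linkage(points, k=2)
for group in groups:
    print(sorted(group))
```

Other linkage rules (complete, average) or a non-hierarchical method such as k-means can yield different groupings on the same data, which is one reason the task involves trial and error.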


Factor analysis is a statistical method used to describe variability among observed, correlated variables. Its goal is to find unobserved variables, called factors. The observed variables are modelled as linear combinations of the potential factors plus error terms that quantify the approximation. The resulting information about how the observed variables relate to one another can be used to analyze the importance of each variable in the context of the dataset. In this way, the observed variables are replaced by a smaller set of latent variables that represents the data in summarized form.
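The linear-combination structure described above is commonly written as follows. The notation is an assumption made here for illustration (p observed variables x_i with means mu_i, k factors f_j, loadings lambda_ij, and error terms epsilon_i); the chapter itself may use different symbols.

```latex
x_i = \mu_i + \sum_{j=1}^{k} \lambda_{ij} f_j + \varepsilon_i,
\qquad i = 1, \dots, p, \qquad k < p
```

The condition k < p expresses the summarizing role of the model: fewer latent factors than observed variables.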


Statistical inference allows conclusions to be drawn from data. These analyses use a random sample taken from a population to describe and make inferences about that population. Inferential statistics are valuable when it is not convenient or possible to examine every member of a population. This chapter presents ANOVA, Student's t-test, the chi-square test, the Mann-Whitney test, and the Kruskal-Wallis test. After reading it, the analyst will be able to use knowledge of a particular phenomenon to infer possible new results.
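As one illustration of the tests listed above, the sketch below computes the test statistic of a two-sample Student's t-test, assuming independent samples with equal variances. The `pooled_t_statistic` helper and both samples are hypothetical. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom; obtaining a p-value would require a statistical table or an external package.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for a two-sample pooled-variance t-test."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, df


# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.8, 4.2]

t_value, dof = pooled_t_statistic(group_a, group_b)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

A large absolute value of t relative to the critical value for the given degrees of freedom is evidence against the hypothesis that the two population means are equal.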


In this chapter, we present the dataset used throughout this book. Our case study is built on fictional data. We describe how the data were “collected” and explain each of the chosen variables in more detail.


Statistics is a set of methods used to analyze data. This chapter presents the main concepts used in statistics, where learning from data is one of the most critical challenges. In general, statistics, based on the theory of probability, provides techniques and methods for data analysis that support decision-making in problems involving uncertainty.



This chapter closes the book with a discussion and conclusion about its purpose. We state the reasons why we chose the tools used for the statistical analysis tasks and conclude the comparison between them.


In statistical modelling, regression analysis is a statistical process for estimating the relationships among variables. More specifically, it helps the reader understand how the dependent variable changes when any one of the independent variables is varied. Regression analysis estimates the average value of the dependent variable for fixed values of the independent variables. The estimation target is therefore a function of the independent variables, called the regression function. In limited circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables, but caution is needed because correlation does not necessarily imply causation. Regression techniques are varied; in this chapter, we present only the fundamental analysis.
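A minimal sketch of estimating a regression function, assuming the simplest case of one independent variable fitted by ordinary least squares. The `fit_line` helper and the data are hypothetical and written with the standard library only; the chapter's own examples may rely on dedicated R or Python packages.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: roughly linear with some noise.
x = [1, 2, 3, 4, 5, 6]
y = [2.9, 5.2, 6.8, 9.1, 11.0, 13.2]

slope, intercept = fit_line(x, y)


def predict(value):
    """Estimated average of the dependent variable at a given x."""
    return intercept + slope * value


print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"predicted y at x = 7: {predict(7):.3f}")
```

The fitted slope describes an association in the sample; on its own it does not establish that changing x causes y to change.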



This chapter introduces the basic concepts of the languages we propose to use in the data analysis tasks, starting with some features of R and Python.

