statistical programming
Recently Published Documents


Total documents: 83 (five years: 46)
H-index: 7 (five years: 2)

2021 ◽  
Vol 20 ◽  
pp. 177-184
Author(s):  
Ozer Ozdemir ◽  
Simgenur Cerman

In data mining, clustering is one of the most commonly used techniques. Clustering can be performed with different kinds of algorithms, such as hierarchical, partitioning, grid-based, density-based, and graph-based algorithms. This study first explains the concept of data mining, then describes the aims and application areas of data mining, and then explains theoretically clustering and the clustering algorithms used in data mining. Finally, partitional and hierarchical clustering algorithms are applied to the "Mall Customers" data set, taken from Kaggle, with the aim of separating customers into clusters according to their features. In the clusters obtained by partitional clustering algorithms, similarity within a cluster is maximal and similarity between clusters is minimal. Hierarchical clustering algorithms are based on successively merging similar observations or, conversely, splitting dissimilar ones. The partitional clustering algorithms used are k-means and PAM; the hierarchical clustering algorithms used are AGNES and DIANA. The R statistical programming language was used to apply the algorithms. At the end of the study, the clustering algorithms were run on the data set and the results were interpreted.
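As an illustration of the partitional approach, here is a minimal k-means sketch (Lloyd's algorithm). It is written in Python with the standard library only, so it is not the authors' R code, and the six (income, spending score) pairs are hypothetical values, not rows from the Kaggle "Mall Customers" data set.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical (annual income, spending score) pairs for six customers.
customers = [(15, 20), (18, 22), (20, 18), (80, 85), (82, 80), (85, 83)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

PAM differs mainly in that each cluster centre must be an actual observation (a medoid), which makes it less sensitive to outliers than the mean-based update above.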


Author(s):  
Piotr Śpiewanowski ◽  
Oleksandr Talavera ◽  
Linh Vi

The 21st-century economy is increasingly built around data. Firms and individuals upload and store enormous amounts of data. Most of the data produced are stored on private servers, but a considerable part is made publicly available across the 1.83 billion websites available online. These data can be accessed by researchers using web-scraping techniques. Web scraping refers to the process of collecting data from web pages either manually or using automation tools or specialized software. Web scraping is possible and relatively simple thanks to the regular structure of the code used for websites designed to be displayed in web browsers. Websites built with HTML can be scraped using standard text-mining tools, either scripts in popular (statistical) programming languages such as Python, Stata, and R, or stand-alone dedicated web-scraping tools. Some of those tools do not require any prior programming skills. Since about 2010, with the omnipresence of social and economic activities on the Internet, web scraping has become increasingly popular among academic researchers. In contrast to proprietary data, which may be out of reach because of substantial costs, web scraping can make interesting data sources accessible to everyone. Thanks to web scraping, data are now available in real time and in significantly more detail than what has traditionally been offered by statistical offices or commercial data vendors. In fact, many statistical offices have started using web-scraped data, for example, for calculating price indices. Data collected through web scraping have been used in numerous economics and finance projects and can easily complement traditional data sources.
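The point about the regular structure of HTML can be sketched with Python's standard-library `html.parser`. The page fragment, the `price` class name, and the `PriceParser` class below are hypothetical, chosen only for illustration; a real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the numeric text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(float(data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


# Hypothetical page fragment standing in for a downloaded product listing.
page = """
<ul>
  <li><span class="name">Milk</span> <span class="price">1.20</span></li>
  <li><span class="name">Bread</span> <span class="price">2.50</span></li>
</ul>
"""

parser = PriceParser()
parser.feed(page)
print(parser.prices)
```

Because every price sits in the same tag with the same attribute, a few lines of parsing logic are enough to turn the page into a numeric series, which is the kind of input a price index calculation would need.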


Author(s):  
Charles Auerbach ◽  
Wendy Zeitlin

Single-subject research designs have been used to build evidence for the effective treatment of problems across various disciplines, including social work, psychology, psychiatry, medicine, allied health fields, juvenile justice, and special education. This book serves as a guide for those desiring to conduct single-subject data analysis. The aim of this text is to introduce readers to the various functions available in SSD for R, a new, free, and innovative software package written by the book’s authors in R, the robust open-source statistical programming language. SSD for R has the most comprehensive functionality specifically designed for the analysis of single-subject research data currently available. SSD for R has numerous graphing and charting functions to conduct robust visual analysis. Besides the ability to create simple line graphs, features are available to add mean, median, and standard deviation lines across phases to help better visualize change over time. Graphs can be annotated with text. SSD for R contains a wide variety of functions to conduct statistical analyses traditionally conducted with single-subject data. These include numerous descriptive statistics and effect size functions and tests of statistical significance, such as t tests, chi-squares, and the conservative dual criteria. Finally, SSD for R has the capability of analyzing group-level data. Readers are led step by step through the analytical process based on the characteristics of their data. Numerous examples and illustrations are provided to help readers understand the wide range of functions available in SSD for R and their application to data analysis and interpretation.


2021 ◽  
Author(s):  
Qianrao Fu

Calculating the sample size before collecting data is a tradition that goes back to Jacob Cohen. The most commonly asked question is: "How many subjects do we need to obtain a significant result if we use the p-value to test whether an effect exists?" In the Bayesian framework, we may want to know how many subjects are needed to get convincing evidence if we use the Bayes factor to evaluate the hypothesis. This paper proposes a solution to the above question by reaching two goals: firstly, the size of the Bayes factor reaches a given threshold, and secondly, the probability that the Bayes factor exceeds the given threshold reaches a required value. Researchers can express their expectations through order or sign hypotheses on the parameters of a linear regression model. For example, the researchers may expect the regression coefficients to satisfy $\beta_1>\beta_2>\beta_3$, which is an order-constrained hypothesis; or the researchers may expect a regression coefficient $\beta_1>0$, which is a sign hypothesis. The greatest advantage of using a specific hypothesis is that a smaller sample size is required than for an unconstrained hypothesis to achieve the same probability that the Bayes factor exceeds some threshold. This article provides sample size tables for the null hypothesis, order hypothesis, sign hypothesis, complement hypothesis, and unconstrained hypothesis. To enhance applicability, an R package based on Monte Carlo simulation is developed, which can help psychologists plan the sample size even if they do not have any statistical programming background.
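The two goals can be combined into a single criterion. The notation below is an assumption introduced here for illustration and is not taken from the paper: $\mathrm{BF}_N$ is the Bayes factor computed from a sample of size $N$, $\mathrm{BF}_{\text{thresh}}$ is the chosen threshold, $\eta$ is the required probability, and $H$ is the hypothesis assumed to generate the data.

$$
N^{*} = \min \left\{ N : P\left( \mathrm{BF}_{N} > \mathrm{BF}_{\text{thresh}} \mid H \text{ is true} \right) \ge \eta \right\}
$$

In a Monte Carlo approach, the probability for each candidate $N$ would be estimated by simulating many data sets under $H$ and counting how often the Bayes factor exceeds the threshold.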


2021 ◽  
Author(s):  
Paul Brennan

Data visualization is an extremely valuable skill in science, finance and journalism. Learning to program supports reproducible data analysis and increases the range of visualizations that can be generated. The statistical programming language R is a very useful programming language. The R community is friendly, supportive and very diverse, including students, academics, health scientists, journalists and professional data scientists. Experience with R or another programming language such as Python or JavaScript will improve your science and your employment opportunities in and outside of research. Programming is a useful skill in education, finance, journalism and other areas too.


2021 ◽  
Vol 19 (3) ◽  
pp. 688-693
Author(s):  
Tommy C. Efrata ◽  
Wirawan E. D. Radianto ◽  
Junko A. Effendy ◽  
...  

Individual entrepreneurial orientation, and the relationships among innovativeness, proactiveness, and risk-taking, have not received much attention in the entrepreneurship literature. This study therefore explores the relationships among the components of individual entrepreneurial orientation and examines the relationships between entrepreneurship education, individual entrepreneurial orientation, and entrepreneurial intention. The model was tested on 231 management and business students who had completed a university entrepreneurship education program. The data were processed using PLS-SEM statistical programming to evaluate the outer and inner structure of the model. Most of the hypothesized relationships in the model were confirmed, notably the effect of proactiveness on innovativeness. At the same time, risk-taking was found not to affect personal innovativeness. Entrepreneurship education was shown to affect individual entrepreneurial orientation, while only innovativeness and risk-taking were confirmed to increase entrepreneurial intention. The findings fill a gap in research on the relationship dynamics among the dimensions that form individual entrepreneurial orientation. They also complement existing models of individual entrepreneurial orientation, which were only partially complete. The results are therefore expected to provide direction for educators and scholars in the area of entrepreneurship.


2021 ◽  
Vol 12 ◽  
Author(s):  
Lihan Chen ◽  
Victoria Savalei

In missing data analysis, the reporting of missing rates is insufficient for the readers to determine the impact of missing data on the efficiency of parameter estimates. A more diagnostic measure, the fraction of missing information (FMI), shows how the standard errors of parameter estimates increase from the information loss due to ignorable missing data. FMI is well-known in the multiple imputation literature (Rubin, 1987), but it has only been more recently developed for full information maximum likelihood (Savalei and Rhemtulla, 2012). Sample FMI estimates using this approach have since then been made accessible as part of the lavaan package (Rosseel, 2012) in the R statistical programming language. However, the properties of FMI estimates at finite sample sizes have not been the subject of comprehensive investigation. In this paper, we present a simulation study on the properties of three sample FMI estimates from FIML in two common models in psychology, regression and two-factor analysis. We summarize the performance of these FMI estimates and make recommendations on their application.
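A small numeric sketch may make FMI concrete. It assumes the common definition of FMI as the proportional loss of precision, one minus the ratio of the squared complete-data standard error to the squared incomplete-data standard error. The function name and all numbers are hypothetical, and this is not the lavaan implementation.

```python
import math


def fraction_missing_information(se_complete, se_incomplete):
    """FMI under the assumed definition 1 - SE_complete^2 / SE_incomplete^2."""
    if se_complete <= 0 or se_incomplete < se_complete:
        raise ValueError("expected 0 < se_complete <= se_incomplete")
    return 1.0 - (se_complete ** 2) / (se_incomplete ** 2)


# Hypothetical example: the mean of a variable with SD 10, planned N = 200,
# with 25% of values missing completely at random (150 observed).
sd = 10.0
se_complete = sd / math.sqrt(200)    # standard error had all data been observed
se_incomplete = sd / math.sqrt(150)  # standard error with the observed cases only

fmi = fraction_missing_information(se_complete, se_incomplete)
se_inflation = 1.0 / math.sqrt(1.0 - fmi)  # equals se_incomplete / se_complete
print(round(fmi, 3), round(se_inflation, 3))
```

In this simple case FMI equals the missing rate, but in multivariate models it is usually lower, because other observed variables recover part of the lost information. That is why the missing rate alone is an insufficient diagnostic.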


Author(s):  
Oana Stoicescu ◽  
Eija Ferreira ◽  
Satu Tamminen ◽  
Pekka Siirtola ◽  
Gunjan Chandra ◽  
...  

Analyzing clinical data comes with many challenges. Medical expertise must go hand in hand with statistical and programming knowledge when applying data mining methods to clinical datasets. This work aims at bridging the gap between clinical expertise and computer science knowledge by providing an application for clinical data analysis with no requirement for statistical programming knowledge. Our tool allows clinical researchers to conduct data processing and visualization in an interactive environment, thus providing an assisting tool for clinical studies. The application was experimentally evaluated with an analysis of Type 1 Diabetes clinical data. The results obtained with the tool are in line with the domain literature, demonstrating the value of our application in data exploration and hypothesis testing.


Significance ◽  
2021 ◽  
Vol 18 (3) ◽  
pp. 6-7
Author(s):  
Simon Schwab ◽  
Leonhard Held

2021 ◽  
Author(s):  
W. John Braun ◽  
Duncan J. Murdoch

This third edition of Braun and Murdoch's bestselling textbook now includes discussion of the use and design principles of the tidyverse packages in R, including expanded coverage of ggplot2 and R Markdown. The expanded simulation chapter introduces the Box–Muller and Metropolis–Hastings algorithms. New examples and exercises have been added throughout. This is the only introduction you'll need to start programming in R, the computing standard for analyzing data. This book comes with real R code that teaches the standards of the language. Unlike other introductory books on the R system, this book emphasizes portable programming skills that apply to most computing languages and techniques used to develop more complex projects. Solutions, datasets, and any errata are available from www.statprogr.science. Worked examples from real applications, hundreds of exercises, and downloadable code, datasets, and solutions make a complete package for anyone working in or learning practical data science.
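For readers unfamiliar with the Box–Muller algorithm mentioned above, the following is a minimal sketch of the basic transform, which turns two independent uniform draws into two independent standard normal draws. It is written in Python rather than R and is not taken from the book; the function name, seed, and sample size are arbitrary choices.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal draws from two uniform draws."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(42)
samples = []
for _ in range(20000):
    samples.extend(box_muller(rng))

# Sample mean and variance should be close to 0 and 1 respectively.
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
print(round(mean, 2), round(variance, 2))
```

The same logic carries over to R or any other language with a uniform random number generator, which fits the book's emphasis on portable programming skills.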

