Motivation, values, and work design as drivers of participation in the R open source project for statistical computing

2015 ◽  
Vol 112 (48) ◽  
pp. 14788-14792 ◽  
Author(s):  
Patrick Mair ◽  
Eva Hofmann ◽  
Kathrin Gruber ◽  
Reinhold Hatzinger ◽  
Achim Zeileis ◽  
...  

One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This wealth of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated the psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or to influence only specific aspects of participation.

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10849
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Juergen Debus ◽  
Amir Abdollahi

Background Model building is a crucial part of omics-based biomedical research to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing error between model predictions and given classification (maximizing accuracy). Human ratings/classifications, however, might be error prone, with discordance rates between experts of 5–15%. We therefore evaluate if a feature pre-filtering step might improve identification of features associated with true underlying groups. Methods Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass-ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP negative glioblastoma tumors from the TCGA-GBM 450k methylation array data cohort, starting from a fuzzy, UMAP-based rough and erroneous separation. Results V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (range: 0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 50% for 25 samples and 100 features. For larger numbers of features and samples, median AUCs were 0.75 (range: 0.59–1.00) for V1 and 0.54 (range: 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest based method. Conclusions The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).
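The multiplicity adjustment step named in the abstract (Benjamini-Hochberg) can be illustrated with a minimal sketch. This is not the modelBuildR implementation, and it covers only the adjustment of univariate p-values, not the cross-validation on switched variables. The function names `benjamini_hochberg` and `prefilter`, the threshold `alpha`, and the example p-values are hypothetical and chosen for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Feature indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def prefilter(p_values, alpha=0.05):
    """Hypothetical helper: indices of features whose adjusted p-value is <= alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= alpha]


if __name__ == "__main__":
    # Made-up univariate p-values for four features.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
    print(prefilter(raw, alpha=0.03))
```

In a pre-filtering workflow of the kind described, only the features returned by such a filter would be passed on to model fitting.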


Author(s):  
Sri Nathasya Br Sitepu ◽  
Angelica Irene Christina

This research examines the impact of coffee shop characteristics on the consumers' experience when they visit the coffee shop. The coffee shop characteristics include functional, atmosphere, design, and social characteristics. The population of this study is all Surabaya residents of productive age. The sample was determined using quota sampling and the Isaac and Michael formula; respondents were required to have visited and/or consumed products directly at Starbucks Surabaya at most 2-3 months before filling out the questionnaire, and a total of 384 respondents was needed. The questionnaire was distributed online and offline, and a total of 369 questionnaires were used in this study. This research uses SEM analysis. It found that only the functional and social characteristics of the coffee shop have a significant effect on the experience gained by its consumers, while the atmosphere and design characteristics have no significant effect, with the design characteristics having a negative effect on the consumers' experience. The practical contribution of the research for coffee shop owners is to maintain the functional and social aspects and to improve the design and atmosphere aspects so that consumers gain experience when visiting.


Author(s):  
Carlos E. Galván-Tejada ◽  
Laura A. Zanella-Calzada ◽  
Karen E. Villagrana-Bañuelos ◽  
Arturo Moreno-Báez ◽  
Huizilopoztli Luna-García ◽  
...  

The World Health Organization (WHO) declared in March 2020 that we are facing a pandemic designated as COVID-19, which is the acronym of coronavirus disease 2019, caused by a new virus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In Mexico, the first cases of COVID-19 were reported by the Secretary of Health on 28 February 2020. More than sixteen thousand cases and more than fifteen thousand deaths have been reported in Mexico, and the numbers continue to rise; therefore, this article proposes two online visualization tools (a web platform) that allow the analysis of demographic data and comorbidities of the Mexican population. The objective of these tools is to provide fast and updated graphic information, based on a dataset obtained directly from the National Government's Health Secretary (Secretaría de Salud, SSA), which is refreshed daily with the information related to SARS-CoV-2. To allow dynamic updates and a friendly interface, an approach was implemented with R-project, a well-known open source language and environment for statistical computing, and the Shiny package. The dataset is loaded automatically from the latest version released by the federal government of Mexico. Users can choose to study particular groups determined by gender, entity, type of result (positive, negative, pending outcome) and comorbidity. The image results are plots that can be instantly interpreted and are supported by a text summary. This tool, in addition to serving as a reference for the general public, is useful in public health to facilitate the visualization of the data, allowing its timely interpretation given the changing nature of COVID-19. It can even be used for decision-making by leaders, for the benefit of the health of the community.


2017 ◽  
Vol 35 (0) ◽  
Author(s):  
R.M.A. ALVES ◽  
M.B. ALBUQUERQUE ◽  
L.G. BARBOSA

ABSTRACT The species of the Urochloa genus, exotic and infesting in Brazilian waters, are known to be invasive and dominant, occupying habitats ranging from humid, shallow areas and irrigation canals to the margins of deep reservoirs. This paper hypothesizes that shallower reservoirs have a higher infestation rate and a higher biomass of U. arrecta. The objectives were to measure the percentage of occurrence of the exotic macrophyte U. arrecta in 40 ecosystems from the Mamanguape basin (Paraíba, Brazil) and to determine the infestation of the species in two reservoirs. The acquired data were geo-referenced with the ArcGIS software (v. 9.3). A covariance analysis was performed using the R program (The R Project for Statistical Computing). The results showed a wide spatial distribution of the species, indicating that reservoirs may act as stepping stones in the landscape at a regional scale. The hypothesis of biotic acceptance is seen as a relevant factor in explaining the presence of the species, with a low percentage of occurrence, in 37 out of the 40 sampled ecosystems; it was observed only in areas prone to colonization by native and naturalized macrophytes, on banks and at points of lower declivity, at both spatial scales studied. Thus, factors such as environmental instability (promoted by intermittent or prolonged desiccation of the habitat), shading and declivity of the reservoirs acted synergistically on exotic and native species.


2020 ◽  
pp. 155868982093788
Author(s):  
Kirstie L. Bash ◽  
Michelle C. Howell Smith ◽  
Pam S. Trantham

The use of advanced quantitative methods within mixed methods research has been investigated only in a limited capacity. In particular, hierarchical linear models are a popular approach to account for multilevel data, such as students within schools, but their use and value as the quantitative strand in a mixed methods study remain unknown. This article examines the role of hierarchical linear modeling in mixed methods research with emphasis on design choice, priority, and rationales. The results from this systematic methodological review suggest that hierarchical linear modeling does not overshadow the contributions of the qualitative strand. Our study contributes to the field of mixed methods research by offering recommendations for the use of hierarchical linear modeling as the quantitative strand in mixed methods studies.


2016 ◽  
Vol 21 (3) ◽  
Author(s):  
Andrés L. Cárdenas Rozo ◽  
Peter J. Harries

Cárdenas Rozo AL, Harries PJ. Planktic foraminiferal diversity: logistic growth overprinted by a varying environment. Acta biol. Colomb. 2016;21(3):501-508. The statistical analyses were done using R (The R Project for Statistical Computing, www.r-project.org). This appendix includes: supplementary data, supplementary methods, Tables 1 to 11, Figures 1 to 4, and supplementary references.


2019 ◽  
Vol 2 (2) ◽  
pp. 169-187 ◽  
Author(s):  
Ruben C. Arslan

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to invest the time and effort that would be necessary to document existing data well in other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardized and stored in proprietary formats, and they are rarely properly indexed in search engines. This means that rich data sets are sometimes used only once—by their creators—and left to disappear into oblivion. Even if they can find an existing data set, researchers are unlikely to publish analyses based on it if they cannot be confident that they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. It uses metadata from existing sources and automates some tedious tasks, such as documenting psychological scales and reliabilities, summarizing descriptive statistics, and identifying patterns of missingness. The codebook R package and Web app make it possible to generate a rich codebook in a few minutes and just three clicks. Over time, its use could lead to psychological data becoming findable, accessible, interoperable, and reusable, thereby reducing research waste and benefiting both its users and the scientific community as a whole.

