R programming language
Recently Published Documents



Author(s):  
Devon DeRaad

Here I describe the novel R package SNPfiltR and demonstrate its functionality as the backbone of a customizable, reproducible SNP filtering pipeline implemented exclusively in the widely adopted R programming language. SNPfiltR extends existing SNP filtering functionality by automating the visualization of key parameters such as depth, quality, and missing data, then allowing users to set filters based on optimized thresholds, all within a single, cohesive working environment. All SNPfiltR functions require a vcfR object as input, which can be easily generated by reading a SNP dataset stored as a standard VCF file into an R working environment using the function read.vcfR() from the R package vcfR. Performance benchmarking reveals that for moderately sized SNP datasets (up to 50M genotypes with associated quality information), SNPfiltR performs filtering with efficiency comparable to current state-of-the-art command-line programs. These benchmarking results indicate that for most reduced-representation genomic datasets, SNPfiltR is an ideal choice for investigating, visualizing, and filtering SNPs as part of a cohesive and easily documentable bioinformatic pipeline. The SNPfiltR package can be downloaded from CRAN with the command install.packages("SNPfiltR"), and a development version is available from GitHub at github.com/DevonDeRaad/SNPfiltR. Additionally, thorough documentation for SNPfiltR, including multiple comprehensive vignettes, is available at devonderaad.github.io/SNPfiltR/.


2021 ◽  
Vol 11 (12) ◽  
pp. 794
Author(s):  
Musa Adekunle Ayanwale ◽  
Mdutshekelwa Ndlovu

This study investigated the scalability of a cognitive multiple-choice test through the Mokken package in the R programming language for statistical computing. A 2019 mathematics West African Examinations Council (WAEC) instrument was used to gather data from randomly drawn K-12 participants (N = 2866; Male = 1232; Female = 1634; Mean age = 16.5 years) in Education District I, Lagos State, Nigeria. The results showed that the monotone homogeneity model (MHM) was consistent with the empirical dataset. However, it was observed that the test could not be scaled unidimensionally due to the low scalability of some items. In addition, the test discriminated well and had low accuracy for item-invariant ordering (IIO). Thus, items seriously violated the IIO property and scalability criteria when the HT coefficient was estimated. Consequently, the test requires modification in order to provide monotonic characteristics. This has implications for public examining bodies when endeavouring to assess the IIO assumption of their items in order to boost the validity of testing.
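
The scalability criterion mentioned above can be illustrated with a minimal sketch of Loevinger's scale coefficient H for dichotomous (0/1) items, written here in Python rather than R. It is not the Mokken package's implementation and it does not compute the HT coefficient used for IIO; the function name and the toy response matrices are hypothetical and chosen only for illustration. H compares the observed covariance between each pair of items with the largest covariance that the item difficulties would allow.

```python
from itertools import combinations


def scalability_h(responses):
    """Loevinger's H for a list of respondents' 0/1 item scores.

    Illustrative sketch only; assumes every item has some correct and
    some incorrect answers so that the denominator is non-zero.
    """
    n = len(responses)
    n_items = len(responses[0])
    # Proportion of respondents who answered each item correctly.
    p = [sum(row[i] for row in responses) / n for i in range(n_items)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(n_items), 2):
        # Proportion who answered both items of the pair correctly.
        p_both = sum(row[i] * row[j] for row in responses) / n
        # Observed covariance of the pair.
        observed += p_both - p[i] * p[j]
        # Largest covariance possible given the two item proportions.
        maximum += min(p[i], p[j]) - p[i] * p[j]
    return observed / maximum


# Hypothetical data: a perfect Guttman pattern (no scaling errors).
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# Hypothetical data: two statistically unrelated items.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

h_guttman = scalability_h(guttman)
h_unrelated = scalability_h(unrelated)
```

In this sketch a perfectly ordered response pattern reaches the maximum value of H, while unrelated items give a value of zero; low item scalability, as reported in the abstract, corresponds to values near the lower end of that range.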


2021 ◽  
Vol 20 (9) ◽  
pp. 55-67
Author(s):  
Timofey V. Timkin

The paper deals with the phonetics of the Yugan idiom of Surgut Khanty. The research is part of a project aimed at describing Surgut Khanty phonetics. The Yugan idiom differs significantly from the Tromyegan idiom described before. The analysis is based on data collected during an expedition to the settlement of Ugut in 2019. The experimental part includes a 130-word list read out three times by four native speakers from different traditional settlements on the Malyi Yugan river and on the Bolshoi Yugan river. The research was conducted using experimental techniques with the Praat and Emu-SDMS software. The main technique was formant analysis, which examines resonant frequencies in vowel spectra to obtain data on articulation features. Statistical evaluation and visualization were carried out in the R programming language. We found differences between the Malyi Yugan and the Bolshoi Yugan idioms. Twelve vowel phonemes were found in the Malyi Yugan idiom. Compared to the Tromyegan system, the phoneme /ɔ/ (traditionally /ȯ̆/) is absent; it has been replaced by /ɛ/ (traditionally /ȧ̆/) or /o/ (traditionally /ŏ/). The phoneme u̇ described in previous literature on the topic has disappeared and been replaced by /iː/. The Bolshoi Yugan vowel system includes these phonemes and also the diphthongs [ui], [ɔɛ]. They appear after [k] where etymological u̇, ȯ̆ used to be. They are probably realizations of the phonemes /iː/, /ɛ/ in the position after labialized k, which has become a phoneme. Non-initial [w] is reported to be a specific Yugan feature and appears to have parallels in the Tromyegan idiom too. This is evidence for the rearrangement of the Surgut idioms. In this pronunciation type /w/ is realized as a labial approximant in initial position and after unrounded vowels in non-final position. After unrounded vowels in final position it is realized as an initial-voiced fricative that causes diphthongization of the preceding vowel. After rounded vowels it is labiovelar [γʷ] or non-syllabic [ʊ] (before consonants). This pronunciation type is similar to the Tromyegan type, but it differs from the Pim type, where /w/ is consistently realized as a labial approximant. The disappearance of labial fricatives is a new phenomenon which has not been described properly. Territorial and social factors for this process are given. The Malyi Yugan speakers use the lateral fricatives /ł/, /ʎ̥/, and the Bolshoi Yugan speakers replace them with /t/, /c͡c̦/. In the settlement of Ugut, where Bolshoi and Malyi Yugan natives are in contact in a Russian-speaking environment, both variants are used, with the t-pronunciation evaluated by speakers as new and deviating from the 'right' speech.


2021 ◽  
Vol 2 (4) ◽  
pp. e410
Author(s):  
Julia Bahia Adams ◽  
Carlos Augusto Jardim Chiarelli

Social media platforms represent a deep resource for academic research and a wide range of untapped possibilities for linguists (D'ARCY; YOUNG, 2012). This rapidly developing field presents various ethical issues and unique challenges regarding methods to retrieve and analyze data. This tutorial provides a straightforward guide to harvesting and tidying Twitter data, focused mainly on the Tweets' text, by using the R programming language (R CORE TEAM, 2020) via Twitter's APIs. The R code was developed in Adams (2020), based on the rtweet package (KEARNEY, 2018), and successfully resulted in a script for corpora compilation. In this tutorial, we discuss limitations, problems, and solutions in our framework for conducting ethical research on this social networking site. Our ethical concerns go beyond what we "agree to" in terms of use and privacy policies, that is, we argue that their content does not contemplate all the concerns researchers need to attend to. Additionally, our aim is to show that using Twitter as a data source does not require advanced computational skills.


2021 ◽  
Author(s):  
Dele Fei ◽  
Yu Sun

This is a data science project for a manufacturing company in China [1]. The task was to forecast the likelihood that each product would need repair or service by a technician, and hence how often the products would need to be serviced after they were installed. That forecast could then be used to estimate the correct price for selling a product warranty [2]. The underlying forecast model for all of the company's products was built in the R programming language. In addition, an interactive web app was developed using R Shiny so that the business could see the forecast and recommended warranty price for each of its products and customer types [3]. The user can select a product and customer type and input the number of products; the web app then displays charts and tables that show the probability of the product needing service over time, the forecasted costs of service, the potential income, and the recommended warranty price.
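
The link between a service forecast and a warranty price can be sketched with simple arithmetic, shown here in Python rather than R. This is not the authors' model: the per-period service probabilities, the cost per service visit, the margin, and both function names are hypothetical and serve only to show how an expected service cost might be turned into a price.

```python
def expected_service_cost(service_probabilities, cost_per_service):
    """Expected total service cost over the warranty period.

    service_probabilities: hypothetical probability that the product
    needs one service visit in each period (for example, each year).
    """
    return sum(p * cost_per_service for p in service_probabilities)


def warranty_price(service_probabilities, cost_per_service, margin):
    """Expected service cost marked up by an assumed profit margin."""
    cost = expected_service_cost(service_probabilities, cost_per_service)
    return cost * (1.0 + margin)


# Hypothetical inputs: three yearly probabilities, a fixed cost per
# visit, and a 25% margin. None of these values come from the paper.
yearly_probabilities = [0.02, 0.05, 0.10]
price_per_unit = warranty_price(yearly_probabilities, 200.0, 0.25)

# Scaling by the number of products, as the app described above lets
# the user enter a quantity.
total_for_order = price_per_unit * 10
```

A real pricing model would also have to account for repeated service events, varying repair costs, and differences between customer types, which this sketch leaves out.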


2021 ◽  
Vol 10 (4) ◽  
Author(s):  
Ben B. Chiewphasa ◽  
Anna K. Moeller

Objectives: As certified Carpentries instructors, the authors organized and co-taught the University of Montana’s first in-person Carpentries workshop focused on the R programming language during early 2020. Due to the COVID-19 pandemic, a repeated workshop was postponed to the fall of 2020 and was adapted for a fully online setting. The authors share their Carpentries journey from in-person to online instruction, hoping to inspire those interested in organizing Carpentries at their institution for the first time and those interested in improving their existing Carpentries presence. Methods: The authors reflected on their experience facilitating the same Carpentries workshop in person and online. They used this unique opportunity to compare the effectiveness of a face-to-face environment versus a virtual modality for delivering an interactive workshop. Results: When teaching in the online setting, the authors learned to emphasize the basics, create many opportunities for feedback using formative assessments, reduce the amount of material presented, and include helpers who are familiar with technology and troubleshooting. Conclusions: Although the online environment came with challenges (e.g., Zoom logistics and the need to further condense curricula), the instructors were surprised at the many advantages of hosting an online workshop. With some adaptations, Carpentries workshops work well in online delivery.


2021 ◽  
Vol 2090 (1) ◽  
pp. 012002
Author(s):  
Marina A. Nikitina ◽  
Irina M. Chernukha

Information technologies of biotechnological processes are based on the use of mathematical models to describe microbiological synthesis. The application of digital technologies to the analysis of microbial growth patterns is mainly determined by the ability of modern programming languages to numerically integrate the systems of differential equations that describe the development of the microbial process over time. The solution of a kinetic growth model of the E. coli microbial population was demonstrated in the R programming language in the Jupyter Notebook environment. Two solution methods were used: the one-step fourth-order Runge-Kutta method and the universal solver ODE (General Solver for Ordinary Differential Equations). Initial data of the problem: Ks/S0 = 2 (Ks is the substrate affinity constant for the biomass (microorganism), S0 is the initial concentration of substrate); replicating cells ma0 = 0.01; total number of cells m0 = 0.05; stoichiometric ratio Ys = 0.5; and five values of the ratio λ/μm: 1) 0.0357; 2) 0.0714; 3) 0.1071; 4) 0.1428; 5) 0.2142 (λ is the specific growth rate of dividing cells, μm is the inactivation rate constant). As a result, the simulation and verification of the microbial biomass growth process were carried out, with the results presented visually as tables and graphs. The simulation of E. coli growth revealed the following peculiarity: in addition to cell division, cells lose their ability to divide at a fairly intensive rate. This process is presumably the determining factor in population development and limits the growth and ultimate density of the culture. Thus, information technology helps the researcher not only in studying the process, establishing patterns, and predicting results, but also in making reasoned decisions.
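
The one-step fourth-order Runge-Kutta method named above can be sketched as follows, in Python rather than R. The abstract does not give the equations of the kinetic model, so this sketch integrates a simple logistic growth equation instead; the growth rate R, the carrying capacity K, the step size, and all function names are assumptions made for illustration, and the initial value 0.05 is borrowed from m0 only to give the example a familiar starting point.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f, t0, y0, h, n_steps):
    """Apply rk4_step repeatedly and return the list of (t, y) pairs."""
    t, y = t0, y0
    trajectory = [(t, y)]
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
        trajectory.append((t, y))
    return trajectory


# Hypothetical stand-in model (NOT the paper's kinetic model):
# logistic growth with assumed rate R and carrying capacity K.
R = 0.5
K = 1.0


def logistic(t, m):
    return R * m * (1 - m / K)


def logistic_exact(t, m0):
    """Closed-form logistic solution, used to check the numerical result."""
    return K / (1 + (K - m0) / m0 * math.exp(-R * t))


trajectory = integrate(logistic, 0.0, 0.05, 0.1, 100)
t_end, m_end = trajectory[-1]
error = abs(m_end - logistic_exact(t_end, 0.05))
```

Because the logistic equation has a closed-form solution, the final value can be compared against it to confirm that the step function behaves as a fourth-order method should; a model without an analytical solution would instead be checked against a second solver, which appears to be the role of the general ODE solver in the abstract.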


Author(s):  
Timothy E. Essington

Modern practice of ecology, conservation, and resource management demands unprecedented levels of quantitative proficiency in mathematical modeling and statistics. This text provides foundational training in the concepts and methods of mathematical and statistical modeling used in ecology, for readers with all levels of quantitative proficiency and confidence. The first chapter presents a generalized approach to develop ecological models and introduces the “describe, explain, and interpret” framework for linking the model world to the real world. Detailed treatment of population models illustrates the myriad ways in which one can develop a model, shows how modeling choices are informed by the ecological question at hand, and emphasizes the epistemology of quantitative techniques. The second part of the book illustrates how to estimate parameters of models from data, and how to use mathematical models combined with statistics to test hypotheses. The third part of the book is devoted to an in-depth development of technical skills to implement models in two common platforms: spreadsheets and the R programming language. The book concludes by demonstrating a quantitative approach to addressing a question that spans density-dependent versus density-independent population models, fitting models to data, evaluating the strength for density dependence using model selection, and evaluating the types of dynamic behaviors that the population might exhibit.

