bddashboard: An infrastructure for biodiversity dashboards in R

Author(s):  
Tomer Gueta ◽  
Rahul Chauhan ◽  
Thiloshon Nagarajah ◽  
Vijay Barve ◽  
Povilas Gibas ◽  
...  

The bdverse is a collection of packages that form a general framework for facilitating biodiversity science in R. Exploratory and diagnostic visualization can unveil hidden patterns and anomalies in data and allows quick, efficient exploration of massive datasets. An interactive yet flexible dashboard that can be easily deployed locally or remotely is a highly valuable biodiversity informatics tool. To this end, we have developed 'bddashboard', which serves as an agile framework for biodiversity dashboard development. The project is built in R, using the Shiny package (RStudio, Inc 2021), which helps build interactive web apps in R. The following key components were developed:

Core Interactive Components: The basic building blocks of every dashboard are interactive plots, maps, and tables. We explored all major visualization libraries in R and concluded that 'plotly' (Sievert 2020) is the most mature and offers the best value for effort. We also concluded that 'leaflet' (Graul 2016) provides the most diverse and highest-quality mapping features, and that 'DT' (Xie et al. 2021), an interface to the DataTables library, is best for rendering tabular data. Each component was modularized to better adapt it to biodiversity data and to enhance its flexibility.

Field Selector: The field selector is a module that makes each interactive component far more versatile. Users have different data and needs; every combination or selection of fields can tell a different story. The field selector lets users change the X and Y axes of plots, choose the columns visible in a table, and easily control map settings, all in real time, without reloading the page or disturbing reactivity. It automatically detects how many columns a plot needs and what type of column can be passed to the X-axis or Y-axis, and it also displays the completeness of each field.

Plot Navigation: We developed the plot navigation module to prevent unwanted extreme cases. Drawing 1,000 bars on a single bar plot is technically possible, but such a visualization is not human-friendly. Navigation lets users decide how many values they want to see on a single plot. This technique allows fast drawing of extensive datasets without affecting page reactivity, dramatically improving performance and functioning as a fail-safe mechanism.

Reactivity: Reactivity creates the connection between different components. Changes in input values automatically flow to the plots, text, maps, and tables that use those inputs and cause them to update. Reactivity enables drill-down functionality, which enhances the user's ability to explore and investigate the data. We developed a novel and robust reactivity technique that allows us to add a new component and effectively connect it with all existing components within a dashboard tab using only one line of code.

Generic Biodiversity Tabs: We developed five useful dashboard tabs (Fig. 1): (i) the Data Summary tab gives a quick overview of a dataset; (ii) the Data Completeness tab helps users obtain valuable information about missing records and missing Darwin Core fields; (iii) the Spatial tab is dedicated to spatial visualizations; (iv) the Taxonomic tab is designed to visualize taxonomy; and (v) the Temporal tab is designed to visualize time-related aspects.

Performance and Agility: For a dashboard to work smoothly and react quickly, hundreds of small and large modules, functions, and techniques must work together. Our goal was to minimize dashboard latency and maximize its data capacity. We used asynchronous modules to write non-blocking code, clustering in map components, and preprocessing and filtering of data before passing it to plots to reduce load. The modularized architecture of the 'bddashboard' package allows us to develop completely different interactive and reactive dashboards within mere minutes.
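As a rough illustration of the component-plus-field-selector idea described above, the following is a minimal, generic Shiny sketch, not the bddashboard API: a module pairing a field selector with a reactive plotly bar chart. The module names and the occurrence table `occ` in the usage comment are illustrative assumptions.

```r
# Minimal sketch (not the bddashboard API): a Shiny module that pairs a
# field selector with a reactive plotly bar chart.
library(shiny)
library(plotly)

fieldPlotUI <- function(id) {
  ns <- NS(id)
  tagList(
    selectInput(ns("xcol"), "X-axis field", choices = NULL),
    plotlyOutput(ns("plot"))
  )
}

fieldPlotServer <- function(id, data) {
  moduleServer(id, function(input, output, session) {
    # Offer only categorical fields for the X-axis, as a field selector might
    observe({
      cat_cols <- names(data)[vapply(data, function(x) is.character(x) || is.factor(x), logical(1))]
      updateSelectInput(session, "xcol", choices = cat_cols)
    })
    output$plot <- renderPlotly({
      req(input$xcol)
      counts <- as.data.frame(table(data[[input$xcol]]))  # value frequencies
      plot_ly(counts, x = ~Var1, y = ~Freq, type = "bar")
    })
  })
}

# Example usage with a hypothetical occurrence table `occ`:
# ui <- fluidPage(fieldPlotUI("taxon_plot"))
# server <- function(input, output, session) fieldPlotServer("taxon_plot", occ)
# shinyApp(ui, server)
```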

2020 ◽  
Author(s):  
Mirko Mälicke

Geostatistical and spatio-temporal methods and applications have made major advances during the past decades. New data sources became available, and more powerful and more widely available computer systems fostered the development of more sophisticated analysis frameworks. However, the building blocks for these developments, geostatistical packages available in a multitude of programming languages, have not received the same attention. Although there are some examples, like the gstat package for the R programming language, which is used as a de-facto standard for geostatistical analysis, many languages are still missing such implementations. During the past decade, the Python programming language has gained a lot of visibility and has become an integral part of many geoscientists' tool belts. Unfortunately, Python is missing a standard library for geostatistics. This leads to a new technical implementation of geostatistical methods with almost any new publication that uses Python. Thus, reproducing results and reusing code is often cumbersome and can be error-prone.

During the past three years I developed scikit-gstat, a scipy-flavored geostatistical toolbox written in Python to tackle these challenges. Scipy-flavored means that it uses classes, interfaces, and implementation rules from the very popular scipy package for scientific Python, to make scikit-gstat fit into existing analysis workflows as seamlessly as possible. Scikit-gstat is open source and hosted on GitHub. It is well documented and well covered by unit tests. The tutorials made available along with the code are styled as lecture notes and are open to everyone. The package is extensible, to make it as easy as possible for other researchers to build new models on top, even without experience in Python. Additionally, scikit-gstat has an interface to the scikit-learn package, which makes it usable in existing data analysis workflows that involve machine learning. During the development of scikit-gstat a few other geostatistical packages evolved, namely pykrige for kriging and gstools, mainly for geostatistical simulations and random field generation. Due to the overlap, and to reduce development effort, the author has made an effort to implement interfaces to these libraries. This way, scikit-gstat treats other developments not as competing solutions but as parts of an evolving geostatistical framework in Python that should be further streamlined in the future.


Cell Research ◽  
2003 ◽  
Vol 13 (3) ◽  
pp. 147-158 ◽  
Author(s):  
Masatake KAI ◽  
Chikara KAITO ◽  
Hiroshi FUKAMACHI ◽  
Takayasu HIGO ◽  
Eiji TAKAYAMA ◽  
...  

2020 ◽  
Author(s):  
Yangtai Liu ◽  
Xiang Wang ◽  
Baolin Liu ◽  
Qingli Dong

Microrisk Lab was designed as an interactive modeling freeware to realize parameter estimation and model simulation in predictive microbiology. This tool was developed based on the R programming language and the 'Shinyapps.io' server, and designed as a fully responsive interface for internet-connected devices. A total of 36 peer-reviewed models were integrated for parameter estimation (including primary models of bacterial growth/inactivation under static and non-isothermal conditions, secondary models of specific growth rate, and competition models of two-flora growth) and model simulation (including integrated models of deterministic or stochastic bacterial growth/inactivation under static and non-isothermal conditions) in Microrisk Lab. Each modeling section was designed to provide numerical and graphical results with comprehensive statistical indicators, depending on the appropriate dataset and/or parameter settings. In this research, six case studies were reproduced in Microrisk Lab and compared in parallel to DMFit, GInaFiT, IPMP 2013/GraphPad Prism, Bioinactivation FE, and @Risk, respectively. The estimated and simulated results demonstrated that the performance of Microrisk Lab was statistically equivalent to that of the other existing modeling systems in most cases. Microrisk Lab allowed for a uniform user experience in implementing microbial predictive modeling through its friendly interfaces, high integration, and interconnectivity. It might become a useful tool for microbial parameter determination and behavior simulation. Non-commercial users can freely access this application at https://microrisklab.shinyapps.io/english/.
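As an illustration of the kind of parameter estimation Microrisk Lab automates (this is not the tool's own code), the following R sketch fits the Zwietering modified-Gompertz primary growth model to a simulated static growth curve with base R's nls(); the simulated data and starting values are assumptions.

```r
# Minimal, illustrative sketch: estimating Zwietering modified-Gompertz
# parameters from a simulated static growth curve with nls().
set.seed(42)
t_h <- seq(0, 24, by = 2)                      # time (h)
A_true <- 11.5; mu_true <- 1.0; lag_true <- 3  # ln(Nmax/N0), max specific rate (1/h), lag (h)
y <- A_true * exp(-exp(mu_true * exp(1) / A_true * (lag_true - t_h) + 1)) +
  rnorm(length(t_h), sd = 0.15)                # y = ln(N/N0) with measurement noise

fit <- nls(y ~ A * exp(-exp(mu_max * exp(1) / A * (lambda - t_h) + 1)),
           start = list(A = 10, mu_max = 0.8, lambda = 2))
summary(fit)   # estimates and standard errors for A, mu_max, and lambda
```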


Author(s):  
Ramin Nabizadeh ◽  
Mostafa Hadei

Introduction: The wide range of studies on air pollution requires accurate and reliable datasets. However, for many reasons, the measured concentrations may be incomplete or biased. The development of an easy-to-use and reproducible exposure assessment method is required for researchers. Therefore, in this article, we describe and present a series of codes written in the R programming language for data handling, validation, and averaging of PM10, PM2.5, and O3 datasets. Findings: These codes can be used in any type of air pollution study that seeks PM and ozone concentrations that are indicative of real concentrations. We used and combined criteria from several guidelines proposed by US EPA and the APHEKOM project to obtain an acceptable methodology. Separate .csv files for PM10, PM2.5, and O3 should be prepared as input files. After a file is imported into R, negative and zero concentration values across the dataset are first removed. Then, only monitors that have at least 75% of hourly concentrations are retained. Next, 24-h averages and daily maxima of 8-h moving averages are calculated for PM and ozone, respectively. As output, the codes create two different sets of data. One contains the hourly concentrations of the pollutant of interest (PM10, PM2.5, or O3) at valid stations and their city-level average. The other contains the final city-level 24-h averages for PM10 and PM2.5, or the final city-level daily maxima of 8-h averages for O3. Conclusion: These validated codes use a reliable and valid methodology and eliminate the possibility of wrong or mistaken data handling and averaging. The use of these codes is free and without any limitation, requiring only citation of this article.
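As a rough illustration of the workflow just described (not the authors' published scripts), the following R sketch assumes hypothetical input files and column names (`date_time`, `station`, `conc`) and applies the 75% completeness rule per station-day, one common reading of that criterion.

```r
# Condensed sketch of the described workflow under assumed column names.
library(dplyr)
library(zoo)

pm <- read.csv("PM25_hourly.csv") |>                       # hypothetical input file
  mutate(date_time = as.POSIXct(date_time), day = as.Date(date_time)) |>
  filter(conc > 0)                                          # drop zero/negative values

# Keep only station-days with at least 75% of the 24 hourly values
valid <- pm |>
  group_by(station, day) |>
  filter(n() >= 0.75 * 24)

# 24-h station averages, then a city-level daily mean
daily_city <- valid |>
  group_by(station, day) |>
  summarise(daily_mean = mean(conc), .groups = "drop") |>
  group_by(day) |>
  summarise(city_mean = mean(daily_mean), .groups = "drop")

# For O3: daily maximum of the 8-h moving average per station
o3 <- read.csv("O3_hourly.csv") |>                          # hypothetical input file
  mutate(date_time = as.POSIXct(date_time), day = as.Date(date_time)) |>
  filter(conc > 0) |>
  arrange(station, date_time) |>
  group_by(station) |>
  mutate(ma8 = zoo::rollmean(conc, 8, fill = NA, align = "right")) |>
  group_by(station, day) |>
  summarise(max_8h = max(ma8, na.rm = TRUE), .groups = "drop")
```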


2021 ◽  
Vol 13 (1) ◽  
pp. 15
Author(s):  
Junior Pastor Pérez-Molina ◽  
Carola Scholz ◽  
Roy Pérez-Salazar ◽  
Carolina Alfaro-Chinchilla ◽  
Ana Abarca Méndez ◽  
...  

Introduction: The implementation of wastewater treatment systems such as constructed wetlands has attracted growing interest in the last decade due to their low cost and high effectiveness in treating industrial and residential wastewater. Objective: To evaluate the spatial variation of physicochemical parameters in a sub-superficial flow constructed wetland system planted with Pennisetum alopecuroides (Pennisetum) and in an unplanted Control, and to provide an analysis of the spatial dynamics of these parameters using the R programming language. Methods: Each of the cells (Pennisetum and Control) had 12 piezometers, organized in three columns and four rows with separation distances of 3.25 m and 4.35 m, respectively. Turbidity, biochemical oxygen demand (BOD), chemical oxygen demand (COD), total Kjeldahl nitrogen (TKN), ammoniacal nitrogen (N-NH4), organic nitrogen (N-org.), and phosphorus (P-PO4-3) were measured in water at the in-flow and out-flow of both the Control and Pennisetum cells (n = 8). Additionally, the oxidation-reduction potential (ORP), dissolved oxygen (DO), conductivity, pH, and water temperature were measured (n = 167) in the piezometers. Results: No statistically significant differences between cells were found for TKN, N-NH4, conductivity, turbidity, BOD, and COD; but both the Control and Pennisetum cells showed a significant reduction in these parameters (p < 0.05). Overall, TKN and N-NH4 removal ranged from 65.8 to 84.1% and 67.5 to 90.8%, respectively; and the decreases in turbidity, conductivity, BOD, and COD were between 95.1-95.4%, 15-22.4%, 65.2-77.9%, and 57.4-60.3%, respectively. Both cells showed an increasing ORP gradient along the water-flow direction, contrary to conductivity (p < 0.05). However, DO, pH, and temperature were inconsistent along the direction of the water flow in both cells. Conclusions: Pennisetum demonstrated pollutant removal efficiency but produced results similar to the Control cell; therefore, it remains unclear whether it is a superior option. The spatial variation analysis did not reveal any obstruction of flow along the constructed wetlands, but some preferential flow paths could be distinguished. An open-source R repository was provided.
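For illustration only, here is a minimal R sketch (hypothetical values and column names, not the authors' repository code) of the kind of in-flow versus out-flow comparison and percentage-removal calculation reported above.

```r
# Minimal sketch: paired in-flow vs. out-flow test and mean percentage removal
# for one parameter in each cell, using simulated turbidity-like values.
set.seed(1)
wq <- data.frame(
  cell    = rep(c("Control", "Pennisetum"), each = 8),
  inflow  = runif(16, 40, 60),   # e.g. turbidity at the inlet (NTU)
  outflow = runif(16, 1, 5)      # turbidity at the outlet (NTU)
)

# Paired non-parametric test of in-flow vs. out-flow within each cell
by(wq, wq$cell, function(d) wilcox.test(d$inflow, d$outflow, paired = TRUE))

# Mean percentage removal per cell
wq$removal <- 100 * (wq$inflow - wq$outflow) / wq$inflow
aggregate(removal ~ cell, data = wq, FUN = mean)
```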


2021 ◽  
Vol 118 (47) ◽  
pp. e2111899118
Author(s):  
Martin G. Montgomery ◽  
Jessica Petri ◽  
Tobias E. Spikes ◽  
John E. Walker

The structure of the adenosine triphosphate (ATP) synthase from Mycobacterium smegmatis has been determined by electron cryomicroscopy. This analysis confirms features in a prior description of the structure of the enzyme, but it also describes other highly significant attributes not recognized before that are crucial for understanding the mechanism and regulation of the mycobacterial enzyme. First, we resolved not only the three main states in the catalytic cycle described before but also eight substates that portray structural and mechanistic changes occurring during a 360° catalytic cycle. Second, a mechanism of auto-inhibition of ATP hydrolysis involves not only the engagement of the C-terminal region of an α-subunit in a loop in the γ-subunit, as proposed before, but also a "fail-safe" mechanism involving the b′-subunit in the peripheral stalk that enhances engagement. A third previously unreported characteristic is that the fused bδ-subunit contains a duplicated domain in its N-terminal region, where the two copies of the domain participate in similar modes of attachment of two of the three N-terminal regions of the α-subunits. The auto-inhibitory and associated "fail-safe" mechanisms and the modes of attachment of the α-subunits provide targets for the development of innovative antitubercular drugs. The structure also provides support for an observation made in the bovine ATP synthase that the transmembrane proton-motive force that provides the energy to drive the rotary mechanism is delivered directly and tangentially to the rotor via a Grotthuss water chain in a polar L-shaped tunnel.


2020 ◽  
Vol 11 ◽  
Author(s):  
Maria-Theodora Pandi ◽  
Peter J. van der Spek ◽  
Maria Koromina ◽  
George P. Patrinos

Text mining in biomedical literature is an emerging field that has already been shown to have a variety of applications in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for the extraction of pharmacogenomics associations. The code used toward this end was implemented in the R programming language, either through custom scripts, where needed, or through functions from existing libraries. Articles (abstracts or full texts) corresponding to a specified query were extracted from PubMed, while concept annotations were derived from PubTator Central. Terms that denote a Mutation or a Gene, as well as Chemical compound terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and adequate hyperparameter tuning, four text classifiers were created and evaluated (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) with regard to their performance in identifying pharmacogenomics associations. Although further improvements are essential before this text-mining approach can be properly implemented in clinical practice, our study stands as a comprehensive, simplified, and up-to-date approach for the identification and assessment of research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges concerning the effective application of text mining in biomedical literature, whose resolution could substantially contribute to the further development of this field.
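For illustration (toy data and a hypothetical query, not the authors' full pipeline), the following R sketch shows two of the steps described above: retrieving abstracts from PubMed with rentrez and training a regularized logistic classifier with glmnet on simple bag-of-words features.

```r
# Minimal sketch of PubMed retrieval plus a penalized sentence classifier.
library(rentrez)   # client for the NCBI Entrez E-utilities
library(glmnet)

# 1. Retrieve candidate abstracts for a query (the study additionally uses
#    PubTator Central concept annotations, which are omitted here)
ids <- entrez_search(db = "pubmed", term = "CYP2D6 AND tamoxifen", retmax = 20)$ids
raw <- entrez_fetch(db = "pubmed", id = ids, rettype = "abstract", retmode = "text")

# 2. Toy labelled sentences standing in for the filtered, annotated training set
sentences <- c("GENE variant carriers showed reduced response to DRUG",
               "DRUG clearance was altered in GENE poor metabolizers",
               "The cohort included 120 patients from three centres",
               "Samples were stored at minus eighty degrees")
labels <- c(1, 1, 0, 0)   # 1 = pharmacogenomics association, 0 = other

# 3. Bag-of-words features and an elastic-net penalized logistic model
tokens <- strsplit(tolower(sentences), "\\W+")
vocab  <- unique(unlist(tokens))
dtm    <- t(sapply(tokens, function(tok) as.numeric(vocab %in% tok)))
colnames(dtm) <- vocab
fit <- glmnet(dtm, labels, family = "binomial", alpha = 0.5)
```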

