Executing native Java code in R: an approach based on a local server

PeerJ Computer Science ◽

10.7717/peerj-cs.300 ◽

2020 ◽

Vol 6 ◽

pp. e300

Author(s):

Mathieu Fortin

Keyword(s):

Data Analysis ◽

Object Oriented ◽

R Package ◽

Complex Object ◽

Computationally Efficient ◽

R Language ◽

Alternative Approach ◽

Socket Connection ◽

Java Native Interface ◽

Java Code

The R language is widely used for data analysis. However, it does not allow for complex object-oriented implementation and it tends to be slower than other languages such as Java, C and C++. Consequently, it can be more computationally efficient to run native Java code in R. To do this, there exist at least two approaches. One is based on the Java Native Interface (JNI) and it has been successfully implemented in the rJava package. An alternative approach consists of running a local server in Java and linking it to an R environment through a socket connection. This alternative approach has been implemented in an R package called J4R. This article shows how this approach makes it possible to simplify the calls to Java methods and to integrate the R vectorization. The downside is a loss of performance. However, if the vectorization is used in conjunction with multithreading, this loss of performance can be compensated for.
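The local-server idea described in this abstract can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not J4R's actual protocol or API: the class name `LocalMathServer`, the one-line text format `<method> <arg1> <arg2> ...`, and the two supported methods are all hypothetical. A client written in Java stands in for the R side so that the example is self-contained. Sending several arguments in one request mimics, in a very simplified way, how a vectorized call can be served in a single round trip over the socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class LocalMathServer {

    // Hypothetical wire format (an assumption, not J4R's): "<method> <arg1> <arg2> ...".
    // The method is applied to every argument, so one request carries a whole vector.
    static String handle(String request) {
        String[] parts = request.trim().split("\\s+");
        StringJoiner reply = new StringJoiner(" ");
        for (int i = 1; i < parts.length; i++) {
            double x = Double.parseDouble(parts[i]);
            switch (parts[0]) {
                case "sqrt": reply.add(String.valueOf(Math.sqrt(x))); break;
                case "abs":  reply.add(String.valueOf(Math.abs(x)));  break;
                default:     return "ERROR unknown method " + parts[0];
            }
        }
        return reply.toString();
    }

    // Accepts a single client and answers one request per line until the client disconnects.
    static void serve(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(handle(line));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 0 lets the operating system pick a free port; the server binds to loopback only.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try {
                    serve(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            // Stand-in for the R side: connect, send one vectorized request, read the reply.
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                out.println("sqrt 16 25 36");
                System.out.println(in.readLine()); // prints "4.0 5.0 6.0"
            }
            serverThread.join();
        }
    }
}
```

Every request in this design pays for serialization and a socket round trip, which is consistent with the loss of performance the abstract mentions. Batching many values into one request, as done here, is one way to amortize that cost.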

propr: An R-package for Identifying Proportionally Abundant Features Using Compositional Data Analysis

10.1101/104935 ◽

2017 ◽

Cited By ~ 4

Author(s):

Thomas Quinn ◽

Mark F. Richardson ◽

David Lovell ◽

Tamsyn Crowley

Keyword(s):

Data Analysis ◽

Relative Abundance ◽

Compositional Data ◽

Life Sciences ◽

Genomic Data ◽

R Package ◽

Compositional Data Analysis ◽

Computationally Efficient ◽

Abundance Data ◽

Relative Abundances

In the life sciences, many assays measure only the relative abundances of components for each sample. These data, called compositional data, require special handling in order to avoid misleading conclusions. For example, in the case of correlation, treating relative data like absolute data can lead to the discovery of falsely positive associations. Recently, researchers have proposed proportionality as a valid alternative to correlation for calculating pairwise association in relative data. Although the question of how to best measure proportionality remains open, we present here a computationally efficient R package that implements two proposed measures of proportionality. In an effort to advance the understanding and application of proportionality analysis, we review the mathematics behind proportionality, demonstrate its application to genomic data, and discuss some ongoing challenges in the analysis of relative abundance data.
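The abstract does not name its two measures, so the following Java sketch rests on an assumption: it uses the quantities usually written φ and ρ in the proportionality literature, computed on vectors that have already been log-ratio transformed. With transformed vectors a and b, φ = var(a − b) / var(a) and ρ = 1 − var(a − b) / (var(a) + var(b)). The class and method names are hypothetical and this is not the propr package's code.

```java
public class ProportionalitySketch {

    // Sample variance with the n - 1 denominator.
    static double variance(double[] v) {
        double mean = 0.0;
        for (double x : v) {
            mean += x;
        }
        mean /= v.length;
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += (x - mean) * (x - mean);
        }
        return sumOfSquares / (v.length - 1);
    }

    // Element-wise difference a - b; in log space this is the log of the ratio.
    static double[] difference(double[] a, double[] b) {
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            d[i] = a[i] - b[i];
        }
        return d;
    }

    // phi is 0 when the two features are perfectly proportional (assumed definition).
    static double phi(double[] a, double[] b) {
        return variance(difference(a, b)) / variance(a);
    }

    // rho is 1 when the two features are perfectly proportional (assumed definition).
    static double rho(double[] a, double[] b) {
        return 1.0 - variance(difference(a, b)) / (variance(a) + variance(b));
    }

    public static void main(String[] args) {
        // b = a + 1 in log space, i.e. the raw values differ by a constant factor.
        double[] a = {0.0, 1.0, 2.0};
        double[] b = {1.0, 2.0, 3.0};
        System.out.println(phi(a, b)); // prints 0.0
        System.out.println(rho(a, b)); // prints 1.0
    }
}
```

In the example, a constant offset in log space corresponds to a constant ratio in the raw data, so the log-ratio has zero variance and both measures report perfect proportionality.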

MetaR: simple, high-level languages for data analysis with the R ecosystem

10.1101/030254 ◽

2015 ◽

Cited By ~ 2

Author(s):

Fabien Campagne ◽

William ER Digan ◽

Manuele Simi

Keyword(s):

Data Analysis ◽

User Interfaces ◽

Teaching Experience ◽

R Package ◽

Diverse Range ◽

R Language ◽

Analysis Tools ◽

Analysis Task ◽

Simple Language ◽

High Level

Data analysis tools have become essential to the study of biology. Here, we applied language workbench technology (LWT) to create data analysis languages tailored for biologists with a diverse range of experience: from beginners with no programming experience to expert bioinformaticians and statisticians. A key novelty of our approach is its ability to blend user interface with scripting in a single platform. This feature helps beginners and experts alike analyze data more productively. This new approach has several advantages over state-of-the-art approaches currently popular for data analysis: experts can design simplified data analysis languages that require no programming experience, and behave like graphical user interfaces, yet have the advantages of scripting. We report on such a simple language, called MetaR, which we have used to teach complete beginners how to call differentially expressed genes and build heatmaps. We found that beginners can complete this task in less than 2 hours with MetaR, when more traditional teaching with R and its packages would require several training sessions (6-24 hours). Furthermore, MetaR seamlessly integrates with Docker to enable reproducibility of analyses and simplified R package installations during training sessions. We used the same approach to develop the first composable R language. A composable language is a language that can be extended with micro-languages. We illustrate this capability with a Biomart micro-language designed to compose with R and help R programmers query Biomart interactively to assemble specific queries to retrieve data. (The same micro-language also composes with MetaR to help beginners query Biomart.) Our teaching experience suggests that language design with LWT can be a compelling approach for developing intelligent data analysis tools and can accelerate training for common data analysis tasks. LWT offers an interactive environment with the potential to promote exchanges between beginner and expert data analysts.

IRTree Model: An Alternative Approach for Self-Reported Ordinal Data Analysis

Korean Society for Educational Evaluation ◽

10.31158/jeev.2019.32.2.303 ◽

2019 ◽

Vol 32 (2) ◽

pp. 303-323

Author(s):

Yoonsun Jang ◽

Meereem Kim ◽

Juyeon Lee

Keyword(s):

Data Analysis ◽

Ordinal Data ◽

Alternative Approach ◽

Ordinal Data Analysis

Discussion: Object-Oriented Data Analysis, Power Metrics, and Graph Laplacians

Journal of the American Statistical Association ◽

10.1080/01621459.2019.1635477 ◽

2019 ◽

Vol 114 (527) ◽

pp. 1097-1098

Author(s):

Ian L. Dryden ◽

Simon P. Preston ◽

Katie E. Severn

Keyword(s):

Data Analysis ◽

Object Oriented ◽

Graph Laplacians ◽

Object Oriented Data Analysis

A computationally efficient estimator for mutual information

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2007.0196 ◽

2008 ◽

Vol 464 (2093) ◽

pp. 1203-1215 ◽

Cited By ~ 16

Author(s):

Dafydd Evans

Keyword(s):

Data Analysis ◽

Mutual Information ◽

Time Complexity ◽

Exploratory Data Analysis ◽

Nearest Neighbour ◽

Computationally Efficient ◽

One Dimensional ◽

Exploratory Data ◽

Efficient Alternative ◽

Computationally Expensive

Mutual information quantifies the determinism that exists in a relationship between random variables, and thus plays an important role in exploratory data analysis. We investigate a class of non-parametric estimators for mutual information, based on the nearest neighbour structure of observations in both the joint and marginal spaces. Unless both marginal spaces are one-dimensional, we demonstrate that a well-known estimator of this type can be computationally expensive under certain conditions, and propose a computationally efficient alternative that has a time complexity of order N log N as the number of observations N → ∞.

Physically-based segmentation of the Western Carpathians (Central Europe)

10.7287/peerj.preprints.27083v1 ◽

2018 ◽

Author(s):

Peter Bandura ◽

Jozef Minár ◽

Lucian Drăguţ

Keyword(s):

Central Europe ◽

Input Data ◽

Object Oriented ◽

Complex Object ◽

Suitability Evaluation ◽

Local Variance ◽

Carpathian Region ◽

Hierarchical Levels ◽

Physically Based ◽

Object Oriented Approach

Results of a physically-based methodology to delineate the morphometrical-morphostructural subdivision of the Western Carpathian region (Central Europe) from DEMs and their derivatives are presented. A previous suitability evaluation of an object-oriented methodology showed its potential in the recognition of morphostructural features. In this study we moved towards a more complex object-oriented approach – fusion of segmentation and classification on several hierarchical levels. In addition, physically-based geomorphometric variables were used as input data, resulting in enhancement of subsequent morphotectonic interpretations. The decrease in local variance of the delineated objects, in comparison with simple segmentations without these upgrades, confirms the efficiency of our approach.

archivist: An R Package for Managing, Recording and Restoring Data Analysis Results

Journal of Statistical Software ◽

10.18637/jss.v082.i11 ◽

2017 ◽

Vol 82 (11) ◽

Cited By ~ 4

Author(s):

Przemysław Biecek ◽

Marcin Kosinski

Keyword(s):

Data Analysis ◽

R Package

Functional data analysis techniques to improve the generalizability of near-infrared spectral data for monitoring mosquito populations

10.1101/2020.04.28.058495 ◽

2020 ◽

Cited By ~ 1

Author(s):

Pedro M. Esperança ◽

Dari F. Da ◽

Ben Lambert ◽

Roch K. Dabiré ◽

Thomas S. Churcher

Keyword(s):

Data Analysis ◽

Functional Data Analysis ◽

Functional Data ◽

Near Infrared ◽

R Package ◽

Mosquito Vector ◽

Modelling Framework ◽

Functional Representation ◽

Infrared Spectral ◽

Generalised Linear Modelling

Near infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an issue for generalising predictions from one location to another. Here, we use a functional data analysis approach, which models spectra as smooth curves rather than as a discrete set of points, to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representation of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes – the most epidemiologically important vector species on the planet – in two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species in another site, over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can, alternatively, handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available for field entomologists, we have created an open-source R package mlevcm. All data used are also publicly available.
