scholarly journals Executing native Java code in R: an approach based on a local server

2020 ◽  
Vol 6 ◽  
pp. e300
Author(s):  
Mathieu Fortin

The R language is widely used for data analysis. However, it does not allow for complex object-oriented implementation and it tends to be slower than other languages such as Java, C and C++. Consequently, it can be more computationally efficient to run native Java code in R. To do this, there exist at least two approaches. One is based on the Java Native Interface (JNI) and it has been successfully implemented in the rJava package. An alternative approach consists of running a local server in Java and linking it to an R environment through a socket connection. This alternative approach has been implemented in an R package called J4R. This article shows how this approach makes it possible to simplify the calls to Java methods and to integrate the R vectorization. The downside is a loss of performance. However, if the vectorization is used in conjunction with multithreading, this loss of performance can be compensated for.

2017 ◽  
Author(s):  
Thomas Quinn ◽  
Mark F. Richardson ◽  
David Lovell ◽  
Tamsyn Crowley

AbstractIn the life sciences, many assays measure only the relative abundances of components for each sample. These data, called compositional data, require special handling in order to avoid misleading conclusions. For example, in the case of correlation, treating relative data like absolute data can lead to the discovery of falsely positive associations. Recently, researchers have proposed proportionality as a valid alternative to correlation for calculating pairwise association in relative data. Although the question of how to best measure proportionality remains open, we present here a computationally efficient R package that implements two proposed measures of proportionality. In an effort to advance the understanding and application of proportionality analysis, we review the mathematics behind proportionality, demonstrate its application to genomic data, and discuss some ongoing challenges in the analysis of relative abundance data.


2015 ◽  
Author(s):  
Fabien Campagne ◽  
William ER Digan ◽  
Manuele Simi

AbstractData analysis tools have become essential to the study of biology. Here, we applied language workbench technology (LWT) to create data analysis languages tailored for biologists with a diverse range of experience: from beginners with no programming experience to expert bioinformaticians and statisticians. A key novelty of our approach is its ability to blend user interface with scripting in a single platform. This feature helps beginners and experts alike analyze data more productively. This new approach has several advantages over state of the art approaches currently popular for data analysis: experts can design simplified data analysis languages that require no programming experience, and behave like graphical user interfaces, yet have the advantages of scripting. We report on such a simple language, called MetaR, which we have used to teach complete beginners how to call differentially expressed genes and build heatmaps. We found that beginners can complete this task in less than 2 hours with MetaR, when more traditional teaching with R and its packages would require several training sessions (6-24hrs). Furthermore, MetaR seamlessly integrates with docker to enable reproducibility of analyses and simplified R package installations during training sessions. We used the same approach to develop the first composable R language. A composable language is a language that can be extended with micro-languages. We illustrate this capability with a Biomart micro-language designed to compose with R and help R programmers query Biomart interactively to assemble specific queries to retrieve data, (The same micro-language also composes with MetaR to help beginners query Biomart.) Our teaching experience suggests that language design with LWT can be a compelling approach for developing intelligent data analysis tools and can accelerate training for common data analysis task. LWT offers an interactive environment with the potential to promote exchanges between beginner and expert data analysts.


Author(s):  
Dafydd Evans

Mutual information quantifies the determinism that exists in a relationship between random variables, and thus plays an important role in exploratory data analysis. We investigate a class of non-parametric estimators for mutual information, based on the nearest neighbour structure of observations in both the joint and marginal spaces. Unless both marginal spaces are one-dimensional, we demonstrate that a well-known estimator of this type can be computationally expensive under certain conditions, and propose a computationally efficient alternative that has a time complexity of order ( N  log  N ) as the number of observations N →∞.


2018 ◽  
Author(s):  
Peter Bandura ◽  
Jozef Minár ◽  
Lucian Drăguţ

Results of a physically-based methodology to delineate morphometrical-morphostructural subdivision of the Western Carpathian region (Central Europe) from DEMs and their derivatives are presented. Previous suitability evaluation of an object-oriented methodology showed its potential in recognition of morphostructural features. In this study we moved towards a more complex object-oriented approach – fusion of segmentation and classification on several hierarchical levels. In addition, physicallybased geomorphometric variables were used as input data, resulting in enhancement of subsequent morphotectonic interpretations. Decrease of local variance of the delineated objects in comparison with simple segmentations without these upgrades confirms the efficiency of our approach.


Author(s):  
Pedro M. Esperança ◽  
Dari F. Da ◽  
Ben Lambert ◽  
Roch K. Dabiré ◽  
Thomas S. Churcher

AbstractNear infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an issue for generalising predictions from one location to another. Here, we use a functional data analysis approach—which models spectra as smooth curves rather than as a discrete set of points—to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representation of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes – the most epidemiologically important vector species on the planet – in two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species in another site, over 250km away. Whilst we apply our framework to species prediction, our unified statistical framework can, alternatively, handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available for field entomologists, we have created an open-source R package mlevcm. All data used is publicly also available.


Sign in / Sign up

Export Citation Format

Share Document