Use ggbreak to Effectively Utilize Plotting Space to Deal With Large Datasets and Outliers

2021 ◽  
Vol 12 ◽  
Author(s):  
Shuangbin Xu ◽  
Meijun Chen ◽  
Tingze Feng ◽  
Li Zhan ◽  
Lang Zhou ◽  
...  

With the rapid increase of large-scale datasets, biomedical data visualization is facing challenges. The data may be large, span different orders of magnitude, contain extreme values, and have an unclear distribution. Here we present ggbreak, an R package that allows users to create broken axes using ggplot2 syntax. It makes effective use of the plotting area when dealing with large datasets (especially long sequential data), data with different magnitudes, and data that contain outliers. The ggbreak package increases the available visual space for a better presentation of the data and detailed annotation, and thus improves our ability to interpret the data. The ggbreak package is fully compatible with ggplot2, and it is easy to superpose additional layers and apply scales and themes to adjust the plot using the ggplot2 syntax. The ggbreak package is open-source software released under the Artistic-2.0 license, and it is freely available on CRAN (https://CRAN.R-project.org/package=ggbreak) and GitHub (https://github.com/YuLab-SMU/ggbreak).

2020 ◽  
Author(s):  
Boris Leroy ◽  
Andrew M Kramer ◽  
Anne-Charlotte Vaissière ◽  
Franck Courchamp ◽  
Christophe Diagne

Aim: Large-scale datasets are becoming increasingly available for macroecological research from different disciplines. However, learning their specific extraction and analytical requirements can become prohibitively time-consuming for researchers. We argue that this issue can be tackled by providing methodological frameworks published as open-source software. We illustrate this solution with the invacost R package, open-source software designed to query and analyse InvaCost, the global database on reported economic costs of invasive alien species. Innovations: First, the invacost package provides updates of this dynamic database directly in the analytical environment R. Second, it helps users understand the nature of economic cost data for invasive species, their harmonisation process, and the inherent biases associated with such data. Third, it readily provides complementary methods to query and analyse the costs of invasive species at the global scale, while accounting for econometric statistical issues. Main conclusions: This tool will be useful for scientists working on invasive alien species by (i) facilitating access to and use of this multi-disciplinary data resource and (ii) providing a standard procedure that will facilitate reproducibility and comparability of studies, the lack of which has been one of the major criticisms of this topic until now. We discuss how this R package was designed to implement general recommendations for transparency, reproducibility and comparability of science in the era of big data in ecology.


2020 ◽  
Author(s):  
Arsenij Ustjanzew ◽  
Jens Preussner ◽  
Mette Bentsen ◽  
Carsten Kuenne ◽  
Mario Looso

Abstract Data visualization and interactive data exploration are important aspects of illustrating complex concepts and results from analyses of omics data. A suitable visualization has to be intuitive and accessible. Web-based dashboards have become popular tools for the arrangement, consolidation and display of such visualizations. However, the combination of automated data processing pipelines handling omics data and dynamically generated, interactive dashboards is poorly solved. Here, we present i2dash, an R package intended to encapsulate functionality for programmatic creation of customized dashboards. It supports interactive and responsive (linked) visualizations across a set of predefined graphical layouts. i2dash addresses the needs of data analysts for a tool that is compatible with and attachable to any R-based analysis pipeline, thereby fostering the separation of data visualization on the one hand and data analysis tasks on the other. In addition, the generic design of i2dash enables data analysts to generate modular extensions for specific needs. As a proof of principle, we provide an extension of i2dash optimized for single-cell RNA-sequencing analysis, supporting the creation of dashboards for the visualization needs of single-cell sequencing experiments. Equipped with these features, i2dash is suitable for extensive use in large-scale sequencing/bioinformatics facilities. Along this line, we provide i2dash as a containerized solution, enabling straightforward large-scale deployment and sharing of dashboards using cloud services. i2dash is freely available via the R package archive CRAN.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zachary B. Abrams ◽  
Dwayne G. Tally ◽  
Lin Zhang ◽  
Caitlin E. Coombes ◽  
Philip R. O. Payne ◽  
...  

Abstract Background: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers. Results: In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database. Conclusion: Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Mohammadreza Yaghoobi ◽  
Krzysztof S. Stopka ◽  
Aaditya Lakshmanan ◽  
Veera Sundararaghavan ◽  
John E. Allison ◽  
...  

Abstract The PRISMS-Fatigue open-source framework for simulation-based analysis of microstructural influences on fatigue resistance for polycrystalline metals and alloys is presented here. The framework uses the crystal plasticity finite element method as its microstructure analysis tool and provides a highly efficient, scalable, flexible, and easy-to-use ICME community platform. The PRISMS-Fatigue framework is linked to different open-source software to instantiate microstructures, compute the material response, and assess fatigue indicator parameters. The performance of PRISMS-Fatigue is benchmarked against a similar framework implemented using ABAQUS. Results indicate that the multilevel parallelism scheme of PRISMS-Fatigue is more efficient and scalable than ABAQUS for large-scale fatigue simulations. The performance and flexibility of this framework are demonstrated with various examples that assess the driving force for fatigue crack formation in microstructures with different crystallographic textures, grain morphologies, and grain numbers, and under different multiaxial strain states, strain magnitudes, and boundary conditions.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 602-602
Author(s):  
Oliver Huxhold ◽  
Svenja Spuling ◽  
Susanne Wurm

Abstract In recent years many studies have shown that adults with more positive self-perceptions of aging (SPA) increase their likelihood of aging healthily. Other studies have documented historical changes in individual resources and contextual conditions associated with aging. We explored how these historical changes are reflected in birth-cohort differences in aging trajectories of two aspects of SPA – viewing aging as ongoing development or as increasing physical losses. Using large-scale cohort-sequential data assessed across 21 years (N ≈ 19,000), the analyses modeled birth-cohort differences in aging trajectories of SPA from 40 to 85 years of age. The results illustrated differential birth-cohort differences: Later-born cohorts may experience more potential for ongoing development with advancing age than earlier-born cohorts. However, later-born cohorts seem to view their own aging as more negative than earlier-born cohorts during their early forties but may associate their aging less with physical losses after the age of fifty.


2021 ◽  
Vol 39 ◽  
pp. 100284
Author(s):  
Joseph Molloy ◽  
Felix Becker ◽  
Basil Schmid ◽  
Kay W. Axhausen
