ramr: an R package for detection of rare aberrantly methylated regions

2020
Author(s):
Oleksii Nikolaienko
Per Eystein Lønning
Stian Knappskog

Abstract
Motivation: With recent advances in the field of epigenetics, the focus is widening from large and frequent disease- or phenotype-related methylation signatures to rare alterations transmitted mitotically or transgenerationally (constitutional epimutations). Emerging evidence indicates that such constitutional alterations, albeit occurring at a low mosaic level, may confer risk of disease later in life. Given their inherently low incidence rate and mosaic nature, there is a need for bioinformatic tools specifically designed to analyse such events.
Results: We have developed a method (ramr) to identify aberrantly methylated DNA regions (AMRs). ramr can be applied to methylation data obtained by array or next-generation sequencing techniques to discover AMRs associated with elevated risk of cancer as well as other diseases. We assessed the accuracy and performance metrics of ramr and confirmed its applicability to the analysis of large public data sets. Using ramr we identified aberrantly methylated regions that are known or may potentially be associated with the development of colorectal cancer, and provided functional annotation of AMRs that arise at early developmental stages.
Availability and implementation: The R package is freely available at https://github.com/BBCG/ramr
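
To illustrate the kind of signal ramr is designed to detect, the toy R sketch below flags sample-level outlier CpGs with a simple IQR rule and merges runs of consecutive outliers into candidate regions. This is a conceptual illustration on simulated beta values only; it is not ramr's algorithm or API (see the GitHub repository above for the actual interface).

```r
## Conceptual illustration only -- not ramr's algorithm or API.
## Flag CpGs where one sample deviates from the cohort by more than k IQRs,
## then merge runs of consecutive outlier CpGs into candidate regions.
set.seed(42)
n_cpg <- 500; n_samples <- 40
beta <- matrix(rbeta(n_cpg * n_samples, 2, 8), nrow = n_cpg)  # mostly low methylation
beta[101:110, 7] <- rbeta(10, 8, 2)                           # rare epimutation planted in sample 7

med <- apply(beta, 1, median)
iqr <- apply(beta, 1, IQR)
outlier <- abs(beta - med) > 3 * pmax(iqr, 0.05)              # per-CpG, per-sample outlier flag

# Merge runs of >= 5 consecutive outlier CpGs within each sample into regions
regions <- do.call(rbind, lapply(seq_len(n_samples), function(s) {
  r <- rle(outlier[, s])
  ends <- cumsum(r$lengths); starts <- ends - r$lengths + 1
  keep <- r$values & r$lengths >= 5
  if (!any(keep)) return(NULL)
  data.frame(sample = s, start_cpg = starts[keep], end_cpg = ends[keep])
}))
print(regions)   # should recover the region planted in sample 7
```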

2020
Author(s):
Anna M. Sozanska
Charles Fletcher
Dóra Bihary
Shamith A. Samarajiwa

Abstract
More than three decades ago, the microarray revolution brought high-throughput data generation capability to biology and medicine. Subsequently, the emergence of massively parallel sequencing technologies led to many big-data initiatives such as the Human Genome Project and the Encyclopedia of DNA Elements (ENCODE) project. These, in combination with cheaper, faster massively parallel DNA sequencing, have democratised multi-omic (genomic, transcriptomic, translatomic and epigenomic) data generation, leading to a data deluge in biomedicine. While some of these data sets are trapped in inaccessible silos, the vast majority are stored in public data resources and controlled-access data repositories, enabling their wider use (or misuse). Currently, most peer-reviewed publications require the deposition of the data set associated with a study in one of these public data repositories. However, clunky and difficult-to-use interfaces and subpar or incomplete annotation hinder the discovery, searching and filtering of these multi-omic data and their re-purposing in other use cases. In addition, the proliferation of a multitude of different data repositories, with partially redundant storage of similar data, is yet another obstacle to their continued usefulness. Similarly, interfaces where annotation is spread across multiple web pages, accession identifiers with ambiguous and multiple interpretations, and a lack of good curation make these data sets difficult to use. We have produced SpiderSeqR, an R package whose main features include integration between the NCBI GEO and SRA databases, enabling a unified search of SRA and GEO data sets and their associated annotations, conversion between database accessions, convenient filtering of results, and saving of past queries for future use. All of the above features aim to promote data reuse, facilitating new discoveries and maximising the potential of existing data sets.
Availability: https://github.com/ss-lab-cancerunit/SpiderSeqR
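
A brief, annotated usage sketch is given below. The function names (startSpiderSeqR, searchAnywhere, convertAccession) and their arguments follow our reading of the project README and are assumptions to be checked against the current release; the query string and accession are purely illustrative, and the set-up step downloads large metadata databases, so the calls are left commented.

```r
## Hedged sketch of a typical SpiderSeqR session; function names and arguments
## are assumptions based on the project README and may differ in the current
## release -- verify before use.
# devtools::install_github("ss-lab-cancerunit/SpiderSeqR")
library(SpiderSeqR)

# Set up the local GEO and SRA metadata databases (large download, so commented out):
# startSpiderSeqR(path = tempdir())

# Unified full-text search across SRA and GEO metadata (illustrative query string):
# hits <- searchAnywhere("ChIP-seq MCF7 ESR1")

# Convert between database accession types (illustrative accession):
# convertAccession("GSE48213")
```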


Author(s):
Michael Allen
Sebnem Baydere
Elena Gaura
Gurhan Kucuk

This chapter introduces a methodological approach to the evaluation of localization algorithms. The chapter discusses evaluation criteria and performance metrics, followed by the statistical/empirical simulation models and parameters that affect the performance of the algorithms and hence their assessment. Two contrasting localization studies are presented and compared with reference to the evaluation criteria discussed throughout the chapter. The chapter concludes with an overview of the localization algorithm development cycle, from simulation to real deployment. The authors argue that algorithms should be simulated, emulated (on test beds or with empirical data sets) and subsequently implemented in hardware, in a realistic Wireless Sensor Network (WSN) deployment environment, as a complete test of their performance. It is hypothesised that establishing a common development and evaluation cycle for localization algorithms among researchers will lead to more realistic results and viable comparisons.
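
As a concrete example of the kind of performance metric the chapter discusses, the short R sketch below computes per-node localization error and summary statistics (mean error and RMSE) from simulated ground-truth and estimated coordinates; the data and the metric choice are illustrative and not taken from the chapter.

```r
## Illustrative only: one common way to score a localization algorithm,
## using simulated ground-truth and estimated 2D node positions.
set.seed(1)
n_nodes  <- 50
truth    <- cbind(x = runif(n_nodes, 0, 100), y = runif(n_nodes, 0, 100))
estimate <- truth + matrix(rnorm(2 * n_nodes, sd = 3), ncol = 2)  # simulated algorithm output

err        <- sqrt(rowSums((estimate - truth)^2))  # per-node Euclidean error
mean_error <- mean(err)                            # mean localization error
rmse       <- sqrt(mean(err^2))                    # root-mean-square error
cat(sprintf("mean error = %.2f, RMSE = %.2f\n", mean_error, rmse))
```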


2020
Author(s):
Axel Lauer
Fernando Iglesias-Suarez
Veronika Eyring
the ESMValTool development team

The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and scientific background for the implemented diagnostics and metrics, and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community continuously improving the tool, supported by multiple national and European projects. The latest version (2.0) of the ESMValTool has been developed as a large community effort to specifically target the increased data volume of the Coupled Model Intercomparison Project Phase 6 (CMIP6) and the related challenges posed by the analysis and evaluation of output from multiple high-resolution and complex ESMs. For this, the core functionalities have been completely rewritten in order to take advantage of state-of-the-art computational libraries and methods and to allow for efficient and user-friendly data processing. Common operations on the input data, such as regridding or computation of multi-model statistics, are now centralized in a highly optimized preprocessor written in Python. The diagnostic part of the ESMValTool includes a large collection of standard recipes for reproducing peer-reviewed analyses of many variables across the atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean state, trends, variability, important processes and phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation, some are also based on model-to-model comparisons. This presentation introduces the diagnostics newly implemented into ESMValTool v2.0, including an extended set of large-scale diagnostics for quasi-operational and comprehensive evaluation of ESMs, new diagnostics for extreme events, regional model and impact evaluation and analysis of ESMs, as well as diagnostics for emergent constraints and analysis of future projections from ESMs. The new diagnostics are illustrated with examples using results from the well-established CMIP5 and the newly available CMIP6 data sets.


2021
Author(s):
Lisa Bock
Birgit Hassler
Axel Lauer
...

The Earth System Model Evaluation Tool (ESMValTool) has been developed with the aim of taking model evaluation to the next level by facilitating analysis of many different ESM components, providing well-documented source code and scientific background for the implemented diagnostics and metrics, and allowing for traceability and reproducibility of results (provenance). This has been made possible by a lively and growing development community continuously improving the tool, supported by multiple national and European projects. The latest major release (v2.0) of the ESMValTool was officially introduced in August 2020 as a large community effort, and several additional smaller releases have followed since then.

The diagnostic part of the ESMValTool includes a large collection of standard “recipes” for reproducing peer-reviewed analyses of many variables across ESM compartments, including the atmosphere, ocean, and land domains, with diagnostics and performance metrics focusing on the mean state, trends, variability, important processes and phenomena, as well as emergent constraints. While most of the diagnostics use observational data sets (in particular satellite and ground-based observations) or reanalysis products for model evaluation, some are also based on model-to-model comparisons. This presentation gives an overview of the latest scientific diagnostics and metrics added during the last year, including examples of applications of these diagnostics to CMIP6 model data.


Author(s):  
Hong Xiong

The response rate and performance indicators of enterprise resource calls have become an important measure of differences in enterprise user experience. An efficient corporate shared-resource calling system can significantly improve the office efficiency of corporate users and the fluency with which they call resources. Hadoop has powerful data integration and analysis capabilities for resource extraction, while R offers excellent statistical capabilities and personalized decomposition and display of resources in data calling. This article proposes an integration scheme for enterprise shared-resource invocation based on Hadoop and R to further improve the efficiency of enterprise users' shared resource utilization, improve the efficiency of system operation, and bring enterprise users a higher level of user experience. First, we use Hadoop to extract the corporate shared resources required by corporate users from nearby resource storage server rooms and terminal equipment to increase the call rate, and we use R's statistical functions to convert the user's search results into linear correlations, which are displayed in order of correlation strength to improve response speed and experience. This article proposes feasible solutions to the shortcomings of current enterprise shared-resource invocation. We can use public data sets to perform personalized regression analysis on user needs, and optimize and integrate the most relevant information.
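
To make the correlation-based ordering step concrete, the toy R sketch below ranks shared resources by the Pearson correlation between each resource's historical usage profile and the querying user's recent activity. It is our own simplified illustration on simulated data, not the paper's Hadoop/R implementation; all object names are placeholders, and the Hadoop extraction step is omitted.

```r
## Toy illustration of ranking shared resources by correlation with a user's
## usage profile; simulated data, not the paper's Hadoop/R pipeline.
set.seed(7)
n_resources <- 20; n_features <- 12                  # e.g. access counts per time slot
usage <- matrix(rpois(n_resources * n_features, 5),  # historical usage per resource
                nrow = n_resources,
                dimnames = list(paste0("resource_", 1:n_resources), NULL))
user_profile <- rpois(n_features, 5)                 # the querying user's recent activity

# Pearson correlation of each resource's usage pattern with the user profile
score <- apply(usage, 1, cor, y = user_profile)

# Return resources ordered from strongest to weakest correlation
ranking <- sort(score, decreasing = TRUE)
head(ranking, 5)
```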


2021
Vol 13 (13)
pp. 2451
Author(s):
Huaiping Yan
Jun Wang
Lei Tang
Erlei Zhang
Kun Yan
...

Most traditional hyperspectral image (HSI) classification methods rely on hand-crafted or shallow descriptors, which limits their applicability and performance. Recently, deep learning has gradually become the mainstream approach to HSI classification because it can automatically extract deep abstract features for classification. However, it remains a challenge to learn meaningful features for HSI classification from a small training sample set. In this paper, a 3D cascaded spectral–spatial element attention network (3D-CSSEAN) is proposed to address this issue. The 3D-CSSEAN integrates spectral–spatial feature extraction and attention area extraction for HSI classification. Two element attention modules in the 3D-CSSEAN enable the deep network to focus on primary spectral features and meaningful spatial features. All attention modules are implemented through a few simple activation operations and elementwise multiplication operations. In this way, the number of trainable network parameters is not increased excessively, which also makes the network structure suitable for small-sample learning. The adopted module cascading pattern not only reduces the computational burden of the deep network but can also be easily operated in a plug–expand–play fashion. Experimental results on three public data sets show that the proposed 3D-CSSEAN achieves performance comparable to state-of-the-art methods.
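
As a generic illustration of the "simple activation plus elementwise multiplication" idea (not the 3D-CSSEAN architecture itself), the base-R sketch below derives an attention mask from a toy feature map with a sigmoid activation and applies it elementwise, adding no trainable parameters.

```r
## Generic element-attention illustration (not the 3D-CSSEAN network itself):
## an attention mask is derived from the features with a simple activation and
## applied by elementwise multiplication, adding no extra trainable parameters.
set.seed(3)
sigmoid <- function(x) 1 / (1 + exp(-x))

features <- array(rnorm(5 * 5 * 8), dim = c(5, 5, 8))  # toy spatial x spectral feature map

attention  <- sigmoid(features)          # per-element weights in (0, 1)
reweighted <- features * attention       # emphasise strong responses, suppress weak ones

summary(as.vector(reweighted - features))  # inspect how much the map changed
```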


2016
Vol 15 (2)
pp. 49-55
Author(s):
Pala SuriyaKala
Ravi Aditya

Human resources is traditionally an area subject to measured change, but with Big Data, data analytics, human capital management, talent acquisition and performance metrics as new trends, there is bound to be a sea change in this function. This paper is conceptual and tries to introspect and outline the challenges that HRM faces in Big Data. Big Data, as is well known, refers to the generation of enormous data sets, now measured in exabytes, which is revolutionizing the world. It will be the driving force behind how governments, companies and functions perform in the decades to come. The immense amount of information, if properly utilized, can lead to efficiency in various fields like never before. But to achieve this, the cloud of suspicion, fear and uncertainty regarding the use of Big Data has to be removed from those who can use it to the benefit of their respective areas of application. HR, unlike marketing, finance, etc., has traditionally never been very data-centric in the analysis of its decisions.


2015
Author(s):
Matthew W. Pennell
Richard G. FitzJohn
William K. Cornwell

Biologists are increasingly using curated, public data sets to conduct phylogenetic comparative analyses. Unfortunately, there is often a mismatch between the species for which phylogenetic data exist and those for which other data are available. As a result, researchers are commonly forced either to drop species from analyses entirely or to impute the missing data. Here we outline a simple solution to increase the overlap while avoiding the potential biases introduced by imputing data. If some external topological or taxonomic information is available, it can be used to maximize the overlap between the data and the phylogeny. We develop an algorithm that replaces a species lacking data with a species that has data; the swap can be made because, for those two species, all phylogenetic relationships are exactly equivalent. We have implemented our method in a new R package, phyndr, which allows researchers to apply our algorithm to empirical data sets. It is efficient enough that taxon swaps can be computed quickly, even for large trees. To facilitate the use of taxonomic knowledge we created a separate data package, taxonlookup; it contains a curated, versioned taxonomic lookup for land plants and is interoperable with phyndr. Emerging online databases and statistical advances are making it possible for researchers to investigate evolutionary questions at unprecedented scales. However, in this effort, species mismatch among data sources will increasingly be a problem; evolutionary informatics tools such as phyndr and taxonlookup can help alleviate this issue.
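
A hedged usage sketch follows. The function names lookup_table (taxonlookup) and phyndr_taxonomy (phyndr), and their arguments, reflect our reading of the packages' documentation and should be treated as assumptions to verify against the released versions; the input objects and file names are placeholders.

```r
## Hedged sketch of combining taxonlookup and phyndr; function names and
## arguments are our best reading of the packages' docs and may differ --
## verify against the current releases. Inputs are placeholders.
# devtools::install_github("richfitz/phyndr")
# devtools::install_github("traitecoevo/taxonlookup")
library(phyndr)
library(taxonlookup)

# phy:           an ape "phylo" tree of land-plant species (placeholder file name)
# trait_species: character vector of species names for which trait data exist
# phy           <- ape::read.tree("plant_phylogeny.tre")
# trait_species <- rownames(trait_data)

# Build a genus/family/order lookup for all species involved:
# tax <- lookup_table(unique(c(phy$tip.label, trait_species)), by_species = TRUE)

# Swap data-less tips for phylogenetically equivalent data-bearing species:
# phy_matched <- phyndr_taxonomy(phy, trait_species, tax[, c("genus", "family", "order")])
```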


2015
Vol 24 (4)
pp. 467-477
Author(s):
Idris Skloul Ibrahim
Peter J.B. King
Hans-Wolfgang Loidl

Abstract
Ns2 is an open-source communications network simulator primarily used in research and teaching. It provides substantial support for simulation of TCP, routing, and multicast protocols over wired and wireless networks. Although Ns2 is a widely used and powerful simulator, it offers no built-in way to compute the reliability and performance metrics used to assess networks (e.g., the number of packets transferred from source to destination, packet delay, packet loss, etc.), and it does not analyse the trace files it produces. The data obtained from simulations are not straightforward to analyse, and Ns2 itself cannot provide data analysis, statistics or graphics on request. Moreover, analysing an Ns2 trace file with ad hoc scripts requires further data-processing steps by the developer before graphical output can be produced, and the lack of standardised tools means that results from different users may not be strictly comparable. Alternative tools exist; however, most of them are not standalone applications, require additional libraries, and lack a user-friendly interface. This article presents the architecture and development considerations for the NsGTFA (Ns2 GUI Trace File Analyser) tool, which aims to simplify the management of trace files generated during network simulations and to enable their statistical analysis. NsGTFA runs under Windows and has a friendly graphical user interface. It is a fast standalone application implemented in VC++ that takes an Ns2 trace file as input and can output two-dimensional (2D) and 3D graphs (points, lines, and bar charts) or data sets, whatever the trace file format (Tagged, Old, or New). It is also possible to output standard network performance metrics. NsGTFA satisfies most user needs; there is no complex installation process, and no external libraries are needed.
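
To illustrate, outside of NsGTFA, the kind of computation such a tool automates, the R sketch below parses a few fabricated lines in the classic Ns2 wired ("Old") trace layout and derives a packet delivery ratio and mean end-to-end delay; the column layout is assumed from standard Ns2 documentation and should be checked against your own trace files.

```r
## Illustration of the metrics NsGTFA reports, computed directly in R from a
## tiny made-up trace in the classic Ns2 wired ("Old") format. The column
## layout (event, time, from, to, type, size, flags, fid, src, dst, seq, id)
## is assumed from standard Ns2 documentation -- verify for your trace files.
trace_txt <- "
+ 0.10 0 2 cbr 210 ------- 1 0.0 3.0 0 1
r 0.15 0 2 cbr 210 ------- 1 0.0 3.0 0 1
+ 0.20 0 2 cbr 210 ------- 1 0.0 3.0 1 2
d 0.22 0 2 cbr 210 ------- 1 0.0 3.0 1 2
+ 0.30 0 2 cbr 210 ------- 1 0.0 3.0 2 3
r 0.38 0 2 cbr 210 ------- 1 0.0 3.0 2 3
"
cols  <- c("event", "time", "from", "to", "type", "size",
           "flags", "fid", "src", "dst", "seq", "id")
trace <- read.table(text = trace_txt, col.names = cols, stringsAsFactors = FALSE)

sent     <- subset(trace, event == "+")   # packets enqueued at the source
received <- subset(trace, event == "r")   # packets received at the destination

pdr   <- nrow(received) / nrow(sent)      # packet delivery ratio
delay <- merge(sent[, c("id", "time")], received[, c("id", "time")],
               by = "id", suffixes = c("_tx", "_rx"))
mean_delay <- mean(delay$time_rx - delay$time_tx)   # average end-to-end delay (s)
cat(sprintf("PDR = %.2f, mean delay = %.3f s\n", pdr, mean_delay))
```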


PeerJ
2021
Vol 9
pp. e11229
Author(s):
Josymar Torrejón-Magallanes
Enrique Morales-Bojórquez
Francisco Arreguín-Sánchez

Natural mortality (M) is defined as the rate of loss that occurs in a fish stock due to natural (non-fishing) causes and can be influenced by density-dependent or density-independent factors. Different methods have been used to estimate M; one of these is the gnomonic approach. This method estimates M rates by dividing the life cycle of a species into subunits of time that increase as a constant proportion of the time elapsed from birth up to the initiation of each subdivision. In this study, an improved gnomonic approach is proposed to estimate natural mortality throughout different life stages in marine stocks using the gnomonicM package written in R. This package requires data on (i) the number of gnomonic intervals, (ii) egg stage duration, (iii) longevity, and (iv) fecundity. With this information, it is possible to estimate the duration and natural mortality (Mi) of each gnomonic interval. The gnomonicM package uses a deterministic or stochastic approach, the latter of which assesses variability in M by assuming that the mean lifetime fecundity (MLF) is the main source of uncertainty. Additionally, the gnomonicM package allows the incorporation of auxiliary information related to the observed temporal durations of specific gnomonic intervals, which is useful for calibrating estimates of M vectors. The gnomonicM package was tested via its deterministic and stochastic functions by reproducing and verifying results from different reports, supporting its functionality, applicability, and performance in estimating M for different ontogenetic developmental stages. Based on the biological information of Pacific chub mackerel (Scomber japonicus), we present a new case study that provides a comprehensive guide to data collection, explains the details of applying the gnomonicM package, and helps avoid its misuse. This package could provide an alternative approach for estimating M and supply basic input data for ecological models, allowing the use of variable natural mortality estimates across different ages, mainly for life stages affected by fishing. The inputs for the gnomonicM package are numbers, vectors, or characters, depending on whether the deterministic or stochastic approach is used, making the package quick, flexible, and easy to use; this allows users to focus on obtaining and interpreting results rather than on the calculation process.
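
A hedged call sketch is given below. The function name gnomonic and its argument names follow our reading of the package documentation and should be verified against the released version; the numeric inputs are illustrative placeholders, not the Pacific chub mackerel values used in the study, so the call is left commented.

```r
## Hedged sketch of a deterministic gnomonicM run; the function name and
## argument names reflect our reading of the package docs and may differ in
## the released version. Numeric values are illustrative placeholders only.
# install.packages("gnomonicM")
library(gnomonicM)

# fit <- gnomonic(nInterval   = 7,        # (i) number of gnomonic intervals
#                 eggDuration = 2/365,    # (ii) egg stage duration (placeholder, years)
#                 longevity   = 8,        # (iii) longevity (placeholder, years)
#                 fecundity   = 2e5)      # (iv) mean lifetime fecundity (placeholder)
# fit   # inspect the interval durations and the estimated M_i vector
```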

