PhenoMeNal: Processing and analysis of Metabolomics data in the Cloud

2018 ◽  
Author(s):  
Kristian Peters ◽  
James Bradbury ◽  
Sven Bergmann ◽  
Marco Capuccini ◽  
Marta Cascante ◽  
...  

Abstract
Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding, with applications across biomedical, biotechnological and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution.
Findings: The PhenoMeNal (Phenome and Metabolome aNalysis) e-infrastructure provides a complete, workflow-oriented, interoperable metabolomics data analysis solution for a modern infrastructure-as-a-service (IaaS) cloud platform. PhenoMeNal seamlessly integrates a wide array of existing open-source tools, which are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm.
Conclusions: PhenoMeNal constitutes a keystone solution in cloud infrastructures available for metabolomics. It provides scientists with a ready-to-use, workflow-driven, reproducible and shareable data analysis platform, harmonizing software installation and configuration through user-friendly web interfaces. The deployed cloud environments can be dynamically scaled to enable large-scale analyses which are interfaced through standard data formats, versioned, and have been tested for reproducibility and interoperability. The flexible implementation of PhenoMeNal allows easy adaptation of the infrastructure to other application areas and 'omics research domains.
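The abstract names Luigi among the workflow interfaces. As a rough illustration of what a containerized, workflow-oriented analysis step can look like in such a setting, here is a minimal Luigi sketch; the task names, file names and Docker image are hypothetical, not PhenoMeNal's actual pipeline code.

```python
# Minimal Luigi workflow sketch: two chained metabolomics steps.
# Task names, file names and the Docker image are hypothetical.
import subprocess
import luigi

class PickPeaks(luigi.Task):
    """Run a containerized peak-picking tool on one raw mzML file."""
    sample = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"{self.sample}.peaks.csv")

    def run(self):
        # Each step runs inside its own container image.
        subprocess.run(
            ["docker", "run", "--rm", "-v", "/data:/data",
             "example/peak-picker:latest",
             f"/data/{self.sample}.mzML", "-o", self.output().path],
            check=True,
        )

class BuildFeatureMatrix(luigi.Task):
    """Concatenate per-sample peak tables for downstream statistics."""
    samples = luigi.ListParameter()

    def requires(self):
        return [PickPeaks(sample=s) for s in self.samples]

    def output(self):
        return luigi.LocalTarget("feature_matrix.csv")

    def run(self):
        with self.output().open("w") as out:
            for target in self.input():
                with target.open() as f:
                    out.write(f.read())

if __name__ == "__main__":
    luigi.build([BuildFeatureMatrix(samples=["s1", "s2"])],
                local_scheduler=True)
```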

2021 ◽  
Vol 16 (1) ◽  
pp. 21
Author(s):  
Chung-Yi Hou ◽  
Matthew S. Mayernik

For research data repositories, web interfaces are usually the primary, if not the only, method that data users have to interact with repository systems. Data users often search, discover, understand, access, and sometimes use data directly through repository web interfaces. Given that sub-par user interfaces can reduce users' ability to locate, obtain, and use data, it is important to consider how repositories' web interfaces can be evaluated and improved in order to ensure useful and successful user interactions. This paper discusses how usability assessment techniques are being applied to improve the functioning of data repository interfaces at the National Center for Atmospheric Research (NCAR). At NCAR, a new suite of data system tools, collectively called the NCAR Digital Asset Services Hub (DASH), is being developed. Usability evaluation techniques have been used throughout the NCAR DASH design and implementation cycles in order to ensure that the systems work well together for the intended user base. By applying user studies, paper prototyping, competitive analysis, journey mapping, and heuristic evaluation, the NCAR DASH Search and Repository experiences provide examples of how data systems can benefit from usability principles and techniques. Integrating usability principles and techniques into repository system design and implementation workflows helps to optimize the systems' overall user experience.
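Of the techniques listed, heuristic evaluation in particular yields ratings that can be summarized programmatically. A hypothetical sketch follows; the heuristic labels, 0-4 severity scale and values are illustrative assumptions, not NCAR DASH's actual instrument.

```python
# Hypothetical sketch: summarizing heuristic-evaluation severity ratings
# to prioritize interface fixes. Labels, scale and values are assumptions.
import pandas as pd

ratings = pd.DataFrame({
    "heuristic": ["visibility of system status", "visibility of system status",
                  "error prevention", "error prevention"],
    "evaluator": ["A", "B", "A", "B"],
    "severity": [2, 3, 4, 3],  # 0 = no problem ... 4 = usability catastrophe
})

# Rank heuristics by mean severity across evaluators.
summary = (ratings.groupby("heuristic")["severity"]
           .agg(["mean", "max", "count"])
           .sort_values("mean", ascending=False))
print(summary)
```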


2013 ◽  
Vol 74 (2) ◽  
pp. 195-207 ◽  
Author(s):  
Jingfeng Xia ◽  
Ying Liu

This paper uses the Gene Expression Omnibus (GEO), a data repository in the biomedical sciences, to examine the usage patterns of open data repositories. It attempts to identify the degree of recognition of data reuse value and to understand how e-science has impacted large-scale scholarship. By analyzing a list of 1,211 publications that cite GEO data to support their independent studies, it finds that free data can support a wealth of high-quality investigations, that the rate of open data use has kept growing over the years, and that scholars in different countries comply with data-sharing policies at different rates.
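For readers who want to replicate this kind of citation analysis, a minimal pandas sketch follows; the CSV file and its columns are hypothetical stand-ins for the 1,211-publication list.

```python
# Illustrative sketch (hypothetical data file): tallying publications
# that cite GEO datasets to observe the growth of open data reuse.
import pandas as pd

pubs = pd.read_csv("geo_citing_publications.csv")  # year, country, complied

per_year = pubs.groupby("year").size()  # reuse volume over time
compliance = (pubs.groupby("country")["complied"]
              .mean()  # share of publications following data-sharing policies
              .sort_values(ascending=False))
print(per_year)
print(compliance)
```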


Metabolites ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 568
Author(s):  
Brechtje Hoegen ◽  
Alan Zammit ◽  
Albert Gerritsen ◽  
Udo F. H. Engelke ◽  
Steven Castelein ◽  
...  

Inborn errors of metabolism (IEM) are inherited conditions caused by genetic defects in enzymes or cofactors. These defects result in a specific metabolic fingerprint in patient body fluids, showing accumulation of the substrate or lack of the end-product of the defective enzymatic step. Untargeted metabolomics has evolved into a high-throughput methodology offering a comprehensive readout of this metabolic fingerprint, which makes it a promising tool for the diagnostic screening of IEM patients. However, the size and complexity of metabolomics data have posed a challenge in translating this avalanche of information into knowledge, particularly for clinical application. We have previously established next-generation metabolic screening (NGMS) as a metabolomics-based diagnostic tool for analyzing the plasma of individual IEM-suspected patients. To fully exploit the clinical potential of NGMS, we present a computational pipeline to streamline the analysis of untargeted metabolomics data. This pipeline allows for time-efficient and reproducible data analysis, compatible with ISO 15189-accredited clinical diagnostics. The pipeline implements a combination of tools embedded in a workflow environment for large-scale clinical metabolomics data analysis. The accompanying graphical user interface aids end-users in a diagnostic laboratory in efficient data interpretation and reporting. We also demonstrate the application of this pipeline with a case study and discuss future prospects.
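The pipeline itself is not reproduced in the abstract, but the core screening comparison, flagging metabolite features in one patient that deviate from a control cohort, can be sketched as follows; the file layout and the |z| > 3 cutoff are illustrative assumptions, not the accredited NGMS procedure.

```python
# Sketch of one NGMS-style screening step: flag metabolite features in a
# patient sample that deviate strongly from a control cohort.
# File layout and the |z| > 3 cutoff are illustrative assumptions.
import pandas as pd

# Rows = metabolite features; columns = control samples.
controls = pd.read_csv("controls_feature_matrix.csv", index_col="feature")
patient = pd.read_csv("patient_features.csv", index_col="feature")["intensity"]

mu = controls.mean(axis=1)
sigma = controls.std(axis=1)
z = (patient - mu) / sigma

# Candidate biomarkers: features far outside the control distribution.
flagged = z[z.abs() > 3].sort_values(key=abs, ascending=False)
print(flagged.head(20))
```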


2021 ◽  
Vol 21 (suppl 2) ◽  
pp. 429-435
Author(s):  
Ana Nery Melo Cavalcante ◽  
Lohanna Valeska de Sousa Tavares ◽  
Maria Luiza Almeida Bastos ◽  
Rosa Lívia Freitas de Almeida

Abstract
Objectives: to describe the clinical-epidemiological profile of children and adolescents reported with COVID-19 in Ceará.
Methods: descriptive epidemiological study based on open data repositories of the State Government of Ceará, covering cases of COVID-19 in children and adolescents from 03/15/2020 to 07/31/2020. Pearson's χ2 test, Fisher's exact test and Poisson regression with robust variance were used for data analysis.
Results: 48,002 suspected cases of COVID-19 in children and adolescents were reported, of which 18,180 (8.9%) were confirmed. The median age of confirmed cases was 12 years; 10.5% were newborns/infants, 10.7% were pre-school children, 21.2% were school-age children and 57.7% were adolescents. Death occurred in 0.3% of cases, of whom 15% had comorbidities. Hospitalization was required in 1.8% of cases. The highest probability of hospitalization was found in newborns/infants, males, and those with comorbidities.
Conclusions: most of the confirmed cases occurred in adolescents; however, the disease course was more severe, with a greater need for hospitalization, in the newborn/infant age group, with male sex and the presence of comorbidities being additional factors associated with the need for hospitalization.
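The Poisson regression with robust variance that the authors report can be expressed generically with statsmodels; the data frame and column names below are assumptions for illustration.

```python
# Sketch of Poisson regression with robust (sandwich) variance, the
# estimator reported for hospitalization risk. Column names are assumed.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("covid_cases.csv")  # one row per confirmed case

model = smf.glm(
    "hospitalized ~ age_group + sex + comorbidity",
    data=df,
    family=sm.families.Poisson(),
).fit(cov_type="HC1")  # robust variance for valid prevalence-ratio CIs

print(np.exp(model.params))  # exponentiated coefficients = prevalence ratios
```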


2017 ◽  
Vol 12 (2) ◽  
pp. 177-195 ◽  
Author(s):  
Alastair Dunning ◽  
Madeleine De Smaele ◽  
Jasmin Böhmer

This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles. Simultaneously, it analyses how easy it is for data archives to adhere to the principles. The research took place from November 2016 to January 2017 and will be underpinned by feedback from the repositories. The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR: Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world. The European Commission has recently expanded its demand for research to produce open data, and the relevant guidelines (1) are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will be exposed to the guidelines, understanding their viability and suggesting where there may be room for modification and adjustment is of vital importance. This practice paper is connected to a dataset (Dunning et al., 2017) containing the original overview of the sample group statistics and graphs in an Excel spreadsheet. Over the course of two months, the web interfaces, help pages and metadata records of over 40 data repositories were examined to score each repository against the FAIR principles and facets. The traffic-light rating system enables colour-coding according to compliance and vagueness. The statistical analysis provides overall results, categorised results, and results focused on individual principles and facets. The analysis includes the statistical and descriptive evaluation, followed by elaborations on elements of the FAIR Data Principles, subject-specific and repository-specific differences, and what repositories can do to improve their information architecture.
(1) H2020 Guidelines on FAIR Data Management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
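A minimal sketch of the traffic-light scoring the paper describes follows; the facet labels, three-level scale and numeric weights are illustrative assumptions, not the authors' actual rubric.

```python
# Hypothetical sketch of traffic-light FAIR scoring: each repository is
# rated per facet, then compliance is summarized. Labels, scale and
# weights are assumptions.
import pandas as pd

scores = pd.DataFrame(
    {"F1": ["green", "amber", "green"],
     "A1": ["amber", "red", "green"],
     "I1": ["red", "red", "amber"]},
    index=["repo_a", "repo_b", "repo_c"],
)

weights = {"green": 1.0, "amber": 0.5, "red": 0.0}
numeric = scores.replace(weights)

print(numeric.mean(axis=1))  # per-repository compliance
print(numeric.mean(axis=0))  # per-facet compliance across repositories
```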


2019 ◽  
Vol 35 (19) ◽  
pp. 3752-3760 ◽  
Author(s):  
Payam Emami Khoonsari ◽  
Pablo Moreno ◽  
Sven Bergmann ◽  
Joachim Burman ◽  
Marco Capuccini ◽  
...  

Abstract
Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.
Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and the development of scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We show that the method scales dynamically with increasing availability of computational resources, and we demonstrate that it facilitates interoperability by integrating the major software suites into a turn-key workflow encompassing all steps of mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices constitute a generic methodology that can serve any scientific discipline and open up new types of large-scale integrative science.
Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.
Supplementary information: Supplementary data are available at Bioinformatics online.
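The dynamic scaling described above rests on Kubernetes; a minimal sketch with the official Kubernetes Python client shows the kind of call involved. The deployment name and namespace are hypothetical, not the VRE's actual resources.

```python
# Minimal sketch: scaling a containerized analysis tool on Kubernetes
# with the official Python client. Deployment name and namespace are
# hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() in-cluster
apps = client.AppsV1Api()

# Add worker replicas as more computational resources become available.
apps.patch_namespaced_deployment_scale(
    name="metabolomics-worker",
    namespace="vre",
    body={"spec": {"replicas": 8}},
)
```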


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Jason P Smith ◽  
M Ryan Corces ◽  
Jin Xu ◽  
Vincent P Reuter ◽  
Howard Y Chang ◽  
...  

Abstract
As chromatin accessibility data from ATAC-seq experiments continue to expand, there is a continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats that set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. The pipeline is restartable and fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in the provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. BSD2-licensed code and documentation are available at https://pepatac.databio.org.
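The standard definition format referenced here is PEP (Portable Encapsulated Projects), whose Python metadata API is the peppy package; a minimal sketch follows, assuming a hypothetical project config and standard sample-table columns.

```python
# Minimal sketch of the PEP metadata API in Python (the peppy package).
# The config file name and sample attributes are hypothetical.
import peppy

prj = peppy.Project("atacseq_project_config.yaml")
for sample in prj.samples:
    # Attributes come from the project's sample table; adjust to its schema.
    print(sample.sample_name, sample.protocol)
```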


Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples, as well as systematic study-level missing data, are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on their experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, the authors also recognize that IDA investigations require a wide range of expertise and considerable resources, and that some minimum standards for reporting IDA studies may be needed to improve the transparency and quality of evidence.
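One common modelling choice in IDA, pooling individual participant-level data while letting intercepts vary by study to absorb between-study heterogeneity, can be sketched with statsmodels; the data file and variable names are illustrative assumptions, not the Project INTEGRATE analysis.

```python
# Sketch of an IDA-style pooled analysis: individual participant-level
# data from many studies, with a random intercept per study to absorb
# between-study heterogeneity. File and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pooled_participants.csv")  # one row per participant

model = smf.mixedlm(
    "drinks_per_week ~ intervention + age + sex",
    data=df,
    groups=df["study_id"],  # random intercept for each contributing study
).fit()
print(model.summary())
```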


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1670
Author(s):  
Waheeb Abu-Ulbeh ◽  
Maryam Altalhi ◽  
Laith Abualigah ◽  
Abdulwahab Ali Almazroi ◽  
Putra Sumari ◽  
...  

Cyberstalking is a growing anti-social problem that is being transformed on a large scale and into various forms. Cyberstalking detection has become increasingly popular in recent years and has been investigated technically by many researchers. However, cyberstalking victimization, an essential part of cyberstalking, has received less empirical attention from the research community. This paper attempts to address this gap and develops a model to understand and estimate the prevalence of cyberstalking victimization. The model is built on routine activity and lifestyle exposure theories and includes eight hypotheses. Data were collected from 757 respondents at Jordanian universities. The paper takes a quantitative approach and uses structural equation modeling for data analysis. The results revealed a modest prevalence range that depends on the type of cyberstalking. The results also indicated that proximity to motivated offenders, suitable targets, and digital guardians significantly influence cyberstalking victimization. The outcome of the moderation hypothesis testing demonstrated that age and residence have a significant effect on cyberstalking victimization. The proposed model is an essential element for assessing cyberstalking victimization among societies and provides a valuable understanding of its prevalence. It can assist researchers and practitioners in future research on cyberstalking victimization.
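A structural equation model of this shape can be expressed with, for example, the semopy package in Python; the latent constructs and indicator names below are hypothetical stand-ins for the survey items, not the authors' instrument.

```python
# Illustrative SEM sketch in the spirit of the paper's model, using the
# semopy package with lavaan-style syntax. Latent constructs and
# indicator names are hypothetical stand-ins for the survey items.
import pandas as pd
from semopy import Model

desc = """
Proximity =~ prox1 + prox2 + prox3
Exposure =~ expo1 + expo2 + expo3
Victimization =~ vict1 + vict2 + vict3
Victimization ~ Proximity + Exposure
"""

data = pd.read_csv("survey_responses.csv")
model = Model(desc)
model.fit(data)
print(model.inspect())  # path estimates, standard errors, p-values
```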

