First experiences with a portable analysis infrastructure for LHC at INFN

2021 ◽  
Vol 251 ◽  
pp. 02045
Author(s):  
Diego Ciangottini ◽  
Tommaso Boccali ◽  
Andrea Ceccanti ◽  
Daniele Spiga ◽  
Davide Salomoni ◽  
...  

The challenges posed by the HL-LHC era are not limited to the sheer amount of data to be processed: optimizing the analysis user experience will also bring important benefits for the LHC communities, in terms of total resource needs, user satisfaction and reduced time to publication. At the Italian National Institute for Nuclear Physics (INFN) a portable software stack for analysis has been proposed, based on cloud-native tools and capable of providing users with a fully integrated analysis environment for the CMS experiment. The main characterizing traits of the solution are its user-driven design and its portability to any cloud resource provider. All this is made possible by an evolution towards a Python-based framework that enables the use of a set of open-source technologies widely adopted in both cloud-native and data-science environments. In addition, a "single sign-on"-like experience is available thanks to the standards-based integration of INDIGO-IAM with all the tools. Compute resources are integrated through a customized JupyterHub deployment, able to spawn identity-aware user instances that are ready to access data with no further setup actions. GPU resources are also integrated, designed to sustain increasingly widespread machine-learning-based workflows. Seamless connections between the user interface and batch/big-data processing frameworks (Spark, HTCondor) are possible. Finally, experiment data access latency is reduced thanks to the integrated deployment of a scalable set of caches, developed in the context of the ESCAPE project and therefore compatible with future scenarios in which a data lake will be available to the research community. The outcome of evaluating this solution in action is presented, showing how a real CMS analysis workflow can use the infrastructure to achieve its results.
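To make the "identity-aware spawning" idea concrete, here is a minimal JupyterHub configuration sketch, assuming the standard oauthenticator and kubespawner packages; it is not the INFN deployment, and every URL, client ID and image name is a hypothetical placeholder:

```python
# jupyterhub_config.py -- minimal sketch, NOT the actual INFN configuration.
# Assumes the oauthenticator and kubespawner packages; all URLs, client IDs
# and image names are hypothetical placeholders.
from oauthenticator.generic import GenericOAuthenticator

c.JupyterHub.authenticator_class = GenericOAuthenticator

# Standards-based (OpenID Connect) login against an INDIGO-IAM instance,
# giving the "single sign-on"-like experience described in the abstract.
c.GenericOAuthenticator.client_id = "analysis-hub"        # hypothetical
c.GenericOAuthenticator.client_secret = "CHANGE-ME"       # hypothetical
c.GenericOAuthenticator.authorize_url = "https://iam.example.test/authorize"
c.GenericOAuthenticator.token_url = "https://iam.example.test/token"
c.GenericOAuthenticator.userdata_url = "https://iam.example.test/userinfo"
c.GenericOAuthenticator.scope = ["openid", "profile", "email"]

# Persist auth state so the user's token can follow them into the notebook
# (a real deployment also needs JUPYTERHUB_CRYPT_KEY to be set).
c.Authenticator.enable_auth_state = True

def forward_iam_token(spawner, auth_state):
    """Inject the user's IAM access token into the single-user instance,
    so data access (e.g. through a cache layer) needs no extra setup."""
    if auth_state:
        spawner.environment["IAM_ACCESS_TOKEN"] = auth_state["access_token"]

c.Spawner.auth_state_hook = forward_iam_token

# Spawn identity-aware single-user instances on Kubernetes; a GPU request
# sketches the ML-oriented resource integration mentioned above.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
c.KubeSpawner.image = "example/cms-analysis-notebook:latest"  # hypothetical
c.KubeSpawner.extra_resource_limits = {"nvidia.com/gpu": "1"}
```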

2017 ◽  
Author(s):  
Alessandro Rigano ◽  
Caterina Strambio-De-Castillia

Abstract The proposed Minimum Information About Particle Tracking Experiments (MIAPTE) reporting guidelines described here aim to deliver a set of rules representing the minimal information required to report, support the interpretation of, and assess data arising from intracellular multiple particle tracking (MPT) experiments. Examples of such experiments are those tracking viral particles as they move from the site of entry to the site of replication within an infected cell, or those following vesicular dynamics during secretion, endocytosis, or exocytosis. By promoting the development of community standards, we hope that MIAPTE will contribute to making MPT data FAIR (Findable, Accessible, Interoperable and Reusable). Ultimately, the goal of MIAPTE is to promote and maximize data access, discovery, preservation, re-use, and repurposing through efficient annotation, and thereby to enable the reproducibility of particle tracking experiments. This document introduces MIAPTE v0.2, which updates the version posted to Fairsharing.org in October 2016. MIAPTE v0.2 is presented with the specific intent of soliciting comments from the particle tracking community, in order to extend and improve the model. The MIAPTE guidelines are intended for different categories of users: 1) scientists who want to make new results available in a way that can be interpreted unequivocally by both humans and machines; for this class of users, MIAPTE provides data descriptors that define data entry terms and the analysis workflow in a unified manner. 2) Scientists wishing to evaluate, replicate and re-analyze results published by others; for this class of users, MIAPTE provides descriptors that define the analysis procedures in a manner that facilitates their reproduction. 3) Developers who want to take advantage of the MIAPTE schema to produce MIAPTE-compatible tools. MIAPTE consists of a list of controlled vocabulary (CV) terms that describe elements and properties for the minimal description of particle tracking experiments, with a focus on viral and vesicular traffic within cells. As part of this submission we provide entity relationship (ER) diagrams that show the relationships between terms. Finally, we also provide documents containing the MIAPTE-compliant XML schema describing the data model used by Open Microscopy Environment inteGrated Analysis (OMEGA), our novel particle tracking data analysis and management tool, which is reported in a separate manuscript. MIAPTE is structured in two sub-sections: 1) Section 1 contains elements, attributes and data structures describing the results of particle tracking, namely particles, links, trajectories and trajectory segments. 2) Section 2 contains elements that provide details about the algorithmic procedures used to produce and analyze trajectories, as well as the results of trajectory analysis. In addition, MIAPTE includes those OME-XML elements that are required to capture the acquisition parameters and the structure of the images to be subjected to particle tracking.
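The Section 1 entity hierarchy (particles, links, trajectories, trajectory segments) can be sketched as plain data structures. The following Python sketch uses illustrative field names, not the actual MIAPTE controlled vocabulary or XML schema:

```python
# Minimal sketch of the Section 1 hierarchy described above
# (particles -> links -> segments -> trajectories). Field names are
# illustrative assumptions, NOT the actual MIAPTE CV terms.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Particle:
    particle_id: int
    frame: int      # time point (image frame) of the detection
    x: float        # position, in pixels or calibrated units
    y: float

@dataclass
class Link:
    source: Particle    # particle detected at frame t
    target: Particle    # matched particle at frame t + 1

@dataclass
class TrajectorySegment:
    segment_id: int
    links: List[Link] = field(default_factory=list)

@dataclass
class Trajectory:
    trajectory_id: int
    segments: List[TrajectorySegment] = field(default_factory=list)

    def particles(self) -> List[Particle]:
        """Flatten the trajectory back into its ordered detections."""
        ordered: List[Particle] = []
        for seg in self.segments:
            for link in seg.links:
                if not ordered or ordered[-1] is not link.source:
                    ordered.append(link.source)
                ordered.append(link.target)
        return ordered
```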


2021 ◽  
Vol 11 (7) ◽  
pp. 3012
Author(s):  
Muhammad Iftikhar Hussain ◽  
Jingsha He ◽  
Nafei Zhu ◽  
Fahad Sabah ◽  
Zulfiqar Ali Zardari ◽  
...  

In the modern digital era, everyone is partially or fully integrated with cloud computing to access numerous cloud models, services, and applications. Multi-cloud blends well-known cloud models under a single umbrella to accommodate requirements of a distinct nature and realm under one service level agreement (SLA). In the current cloud era, as the flood of services, applications, and data access over the Internet rises, concern about the confidentiality of end users' credentials is rising to an alarming level. Users typically need to authenticate multiple times to gain authority and access the desired services or applications. In this research, we propose a secure scheme to mitigate the multiple authentications usually required of a particular user. In the proposed model, a federated trust is created between two different domains: consumer and provider. All traffic coming towards the service provider is divided into three phases based on the risk attached to the concerned user's data. Single sign-on (SSO) and multifactor authentication (MFA) are deployed to provide authentication, authorization, accountability, and availability (AAAA) and to ensure the security and confidentiality of end users' credentials. The proposed solution exploits the finding that MFA achieves a better AAAA pattern than SSO.
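The three-phase, risk-based split can be sketched as a simple step-up authentication policy. The phase boundaries, scores and factor names below are illustrative assumptions; the paper's actual criteria are not reproduced here:

```python
# Minimal sketch of risk-phased SSO/MFA step-up, assuming a numeric
# data-risk score in [0, 1]. Thresholds and factor names are invented
# for illustration only.
from enum import Enum

class RiskPhase(Enum):
    LOW = 1      # low-sensitivity data: federated SSO alone
    MEDIUM = 2   # SSO plus one extra factor
    HIGH = 3     # SSO plus full multifactor authentication

def classify_request(data_risk: float) -> RiskPhase:
    """Map a request's data-risk score onto one of the three phases."""
    if data_risk < 0.3:
        return RiskPhase.LOW
    if data_risk < 0.7:
        return RiskPhase.MEDIUM
    return RiskPhase.HIGH

def required_factors(phase: RiskPhase) -> list:
    """Factors the provider demands before honouring the federated trust."""
    base = ["sso_token"]                       # single sign-on for everyone
    if phase is RiskPhase.MEDIUM:
        return base + ["otp"]                  # one-time-password step-up
    if phase is RiskPhase.HIGH:
        return base + ["otp", "hardware_key"]  # full MFA
    return base

if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        phase = classify_request(score)
        print(score, phase.name, required_factors(phase))
```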


Author(s):  
Rogério Aparecido Sá Ramalho ◽  
Ricardo César Gonçalves Sant'Ana ◽  
Francisco Carlos Paletta

The acceleration of the development of digital technologies and the increasingly capillary reach of their effects present new challenges to practices related to the treatment and flow of information, which are objects of study of information science. This chapter is based on a theoretical study that analyzes the contributions of information science in the data science era, using the Cynefin Framework to examine the new contemporary informational demands generated by the increasing predominance of data access and use. In order to establish the relationship between the skills expected of information science professionals and access to data, the Cynefin Framework was used as a basis for analyzing the skills involved in each phase of the data life cycle.


Author(s):  
Tavinder Kaur Ark ◽  
Sarah Kesselring ◽  
Brent Hills ◽  
Kim McGrail

Background Population Data BC (PopData) was established as a multi-university data and education resource to support training and education, data linkage, and access to individual-level, de-identified data for research in a wide variety of areas, including human and community development and well-being. Approach A combination of deterministic and probabilistic linkage is conducted based on the quality and availability of identifiers for data linkage. PopData utilizes a harmonized data request and approval process for data stewards and researchers to increase efficiency and ease of access to linked data. Researchers access linked data through a secure research environment (SRE) that is equipped with a wide variety of tools for analysis. The SRE also allows for ongoing management and control of data. PopData continues to expand its data holdings and to evolve its services as well as its governance and data access processes. Discussion PopData has provided efficient and cost-effective access to linked data sets for research. After two decades of learning, future planned developments for the organization include, but are not limited to, policies to facilitate programs of research, access to reusable datasets, and evaluation and use of new data linkage techniques such as privacy-preserving record linkage (PPRL). Conclusion PopData continues to maintain and grow the number and type of data holdings available for research. Its existing models support a number of large-scale research projects and demonstrate the benefits of having a third-party data linkage and provisioning center for research purposes. Building further connections with existing data holders and governing bodies will be important to ensure ongoing access to data and that policy changes exist to facilitate access for researchers.
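The "deterministic first, probabilistic fallback" strategy mentioned in the Approach can be sketched as follows. The identifiers, weights and threshold are illustrative assumptions in the spirit of Fellegi-Sunter scoring, not PopData's actual linkage rules:

```python
# Minimal sketch of combined deterministic/probabilistic record linkage.
# Field names, weights and the match threshold are invented for
# illustration; they are not PopData's production configuration.
from difflib import SequenceMatcher

def deterministic_match(a: dict, b: dict) -> bool:
    """Exact agreement on a strong identifier links the records outright."""
    return bool(a.get("health_number")) and a["health_number"] == b["health_number"]

def probabilistic_score(a: dict, b: dict) -> float:
    """Fellegi-Sunter-flavoured score: weighted (dis)agreement over
    weaker identifiers, using string similarity as the comparator."""
    weights = {"surname": 4.0, "birth_date": 5.0, "postal_code": 2.0}
    score = 0.0
    for name, weight in weights.items():
        sim = SequenceMatcher(None, a.get(name, ""), b.get(name, "")).ratio()
        score += weight * sim if sim > 0.85 else -weight * (1 - sim)
    return score

def link(a: dict, b: dict, threshold: float = 6.0) -> bool:
    """Link when the deterministic rule fires, else fall back to scoring."""
    return deterministic_match(a, b) or probabilistic_score(a, b) >= threshold
```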


2015 ◽  
Vol 14s5 ◽  
pp. CIN.S30793 ◽  
Author(s):  
Jian Li ◽  
Aarif Mohamed Nazeer Batcha ◽  
Björn Gaining ◽  
Ulrich R. Mansmann

Next-generation sequencing (NGS) technologies, which have advanced rapidly in the past few years, possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionizing oncology. With the help of NGS, we can draw a finer map of the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS faces several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of a semiautomated and integrated analysis workflow. In order to address these challenges, we conducted a literature review and summarized a four-stage NGS workflow, providing a systematic review of NGS-based analysis, explaining the strengths and weaknesses of diverse NGS-based software tools, and elucidating their potential connection to individualized medicine. By presenting this four-stage NGS workflow, we aim to provide the minimal structural layout required for NGS data storage and reproducibility.
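A four-stage workflow of this kind can be sketched as a chain of stages whose outputs feed the next stage. The abstract does not name the four stages, so the decomposition below (quality control, alignment, variant calling, annotation) is a conventional assumption, and the function bodies are placeholders rather than real tool invocations:

```python
# Minimal sketch of a four-stage NGS workflow; stage names are a
# conventional assumption, not the paper's exact decomposition, and
# each body is a placeholder for a real tool invocation.
from pathlib import Path

def stage1_quality_control(raw_reads: Path) -> Path:
    """Filter and trim raw reads; record per-base quality metrics."""
    ...

def stage2_alignment(clean_reads: Path, reference: Path) -> Path:
    """Map reads to the reference genome, producing sorted alignments."""
    ...

def stage3_variant_calling(alignments: Path) -> Path:
    """Call SNVs/indels from the alignments, producing a variant file."""
    ...

def stage4_annotation(variants: Path) -> Path:
    """Annotate variants with functional and clinical information."""
    ...

def run_workflow(raw_reads: Path, reference: Path) -> Path:
    """Chain the four stages; persisting each intermediate output is the
    minimal structural layout needed for storage and reproducibility."""
    clean = stage1_quality_control(raw_reads)
    aligned = stage2_alignment(clean, reference)
    variants = stage3_variant_calling(aligned)
    return stage4_annotation(variants)
```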


Metabolomics ◽  
2019 ◽  
Vol 15 (10) ◽  
Author(s):  
Kevin M. Mendez ◽  
Leighton Pritchard ◽  
Stacey N. Reinke ◽  
David I. Broadhurst

Abstract Background A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike. Aim of Review To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. Key Scientific Concepts of Review This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.
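As a small companion to the Jupyter/GitHub/Binder workflow described above, the helper below builds a mybinder.org launch link for a notebook kept in a GitHub repository; the repository and notebook names are hypothetical placeholders:

```python
# Minimal sketch: constructing a Binder launch URL for a notebook in a
# GitHub repository. Repository owner, name and notebook path are
# hypothetical placeholders.
from urllib.parse import quote

def binder_url(owner: str, repo: str, notebook: str, ref: str = "HEAD") -> str:
    """Return a mybinder.org link that builds the repo's environment and
    opens the given notebook in the browser."""
    return (f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"
            f"?labpath={quote(notebook)}")

print(binder_url("example-lab", "metabolomics-tutorial", "analysis.ipynb"))
```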


Author(s):  
Dirk Lewandowski

Web search engines apply a variety of ranking signals to achieve user satisfaction, i.e., results pages that provide the best possible results for the user. While these ranking signals implicitly consider credibility (e.g., by measuring popularity), explicit measures of credibility are not applied. In this chapter, credibility in Web search engines is discussed in a broad context: credibility as a measure for including documents in a search engine's index, credibility as a ranking signal, credibility in the context of universal search results, and the possibility of using credibility as an explicit measure for ranking purposes. It is found that while search engines show credible results to their users, at least to a certain extent, there is no fully integrated credibility framework for Web search engines.
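The chapter's closing idea, credibility as an explicit ranking signal rather than an implicit by-product of popularity, can be sketched as a weighted blend of scores. The weights and example scores below are illustrative assumptions only:

```python
# Minimal sketch of folding an explicit credibility score into ranking
# alongside conventional signals. Weights and document scores are
# illustrative assumptions, not any engine's actual formula.
def ranking_score(relevance: float, popularity: float, credibility: float,
                  w_rel: float = 0.6, w_pop: float = 0.2,
                  w_cred: float = 0.2) -> float:
    """Linear blend; today's engines capture credibility only implicitly
    (e.g. via popularity), so w_cred would be the new, explicit term."""
    return w_rel * relevance + w_pop * popularity + w_cred * credibility

# (relevance, popularity, credibility) per document, all in [0, 1]
docs = {"doc_a": (0.9, 0.8, 0.3), "doc_b": (0.8, 0.4, 0.9)}
ranked = sorted(docs, key=lambda d: ranking_score(*docs[d]), reverse=True)
print(ranked)  # an explicit credibility term can reorder similar results
```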


Author(s):  
Kimberlyn McGrail ◽  
Brent Diverty ◽  
Lisa Lix

Introduction Notwithstanding Canada's exceptional longitudinal health data and research centres with extensive experience transforming data into knowledge, many Canadian studies based on linked administrative data have focused on a single province or territory. Health Data Research Network Canada (HDRN Canada), a new not-for-profit corporation, will bring together major national, provincial and territorial health data stewards from across Canada. HDRN Canada's first initiative is the $81 million SPOR Canadian Data Platform funded under the Canadian Institutes of Health Research Strategy for Patient-Oriented Research (SPOR). Objectives and Approach HDRN Canada is a distributed network through which individual data-holding centres work together to (i) create a single portal and support system for researchers requesting multi-jurisdictional data, (ii) harmonize and validate case definitions and key analytic variables across jurisdictions, (iii) expand the sources and types of data linkages, (iv) develop technological infrastructure to improve data access and collection, (v) create supports for advanced analytics and (vi) establish strong partnerships with patients, the public and Indigenous communities. We will share our experiences and gather international feedback on our network and its goals from symposium participants. Results In January 2020, HDRN Canada launched its Data Access Support Hub (DASH), which includes an inventory listing over 380 datasets, information about more than 120 algorithms and a repository of requirements and processes for accessing data. HDRN Canada is receiving requests for multi-province research studies that would be challenging to conduct without HDRN Canada. Conclusion / Implications Thus far, HDRN Canada services and tools have been developed primarily for Canadian researchers, but HDRN Canada can also serve as a prompt for an international discussion about what has and has not worked in terms of multi-jurisdictional research data infrastructure. It can also present an opportunity for the development of metadata, standards and common approaches that support more multi-country research.

