data formats
Recently Published Documents





2022 ◽  
Vol 22 (1) ◽  
pp. 1-35
Muhammad Junaid ◽  
Adnan Sohail ◽  
Fadi Al Turjman ◽  
Rashid Ali

Over the years, cloud computing has seen significant evolution in terms of improved infrastructure and resource provisioning. However, the continuous emergence of new applications, such as the Internet of Things (IoT), with thousands of users puts a significant load on cloud infrastructure. Load balancing of resource allocation in cloud-oriented IoT is a critical factor with a significant impact on the smooth operation of cloud services and customer satisfaction. Several load balancing strategies for cloud environments have been proposed in the past. However, the existing approaches mostly consider only a few parameters and ignore many critical factors with a pivotal role in load balancing, leading to less optimized resource allocation. Load balancing is a challenging problem, and the research community has therefore recently focused on machine learning-based metaheuristic approaches for load balancing in the cloud. In this paper, we propose a metaheuristics-based scheme, Data Format Classification using Support Vector Machine (DFC-SVM), to deal with the load balancing problem. The proposed scheme aims to reduce online load balancing complexity through offline pre-classification of raw data from diverse sources (such as IoT) into different formats, e.g., text, images, and media. An SVM is utilized to classify "n" types of data formats, including audio, video, text, digital images, and maps. A one-to-many classification approach has been developed so that data formats from the cloud are initially classified into their respective classes and assigned to virtual machines through the proposed modified version of Particle Swarm Optimization (PSO), which schedules the data of a particular class efficiently. The experimental results, compared with the baselines, show a significant improvement in the performance of the proposed approach.
Overall, an average classification accuracy of 94% is achieved, along with 11.82% less energy consumption, a 16% shorter response time, and 16.08% fewer SLA violations.
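The two-stage idea described above can be sketched minimally: an offline pre-classification of raw items into format classes, followed by per-class assignment to virtual machines. In this sketch, a simple extension-based classifier stands in for the trained SVM, and a greedy least-loaded assignment stands in for the modified PSO scheduler; both stand-ins are simplifying assumptions, not the paper's actual algorithms.

```python
# Illustrative sketch of DFC-SVM's two stages (classifier and scheduler
# are simplified stand-ins, not the paper's methods).

FORMAT_CLASSES = {
    ".txt": "text", ".csv": "text",
    ".jpg": "image", ".png": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".avi": "video",
}

def classify(filename):
    """Offline pre-classification: map a raw data item to a format class."""
    for ext, cls in FORMAT_CLASSES.items():
        if filename.lower().endswith(ext):
            return cls
    return "other"

def assign(items, vm_loads):
    """Assign each classified item to the currently least-loaded VM."""
    placement = {}
    for name, size in items:
        vm = min(vm_loads, key=vm_loads.get)  # greedy stand-in for PSO
        vm_loads[vm] += size
        placement[name] = (classify(name), vm)
    return placement

vms = {"vm1": 0, "vm2": 0}
plan = assign([("a.jpg", 5), ("b.txt", 2), ("c.mp4", 9)], vms)
```

Here the classification result travels with each item, so a per-class scheduler can apply class-specific policies (e.g., different VM pools for video versus text).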

2022 ◽  
Vol 15 ◽  
Margarita Ruiz-Olazar ◽  
Evandro Santos Rocha ◽  
Claudia D. Vargas ◽  
Kelly Rosa Braghetto

Computational tools can transform the manner in which neuroscientists perform their experiments. More than helping researchers to manage the complexity of experimental data, these tools can increase the value of experiments by enabling reproducibility and supporting the sharing and reuse of data. Despite the remarkable advances made in the Neuroinformatics field in recent years, there is still a lack of open-source computational tools to cope with the heterogeneity and volume of neuroscientific data and the related metadata that need to be collected during an experiment and stored for later analysis. In this work, we present the Neuroscience Experiments System (NES), free software that assists researchers in the data collection routines of clinical, electrophysiological, and behavioral experiments. NES enables researchers to efficiently manage their experimental data in a secure and user-friendly environment, providing a unified repository for the experimental data of an entire research group. Furthermore, its modular software architecture is aligned with several initiatives of the neuroscience community and promotes standardized data formats for experiments and analysis reporting.

2022 ◽  
pp. 1-19
Zuleyha Akusta Dagdeviren

The Internet of Things (IoT) has attracted researchers in recent years, as it has great potential to solve many emerging problems. An IoT platform is missioned to operate as a horizontal key element serving various vertical IoT domains such as structure monitoring, smart agriculture, healthcare, miner safety monitoring, and smart home. In this chapter, the authors propose a comprehensive analysis of IoT platforms to evaluate their capabilities. The selected metrics (features) for investigating the IoT platforms are "ability to serve different domains," "ability to handle different data formats," "ability to process unlimited sizes of data from various contexts," "ability to convert unstructured data to structured data," and "ability to produce complex reports." These metrics are chosen by considering the reporting capabilities of various IoT platforms, big data concepts, and domain-related issues. The authors provide a detailed comparison derived from the metric analysis to show the advantages and drawbacks of IoT platforms.
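The metric analysis described above can be sketched as a simple feature-coverage score, checking each platform against the five selected metrics. The platform names and capability sets below are hypothetical placeholders, not the chapter's actual findings.

```python
# Hypothetical feature-coverage sketch of the chapter's metric analysis.

METRICS = [
    "multi_domain", "multi_format", "unlimited_data",
    "unstructured_to_structured", "complex_reports",
]

PLATFORMS = {  # hypothetical capability matrices
    "PlatformA": {"multi_domain", "multi_format", "complex_reports"},
    "PlatformB": set(METRICS),
}

def coverage(platform):
    """Fraction of the selected metrics a platform satisfies."""
    have = PLATFORMS[platform]
    return sum(m in have for m in METRICS) / len(METRICS)

scores = {p: coverage(p) for p in PLATFORMS}
```

A tabulation of such scores is one way to present the advantages and drawbacks side by side.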

Anna Bernasconi

Abstract A wealth of public data repositories is available to drive genomics and clinical research. However, there is no agreement among the various data formats and models; in common practice, data sources are accessed one by one, and their specific descriptions must be learned with tedious effort. In this context, the integration of genomic data and of the metadata describing them becomes, at the same time, an important, difficult, and well-recognized challenge. In this chapter, after overviewing the most important human genomic data players, we propose a conceptual model of metadata and an extended architecture for integrating datasets, retrieved from a variety of data sources, based upon a structured transformation process; we then describe a user-friendly search system providing access to the resulting consolidated repository, enriched by a multi-ontology knowledge base. Inspired by our work on genomic data integration, during the COVID-19 pandemic outbreak we successfully re-applied the previously proposed model-build-search paradigm, building on the analogies between the human and viral genomics domains. The availability of conceptual models, related databases, and search systems for both humans and viruses will provide important opportunities for research, especially if virus data are connected to their host, the provider of genomic and phenotype information.

2021 ◽  
Vol 11 (1) ◽  
pp. 27
Sergio Martin-Segura ◽  
Francisco Javier Lopez-Pellicer ◽  
Javier Nogueras-Iso ◽  
Javier Lacasta ◽  
Francisco Javier Zarazaga-Soria

The content at the end of any hyperlink is subject to two phenomena: the link may break (Link Rot), or the content at the end of the link may no longer be the same as it was when the link was created (Content Drift). Reference Rot denotes the combination of both effects. Spatial metadata records rely on hyperlinks to indicate the location of the resources they describe; therefore, they are also subject to Reference Rot. This paper evaluates the presence of Reference Rot and its impact on the 22,738 distribution URIs of 18,054 metadata records from 26 European INSPIRE spatial data catalogues. Our Link Rot checking method detects broken links while considering the specific requirements of spatial data services. Our Content Drift checking method uses the data format as an indicator: it compares the data formats declared in the metadata with the actual data types returned by the hyperlinks. Findings show that 10.41% of the distribution URIs suffer from Link Rot and at least 6.21% of records suffer from Content Drift (i.e., they do not declare their distribution types correctly). Additionally, 14.94% of metadata records only contain intermediate HTML web pages as distribution URIs and 31.37% contain at least one HTML web page; thus, they cannot be accessed or checked directly.
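The Content Drift check described above can be sketched as a comparison between the data format declared in a metadata record and the media type actually returned when dereferencing its distribution URI. The format-to-MIME table and the category names below are illustrative assumptions, not the exact mapping used in the study.

```python
# Sketch of a declared-format vs. returned-Content-Type check
# (mapping and category names are illustrative assumptions).

DECLARED_TO_MIME = {
    "GML": {"application/gml+xml", "text/xml", "application/xml"},
    "GeoTIFF": {"image/tiff"},
    "SHP": {"application/zip", "application/x-shapefile"},
}

def check_distribution(declared_format, returned_mime):
    """Classify one distribution URI from its declared format and the
    Content-Type observed when dereferencing it (None = broken link)."""
    if returned_mime is None:
        return "link_rot"            # the hyperlink is broken
    if returned_mime == "text/html":
        return "html_intermediate"   # landing page, not the data itself
    expected = DECLARED_TO_MIME.get(declared_format, set())
    if returned_mime in expected:
        return "ok"
    return "content_drift"           # declared and actual types disagree
```

Separating the "intermediate HTML page" case from genuine drift mirrors the paper's observation that many distribution URIs cannot be checked directly at all.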

2021 ◽  
Vol 137 (1) ◽  
Manuela Boscolo ◽  
Helmut Burkhardt ◽  
Gerardo Ganis ◽  
Clément Helsens

Abstract Powerful, flexible computer codes are essential for the design and optimisation of accelerators and experiments. We briefly review what already exists and what is needed in terms of accelerator codes. For the FCC-ee, it will be important to include the effects of beamstrahlung and beam–beam interaction, as well as machine imperfections and sources of beam-induced backgrounds relevant for the experiments, and to consider the possibility of beam polarisation. The experiment software Key4hep, which aims to provide a common software stack for future experiments, is described, and the possibility of extending this concept to machine codes is discussed. We analyse how to interface and connect the accelerator and experiment codes in an efficient and flexible way for the optimisation of the FCC-ee interaction region design and discuss the possibility of using shared data formats as an interface.


The results of research work on unifying the data formats required for implementing the analytical component in the structure of information systems intended for solving expert predictive problems in the sphere of communications and broadcasting are presented. Based on an analysis of the industry's main automated systems and their functional capabilities, the necessity of introducing an analytical component into existing and future systems, as well as of unifying data formats as an indispensable condition for their interaction, is substantiated. A set of unified data formats developed for a prospective prototype of an analytical system for the development of communications and broadcasting is considered.

2021 ◽  
pp. 34-48
Sarah A. Sutherland

2021 ◽  
Vol 21 (1) ◽  
Arsenij Ustjanzew ◽  
Alexander Desuki ◽  
Christoph Ritzel ◽  
Alina Corinna Dolezilek ◽  
Daniel-Christoph Wagner ◽  

Abstract Background Extensive sequencing of tumor tissues has greatly improved our understanding of cancer biology over the past years. The integration of genomic and clinical data is increasingly used to select personalized therapies in dedicated tumor boards (Molecular Tumor Boards) or to identify patients for basket studies. Genomic alterations and clinical information can be stored, integrated, and visualized in the open-access resource cBioPortal for Cancer Genomics. cBioPortal can be run as a local instance, enabling the storage and analysis of patient data within single institutions while respecting data privacy. However, uploading clinical input data and genetic aberrations requires the elaboration of multiple data files in specific data formats, which makes it difficult to integrate this system into clinical practice. To solve this problem, we developed cbpManager. Results cbpManager is an R package providing a web-based interactive graphical user interface intended to facilitate the maintenance of mutation data and clinical data, including patient and sample information as well as timeline data. cbpManager enables a large spectrum of researchers and physicians, regardless of their informatics skills, to intuitively create data files ready for upload to cBioPortal for Cancer Genomics, on a daily basis or in batch. Due to its modular structure based on R Shiny, further data formats such as copy number and fusion data can be covered in future versions. Furthermore, we provide cbpManager as a containerized solution, enabling straightforward large-scale deployment in clinical systems and secure access in combination with ShinyProxy. cbpManager is freely available via the Bioconductor project under the AGPL-3 license. It is already used at six University Hospitals in Germany (Mainz, Gießen, Lübeck, Halle, Freiburg, and Marburg).
Conclusion In summary, our package cbpManager is currently a unique software solution in the workflow with cBioPortal for Cancer Genomics, assisting the user in the interactive generation and management of study files suited for later upload to cBioPortal.
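The "multiple data files in specific data formats" that make manual cBioPortal uploads tedious can be illustrated with the clinical patient file: four '#'-prefixed header rows (display names, descriptions, datatypes, priorities) followed by an attribute-ID row, as in the documented cBioPortal study file format. The attributes and values in this sketch are illustrative, and it is a generic rendering, not cbpManager's implementation.

```python
# Minimal sketch of a cBioPortal data_clinical_patient.txt file
# (attributes and values are illustrative).

def clinical_patient_file(rows):
    """Render clinical patient file content from (patient_id, sex) rows."""
    lines = [
        "#Patient Identifier\tSex",  # display names
        "#Patient identifier\tSex",  # descriptions
        "#STRING\tSTRING",           # datatypes
        "#1\t1",                     # priorities
        "PATIENT_ID\tSEX",           # attribute IDs
    ]
    lines += [f"{pid}\t{sex}" for pid, sex in rows]
    return "\n".join(lines) + "\n"

content = clinical_patient_file([("P01", "Female"), ("P02", "Male")])
```

Having a tool generate these header rows consistently is exactly the kind of bookkeeping that a GUI such as cbpManager takes off the user's hands.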

2021 ◽  
Vol 17 (4) ◽  
pp. 1-32
Siying Dong ◽  
Andrew Kryczka ◽  
Yanqin Jin ◽  
Michael Stumm

This article is an eight-year retrospective on development priorities for RocksDB, a key-value store developed at Facebook that targets large-scale distributed systems and is optimized for Solid State Drives (SSDs). We describe how the priorities evolved over time as a result of hardware trends and extensive experience running RocksDB at scale in production at a number of organizations: from optimizing write amplification, to space amplification, to CPU utilization. We describe lessons from running large-scale applications, including that resource allocation needs to be managed across different RocksDB instances, that data formats need to remain backward- and forward-compatible to allow incremental software rollouts, and that appropriate support for database replication and backups is needed. Lessons from failure handling taught us that data corruption errors need to be detected earlier and that data integrity protection mechanisms are needed at every layer of the system. We describe improvements to the key-value interface. We describe a number of efforts that in retrospect proved to be misguided. Finally, we describe a number of open problems that could benefit from future research.
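The backward- and forward-compatibility lesson mentioned above can be illustrated with a tagged record encoding: each field carries a tag and a length, so an old reader can skip tags it does not recognize, and a new reader can supply defaults for tags an old writer omitted. This is a generic sketch of the technique, not RocksDB's actual on-disk format.

```python
# Generic tag-length-value encoding illustrating forward compatibility
# during incremental rollouts (not RocksDB's real format).
import struct

def encode(fields):
    """Serialize {tag: bytes} as repeated (tag, length, payload) entries."""
    out = b""
    for tag, payload in sorted(fields.items()):
        out += struct.pack("<II", tag, len(payload)) + payload
    return out

def decode(buf, known_tags):
    """Parse entries, keeping known tags and silently skipping unknown ones."""
    fields, offset = {}, 0
    while offset < len(buf):
        tag, length = struct.unpack_from("<II", buf, offset)
        offset += 8
        if tag in known_tags:
            fields[tag] = buf[offset:offset + length]
        offset += length
    return fields

# A "new" writer adds tag 3; an "old" reader that only knows tags 1 and 2
# still decodes the record correctly.
blob = encode({1: b"key", 2: b"value", 3: b"new-feature"})
old_view = decode(blob, known_tags={1, 2})
```

Because unknown tags are skipped rather than rejected, a fleet can be upgraded one node at a time without a flag day, which is the property the retrospective calls out.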
