Data Ingestion: Recently Published Documents

Total documents: 103 (five years: 54)
H-index: 9 (five years: 2)

Sensors, 2021, Vol 21 (24), pp. 8429. Author(s): Ala Arman, Pierfrancesco Bellini, Daniele Bologna, Paolo Nesi, Gianni Pantaleo, et al.

The Internet of Things has produced many heterogeneous devices and data models for sensors and actuators, both physical and virtual. The corresponding data must be aggregated, and their models must be related to general knowledge, to make them immediately usable by visual analytics tools, APIs, and other devices. In this paper, models and tools for data ingestion and regularization are presented that simplify and enable the automated visual representation of such data. The problems addressed concern (i) the regularization of the highly heterogeneous data available from IoT devices (physical or virtual) and KPIs (key performance indicators), allowing such data to be reported as elements of hypercubes, and (ii) providing end users with an index of views and data structures that can be directly exploited by the graphical widgets of visual analytics tools, according to different operators. The solution analyzes the loaded data to extract and generate the IoT device model, create the device instances, and generate any resulting time series. The whole process allows data for visual analytics and dashboarding to be prepared in a few clicks. The proposed IoT device model is compliant with FIWARE NGSI and is supported by a formal definition of data characterization in terms of value type, value unit, and data type. The resulting data model has been enforced in the Snap4City dashboard wizard and tool, a GDPR-compliant multitenant architecture. The solution was developed and validated on six pilots in Europe that collect big data to monitor and reason about people flows and tourism with the aim of improving quality of service; it was developed in the context of the HERIT-DATA Interreg project, on top of the Snap4City infrastructure and tools.
The model proved capable of meeting all the requirements of HERIT-DATA, while some of the visual representation tools still need to be updated and further developed to add a few features.
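To make the FIWARE NGSI compliance concrete, the following is a minimal sketch of what an NGSI-v2-style device entity annotated with value type, value unit, and data type might look like. The attribute name, metadata keys, and URN scheme are illustrative assumptions, not the actual Snap4City schema.

```python
# Illustrative NGSI-v2-style entity for an IoT sensor reading,
# carrying the three characterizations named in the abstract
# (value type, value unit, data type) as attribute metadata.
# Field names beyond the NGSI id/type/attribute shape are assumed.

def make_device_entity(device_id, temperature_c):
    return {
        "id": f"urn:ngsi-ld:Device:{device_id}",
        "type": "Device",
        "temperature": {
            "type": "Number",  # NGSI value type
            "value": temperature_c,
            "metadata": {
                "unitCode": {"type": "Text", "value": "CEL"},   # value unit
                "dataType": {"type": "Text", "value": "float"}, # data type
            },
        },
    }

entity = make_device_entity("sensor-42", 21.5)
```

A regularization step like the one described above could map each heterogeneous source field onto this uniform attribute shape before the data is indexed for the dashboard widgets.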


2021, Vol 10 (1), pp. 18. Author(s): Simone Berto, Emanuel Demetrescu, Bruno Fanini, Jacopo Bonetto, Giuseppe Salemi

In this work, we describe the application of the Extended Matrix Framework (EMF) to the 3D reconstruction of the temple on the Roman forum of Nora. EMF is a specific section of the Extended Matrix (EM) method, developed by the VHLab of the CNR ISPC (Rome), dedicated to software solutions for 3D data management in the field of virtual reconstruction. The combination of EM and EMF makes it possible to map the reconstructive process, validate the entire workflow (from data ingestion to 3D modelling), manage 3D data, and share outcomes online.


2021, Vol 11 (1). Author(s): Paul Billing Ross, Jina Song, Philip S. Tsao, Cuiping Pan

Biomedical studies have grown in size and yield large quantities of data, yet efficient data processing remains a challenge. Here we present Trellis, a cloud-based data and task management framework that completely automates the process from data ingestion to result presentation, while tracking data lineage, facilitating information queries, and supporting fault tolerance and scalability. Using a graph database to coordinate the state of the data processing workflows and a scalable microservice architecture to perform bioinformatics tasks, Trellis has enabled efficient variant calling on 100,000 human genomes collected in the VA Million Veteran Program.
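The core idea of coordinating workflow state through a graph can be sketched in a few lines: data nodes and task nodes form a lineage graph, and the arrival of a new data node triggers any task whose inputs are now all present. This is a simplified illustration of the pattern, not Trellis's actual graph schema or trigger rules; the node and task names are hypothetical.

```python
# Minimal sketch of graph-based workflow-state tracking: the graph
# records data nodes (with lineage) and task nodes (with required
# inputs), and reports which tasks become runnable as data arrives.

class LineageGraph:
    def __init__(self):
        self.nodes = {}    # name -> {"kind": "data"/"task", ...}
        self.inputs = {}   # task name -> set of required data nodes

    def add_task(self, task, required_inputs):
        self.nodes[task] = {"kind": "task"}
        self.inputs[task] = set(required_inputs)

    def add_data(self, name, derived_from=None):
        # Record the new data node and its lineage, then return every
        # task whose required inputs are now all present.
        self.nodes[name] = {"kind": "data", "derived_from": derived_from}
        present = {n for n, v in self.nodes.items() if v["kind"] == "data"}
        return [t for t, req in self.inputs.items() if req <= present]

g = LineageGraph()
g.add_task("variant_calling", {"fastq", "reference"})
g.add_data("reference")
ready = g.add_data("fastq")   # both inputs now present
```

Keeping this state in a real graph database rather than in memory is what lets such a system survive worker failures and answer lineage queries after the fact.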


2021, Vol 2021, pp. 1-9. Author(s): Muhammad Babar, Muhammad Usman Tariq, Ahmed S. Almasoud, Mohammad Dahman Alshehri

The recent spread of big data has enabled the realization of AI and machine learning, and with it the ambition of improving the accuracy and efficacy of AI applications has gained prominence. In traffic applications, machine learning solutions provide improved safety in hazardous traffic circumstances. Existing architectures face various challenges, of which data privacy is the foremost for vulnerable road users (VRUs): the key reason traffic control fails pedestrians is flawed handling of user privacy. User data are at risk and prone to several privacy and security gaps; if an attacker succeeds in infiltrating the setup, exposed data can be maliciously influenced, contrived, and misrepresented for illegitimate purposes. In this study, an architecture based on machine learning is proposed to analyze and process big data efficiently in a secure environment, taking the privacy of users into account during processing. The proposed architecture is a layered framework with a parallel and distributed module that applies machine learning to big data to achieve secure analytics. It devotes a distinct unit to privacy management, built around a machine learning classifier, and integrates a stream processing unit to process the information. The proposed system was realized and experimentally tested with reliable real-time datasets from various sources, which demonstrate the effectiveness of the architecture. Data ingestion results are also reported, along with training and validation results.
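The role of a distinct privacy-management unit can be illustrated as a screening stage that masks fields flagged as sensitive before records enter the analytics layer. The rule-based flagging below is only a stand-in for the machine learning classifier the architecture proposes; the field names are hypothetical.

```python
# Illustrative privacy-management stage: records are sanitized before
# reaching the stream-processing/analytics layers. classify_sensitive
# is a rule-based stand-in for a trained ML classifier's prediction.

SENSITIVE_FIELDS = {"name", "phone", "location"}   # assumed labels

def classify_sensitive(field_name):
    # Stand-in for the ML classifier described in the architecture.
    return field_name in SENSITIVE_FIELDS

def sanitize(record):
    # Mask sensitive fields, pass the rest through unchanged.
    return {k: ("***" if classify_sensitive(k) else v)
            for k, v in record.items()}

record = {"name": "Alice", "speed_kmh": 42, "location": "47.1,8.2"}
clean = sanitize(record)   # sensitive fields masked before processing
```

Placing this unit ahead of the parallel and distributed processing module means downstream analytics never see raw identifying data, which is the privacy guarantee the layered design aims at.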


2021, Vol 10 (11), pp. 743. Author(s): Xiaohui Huang, Junqing Fan, Ze Deng, Jining Yan, Jiabao Li, et al.

Multi-source Internet of Things (IoT) data, archived in institutional repositories, are increasingly being opened up for public access by scientists, developers, and decision makers via web services, to promote research on geohazard prevention. In this paper, we design and implement a big-data-turbocharged system for effective IoT data management following the data lake architecture. We first propose a multi-threaded parallel data ingestion method to ingest IoT data from institutional repositories in parallel. Next, we design storage strategies for both ingested and processed IoT data, storing them in a scalable, reliable storage environment, and build a distributed cache layer to enable fast access. We then provide users with a unified, SQL-based interactive environment for IoT data exploration by leveraging the processing ability of Apache Spark. In addition, we design a standards-based metadata model to describe ingested IoT data and thus support IoT dataset discovery. Finally, we implement a prototype system and conduct experiments on real IoT data repositories to evaluate its efficiency.
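The multi-threaded parallel ingestion step can be sketched with a thread pool that pulls from several repositories concurrently. The repository list and the fetch function below are hypothetical placeholders for the institutional sources and the actual transfer protocol, which the paper does not specify here.

```python
# Sketch of multi-threaded parallel ingestion from several IoT data
# repositories. Threads suit this stage because repository pulls are
# I/O-bound; fetch() is a placeholder for the real transfer call.
from concurrent.futures import ThreadPoolExecutor

REPOSITORIES = ["repo-a", "repo-b", "repo-c"]   # assumed sources

def fetch(repo):
    # Placeholder for pulling one batch from an institutional repository.
    return {"source": repo, "records": [f"{repo}-rec-{i}" for i in range(3)]}

def ingest_parallel(repos, workers=4):
    # map() preserves repository order while fetching concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, repos))

batches = ingest_parallel(REPOSITORIES)
total = sum(len(b["records"]) for b in batches)   # 9 records from 3 sources
```

In the system described above, each fetched batch would then flow to the storage layer and the distributed cache before being exposed to Spark's SQL interface.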


Author(s): Stefano Mendicino, Daniele Menniti, Francesco Palumbo, Anna Pinnarelli, Nicola Sorrentino, et al.

2021. Author(s): Stiw Herrera, Larissa Miguez da Silva, Paulo Ricardo Reis, Anderson Silva, Fabio Porto

Scientific data is mainly multidimensional in nature, presenting interesting optimization opportunities when managed by array databases. However, where data is sparse, an efficient implementation is still required. In this paper, we investigate the adoption of the PH-tree as an in-memory indexing structure for sparse data. We compare performance in data ingestion and in both range and point queries, using SAVIME as the multidimensional array DBMS. Our experiments, using a real weather dataset, highlight the challenges involved in providing fast data ingestion, as proposed by SAVIME, while at the same time efficiently answering multidimensional queries on sparse data.
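The sparse-indexing idea the paper evaluates can be illustrated with a much simpler structure: store only the non-empty cells, keyed by their coordinates, and scan them for point and range queries. A real PH-tree partitions the coordinate space bit-wise to answer range queries far faster; this dictionary version only conveys why sparse storage helps, and its data values are made up.

```python
# Simplified in-memory index for sparse multidimensional data: only
# non-empty cells are stored, so memory scales with the number of
# observations, not the full (mostly empty) array extent.

class SparseIndex:
    def __init__(self):
        self.cells = {}                        # (x, y, ...) -> value

    def insert(self, coords, value):           # data ingestion path
        self.cells[tuple(coords)] = value

    def point_query(self, coords):
        return self.cells.get(tuple(coords))   # None if cell is empty

    def range_query(self, low, high):
        # Inclusive hyper-rectangle scan over the stored cells only.
        return {c: v for c, v in self.cells.items()
                if all(l <= x <= h for x, l, h in zip(c, low, high))}

idx = SparseIndex()
idx.insert((10, 20), 1.5)        # e.g. a weather reading at grid (10, 20)
idx.insert((50, 60), 2.0)
hits = idx.range_query((0, 0), (30, 30))
```

The trade-off the experiments probe is visible even here: insertion is a single hash write (fast ingestion), while the range scan touches every stored cell, which is where a tree-structured index earns its keep.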


Author(s): Javier Molina, Peggy Newman, David Martin, Vicente Ruiz Jurado

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two leading infrastructures serving the biodiversity community. In 2020, the ALA's occurrence records management systems reached end of life after more than 10 years of operation, and the ALA embarked on a project to replace them. Significant overlap exists between the ALA and GBIF data ingestion pipeline systems, so instead of developing new systems from scratch, we initiated a project to better align the two infrastructures. The collaboration brings benefits such as improved reuse of modules and an overall reduction in development and operating costs. The ALA recently replaced its occurrence ingestion system with GBIF pipelines infrastructure and shared code. This is the first milestone of the ALA's broader Core Infrastructure Project; among its benefits is a more reliable, performant, and scalable system, proven by the ability to ingest more and larger datasets while reducing infrastructure operating costs by more than 40% compared to the previous system. The new system is a key building block of an improved ingestion framework being developed within the ALA. The collaboration between the ALA and GBIF development teams will produce more consistent outputs from their respective processing pipelines, and will allow the broader collective expertise of both infrastructure communities to inform future development and direction. The ALA's adoption of GBIF pipelines will also pave the way for the Living Atlases community to adopt, and contribute to, GBIF systems. In this talk we introduce the project, share insights on how the GBIF and ALA teams worked together, and delve into the details of the technical implementation and its benefits.

