Open Spatiotemporal Data Warehouse for Agriculture Production Analytics

2020
Vol 13 (6)
pp. 419-431
Author(s):
Irya Wisnubhadra
Safiza Baharin
Nanna Herman
...

Business Intelligence (BI) technology, with its Extract, Transform, and Load (ETL) processes, data warehouses, and OLAP, has demonstrated its ability to generate information and knowledge that support decision making. In the last decade, the advancement of Web 2.0 technology has improved the accessibility of the web of data across the cloud. Linked Open Data, Linked Open Statistical Data, and Open Government Data are growing massively, making significantly more machine-readable data available for sharing. In agricultural production analytics, data resources with high availability and accessibility are a primary requirement. However, today's data accessibility for production analytics is limited to 2- or 3-star open data formats and rarely includes attributes for spatiotemporal analytics. The data warehouse concept proposed here takes a new approach that combines the openness of data resources with mobile, or spatiotemporal, data. This approach could help decision-makers use external data to make crucial decisions more intuitively and flexibly. This paper proposes the development of a spatiotemporal data warehouse with an integration process based on a service-oriented architecture and open data sources. The data sources originate from the Village and Rural Area Information System (SIDeKa), which captures agricultural production transactions daily. The paper also describes how to perform spatiotemporal analytics for agricultural production using the new spatiotemporal data warehouse approach. In the experiments, six relevant spatiotemporal sample queries were executed on a DW whose fact table contains 324,096 tuples with a temporal integer/float for each tuple, a field dimension of 4,495 tuples with geographic data as polygons, a village dimension of 80 tuples, district, subdistrict, and province dimensions of dozens of tuples each, and a time dimension of 3,653 tuples representing dates over ten years. The results prove that this approach offers a convenient, simple model with expressive performance, supporting executives in making decisions on agricultural production analytics based on spatiotemporal data. This research also underlines the prospects for scaling and nurturing the spatiotemporal data warehouse initiative.
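
The star schema described here (a production fact table joined to field, village, administrative, and time dimensions) can be illustrated with a minimal sketch. The schema and data below are invented for illustration only; the paper's actual DW stores real SIDeKa transactions, and genuine spatial predicates over the field polygons would require a spatially enabled engine such as PostGIS rather than plain SQLite.

```python
# Toy star schema in the shape the abstract describes; names invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, full_date TEXT,
                          year INTEGER, month INTEGER);
CREATE TABLE dim_village (village_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_field   (field_id INTEGER PRIMARY KEY,
                          village_id INTEGER, polygon_wkt TEXT);
CREATE TABLE fact_production (time_id INTEGER, field_id INTEGER,
                              quantity_kg REAL);
INSERT INTO dim_time    VALUES (1, '2020-01-15', 2020, 1);
INSERT INTO dim_village VALUES (1, 'Village A');
INSERT INTO dim_field   VALUES (1, 1, 'POLYGON EMPTY');
INSERT INTO fact_production VALUES (1, 1, 1250.0);
""")

# One spatiotemporal-style rollup: production per village per month.
query = """
SELECT v.name, t.year, t.month, SUM(f.quantity_kg) AS total_kg
FROM fact_production f
JOIN dim_time    t  ON t.time_id    = f.time_id
JOIN dim_field   fd ON fd.field_id  = f.field_id
JOIN dim_village v  ON v.village_id = fd.village_id
GROUP BY v.name, t.year, t.month;
"""
for row in conn.execute(query):
    print(row)  # ('Village A', 2020, 1, 1250.0)
```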

Author(s):  
Nouha Arfaoui
Jalel Akaichi

The healthcare industry generates huge amounts of data that remain underused for decision-making needs, because of the absence of a specific design mastered by healthcare actors and the lack of collaboration and information exchange between institutions. In this work, a new approach is proposed to design the schema of a Hospital Data Warehouse (HDW). It starts by generating the schemas of the Hospital Data Marts (HDMs), one for each department, taking into consideration the requirements of the healthcare staff and the existing data sources. Then, it merges them to build the schema of the HDW. The bottom-up approach is suitable because the healthcare departments operate separately. To merge the schemas, a new schema integration methodology is used. It starts by extracting the similar elements of the schemas and the conflicts between them, presenting both as mapping rules. Then, it transforms the rules into queries and applies them to merge the schemas.
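
A toy sketch of the merge step, not the authors' methodology: two invented department schemas are compared, an attribute-naming conflict is recorded as a mapping rule, and the rule is applied while the schemas are merged.

```python
# Two hypothetical data mart schemas: table -> set of attributes.
cardiology = {"dim_patient": {"patient_id", "name", "birth_date"},
              "dim_time":    {"time_id", "full_date"},
              "fact_stay":   {"patient_id", "time_id", "duration"}}
radiology  = {"dim_patient": {"patient_id", "full_name", "birth_date"},
              "dim_exam":    {"exam_id", "modality"},
              "fact_exam":   {"patient_id", "exam_id", "dose"}}

# Mapping rule resolving an attribute-level conflict (full_name == name).
SYNONYMS = {"full_name": "name"}

def merge(a, b):
    """Union the two schemas, canonicalizing attribute names via rules."""
    merged = {table: set(cols) for table, cols in a.items()}
    for table, cols in b.items():
        canon = {SYNONYMS.get(c, c) for c in cols}
        merged.setdefault(table, set()).update(canon)
    return merged

for table, cols in merge(cardiology, radiology).items():
    print(table, sorted(cols))
```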


2016
Vol 32 (2)
pp. 329-348
Author(s):
Mark Elliot
Elaine Mackey
Susan O’Shea
Caroline Tudor
Keith Spicer

In the UK, the transparency agenda is forcing data stewardship organisations to review their dissemination policies and to consider whether to release, as open data, data that is currently available only to a restricted community of researchers under licence. Here we describe the results of a study providing evidence about the risks of such an approach via a simulated attack on two social survey datasets. This is also the first systematic attempt to simulate a jigsaw identification attack (one using a mashup of multiple data sources) on an anonymised dataset. The information that we draw on is collected from multiple online data sources and purchasable commercial data. The results indicate that such an attack against anonymised end user licence (EUL) datasets, if they were converted into open datasets, is possible, and we therefore recommend that penetration tests be factored into any decision to make datasets (that are about people) open.
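
The linkage step at the heart of such a jigsaw attack can be sketched as follows. All records and column names below are invented; a real penetration test would use actual commercial and online sources and verify candidate matches.

```python
# Hedged sketch: join quasi-identifiers from an identified public
# source against an anonymised survey file to find unique matches.
import pandas as pd

anonymised = pd.DataFrame({
    "record": [1, 2, 3],
    "age_band": ["30-34", "30-34", "65-69"],
    "postcode_district": ["M13", "M13", "SW1"],
    "occupation": ["teacher", "nurse", "retired"]})

public = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "age_band": ["30-34", "65-69"],
    "postcode_district": ["M13", "SW1"],
    "occupation": ["nurse", "retired"]})

# A record matching on all quasi-identifiers is a candidate
# re-identification; uniqueness makes it a high-confidence one.
matches = anonymised.merge(
    public, on=["age_band", "postcode_district", "occupation"])
print(matches[["record", "name"]])
```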


2021
Vol ahead-of-print (ahead-of-print)
Author(s):
Anneke Zuiderwijk
Mark de Reuver

Purpose
Existing overviews of barriers to openly sharing and using government data are often conceptual or based on a limited number of cases. Furthermore, it is unclear which categories of barriers are most obstructive to attaining open data objectives. This paper aims to categorize and prioritize barriers to openly sharing and using government data based on a large number of existing Open Government Data Initiatives (OGDIs).

Design/methodology/approach
This study analyzes 171 survey responses concerning existing OGDIs worldwide.

Findings
The authors found that the most critical OGDI barrier categories are (in order from most to least critical): functionality and support; inclusiveness; economy, policy and process; data interpretation; data quality and resources; legislation and access; and sustainability. Policymakers should prioritize solving functionality and support barriers and inclusiveness barriers, because the authors found these to be the most obstructive to attaining OGDI objectives.

Practical implications
The prioritization of open data barriers calls for three main actions by practitioners to reduce barrier impact: open data portal developers should develop advanced tools to support data search, analysis, visualization, interpretation and interaction; open data experts and teachers should train potential users, especially those currently excluded from OGDIs by a lack of digital skills; and government agencies that provide open data should put user-centered design and the user experience at the center of their work to better support open data users.

Originality/value
This study contributes to the open data literature by proposing a new, empirically based barrier categorization and prioritization grounded in a large number of existing OGDIs.
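
A ranking like the one reported in the Findings could be computed from survey responses roughly as sketched below. The scores are invented; the category names follow the article, but the aggregation method shown is an assumption, not the authors' actual analysis.

```python
# Illustrative only: rank barrier categories by mean obstructiveness
# score (1-5) across invented survey responses.
import statistics

responses = {
    "functionality and support":   [5, 4, 5, 4],
    "inclusiveness":               [4, 4, 5, 3],
    "economy, policy and process": [3, 4, 3, 4],
    "data interpretation":         [3, 3, 4, 3],
    "data quality and resources":  [3, 3, 3, 2],
    "legislation and access":      [2, 3, 2, 3],
    "sustainability":              [2, 2, 3, 2],
}
ranking = sorted(responses,
                 key=lambda c: statistics.mean(responses[c]),
                 reverse=True)
for rank, category in enumerate(ranking, 1):
    print(rank, category)
```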


2011
pp. 277-297
Author(s):
Carlo Combi
Barbara Oliboni

This chapter describes a graph-based approach to representing information stored in a data warehouse by means of a temporal semistructured data model. We consider issues related to the representation of semistructured data warehouses and discuss the set of constraints needed to correctly manage the warehouse time, i.e., the time dimension considered when storing data in the data warehouse itself. We use a temporal semistructured data model because a data warehouse can contain data coming from different, heterogeneous data sources. This means that data stored in a data warehouse are semistructured in nature: in different documents the same information can be represented in different ways, and the document schemata may or may not be available. Moreover, information stored in a data warehouse is often time-varying; thus, as with semistructured data, it can be useful to consider time in the data warehouse context as well.
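
A minimal sketch, not the chapter's model: a graph of semistructured warehouse data in which every node and edge carries a validity interval, together with one constraint of the kind such a model must enforce (an edge cannot be valid outside the lifetime of the nodes it links). All names and intervals are invented.

```python
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    valid_from: int  # years for brevity; a real model would use timestamps
    valid_to: int

@dataclass
class Edge:
    source: Node
    target: Node
    valid_from: int
    valid_to: int

    def consistent(self) -> bool:
        # Constraint: edge validity must lie within both endpoints' validity.
        return (self.valid_from >= max(self.source.valid_from,
                                       self.target.valid_from)
                and self.valid_to <= min(self.source.valid_to,
                                         self.target.valid_to))

product = Node("product", 2000, 2010)
order   = Node("order",   2005, 2008)
edge    = Edge(product, order, 2005, 2009)
print(edge.consistent())  # False: the edge outlives the order node
```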


2021
Vol ahead-of-print (ahead-of-print)
Author(s):
Han-Yu Sung
Yu-Liang Chi

Purpose
This study aims to develop a Web-based application system called the Infomediary of Taiwanese Indigenous Peoples (ITIP) to help individuals comprehend the society and culture of indigenous peoples. The ITIP is based on Semantic Web technologies used to integrate a number of data sources, particularly the bibliographic records of a museum. Moreover, an ontology model was developed to help users search cultural collections by topic concept.

Design/methodology/approach
Two issues were identified that needed to be addressed: the integration of heterogeneous data sources and semantic-based information retrieval. Two corresponding methods were proposed: SPARQL federated queries were designed for data integration across the Web, and ontology-driven queries were designed for semantic search by knowledge inference. Furthermore, to help users perform searches easily, three search interfaces, namely ethnicity, region and topic, were developed to take full advantage of the content available on the Web.

Findings
Most open government data provide structured but non-Resource-Description-Framework data; Semantic Web consumers therefore require additional data conversion before the data can be used. On the other hand, although the library, archive and museum (LAM) community has produced some emerging linked data, very few data sets are released to the general public as open data. The Semantic Web’s vision of a “web of data” remains challenging.

Originality/value
This study developed data integration across various institutions, including those of the LAM community. The development was conducted from the standpoint of non-institution members (i.e. institutional outsiders). The challenges encountered included uncertain data quality and the absence of institutional participation.
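
The federated-query technique named above can be sketched as follows: a SPARQL SERVICE clause pulls triples from a second endpoint inside a single query. The endpoint URLs and vocabulary below are placeholders, not the ITIP system's actual ones; the sketch uses the SPARQLWrapper package.

```python
# Hedged sketch of a SPARQL federated query; endpoints are invented.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/local/sparql")  # placeholder
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?item ?title WHERE {
  ?item dc:subject "weaving" .
  SERVICE <https://example.org/museum/sparql> {  # placeholder endpoint
    ?item dc:title ?title .
  }
}
LIMIT 10
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["item"]["value"], row["title"]["value"])
```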


Sensors
2021
Vol 21 (15)
pp. 5204
Author(s):  
Anastasija Nikiforova

Nowadays, governments launch open government data (OGD) portals that provide data that can be accessed and used by everyone for their own needs. Although the potential economic value of open (government) data is assessed in the millions and billions, not all open data are reused. Moreover, the open (government) data initiative, as well as users’ intents for open (government) data, are changing continuously, and today, in line with IoT and smart-city trends, real-time and sensor-generated data are of higher interest to users. These “smarter” open (government) data are also considered one of the crucial drivers of a sustainable economy, and they might affect information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem for Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries in order to understand how well their content corresponds to Society 5.0 expectations. The paper reports on the extent to which countries provide such data, focusing on open (government) data success-facilitating factors both for the portal in general and for data sets of particular interest. The presence of “smarter” data and their level of accessibility, availability, currency and timeliness, as well as support for users, are analyzed. Lists of the most competitive countries by data category are provided. This makes it possible to understand which OGD portals react to users’ needs and to Industry 4.0 and Society 5.0 requests by opening and updating data for further potential reuse, which is essential in the digital, data-driven world.
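
One currency/timeliness check of the kind such a study performs can be sketched against a CKAN-based portal, whose standard package_search API reports each dataset's last-modified timestamp. The portal URL below is a placeholder, and not every national portal exposes a CKAN API, so this is illustrative only.

```python
# Hedged sketch: query a CKAN portal for sensor datasets and print
# each record's last-update timestamp (metadata_modified).
import json
import urllib.request
from urllib.parse import urlencode

PORTAL = "https://demo.ckan.org"  # placeholder portal URL
params = urlencode({"q": "sensor", "rows": 5})
url = f"{PORTAL}/api/3/action/package_search?{params}"

with urllib.request.urlopen(url) as resp:
    result = json.load(resp)["result"]

for ds in result["results"]:
    print(ds["name"], ds.get("metadata_modified"))
```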


Epidemiologia
2021
Vol 2 (3)
pp. 315-324
Author(s):
Juan M. Banda
Ramya Tekumalla
Guanyu Wang
Jingyuan Yu
Tuo Liu
...  

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and from data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of the social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter and generated from 1 January 2020 to 27 June 2021 at the time of writing. This resource gives researchers worldwide a freely available additional data source for conducting a wide and diverse range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social-distancing measures, identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.
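
Datasets like this are conventionally distributed as tweet IDs (per Twitter's terms), which researchers "hydrate" back into full tweets. The sketch below uses the twarc library's v1 hydrate call; the credentials and ids.txt file are placeholders, and actual access depends on current Twitter/X API availability.

```python
# Hedged sketch of hydrating a tweet-ID dataset with twarc (v1 API).
from twarc import Twarc

t = Twarc(consumer_key="...", consumer_secret="...",
          access_token="...", access_token_secret="...")

with open("ids.txt") as f:  # one tweet ID per line (placeholder file)
    for tweet in t.hydrate(f):
        print(tweet["id_str"], tweet["full_text"][:80])
```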


2021
Vol 10 (1)
pp. 30
Author(s):
Alfonso Quarati
Monica De Martino
Sergio Rosim

Open Government Data (OGD) portals, thanks to the thousands of geo-referenced datasets they host, are of great interest for any analysis or process relating to territory, provided that users can access these datasets and reuse them. An element often considered to hinder the full dissemination of OGD is the quality of its metadata. Starting from an experimental investigation of over 160,000 geospatial datasets belonging to six national and international OGD portals, the first objective of this work is to provide an overview of the usage of these portals, measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of metadata quality on the use of geospatial datasets, the metadata of each dataset were assessed and the correlation between the two variables was measured. The results show a significant underutilization of geospatial datasets and a generally poor quality of their metadata. In addition, only a weak correlation was found between use and metadata quality, too weak to assert with certainty that the latter is a determining factor of the former.
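
The two measurements the study correlates can be sketched as follows: a simple completeness score over a handful of metadata fields, and its Spearman correlation with downloads. The records, fields, and scoring rule below are invented assumptions, not the paper's actual quality metric.

```python
# Illustrative sketch: metadata completeness vs. usage correlation.
import pandas as pd

FIELDS = ["title", "description", "license", "keywords", "spatial"]
datasets = [
    {"title": "roads", "description": "road network", "license": "CC-BY",
     "keywords": "transport", "spatial": "IT", "downloads": 420},
    {"title": "rivers", "description": None, "license": None,
     "keywords": None, "spatial": "BR", "downloads": 12},
    {"title": "landuse", "description": "land use map", "license": "CC0",
     "keywords": None, "spatial": None, "downloads": 95},
]

df = pd.DataFrame(datasets)
# Completeness = fraction of the checked metadata fields that are filled.
df["quality"] = df[FIELDS].notna().sum(axis=1) / len(FIELDS)
print(df[["title", "quality", "downloads"]])
print("Spearman r:",
      df["quality"].corr(df["downloads"], method="spearman"))
```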


2017
Vol 4 (1)
pp. 205395171769075
Author(s):
Andrew Schrock
Gwen Shaffer

Government officials claim open data can improve internal and external communication and collaboration. These promises hinge on “data intermediaries”: extra-institutional actors who obtain, use, and translate data for the public. However, we know little about why these individuals might regard open data as a site of civic participation. In response, we draw on Ilana Gershon to conceptualize culturally situated and socially constructed perspectives on data, or “data ideologies.” This study employs mixed methodologies to examine why members of the public hold particular data ideologies and how those ideologies vary. In late 2015, the authors engaged the public through a commission in a diverse city of approximately 500,000 residents. Qualitative data were collected from three public focus groups with residents; simultaneously, we obtained quantitative data from surveys. Participants’ data ideologies varied based on how they perceived data to be useful for collaboration, tasks, and translations. Bucking the “geek” stereotype, only a minority of those surveyed (20%) were professional software developers or engineers. Although the movement is still nascent, we argue that open data intermediaries have important roles to play in a new political landscape.

