Open Data: Latest Research Papers

Automated Annotations for AI Data and Model Transparency

The data and Artificial Intelligence (AI) revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often openly available. Enterprises have more data than they can process, and governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI engineering is becoming increasingly democratized: open source frameworks have enabled even an individual developer to engineer sophisticated AI systems. But such ease of use also brings the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from domains such as data transparency, data quality, and data governance to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both the data and the AI model), which has a number of appealing properties. Enterprises could use the annotations to gain visibility into potential issues, prepare data transparency reports, create policies and ensure compliance with them, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate the key components that could meet these requirements. Finally, we describe a number of interesting challenges and opportunities.
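
As a rough illustration of what an automated data annotation might look like, the sketch below computes per-column annotations (inferred type, missing-value rate, distinct-value count) for a small tabular dataset and flags columns that could warrant attention in a transparency report. The names `ColumnAnnotation`, `infer_type`, and `annotate_columns`, the annotation fields, and the 20% missing-value threshold are all hypothetical choices for this example; this is not the architecture the authors propose.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ColumnAnnotation:
    # Hypothetical annotation record for one column of a tabular dataset.
    name: str
    inferred_type: str
    missing_rate: float
    distinct_values: int
    issues: List[str] = field(default_factory=list)


def infer_type(values: List[Any]) -> str:
    # Coarse type inferred from the non-missing values only.
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    if all(isinstance(v, bool) for v in present):
        return "bool"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "text"


def annotate_columns(rows: List[Dict[str, Any]], max_missing: float = 0.2) -> List[ColumnAnnotation]:
    # Annotate every column that appears in any row; None or an absent key counts as missing.
    columns = sorted({key for row in rows for key in row})
    annotations = []
    for col in columns:
        values = [row.get(col) for row in rows]
        missing_rate = sum(v is None for v in values) / len(rows)
        distinct = len({v for v in values if v is not None})
        issues = []
        if missing_rate > max_missing:
            issues.append(f"missing rate {missing_rate:.0%} exceeds {max_missing:.0%}")
        if distinct == 1:
            issues.append("constant column")
        annotations.append(ColumnAnnotation(col, infer_type(values), missing_rate, distinct, issues))
    return annotations


# Hypothetical sample data for illustration.
rows = [
    {"age": 34, "zip": "02139", "consent": True},
    {"age": None, "zip": "02139", "consent": True},
    {"age": 51, "zip": "10001", "consent": True},
]
for ann in annotate_columns(rows):
    print(ann.name, ann.inferred_type, round(ann.missing_rate, 2), ann.issues)
```

For the three sample rows, `age` is flagged for its 33% missing rate and `consent` as a constant column. A real annotation pipeline would likely add many more checks (and model-side annotations); the point here is only that such annotations can be computed mechanically and then aggregated into reports.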

An open science and open data approach for the statistically robust estimation of forest disturbance areas

International Journal of Applied Earth Observation and Geoinformation, 2022, Vol 106, pp. 102663. DOI: 10.1016/j.jag.2021.102663

Author(s): Saverio Francini, Ronald E. McRoberts, Giovanni D'Amico, Nicholas C. Coops, Txomin Hermosilla, ...

Keyword(s): Robust Estimation, Forest Disturbance, Open Data, Open Science

‘Trust Us’: Open Data and Preregistration in Political Science and International Relations

2022. DOI: 10.31222/osf.io/8h2bp

Author(s): Bermond Scoggins, Matthew Peter Robertson

Keyword(s): International Relations, Political Science, Statistical Inference, Large Scale, Open Data, Population Level, Open Science, Science Practices, Level Data, Replication Crisis

The scientific method is predicated on transparency, yet the scientific community has been slow to adopt transparent research practices. The replication crisis in psychology showed that published findings employing statistical inference are threatened by undetected errors, data manipulation, and data falsification. To mitigate these problems and bolster research credibility, open data and preregistration have increasingly been adopted in the natural and social sciences. While many political science and international relations (IR) journals have committed to implementing these reforms, the extent of open science practices in the field is unknown. We bring large-scale text analysis and machine learning classifiers to bear on the question. Using population-level data (93,931 articles across the top 160 political science and IR journals between 2010 and 2021), we find that approximately 21% of all statistical inference papers have open data and 5% of all experiments are preregistered. Despite this shortfall, the example of leading journals in the field shows that change is feasible and can be effected quickly.

The LCA Commons—How an Open-Source Repository for US Federal Life Cycle Assessment (LCA) Data Products Advances Inter-Agency Coordination

Applied Sciences, 2022, Vol 12 (2), pp. 865. DOI: 10.3390/app12020865

Author(s): Ezra Kahn, Erin Antognoli, Peter Arbuckle

Keyword(s): Life Cycle Assessment, Life Cycle, Environmental Impact, Community Of Practice, Federal Government, Open Data, Data Infrastructure, The US, The Government, US Federal Government

Life cycle assessment (LCA) is a flexible and powerful tool for quantifying the total environmental impact of a product or service from cradle to grave. The US federal government has developed deep expertise in environmental LCA for a range of applications, including policy, regulation, and emerging technologies. LCA professionals from across the government have been coordinating this distributed LCA expertise through a community of practice known as the Federal LCA Commons. The Federal LCA Commons has developed open data infrastructure and workflows to share knowledge and align LCA methods. This data infrastructure is a key component in creating a harmonized network of LCA capacity across the federal government.

Digitalization of culturally significant buildings: ensuring high-quality data exchanges in the heritage domain using OpenBIM

Heritage Science, 2022, Vol 10 (1). DOI: 10.1186/s40494-021-00640-y

Author(s): Laurens Jozef Nicolaas Oostwegel, Štefan Jaud, Sergej Muhič, Katja Malovrh Rebec

Keyword(s): Data Exchange, Open Data, Quality Data, Semantic Data, Information Models, Building Information, Heritage Building, Conservation Plan, The Creation, Information Delivery Manual

Cultural heritage building information models (HBIMs) incorporate specific geometric and semantic data that are mandatory for supporting workflows and decision making during a heritage study. The Industry Foundation Classes (IFC) open data exchange standard can be used to migrate these data between different software solutions as an openBIM approach, and it has the potential to mitigate data loss. Specific data-exchange scenarios can be supported by first developing an Information Delivery Manual (IDM) and then filtering portions of the IFC schema to produce a specialized Model View Definition (MVD). This paper showcases the creation of a specialized IDM for the heritage domain in consultation with experts in the restoration and preservation of built heritage. The IDM was then translated into a pilot MVD for heritage. We tested our developments on an HBIM case study, in which a historic building was semantically enriched with information about the case study’s conservation plan and then checked against the specified IDM requirements using the developed MVD. We concluded that creating an IDM, and subsequently an MVD, for the heritage domain is achievable and will bring us one step closer to BIM standardisation in the field of digitised cultural buildings.

GPS-based fine-scale mapping surveys for schistosomiasis assessment: a practical introduction and documentation of field implementation

Infectious Diseases of Poverty, 2022, Vol 11 (1). DOI: 10.1186/s40249-021-00928-y

Author(s): Lydia Trippler, Mohammed Nassor Ali, Shaali Makame Ame, Said Mohammed Ali, Fatma Kabole, ...

Keyword(s): Mobile Applications, Open Data, Household Survey, Fine Scale, Urogenital Schistosomiasis, Related Data, Median Distance, Eligible Household, Fine Scale Mapping, Housing Structures

Background: Fine-scale mapping of schistosomiasis to guide micro-targeting of interventions will gain importance in elimination settings, where the heterogeneity of transmission is often pronounced. Novel mobile applications offer new opportunities for disease mapping. We provide a practical introduction to GPS-based household identification and participant recruitment using tablet-based applications for fine-scale schistosomiasis mapping at sub-district level in a remote area of Pemba, Tanzania, and document the strengths and shortcomings of this approach.

Methods: A community-based household survey for urogenital schistosomiasis assessment was conducted from November 2020 until February 2021 in 20 small administrative areas in Pemba. For the survey, 1400 housing structures were prospectively and randomly selected from shapefile data. To identify the pre-selected structures and collect survey-related data, field enumerators searched for the houses’ geolocations using the mobile applications Open Data Kit (ODK) and MAPS.ME. The number of inhabited and uninhabited structures, the median distance between the pre-selected and recorded locations, and the dropout rates due to non-participation or non-submission of urine samples of sufficient volume for schistosomiasis testing were assessed.

Results: Among the 1400 randomly selected housing structures, 1396 (99.7%) were identified by the enumerators. The median distance between the pre-selected and recorded structures was 5.4 m. A total of 1098 (78.7%) were residential houses. Among them, 99 (9.0%) were dropped due to continuous absence of residents, and 40 (3.6%) households refused to participate. In 797 (83.1%) of the 959 participating households, all eligible household members, or all but one, provided a urine sample of sufficient volume.

Conclusions: The fine-scale mapping approach using a combination of ODK and an offline navigation application installed on tablet computers allows very precise identification of housing structures. Dropouts due to non-residential housing structures, absence, non-participation, and lack of urine need to be considered in survey designs. Our findings can guide the planning and implementation of future household-based mapping or longitudinal surveys and thus support micro-targeting and follow-up of interventions for schistosomiasis control and elimination in remote areas.

Trial registration: ISRCTN, ISRCTN91431493. Registered 11 February 2020, https://www.isrctn.com/ISRCTN91431493
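
The percentages in the Results of this abstract use a different denominator at each step of the recruitment cascade. The short calculation below recomputes them from the reported counts. The step labels are ours, and the denominators are inferred from the reported percentages (for example, 78.7% corresponds to 1098 of the 1396 identified structures rather than of all 1400); the 959 participating households equal the residential houses minus the absent and refusing households.

```python
# Counts reported in the abstract; denominators inferred from the reported percentages.
selected = 1400      # randomly pre-selected housing structures
identified = 1396    # structures found by the enumerators
residential = 1098   # residential houses among the identified structures
absent = 99          # dropped: residents continuously absent
refused = 40         # dropped: household refused to participate
participating = residential - absent - refused  # 959 participating households
complete = 797       # households where all, or all but one, eligible members gave a sufficient urine sample


def pct(part: int, whole: int) -> float:
    # Percentage rounded to one decimal place, as reported in the abstract.
    return round(100 * part / whole, 1)


cascade = {
    "identified / selected": pct(identified, selected),
    "residential / identified": pct(residential, identified),
    "absent / residential": pct(absent, residential),
    "refused / residential": pct(refused, residential),
    "complete / participating": pct(complete, participating),
}
for step, value in cascade.items():
    print(f"{step}: {value}%")
```

Running it reproduces the reported 99.7%, 78.7%, 9.0%, 3.6%, and 83.1%, which is a quick consistency check when planning dropout allowances for a similar survey.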

Handling qualitative preferences in SPARQL over virtual ontology-based data access

Semantic Web, 2022, pp. 1-24. DOI: 10.3233/sw-212895

Author(s): Marlene Goncalves, David Chaves-Fraga, Oscar Corcho

Keyword(s): Open Data, Scoring Function, Data Access, Heterogeneous Data, Database Management System, Query Complexity, Distribution Data, Preference Queries, Preference Criteria, Qualitative Preferences

With the increasing volume of data in heterogeneous datasets published under Open Data initiatives, new operators are needed to help users find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate, because they require the user to assign weights to a scoring function, and these weights may not be known beforehand. Under the qualitative approach, which includes the well-known skyline, preference criteria are in certain cases more intuitive and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries over an Ontology-Based Data Access (OBDA) approach, which provides uniform access over multiple heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preferences by directly querying relational databases. The framework implements a technique that translates SPARQL qualitative preference queries directly into queries that a relational database management system can evaluate. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of our technique in comparison with state-of-the-art techniques. The results suggest that execution time can be reduced by up to two orders of magnitude compared with current techniques, while scaling up to larger datasets and identifying the result set precisely.
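
To make the contrast between the two query styles concrete, the sketch below ranks a few hypothetical hotels by price and distance in both ways. A weighted top-k query needs a weight for each criterion, while the skyline returns every item that no other item dominates (no worse on all criteria and strictly better on at least one), with no weights at all. The data, the function names, and the naive in-memory pairwise check are illustrative assumptions; this is not Morph-Skyline++, which translates SPARQL preference queries into queries evaluated by a relational database.

```python
from typing import List, Tuple

# Hypothetical data: (name, price, distance_km); lower is better for both criteria.
Hotel = Tuple[str, float, float]

hotels: List[Hotel] = [
    ("A", 120.0, 0.5),
    ("B", 80.0, 2.0),
    ("C", 150.0, 0.4),
    ("D", 90.0, 2.5),
    ("E", 200.0, 3.0),
]


def dominates(p: Hotel, q: Hotel) -> bool:
    # p dominates q if p is no worse on every criterion and strictly better on at least one.
    no_worse = p[1] <= q[1] and p[2] <= q[2]
    strictly_better = p[1] < q[1] or p[2] < q[2]
    return no_worse and strictly_better


def skyline(items: List[Hotel]) -> List[Hotel]:
    # Naive O(n^2) pairwise check: keep every item that no other item dominates.
    # An item never dominates itself, so no self-exclusion is needed.
    return [q for q in items if not any(dominates(p, q) for p in items)]


def top_k(items: List[Hotel], k: int, w_price: float, w_dist: float) -> List[Hotel]:
    # Quantitative alternative: the user must pick weights for a linear scoring function.
    return sorted(items, key=lambda h: w_price * h[1] + w_dist * h[2])[:k]


print([h[0] for h in skyline(hotels)])
print([h[0] for h in top_k(hotels, 2, 1.0, 10.0)])
print([h[0] for h in top_k(hotels, 2, 1.0, 100.0)])
```

The skyline here is A, B, and C. The top-2 answer changes with the weights (B and D for one choice, A and C for another), and the first weighting even returns D although B beats it on both criteria, which illustrates why a qualitative formulation can be more natural when weights are not known beforehand.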

An Operational Analysis Ready Radar Backscatter Dataset for the African Continent

Remote Sensing, 2022, Vol 14 (2), pp. 351. DOI: 10.3390/rs14020351

Author(s): Fang Yuan, Marko Repse, Alex Leith, Ake Rosenqvist, Grega Milcinski, ...

Keyword(s): Open Data, Data Cube, Observation Data, Cloud Infrastructure, Resource Monitoring, Radar Backscatter, Digital Earth, Continental Scale, Wide Range, Natural Resource Monitoring

Digital Earth Africa is now providing an operational Sentinel-1 normalized radar backscatter dataset for Africa. This is the first free and open continental-scale analysis ready dataset of its kind to be developed in compliance with the CEOS Analysis Ready Data for Land (CARD4L) specification for normalized radar backscatter (NRB) products. A partnership with Sinergise, a European geospatial company and Earth observation data provider, has ensured that this dataset is produced efficiently in cloud infrastructure and can be sustained in the long term. The workflow applies radiometric terrain correction (RTC) to the Sentinel-1 ground range detected (GRD) product, using the Copernicus 30 m digital elevation model (DEM). The method has been used to generate data for a range of sites around the world and has been validated as producing good results. The dataset over Africa is publicly available as an AWS public dataset and can be accessed through the Digital Earth Africa platform and its Open Data Cube API. We expect this dataset to support a wide range of applications across Africa, including natural resource monitoring, agriculture, and land cover mapping.

Open Data: Recently Published Documents

Unifying telescope and microscope: A multi-lens framework with open data for modeling emerging events

Water in the West: Trends, production efficiency, and a call for open data

Automated Annotations for AI Data and Model Transparency

An open science and open data approach for the statistically robust estimation of forest disturbance areas

‘Trust Us’: Open Data and Preregistration in Political Science and International Relations

The LCA Commons—How an Open-Source Repository for US Federal Life Cycle Assessment (LCA) Data Products Advances Inter-Agency Coordination

Digitalization of culturally significant buildings: ensuring high-quality data exchanges in the heritage domain using OpenBIM

GPS-based fine-scale mapping surveys for schistosomiasis assessment: a practical introduction and documentation of field implementation

Handling qualitative preferences in SPARQL over virtual ontology-based data access

An Operational Analysis Ready Radar Backscatter Dataset for the African Continent
