open data
Recently Published Documents

2022 ◽  
Vol 306 ◽  
pp. 114330
Britta L. Schumacher ◽  
Matt A. Yost ◽  
Emily K. Burchfield ◽  
Niel Allen

2022 ◽  
Vol 14 (1) ◽  
pp. 1-9
Saravanan Thirumuruganathan ◽  
Mayuresh Kunjir ◽  
Mourad Ouzzani ◽  
Sanjay Chawla

The data and Artificial Intelligence revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals and releasing large amounts of data to the public. Second, AI engineering is becoming increasingly democratized: open source frameworks enable even an individual developer to build sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from domains such as data transparency, data quality, and data governance to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both the data and the AI model), which has a number of appealing properties. Enterprises could use the annotations to gain visibility into potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate the key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.
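The annotation idea can be made concrete with a small sketch. Everything below (the class, the fields, the particular checks) is hypothetical and only illustrates the kind of automated data annotation the article advocates:

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: class name, fields, and checks are invented
# to illustrate the kind of automated annotation the article advocates.

@dataclass
class DatasetAnnotation:
    name: str
    source: str
    issues: list = field(default_factory=list)

def annotate(name: str, source: str, rows: list) -> DatasetAnnotation:
    """Scan tabular rows (a list of dicts) and record potential issues."""
    ann = DatasetAnnotation(name=name, source=source)
    if rows:
        for col in rows[0].keys():
            missing = sum(1 for r in rows if r.get(col) in (None, ""))
            if missing:
                ann.issues.append(f"{col}: {missing} missing values")
            if col.lower() in {"ssn", "email", "phone"}:  # crude PII heuristic
                ann.issues.append(f"{col}: possible personal data")
    return ann

rows = [{"age": 34, "email": "a@x.org"}, {"age": None, "email": ""}]
ann = annotate("census-sample", "open-data-portal", rows)
print(ann.issues)
```

An annotation of this shape could feed a transparency report or a policy-compliance check before the data reach a downstream AI application.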

Saverio Francini ◽  
Ronald E. McRoberts ◽  
Giovanni D'Amico ◽  
Nicholas C. Coops ◽  
Txomin Hermosilla ◽  

2022 ◽  
Bermond Scoggins ◽  
Matthew Peter Robertson

The scientific method is predicated on transparency -- yet the pace at which transparent research practices are being adopted by the scientific community is slow. The replication crisis in psychology showed that published findings employing statistical inference are threatened by undetected errors, data manipulation, and data falsification. To mitigate these problems and bolster research credibility, open data and preregistration have increasingly been adopted in the natural and social sciences. While many political science and international relations journals have committed to implementing these reforms, the extent of open science practices is unknown. We bring large-scale text analysis and machine learning classifiers to bear on the question. Using population-level data -- 93,931 articles across the top 160 political science and IR journals between 2010 and 2021 -- we find that approximately 21% of all statistical inference papers have open data, and 5% of all experiments are preregistered. Despite this shortfall, the example of leading journals in the field shows that change is feasible and can be effected quickly.
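As a much-simplified illustration of the detection task: the study itself trains machine learning classifiers on 93,931 articles, whereas the patterns below are invented keyword heuristics that merely show the shape of the problem.

```python
import re

# Invented keyword heuristics, not the authors' trained classifiers.
OPEN_DATA_PATTERNS = [
    r"replication (data|files|materials)",
    r"data (are|is) (publicly )?available",
    r"dataverse|osf\.io|github\.com",
]
PREREG_PATTERNS = [r"pre-?registered", r"pre-?registration"]

def label_article(text: str) -> dict:
    """Flag whether an article's text mentions open data or preregistration."""
    text = text.lower()
    return {
        "open_data": any(re.search(p, text) for p in OPEN_DATA_PATTERNS),
        "preregistered": any(re.search(p, text) for p in PREREG_PATTERNS),
    }

result = label_article("Replication data are available on the Harvard Dataverse.")
print(result)
```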

2022 ◽  
Vol 12 (2) ◽  
pp. 865
Ezra Kahn ◽  
Erin Antognoli ◽  
Peter Arbuckle

Life cycle assessment (LCA) is a flexible and powerful tool for quantifying the total environmental impact of a product or service from cradle-to-grave. The US federal government has developed deep expertise in environmental LCA for a range of applications including policy, regulation, and emerging technologies. LCA professionals from across the government have been coordinating the distributed LCA expertise through a community of practice known as the Federal LCA Commons. The Federal LCA Commons has developed open data infrastructure and workflows to share knowledge and align LCA methods. This data infrastructure is a key component to creating a harmonized network of LCA capacity from across the federal government.

2022 ◽  
Vol 10 (1) ◽  
Laurens Jozef Nicolaas Oostwegel ◽  
Štefan Jaud ◽  
Sergej Muhič ◽  
Katja Malovrh Rebec

Abstract Cultural heritage building information models (HBIMs) incorporate specific geometric and semantic data that are mandatory for supporting the workflows and decision making during a heritage study. The Industry Foundation Classes (IFC) open data exchange standard can be used to migrate these data between different software solutions as an openBIM approach, and has the potential to mitigate data loss. Specific data-exchange scenarios can be supported by firstly developing an Information Delivery Manual (IDM) and subsequently filtering portions of the IFC schema to produce a specialized Model View Definition (MVD). This paper showcases the creation of a specialized IDM for the heritage domain in consultation with experts in the restoration and preservation of built heritage. The IDM was then translated into a pilot MVD for heritage. We tested our developments on an HBIM case study, where a historic building was semantically enriched with information about the case study’s conservation plan and then checked against the specified IDM requirements using the developed MVD. We concluded that the creation of an IDM and then an MVD for the heritage domain is achievable and will bring us one step closer to BIM standardisation in the field of digitised cultural buildings.
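A minimal sketch of what an IDM-style requirement check might look like. The property-set and property names below are hypothetical; a real implementation would run against an IFC model (e.g. via ifcopenshell) filtered through the MVD, rather than a mocked dict.

```python
# Hypothetical sketch of an IDM-style requirement check. A real
# implementation would check IFC entities filtered through the MVD;
# here a building element is mocked as a dict of property sets, and
# the required property names are invented.

REQUIRED = {
    "Pset_HeritageConservation": ["ConservationState", "InterventionPlan"],
}

def check_idm(element: dict) -> list:
    """Return fully qualified names of required properties the element lacks."""
    missing = []
    for pset, props in REQUIRED.items():
        values = element.get(pset, {})
        missing += [f"{pset}.{p}" for p in props if p not in values]
    return missing

wall = {"Pset_HeritageConservation": {"ConservationState": "degraded"}}
print(check_idm(wall))
```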

2022 ◽  
Vol 11 (1) ◽  
Lydia Trippler ◽  
Mohammed Nassor Ali ◽  
Shaali Makame Ame ◽  
Said Mohammed Ali ◽  
Fatma Kabole ◽  

Abstract
Background: Fine-scale mapping of schistosomiasis to guide micro-targeting of interventions will gain importance in elimination settings, where the heterogeneity of transmission is often pronounced. Novel mobile applications offer new opportunities for disease mapping. We provide a practical introduction to, and documentation of, the strengths and shortcomings of GPS-based household identification and participant recruitment using tablet-based applications for fine-scale schistosomiasis mapping at sub-district level in a remote area in Pemba, Tanzania.
Methods: A community-based household survey for urogenital schistosomiasis assessment was conducted from November 2020 until February 2021 in 20 small administrative areas in Pemba. For the survey, 1400 housing structures were prospectively and randomly selected from shapefile data. To identify the pre-selected structures and collect survey-related data, field enumerators searched for the houses’ geolocations using the mobile applications Open Data Kit (ODK) and MAPS.ME. The number of inhabited and uninhabited structures, the median distance between the pre-selected and recorded locations, and the dropout rates due to non-participation or non-submission of urine samples of sufficient volume for schistosomiasis testing were assessed.
Results: Among the 1400 randomly selected housing structures, 1396 (99.7%) were identified by the enumerators. The median distance between the pre-selected and recorded structures was 5.4 m. A total of 1098 (78.7%) were residential houses. Among them, 99 (9.0%) were dropped due to the continuous absence of residents, and 40 (3.6%) households refused to participate. In 797 (83.1%) of the 959 participating households, all eligible household members or all but one provided a urine sample of sufficient volume.
Conclusions: The fine-scale mapping approach using a combination of ODK and an offline navigation application installed on tablet computers allows very precise identification of housing structures. Dropouts due to non-residential housing structures, absence, non-participation and lack of urine need to be considered in survey designs. Our findings can guide the planning and implementation of future household-based mapping or longitudinal surveys and thus support micro-targeting and follow-up of interventions for schistosomiasis control and elimination in remote areas.
Trial registration: ISRCTN91431493. Registered 11 February 2020.
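Distances like the reported 5.4 m median between pre-selected and recorded locations are commonly computed per household with the haversine formula. A minimal sketch with illustrative coordinates (not study data):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two points on Pemba offset by 0.00005 degrees of latitude (roughly 5.6 m);
# the coordinates are illustrative only.
d = haversine_m(-5.2000, 39.7500, -5.20005, 39.7500)
print(round(d, 1))
```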

Semantic Web ◽  
2022 ◽  
pp. 1-24
Marlene Goncalves ◽  
David Chaves-Fraga ◽  
Oscar Corcho

With the increase of data volume in the heterogeneous datasets being published through Open Data initiatives, new operators are needed to help users find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate, as they require the user to assign weights, which may not be known beforehand, to a scoring function. Under the qualitative approach, which includes the well-known skyline operator, preference criteria are more intuitive in certain cases and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries over an Ontology-Based Data Access (OBDA) approach, which provides uniform access over multiple heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preference queries by directly querying relational databases. Our framework implements a technique that translates SPARQL qualitative preference queries directly into queries that can be evaluated by a relational database management system. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of our technique in comparison with state-of-the-art techniques. The results suggest that execution time can be reduced by up to two orders of magnitude compared with current techniques, scaling up to larger datasets while precisely identifying the result set.
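The skyline operator at the core of this work keeps every point that no other point dominates, with no weights or scoring function required. A toy in-memory sketch (Morph-Skyline++ itself translates such queries for evaluation inside the relational database):

```python
# Toy illustration of the skyline operator: a point is in the skyline if
# no other point dominates it, i.e. is better or equal in every criterion
# and strictly better in at least one. Here "better" simply means smaller.

def dominates(a, b):
    """True if a dominates b under minimisation of every criterion."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (price, distance to centre in km) for four hypothetical hotels
hotels = [(50, 8), (60, 2), (40, 9), (65, 7)]
print(skyline(hotels))  # (65, 7) is dominated by (60, 2) and drops out
```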

2022 ◽  
Vol 14 (2) ◽  
pp. 351
Fang Yuan ◽  
Marko Repse ◽  
Alex Leith ◽  
Ake Rosenqvist ◽  
Grega Milcinski ◽  

Digital Earth Africa now provides an operational Sentinel-1 normalized radar backscatter dataset for Africa. It is the first free and open continental-scale analysis ready data of this kind developed to be compliant with the CEOS Analysis Ready Data for Land (CARD4L) specification for normalized radar backscatter (NRB) products. A partnership with Sinergise, a European geospatial company and Earth observation data provider, ensures that the dataset is produced efficiently on cloud infrastructure and can be sustained in the long term. The workflow applies radiometric terrain correction (RTC) to the Sentinel-1 ground range detected (GRD) product, using the Copernicus 30 m digital elevation model (DEM). The method has been used to generate data for a range of sites around the world and has been validated as producing good results. The dataset over Africa is made publicly available as an AWS public dataset and can be accessed through the Digital Earth Africa platform and its Open Data Cube API. We expect this dataset to support a wide range of applications, including natural resource monitoring, agriculture, and land cover mapping across Africa.
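Radar backscatter in NRB products is typically distributed in linear power units, and converting to decibels is a common first step when analysing such data. A minimal sketch (illustrative only, not the Digital Earth Africa API):

```python
import math

# Sketch only: backscatter here is assumed to be in linear power units,
# as is typical for normalized radar backscatter products.

def linear_to_db(gamma0: float) -> float:
    """Convert linear-power backscatter to decibels."""
    return 10.0 * math.log10(gamma0)

def db_to_linear(db: float) -> float:
    """Convert decibel backscatter back to linear power."""
    return 10.0 ** (db / 10.0)

print(linear_to_db(0.01))  # 0.01 in linear power is -20 dB
```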
