Automated Annotations for AI Data and Model Transparency

2022 ◽  
Vol 14 (1) ◽  
pp. 1-9
Author(s):  
Saravanan Thirumuruganathan ◽  
Mayuresh Kunjir ◽  
Mourad Ouzzani ◽  
Sanjay Chawla

The data and Artificial Intelligence revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI engineering development is becoming increasingly democratized. Open source frameworks have enabled even individual developers to engineer sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from domains such as data transparency, data quality, and data governance, among others, to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both the data and the AI model), which has a number of appealing properties. The annotations could be used by enterprises to gain visibility into potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate its key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.
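To make the annotation idea concrete, one could imagine each automatically produced annotation as a small structured record attached to a dataset or model. The sketch below is purely illustrative and not taken from the authors' architecture; all names (`Annotation`, `subject`, `facet`, and the example file `loans.csv`) are hypothetical.

```python
# Illustrative sketch only: a minimal record type for automated
# annotations of datasets or models, in the spirit of the approach
# described above. All names here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Annotation:
    subject: str            # artifact being annotated, e.g. a dataset file
    facet: str              # what the annotation measures, e.g. "missing_values"
    value: object           # the finding itself (a number, dict, flag, ...)
    producer: str = "auto"  # which automated profiler emitted the record
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# An enterprise could aggregate such records into a data transparency
# report or check them against compliance policies downstream.
ann = Annotation("loans.csv", "missing_values", {"income": 0.12})
print(ann.subject, ann.facet, ann.value["income"])
```

Keeping annotations as uniform records like this is one way the same machinery could serve several of the uses the abstract lists: visibility into issues, transparency reports, and readiness checks.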

2021 ◽  
Vol 251 ◽  
pp. 01004
Author(s):  
Kati Lassila-Perini ◽  
Clemens Lange ◽  
Edgar Carrera Jarrin ◽  
Matthew Bellis

The CMS experiment at CERN has released research-quality data from particle collisions at the LHC since 2014. Almost all data from the first LHC run in 2010–2012 with the corresponding simulated samples are now in the public domain, and several scientific studies have been performed using these data. This paper summarizes the available data and tools, reviews the challenges in using them in research, and discusses measures to improve their usability.


Author(s):  
Ronald F. Wright

Community prosecution seeks input from local groups to shape the priorities of the prosecutor’s office. Prosecutors who listen to the community aim to develop a relationship of trust between the community and the local prosecutor’s office; such outreach is especially valuable in connection with racial minority groups with a history of negative experiences with criminal justice actors. A community prosecution strategy calls for the office to work with community partners both upstream and downstream from the criminal courtroom. The upstream efforts involve diversion of defendants out of criminal proceedings and into treatment and accountability programs outside the courts. Downstream efforts include programs to promote the smooth re-entry of people returning to the community after serving a criminal sentence. Community prosecution is best accomplished in offices committed to collection and use of data, transparency, and accountability to the public.


2021 ◽  
Vol 1 (1) ◽  
pp. 1-32
Author(s):  
Sage Cammers-Goodwin ◽  
Naomi Van Stralen

“Transparency” is continually set as a core value for cities as they digitalize. Global initiatives and regulations claim that transparency will be key to making smart cities ethical. Unfortunately, how exactly to achieve a transparent city is quite opaque. Current regulations often mandate that information be made accessible only in the case of personal data collection. While such standards might encourage anonymization techniques, they do not ensure that publicly collected data be made publicly visible or treated as an issue of public concern. This paper covers three main needs for data transparency in public space. The first, why data visibility is important, sets the stage for why transparency cannot be based solely on personal as opposed to anonymous data collection, and for what counts as making data transparent. The second, how to make data visible onsite, addresses how to create public space that communicates its sensing capabilities without overwhelming the public. The final, what regulations are necessary for data visibility, argues that for a transparent public space, government needs to step in to regulate contextual open data sharing, data registries, signage, and data literacy education.


2020 ◽  
pp. 1-21
Author(s):  
Shaida Badiee ◽  
Jamison Crowell ◽  
Lorenz Noe ◽  
Amelia Pittman ◽  
Caleb Rudow ◽  
...  

For data that are collected and managed by national statistical offices to reach their full potential and benefit society, they must be made available to the public as open data. In the simplest terms, open data are data that can be freely used, modified, and shared by anyone for any purpose. This paper reviews the development of standards for the production and dissemination of open data. It discusses the implementation of these standards in national statistical systems and reviews tool kits, readiness assessments, and maturity models that are available to guide national statistical offices in the adoption of open data. The demand for open data has created challenges for official statistics, but it has also raised the profile of the statistical office and points to a new and expanded role for statistical offices as data brokers and data stewards. The paper concludes with a discussion of how open data in official statistics can be used to improve data governance.


Semantic Web ◽  
2021 ◽  
pp. 1-32
Author(s):  
Houcemeddine Turki ◽  
Mohamed Ali Hadj Taieb ◽  
Thomas Shafee ◽  
Tiago Lubiana ◽  
Dariusz Jemielniak ◽  
...  

Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenges and opportunities. Wikidata is an interdisciplinary, multilingual, open collaborative knowledge base of more than 90 million entities connected by well over a billion relationships. It acts as a web-scale platform for broader computer-supported cooperative work and linked open data, since it can be written to and queried in multiple ways in near real time by specialists, automated tools and the public. The main query language, SPARQL, is a semantic language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format. Here, we introduce four aspects of Wikidata that enable it to serve as a knowledge base for general information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The rich knowledge graph created for COVID-19 in Wikidata can be visualized, explored, and analyzed for purposes like decision support as well as educational and scholarly research.
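As a concrete illustration of the querying the abstract describes, the sketch below builds a SPARQL request against Wikidata's public query endpoint. The query shape (items whose main subject, property P921, is COVID-19, item Q84263196) is a common pattern for COVID-19-related retrieval, but this particular query is my own minimal example, not one from the paper; running it would require network access, so only the request URL is constructed here.

```python
# A minimal sketch of preparing a SPARQL query for the Wikidata Query
# Service. The query itself is an illustrative example, not taken from
# the paper.
import urllib.parse

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata endpoint

# Retrieve up to ten items whose main subject (P921) is COVID-19
# (Q84263196), together with their English labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P921 wd:Q84263196 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


url = build_request_url(ENDPOINT, QUERY)
print(url[:80])
```

Sending this URL with any HTTP client would return JSON bindings for `?item` and `?itemLabel`; the same query, pasted into query.wikidata.org, can also be rendered as a graph or map, which is the kind of visualization and exploration the abstract refers to.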


Author(s):  
Hannah Hamilton ◽  
Stefano De Paoli ◽  
Anna Wilson ◽  
Greg Singh

In this paper we discuss the challenges and opportunities of giving open data a life beyond its original public release. Indeed, it is often unclear whether open data has a life beyond the purpose it was initially collected for, to the extent that some authors have described the public reuse of government data as no more than a “myth”. We present the results of the Data Commons Scotland project, launched with the idea of creating an Internet-based prototype platform for a trustworthy commons of open data, thus facilitating a life for data beyond that of its original producer. We discuss the results of our empirical research for the project, based on 31 qualitative interviews with a range of actors, such as data producers and citizens. Moreover, we present the results of the co-design process for the Data Commons Scotland platform. Drawing on our analysis, we reflect on the challenges of building Internet-based platforms for open data that support the generation of a commons.


2016 ◽  
Vol 12 (2) ◽  
Author(s):  
Aurelie Larquemin ◽  
Jyoti Prasad Mukhopadhyay ◽  
Sharon Buteau

Public entities are among the main producers of socio-economic data around the world. The Open Government Data (OGD) movement encourages these entities to make their data publicly available in order to improve transparency and accountability, which may lead to good governance. OGD can thus promote evidence-based public policy by supporting empirical research through the availability of quality data. In this paper we discuss the current status of the OGD initiative in India, how its principles are considered and applied by the public authorities, and the feedback of the research community about OGD in India.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Han Zhang ◽  
Ying Bi ◽  
Fei Kang ◽  
Zhong Wang

Purpose – The purpose of this paper is to analyze the factors influencing the behaviors of government officials during the implementation of open government data (OGD). By identifying and understanding the key factors that determine government officials' adoption of OGD in China, this study can create a valuable reference for other countries in their decision-making regarding government implementation of OGD.
Design/methodology/approach – This research collected data through in-depth interviews with government officials in Chinese OGD departments. Through these interviews, the authors consulted 15 administrators from departments responsible for information tasks in Beijing and other cities on their opinions about OGD. The authors also interviewed senior executives from information technology (IT) companies, as well as open data policy scholars from big data alliances and research institutions.
Findings – This paper provides insights on how to motivate government officials in OGD implementation: (1) strengthen social supervision of the environment by developing and publishing OGD technology roadmaps, thereby attracting the public to participate actively in the implementation of OGD; (2) establish an OGD assessment mechanism for government officials, with bonus motivations, position promotion incentives, and spiritual incentives via regional or sector rankings; (3) alleviate the risks of officials' OGD decisions in actual practice, using the institutional construction of OGD to guide its direction and strengthen security protection.
Originality/value – This paper fulfils an identified need to study how government officials' behavior can be motivated toward OGD implementation.


2020 ◽  
Author(s):  
Maximilian Heimstädt

With the rise of digital technologies, organizations are able to produce, process, and transfer large amounts of information at marginal cost. In recent years, these technological developments, together with other macro-phenomena like globalization and rising distrust of institutions, have led to unprecedented public expectations regarding organizational transparency. In this study I explore the ways in which organizations resolve the tension between a growing norm to share internal information with the public and their inherent preference for informational control. By developing the notion of transparency decoupling, I examine how organizations respond strategically to transparency expectations. Drawing on studies of “open data” transparency initiatives in NYC, London, and Berlin, I inductively carve out three modes of institutional information decoupling: (a) selecting the disclosed information to exclude parts of the data or parts of the audience; (b) bending the information in order to retain some control over its representative value; and (c) orchestrating new information for a particular audience. The article integrates literature from New Institutional Theory and Transparency Studies to contribute to our understanding of how information sharing is realized in the interaction between organizations and their environment.


2021 ◽  
Vol 13 (15) ◽  
pp. 8239
Author(s):  
Grazia Concilio ◽  
Francesco Molinari

The paper identifies a contradiction between data openness and economic value, possibly hiding a ‘market failure’ that requires more active intervention from the public hand. Though the sheer quantity of data available for free use is steadily increasing worldwide, its average quality usually stays well below the minimum threshold required for value creation. In contrast, there is now growing evidence that the use of data has enormous potential for the economy and society, including research and the progress of science. Unfortunately, useful datasets are usually locked in and, when actually made accessible, suffer from the same limitations mentioned above. Perhaps the time is ripe to place less emphasis on the generalized disclosure of government data and more on the appropriately incentivized and targeted creation of actionable bases for new IT applications. We present four cases touching upon the issues and potentials of service design, urban innovation, and data-related policies. We identify two possible ways of tackling the highlighted market failure: direct subsidies to government bodies or agencies engaged in disclosing their own datasets and keeping them clean and accessible over time, or new regulations that establish more productive data ecosystems, rewarding knowledge creation rather than mere data ownership.

