Automated Annotations for AI Data and Model Transparency

2022 ◽  
Vol 14 (1) ◽  
pp. 1-9
Author(s):  
Saravanan Thirumuruganathan ◽  
Mayuresh Kunjir ◽  
Mourad Ouzzani ◽  
Sanjay Chawla

The data and Artificial Intelligence revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI engineering development is becoming increasingly democratized. Open source frameworks have enabled even individual developers to engineer sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from domains such as data transparency, data quality, and data governance, among others, to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both the data and the AI model), which has a number of appealing properties. The annotations could be used by enterprises to gain visibility into potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate its key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.
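To make the annotation idea concrete, one could imagine each automatically produced annotation as a small structured record attached to a dataset or model. The sketch below is purely illustrative and not taken from the authors' architecture; all names (`Annotation`, `subject`, `facet`, and the example file `loans.csv`) are hypothetical.

```python
# Illustrative sketch only: a minimal record type for automated
# annotations of datasets or models, in the spirit of the approach
# described above. All names here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Annotation:
    subject: str            # artifact being annotated, e.g. a dataset file
    facet: str              # what the annotation measures, e.g. "missing_values"
    value: object           # the finding itself (a number, dict, flag, ...)
    producer: str = "auto"  # which automated profiler emitted the record
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# An enterprise could aggregate such records into a data transparency
# report or check them against compliance policies downstream.
ann = Annotation("loans.csv", "missing_values", {"income": 0.12})
print(ann.subject, ann.facet, ann.value["income"])
```

Keeping annotations as uniform records like this is one way the same machinery could serve several of the uses the abstract lists: visibility into issues, transparency reports, and readiness checks.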

2021 ◽  
Vol 251 ◽  
pp. 01004
Author(s):  
Kati Lassila-Perini ◽  
Clemens Lange ◽  
Edgar Carrera Jarrin ◽  
Matthew Bellis

The CMS experiment at CERN has released research-quality data from particle collisions at the LHC since 2014. Almost all data from the first LHC run in 2010–2012 with the corresponding simulated samples are now in the public domain, and several scientific studies have been performed using these data. This paper summarizes the available data and tools, reviews the challenges in using them in research, and discusses measures to improve their usability.


Author(s):  
Ronald F. Wright

Community prosecution seeks input from local groups to shape the priorities of the prosecutor’s office. Prosecutors who listen to the community aim to develop a relationship of trust between the community and the local prosecutor’s office; such outreach is especially valuable in connection with racial minority groups with a history of negative experiences with criminal justice actors. A community prosecution strategy calls for the office to work with community partners both upstream and downstream from the criminal courtroom. The upstream efforts involve diversion of defendants out of criminal proceedings and into treatment and accountability programs outside the courts. Downstream efforts include programs to promote the smooth re-entry of people returning to the community after serving a criminal sentence. Community prosecution is best accomplished in offices committed to collection and use of data, transparency, and accountability to the public.


2021 ◽  
Vol 1 (1) ◽  
pp. 1-32
Author(s):  
Sage Cammers-Goodwin ◽  
Naomi Van Stralen

“Transparency” is continually set as a core value for cities as they digitalize. Global initiatives and regulations claim that transparency will be key to making smart cities ethical. Unfortunately, how exactly to achieve a transparent city is quite opaque. Current regulations often mandate that information be made accessible only in the case of personal data collection. While such standards might encourage anonymization techniques, they do not ensure that publicly collected data be made publicly visible or treated as an issue of public concern. This paper covers three main needs for data transparency in public space. The first, why data visibility is important, sets the stage for why transparency cannot be based solely on personal as opposed to anonymous data collection, and for what counts as making data transparent. The second, how to make data visible onsite, addresses how to create public space that communicates its sensing capabilities without overwhelming the public. The final, what regulations are necessary for data visibility, argues that for a transparent public space, government needs to step in to regulate contextual open data sharing, data registries, signage, and data literacy education.


2020 ◽  
pp. 1-21
Author(s):  
Shaida Badiee ◽  
Jamison Crowell ◽  
Lorenz Noe ◽  
Amelia Pittman ◽  
Caleb Rudow ◽  
...  

For data that are collected and managed by national statistical offices to reach their full potential and benefit society, they must be made available to the public as open data. In the simplest terms, open data are data that can be freely used, modified, and shared by anyone for any purpose. This paper reviews the development of standards for the production and dissemination of open data. It discusses the implementation of these standards in national statistical systems and reviews tool kits, readiness assessments, and maturity models that are available to guide national statistical offices in the adoption of open data. The demand for open data has created challenges for official statistics, but it has also raised the profile of the statistical office and points to a new and expanded role for statistical offices as data brokers and data stewards. The paper concludes with a discussion of how open data in official statistics can be used to improve data governance.


Semantic Web ◽  
2021 ◽  
pp. 1-32
Author(s):  
Houcemeddine Turki ◽  
Mohamed Ali Hadj Taieb ◽  
Thomas Shafee ◽  
Tiago Lubiana ◽  
Dariusz Jemielniak ◽  
...  

Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenges and opportunities. Wikidata is an interdisciplinary, multilingual, open collaborative knowledge base of more than 90 million entities connected by well over a billion relationships. It acts as a web-scale platform for broader computer-supported cooperative work and linked open data, since it can be written to and queried in multiple ways in near real time by specialists, automated tools and the public. The main query language, SPARQL, is a semantic language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format. Here, we introduce four aspects of Wikidata that enable it to serve as a knowledge base for general information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The rich knowledge graph created for COVID-19 in Wikidata can be visualized, explored, and analyzed for purposes like decision support as well as educational and scholarly research.
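As a concrete illustration of the querying the abstract describes, the sketch below builds a SPARQL request against Wikidata's public query endpoint. The query shape (items whose main subject, property P921, is COVID-19, item Q84263196) is a common pattern for COVID-19-related retrieval, but this particular query is my own minimal example, not one from the paper; running it would require network access, so only the request URL is constructed here.

```python
# A minimal sketch of preparing a SPARQL query for the Wikidata Query
# Service. The query itself is an illustrative example, not taken from
# the paper.
import urllib.parse

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata endpoint

# Retrieve up to ten items whose main subject (P921) is COVID-19
# (Q84263196), together with their English labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P921 wd:Q84263196 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


url = build_request_url(ENDPOINT, QUERY)
print(url[:80])
```

Sending this URL with any HTTP client would return JSON bindings for `?item` and `?itemLabel`; the same query, pasted into query.wikidata.org, can also be rendered as a graph or map, which is the kind of visualization and exploration the abstract refers to.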


Author(s):  
Hannah Hamilton ◽  
Stefano De Paoli ◽  
Anna Wilson ◽  
Greg Singh

In this paper we discuss the challenges and opportunities of giving open data a life beyond its original public release. Indeed, it is often unclear whether open data has a life beyond the purpose it was initially collected for, to the extent that some authors have described the public reuse of government data as no more than a “myth”. We present the results of the Data Commons Scotland project, launched with the idea of creating an Internet-based prototype platform for a trustworthy commons of open data, thus facilitating a life for data beyond that of its original producer. We discuss the results of our empirical research for the project, based on 31 qualitative interviews with a range of actors, such as data producers and citizens. Moreover, we present the results of the co-design process for the Data Commons Scotland platform. Drawing on our analysis, we reflect on the challenges of building Internet-based platforms for open data that support the generation of a commons.


2016 ◽  
Vol 12 (2) ◽  
Author(s):  
Aurelie Larquemin ◽  
Jyoti Prasad Mukhopadhyay ◽  
Sharon Buteau

Public entities are among the main producers of socio-economic data around the world. The Open Government Data (OGD) movement encourages these entities to make their data publicly available in order to improve transparency and accountability, which may lead to good governance. OGD can thus promote evidence-based public policy by supporting empirical research through the availability of quality data. In this paper we discuss the current status of the OGD initiative in India, how its principles are considered and applied by the public authorities, and the feedback of the research community about OGD in India.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Han Zhang ◽  
Ying Bi ◽  
Fei Kang ◽  
Zhong Wang

Purpose – The purpose of this paper is to analyze the factors influencing the behaviors of government officials during the implementation of open government data (OGD). By identifying and understanding the key factors that determine government officials' adoption of OGD in China, this study can create a valuable reference for other countries in their decision-making regarding government implementation of OGD.
Design/methodology/approach – This research collected data through in-depth interviews with government officials in Chinese OGD departments. Through these interviews, the authors consulted 15 administrators from departments responsible for information tasks in Beijing and other cities on their opinions about OGD. The authors also interviewed senior executives from information technology (IT) companies, as well as open data policy scholars from big data alliances and research institutions.
Findings – This paper provides insights on how to motivate government officials in OGD implementation: (1) strengthen social supervision of the environment by developing and publishing OGD technology roadmaps, thereby attracting the public to participate actively in the implementation of OGD; (2) establish an OGD assessment mechanism for government officials, with bonus motivations, position promotion incentives, and spiritual incentives via regional or sector rankings; (3) alleviate the risks of officials' OGD decisions in actual practice, using the institutional construction of OGD to guide its direction and strengthen security protection.
Originality/value – This paper fulfils an identified need to study how government officials' behavior can be motivated toward OGD implementation.


2020 ◽  
Author(s):  
Maximilian Heimstädt

With the rise of digital technologies, organizations are able to produce, process, and transfer large amounts of information at marginal cost. In recent years, these technological developments, together with other macro-phenomena like globalization and rising distrust of institutions, have led to unprecedented public expectations regarding organizational transparency. In this study I explore the ways in which organizations resolve the tension between a growing norm to share internal information with the public and their inherent preference for informational control. By developing the notion of transparency decoupling, I examine how organizations respond strategically to transparency expectations. Drawing on studies of “open data” transparency initiatives in NYC, London, and Berlin, I inductively carve out three modes of institutional information decoupling: (a) selecting the disclosed information to exclude parts of the data or parts of the audience; (b) bending the information in order to retain some control over its representative value; and (c) orchestrating new information for a particular audience. The article integrates literature from New Institutional Theory and Transparency Studies to contribute to our understanding of how information sharing is realized in the interaction between organizations and their environment.


2021 ◽  
Vol 13 (15) ◽  
pp. 8239
Author(s):  
Grazia Concilio ◽  
Francesco Molinari

The paper identifies a contradiction between data openness and economic value, possibly hiding a ‘market failure’ that requires more active intervention from the public hand. Though the sheer quantity of data available for free use is steadily increasing worldwide, its average quality usually stays well below the minimum threshold required for value creation. In contrast, there is now growing evidence that the use of data has enormous potential for the economy and society, including research and the progress of science. Unfortunately, useful datasets are usually locked in and, when actually made accessible, suffer from the same limitations mentioned above. Perhaps the time is ripe to place less emphasis on the generalized disclosure of government data and more on the appropriately incentivized and targeted creation of actionable bases for new IT applications. We present four cases touching upon the issues and potentials of service design, urban innovation, and data-related policies. We identify two possible ways of tackling the highlighted market failure: direct subsidies to government bodies or agencies engaged in disclosing their own datasets and keeping them clean and accessible over time, or new regulations that establish more productive data ecosystems, rewarding knowledge creation rather than mere data ownership.

