access to data
Recently Published Documents


TOTAL DOCUMENTS

955
(FIVE YEARS 487)

H-INDEX

20
(FIVE YEARS 5)

Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 27
Author(s):  
Diego Garat ◽  
Dina Wonsever

In order to provide open access to data of public interest, it is often necessary to perform several data curation processes. In some cases, such as biological databases, curation involves quality control to ensure reliable experimental support for biological sequence data. In others, such as medical records or judicial files, publication must not interfere with the right to privacy of the persons involved. There are also interventions in the published data aimed at generating metadata that enable a better querying and navigation experience. In all cases, the curation process constitutes a bottleneck that slows down general access to the data, so automatic or semi-automatic curation processes are of great interest. In this paper, we present a solution for the automatic curation of our National Jurisprudence Database, with special focus on the anonymization of personal information. The anonymization process aims to hide the names of the participants involved in a lawsuit without losing the meaning of the narrative of facts. To achieve this goal, we need not only to recognize person names but also to resolve co-references, so that all mentions of the same person receive the same label. Our corpus has significant differences in the spelling of person names, so it was clear from the beginning that pre-existing tools would not reach good performance. The challenge was to find a good way of injecting specialized knowledge about person-name syntax while taking advantage of the capabilities of pre-trained tools. We fine-tuned an NER analyzer and built a clustering algorithm to resolve co-references between named entities. Our first results are promising for both tasks: we obtained an F1-micro score of 90.21% in the NER task (up from 39.99% before retraining the same analyzer on our corpus) and an ARI score of 95.95% in clustering for co-reference resolution.
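The ARI metric reported above can be illustrated in a few lines. This is a minimal sketch, not the authors' actual pipeline: mentions of the same person share a cluster id, and the gold and predicted labelings here are invented for illustration.

```python
# Score a co-reference clustering against gold labels with the
# Adjusted Rand Index (ARI). Each list entry is the cluster id
# assigned to one person-name mention in the text.
from sklearn.metrics import adjusted_rand_score

# Gold annotation: mentions 0-2 are person A, mentions 3-4 are person B.
gold_clusters = [0, 0, 0, 1, 1]
# Output of a hypothetical clustering algorithm over the same mentions.
predicted_clusters = [1, 1, 1, 0, 0]

# ARI is invariant to cluster label permutations: a perfect grouping
# scores 1.0 regardless of which integer names each cluster.
ari = adjusted_rand_score(gold_clusters, predicted_clusters)
print(round(ari, 2))  # → 1.0
```

Because ARI compares groupings rather than label values, it suits co-reference evaluation, where the specific anonymization label assigned to a person is arbitrary.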


2022 ◽  
Author(s):  
Tahmina Zebin ◽  
Shahadate Rezvy ◽  
Yuan Luo

Over the past few years, the Domain Name Service (DNS) has remained a prime target for hackers, as it enables them to gain first entry into networks and access data for exfiltration. Although the DNS over HTTPS (DoH) protocol has desirable properties for internet users, such as privacy and security, it also prevents network administrators from detecting suspicious network traffic generated by malware and malicious tools. To support their efforts in maintaining a secure network, in this paper we implement an explainable AI solution using a novel machine learning framework. We use the publicly available CIRA-CIC-DoHBrw-2020 dataset to develop an accurate solution for detecting and classifying DNS over HTTPS attacks. Our proposed balanced and stacked Random Forest achieved very high precision (99.91%), recall (99.92%), and F1 score (99.91%) for the classification task at hand. Using explainable AI methods, we additionally highlight the underlying feature contributions in an attempt to provide transparent and explainable results from the model.
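A rough sketch of the kind of model named above, assuming one plausible reading of "balanced and stacked Random Forest": a class-weighted Random Forest stacked under a logistic-regression meta-learner. This is not the paper's exact architecture, and synthetic data stands in for the CIRA-CIC-DoHBrw-2020 features.

```python
# Imbalanced binary classification with a balanced Random Forest
# stacked under a logistic-regression final estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: ~10% "malicious DoH" traffic.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights samples inversely to class frequency,
# countering the skew toward benign traffic.
base = [("rf", RandomForestClassifier(n_estimators=100,
                                      class_weight="balanced",
                                      random_state=0))]
model = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"precision={precision_score(y_te, pred):.2f} "
      f"recall={recall_score(y_te, pred):.2f} "
      f"f1={f1_score(y_te, pred):.2f}")
```

Feature contributions for such a model could then be inspected with a model-agnostic attribution method, in the spirit of the explainability step the abstract describes.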


2022 ◽  
pp. 60-72
Author(s):  
Blessing Babawale Amusan ◽  
Adepero Olajumoke Odumade

There is no doubt that data mining and linked data can enhance library service delivery. Data mining techniques such as text and image mining will give libraries access to data that can be used to discover new knowledge and to aid planning for effective service delivery or service improvement. Linked data will also enable libraries to connect with other libraries and share data that can enhance job performance, leading to greater productivity, improved service delivery, and wider visibility of and access to library resources.


2022 ◽  
pp. 65-86
Author(s):  
Francesco Marrazzo

The post-API age in digital research has had immediate consequences for research activities based on (big) data owned by online platforms. Even some initiatives by the online platforms themselves, mainly based on funding specific research projects, have not found a warm reception in the research community and have been considered insufficient for research on the most relevant phenomena of the digital public sphere. Since access to data has become a relevant issue even for civil society organizations and public actors dealing with the digital ecosystem, a specific brand-new issue network among public institutions, NGOs, and researchers has been established. The technical expertise, the shared interests, and the pursuit of similar goals in shaping public values in online platform activities seem crucial to the permanence, and even the institutionalization, of such an issue network.


2022 ◽  
pp. 1458-1483
Author(s):  
Kamalendu Pal

Heterogeneous data types, widely distributed data sources, huge data volumes, and large-scale business-alliance partners describe typical global supply chain operational environments. Mobile and wireless technologies are adding an extra layer of data sources to this technology-enriched supply chain operation. This environment also needs to provide its end-users with access to data anywhere, anytime. This new type of data set originating from the global retail supply chain is commonly known as big data because of its huge volume, resulting from the velocity with which it arrives in the global retail business environment. Such environments empower and require decision makers to act or react more quickly to all decision tasks. Academics and practitioners are researching and building the next generation of big-data-based application software systems. This new generation of software applications is based on complex data analysis algorithms (i.e., on data that does not adhere to standard relational data models). The traditional software testing methods are insufficient for big-data-based applications. Testing big-data-based applications is one of the biggest challenges faced by modern software design and development communities because of a lack of knowledge about what to test and how much data to test. Developers of big-data-based applications face a daunting task in defining the best strategies for structured and unstructured data validation, setting up an optimal test environment, and working with testing approaches for non-relational databases. This chapter focuses on big-data-based software testing and quality-assurance-related issues in the context of Hadoop, an open source framework. It includes a discussion of several challenges with respect to massively parallel data generation from multiple sources, testing methods for validating pre-Hadoop processing, software application quality factors, and some of the software testing mechanisms for this new breed of applications.
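A pre-Hadoop validation step of the kind mentioned above can be as simple as a schema check run before records are loaded into the cluster. This toy sketch is illustrative only; the field names and rules are assumptions, not taken from the chapter.

```python
# Reject malformed records before they enter the ingestion pipeline,
# so downstream (e.g. MapReduce) jobs only see schema-conforming data.
def validate_record(record: dict) -> bool:
    """Return True when a record satisfies the expected schema."""
    required = {"sku": str, "store_id": str, "quantity": int}
    if set(record) != set(required):          # no missing/extra fields
        return False
    return all(isinstance(record[k], t) for k, t in required.items()) \
        and record["quantity"] >= 0           # domain rule: non-negative

good = {"sku": "A-100", "store_id": "S1", "quantity": 3}
bad = {"sku": "A-100", "quantity": -2}  # missing field, negative count
print(validate_record(good), validate_record(bad))  # → True False
```

In practice such checks would be asserted in a test harness over sampled input batches, which is one way to make "what to test" concrete for a big-data pipeline.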


2021 ◽  
Vol 11 (3-4) ◽  
pp. 1-31
Author(s):  
Vinícius Segura ◽  
Simone D. J. Barbosa

Nowadays, we have access to data of unprecedented volume, high dimensionality, and complexity. To extract novel insights from such complex and dynamic data, we need effective and efficient strategies. One such strategy is to combine data analysis and visualization techniques, which are the essence of visual analytics applications. After the knowledge discovery process, a major challenge is to filter the essential information that led to a discovery and to communicate the findings to other people, explaining the decisions they may have made based on the data. We propose to record and use the trace left by exploratory data analysis, in the form of user interaction history, to aid this process. With the trace, users can choose the desired interaction steps and create a narrative, sharing the acquired knowledge with readers. To achieve our goal, we have developed the BONNIE (Building Online Narratives from Noteworthy Interaction Events) framework. BONNIE comprises a log model to register interaction events, auxiliary code to help developers instrument their own code, and an environment for users to view their own interaction history and build narratives. This article presents our proposal for communicating discoveries in visual analytics applications, the BONNIE framework, and the studies we conducted to evaluate our solution. After two user studies (the first focused on history visualization and the second on narrative creation), our solution proved promising, with mostly positive feedback and results from a Technology Acceptance Model (TAM) questionnaire.
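The log model described above can be sketched as a small event record plus a filtered history. This is a hypothetical illustration of the idea, not BONNIE's actual log model; all field names are assumptions.

```python
# Record user interaction events during exploratory analysis, then
# build a narrative from a user-chosen subset of the history.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class InteractionEvent:
    """One user interaction captured in a visual analytics session."""
    event_type: str            # e.g. "filter", "zoom", "select"
    target: str                # widget or chart the user acted on
    params: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda:
                           datetime.now(timezone.utc).isoformat())

history: list[InteractionEvent] = []
history.append(InteractionEvent("filter", "sales_chart", {"year": 2021}))
history.append(InteractionEvent("zoom", "sales_chart", {"range": [0, 100]}))

# A narrative keeps only the steps the user deems noteworthy; here we
# select every event that touched one chart.
narrative = [asdict(e) for e in history if e.target == "sales_chart"]
print(len(narrative))  # → 2
```

Serializing the selected events (e.g. to JSON) is then enough to replay or annotate the steps that led to a finding, which is the sharing mechanism the abstract describes.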


2021 ◽  
Vol 59 (3) ◽  
pp. 123-140
Author(s):  
Marina Matić Bošković

According to the EU Commission's estimate, 85 percent of criminal investigations require electronic evidence, and in almost two thirds (65 percent) of these investigations there is a need to obtain evidence from service providers based in another jurisdiction. The investigation and prosecution of crime increasingly rely on the possibility of accessing data held by service providers, which are private companies. Modern criminal investigation and the use of electronic evidence pose challenges to the right to a fair trial and to rule-of-law standards. The paper identifies benefits and challenges of the proposed EU instruments for facilitating e-evidence. The European Commission proposed a Regulation on European Production and Preservation Orders with the aim of facilitating access to relevant data stored by service providers. The paper recognizes shortcomings of the proposed Regulation; the biggest challenge is the lack of judicial oversight of the orders, as a guarantee of a fair trial. The paper includes recommendations and policy options for promoting a judicial system for cross-border access to and collection of electronic data in line with EU fundamental rights standards.


2021 ◽  
Vol 9 (2) ◽  
pp. 65-83
Author(s):  
Raazia Moosa

Traditional advising responsibilities are shifting to include a holistic, learning-based and developmental approach that favours advising on the entire university experience. A dearth of systematic empirical evidence exists on advisors' perceptions of the value of advising students during the COVID-19 pandemic in the South African context. The purpose of this study is to elucidate advisors' perceptions of the complexity and challenges inherent in their responsibilities during the pandemic. This case study draws on a qualitative research design; it is based on semi-structured in-depth interviews undertaken with nine advisors in 2020. The central research questions posed in this study are: how do advisors describe their perceptions of their responsibilities during the COVID-19 pandemic, and how might these contribute to future practices? The findings indicate that advising during the pandemic has transcended the typical transactional dissemination of information to include addressing contextual environmental and resource challenges, social justice imperatives, emergency remote learning, asynchronous advising challenges and data-informed advising. These responsibilities have encompassed a holistic approach to advising and to getting to know students as 'whole people'. Adjustments and transitions to emergency remote learning have highlighted social inequalities in access to data, internet connectivity and electricity, which have served as impediments to students' learning and educational experiences. Some home environments were not conducive to studying but necessitated doing household chores and herding cattle. The findings also indicate that an institution's advising delivery model should enhance advisors' abilities to perform their responsibilities. A network of cascaded responsibilities that incorporates greater involvement of lecturers in advising could contribute to a shared responsibility between lecturers and central, faculty and peer advisors. Insights gained may lead to a more nuanced understanding of advisors' responsibilities as they relate to student learning and to the overall educational experience, to promote retention and student success in a post-pandemic era.

