access to data
Recently Published Documents


TOTAL DOCUMENTS

955
(FIVE YEARS 487)

H-INDEX

20
(FIVE YEARS 5)

Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 27
Author(s):  
Diego Garat ◽  
Dina Wonsever

In order to provide open access to data of public interest, it is often necessary to perform several data curation processes. In some cases, such as biological databases, curation involves quality control to ensure reliable experimental support for biological sequence data. In others, such as medical records or judicial files, publication must not interfere with the right to privacy of the persons involved. There are also interventions in the published data aimed at generating metadata that enable a better querying and navigation experience. In all cases, the curation process constitutes a bottleneck that slows down general access to the data, so automatic or semi-automatic curation processes are of great interest. In this paper, we present a solution for the automatic curation of our National Jurisprudence Database, with special focus on the anonymization of personal information. The anonymization process aims to hide the names of the participants involved in a lawsuit without losing the meaning of the narrative of facts. To achieve this goal, we need not only to recognize person names but also to resolve co-references, so that all mentions of the same person receive the same label. Our corpus has significant differences in the spelling of person names, so it was clear from the beginning that pre-existing tools would not reach good performance. The challenge was to find a good way of injecting specialized knowledge about person-name syntax while taking advantage of the capabilities of pre-trained tools. We fine-tuned an NER analyzer and built a clustering algorithm to resolve co-references between named entities. Our first results are promising for both tasks: we obtained an F1-micro score of 90.21% in the NER task (up from 39.99% before retraining the same analyzer on our corpus) and an ARI score of 95.95% in clustering for co-reference resolution.
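The ARI metric reported above can be illustrated in a few lines. This is a minimal sketch, not the authors' actual pipeline: mentions of the same person share a cluster id, and the gold and predicted labelings here are invented for illustration.

```python
# Score a co-reference clustering against gold labels with the
# Adjusted Rand Index (ARI). Each list entry is the cluster id
# assigned to one person-name mention in the text.
from sklearn.metrics import adjusted_rand_score

# Gold annotation: mentions 0-2 are person A, mentions 3-4 are person B.
gold_clusters = [0, 0, 0, 1, 1]
# Output of a hypothetical clustering algorithm over the same mentions.
predicted_clusters = [1, 1, 1, 0, 0]

# ARI is invariant to cluster label permutations: a perfect grouping
# scores 1.0 regardless of which integer names each cluster.
ari = adjusted_rand_score(gold_clusters, predicted_clusters)
print(round(ari, 2))  # → 1.0
```

Because ARI compares groupings rather than label values, it suits co-reference evaluation, where the specific anonymization label assigned to a person is arbitrary.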


2022 ◽  
Author(s):  
Tahmina Zebin ◽  
Shahadate Rezvy ◽  
Yuan Luo

Over the past few years, the Domain Name Service (DNS) has remained a prime target for hackers, as it enables them to gain first entry into networks and access data for exfiltration. Although the DNS over HTTPS (DoH) protocol has desirable properties for internet users, such as privacy and security, it also prevents network administrators from detecting suspicious network traffic generated by malware and malicious tools. To support their efforts in maintaining a secure network, in this paper we implement an explainable AI solution using a novel machine learning framework. We use the publicly available CIRA-CIC-DoHBrw-2020 dataset to develop an accurate solution for detecting and classifying DNS over HTTPS attacks. Our proposed balanced and stacked Random Forest achieved very high precision (99.91%), recall (99.92%), and F1 score (99.91%) for the classification task at hand. Using explainable AI methods, we additionally highlight the underlying feature contributions in an attempt to provide transparent and explainable results from the model.
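A rough sketch of the kind of model named above, assuming one plausible reading of "balanced and stacked Random Forest": a class-weighted Random Forest stacked under a logistic-regression meta-learner. This is not the paper's exact architecture, and synthetic data stands in for the CIRA-CIC-DoHBrw-2020 features.

```python
# Imbalanced binary classification with a balanced Random Forest
# stacked under a logistic-regression final estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: ~10% "malicious DoH" traffic.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights samples inversely to class frequency,
# countering the skew toward benign traffic.
base = [("rf", RandomForestClassifier(n_estimators=100,
                                      class_weight="balanced",
                                      random_state=0))]
model = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"precision={precision_score(y_te, pred):.2f} "
      f"recall={recall_score(y_te, pred):.2f} "
      f"f1={f1_score(y_te, pred):.2f}")
```

Feature contributions for such a model could then be inspected with a model-agnostic attribution method, in the spirit of the explainability step the abstract describes.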


2022 ◽  
pp. 60-72
Author(s):  
Blessing Babawale Amusan ◽  
Adepero Olajumoke Odumade

There is no doubt that data mining and linked data can enhance library service delivery. Data mining techniques such as text and image mining will give libraries access to data that can be used to discover new knowledge and to aid planning for effective service delivery or service improvement. Linked data will also enable libraries to connect with other libraries and share data that can enhance job performance, leading to greater productivity, improved service delivery, and wider visibility of and access to library resources.


2022 ◽  
pp. 65-86
Author(s):  
Francesco Marrazzo

The post-API age in digital research has had immediate consequences for research activities based on (big) data owned by online platforms. Even some initiatives by the online platforms themselves, mainly based on funding specific research projects, have not found a warm reception in the research community and have been considered insufficient for research on the most relevant phenomena of the digital public sphere. Since access to data has become a relevant issue even for civil society organizations and public actors dealing with the digital ecosystem, a specific brand-new issue network among public institutions, NGOs, and researchers has been established. The technical expertise, the shared interests, and the pursuit of similar goals in shaping public values in online platform activities seem crucial to the permanence, and even the institutionalization, of such an issue network.


2022 ◽  
pp. 1458-1483
Author(s):  
Kamalendu Pal

Heterogeneous data types, widely distributed data sources, huge data volumes, and large-scale business-alliance partners describe typical global supply chain operational environments. Mobile and wireless technologies are adding an extra layer of data sources to this technology-enriched supply chain operation. This environment also needs to provide its end-users with access to data anywhere, anytime. This new type of data set originating from the global retail supply chain is commonly known as big data because of its huge volume, resulting from the velocity with which it arrives in the global retail business environment. Such environments empower and require decision makers to act or react more quickly to all decision tasks. Academics and practitioners are researching and building the next generation of big-data-based application software systems. This new generation of software applications is based on complex data analysis algorithms (i.e., on data that does not adhere to standard relational data models). The traditional software testing methods are insufficient for big-data-based applications. Testing big-data-based applications is one of the biggest challenges faced by modern software design and development communities because of a lack of knowledge about what to test and how much data to test. Developers of big-data-based applications face a daunting task in defining the best strategies for structured and unstructured data validation, setting up an optimal test environment, and working with testing approaches for non-relational databases. This chapter focuses on big-data-based software testing and quality-assurance-related issues in the context of Hadoop, an open source framework. It includes a discussion of several challenges with respect to massively parallel data generation from multiple sources, testing methods for validating pre-Hadoop processing, software application quality factors, and some of the software testing mechanisms for this new breed of applications.
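A pre-Hadoop validation step of the kind mentioned above can be as simple as a schema check run before records are loaded into the cluster. This toy sketch is illustrative only; the field names and rules are assumptions, not taken from the chapter.

```python
# Reject malformed records before they enter the ingestion pipeline,
# so downstream (e.g. MapReduce) jobs only see schema-conforming data.
def validate_record(record: dict) -> bool:
    """Return True when a record satisfies the expected schema."""
    required = {"sku": str, "store_id": str, "quantity": int}
    if set(record) != set(required):          # no missing/extra fields
        return False
    return all(isinstance(record[k], t) for k, t in required.items()) \
        and record["quantity"] >= 0           # domain rule: non-negative

good = {"sku": "A-100", "store_id": "S1", "quantity": 3}
bad = {"sku": "A-100", "quantity": -2}  # missing field, negative count
print(validate_record(good), validate_record(bad))  # → True False
```

In practice such checks would be asserted in a test harness over sampled input batches, which is one way to make "what to test" concrete for a big-data pipeline.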


2021 ◽  
Vol 11 (3-4) ◽  
pp. 1-31
Author(s):  
Vinícius Segura ◽  
Simone D. J. Barbosa

Nowadays, we have access to data of unprecedented volume, high dimensionality, and complexity. To extract novel insights from such complex and dynamic data, we need effective and efficient strategies. One such strategy is to combine data analysis and visualization techniques, which are the essence of visual analytics applications. After the knowledge discovery process, a major challenge is to filter the essential information that led to a discovery and to communicate the findings to other people, explaining the decisions they may have made based on the data. We propose to record and use the trace left by exploratory data analysis, in the form of user interaction history, to aid this process. With the trace, users can choose the desired interaction steps and create a narrative, sharing the acquired knowledge with readers. To achieve our goal, we have developed the BONNIE (Building Online Narratives from Noteworthy Interaction Events) framework. BONNIE comprises a log model to register interaction events, auxiliary code to help developers instrument their own code, and an environment for users to view their own interaction history and build narratives. This article presents our proposal for communicating discoveries in visual analytics applications, the BONNIE framework, and the studies we conducted to evaluate our solution. After two user studies (the first focused on history visualization and the second on narrative creation), our solution proved promising, with mostly positive feedback and results from a Technology Acceptance Model (TAM) questionnaire.
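The log model described above can be sketched as a small event record plus a filtered history. This is a hypothetical illustration of the idea, not BONNIE's actual log model; all field names are assumptions.

```python
# Record user interaction events during exploratory analysis, then
# build a narrative from a user-chosen subset of the history.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class InteractionEvent:
    """One user interaction captured in a visual analytics session."""
    event_type: str            # e.g. "filter", "zoom", "select"
    target: str                # widget or chart the user acted on
    params: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda:
                           datetime.now(timezone.utc).isoformat())

history: list[InteractionEvent] = []
history.append(InteractionEvent("filter", "sales_chart", {"year": 2021}))
history.append(InteractionEvent("zoom", "sales_chart", {"range": [0, 100]}))

# A narrative keeps only the steps the user deems noteworthy; here we
# select every event that touched one chart.
narrative = [asdict(e) for e in history if e.target == "sales_chart"]
print(len(narrative))  # → 2
```

Serializing the selected events (e.g. to JSON) is then enough to replay or annotate the steps that led to a finding, which is the sharing mechanism the abstract describes.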


2021 ◽  
Vol 59 (3) ◽  
pp. 123-140
Author(s):  
Marina Matić Bošković

According to the EU Commission's estimate, 85 percent of criminal investigations require electronic evidence, and in almost two thirds (65 percent) of these investigations there is a need to obtain evidence from service providers based in another jurisdiction. The investigation and prosecution of crime increasingly rely on the possibility of accessing data held by service providers, which are private companies. Modern criminal investigation and the use of electronic evidence pose challenges to the right to a fair trial and to rule-of-law standards. The paper identifies benefits and challenges of the proposed EU instruments for facilitating e-evidence. The European Commission proposed a Regulation on European Production and Preservation Orders with the aim of facilitating access to relevant data stored by service providers. The paper recognizes shortcomings of the proposed Regulation; the biggest challenge is the lack of judicial oversight of the orders, as a guarantee of a fair trial. The paper includes recommendations and policy options for promoting a judicial system for cross-border access to and collection of electronic data in line with EU fundamental rights standards.


2021 ◽  
Vol 9 (2) ◽  
pp. 65-83
Author(s):  
Raazia Moosa

Traditional advising responsibilities are shifting to include a holistic, learning-based and developmental approach that favours advising on the entire university experience. A dearth of systematic empirical evidence exists on advisors' perceptions of the value of advising students during the COVID-19 pandemic in the South African context. The purpose of this study is to elucidate advisors' perceptions of the complexity and challenges inherent in their responsibilities during the pandemic. This case study draws on a qualitative research design; it is based on semi-structured in-depth interviews undertaken with nine advisors in 2020. The central research questions posed in this study are: how do advisors describe their perceptions of their responsibilities during the COVID-19 pandemic, and how might these contribute to future practices? The findings indicate that advising during the pandemic has transcended the typical transactional dissemination of information to include addressing contextual environmental and resource challenges, social justice imperatives, emergency remote learning, asynchronous advising challenges and data-informed advising. These responsibilities have encompassed a holistic approach to advising and to getting to know students as 'whole people'. Adjustments and transitions to emergency remote learning have highlighted social inequalities in access to data, internet connectivity and electricity, which have served as impediments to students' learning and educational experiences. Some home environments were not conducive to studying but necessitated doing household chores and herding cattle. The findings also indicate that an institution's advising delivery model should enhance advisors' abilities to perform their responsibilities. A network of cascaded responsibilities that incorporates greater involvement of lecturers in advising could contribute to a shared responsibility between lecturers and central, faculty and peer advisors. Insights gained may lead to a more nuanced understanding of advisors' responsibilities as they relate to student learning and to the overall educational experience, to promote retention and student success in a post-pandemic era.

