Methodology for the Assessment of the Text Similarity of Documents in the CORE Open Access Data Set of Scholarly Documents

Author(s):  
Ivan Kovačič ◽  
David Bajs ◽  
Milan Ojsteršek

This paper describes the methodology of data preparation and analysis of the text similarity required for plagiarism detection on the CORE data set. First, we used the CrossREF API and the Microsoft Academic Graph data set for metadata enrichment and elimination of duplicate documents from the CORE 2018 data set. In the second step, we used 4-gram sequences of words from every document and transformed them into SHA-256 hash values. Features retrieved using the hashing algorithm are compared, and the result is a list of documents and the percentages of coverage between the features of pairs of documents. In the third step, called pairwise feature-based exhaustive analysis, pairs of documents are checked using the longest common substring.
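The second and third steps can be illustrated with a minimal sketch. It assumes lowercase, whitespace-based tokenisation and defines coverage as the share of one document's 4-gram hashes that also occur in the other; the abstract does not specify these details, and all function names here are hypothetical.

```python
import hashlib


def ngram_hashes(text, n=4):
    """Return the set of SHA-256 hex digests of all word n-grams in text."""
    words = text.lower().split()  # assumed tokenisation: lowercase, whitespace
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha256(g.encode("utf-8")).hexdigest() for g in grams}


def coverage(doc_a, doc_b, n=4):
    """Percentage of doc_a's n-gram hashes that also occur in doc_b."""
    a, b = ngram_hashes(doc_a, n), ngram_hashes(doc_b, n)
    return 100.0 * len(a & b) / len(a) if a else 0.0


def longest_common_substring(a, b):
    """Longest run of consecutive words shared by a and b (dynamic programming)."""
    x, y = a.split(), b.split()
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return " ".join(x[best_end - best_len:best_end])


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox jumps over a fence"
print(coverage(doc_a, doc_b))
print(longest_common_substring(doc_a, doc_b))
```

Comparing hash sets is cheap, so it can serve as a filter over many document pairs, while the quadratic-time longest common substring check is reserved for pairs that pass it.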

2004 ◽  
Vol 34 (136) ◽  
pp. 339-356
Author(s):  
Tobias Wölfle ◽  
Oliver Schöller

Under the term “Hilfe zur Arbeit” (aid for work) the federal law of social welfare subsumes all kinds of labour disciplining instruments. First, the paper shows the historical connection of welfare and labour disciplining mechanisms in the context of different periods within capitalist development. In a second step, against the background of historical experiences, we will analyse the trends of “Hilfe zur Arbeit” during the past two decades. It will be shown that with the rise of unemployment, the impact of labour disciplining aspects of “Hilfe zur Arbeit” has increased both on the federal and on the municipal level. For this reason the leverage of the liberal paradigm would take place even in the core of social rights.


Author(s):  
Ricardo Giglio ◽  
Thomas Lux

Abstract We investigate the network topology of a comprehensive data set of the world-wide population of corporate entities. In particular, we have extracted information on the boards of all companies listed in Bloomberg’s archive of company profiles in October 2015, a total of almost 100,000 firms. We provide information on board membership overlaps at various levels, and, in particular, show that there exists a core of directors who accumulate a large number of seats and are highly connected among themselves both at the level of national networks and at the worldwide aggregated level.
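The idea of board membership overlap can be made concrete with a small sketch. The firms and directors below are invented for illustration, and the helper names are hypothetical; the abstract does not describe how the authors actually computed overlaps.

```python
from collections import defaultdict
from itertools import combinations


def seats_per_director(boards):
    """Map each director to the number of boards they sit on."""
    seats = defaultdict(int)
    for members in boards.values():
        for director in set(members):
            seats[director] += 1
    return dict(seats)


def interlocks(boards):
    """Map each pair of firms to the set of directors they share."""
    shared = {}
    for (f1, m1), (f2, m2) in combinations(sorted(boards.items()), 2):
        common = set(m1) & set(m2)
        if common:
            shared[(f1, f2)] = common
    return shared


# Toy, made-up data: firm -> board members
boards = {
    "FirmA": ["Ada", "Ben", "Cy"],
    "FirmB": ["Ada", "Ben", "Dee"],
    "FirmC": ["Ada", "Eve"],
}
print(seats_per_director(boards))
print(interlocks(boards))
```

Directors with many seats, such as "Ada" in this toy data, are the kind of node that would form a densely connected core.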


Author(s):  
Eric H. Nielsen ◽  
John R. Dixon ◽  
George E. Zinsmeister

Abstract The goal of “intelligent” computer-aided-design (CAD) systems is to provide greater support for the process of design, as distinguished from drafting and analysis. More supportive design systems should provide a quick and simple means of creating and modifying design configurations, automating evaluation procedures (e.g., for manufacturing), and automating interfaces to analysis procedures. In this paper we are concerned with the issues of representing in-progress designs so that such goals can be met. A feature-based representation is proposed in which features are defined as possessing not only form but also certain designer intentions regarding geometric relationships. A working experimental version of a design-with-features system using this representation for thin-walled components illustrates its use in composing a design as a configuration of feature-forms, in modifying the design geometry through automatic, intelligent incorporation and propagation of designer-initiated geometry changes, and in providing for the generation of user-defined features. In contrast to constraint-driven simultaneous equation solving methods, this system uses an intent-driven knowledge-based method to propagate and incorporate geometry modifications not only in fully-constrained designs, but also in over- and under-constrained designs. Issues of manageability, extensibility, and computational efficiency were considered in the development of the core services of the system.


2017 ◽  
Vol 19 (2) ◽  
pp. 53-66 ◽  
Author(s):  
Michael Preston-Shoot

Purpose: The purpose of this paper is twofold: first, to update the core data set of self-neglect serious case reviews (SCRs) and safeguarding adult reviews (SARs), and accompanying thematic analysis; second, to respond to the critique in the Wood Report of SCRs commissioned by Local Safeguarding Children Boards (LSCBs) by exploring the degree to which the reviews scrutinised here can transform and improve the quality of adult safeguarding practice. Design/methodology/approach: Further published reviews are added to the core data set from the websites of Safeguarding Adults Boards (SABs) and from contacts with SAB independent chairs and business managers. Thematic analysis is updated using the four domains employed previously. The findings are then further used to respond to the critique in the Wood Report of SCRs commissioned by LSCBs, with implications discussed for Safeguarding Adult Boards. Findings: Thematic analysis within and recommendations from reviews have tended to focus on the micro context, namely, what takes place between individual practitioners, their teams and adults who self-neglect. This level of analysis enables an understanding of local geography. However, there are other wider systems that impact on and influence this work. If review findings and recommendations are to fully answer the question “why”, systemic analysis should appreciate the influence of national geography. Review findings and recommendations may also be used to contest the critique of reviews, namely, that they fail to engage practitioners, are insufficiently systemic and of variable quality, and generate repetitive findings from which lessons are not learned. Research limitations/implications: There is still no national database of reviews commissioned by SABs so the data set reported here might be incomplete. The Care Act 2014 does not require publication of reports but only a summary of findings and recommendations in SAB annual reports. This makes learning for service improvement challenging. Reading the reviews reported here against the strands in the critique of SCRs enables conclusions to be reached about their potential to transform adult safeguarding policy and practice. Practical implications: Answering the question “why” is a significant challenge for SARs. Different approaches have been recommended, some rooted in systems theory. The critique of SCRs challenges those now engaged in SARs to reflect on how transformational change can be achieved to improve the quality of adult safeguarding policy and practice. Originality/value: The paper extends the thematic analysis of available reviews that focus on work with adults who self-neglect, further building on the evidence base for practice. The paper also contributes new perspectives to the process of conducting SARs by using the analysis of themes and recommendations within this data set to evaluate the critique that reviews are insufficiently systemic, fail to engage those involved in reviewed cases and in their repetitive conclusions demonstrate that lessons are not being learned.


2010 ◽  
Vol 14 (3) ◽  
pp. 545-556 ◽  
Author(s):  
J. Rings ◽  
J. A. Huisman ◽  
H. Vereecken

Abstract. Coupled hydrogeophysical methods infer hydrological and petrophysical parameters directly from geophysical measurements. Widespread methods do not explicitly recognize uncertainty in parameter estimates. Therefore, we apply a sequential Bayesian framework that provides updates of state, parameters and their uncertainty whenever measurements become available. We have coupled a hydrological and an electrical resistivity tomography (ERT) forward code in a particle filtering framework. First, we analyze a synthetic data set of lysimeter infiltration monitored with ERT. In a second step, we apply the approach to field data measured during an infiltration event on a full-scale dike model. For the synthetic data, the water content distribution and the hydraulic conductivity are accurately estimated after a few time steps. For the field data, hydraulic parameters are successfully estimated from water content measurements made with spatial time domain reflectometry and ERT, and the development of their posterior distributions is shown.
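The sequential Bayesian update described above can be sketched with a generic bootstrap particle filter. This is not the authors' coupled hydrological and ERT code: it assumes a single static scalar parameter, Gaussian measurement noise, a uniform prior, and a small jitter step to keep the particle set diverse, and every name and number in it is made up for illustration.

```python
import math
import random


def particle_filter(observations, n_particles=500, obs_sd=0.2,
                    jitter_sd=0.05, prior=(0.0, 10.0), seed=0):
    """Bootstrap particle filter for a single static parameter.

    Returns the posterior mean and standard deviation after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*prior) for _ in range(n_particles)]
    history = []
    for y in observations:
        # Propagate: small random jitter keeps the particle set diverse.
        particles = [p + rng.gauss(0.0, jitter_sd) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((y - p) / obs_sd) ** 2) for p in particles]
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        mean = sum(particles) / n_particles
        var = sum((p - mean) ** 2 for p in particles) / n_particles
        history.append((mean, math.sqrt(var)))
    return history


# Synthetic example: noisy measurements of a made-up true value of 3.0.
data_rng = random.Random(1)
true_value = 3.0
observations = [true_value + data_rng.gauss(0.0, 0.2) for _ in range(20)]
history = particle_filter(observations)
print(history[-1])
```

Each entry in `history` gives the parameter estimate and its uncertainty at that time step, which mirrors the idea of tracking the posterior distribution as measurements arrive.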


Author(s):  
Philippe Henry

In the present research, I used an open access data set (Medicinal Genomics) consisting of nearly 200,000 genome-wide single nucleotide polymorphisms (SNPs) typed in 28 cannabis accessions to shed light on the plant's underlying genetic structure. Genome-wide loadings were used to sequentially cull less informative markers. The process involved reducing the number of SNPs to 100K, 10K, 1K, and 100, until I identified a set of 42 highly informative SNPs that I present here. The first two principal components encompass about three quarters of the genetic variation present in the data set (PCA1 = 48.6%, PCA2 = 26.3%). This set of diagnostic SNPs is then used to identify clusters into which cannabis accessions segregate. I identified three clear and consistent clusters, reflective of the ancient domestication trilogy of the genus Cannabis.


2021 ◽  
pp. 1-27
Author(s):  
Tim Sainburg ◽  
Leland McInnes ◽  
Timothy Q. Gentner

Abstract UMAP is a nonparametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (fuzzy simplicial complex) and (2) through stochastic gradient descent, optimizing a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.


Author(s):  
Jinling Li ◽  
Yuhao Liu ◽  
Ahmed Tageldin ◽  
Mohamed H. Zaki ◽  
Greg Mori ◽  
...  

An approach for vehicle conflict analysis based on three-dimensional (3-D) vehicle detection is presented. Techniques for quantitative conflict measurements often use a point trajectory representation for vehicles. More accurate conflict measurement can be facilitated with a region-based vehicle representation instead. This paper describes a computer vision approach for extracting vehicle trajectories from video sequences. The method relied on a fusion of background subtraction and feature-based tracking to provide a three-dimensional (3-D) cuboid representation of the vehicle. Standard conflict measures, including time to collision and postencroachment time, were computed with the use of the 3-D cuboid vehicle representations. The use of these conflict measures was demonstrated on a challenging data set of video footage. Results showed that the region-based representation could provide more precise calculation of traffic conflict indicators compared with approaches based on a point representation.
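The difference between point-based and region-based conflict measures can be shown with a simplified one-dimensional sketch. It assumes two vehicles travelling along the same straight path at constant speeds, which is far simpler than the 3-D cuboid representation in the paper; the function names and example values are hypothetical.

```python
import math


def time_to_collision(gap_m, follower_speed, leader_speed):
    """TTC in seconds for two vehicles on the same path; inf if not closing."""
    closing = follower_speed - leader_speed
    return gap_m / closing if closing > 0 else math.inf


def post_encroachment_time(first_exit_s, second_entry_s):
    """PET: time between the first vehicle leaving the conflict area
    and the second vehicle entering it."""
    return second_entry_s - first_exit_s


# Made-up example: centroids 20 m apart, both vehicles 4.5 m long,
# follower at 15 m/s, leader at 10 m/s.
centroid_gap = 20.0
length = 4.5
point_ttc = time_to_collision(centroid_gap, 15.0, 10.0)
# Region-based gap: front bumper of follower to rear bumper of leader.
bumper_gap = centroid_gap - length / 2 - length / 2
region_ttc = time_to_collision(bumper_gap, 15.0, 10.0)
print(point_ttc, region_ttc)
```

In this toy case the point representation overestimates the time to collision because it ignores the physical extent of the vehicles.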


Big Data ◽  
2016 ◽  
pp. 261-287
Author(s):  
Keqin Wu ◽  
Song Zhang

While uncertainty in scientific data attracts an increasing research interest in the visualization community, two critical issues remain insufficiently studied: (1) visualizing the impact of the uncertainty of a data set on its features and (2) interactively exploring 3D or large 2D data sets with uncertainties. In this chapter, a suite of feature-based techniques is developed to address these issues. First, an interactive visualization tool for exploring scalar data with data-level, contour-level, and topology-level uncertainties is developed. Second, a framework of visualizing feature-level uncertainty is proposed to study the uncertain feature deviations in both scalar and vector data sets. With quantified representation and interactive capability, the proposed feature-based visualizations provide new insights into the uncertainties of both data and their features which otherwise would remain unknown with the visualization of only data uncertainties.


Author(s):  
Sérgio Luís Guerreiro

When organizations are collaborating, their access control models need to interoperate. However, there are too many access control model variants, and the interoperability enforcement consumes extra effort. In this context, this chapter identifies the challenges of how to design and enforce a meta-access control model to facilitate the interoperability between the different access control mechanisms available. The problem is posed using an ontological approach. Then, the challenges are explained using a descriptive explanation of the meta access control enforcement. The core issues addressed are access models interoperability, standardization of storage for access data, and provisioning of access models.

