Research Data Journal for the Humanities and Social Sciences
Latest Publications


Total documents: 43 (five years: 30)
H-index: 2 (five years: 1)
Published by Brill
ISSN: 2452-3666

Author(s):  
Valentin Bellassen ◽  
Filippo Arfini ◽  
Federico Antonioli ◽  
Antonio Bodini ◽  
Michael Boehm ◽  
...  

Abstract The dataset Sustainability performance of certified and non-certified food (https://www.doi.org/10.15454/OP51SJ) contains 25 indicators of economic, environmental, and social performance, estimated for 27 certified food value chains and their 27 conventional reference products. The indicators are estimated at different levels of the value chain: farm level, processing level, and retail level. The dataset also contains the raw data on which the indicators are based, the sources of those data, and the completed spreadsheet calculators for the following indicators: carbon footprint and food miles. This article describes the common method and indicators used to collect data for the 27 certified products and their conventional counterparts. It presents the assumptions and choices, the process of data collection, and the indicator estimation methods designed to assess the three sustainability dimensions within a reasonable time constraint, namely three person-months for each food quality scheme and its non-certified reference product. Data collection was prioritised by indicator, variable, and value chain level, and a level of representativeness was set for each variable and product type (country and sector). Technical details on how relatively common variables (e.g., number of animals per hectare) are combined into indicators (e.g., carbon footprint) are provided in the full documentation of the dataset.


Author(s):  
Morgan Macleod ◽  
Elena Anagnostopoulou ◽  
Dionysios Mertyris ◽  
Christina Sevdali
Keyword(s):  
Web Site ◽  

Abstract The DiGreC (DIachrony of GREek Case) treebank is a corpus of selected sentences from Greek texts, ranging from Homer to Modern Greek, which have been annotated morphosyntactically and semantically. The corpus comprises excerpts from 655 texts, for a total of 3,385 sentences and 56,440 word tokens; automated tagging and lemmatisation have been supplemented with manual review to ensure accuracy. The data exist in XML and CSV formats, which can be manipulated and converted automatically to other schemata. A web site has also been created to allow users to interact with the data more easily, and to provide specialised functionality for searching and visualisation. This corpus was created to inform theoretical debates regarding the role of case in grammar, and may be of use to researchers searching for specific attestations of a range of different constructions in Greek.


Author(s):  
Nick Redfern

Abstract This article presents a new data set comprising audio, colour, motion, and shot length data of trailers for the fifty highest-grossing horror films at the US box office from 2011 to 2015. This data set is one of the few available for computational film analysis that includes data on multiple elements of film style and is the only existing data set for motion picture trailers suitable for formal analyses. The data are stored in CSV files available under a Creative Commons Attribution 4.0 International license on Zenodo: www.doi.org/10.5281/zenodo.4479068.


Author(s):  
Björn Quanjer ◽  
Jan Kok

Abstract In this article, the authors describe and explore the dataset Pupils of the Amsterdam Maritime Institute 1792–1943, which is based on the Comportementboeken of the Amsterdam Maritime Institute. These records contain biographical information and bodily measurements of aspirant sailors between 12 and 20 years of age. The authors have linked the records (N = 5439) to enrolment records and the examinations for the military draft, which provides unique data on historical adolescent growth rates. Apart from anthropometric research, the dataset can be used for different kinds of studies into the background and early careers of Dutch sailors.


Author(s):  
Jan Jonker ◽  
Wouter Poot ◽  
Peter Doorn

Abstract Since the late 1990s, Dutch census publications have been digitized and made available for digital processing. New analyses of the data were presented at several fruitful conferences in the first decade of this century. In addition to the census publications, a mass of detailed census data was found in dossiers and so-called “transparencies” in the archive of Statistics Netherlands. Most of that material was scanned into digital images, awaiting further content conversion into numeric data. In the present article, the authors describe the process of digitizing the detailed tables of the Dutch Population and Occupational Censuses held in 1947, the first set of detailed census data to be made available in digitally processable form. They give an example of the historical analyses made possible by this dataset. Moreover, they take these census data as an example of preparing and publishing a large dataset. The experience gained and the lessons learned in the process point to ample opportunities for further analysis of the data and to efficient ways of converting the content of the many remaining images of census data.


Author(s):  
Eltjo Buringh

Abstract This article presents an expanded dataset of the historical urban population in Europe, European urban population, 700–2000 (https://www.doi.org/10.17026/dans-xzy-u62q). This dataset contains new and improved estimates of the urban population (in thousands of inhabitants) between the years 700 and 2000 in 2,262 European settlements, including European cities with more than 100,000 inhabitants. The dataset is based on previous historical demographic sources that have been critically assessed and systematically complemented with new population estimates for additional time windows, derived from either quantitative sources or proxies. Missing data are covered by city-specific and time-specific imputations. The applied time windows are whole centuries before 1500 and half-centuries afterwards. The article discusses the robustness checks that have been performed to validate the reliability of the imputed numerical results.


Author(s):  
René van Weeren ◽  
Tine De Moor

Abstract Marriage is generally regarded as a decisive moment in the life course of individuals. Because both the social and the legal status of women and men change as soon as they enter marriage and, by extension, their preceding wedding engagement, most societies have kept, and still keep, registers to record this life event. The difficulty in studying the long-term development of marriage patterns is the need for, among other things, detailed information about the marriage formation process. Most of the research on marriage patterns is based on a limited amount of data. The data cover only a limited period (at most several consecutive decades), a limited number of variables, a relatively small number of marriages, and/or a relatively small town or region. The Amsterdam marriage banns registers are an exception to the above, in terms of content, focus area, and volume. In this article, we present the dataset results of the Citizen Science project ‘Ja, ik wil!’ [‘Yes, I do!’], involving over 500 participants retrieving a wide range of socio-economic data on over 94,000 couples from the rich source of the historical Amsterdam marriage banns registers, covering every fifth year between 1580 and 1810.


Author(s):  
Inga Brentel ◽  
Kristi Winters

Abstract This article details the novel structure developed to handle, harmonize and document big data for reuse and long-term preservation. ‘The Longitudinal IntermediaPlus (2014–2016)’ big data dataset is uniquely rich: it covers an array of German online media extendable to cross-media channels and user information. The metadata file for this dataset and its documentation were recently deposited as a separate MySQL database called charmstana_sample_14-16.sql (https://data.gesis.org/sharing/#!Detail/10.7802/2030) (cs16), which is suitable for generating descriptive statistics. Analogous to the ‘Data View’ in SPSS, the charmstana_analysis (ca) contains the dataset’s numerical values. Both the cs16 and ca MySQL files are needed to conduct analysis on the full database. The research challenge was to process large-scale datasets into one longitudinal, big-data data source suitable for academic research, in accordance with the FAIR principles. The authors review four methodological recommendations that can serve as a framework for solving big-data structuring challenges, using the harmonization software CharmStats.


2020 ◽  
Vol 5 (2) ◽  
pp. 109-125
Author(s):  
Thunnis van Oort ◽  
Åsa Jernudd ◽  
Kathleen Lotze ◽  
Clara Pafort-Overduin ◽  
Daniël Biltereyst ◽  
...  

Abstract This data paper and the data collection from which it emerges aim to present a fully harmonized data set originating in several research projects on post-war cinema programming. The paper reflects on the collection and structure of this aggregated data set, which consists of titles of feature films screened for public viewing in cinemas in the cities of Bari (Italy), Antwerp and Ghent (Belgium), Gothenburg (Sweden), Leicester (United Kingdom) and Rotterdam (Netherlands) in the year 1952. As comparisons of movie-going patterns between European countries are still rare, this paper offers a model for constructing a data set which can be replicated, scaled up and used to compare, contextualize, and eventually theorize practices of cinema-going across countries at a global level.


Author(s):  
Simon McVeigh

Abstract The paper outlines the genesis and subsequent transformation of the database Calendar of London Concerts 1750–1800, now available as a dataset at https://www.doi.org/10.17026/dans-znv-3c2j. Originally developed during the 1980s, the database was used as a primary research tool in the preparation of articles and a 1993 monograph: the first comprehensive study of London’s flourishing public concert life in the later eighteenth century, which culminated in Haydn’s London visits in 1791–5. The database itself, extending to over 4,000 records, was derived from an exhaustive study of London newspapers. Following the obsolescence of the relational database in which the material was initially stored, it has recently been transferred to a spreadsheet in CSV format, publicly available with free open access. Issues arising out of the standardisation of concert data are explored, especially regarding the layout of complete concert programmes, and the strengths and limitations of the original design are analysed within the context of the newly available version.

