Evaluating Methods for Transcribing Specimen Labels

Author(s):  
Sarah Phillips ◽  
Mathias Dillen ◽  
Laura Green ◽  
Quentin Groom ◽  
Marie-Helene Weech

Distributed System of Scientific Collections (DiSSCo), a pan-European Research Infrastructure, will facilitate the production of tens of millions of digital images of natural history specimens each year. The labels of these specimens contain valuable information for research, but their transcription can be difficult and time-consuming because labels are often handwritten and hard to read. Whilst accurate label transcription is only one step towards creating a specimen record fit for different research uses, it is an extremely important one. It would be very time-consuming to have to return to recheck label information for even a very small proportion of specimens. Once a specimen label is transcribed correctly, it becomes much easier to enhance the record with additional information from other sources, e.g. from literature or collector itineraries. It also becomes feasible to determine the point of collection from the textual information on the label by a process known as georeferencing, or even to find inaccuracies within the label itself. Under the auspices of the project Innovation and Consolidation for Large Scale Digitisation of Natural Heritage (ICEDIG), we compared different manual approaches to the transcription of collection labels. Using herbarium specimens as an example, we compared the quality of data transcribed by in-house trained institute staff, by a commercial company to which transcription was outsourced, and by the general public through online crowdsourcing platforms, across two transcription pilots. The first pilot consisted of 200 Solanum specimen images from the Royal Botanic Gardens, Kew in the UK and 200 from Meise Botanic Garden in Belgium. This particular genus was chosen as both institutes had specimens from which the label data had already been transcribed through the digitisation company Picturae, completed by Alembo. The Kew specimens had also been transcribed in-house by staff employed as digitisation officers or curators and by an independent researcher. The images from both institutes were uploaded to two crowdsourcing platforms: DigiVol and DoeDat. In a second pilot, multiple European institutions holding botanical collections were approached to provide a sample of 200 digitally imaged herbarium sheet specimens to upload to multiple crowdsourcing platforms. Specimens from 7 institutions were uploaded for transcription to 5 different crowdsourcing platforms: DigiVol, DoeDat, Die Herbonauten, Les Herbonautes and Notes from Nature. For both pilots, key transcription data were assessed and common errors in label transcription identified. Reasons for these errors will be discussed along with possible mechanisms to improve the accuracy of the transcriptions. The need for standards for transcription is identified and recommendations made.

2012 ◽  
Vol 41 (2) ◽  
pp. 409-427 ◽  
Author(s):  
PAUL DOLAN ◽  
ROBERT METCALFE

Abstract: Governments around the world are now beginning to seriously consider the use of measures of subjective wellbeing (SWB) – ratings of thoughts and feelings about life – for monitoring progress and for informing and appraising public policy. The mental state account of wellbeing upon which SWB measures are based can provide useful additional information about who is doing well and badly in life when compared to that provided by the objective list and preference satisfaction accounts. It may be particularly useful when deciding how best to allocate scarce resources, where it is desirable to express the benefits of intervention in a single metric that can be compared to the costs of intervention. There are three main concepts of SWB in the literature – evaluation (life satisfaction), experience (momentary mood) and eudemonia (purpose) – and policy-makers should seek to measure all three, at least for the purposes of monitoring progress. There are some major challenges to the use of SWB measures. Two related and well-rehearsed issues are the effects of expectations and adaptation on ratings. The degree to which we should allow wellbeing to vary according to expectations and adaptation is a vexing moral problem, but information on SWB can highlight what difference allowing for these considerations would make in practice (e.g. in informing prioritisation decisions), which can then be fed into the normative debate. There are also questions about precisely what attention should be drawn to in SWB questions and how to capture the ratings of those least inclined to take part in surveys, but these can be addressed through more widespread use of SWB. We also provide some concrete recommendations about precisely what questions should be asked in large-scale surveys, and these recommendations have been taken up by the Office for National Statistics in the UK and are being looked at closely by the OECD.


2019 ◽  
Author(s):  
Anya Skatova ◽  
James Goulding

Advances in digital technology have led to large amounts of personal data being recorded and retained by industry, constituting an invaluable asset to both governmental and private organizations. The implementation of the General Data Protection Regulation in the EU, including the UK, fundamentally reshaped how data is handled across every sector. It enables the general public to access data collected about them by organisations, opening up the possibility of this data being used for research that benefits the public; for example, to uncover lifestyle causes of health outcomes. A significant barrier to using this commercial data for academic research is the lack of publicly acceptable research frameworks. Data donation - an act of active consent of an individual to donate their personal data for research - could enable the use of commercial data for societal benefit. However, it is not clear which motives, if any, would drive people to donate their personal data. In this paper we present the results of a large-scale survey (N = 1,300) that studied intentions and reasons to donate personal data. We found that over half of individuals are willing to donate their personal data for research that could benefit the wider general public. We identified three distinct reasons to donate personal data: an opportunity to achieve self-benefit, a prosocial motive to serve society, and the need to understand the purpose of data donation. We developed a questionnaire to measure those three reasons and provided further evidence on the validity of the scales. Our results demonstrate that these motivations predict people’s intentions to donate personal data over and above generic altruistic motives and relevant personality traits. We show that a prosocial motive to serve society is the strongest predictor of the intention to donate personal data, while understanding the purpose of data donation also positively predicts the intention to donate. In contrast, self-serving motives show a negative association with intentions to donate personal data. The findings presented here inform the ethical use of commercially collected personal data for academic research for public good.


2019 ◽  
Vol 89 (10) ◽  
pp. 1055-1073 ◽  
Author(s):  
Nicolaas Molenaar ◽  
Marita Felder

Abstract: Dolomite is a common and volumetrically important mineral in many siliciclastic sandstones, including Permian Rotliegend sandstones (the Slochteren Formation). These sandstones form extensive gas reservoirs in the Southern Permian Basin in the Netherlands, Germany, Poland, and the UK. The reservoir quality of these sandstones is negatively influenced by the content and distribution of dolomite. The origin and the stratigraphic distribution of the dolomite are not yet fully understood. The aim of this study is to identify the origin of the carbonate. The main methods used to achieve that aim are a combination of thin-section petrography, scanning electron microscopy (SEM and EDX), and XRD analyses. The present study shows that the typical dispersed occurrence of the dolomite is a consequence of dispersed detrital carbonate grains that served both as nuclei and source for authigenic dolomite cement. The dolomite cement formed syntaxial outgrowths and overgrowths around detrital carbonate grains. The study also shows that dolomite cement, often in combination with ankerite and siderite, precipitated during burial after mechanical compaction. Most of the carbonate grains consisted of dolomite before deposition. The carbonate grains were affected by compaction and pressure dissolution, and commonly no longer have well-defined outlines. The distribution of dolomite cement in the Rotliegend sandstones was controlled by the presence of stable carbonate grains. Due to the restricted and variable content of carbonate grains and their dispersed occurrence, the cement is also dispersed and the degree of cementation heterogeneous. Our findings have important implications for diagenesis modeling. The presence of detrital carbonate excludes the need for external supply by any large-scale advective flow of diagenetic fluids. By knowing that the carbonate source is local and related to detrital grains instead of being externally derived from an unknown source, the presence of carbonate cement can be linked to a paleogeographic and sedimentological model.


Science ◽  
2021 ◽  
pp. eabf2946
Author(s):  
Louis du Plessis ◽  
John T. McCrone ◽  
Alexander E. Zarebski ◽  
Verity Hill ◽  
Christopher Ruis ◽  
...  

The UK’s COVID-19 epidemic during early 2020 was one of the world’s largest and was unusually well represented by virus genomic sampling. Here we reveal the fine-scale genetic lineage structure of this epidemic through analysis of 50,887 SARS-CoV-2 genomes, including 26,181 from the UK sampled throughout the country’s first wave of infection. Using large-scale phylogenetic analyses, combined with epidemiological and travel data, we quantify the size, spatio-temporal origins and persistence of genetically distinct UK transmission lineages. Rapid fluctuations in virus importation rates resulted in >1000 lineages; those introduced prior to national lockdown tended to be larger and more dispersed. Lineage importation and regional lineage diversity declined after lockdown, while lineage elimination was size-dependent. We discuss the implications of our genetic perspective on transmission dynamics for COVID-19 epidemiology and control.


2021 ◽  
Vol 7 (2) ◽  
pp. 20
Author(s):  
Carlos Lassance ◽  
Yasir Latif ◽  
Ravi Garg ◽  
Vincent Gripon ◽  
Ian Reid

Vision-based localization is the problem of inferring the pose of the camera given a single image. One commonly used approach relies on image retrieval, where the query input is compared against a database of localized support examples and its pose is inferred with the help of the retrieved items. This assumes that images taken from the same places contain the same landmarks and thus have similar feature representations. These representations can be learned to be robust to variations in capture conditions such as time of day or weather. In this work, we introduce a framework which aims at enhancing the performance of such retrieval-based localization methods. It takes into account additional available information, such as GPS coordinates or temporal proximity in the acquisition of the images. More precisely, our method constructs a graph based on this additional information, which is later used to improve the reliability of the retrieval process by filtering the feature representations of support and/or query images. We show that the proposed method is able to significantly improve the localization accuracy on two large-scale datasets, as well as the mean average precision in classical image retrieval scenarios.
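
The graph-based filtering idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filter, so a single neighbour-averaging step stands in for it, and the function names (`build_graph`, `smooth_features`, `retrieve`), the `radius` threshold, the blending weight `alpha`, and the 2-D coordinates are all hypothetical choices made for illustration.

```python
import math


def build_graph(coords, radius):
    """Link support images whose (hypothetical) 2-D positions lie within `radius`."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def smooth_features(features, neighbors, alpha=0.5):
    """Blend each feature vector with the mean of its graph neighbours.

    This is an assumed stand-in for the paper's filter, applied once.
    Nodes without neighbours are returned unchanged.
    """
    smoothed = []
    for i, feat in enumerate(features):
        if not neighbors[i]:
            smoothed.append(list(feat))
            continue
        count = len(neighbors[i])
        mean = [sum(features[j][k] for j in neighbors[i]) / count
                for k in range(len(feat))]
        smoothed.append([(1 - alpha) * f + alpha * m
                         for f, m in zip(feat, mean)])
    return smoothed


def retrieve(query, features):
    """Return the index of the support feature most cosine-similar to `query`."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return max(range(len(features)), key=lambda i: cosine(query, features[i]))
```

In this sketch, nearby support images pull each other's features together before retrieval, which is one simple way extra positional information could make the nearest-neighbour search less sensitive to noise in individual features.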


Author(s):  
Prasad Nagakumar ◽  
Ceri-Louise Chadwick ◽  
Andrew Bush ◽  
Atul Gupta

Abstract: The COVID-19 pandemic caused by the SARS-COV-2 virus fortunately resulted in few children suffering from severe disease. However, the collateral effects of the COVID-19 pandemic appear to have had significant detrimental effects on affected children and young people. There are also some positive impacts in the form of reduced prevalence of viral bronchiolitis. The new strain of SARS-COV-2 identified recently in the UK appears to have increased transmissibility to children. However, there are no large vaccine trials set up in children to evaluate safety and efficacy. In this short communication, we review the collateral effects of the COVID-19 pandemic in children and young people. We highlight the need for urgent strategies to mitigate the risks to children due to the COVID-19 pandemic.

What is Known:
• Children and young people account for <2% of all COVID-19 hospital admissions
• The collateral impact of the COVID-19 pandemic on children and young people is devastating
• Significant reduction in influenza and respiratory syncytial virus (RSV) infection in the southern hemisphere

What is New:
• The public health measures to reduce COVID-19 infection may have also resulted in near elimination of influenza and RSV infections across the globe
• A COVID-19 vaccine has been licensed for adults. However, large-scale vaccine studies in children are yet to be initiated, although there is emerging evidence of the new SARS-COV-2 strain spreading more rapidly through young people.
• Children and young people continue to bear the collateral effects of the COVID-19 pandemic


2016 ◽  
Vol 21 (3) ◽  
Author(s):  
William M. Adams ◽  
Ian D. Hodge ◽  
Nicholas A. Macgregor ◽  
Lindsey C. Sandbrook

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 79 ◽  
Author(s):  
Xiaoyu Han ◽  
Yue Zhang ◽  
Wenkai Zhang ◽  
Tinglei Huang

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful in relation extraction. The usual sources of such information, entity types obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base, both have their limitations. Nevertheless, there exists another way to provide additional information which can overcome these limitations in Chinese relation extraction. As Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information which is helpful for the relation extraction task, especially in large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, first, we generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The result on the generated dataset shows that these characters can provide useful information for the Chinese relation extraction task. By using this information, the attention mechanism we used can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.

