A Test Collection for Dataset Retrieval in Biodiversity Research

2021 ◽  
Vol 7 ◽  
Author(s):  
Felicitas Löffler ◽  
Andreas Schuldt ◽  
Birgitta König-Ries ◽  
Helge Bruelheide ◽  
Friederike Klan

Searching for scientific datasets is a prominent task in scholars' daily research practice. A variety of data publishers, archives and data portals offer search applications that allow the discovery of datasets. The evaluation of such dataset retrieval systems requires proper test collections, including questions that reflect real-world information needs of scholars, a set of datasets, and human judgements assessing the relevance of the datasets to the questions in the benchmark corpus. Unfortunately, only very few test collections exist for dataset search. In this paper, we introduce the BEF-China test collection, the very first test collection for dataset retrieval in biodiversity research, a research field with an increasing demand for data discovery services. The test collection consists of 14 questions, a corpus of 372 datasets from the BEF-China project, and binary relevance judgements provided by a biodiversity expert.
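The structure described above (questions, datasets, binary relevance judgements) can be sketched as a small evaluation harness. The identifiers and judgements below are invented for illustration and are not taken from the actual BEF-China collection.

```python
# Minimal sketch of a dataset-retrieval test collection and a simple
# evaluation over it. All ids and judgements are illustrative.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved datasets judged relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for d in top_k if d in relevant_ids)
    return hits / k

# Binary relevance judgements: question id -> set of relevant dataset ids.
qrels = {
    "q01": {"ds_017", "ds_204"},
    "q02": {"ds_050"},
}

# A hypothetical system's ranked results for each question.
run = {
    "q01": ["ds_204", "ds_003", "ds_017", "ds_099"],
    "q02": ["ds_050", "ds_204"],
}

for qid, ranked in run.items():
    print(qid, precision_at_k(ranked, qrels[qid], k=2))
```

With binary judgements like these, any rank-based effectiveness measure can be computed per question and averaged over the 14 questions of the collection.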

2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Kevin Roitero

To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of documents relevant to each topic. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the first ranked) constitute the so-called pool, and their relevance is evaluated by human assessors; the document list is then used to compute effectiveness metrics and rank the participant systems. Private web search companies also run their in-house evaluation exercises; although the details are mostly unknown and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered overall approach to test collection based effectiveness evaluation. This thesis focuses on three main directions: the first part details the use of few topics (i.e., information needs) in retrieval evaluation and presents an extensive study of the effect of using fewer topics in terms of number of topics, topic subsets, and statistical power. The second part discusses evaluation without relevance judgements, reproducing, extending, and generalizing state-of-the-art methods and investigating their combinations by means of data fusion techniques and machine learning.
Finally, the third part uses crowdsourcing to gather relevance labels, and in particular shows the effect of using fine-grained judgement scales; furthermore, it explores methods to transform judgements between different relevance scales. Awarded by: University of Udine, Udine, Italy on 19 March 2020. Supervised by: Professor Stefano Mizzaro. Available at: https://kevinroitero.com/resources/kr-phd-thesis.pdf.
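The pooling step of the competition scenario described above can be sketched in a few lines: the top-ranked documents from each participating system are merged into a pool, and only pooled documents are judged by human assessors. The system names, rankings, and pool depth below are illustrative, not TREC data.

```python
# Sketch of pooling: union of the top-`depth` documents from every
# participating system's ranking. Only pooled documents are assessed.

def build_pool(runs, depth):
    """Return the set of document ids to be judged by assessors."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
}

pool = build_pool(runs, depth=2)  # only these documents get judged
```

Documents outside the pool are conventionally treated as non-relevant when computing effectiveness metrics, which is one of the known issues this line of work addresses.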


2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Dan Li

The availability of test collections in the Cranfield paradigm has significantly benefited the development of models, methods and tools in information retrieval. Such test collections typically consist of a set of topics, a document collection and a set of relevance assessments. Constructing these test collections requires effort from various perspectives, such as topic selection, document selection, relevance assessment, and relevance label aggregation. The work in this thesis provides a fundamental way of constructing and utilizing test collections in information retrieval in an effective, efficient and reliable manner. To that end, we have focused on four aspects. We first study the document selection issue when building test collections. We devise an active sampling method for efficient large-scale evaluation [Li and Kanoulas, 2017]. Unlike past sampling-based approaches, we account for the fact that some systems are of higher quality than others, and we design the sampling distribution to over-sample documents from these systems. At the same time, the estimated evaluation measures are unbiased, and assessments can be used to evaluate new, novel systems without introducing any systematic error. A natural further step is then determining when to stop the document selection and assessment procedure. This is an important but understudied problem in the construction of test collections. We consider both the gain of identifying relevant documents and the cost of assessing documents as the optimization goals. We handle the problem under the continuous active learning framework by jointly training a ranking model to rank documents and estimating the total number of relevant documents in the collection using a "greedy" sampling method [Li and Kanoulas, 2020]. The next stage of constructing a test collection is assessing relevance.
We study how to denoise relevance assessments by aggregating multiple crowd annotation sources to obtain high-quality relevance assessments. This helps to boost the quality of relevance assessments acquired in a crowdsourcing manner. We assume a Gaussian process prior on query-document pairs to model their correlation. The proposed model, CrowdGP, shows good performance in terms of inferring true relevance labels. In addition, it allows predicting relevance labels for new tasks that have no crowd annotations, which is a new functionality of CrowdGP. Ablation studies demonstrate that its effectiveness is attributable to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries. After a test collection is constructed, it can be used either to evaluate retrieval systems or to train a ranking model. We propose to use it to optimize the configuration of retrieval systems. We use a Bayesian optimization approach to model the effect of a δ-step in the configuration space on the effectiveness of the retrieval system, suggesting the use of different similarity functions (covariance functions) for continuous and categorical values, and examine their ability to effectively and efficiently guide the search in the configuration space [Li and Kanoulas, 2018]. Beyond the algorithmic and empirical contributions, work done as part of this thesis also contributed to the research community through the CLEF Technology Assisted Reviews in Empirical Medicine tracks in 2017, 2018, and 2019 [Kanoulas et al., 2017, 2018, 2019]. Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Evangelos Kanoulas. Available at: https://dare.uva.nl/search?identifier=3438a2b6-9271-4f2c-add5-3c811cc48d42.
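The over-sampling idea from the first part can be illustrated with a Horvitz-Thompson-style estimator: documents contributed by systems believed to be stronger get higher inclusion probabilities, and weighting each sampled relevant document by the inverse of its inclusion probability keeps the estimated total unbiased. This is only a schematic of the general principle, not the actual method of [Li and Kanoulas, 2017]; the probabilities and judgements below are made up.

```python
import random

# Each document has an inclusion probability (higher for documents
# ranked highly by stronger systems). Weighting a sampled relevant
# document by 1/p gives an unbiased estimate of the total number of
# relevant documents, even though sampling is deliberately skewed.

random.seed(0)

incl_prob = {"d1": 0.9, "d2": 0.6, "d3": 0.3, "d4": 0.1}
true_relevance = {"d1": 1, "d2": 0, "d3": 1, "d4": 1}  # 3 relevant in total

estimates = []
for _ in range(20000):
    sampled = [d for d, p in incl_prob.items() if random.random() < p]
    estimates.append(sum(true_relevance[d] / incl_prob[d] for d in sampled))

mean_estimate = sum(estimates) / len(estimates)  # close to the true total, 3
```

Averaged over many repetitions, the estimator recovers the true number of relevant documents despite over-sampling the documents of strong systems, which is the property the abstract refers to as "unbiased".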


Author(s):  
Pierre Taberlet ◽  
Aurélie Bonin ◽  
Lucie Zinger ◽  
Eric Coissac

Environmental DNA (eDNA), i.e. DNA released into the environment by any living form, represents a formidable opportunity to gather high-throughput and standardized information on the distribution or feeding habits of species. It therefore has great potential for applications in ecology and biodiversity management. However, this research field is fast-moving, involves different areas of expertise and currently lacks standard approaches, which calls for an up-to-date and comprehensive synthesis. Environmental DNA for biodiversity research and monitoring covers current methods based on eDNA, with a particular focus on “eDNA metabarcoding”. Intended for scientists and managers, it provides the background information needed to design sound experiments. It revisits all the steps necessary to produce high-quality metabarcoding data, such as sampling, metabarcode design, optimization of PCR and sequencing protocols, and analysis of large sequencing datasets. All these steps are presented by discussing the potential and current challenges of eDNA-based approaches for inferring parameters on biodiversity or ecological processes. The last chapters of the book review how DNA metabarcoding has been used so far to unravel novel patterns of diversity in space and time, to detect particular species, and to answer new ecological questions in various ecosystems and for various organisms. Environmental DNA for biodiversity research and monitoring constitutes essential reading for all graduate students, researchers and practitioners who do not have a strong background in molecular genetics and who are willing to use eDNA approaches in ecology and biomonitoring.


2018 ◽  
Vol 10 (7) ◽  
pp. 2434 ◽  
Author(s):  
Sebastian Fastenrath ◽  
Boris Braun

Socio-technical transitions towards more sustainable modes of production and consumption are receiving increasing attention in the academic world, and also from political and economic decision-makers. There is increasing demand for resource-efficient technologies and institutional innovations, particularly at the city level. However, it remains largely unclear how processes of change evolve and develop and how they are embedded in different socio-spatial contexts. While numerous scholars have contributed to the vibrant research field around sustainability transitions, geographical expertise has largely been ignored. The lack of knowledge about the role of spatial contexts, learning processes, and the co-evolution of technological, economic, and socio-political processes has been prominently addressed. Bridging approaches from Transition Studies and perspectives of Economic Geography, the paper presents conceptual ideas for an evolutionary and relational understanding of urban sustainability transitions. The paper introduces new perspectives on sustainability transitions towards a better understanding of socio-spatial contexts.


1997 ◽  
pp. 13-26 ◽  
Author(s):  
David Johnson ◽  
Myke Gluck

This article looks at access to geographic information through a review of information science theory and its application to the WWW. The two most common types of retrieval systems are information retrieval and data retrieval systems. A retrieval system has seven elements: retrieval models, indexing, match and retrieval, relevance, order, query languages, and query specification. The goal of information retrieval is to match the user's needs to the information that is in the system. Retrieval of geographic information is a combination of both information and data retrieval. Aids to effective retrieval of geographic information are: query languages that employ icons and natural language, automatic indexing of geographic information, and standardization of geographic information. One area that has seen an explosion of geographic information retrieval systems (GIRs) is the World Wide Web (WWW). The final section of this article discusses how seven WWW GIRs solve the problem of matching the user's information needs to the information in the system.
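The "indexing" and "match and retrieval" elements listed above can be illustrated with a toy inverted index, the classic structure for matching a user's query terms to the documents in the system. The documents and queries below are invented.

```python
from collections import defaultdict

# Toy illustration of indexing plus boolean match-and-retrieval:
# an inverted index maps each term to the set of documents containing it.

docs = {
    "doc1": "rivers and lakes of northern florida",
    "doc2": "road map of florida counties",
    "doc3": "climate data for coastal regions",
}

# Indexing: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def match(query):
    """Return documents containing every query term (boolean AND)."""
    results = None
    for term in query.split():
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return results or set()

match("florida map")  # matches only doc2
```

A full retrieval system would add the remaining elements the article lists, notably a retrieval model that orders the matched documents by estimated relevance rather than returning an unordered set.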


2012 ◽  
pp. 143-172
Author(s):  
Gaetano Chinnici ◽  
Biagio Pecorino ◽  
Alessandro Scuderi

Over the years, the common agricultural policy has expanded its tools for the promotion and protection of farm produce quality. At both the national and the European level, we are witnessing a change in consumer behaviour: information needs, safety and food security are becoming more and more relevant, along with an increasing demand for quality products and a willingness to pay for those products that meet consumer expectations. The paper focuses on the perceived quality of local products, in order to identify the variables that influence purchasing decisions and dietary habits for each consumer group. The survey was conducted using a principal components analysis to summarize the information that characterizes consumption choices, followed by a cluster analysis, which allowed us to confirm the presence of different segments of consumers of local products.
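The two-step analysis described above (principal components analysis followed by clustering of respondents) can be sketched as follows. The survey matrix is random stand-in data, not the study's responses, and the PCA and k-means implementations are deliberately minimal.

```python
import numpy as np

# Sketch of PCA-then-cluster segmentation on stand-in survey data:
# 60 hypothetical respondents answering 8 survey variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))

# PCA via eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]      # sort components by variance
components = eigvecs[:, order[:2]]     # keep the first two components
scores = Xc @ components               # respondents in component space

# A few iterations of k-means on the component scores
# (k = 3 hypothetical consumer segments).
centroids = scores[:3].copy()
for _ in range(10):
    labels = np.argmin(
        np.linalg.norm(scores[:, None] - centroids[None], axis=2), axis=1)
    for k in range(3):
        if np.any(labels == k):
            centroids[k] = scores[labels == k].mean(axis=0)
```

In the actual study the clusters found in the reduced space would then be profiled against the original survey variables to characterize each consumer segment.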


Author(s):  
Zahid Ashraf Wani ◽  
Huma Shafiq

Nowadays, we all rely on cyberspace for our information needs. We make use of different types of search tools. Some of them specialize in one or two specific formats, while a few can crawl a good portion of the web irrespective of format. It is therefore imperative for information professionals to have a thorough understanding of these tools. As such, the chapter is an endeavor to delve deep and highlight various trends in online information retrieval, from primitive tools to modern ones. The chapter also makes an effort to envisage future requirements and expectations, keeping in view the ever-increasing dependence on diverse species of information retrieval tools.


Author(s):  
Roberto J.G. Unger ◽  
Isa Maria Freire

The article presents the concept of regime of information to information managers, as a contribution to the processes of adapting and adjusting information systems and documentary languages to meet the information needs of users. Regimes of information are the dominant modes of informational production in a socio-economic formation, which necessarily presuppose, in their context, information sources that are disseminated and exert influence on the social context in which they are established. In this respect, societies have regimes of information through which they organize material and symbolic production and represent the dynamics of social relations. Among the various current forms of institutional manifestation, information retrieval systems stand out as the manifestation per se of the phenomenon that drives the regime. Information retrieval systems, in turn, use documentary languages to organize and communicate the information held in the countless "aggregates of information", which Barreto (1996) defines as "structures" that store "stocks of information" and can act as "agents", or "mediators", between a source of information and its users.

