A Test Collection for Dataset Retrieval in Biodiversity Research

2021 ◽  
Vol 7 ◽  
Author(s):  
Felicitas Löffler ◽  
Andreas Schuldt ◽  
Birgitta König-Ries ◽  
Helge Bruelheide ◽  
Friederike Klan

Searching for scientific datasets is a prominent task in scholars' daily research practice. A variety of data publishers, archives and data portals offer search applications that allow the discovery of datasets. The evaluation of such dataset retrieval systems requires proper test collections, including questions that reflect real-world information needs of scholars, a set of datasets, and human judgements assessing the relevance of the datasets to the questions in the benchmark corpus. Unfortunately, only very few test collections exist for dataset search. In this paper, we introduce the BEF-China test collection, the very first test collection for dataset retrieval in biodiversity research, a research field with an increasing demand for data discovery services. The test collection consists of 14 questions, a corpus of 372 datasets from the BEF-China project, and binary relevance judgements provided by a biodiversity expert.
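The structure described above (questions, datasets, binary relevance judgements) can be sketched as a small evaluation harness. The identifiers and judgements below are invented for illustration and are not taken from the actual BEF-China collection.

```python
# Minimal sketch of a dataset-retrieval test collection and a simple
# evaluation over it. All ids and judgements are illustrative.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved datasets judged relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for d in top_k if d in relevant_ids)
    return hits / k

# Binary relevance judgements: question id -> set of relevant dataset ids.
qrels = {
    "q01": {"ds_017", "ds_204"},
    "q02": {"ds_050"},
}

# A hypothetical system's ranked results for each question.
run = {
    "q01": ["ds_204", "ds_003", "ds_017", "ds_099"],
    "q02": ["ds_050", "ds_204"],
}

for qid, ranked in run.items():
    print(qid, precision_at_k(ranked, qrels[qid], k=2))
```

With binary judgements like these, any rank-based effectiveness measure can be computed per question and averaged over the 14 questions of the collection.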

2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Kevin Roitero

To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of documents relevant to each topic. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the first ranked) constitute the so-called pool, and their relevance is evaluated by human assessors; the document list is then used to compute effectiveness metrics and rank the participant systems. Private web search companies also run their in-house evaluation exercises; although the details are mostly unknown and the aims are somewhat different, the overall approach shares several issues with the test collection approach. The aim of this work is to: (i) develop and improve some state-of-the-art work on the evaluation of IR effectiveness while saving resources, and (ii) propose a novel, more principled and engineered overall approach to test collection based effectiveness evaluation. This thesis focuses on three main directions: the first part details the use of few topics (i.e., information needs) in retrieval evaluation and presents an extensive study of the effect of using fewer topics in terms of number of topics, topic subsets, and statistical power. The second part discusses evaluation without relevance judgements, reproducing, extending, and generalizing state-of-the-art methods and investigating their combinations by means of data fusion techniques and machine learning.
Finally, the third part uses crowdsourcing to gather relevance labels, and in particular shows the effect of using fine-grained judgement scales; furthermore, it explores methods to transform judgements between different relevance scales. Awarded by: University of Udine, Udine, Italy on 19 March 2020. Supervised by: Professor Stefano Mizzaro. Available at: https://kevinroitero.com/resources/kr-phd-thesis.pdf.
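The pooling step of the competition scenario described above can be sketched in a few lines: the top-ranked documents from each participating system are merged into a pool, and only pooled documents are judged by human assessors. The system names, rankings, and pool depth below are illustrative, not TREC data.

```python
# Sketch of pooling: union of the top-`depth` documents from every
# participating system's ranking. Only pooled documents are assessed.

def build_pool(runs, depth):
    """Return the set of document ids to be judged by assessors."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
}

pool = build_pool(runs, depth=2)  # only these documents get judged
```

Documents outside the pool are conventionally treated as non-relevant when computing effectiveness metrics, which is one of the known issues this line of work addresses.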


2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Dan Li

The availability of test collections in the Cranfield paradigm has significantly benefited the development of models, methods and tools in information retrieval. Such test collections typically consist of a set of topics, a document collection and a set of relevance assessments. Constructing these test collections requires effort from various perspectives, such as topic selection, document selection, relevance assessment, and relevance label aggregation. The work in this thesis provides a fundamental way of constructing and utilizing test collections in information retrieval in an effective, efficient and reliable manner. To that end, we have focused on four aspects. We first study the document selection issue when building test collections. We devise an active sampling method for efficient large-scale evaluation [Li and Kanoulas, 2017]. Unlike past sampling-based approaches, we account for the fact that some systems are of higher quality than others, and we design the sampling distribution to over-sample documents from these systems. At the same time, the estimated evaluation measures are unbiased, and assessments can be used to evaluate new, novel systems without introducing any systematic error. A natural further step is then determining when to stop the document selection and assessment procedure. This is an important but understudied problem in the construction of test collections. We consider both the gain of identifying relevant documents and the cost of assessing documents as the optimization goals. We handle the problem under the continuous active learning framework by jointly training a ranking model to rank documents and estimating the total number of relevant documents in the collection using a "greedy" sampling method [Li and Kanoulas, 2020]. The next stage of constructing a test collection is assessing relevance.
We study how to denoise relevance assessments by aggregating multiple crowd annotation sources to obtain high-quality relevance assessments. This helps to boost the quality of relevance assessments acquired in a crowdsourcing manner. We assume a Gaussian process prior on query-document pairs to model their correlation. The proposed model, CrowdGP, shows good performance in terms of inferring true relevance labels. In addition, it allows predicting relevance labels for new tasks that have no crowd annotations, which is a new functionality of CrowdGP. Ablation studies demonstrate that its effectiveness is attributable to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries. After a test collection is constructed, it can be used either to evaluate retrieval systems or to train a ranking model. We propose to use it to optimize the configuration of retrieval systems. We use a Bayesian optimization approach to model the effect of a δ-step in the configuration space on the effectiveness of the retrieval system, suggesting the use of different similarity functions (covariance functions) for continuous and categorical values, and examine their ability to effectively and efficiently guide the search in the configuration space [Li and Kanoulas, 2018]. Beyond the algorithmic and empirical contributions, work done as part of this thesis also contributed to the research community through the CLEF Technology Assisted Reviews in Empirical Medicine tracks in 2017, 2018, and 2019 [Kanoulas et al., 2017, 2018, 2019]. Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Evangelos Kanoulas. Available at: https://dare.uva.nl/search?identifier=3438a2b6-9271-4f2c-add5-3c811cc48d42.
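The over-sampling idea from the first part can be illustrated with a Horvitz-Thompson-style estimator: documents contributed by systems believed to be stronger get higher inclusion probabilities, and weighting each sampled relevant document by the inverse of its inclusion probability keeps the estimated total unbiased. This is only a schematic of the general principle, not the actual method of [Li and Kanoulas, 2017]; the probabilities and judgements below are made up.

```python
import random

# Each document has an inclusion probability (higher for documents
# ranked highly by stronger systems). Weighting a sampled relevant
# document by 1/p gives an unbiased estimate of the total number of
# relevant documents, even though sampling is deliberately skewed.

random.seed(0)

incl_prob = {"d1": 0.9, "d2": 0.6, "d3": 0.3, "d4": 0.1}
true_relevance = {"d1": 1, "d2": 0, "d3": 1, "d4": 1}  # 3 relevant in total

estimates = []
for _ in range(20000):
    sampled = [d for d, p in incl_prob.items() if random.random() < p]
    estimates.append(sum(true_relevance[d] / incl_prob[d] for d in sampled))

mean_estimate = sum(estimates) / len(estimates)  # close to the true total, 3
```

Averaged over many repetitions, the estimator recovers the true number of relevant documents despite over-sampling the documents of strong systems, which is the property the abstract refers to as "unbiased".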


Author(s):  
Pierre Taberlet ◽  
Aurélie Bonin ◽  
Lucie Zinger ◽  
Eric Coissac

Environmental DNA (eDNA), i.e. DNA released into the environment by any living form, represents a formidable opportunity to gather high-throughput and standardized information on the distribution or feeding habits of species. It therefore has great potential for applications in ecology and biodiversity management. However, this research field is fast-moving, involves different areas of expertise and currently lacks standard approaches, which calls for an up-to-date and comprehensive synthesis. Environmental DNA for biodiversity research and monitoring covers current methods based on eDNA, with a particular focus on “eDNA metabarcoding”. Intended for scientists and managers, it provides the background information needed to design sound experiments. It revisits all the steps necessary to produce high-quality metabarcoding data, such as sampling, metabarcode design, optimization of PCR and sequencing protocols, and analysis of large sequencing datasets. All these steps are presented by discussing the potential and current challenges of eDNA-based approaches for inferring parameters on biodiversity or ecological processes. The last chapters of the book review how DNA metabarcoding has been used so far to unravel novel patterns of diversity in space and time, to detect particular species, and to answer new ecological questions in various ecosystems and for various organisms. Environmental DNA for biodiversity research and monitoring constitutes essential reading for all graduate students, researchers and practitioners who do not have a strong background in molecular genetics and who are willing to use eDNA approaches in ecology and biomonitoring.


2018 ◽  
Vol 10 (7) ◽  
pp. 2434 ◽  
Author(s):  
Sebastian Fastenrath ◽  
Boris Braun

Socio-technical transitions towards more sustainable modes of production and consumption are receiving increasing attention in the academic world, and also from political and economic decision-makers. There is increasing demand for resource-efficient technologies and institutional innovations, particularly at the city level. However, it remains largely unclear how processes of change evolve and develop and how they are embedded in different socio-spatial contexts. While numerous scholars have contributed to the vibrant research field around sustainability transitions, geographical expertise has largely been ignored. The lack of knowledge about the role of spatial contexts, learning processes, and the co-evolution of technological, economic, and socio-political processes has been prominently addressed. Bridging approaches from Transition Studies and perspectives of Economic Geography, the paper presents conceptual ideas for an evolutionary and relational understanding of urban sustainability transitions. The paper introduces new perspectives on sustainability transitions towards a better understanding of socio-spatial contexts.


1997 ◽  
pp. 13-26 ◽  
Author(s):  
David Johnson ◽  
Myke Gluck

This article looks at access to geographic information through a review of information science theory and its application to the WWW. The two most common types of retrieval systems are information retrieval and data retrieval systems. A retrieval system has seven elements: retrieval models, indexing, match and retrieval, relevance, order, query languages, and query specification. The goal of information retrieval is to match the user's needs to the information that is in the system. Retrieval of geographic information is a combination of both information and data retrieval. Aids to effective retrieval of geographic information are: query languages that employ icons and natural language, automatic indexing of geographic information, and standardization of geographic information. One area that has seen an explosion of geographic information retrieval systems (GIRs) is the World Wide Web (WWW). The final section of this article discusses how seven WWW GIRs solve the problem of matching the user's information needs to the information in the system.
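The "indexing" and "match and retrieval" elements listed above can be illustrated with a toy inverted index, the classic structure for matching a user's query terms to the documents in the system. The documents and queries below are invented.

```python
from collections import defaultdict

# Toy illustration of indexing plus boolean match-and-retrieval:
# an inverted index maps each term to the set of documents containing it.

docs = {
    "doc1": "rivers and lakes of northern florida",
    "doc2": "road map of florida counties",
    "doc3": "climate data for coastal regions",
}

# Indexing: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def match(query):
    """Return documents containing every query term (boolean AND)."""
    results = None
    for term in query.split():
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return results or set()

match("florida map")  # matches only doc2
```

A full retrieval system would add the remaining elements the article lists, notably a retrieval model that orders the matched documents by estimated relevance rather than returning an unordered set.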


2012 ◽  
pp. 143-172
Author(s):  
Gaetano Chinnici ◽  
Biagio Pecorino ◽  
Alessandro Scuderi

Over the years, the common agricultural policy has expanded its tools for the promotion and protection of farm produce quality. At both the national and the European level, we are witnessing a change in consumer behaviour: information needs, safety and food security are becoming more and more relevant, along with an increasing demand for quality products and a willingness to pay for those products that meet consumer expectations. The paper focuses on the perceived quality of local products, in order to identify the variables that influence purchasing decisions and dietary habits for each consumer group. The survey was conducted using a principal components analysis to summarize the information that characterizes consumption choices, followed by a cluster analysis, which allowed us to confirm the presence of different segments of consumers of local products.
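The two-step analysis described above (principal components analysis followed by clustering of respondents) can be sketched as follows. The survey matrix is random stand-in data, not the study's responses, and the PCA and k-means implementations are deliberately minimal.

```python
import numpy as np

# Sketch of PCA-then-cluster segmentation on stand-in survey data:
# 60 hypothetical respondents answering 8 survey variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 8))

# PCA via eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]      # sort components by variance
components = eigvecs[:, order[:2]]     # keep the first two components
scores = Xc @ components               # respondents in component space

# A few iterations of k-means on the component scores
# (k = 3 hypothetical consumer segments).
centroids = scores[:3].copy()
for _ in range(10):
    labels = np.argmin(
        np.linalg.norm(scores[:, None] - centroids[None], axis=2), axis=1)
    for k in range(3):
        if np.any(labels == k):
            centroids[k] = scores[labels == k].mean(axis=0)
```

In the actual study the clusters found in the reduced space would then be profiled against the original survey variables to characterize each consumer segment.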


Author(s):  
Zahid Ashraf Wani ◽  
Huma Shafiq

Nowadays, we all rely on cyberspace for our information needs. We make use of different types of search tools. Some of them specialize in one or two specific formats, while a few can crawl a good portion of the web irrespective of format. It is therefore imperative for information professionals to have a thorough understanding of these tools. As such, the chapter is an endeavor to delve deep and highlight various trends in online information retrieval, from primitive tools to modern ones. The chapter also makes an effort to envisage future requirements and expectations, keeping in view the ever-increasing dependence on diverse species of information retrieval tools.


Author(s):  
Roberto J.G. Unger ◽  
Isa Maria Freire

The article presents the concept of regime of information to information managers, as a contribution to the processes of adapting and adjusting information systems and documentary languages to meet the information needs of users. Regimes of information are the dominant modes of informational production in a socio-economic formation, which necessarily presuppose, in their context, information sources that are disseminated and exert influence on the social context in which they are established. In this respect, societies have regimes of information through which they organize material and symbolic production and represent the dynamics of social relations. Among the various current forms of institutional manifestation, information retrieval systems stand out as the manifestation per se of the phenomenon that drives the regime. Information retrieval systems, in turn, use documentary languages to organize and communicate the information held in the countless "aggregates of information", which Barreto (1996) defines as "structures" that store "stocks of information" and can act as "agents", or "mediators", between a source of information and its users.

