A content-based dataset recommendation system for researchers—a case study on Gene Expression Omnibus (GEO) repository

Abstract: It is a growing trend among researchers to make their data publicly available for experimental reproducibility and data reusability. Sharing data with fellow researchers also increases the visibility of their work. On the other hand, some researchers are held back by a lack of data resources. To overcome this challenge, many repositories and knowledge bases have been established to ease data sharing, and over the past two decades the number of datasets added to these repositories has grown exponentially. However, most of these repositories are domain-specific, and none of them can recommend datasets to researchers or other users. Naturally, it is challenging for a researcher to keep track of all the repositories that may be relevant. Thus, a dataset recommender system that recommends datasets to a researcher based on his or her previous publications could enhance productivity and expedite further research. This work adopts an information retrieval (IR) paradigm for dataset recommendation. We hypothesize that, beyond the corpus, two fundamental differences exist between dataset recommendation and PubMed-style biomedical IR. First, instead of keywords, the query is the researcher, embodied by his or her publications. Second, to separate relevant datasets from non-relevant ones, a researcher is better represented by a set of interests than by the entire body of his or her research. This second idea is implemented by grouping the researcher's publications with a non-parametric clustering technique. The resulting clusters are used to recommend datasets to each researcher based on the cosine similarity between the vector representations of the publication clusters and those of the datasets. After manual evaluation by five researchers, the proposed method achieved maximum scores of 0.89 for normalized discounted cumulative gain at 10 (NDCG@10), 0.78 for partial precision at 10 (p@10) and 0.61 for strict p@10.
To the best of our knowledge, this is the first study of its kind on content-based dataset recommendation. We hope that this system will further promote data sharing, reduce researchers' workload in identifying the right datasets and increase the reusability of biomedical datasets. Database URL: http://genestudy.org/recommends/#/
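
As an illustration of the retrieval and evaluation steps described in the abstract, the following is a minimal, standard-library-only sketch. It assumes that publication clusters and datasets are already represented as sparse term-weight vectors (for example, TF-IDF weights), that a dataset's score is its best cosine similarity to any of the researcher's interest clusters, and that manual judgments are graded 0 (not relevant), 1 (partially relevant) and 2 (relevant). The vector representation, the max aggregation, the grading scale and all function names are assumptions for illustration, not the authors' implementation.

```python
import math

Vector = dict[str, float]  # assumed sparse representation: term -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(clusters: list[Vector], datasets: dict[str, Vector], k: int = 10) -> list[str]:
    """Rank dataset IDs by their best cosine match to any interest cluster.

    Assumes at least one cluster; taking the max over clusters is a hypothetical choice.
    """
    scores = {ds_id: max(cosine(c, vec) for c in clusters)
              for ds_id, vec in datasets.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ndcg_at_k(rels: list[int], k: int = 10) -> float:
    """NDCG@k for graded judgments given in ranked order.

    The ideal ranking is taken to be the same judgments sorted in descending order.
    """
    def dcg(values: list[int]) -> float:
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(values[:k]))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


def precision_at_k(rels: list[int], k: int = 10, threshold: int = 1) -> float:
    """Share of the top k judged at least `threshold` (1 = partial, 2 = strict)."""
    return sum(1 for rel in rels[:k] if rel >= threshold) / k
```

Under the assumed grading scale, the partial and strict p@10 figures reported in the abstract would correspond to `threshold=1` and `threshold=2`; the actual judging protocol may differ.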

A content-based literature recommendation system for datasets to improve data reusability – A case study on Gene Expression Omnibus (GEO) datasets

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2020.103399 ◽

2020 ◽

Vol 104 ◽

pp. 103399 ◽

Cited By ~ 3

Author(s):

Braja Gopal Patra ◽

Vahed Maroufy ◽

Babak Soltanalizadeh ◽

Nan Deng ◽

W. Jim Zheng ◽

...

Keyword(s):

Gene Expression ◽

Recommendation System ◽

Gene Expression Omnibus

From Mattering to Mattering More: ‘Goods’ and ‘Bads’ in Ageing and Innovation Policy Discourses

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18147596 ◽

2021 ◽

Vol 18 (14) ◽

pp. 7596

Author(s):

Carla Greubel ◽

Ellen H. M. Moors ◽

Alexander Peine

Keyword(s):

Data Sharing ◽

Unmet Needs ◽

Innovation Policy ◽

Methodological Approach ◽

Empirical Ethics ◽

Healthy Ageing ◽

Active And Healthy Ageing ◽

The Right ◽

Policy Discourses

This paper provides an empirical ethics analysis of the goods and bads enacted in EU ageing and innovation policy discourses. It revolves around a case study of the persona Maria, developed as part of the EU's Active and Healthy Ageing Policies. Drawing on Pols' empirical ethics as a theoretical and methodological approach, we describe the variety of goods (practices/situations to be strived for) and bads (practices/situations to be avoided) that are articulated in Maria's persona. We analyse how certain ideas about good and bad ageing (those associated with the use of sophisticated technologies) come to matter more in the solutions proposed for Maria and in the framing of her unmet needs, while others, which were initially seen as relevant and which describe her dreams, fears and interactions, are marginalised. The paper adds to existing studies of ageing and technology by analysing specific practices that render visible how the idea of technology and data sharing as evidently the right path towards futures of (good) ageing comes to prevail.

Analysis on Research Paper Publication Recommendation System with Composition of Papers and Conferences Matrices

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207330 ◽

2020 ◽

pp. 120-127

Author(s):

Htay Htay Win ◽

Aye Thida Myint ◽

Mi Cho Cho

Keyword(s):

Social Network ◽

Dimensionality Reduction ◽

Correspondence Analysis ◽

Recommendation System ◽

Research Paper ◽

Topic Modelling ◽

Research Papers ◽

The Right ◽

Publication Recommendation

For years, the achievements and discoveries of researchers have been communicated through research papers published in appropriate journals or conferences. Often, both established researchers and, especially, new researchers face the dilemma of choosing an appropriate conference for their work. Every scientific conference and journal is inclined towards a particular field of research, and there is a large number of them for any given field. Choosing an appropriate venue matters because it helps a paper reach the right audience and improves its chances of being published. In this work, we address the problem of recommending appropriate conferences to authors to increase their chances of acceptance. We present three different approaches that use the authors' social network and the content of the paper in the settings of dimensionality reduction and topic modelling. In all these approaches, we apply Correspondence Analysis (CA) to obtain relationships between the entities in question, such as conferences and papers. Our models show promising results when compared with existing methods such as content-based filtering, collaborative filtering and hybrid filtering.
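
The abstract does not specify how CA is set up. As a hedged sketch, assuming a hypothetical paper-by-conference (or author-by-conference) count matrix `N`, classical correspondence analysis amounts to a singular value decomposition of the standardized residuals of `N`. The function names and the rule of ranking conferences by a low-rank association score are illustrative assumptions, not the authors' three approaches.

```python
import numpy as np


def correspondence_analysis(N: np.ndarray):
    """Classical CA of a nonnegative contingency table N (rows x columns).

    Assumes every row and column has a positive total. Returns row principal
    coordinates F, column standard coordinates Gamma and the singular values.
    """
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U / np.sqrt(r)[:, None]) * sigma    # row principal coordinates
    Gamma = Vt.T / np.sqrt(c)[:, None]       # column standard coordinates
    return F, Gamma, sigma


def rank_columns(N: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Hypothetical recommender: rank columns (e.g. conferences) for each row.

    Over all axes, F @ Gamma.T equals p_ij / (r_i c_j) - 1, so keeping only
    the first n_dims axes gives a smoothed row-column association score.
    """
    F, Gamma, _ = correspondence_analysis(N)
    scores = F[:, :n_dims] @ Gamma[:, :n_dims].T
    return np.argsort(-scores, axis=1)       # best-scoring column first in each row
```

Keeping only the first few axes plays the role of dimensionality reduction here: it suppresses noise in sparse counts before conferences are ranked for each row.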

Cyberphysical Design Automation Framework for Knowledge-based Engineering

Journal of Innovation Management ◽

10.24840/2183-0606_001.001_0011 ◽

2013 ◽

Vol 1 (1) ◽

pp. 158-178

Author(s):

Urcun John Tanik

Keyword(s):

System Design ◽

Design Process ◽

Design Automation ◽

Design Theory ◽

Knowledge Bases ◽

Knowledge Based ◽

Knowledge Based Engineering ◽

Cyberphysical System ◽

Automation Framework

Cyberphysical system design automation that uses knowledge-based engineering techniques with globally networked knowledge bases can greatly improve the design process for emerging systems. Our goal is to develop a comprehensive architectural framework to improve the design process for cyberphysical systems (CPS) and to implement a case study with Axiomatic Design Solutions Inc. to develop next-generation toolsets that use knowledge-based engineering (KBE) systems adapted to multiple domains in the field of CPS design automation. The Cyberphysical System Design Automation Framework (CPSDAF) will build on advances in CPS design theory drawn from current research and on knowledge collected automatically from global sources via Semantic Web Services. A case study involving STEM students is discussed.

Is data sharing the right step towards open science?

Editage Insights ◽

10.34193/ei-a-6052 ◽

2019 ◽

Author(s):

Sneha Kulkarni

Keyword(s):

Data Sharing ◽

Open Science ◽

The Right

The Calculation of Turbulent Diffusion Coefficients in Aquatic Ecosystems

Revista de Chimie ◽

10.37358/rc.19.11.7669 ◽

2019 ◽

Vol 70 (11) ◽

pp. 3903-3907

Author(s):

Galina Marusic ◽

Valeriu Panaitescu

Keyword(s):

Transport Equation ◽

Turbulent Diffusion ◽

Diffusion Coefficients ◽

Aquatic Ecosystems ◽

Specific Area ◽

Left Bank ◽

The Right ◽

Prut River

The paper deals with issues related to the pollution of aquatic ecosystems. It studies the influence of turbulence on the transport and dispersion of pollutants in these systems, as well as the calculation of turbulent diffusion coefficients. A case study on the determination of turbulent diffusion coefficients for several sectors of the Prut River is presented. A new method is proposed for determining the turbulent diffusion coefficients in the pollutant transport equation for specific sectors of a river, according to the associated Péclet number calculated for each specific area: the left bank, the right bank and the middle of the river.
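
For reference, the Péclet number is conventionally defined as the ratio of advective to diffusive transport. A minimal sketch of the relation, assuming a characteristic flow velocity $u$ and a characteristic length $L$ for each zone (the abstract does not state which velocity or length scale the authors use), is:

```latex
\mathrm{Pe} = \frac{u\,L}{D_t}
\qquad\Longrightarrow\qquad
D_t = \frac{u\,L}{\mathrm{Pe}}
```

where $D_t$ is the turbulent diffusion coefficient. If $\mathrm{Pe}$ is estimated independently for the left bank, the right bank and the middle of the river (the abstract does not say how), inverting the definition yields a separate $D_t$ for each zone of the sector.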

Streamlining States’ Data Sharing for Advanced Onsite Product Approval: Using the Chesapeake Bay Watershed States as a Case Study for a Broader National Program

Proceedings of the Water Environment Federation ◽

10.2175/193864715819541701 ◽

2015 ◽

Vol 2015 (11) ◽

pp. 2954-2960

Author(s):

Maureen Pepper ◽

Joyce Hudson ◽

Mark Nelson ◽

Gemma Kite

Keyword(s):

Chesapeake Bay ◽

Data Sharing ◽

National Program ◽

Product Approval ◽

Chesapeake Bay Watershed ◽

States Data

Quality Control for New Rights in International Human Rights Law: A Case Study of the Right to a Good Environment

Australian Year Book of International Law ◽

10.22145/aybil.33.4 ◽

2015 ◽

Vol 33 ◽

Author(s):

Bridget Lewis

Keyword(s):

Quality Control ◽

Human Rights ◽

International Human Rights Law ◽

Human Rights Law ◽

International Human Rights ◽

International Human ◽

The Right

Traditionalism and politics: A case study of Northern Nigeria

Government and Opposition ◽

10.1111/j.1477-7053.1967.tb01182.x ◽

1967 ◽

Vol 2 (4) ◽

pp. 509-524 ◽

Cited By ~ 2

Author(s):

B. J. O. Dudley

Keyword(s):

Social Change ◽

Local Government ◽

Status Quo ◽

Political Elite ◽

High Time ◽

The Political ◽

Northern Nigeria ◽

The North ◽

The Right

In the debate on the Native Authority (Amendment) Law of 1955, the late Premier of the North, Sir Ahmadu Bello, Sardauna of Sokoto, replying to the demand that ‘it is high time in the development of local government systems in this Region that obsolete and undemocratic ways of appointing Emirs’ Councils should close’, commented that ‘the right traditions that we have gone away from are the cutting off of the hands of thieves, and that has caused a lot of thieving in this country. Why should we not be cutting (off) the hands of thieves in order to reduce thieving? That is logical and it is lawful in our tradition and custom here.’ This could be read as a defence against social change, a recrudescence of ‘barbarism’ after the inroads of pax Britannica, and a plea for the retention of the status quo and the entrenched privilege of the political elite.

A clarified typology of core-periphery structure in networks

Science Advances ◽

10.1126/sciadv.abc9800 ◽

2021 ◽

Vol 7 (12) ◽

pp. eabc9800

Author(s):

Ryan J. Gallagher ◽

Jean-Gabriel Young ◽

Brooke Foucault Welles

Keyword(s):

Dense Core ◽

Block Model ◽

Hub And Spoke ◽

Block Modeling ◽

Domain Specific ◽

The Core ◽

Detailed Case ◽

Rich Diversity ◽

Modeling Techniques

Core-periphery structure, the arrangement of a network into a dense core and sparse periphery, is a versatile descriptor of various social, biological, and technological networks. In practice, different core-periphery algorithms are often applied interchangeably despite the fact that they can yield inconsistent descriptions of core-periphery structure. For example, two of the most widely used algorithms, the k-cores decomposition and the classic two-block model of Borgatti and Everett, extract fundamentally different structures: The latter partitions a network into a binary hub-and-spoke layout, while the former divides it into a layered hierarchy. We introduce a core-periphery typology to clarify these differences, along with Bayesian stochastic block modeling techniques to classify networks in accordance with this typology. Empirically, we find a rich diversity of core-periphery structure among networks. Through a detailed case study, we demonstrate the importance of acknowledging this diversity and situating networks within the core-periphery typology when conducting domain-specific analyses.
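
To make the contrast concrete, the k-cores decomposition mentioned above can be computed with the classical minimum-degree peeling procedure. The sketch below is a plain-Python illustration of that standard algorithm, not the authors' Bayesian stochastic block modeling code; the function name and the edge-list input format are assumptions.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def core_numbers(edges: Iterable[tuple[Hashable, Hashable]]) -> dict[Hashable, int]:
    """Core number of every node in an undirected simple graph given as an edge list.

    Repeatedly removes a node of minimum remaining degree; a node's core number
    is the largest minimum degree seen up to and including its removal.
    O(n^2), which is fine for small illustrative graphs. Isolated nodes are
    not represented because only edges are given.
    """
    adj: dict[Hashable, set] = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core: dict[Hashable, int] = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])        # core level never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1        # neighbour loses one edge to the peeled node
    return core
```

On a star graph, every node, hub included, has core number 1, so the k-cores decomposition finds no layered core there, whereas a two-block hub-and-spoke model would typically place the hub in the core. This is one small instance of the inconsistency between algorithms that the abstract describes.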
