Data Intelligence
Latest Publications


TOTAL DOCUMENTS

111
(FIVE YEARS 111)

H-INDEX

8
(FIVE YEARS 8)

Published by MIT Press

2641-435X

2021, pp. 1-24
Author(s): Qiushuo Zheng, Hao Wen, Meng Wang, Guilin Qi

Abstract Existing visual scene understanding methods mainly focus on identifying coarse-grained concepts about visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the web (e.g., news reading and e-shopping) need to accurately recognize finer-grained concepts as entities and properly link them to a knowledge graph (KG), which can take their performance to the next level. In light of this, we identify a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. Then, we design a deep modal-attention neural network-based learning-to-rank method that aggregates all features and maps visual objects to the entities in the KG. Extensive experimental results on the newly constructed dataset show that our proposed method is effective, as it significantly improves accuracy from 66.46% to 83.16% compared with the baselines.
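
The aggregation-then-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' network: the per-modality scores, the fixed attention logits, and the names `rank_candidates` and `MODALITIES` are all assumptions made for illustration. In a trained model the attention weights would be learned and would depend on the input.

```python
import math
from typing import Dict, List

# Hypothetical modality names taken from the abstract's feature list.
MODALITIES = ("visual", "textual", "kg")


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rank_candidates(
    candidates: Dict[str, Dict[str, float]],
    attention_logits: Dict[str, float],
) -> List[str]:
    """Rank candidate entities by an attention-weighted sum of modality scores.

    `candidates` maps an entity name to one similarity score per modality.
    `attention_logits` stands in for learned attention parameters (assumption).
    """
    weights = softmax([attention_logits[m] for m in MODALITIES])

    def fused(scores: Dict[str, float]) -> float:
        return sum(w * scores[m] for w, m in zip(weights, MODALITIES))

    return sorted(candidates, key=lambda name: fused(candidates[name]), reverse=True)
```

Changing the attention logits changes which modality dominates the ranking, which is the behaviour a modal-attention layer is meant to learn from data.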


2021, pp. 1-20
Author(s): Shusaku Egami, Takahiro Kawamura, Kouji Kozaki, Akihiko Ohsuga

Abstract Urban areas have many problems, including homelessness, graffiti, and littering. These problems are influenced by various factors and are linked to each other; thus, an understanding of the problem structure is required in order to detect and solve the root problems that generate vicious cycles. Moreover, before implementing action plans to solve these problems, local governments need to estimate cost-effectiveness when the plans are carried out. Therefore, this paper proposes constructing an urban problem knowledge graph that would include urban problems' causality and the related cost information in budget sheets. In addition, this paper proposes a method for detecting vicious cycles of urban problems using SPARQL queries with inference rules from the knowledge graph. Finally, several root problems that led to vicious cycles were detected. Urban-problem experts evaluated the extracted causal relations.
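
Detecting vicious cycles amounts to finding directed cycles in a graph of causal relations. The paper does this with SPARQL queries and inference rules; the sketch below swaps in a plain depth-first search over an edge list to show the underlying idea. The function name and the placeholder node labels are hypothetical, and the input is assumed to contain no duplicate edges.

```python
from typing import Dict, List, Set, Tuple


def find_cycles(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Return every simple directed cycle once, starting from its smallest node.

    Each edge (cause, effect) means "cause leads to effect".
    """
    graph: Dict[str, List[str]] = {}
    for cause, effect in edges:
        graph.setdefault(cause, []).append(effect)
        graph.setdefault(effect, [])

    cycles: List[List[str]] = []

    def dfs(start: str, node: str, path: List[str], on_path: Set[str]) -> None:
        for nxt in graph[node]:
            if nxt == start:
                # Closed a loop back to the starting node.
                cycles.append(path[:])
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than start so each cycle is reported once.
                on_path.add(nxt)
                path.append(nxt)
                dfs(start, nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles


# Placeholder labels, not data from the paper.
example_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
print(find_cycles(example_edges))
```

In this example `D` feeds into the cycle without being part of it, which mirrors the distinction between a root problem and the cycle it generates.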


2021, pp. 1-20
Author(s): Quan-Hoang Vuong, Manh-Toan Ho, Viet-Phuong La, Tam-Tri Le, Thanh Huyen T. Nguyen, ...

Abstract Video gaming has been rising rapidly to become one of the primary entertainment media, especially during the COVID-19 pandemic. Playing video games has been reported to be associated with many psychological and behavioral traits. However, little is known about the connections between game players’ behaviors in the virtual environment and their environmental perceptions. Thus, the current dataset offers valuable resources regarding the environmental worldviews and virtual-world behaviors of 640 Animal Crossing: New Horizons (ACNH) game players from 29 countries around the globe. The dataset consists of six major categories: 1) socio-demographic profile, 2) COVID-19 concern, 3) environmental perception, 4) game-playing habit, 5) in-game behavior, and 6) game-playing feeling. By making this dataset open, we aim to provide policymakers, game producers, and researchers with valuable resources for understanding the interactions between behaviors in the virtual world and environmental perceptions, which could help produce video games in compliance with the United Nations (UN) Sustainable Development Goals.


2021, pp. 1-14
Author(s): Ebtisam Alharbi, Rigina Skeva, Nick Juty, Caroline Jay, Carole Goble

Abstract The findable, accessible, interoperable, reusable (FAIR) principles for scientific data management and stewardship aim to facilitate data reuse at scale by both humans and machines. Research and development (R&D) in the pharmaceutical industry is becoming increasingly data driven, but managing its data assets according to FAIR principles remains costly and challenging. To date, little scientific evidence exists about how FAIR is currently implemented in practice, what its associated costs and benefits are, and how decisions are made about the retrospective FAIRification of datasets in pharmaceutical R&D. This paper reports the results of semi-structured interviews with 14 pharmaceutical professionals who participate in various stages of drug R&D in 7 pharmaceutical businesses. Inductive thematic analysis identified three primary themes of the benefits and costs of FAIRification, and the elements that influence the decision-making process for FAIRifying legacy datasets. Participants collectively acknowledged the potential contribution of FAIRification to data reusability in diverse research domains and the subsequent potential for cost-savings. Implementation costs, however, were still considered a barrier by participants, with the need for considerable expenditure in terms of resources, and cultural change. How decisions were made about FAIRification was influenced by legal and ethical considerations, management commitment, and data prioritisation. The findings have significant implications for those in the pharmaceutical R&D industry who are engaged in driving FAIR implementation, and for external parties who seek to better understand existing practices and challenges.


2021, pp. 1-24
Author(s): Minh-Hoang Nguyen

Abstract Urban residents interact with biodiversity in many negative and positive ways. Biodiversity provides a wide array of provision and cultural-ecological services to urban residents, but it is being overexploited to the point of crisis. The crisis is largely driven by the expanding illegal wildlife trade in developing countries with high urbanization rates and biodiversity levels, such as Vietnam. While supply-side measures are ineffective in reducing biodiversity loss, researchers have suggested demand-side measures as supplements, such as social marketing campaigns and law enforcement in urban areas. Moreover, urban residents are also potential visitors to urban public parks and national parks, which helps generate finance for biodiversity preservation and conservation in those places. Understanding urban residents' perceptions of biodiversity and their biodiversity-related behaviors can help improve the effectiveness of conservation efforts and sustainable urban development. Thus, this article presents a dataset of 535 urban residents' wildlife consumption behaviors, multifaceted perceptions of and interactions with biodiversity-related concepts, and nature-based recreation demand. The dataset is constructed with six major categories: 1) wildlife product consumption, 2) general biodiversity perceptions, 3) biodiversity at home and neighborhood, 4) public park visitation and motivations, 5) national park visitation and motivations, and 6) socio-demographic profiles. These resources are expected to support researchers in enriching the limited literature regarding the role of urban residents in biodiversity conservation and preservation, and to help policymakers find insights for building up an “eco-surplus culture” among urban residents through effective public communication and policymaking.


2021, pp. 1-21
Author(s): Wenguang Wang, Yonglin Xu, Chunhui Du, Yunwen Chen, Yijie Wang, ...

Abstract With the development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, knowledge graph technology has been in full swing in recent years. To better promote the development of knowledge graphs, especially in the Chinese language and in the financial industry, we built a high-quality dataset named the financial research report knowledge graph (FR2KG) and organized an evaluation task on the automated construction of financial knowledge graphs at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples covering 10 entity types, 19 relationship types, and 6 attributes. Participants are required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs and introduce the methods used by the winners and the results of this evaluation.


2021, pp. 1-26
Author(s): Jeff Z. Pan, Elspeth Edelstein, Patrik Bansky, Adam Wyner

Abstract Recent success of knowledge graphs has spurred interest in applying knowledge graphs in open science, such as on intelligent survey systems for scientists. However, efforts to understand the quality of candidate survey questions provided by these methods have been limited. Indeed, existing methods do not consider the type of on-the-fly content planning that is possible for face-to-face surveys and hence do not guarantee that selection of subsequent questions is based on response to previous questions in a survey. To address this limitation, we propose a dynamic and informative solution for an intelligent survey system that is based on knowledge graphs. To illustrate our proposal, we look into social science surveys, focusing on ordering the questions of a questionnaire component by their level of acceptance, along with conditional triggers that further customise participants’ experience. Our main findings are: (i) evaluation of the proposed approach shows that the dynamic component can be beneficial in terms of lowering the number of questions asked per variable, thus allowing more informative data to be collected in a survey of equivalent length; and (ii) a primary advantage of the proposed approach is that it enables grouping of participants according to their responses, so that participants are not only served appropriate follow-up questions, but their responses to these questions may be analysed in the context of some initial categorisation. We believe that the proposed approach can easily be applied to other social science surveys based on grouping definitions in their contexts. The knowledge-graph-based intelligent survey approach proposed in our work allows online questionnaires to approach face-to-face interaction in their level of informativity and responsiveness, as well as duplicating certain advantages of interview-based data collection.


2021, pp. 1-17
Author(s): Qian Guo, Wei Chen, Huaiyu Wan

Abstract Personalized search is a promising way to improve the quality of web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for reasons such as privacy protection and information security. This leads to a serious lack of accessible public datasets in this field. The few datasets that have been released to the public have not become widely used in academia because they are complex to process. The lack of datasets, together with the difficulties of data processing, has hindered fair comparison and evaluation of personalized search models. In this paper, we construct AOL4PS, a large-scale dataset for evaluating personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the challenges of processing time and storage space demands brought by massive data volumes, we optimized the process of dataset construction and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the experimental results demonstrate that AOL4PS can measure the effect of personalized search models. AOL4PS is publicly available at http://github.com/wanhuaiyu/AOL4PS.
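
For readers unfamiliar with BM25, the sketch below shows the classic Okapi BM25 scoring function that the abstract's improved variant builds on. It is not the authors' improved algorithm, whose details the abstract does not give. The parameter defaults `k1=1.5` and `b=0.75` are common choices, the IDF uses the non-negative `log(1 + ...)` form, and the function name and toy documents are assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score each tokenized document against the query with classic BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df: Counter = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy documents for illustration only.
toy_docs = [["cheap", "flights"], ["flights", "to", "paris"], ["weather", "today"]]
print(bm25_scores(["paris", "flights"], toy_docs))
```

Rarer query terms receive a higher IDF, so the document matching both query terms ranks first and the document matching none scores zero.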


2021, pp. 1-25
Author(s): Alexander Arefolov, Laura Adam, Shoshana Brown, Yelena Budovskaya, Cong Chen, ...

Abstract The FAIR data guiding principles have been recently developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase of data volume and complexity. The FAIR data principles have been formulated on a general level and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.


2021, pp. 1-13
Author(s): Chaojie Wen, Tao Chen, Xudong Jia, Jiang Zhu

Abstract Medical named entity recognition (NER) is the task of recognizing medical named entities, such as diseases, drugs, surgery reports, anatomical parts, and examination documents, in medical texts. Conventional medical NER methods do not make full use of the unlabelled medical texts embedded in medical documents. To address this issue, we propose a medical NER approach based on pre-trained language models and a domain dictionary. First, we construct a medical entity dictionary by extracting medical entities from labelled medical texts and collecting medical entities from other resources, such as the Yidu-N4K dataset. Second, we employ this dictionary to train domain-specific pre-trained language models using unlabelled medical texts. Third, we employ a pseudo-labelling mechanism on unlabelled medical texts to automatically annotate them and create pseudo labels. Fourth, the BiLSTM-CRF sequence tagging model is used to fine-tune the pre-trained language models. Our experiments on unlabelled medical texts extracted from Chinese electronic medical records show that the proposed NER approach achieves strict and relaxed F1 scores of 88.7% and 95.3%, respectively.
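
One simple way to turn an entity dictionary into pseudo labels is greedy longest-match lookup with BIO tags, sketched below. The abstract does not say how the authors' pseudo-labelling mechanism works, so this is an assumed stand-in. The function name, the entity type labels, and the English toy sentence are all hypothetical; for Chinese text the tokens would typically be characters.

```python
from typing import Dict, List, Tuple


def pseudo_label(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match dictionary lookup."""
    max_len = max((len(key) for key in dictionary), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible span first, then shrink.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + span])
            if key in dictionary:
                label = dictionary[key]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + span):
                    tags[j] = f"I-{label}"
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy dictionary and sentence for illustration only.
toy_dictionary = {
    ("type", "2", "diabetes"): "DISEASE",
    ("diabetes",): "DISEASE",
    ("metformin",): "DRUG",
}
toy_tokens = ["patient", "with", "type", "2", "diabetes", "takes", "metformin"]
print(pseudo_label(toy_tokens, toy_dictionary))
```

Preferring the longest match keeps "type 2 diabetes" as one entity instead of tagging only "diabetes", at the cost of missing nested or overlapping mentions.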

