heterogeneous data
Recently Published Documents





2022 ◽  
Vol 40 (3) ◽  
pp. 1-36
Jinyuan Fang ◽  
Shangsong Liang ◽  
Zaiqiao Meng ◽  
Maarten De Rijke

Network-based information has been widely explored and exploited in the information retrieval literature. Attributed networks, consisting of nodes, edges, and attributes describing properties of nodes, are a basic type of network-based data and are especially useful for many applications. Examples include user profiling in social networks and item recommendation in user-item purchase networks. Learning useful and expressive representations of entities in attributed networks can provide more effective building blocks for downstream network-based tasks such as link prediction and attribute inference. In practice, input features of attributed networks are normalized as unit directional vectors. However, most network embedding techniques ignore the spherical nature of inputs and focus on learning representations in a Gaussian or Euclidean space, which, we hypothesize, might lead to less effective representations. To obtain more effective representations of attributed networks, we investigate the problem of mapping an attributed network with unit normalized directional features into a non-Gaussian and non-Euclidean space. Specifically, we propose a hyperspherical variational co-embedding for attributed networks (HCAN), which is based on generalized variational auto-encoders for heterogeneous data with multiple types of entities. HCAN jointly learns latent embeddings for both nodes and attributes in a unified hyperspherical space such that the affinities between nodes and attributes can be captured effectively. We argue that this is a crucial feature in many real-world applications of attributed networks. Previous Gaussian network embedding algorithms break the assumption of uninformative prior, which leads to unstable results and poor performance. In contrast, HCAN embeds nodes and attributes as von Mises-Fisher distributions, and allows one to capture the uncertainty of the inferred representations. Experimental results on eight datasets show that HCAN yields better performance in a number of applications compared with nine state-of-the-art baselines.
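
To make the von Mises-Fisher (vMF) distribution mentioned in this abstract concrete, here is a minimal sketch of its log-density on the unit sphere in three dimensions. It is not the HCAN model. The function name `vmf_logpdf_3d` and the example values are assumptions for illustration, and the closed-form normalizer used is specific to the 3-dimensional case.

```python
import math

def vmf_logpdf_3d(x, mu, kappa):
    """Log-density of a vMF distribution on the unit sphere in R^3.

    x and mu are unit vectors; kappa > 0 is the concentration
    (larger kappa means less uncertainty around the mean direction mu).
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # Normalizer for dimension 3: C_3(kappa) = kappa / (4 * pi * sinh(kappa)).
    log_norm = math.log(kappa) - math.log(4 * math.pi) - math.log(math.sinh(kappa))
    return kappa * dot + log_norm

# Hypothetical example: mean direction along the z-axis.
mu = (0.0, 0.0, 1.0)
at_mean = vmf_logpdf_3d(mu, mu, kappa=5.0)
opposite = vmf_logpdf_3d((0.0, 0.0, -1.0), mu, kappa=5.0)
```

The density peaks at the mean direction and decays with the angle from it; the concentration `kappa` plays the role that inverse variance plays for a Gaussian.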

2022 ◽  
Bing Sun ◽  
Zhuofang Ju

Against the background of green development, new energy vehicles (NEVs), as an important strategic emerging industry, play a crucial role in energy conservation and emission reduction. In the post-epidemic era, steadily promoting the adoption of NEVs will be a hot topic. Based on heterogeneous source data, combined with the Latent Dirichlet Allocation (LDA) topic model, Social Network Analysis (SNA), and econometric methods, this paper explores whether individual purchase decisions and company-level cooperative research and development promote the adoption of NEVs. The results show that, whether for BEVs, HEVs, or PHEVs, users are most concerned about space dimension, power performance, and design style. Patent collaboration network analysis indicates that NEV enterprises are establishing close partnerships, which will encourage the adoption of NEVs. For BEV and HEV models, greater patent and R&D investment by NEV companies will better expedite the advancement of NEVs.

Semantic Web ◽  
2022 ◽  
pp. 1-24
Marlene Goncalves ◽  
David Chaves-Fraga ◽  
Oscar Corcho

With the increase of data volume in heterogeneous datasets that are being published following Open Data initiatives, new operators are necessary to help users to find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate approaches as they require the user to assign weights that may not be known beforehand to a scoring function. Unlike the quantitative approach, under the qualitative approach, which includes the well-known skyline, preference criteria are more intuitive in certain cases and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries over an Ontology-Based Data Access (OBDA) approach, which provides uniform access over multiple and heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preferences by directly querying relational databases. Our framework implements a technique that translates SPARQL qualitative preference queries directly into queries that can be evaluated by a relational database management system. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of our proposed technique in comparison with state-of-the-art techniques. Obtained results suggest that the execution time can be reduced by up to two orders of magnitude in comparison to current techniques scaling up to larger datasets while identifying precisely the result set.
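
As a rough illustration of the skyline preference discussed in this abstract, the sketch below computes a skyline in memory using pairwise dominance checks. It is not Morph-Skyline++ and involves no SPARQL-to-SQL translation. The `hotels` data and the "lower is better on every dimension" convention are assumptions made for the example.

```python
def dominates(a, b):
    """True if a is no worse than b on every dimension and strictly
    better on at least one (lower values are preferred)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical tuples: (price, distance_to_center); both should be minimized.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
result = skyline(hotels)
```

Unlike a top-k query, no weights are needed: a tuple is returned whenever no other tuple beats it on all criteria at once.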

2022 ◽  
Vol 119 (3) ◽  
pp. e2113658119
Guanghua Chi ◽  
Han Fang ◽  
Sourav Chatterjee ◽  
Joshua E. Blumenstock

Many critical policy decisions, from strategic investments to the allocation of humanitarian aid, rely on data about the geographic distribution of wealth and poverty. Yet many poverty maps are out of date or exist only at very coarse levels of granularity. Here we develop microestimates of the relative wealth and poverty of the populated surface of all 135 low- and middle-income countries (LMICs) at 2.4 km resolution. The estimates are built by applying machine-learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, and topographic maps, as well as aggregated and deidentified connectivity data from Facebook. We train and calibrate the estimates using nationally representative household survey data from 56 LMICs and then validate their accuracy using four independent sources of household survey data from 18 countries. We also provide confidence intervals for each microestimate to facilitate responsible downstream use. These estimates are provided free for public use in the hope that they enable targeted policy response to the COVID-19 pandemic, provide the foundation for insights into the causes and consequences of economic development and growth, and promote responsible policymaking in support of sustainable development.

2022 ◽  
Vol 12 (2) ◽  
pp. 670
Jamshid Tursunboev ◽  
Yong-Sung Kang ◽  
Sung-Bum Huh ◽  
Dong-Woo Lim ◽  
Jae-Mo Kang ◽  

Federated learning (FL) allows unmanned aerial vehicles (UAVs) to collaboratively train a globally shared machine learning model while keeping their private data local. Recently, FL in edge-aided UAV networks has drawn a surge of research interest due to the rapid growth of heterogeneous data acquired by UAVs and the need to build the global model with privacy; however, a critical issue is how to deal with the non-independent and identically distributed (non-i.i.d.) nature of heterogeneous data while ensuring the convergence of learning. To address this issue, this paper proposes a novel and high-performing FL scheme, namely, the hierarchical FL algorithm, for the edge-aided UAV network, which uses the edge servers located in base stations as intermediate aggregators that employ commonly shared data. Experimental results demonstrate that the proposed hierarchical FL algorithm outperforms several baseline FL algorithms and exhibits better convergence behavior.
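
The two-tier aggregation idea in this abstract can be sketched as sample-weighted averaging, first at each edge server and then globally. This is only an assumed, simplified illustration: it omits local training and the paper's use of commonly shared data at the edge, and the names `weighted_average` and `hierarchical_aggregate` are hypothetical.

```python
def weighted_average(models, weights):
    """Average parameter vectors, weighting each by its sample count."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(dim)]

def hierarchical_aggregate(edges):
    """edges: one list per edge server, each holding (params, n_samples)
    pairs reported by the UAV clients attached to that server."""
    edge_models, edge_sizes = [], []
    for clients in edges:
        params = [p for p, _ in clients]
        sizes = [n for _, n in clients]
        # Tier 1: the edge server aggregates its own clients.
        edge_models.append(weighted_average(params, sizes))
        edge_sizes.append(sum(sizes))
    # Tier 2: the cloud aggregates the edge-level models.
    return weighted_average(edge_models, edge_sizes)

# Hypothetical example: two edge servers, three UAV clients in total.
edge_a = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
edge_b = [([5.0, 6.0], 60)]
global_model = hierarchical_aggregate([edge_a, edge_b])
```

With plain sample-weighted averaging, the two-tier result equals a flat average over all clients; a hierarchical scheme differs in practice when edge servers aggregate more often than the cloud or apply extra steps before forwarding their models.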

2022 ◽  
Vol 0 (0) ◽  
Christoph Gröger

The digital transformation generates huge amounts of heterogeneous data across the industrial value chain, from simulation data in engineering, through sensor data in manufacturing, to telemetry data on product use. Extracting insights from these data constitutes a critical success factor for industrial enterprises, e.g., to optimize processes and enhance product features. This is referred to as industrial analytics, i.e., data analytics for industrial value creation. Industrial analytics is an interdisciplinary subject area between data science and industrial engineering and is at the core of Industry 4.0. Yet, existing literature on industrial analytics is fragmented and specialized. To address this issue, this paper presents a holistic overview of the field of industrial analytics, integrating both current research and industry experience from real-world industrial analytics projects. We define key terms, describe typical use cases, and discuss characteristics of industrial analytics. Moreover, we present a conceptual framework for industrial analytics that structures essential elements, e.g., data platforms and data roles. Finally, we conclude and highlight future research directions.

2022 ◽  
Vol 2022 ◽  
pp. 1-12
Huazhen Liu ◽  
Wei Wang ◽  
Yihan Zhang ◽  
Renqian Gu ◽  
Yaqi Hao

Explicit feedback and implicit feedback are two important types of heterogeneous data for constructing a recommendation system. Combining the two can effectively improve the performance of the recommendation system. However, most current deep learning recommendation models fail to fully exploit the complementary advantages of the two types of data and usually use only binary implicit feedback data. Thus, this paper proposes a neural matrix factorization recommendation algorithm (EINMF) based on explicit-implicit feedback. First, a neural network is used to learn nonlinear features of the explicit-implicit feedback in user-item interactions. Second, combined with traditional matrix factorization, explicit feedback is used to accurately reflect the explicit and potential preferences of users in order to build a recommendation model; a new loss function based on explicit-implicit feedback is designed to obtain the best parameters through neural network training and to predict the preference of users for items. Finally, according to the prediction results, a personalized recommendation list is pushed to the user. The feasibility, validity, and robustness of the approach are demonstrated in comparison with multiple baseline models on two real datasets.
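
The abstract does not give the form of the explicit-implicit loss, so the sketch below shows only one hypothetical way to combine the two signals: binary cross-entropy on the implicit interaction label plus a weighted squared error against the normalized rating when one exists. The function name `combined_loss` and the parameters `alpha` and `max_rating` are assumptions, not the EINMF definition.

```python
import math

def combined_loss(pred, interacted, rating=None, max_rating=5.0, alpha=0.5):
    """Hypothetical explicit-implicit loss for a single user-item pair.

    pred:       predicted preference in (0, 1)
    interacted: 1 if the user interacted with the item, else 0 (implicit)
    rating:     explicit rating, or None when no rating was given
    """
    eps = 1e-12  # guards against log(0)
    # Implicit term: binary cross-entropy on the interaction label.
    bce = -(interacted * math.log(pred + eps)
            + (1 - interacted) * math.log(1 - pred + eps))
    if rating is None:
        return bce
    # Explicit term: squared error against the rating scaled to [0, 1].
    return bce + alpha * (pred - rating / max_rating) ** 2

implicit_only = combined_loss(0.5, 1)
with_rating = combined_loss(0.5, 1, rating=5.0)
```

In this form, pairs without a rating still contribute through the implicit term, while rated pairs add a penalty that pulls the prediction toward the observed rating.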
