Context-Aware Expert Finding in Tag Based Knowledge Sharing Communities

With the rapid development of online Knowledge Sharing Communities (KSCs), the problem of finding experts becomes increasingly important for knowledge propagation and putting crowd wisdom to work. A recent development trend of KSCs is to allow users to add text tags for annotating their posts, which are more accurate than traditional category information. However, how to leverage these user-generated tags for finding experts is still underdeveloped. To this end, this paper develops a novel approach for finding experts in tag based KSCs by leveraging tag context and the semantic relationship between tags. Specifically, the extracted prior knowledge and user profiles are first used for enriching the query tags to infer tag context, which represents the user’s latent information needs. Specifically, two different approaches for addressing the problem of tag sparseness in authority ranking are proposed. The first is a memory-based collaborative filtering approach, which leverages non-negative matrix factorization (NMF) to find similar users for alleviating tag sparseness. The second approach is based on Latent Dirichlet Allocation (LDA) topic model, which can further capture the latent semantic relationship between tags. A large-scale real-world data set is collected from a tag based Chinese commercial Q&A web site. Experimental results show that the proposed method outperforms several baseline methods with a significant margin.

Download Full-text

Transportation Index Computation: A Development Theme Mining-Based Approach

The Computer Journal ◽

10.1093/comjnl/bxaa102 ◽

2020 ◽

Author(s):

Gang Han ◽

Menggang Li ◽

Yiduo Mei ◽

Deming Li

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Rapid Development ◽

Development Trend ◽

Dynamic Monitoring ◽

Road Transportation ◽

Slow Development ◽

Belt And Road ◽

The Development Trend ◽

The Belt And Road

Abstract In order to comprehensively evaluate the achievements of the 'Belt and Road' in integrated transportation, researchers need to optimize the method of generating evaluation indices and construct the framework structure of the 'Belt and Road' transportation index system. This paper used GDELT database as data source and obtained full text data of English news in 25 countries along ‘the Belt and Road’. The paper also introduced the topic model, combined with the unsupervised method (latent Dirichlet allocation, LDA) and the supervision method (labeled LDA) to mine the topics contained in the news data. It constructed the transportation development model and analyzed the development trend of transportation in various countries. The study found that the development trend of transportation in the countries along the line is unbalanced, which can be divided into four types: rapid development type, stable development type, slow development type and lagging development type. The method of this paper can effectively extract temporal and spatial variation of news events, discover potential risks in various countries, support real-time and dynamic monitoring of the social development situation of the countries along the border and provide auxiliary decision support for implementation of the ‘the Belt and Road’ initiative, which has important application value.

Download Full-text

Decoding brain activity using a large-scale probabilistic functional-anatomical atlas of human cognition

10.1101/059618 ◽

2016 ◽

Cited By ~ 4

Author(s):

Timothy N. Rubin ◽

Oluwasanmi Koyejo ◽

Krzysztof J. Gorgolewski ◽

Michael N. Jones ◽

Russell A. Poldrack ◽

...

Keyword(s):

Large Scale ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Brain Activity ◽

Human Cognition ◽

Brain Images ◽

Whole Brain ◽

Context Sensitive ◽

Cognitive States ◽

Small Set

AbstractA central goal of cognitive neuroscience is to decode human brain activity--i.e., to infer mental processes from observed patterns of whole-brain activation. Previous decoding efforts have focused on classifying brain activity into a small set of discrete cognitive states. To attain maximal utility, a decoding framework must be open-ended, systematic, and context-sensitive--i.e., capable of interpreting numerous brain states, presented in arbitrary combinations, in light of prior information. Here we take steps towards this objective by introducing a Bayesian decoding framework based on a novel topic model---Generalized Correspondence Latent Dirichlet Allocation---that learns latent topics from a database of over 11,000 published fMRI studies. The model produces highly interpretable, spatially-circumscribed topics that enable flexible decoding of whole-brain images. Importantly, the Bayesian nature of the model allows one to “seed” decoder priors with arbitrary images and text--enabling researchers, for the first time, to generative quantitative, context-sensitive interpretations of whole-brain patterns of brain activity.

Download Full-text

Multi-Robot SLAM in Dynamic Environments with Parallel Maps

International Journal of Humanoid Robotics ◽

10.1142/s0219843621500110 ◽

2021 ◽

pp. 2150011

Author(s):

Sajad Badalkhani ◽

Ramazan Havangi ◽

Mohsen Farshad

Keyword(s):

Large Scale ◽

Dynamic Environment ◽

Dynamic Environments ◽

Extensive Literature ◽

Real World Data ◽

Data Set ◽

Cooperative Approach ◽

Localization And Mapping ◽

Multi Robot

There is an extensive literature regarding multi-robot simultaneous localization and mapping (MRSLAM). In most part of the research, the environment is assumed to be static, while the dynamic parts of the environment degrade the estimation quality of SLAM algorithms and lead to inherently fragile systems. To enhance the performance and robustness of the SLAM in dynamic environments (SLAMIDE), a novel cooperative approach named parallel-map (p-map) SLAM is introduced in this paper. The objective of the proposed method is to deal with the dynamics of the environment, by detecting dynamic parts and preventing the inclusion of them in SLAM estimations. In this approach, each robot builds a limited map in its own vicinity, while the global map is built through a hybrid centralized MRSLAM. The restricted size of the local maps, bounds computational complexity and resources needed to handle a large scale dynamic environment. Using a probabilistic index, the proposed method differentiates between stationary and moving landmarks, based on their relative positions with other parts of the environment. Stationary landmarks are then used to refine a consistent map. The proposed method is evaluated with different levels of dynamism and for each level, the performance is measured in terms of accuracy, robustness, and hardware resources needed to be implemented. The method is also evaluated with a publicly available real-world data-set. Experimental validation along with simulations indicate that the proposed method is able to perform consistent SLAM in a dynamic environment, suggesting its feasibility for MRSLAM applications.

Download Full-text

Topology and Topic-Aware Service Clustering

International Journal of Web Services Research ◽

10.4018/ijwsr.2018070102 ◽

2018 ◽

Vol 15 (3) ◽

pp. 18-37 ◽

Cited By ~ 6

Author(s):

Weifeng Pan ◽

Jilei Dong ◽

Kun Liu ◽

Jing Wang

Keyword(s):

Service Discovery ◽

Clustering Algorithm ◽

Latent Dirichlet Allocation ◽

Empirical Evaluation ◽

Single Type ◽

Bipartite Network ◽

Real World Data ◽

Service Usage ◽

Data Set ◽

Service Clustering

This article describes how the number of services and their types being so numerous makes accurately discovering desired services become a problem. Service clustering is an effective way to facilitate service discovery. However, the existing approaches are usually designed for a single type of service documents, neglecting to fully use the topic and topological information in service profiles and usage histories. To avoid these limitations, this article presents a novel service clustering approach. It adopts a bipartite network to describe the topological structure of service usage histories and uses a SimRank algorithm to measure the topological similarity of services; It applies Latent Dirichlet Allocation to extract topics from service profiles and further quantifies the topic similarity of services; It quantifies the similarity of services by integrating topological and topic similarities; It uses the Chameleon clustering algorithm to cluster the services. The empirical evaluation on real-world data set highlights the benefits provided by the combination of topological and topic similarities.

Download Full-text

Exploring the Citywide Human Mobility Patterns of Taxi Trips through a Topic-Modeling Analysis

Journal of Advanced Transportation ◽

10.1155/2021/6697827 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hui Xiong ◽

Kaiqiang Xie ◽

Lu Ma ◽

Feng Yuan ◽

Rui Shen

Keyword(s):

Power Law ◽

Traffic Management ◽

Large Scale ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Human Mobility ◽

Mobility Patterns ◽

Power Law Distribution ◽

Modeling Analysis ◽

Wide Range

Understanding human mobility patterns is of great importance for a wide range of applications from social networks to transportation planning. Toward this end, the spatial-temporal information of a large-scale dataset of taxi trips was collected via GPS, from March 10 to 23, 2014, in Beijing. The data contain trips generated by a great portion of taxi vehicles citywide. We revealed that the geographic displacement of those trips follows the power law distribution and the corresponding travel time follows a mixture of the exponential and power law distribution. To identify human mobility patterns, a topic model with the latent Dirichlet allocation (LDA) algorithm was proposed to infer the sixty-five key topics. By measuring the variation of trip displacement over time, we find that the travel distance in the morning rush hour is much shorter than that in the other time. As for daily patterns, it shows that taxi mobility presents weekly regularity both on weekdays and on weekends. Among different days in the same week, mobility patterns on Tuesday and Wednesday are quite similar. By quantifying the trip distance along time, we find that Topic 44 exhibits dominant patterns, which means distance less than 10 km is predominant no matter what time in a day. The findings could be references for travelers to arrange trips and policymakers to formulate sound traffic management policies.

Download Full-text

The Development Trend of Modern Technology of Electronic Engineering

Smart Construction Research ◽

10.18063/scr.v0.623 ◽

2018 ◽

Author(s):

Tong Liu ◽

Zhijun Cao ◽

Liang Cheng ◽

Yanfeng Liu ◽

Haipeng Zhang

Keyword(s):

Large Scale ◽

Rapid Development ◽

Modern Society ◽

Development Trend ◽

Modern Technology ◽

Engineering Technology ◽

Electronic Engineering ◽

The Development Trend ◽

Modern Technologies ◽

Universal Application

With the rapid development of science and technology in China, a large number of scientific and modern technologies have been developed and widely used. Among these technologies, electronic engineering technology has received great attention in many domestic industries and has been widely applied. Over the years, the technology of electronic engineering modernization not only is very common in many large-scale projects, but also in the production of household appliances facilities and intelligent integration of manufacturing facilities has a universal application. As one of the modern technologies, electronic engineering technology and modern society People's Daily life and work are closely related. According to the research achievements in this field over the years, the article focuses on analyzing the trend of the development of modern electronic engineering technology, to make contribution to further with the research and application of this technology.

Download Full-text

Locational privacy-preserving distance computations with intersecting sets of randomly labeled grid points

International Journal of Health Geographics ◽

10.1186/s12942-021-00268-y ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Rainer Schnell ◽

Jonas Klingwort ◽

James M. Farrow

Keyword(s):

Programming Languages ◽

Real World ◽

Spatial Data ◽

Large Scale ◽

Privacy Preserving ◽

Real World Data ◽

Data Set ◽

Additional Information ◽

Grid Points ◽

High Level

Abstract Background We introduce and study a recently proposed method for privacy-preserving distance computations which has received little attention in the scientific literature so far. The method, which is based on intersecting sets of randomly labeled grid points, is henceforth denoted as ISGP allows calculating the approximate distances between masked spatial data. Coordinates are replaced by sets of hash values. The method allows the computation of distances between locations L when the locations at different points in time t are not known simultaneously. The distance between $$L_1$$ L 1 and $$L_2$$ L 2 could be computed even when $$L_2$$ L 2 does not exist at $$t_1$$ t 1 and $$L_1$$ L 1 has been deleted at $$t_2$$ t 2 . An example would be patients from a medical data set and locations of later hospitalizations. ISGP is a new tool for privacy-preserving data handling of geo-referenced data sets in general. Furthermore, this technique can be used to include geographical identifiers as additional information for privacy-preserving record-linkage. To show that the technique can be implemented in most high-level programming languages with a few lines of code, a complete implementation within the statistical programming language R is given. The properties of the method are explored using simulations based on large-scale real-world data of hospitals ($$n=850$$ n = 850 ) and residential locations ($$n=13,000$$ n = 13 , 000 ). The method has already been used in a real-world application. Results ISGP yields very accurate results. Our simulation study showed that—with appropriately chosen parameters – 99 % accuracy in the approximated distances is achieved. Conclusion We discussed a new method for privacy-preserving distance computations in microdata. The method is highly accurate, fast, has low computational burden, and does not require excessive storage.

Download Full-text

Something’s Missing? A Procedure for Extending Item Content Data Sets in the Context of Recommender Systems

Information Systems Frontiers ◽

10.1007/s10796-020-10071-y ◽

2020 ◽

Author(s):

Bernd Heinrich ◽

Marcus Hopf ◽

Daniel Lohninger ◽

Alexander Schiller ◽

Michael Szubartowicz

Keyword(s):

Recommender Systems ◽

Additional Data ◽

Rapid Development ◽

Data Sets ◽

Web Portals ◽

Real World Data ◽

Data Set ◽

Item Content ◽

Choice Literature

Abstract The rapid development of e-commerce has led to a swiftly increasing number of competing providers in electronic markets, which maintain their own, individual data describing the offered items. Recommender systems are popular and powerful tools relying on this data to guide users to their individually best item choice. Literature suggests that data quality of item content data has substantial influence on recommendation quality. Thereby, the dimension completeness is expected to be particularly important. Herein resides a considerable chance to improve recommendation quality by increasing completeness via extending an item content data set with an additional data set of the same domain. This paper therefore proposes a procedure for such a systematic data extension and analyzes effects of the procedure regarding items, content and users based on real-world data sets from four leading web portals. The evaluation results suggest that the proposed procedure is indeed effective in enabling improved recommendation quality.

Download Full-text

CLDA: An Effective Topic Model for Mining User Interest Preference under Big Data Background

Complexity ◽

10.1155/2018/2503816 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Lirong Qiu ◽

Jia Yu

Keyword(s):

Big Data ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

User Interest ◽

Text Data ◽

Data Set ◽

Data Sparsity ◽

Short Text ◽

Text Filtering

In the present big data background, how to effectively excavate useful information is the problem that big data is facing now. The purpose of this study is to construct a more effective method of mining interest preferences of users in a particular field in the context of today’s big data. We mainly use a large number of user text data from microblog to study. LDA is an effective method of text mining, but it will not play a very good role in applying LDA directly to a large number of short texts in microblog. In today’s more effective topic modeling project, short texts need to be aggregated into long texts to avoid data sparsity. However, aggregated short texts are mixed with a lot of noise, reducing the accuracy of mining the user’s interest preferences. In this paper, we propose Combining Latent Dirichlet Allocation (CLDA), a new topic model that can learn the potential topics of microblog short texts and long texts simultaneously. The data sparsity of short texts is avoided by aggregating long texts to assist in learning short texts. Short text filtering long text is reused to improve mining accuracy, making long texts and short texts effectively combined. Experimental results in a real microblog data set show that CLDA outperforms many advanced models in mining user interest, and we also confirm that CLDA also has good performance in recommending systems.

Download Full-text

A Tag-Based Improved LDA and Web Page Clustering Analysis

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.667.277 ◽

2014 ◽

Vol 667 ◽

pp. 277-285 ◽

Cited By ~ 1

Author(s):

Fang Chen ◽

Yan Hui Zhou

Keyword(s):

Clustering Analysis ◽

Large Scale ◽

Clustering Algorithm ◽

Semantic Analysis ◽

Topic Model ◽

Rapid Development ◽

Expansion Method ◽

Web Page ◽

Web Page Clustering ◽

The Web

With the rapid development of Internet, tag technology has been widely used in various sites. The brief text labels of network resources are greatly convenient for people to access the massive data. Social tags allows the user to use any word ----to tag network objects, and to share these tags, because of its simple and flexible operation, and it has become one of the popular applications. However, there exists some problems like noise of tags, lack of using criteria, and sparse distribution etc. Especially sparsity of tags seriously limits its application in the semantic analysis of web pages. This paper, by exploiting the user-related tag expansion method to overcome this problem, at the same time by using the topic model----LDA to model the web tags, mine its potential topic from the large-scale web page, and obtain the topic distribution of the text to the text clustering analysis. The experimental results show that, compared with the traditional clustering algorithm, the method of based LDA clustering on the analysis of the web tags have a larger increase.

Download Full-text