GOW-Stream: A novel approach of graph-of-words based mixture model for semantic-enhanced text stream clustering

2021 ◽  
Vol 25 (5) ◽  
pp. 1211-1231
Author(s):  
Tham Vo ◽  
Phuc Do

Recently, the rapid growth of social networks and online news resources on the Internet has made text stream clustering an important application in multiple domains (e.g., text retrieval diversification, social event detection, and text summarization). Unlike traditional static text clustering, text stream clustering faces specific challenges related to the rapid change of topics/clusters and the high velocity of incoming document batches. Recent well-known model-based text stream clustering models, such as DTM, DCT, and MStream, evaluate words independently, which means they largely ignore the relations between words while sampling clusters/topics. This reduces overall model accuracy, especially for short text documents such as comments and microblogs in social networks. To tackle these problems, in this paper we propose a novel graph-of-words (GOW) based text stream clustering approach, called GOW-Stream. Applying common GOWs, which are generated from each document batch, while sampling clusters/topics helps overcome the word-independent evaluation challenge. Our proposed GOW-Stream is expected to achieve significantly better text stream clustering performance than recent state-of-the-art baselines. Extensive experiments on multiple real-world benchmark datasets demonstrate the effectiveness of our proposed model in terms of both accuracy and running time.
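
To make the graph-of-words idea concrete, the following is a minimal sketch, not the GOW-Stream implementation. It assumes that a graph-of-words links words that co-occur within a small sliding window, and that "common" GOWs are edges shared by several documents of a batch. The function names, the `window` and `min_support` parameters, and the sample batch are all hypothetical.

```python
from collections import Counter


def graph_of_words(tokens, window=2):
    """Return undirected edges between words that co-occur within
    `window` consecutive tokens (an assumed construction)."""
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                edges.add(frozenset((tokens[i], tokens[j])))
    return edges


def common_edges(batch, min_support=2, window=2):
    """Keep edges that occur in at least `min_support` documents of a batch."""
    counts = Counter()
    for doc in batch:
        counts.update(graph_of_words(doc.lower().split(), window))
    return {edge for edge, n in counts.items() if n >= min_support}


# Hypothetical document batch: only the first two documents share an edge.
batch = [
    "stock market falls sharply",
    "stock market rallies today",
    "heavy rain floods city",
]
shared = common_edges(batch)
```

Unlike single-word counts, the shared edge keeps the relation between "stock" and "market", which is the kind of word dependency the abstract says word-independent models ignore.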

2019 ◽  
Vol 8 (2S11) ◽  
pp. 3970-3975

Due to the fast growth of the Internet and the continuous expansion of the World Wide Web, sources such as digital libraries and online news contribute a massive amount of unstructured electronic text documents to the web. Although many traditional techniques are available to extract knowledge from large collections of text documents, improving the precision of web search retrieval and efficiently finding the most appropriate documents in huge text collections remain a big challenge. Clustering techniques help the search engine retrieve documents. The proposed system addresses existing problems using a bivariate n-gram frequent item clustering algorithm based on the concept of the maximum frequent set, which maintains the sequence and meaning of sentences in order to reduce the high dimensionality, while frequent item sets are used to find similarity. The documents are then clustered based on maximum document occurrence. Our method thus improves cluster quality and efficiency compared with existing methodologies. An experiment on a sample Newsgroup dataset compares the existing K-Means with FICMDO (frequent item clustering method based on maximum document occurrence) and shows that the F-measure is higher for our algorithm. Since the F-measure increases, the method obtains better clusters. Hence it is a faster and more efficient big data method that improves performance compared with vector space model approaches such as the K-Means algorithm.
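
The abstract does not spell out the algorithm, so the following is only one possible reading, sketched under stated assumptions: word bigrams stand in for the "bivariate n-gram" items, an item is frequent when it occurs in at least `min_support` documents, and each document joins the cluster of its frequent bigram with the highest document occurrence. All names, parameters, and sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def bigrams(text):
    """Set of adjacent word pairs, which preserves local word order."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def cluster_by_frequent_bigrams(docs, min_support=2):
    doc_bigrams = [bigrams(d) for d in docs]
    # Document occurrence: in how many documents each bigram appears.
    support = Counter(bg for bgs in doc_bigrams for bg in bgs)
    frequent = {bg: n for bg, n in support.items() if n >= min_support}
    clusters = defaultdict(list)
    for idx, bgs in enumerate(doc_bigrams):
        candidates = sorted(bg for bg in bgs if bg in frequent)
        # Pick the frequent bigram with maximum document occurrence;
        # sorting first makes tie-breaking deterministic.
        label = max(candidates, key=lambda bg: frequent[bg]) if candidates else None
        clusters[label].append(idx)
    return dict(clusters)


docs = [
    "machine learning improves search",
    "machine learning for text",
    "space shuttle launch delayed",
    "nasa space shuttle launch",
]
clusters = cluster_by_frequent_bigrams(docs)
```

Documents without any frequent bigram fall under the `None` label in this sketch; how the original method treats them is not stated.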


2021 ◽  
Vol 15 (5) ◽  
pp. 1-32
Author(s):  
Quang-huy Duong ◽  
Heri Ramampiaro ◽  
Kjetil Nørvåg ◽  
Thu-lan Dam

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can estimate multiple subtensors, they guarantee the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
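
For readers unfamiliar with the greedy 2-approximation mentioned above, here is a minimal sketch of the classic peeling procedure for a single dense subgraph, with density defined as edges divided by nodes. It illustrates the baseline only, not the multi-subtensor method of the paper. It assumes a simple undirected graph with comparable node labels, and the function name and example graph are hypothetical.

```python
def greedy_densest_subgraph(edges):
    """Repeatedly remove a minimum-degree node and remember the densest
    intermediate subgraph. The result has at least half the optimal density."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set()
    while nodes:
        density = num_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Simple O(n log n) selection per step; a bucket queue would be faster.
        u = min(sorted(nodes), key=lambda n: len(adj[n]))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        nodes.remove(u)
        del adj[u]
    return best_nodes, best_density


# A 4-clique (nodes 0-3) with a two-node tail attached at node 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
dense_nodes, dense_value = greedy_densest_subgraph(edges)
```

The procedure returns a single subgraph per run, which is the limitation the abstract identifies in most existing algorithms.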


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 230
Author(s):  
Xiangwei Dang ◽  
Zheng Rong ◽  
Xingdong Liang

Accurate localization and reliable mapping are essential for autonomous navigation of robots. As one of the core technologies for autonomous navigation, Simultaneous Localization and Mapping (SLAM) has attracted widespread attention in recent decades. Based on vision or LiDAR sensors, great efforts have been devoted to achieving real-time SLAM that can support a robot’s state estimation. However, most mature SLAM methods work under the assumption that the environment is static, while in dynamic environments their performance degrades or they fail entirely. In this paper, we first quantitatively evaluate the performance of state-of-the-art LiDAR-based SLAM methods, taking into account different patterns of moving objects in the environment. Through semi-physical simulation, we observed that the shape, size, and distribution of moving objects can all significantly affect SLAM performance, and we obtained instructive results from a quantitative comparison between LOAM and LeGO-LOAM. Second, based on this investigation, we propose a novel approach named EMO that eliminates moving objects for SLAM by fusing LiDAR and mmW-radar, with the aim of improving the accuracy and robustness of state estimation. The method exploits the complementary characteristics of the two sensors to fuse sensor information at two different resolutions. Moving objects are efficiently detected by radar based on the Doppler effect, accurately segmented and localized by LiDAR, and then filtered out of the point clouds through data association and accurate synchronization in time and space. Finally, the point clouds representing the static environment are used as the input to SLAM. The proposed approach is evaluated through experiments using both semi-physical simulation and real-world datasets. The results demonstrate the effectiveness of the method at improving SLAM accuracy (a decrease of at least 30% in absolute position error) and robustness in dynamic environments.
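
The filtering step can be illustrated with a deliberately simplified sketch, which is not the EMO pipeline. It assumes that radar detections and LiDAR points are already synchronized and expressed in the same 2D frame, that a detection counts as moving when its Doppler speed exceeds a threshold, and that every LiDAR point within a fixed radius of a moving detection belongs to that object. The function name, both thresholds, and the sample data are hypothetical.

```python
import math


def remove_moving_points(lidar_points, radar_detections,
                         speed_threshold=0.5, radius=1.0):
    """Drop LiDAR points near radar detections with significant Doppler speed.

    lidar_points: iterable of (x, y) positions.
    radar_detections: iterable of (x, y, radial_speed) tuples.
    """
    moving = [(x, y) for x, y, speed in radar_detections
              if abs(speed) > speed_threshold]
    return [p for p in lidar_points
            if all(math.dist(p, centre) > radius for centre in moving)]


lidar_points = [(0.0, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
radar_detections = [(5.0, 5.0, 2.0), (10.0, 0.0, 0.0)]  # second target is static
static_points = remove_moving_points(lidar_points, radar_detections)
```

A fixed radius is the crudest possible association rule; the abstract instead describes segmenting and localizing the object in the LiDAR data before removal.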


Author(s):  
Lakshmikanth Paleti ◽  
P. Radha Krishna ◽  
J.V.R. Murthy

Recommendation systems provide reliable and relevant recommendations to users and also build users’ trust in the website. This is achieved through opinions derived from the reviews, feedback, and preferences that users provide when a product is purchased or viewed through social networks. This integrates social network interactions with recommendation systems, capturing the behavior of users and their friends. The techniques used so far for recommendation systems are traditional, based on collaborative filtering and content-based filtering. This paper presents a novel approach called User-Opinion-Rating (UOR) for building recommendation systems by taking user-generated opinions over social networks as a dimension. Two tripartite graphs, namely User-Item-Rating and User-Item-Opinion, are constructed based on users’ opinions on items along with their ratings. The proposed approach quantifies the opinions of users, and the results obtained reveal its feasibility.


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 135
Author(s):  
Maximilian Felde ◽  
Tom Hanika ◽  
Gerd Stumme

Null model generation for formal contexts is an important task in the realm of formal concept analysis. These random models are particularly useful for, but not limited to, comparing the performance of algorithms. Nonetheless, a thorough investigation of how to generate null models for formal contexts is absent. Thus, we suggest a novel approach using Dirichlet distributions. We recollect and analyze the classical coin-toss model, recapitulate some of its shortcomings, and examine its stochastic properties. Building upon this, we propose a model that is capable of generating random formal contexts as well as null models for a given input context. Through an experimental evaluation, we show that our approach is a significant improvement with respect to the variety of contexts generated. Furthermore, we demonstrate the applicability of our null models with respect to real-world datasets.
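
As a point of reference, the classical coin-toss model can be sketched in a few lines. This assumes the usual formulation in which each object-attribute incidence is present independently with a fixed probability `p`; the Dirichlet-based model proposed in the paper is not reproduced here. The function names and parameters are hypothetical.

```python
import random


def coin_toss_context(n_objects, n_attributes, p, seed=None):
    """Random formal context as a boolean matrix: each cell is an
    independent coin toss that is True with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n_attributes)]
            for _ in range(n_objects)]


def density(context):
    """Fraction of object-attribute pairs that are incident."""
    cells = [cell for row in context for cell in row]
    return sum(cells) / len(cells)


context = coin_toss_context(n_objects=20, n_attributes=10, p=0.3, seed=42)
observed = density(context)
```

Because every cell uses the same probability, the generated contexts all have densities concentrated around `p`, which may be one of the shortcomings the abstract alludes to.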


2021 ◽  
Vol 27 (7) ◽  
pp. 667-692
Author(s):  
Lamia Berkani ◽  
Lylia Betit ◽  
Louiza Belarif

Clustering-based approaches have been demonstrated to be efficient and scalable to large-scale data sets. However, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we propose in this article an optimized multiview clustering approach for the recommendation of items in social networks. First, the selection of the initial medoids is optimized using the Bees Swarm optimization algorithm (BSO) in order to generate better partitions (i.e. refining the quality of medoids according to the objective function). Then, the multiview clustering (MV) is applied, where users are iteratively clustered from the views of both rating patterns and social information (i.e. friendships and trust). Finally, a framework is proposed for testing the different alternatives, namely: (1) the standard recommendation algorithms; (2) the clustering-based and the optimized clustering-based recommendation algorithms using BSO; and (3) the MV and the optimized MV (BSO-MV) algorithms. Experimental results conducted on two real-world datasets demonstrate the effectiveness of the proposed BSO-MV algorithm in terms of improving accuracy, as it outperforms the existing related approaches and baselines.


Author(s):  
Yoosin Kim ◽  
Michelle Jeong ◽  
Seung Ryul Jeong

In light of recent research that has begun to examine the link between textual “big data” and social phenomena such as stock price increases, this chapter takes a novel approach to treating news as big data by proposing the intelligent investment decision-making support model based on opinion mining. In an initial prototype experiment, the researchers first built a stock domain-specific sentiment dictionary via natural language processing of online news articles and calculated sentiment scores for the opinions extracted from those stories. In a separate main experiment, the researchers gathered 78,216 online news articles from two different media sources to not only make predictions of actual stock price increases but also to compare the predictive accuracy of articles from different media sources. The study found that opinions that are extracted from the news and treated with proper sentiment analysis can be effective in predicting changes in the stock market.
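
The dictionary-based scoring step might look like the following minimal sketch. The lexicon entries, their weights, the averaging rule, and the mapping from score to predicted direction are all assumptions made for illustration; the chapter's actual dictionary and model are not described in the abstract.

```python
# Hypothetical stock-domain sentiment dictionary (word -> polarity weight).
SENTIMENT = {"surge": 1.0, "gain": 0.5, "plunge": -1.0, "loss": -0.5}


def sentiment_score(text, lexicon=SENTIMENT):
    """Average polarity of the lexicon words found in the text (0.0 if none)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def predict_direction(text):
    """Map the sentiment score to a coarse price-movement label."""
    score = sentiment_score(text)
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "flat"


headline = "Shares surge after quarterly gain."
score = sentiment_score(headline)
direction = predict_direction(headline)
```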


Author(s):  
Furkan Goz ◽  
Alev Mutlu

Keyword indexing is the problem of assigning keywords to text documents. It is an important task as keywords play crucial roles in several information retrieval tasks. The problem is also challenging as the number of text documents is increasing, and such documents come in different forms (e.g., scientific papers, online news articles, and microblog posts). This chapter provides an overview of keyword indexing and elaborates on keyword extraction techniques. The authors provide the general motivations behind supervised and unsupervised keyword extraction and enumerate several pioneering and state-of-the-art techniques. Feature engineering, evaluation metrics, and benchmark datasets used to evaluate the performance of keyword extraction systems are also discussed.
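
As one example of unsupervised keyword extraction, the sketch below ranks the words of a document by TF-IDF. TF-IDF is chosen here as a common baseline and is not necessarily among the techniques the chapter covers. The function name, whitespace tokenization, and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k words of docs[doc_index] ranked by TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: number of documents containing each word.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    target = tokenized[doc_index]
    tf = Counter(target)
    n_docs = len(docs)
    scores = {w: (count / len(target)) * math.log(n_docs / df[w])
              for w, count in tf.items()}
    # Sort by descending score, then alphabetically to break ties.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


docs = [
    "graph clustering of text streams with graph models",
    "clustering of news articles",
    "sentiment of news text",
]
keywords = tfidf_keywords(docs, doc_index=0, top_k=2)
```

Words that appear in every document, such as "of" in this corpus, receive a score of zero and therefore never rank as keywords.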


2018 ◽  
Vol 40 (3) ◽  
pp. 318-334 ◽  
Author(s):  
Gregory Phillips ◽  
Peter Lindeman ◽  
Christian N. Adames ◽  
Emily Bettin ◽  
Christopher Bayston ◽  
...  

HIV continues to significantly impact the health of communities, particularly affecting racially and ethnically diverse men who have sex with men and transgender women. In response, health departments often fund a number of community organizations to provide each of these subgroups with comprehensive and culturally responsive services. To this point, evaluators have focused on individual interventions but have largely overlooked the complex environment in which these interventions are implemented, including other programs funded to do similar work. The Evaluation Center was funded by the City of Chicago in 2015 to conduct a citywide evaluation of all HIV prevention programming. This article will describe our novel approach to adapt the principles and methods of the empowerment evaluation approach, to effectively engage with 20 city-funded prevention programs to collect and synthesize multisite evaluation data, and ultimately build capacity at these organizations to foster a learning-focused community.

