A Self-Aggregated Hierarchical Topic Model for Short Texts

2021 ◽  
Author(s):  
Yue Niu ◽  
Hongjie Zhang

With the growth of the internet, short texts such as tweets from Twitter, news titles from RSS feeds, or comments from Amazon have become very prevalent. Many tasks need to retrieve information hidden in the content of short texts, so ontology learning methods have been proposed for retrieving structured information. A topic hierarchy is a typical ontology that consists of concepts and taxonomy relations between concepts. Current hierarchical topic models are not specifically designed for short texts. These methods use word co-occurrence to construct concepts and general-special word relations to construct taxonomy topics. But in short texts, word co-occurrence is sparse and general-special word relations are lacking. To overcome these two problems and provide an interpretable result, we designed a hierarchical topic model that aggregates short texts into long documents and constructs topics and relations. Because long documents add semantic information, our model can avoid the sparsity of word co-occurrence. In experiments, we measured the quality of concepts with a topic coherence metric on four real-world short-text corpora. The results showed that our topic hierarchy is more interpretable than those of other methods.
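
The abstract does not say which topic coherence metric was used. As an illustration only, the sketch below computes the widely used UMass coherence, which scores a topic's top words by how often they co-occur in the same documents. The function name `umass_coherence` and the toy corpus are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def umass_coherence(top_words: Sequence[str], documents: List[List[str]]) -> float:
    """UMass coherence of one topic (illustrative sketch, not the paper's code).

    top_words: the topic's top words, ordered from most to least probable.
    documents: the tokenized corpus used to count document co-occurrences.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every word in `words`.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                # Assumption: skip words that never occur, to avoid dividing by zero.
                continue
            # The +1 smoothing keeps the logarithm finite when the pair never co-occurs.
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


# Toy corpus of short texts (hypothetical data).
docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["stock", "market"]]
coherent = umass_coherence(["cat", "dog"], docs)
incoherent = umass_coherence(["cat", "market"], docs)
```

Scores closer to zero indicate words that co-occur more often. With very short documents most word pairs never co-occur, which is the sparsity problem the abstract describes.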

Author(s):  
Cao Liu ◽  
Shizhu He ◽  
Kang Liu ◽  
Jun Zhao

Because they provide natural language responses, natural answers are favored in real-world Question Answering (QA) systems. Generative models learn to automatically generate natural answers from large-scale question-answer pairs (QA-pairs). However, they suffer from the uncontrollable and uneven quality of QA-pairs crawled from the Internet. To address this problem, we propose a curriculum learning based framework for natural answer generation (CL-NAG), which is able to take full advantage of the valuable learning data in a noisy and uneven-quality corpus. Specifically, we employ two practical measures to automatically measure the quality (complexity) of QA-pairs. Based on these measurements, CL-NAG first uses simple and low-quality QA-pairs to learn a basic model, and then gradually learns to produce better answers with richer content and more complete syntax based on more complex and higher-quality QA-pairs. In this way, all valuable information in the noisy and uneven-quality corpus can be fully exploited. Experiments demonstrate that CL-NAG outperforms state-of-the-art methods, improving accuracy by 6.8% and 8.7% for simple and complex questions, respectively.
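
The easy-to-hard schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CL-NAG implementation: the abstract does not specify its two quality measures, so `answer_length_complexity` is a hypothetical stand-in, and the choice to make each stage a growing prefix of the sorted data is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def answer_length_complexity(pair: QAPair) -> float:
    """Hypothetical complexity measure: number of whitespace tokens in the answer."""
    return float(len(pair[1].split()))


def curriculum_stages(
    pairs: Sequence[QAPair],
    complexity: Callable[[QAPair], float],
    num_stages: int,
) -> List[List[QAPair]]:
    """Split QA-pairs into cumulative training stages, simplest pairs first."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(pairs, key=complexity)
    stages = []
    for k in range(1, num_stages + 1):
        # Ceiling division so the final stage always covers the whole corpus.
        cutoff = (len(ordered) * k + num_stages - 1) // num_stages
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical toy data: a model would be trained on stages[0], then stages[1], and so on.
pairs = [("q1", "a b c d e"), ("q2", "a"), ("q3", "a b c"), ("q4", "a b")]
stages = curriculum_stages(pairs, answer_length_complexity, num_stages=2)
```

Each later stage keeps the earlier, simpler pairs and adds more complex ones, so the model is never trained on hard examples alone.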


Author(s):  
Ida Nyoman Tri Darma Putra

Information technology continues to develop in the era of the Industrial Revolution 4.0, especially internet-based technology. This development influences the world of education, especially learning methods. One of the developments in education and learning methods currently in use is Google Classroom. The purpose of this study is to identify students' responses to learning English Profession using Google Classroom, which is applied in the teaching and learning process at the Mataram College of Tourism. This study is a survey with a sample of 135 students from the Mataram College of Tourism. The variables examined include ease of access, usefulness, communication and interaction, and students' satisfaction with learning using Google Classroom. The results show that students at the Mataram College of Tourism find Google Classroom easy to access, useful, and easy to use for communication and interaction, and that they are satisfied with it. From the interview, the respondents agree that Google Classroom offers helpful features that support the lecturers in managing the course efficiently and effectively. However, the interview also found that the respondents felt that the quality of the learning process was not better than that of conventional methods and that they were uncomfortable during the learning process in Google Classroom.


2021 ◽  
Vol 2 (1) ◽  
pp. 38-52
Author(s):  
Mustajib Mustajib ◽  
Lia Roikhanatus Sa’adah

With the arrival of the COVID-19 pandemic, conventional learning methods must be replaced by online learning, which uses media platforms connected to the internet to distribute materials and to let teachers and learners communicate. Even during the COVID-19 pandemic, learning must still be carried out and cannot be abandoned. Because this kind of learning has not been practiced before, difficulties or problems are very likely to arise, and a strategy is needed to solve them. The media used must also be considered and adjusted to the conditions of the learners. This activity aims to maintain the quality of learning during the COVID-19 pandemic. Even when learning takes place online, its quality must still be considered so that the learning runs well and is effectively received by learners. This research aims to find out how online learning is carried out by SD Plus Al Hikmah and which methods and strategies are applied to maintain the quality of online learning so that it can run effectively.


2015 ◽  
Vol 12 (1) ◽  
pp. 63-89 ◽  
Author(s):  
Mirjana Maksimovic ◽  
Vladimir Vujovic ◽  
Branko Perisic ◽  
Vladimir Milosevic

The recent proliferation of global networking has had an enormous impact on the cooperation of smart elements of arbitrary kind and purpose, which can be located anywhere and interact with each other according to a predefined protocol. Furthermore, these elements have to be intelligently orchestrated in order to support distributed sensing and/or monitoring/control of real-world phenomena. That is why the Internet of Things (IoT) concept has arisen as a new, promising paradigm for Future Internet development. Considering that Wireless Sensor Networks (WSNs) are envisioned as an integral part of arbitrary IoTs, and given the potentially huge number of cooperating IoTs that are usually used in monitoring and managing real-world phenomena, the reliability of individual sensor nodes and the monitoring and improvement of overall network performance are definitely challenging issues. One of the most interesting real-world phenomena that can be monitored by a WSN is indoor or outdoor fire. The incorporation of soft computing technologies, such as fuzzy logic, in sensor nodes has to be investigated in order to achieve manageable network performance monitoring/control and the maximal extension of component life cycles. Many aspects, such as routes, channel access, locating, energy efficiency, coverage, network capacity, data aggregation and Quality of Service (QoS), have been explored extensively. In this article, two fuzzy logic approaches with temporal characteristics are proposed for monitoring and determining the confidence of fire, in order to optimize and reduce the number of rules that have to be checked to make correct decisions. We assume that this reduction may lower sensor activity without a relevant impact on quality of operation and extend battery life, directly contributing to the efficiency, robustness and cost effectiveness of the sensing network.
In order to obtain a real-time verification of the proposed approaches, a prototype sensor web node based on Representational State Transfer (RESTful) services was created as an infrastructure that supports fast critical-event signaling and remote access to sensor data via the Internet.


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Yanping Chen ◽  
Lu Jiang ◽  
Jianke Zhang ◽  
Xiaoxiao Dong

Nowadays, the number of Web services on the Internet is increasing quickly. Meanwhile, different service providers offer numerous services with similar functions. Quality of Service (QoS) has become an important factor in selecting the most appropriate service for users. The most prominent QoS-based service selection models consider only attributes whose values are certain, which is an idealized assumption. In the real world, there are a large number of uncertain factors. In particular, at runtime, QoS may become very poor or unacceptable. To solve this problem, a global service selection model based on uncertain QoS is proposed, including the corresponding normalization and aggregation functions, and a robust optimization model is then adopted to transform the model. Experimental results show that the proposed method can effectively select services with high robustness and optimality.
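
To make the idea of normalization, aggregation and robust selection concrete, the sketch below scores services from several observed QoS samples and picks the one with the best worst-case score. It is an illustration under assumptions, not the model from the paper: min-max normalization, a weighted sum, the max-min criterion, the two attributes, the weights and all names and data are chosen for this example.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (response_time_ms, availability), one observation


def min_max(value: float, lo: float, hi: float, higher_is_better: bool) -> float:
    """Scale an attribute value to [0, 1] so that 1 is always the best value."""
    if hi == lo:
        return 1.0
    scaled = (value - lo) / (hi - lo)
    return scaled if higher_is_better else 1.0 - scaled


def sample_scores(
    services: Dict[str, List[Sample]], w_time: float, w_avail: float
) -> Dict[str, List[float]]:
    """Weighted-sum score of every observed sample of every service."""
    all_samples = [s for samples in services.values() for s in samples]
    t_lo, t_hi = min(s[0] for s in all_samples), max(s[0] for s in all_samples)
    a_lo, a_hi = min(s[1] for s in all_samples), max(s[1] for s in all_samples)
    return {
        name: [
            w_time * min_max(t, t_lo, t_hi, higher_is_better=False)
            + w_avail * min_max(a, a_lo, a_hi, higher_is_better=True)
            for t, a in samples
        ]
        for name, samples in services.items()
    }


def select_robust(services, w_time=0.8, w_avail=0.2) -> str:
    """Pick the service whose worst observed score is highest (max-min criterion)."""
    scores = sample_scores(services, w_time, w_avail)
    return max(scores, key=lambda name: min(scores[name]))


def select_by_mean(services, w_time=0.8, w_avail=0.2) -> str:
    """Baseline that ignores uncertainty: pick the best average score."""
    scores = sample_scores(services, w_time, w_avail)
    return max(scores, key=lambda name: sum(scores[name]) / len(scores[name]))


# Hypothetical observations: A is fast on average but occasionally very slow.
services = {
    "A": [(100.0, 0.99), (900.0, 0.99)],
    "B": [(300.0, 0.98), (350.0, 0.98)],
}
robust_choice = select_robust(services)
mean_choice = select_by_mean(services)
```

On this toy data the average-based baseline prefers service A, while the worst-case criterion prefers the more stable service B, which is the kind of trade-off a robust selection model is meant to capture.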


Author(s):  
Lei Tang ◽  
Huan Liu ◽  
Jiangping Zhang

The unregulated and open nature of the Internet and the explosive growth of the Web create a pressing need to provide various services for content categorization. Hierarchical classification attempts to achieve both accurate classification and increased comprehensibility. It has also been shown in the literature that hierarchical models outperform flat models in training efficiency, classification efficiency, and classification accuracy (Koller & Sahami, 1997; McCallum, Rosenfeld, Mitchell & Ng, 1998; Ruiz & Srinivasan, 1999; Dumais & Chen, 2000; Yang, Zhang & Kisiel, 2003; Cai & Hofmann, 2004; Liu, Yang, Wan, Zeng, Cheng & Ma, 2005). However, the quality of the taxonomy has attracted little attention in past work. In fact, different taxonomies can result in differences in classification, so the quality of the taxonomy should be considered for real-world classification. Even a semantically sound taxonomy does not necessarily lead to the intended classification performance (Tang, Zhang & Liu, 2006). Therefore, it is desirable to construct or modify a hierarchy to better suit the hierarchical content classification task.


2021 ◽  
pp. 1-15
Author(s):  
R.M. Noorullah ◽  
Moulana Mohammed

Topic models have been widely used for building clusters of documents for more than a decade, yet problems still occur in choosing the optimal number of topics. The main problem is the lack of a stable metric for the quality of the topics obtained during the construction of topic models. From an analysis of previous work, the authors found that most of the models used to determine the number of topics are non-parametric and that the quality of topics is determined using perplexity and coherence measures, and concluded that these are not applicable to solving this problem. In this paper, we use a parametric method, which is an extension of the traditional topic model with visual access tendency for visualizing the number of topics (clusters), to complement clustering and to choose the optimal number of topics based on the results of cluster validity indices. The developed hybrid topic models are demonstrated on different Twitter datasets covering various topics, both in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results show that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in measuring the quality of clusters with validity indices.


2021 ◽  
Vol 11 (18) ◽  
pp. 8708
Author(s):  
Yue Niu ◽  
Hongjie Zhang ◽  
Jing Li

In recent years, short texts have become a prevalent kind of text on the internet. Due to the short length of each text, conventional topic models applied to short texts suffer from the sparsity of word co-occurrence information. Researchers have proposed different kinds of customized topic models for short texts that provide additional word co-occurrence information. However, these models cannot incorporate sufficient semantic word co-occurrence information and may introduce additional noisy information. To address these issues, we propose a self-aggregated topic model incorporating document embeddings. Aggregating short texts into long documents according to document embeddings can provide sufficient word co-occurrence information and avoid incorporating non-semantic word co-occurrence information. However, document embeddings of short texts contain a lot of noisy information resulting from the sparsity of word co-occurrence information. We therefore discard noisy information by converting the document embeddings into global and local semantic information. The global semantic information is the similarity probability distribution over the entire dataset, and the local semantic information is the distances between similar short texts. We then adopt a nested Chinese restaurant process to incorporate these two kinds of information. Finally, we compare our model to several state-of-the-art models on four real-world short-text corpora. The experimental results show that our model achieves better performance in terms of topic coherence and classification accuracy.


2021 ◽  
Author(s):  
Handi Chen ◽  
Xiaojie Wang ◽  
Zhaolong Ning ◽  
Lei Guo

With the advocacy of green renewable energy, Electric Vehicles (EVs) have gradually become mainstream in the automobile market. Due to the finite edge resources of the Internet of EVs, this paper integrates the idle communication, caching and computational resources of EVs to enrich the resources available for vehicular task migration. Considering the limited capacity and resources of EVs, a distributed lightweight imitation learning-based efficient Task cOoperative migration Policy Integrating 3C resource policy, named TOPIC, is proposed to maximize the obtained quality of service (QoS). The experimental results based on the real-world traffic dataset of Hangzhou (China) demonstrate that the QoS obtained based on the expert policy and agent policy of TOPIC is about 3 times higher than that of other representative policies.


Author(s):  
Tengfei Ma ◽  
Tetsuya Nasukawa

Topic models have been successfully applied in lexicon extraction. However, most previous methods are limited to document-aligned data. In this paper, we try to address two challenges of applying topic models to lexicon extraction in non-parallel data: 1) the difficulty of modeling word relationships and 2) noisy seed dictionaries. To solve these two challenges, we propose two new bilingual topic models to better capture the semantic information of each word while discriminating between the multiple translations in a noisy seed dictionary. We extend the scope of topic models by inverting the roles of "word" and "document". In addition, to solve the problem of noise in the seed dictionary, we incorporate the probability of translation selection in our models. Moreover, we also propose an effective measure to evaluate the similarity of words in different languages and select the optimal translation pairs. Experimental results using real-world data demonstrate the utility and efficacy of the proposed models.

