A Classification Framework of Identifying Major Documents With Search Engine Suggestions and Unsupervised Subtopic Clustering

Author(s):  
Chen Zhao ◽  
Takehito Utsuro ◽  
Yasuhide Kawada

This paper addresses the problem of automatically recognizing out-of-topic documents within a small set of similar documents that are expected to share a common topic. The objective is to remove noise documents from the set. A topic-model-based classification framework is proposed for the task of discovering out-of-topic documents. The paper introduces a new concept of annotated search engine suggests, in which whichever search queries were used to reach a page are taken as representations of that page's content. Word embeddings are adopted to create distributed representations of words and documents, and similarity comparisons are performed over search engine suggests. It is shown that search engine suggests can be highly accurate semantic representations of textual content, and that the proposed document analysis algorithm, which uses this representation as a relevance measure, gives satisfactory performance on in-topic content filtering compared to the baseline technique of topic probability ranking.
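
As a rough illustration of the relevance-measure step, the sketch below averages the word embeddings of the search engine suggests annotated to each page, computes a centroid over the document set, and flags documents whose cosine similarity to that centroid falls below a threshold. The embedding lookup (`word_vectors`), the threshold value, and the data layout are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of suggest-based out-of-topic filtering (illustrative, not the paper's exact method).
import numpy as np

def embed_suggests(suggests, word_vectors):
    """Average the embeddings of all tokens in a page's annotated suggest queries."""
    tokens = [tok for query in suggests for tok in query.split() if tok in word_vectors]
    if not tokens:
        return None
    return np.mean([word_vectors[tok] for tok in tokens], axis=0)

def filter_out_of_topic(doc_suggests, word_vectors, threshold=0.4):
    """Flag documents whose suggest-based vector is far from the set centroid."""
    vecs = {doc: embed_suggests(sug, word_vectors) for doc, sug in doc_suggests.items()}
    vecs = {doc: v for doc, v in vecs.items() if v is not None}
    centroid = np.mean(list(vecs.values()), axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Documents below the similarity threshold are treated as out-of-topic noise.
    return [doc for doc, v in vecs.items() if cosine(v, centroid) < threshold]
```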


2016 ◽  
Author(s):  
Timothy N. Rubin ◽  
Oluwasanmi Koyejo ◽  
Krzysztof J. Gorgolewski ◽  
Michael N. Jones ◽  
Russell A. Poldrack ◽  
...  

Abstract A central goal of cognitive neuroscience is to decode human brain activity, i.e., to infer mental processes from observed patterns of whole-brain activation. Previous decoding efforts have focused on classifying brain activity into a small set of discrete cognitive states. To attain maximal utility, a decoding framework must be open-ended, systematic, and context-sensitive, i.e., capable of interpreting numerous brain states, presented in arbitrary combinations, in light of prior information. Here we take steps towards this objective by introducing a Bayesian decoding framework based on a novel topic model, Generalized Correspondence Latent Dirichlet Allocation, that learns latent topics from a database of over 11,000 published fMRI studies. The model produces highly interpretable, spatially circumscribed topics that enable flexible decoding of whole-brain images. Importantly, the Bayesian nature of the model allows one to "seed" decoder priors with arbitrary images and text, enabling researchers, for the first time, to generate quantitative, context-sensitive interpretations of whole-brain patterns of brain activity.
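
As a rough sketch of how a topic-based Bayesian decoder with seedable priors can work, the snippet below treats each topic as a spatial distribution over voxels and scores an observed activation image with a posterior proportional to likelihood times prior; passing a non-uniform `topic_prior` plays the role of seeding the decoder with contextual information. The array shapes, the multinomial-style likelihood, and the function names are simplifying assumptions, not the GC-LDA model itself.

```python
# Minimal sketch of topic-based Bayesian decoding (illustrative assumptions throughout).
import numpy as np

def decode_topics(activation, topic_voxel_dist, topic_prior=None):
    """
    activation:       (n_voxels,) non-negative activation values for one image.
    topic_voxel_dist: (n_topics, n_voxels) rows are p(voxel | topic).
    topic_prior:      (n_topics,) prior p(topic); uniform if None, or "seeded" with context.
    Returns the posterior p(topic | activation).
    """
    n_topics = topic_voxel_dist.shape[0]
    prior = np.full(n_topics, 1.0 / n_topics) if topic_prior is None else topic_prior
    # Log-likelihood of the image under each topic (multinomial up to a constant).
    log_lik = activation @ np.log(topic_voxel_dist + 1e-12).T
    log_post = log_lik + np.log(prior + 1e-12)
    log_post -= log_post.max()            # numerical stability before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```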


2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

This article addresses the issue of how to provide an overview of the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs of a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify these redundant search engine suggests based on a topic model. However, one limitation of the topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this coarse-grained classification, this article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and further classify the suggests within a single topic into finer-grained subtopics based on that similarity. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
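
A minimal sketch of the finer-grained step might look like the following: embed each suggest by averaging its token vectors, then cluster the suggests assigned to one coarse topic into subtopics. The choice of k-means, the number of subtopics, and the `word_vectors` mapping (assumed trained elsewhere, e.g., on the topic-model training pages plus Wikipedia text) are illustrative assumptions rather than the authors' exact procedure.

```python
# Minimal sketch of embedding-based subtopic clustering of suggests within one coarse topic.
import numpy as np
from sklearn.cluster import KMeans

def suggest_vector(suggest, word_vectors):
    """Represent one suggest query as the mean of its token embeddings."""
    toks = [t for t in suggest.split() if t in word_vectors]
    return np.mean([word_vectors[t] for t in toks], axis=0) if toks else None

def cluster_suggests(suggests, word_vectors, n_subtopics=5):
    """Split the suggests of a single coarse topic into finer-grained subtopics."""
    pairs = [(s, suggest_vector(s, word_vectors)) for s in suggests]
    pairs = [(s, v) for s, v in pairs if v is not None]
    X = np.vstack([v for _, v in pairs])
    labels = KMeans(n_clusters=n_subtopics, n_init=10).fit_predict(X)
    subtopics = {}
    for (s, _), lab in zip(pairs, labels):
        subtopics.setdefault(lab, []).append(s)
    return subtopics
```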


2018 ◽  
Vol 5 (2) ◽  
pp. 28
Author(s):  
Fatima Dar

The study addressed a cognitive-affective gap in the textual content of a primary English curriculum. The research design was qualitative in nature. In the first part of the study, document analysis of the textbooks from grades 1-5 was conducted to show that empathetic and pro-social themes were underrepresented in them. The second part of the study was an intervention in which teachers were briefed to highlight empathetic and pro-social themes in the texts and teach them. The third part of the study examined whether the use of cognitive-affective texts raised awareness among students about the said themes and significantly affected their interest in academic work. The findings from document analysis, observations and interviews indicated that empathetic and pro-social themes were underrepresented in the textual content. The observations of integrated cognitive-affective lessons showed a significant increase in student interest in academic work and raised awareness about the stated themes. This was also corroborated by teachers and students in focus-group interviews. The study was significant in raising the importance of the stated skills at the primary level and in showing that cognitive-affective use of textual content in schools could raise awareness about affective skills and prepare helpful and caring individuals for society.

Keywords: cognitive-affective, curriculum, empathy, social-emotional learning, textual content


2006 ◽  
Vol 75 (1) ◽  
pp. 73-85 ◽  
Author(s):  
Arnaud Gaudinat ◽  
Patrick Ruch ◽  
Michel Joubert ◽  
Philippe Uziel ◽  
Anne Strauss ◽  
...  

2020 ◽  
Vol 34 (04) ◽  
pp. 6737-6745
Author(s):  
Ce Zhang ◽  
Hady W. Lauw

Oftentimes documents are linked to one another in a network structure, e.g., academic papers cite other papers, Web pages link to other pages. In this paper we propose a holistic topic model to learn meaningful and unified low-dimensional representations for networked documents that seek to preserve both textual content and network structure. On the basis of reconstructing not only the input document but also its adjacent neighbors, we develop two neural encoder architectures. Adjacent-Encoder, or AdjEnc, induces competition among documents for topic propagation, and reconstruction among neighbors for semantic capture. Adjacent-Encoder-X, or AdjEnc-X, extends this to also encode the network structure in addition to document content. We evaluate our models on real-world document networks quantitatively and qualitatively, outperforming comparable baselines comprehensively.
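
To make the neighbor-reconstruction idea concrete, the sketch below shows a toy autoencoder whose latent code for a document is trained to reconstruct both the document's own bag-of-words vector and those of its linked neighbors. The layer sizes, activation, and squared-error loss are simplifying assumptions, not the exact AdjEnc or AdjEnc-X architecture.

```python
# Toy sketch of an autoencoder that reconstructs a document and its adjacent neighbors.
import torch
import torch.nn as nn

class AdjacentEncoderSketch(nn.Module):
    def __init__(self, vocab_size, n_topics):
        super().__init__()
        self.encoder = nn.Linear(vocab_size, n_topics)
        self.decoder = nn.Linear(n_topics, vocab_size)

    def forward(self, x):
        z = torch.sigmoid(self.encoder(x))   # low-dimensional "topic" code per document
        return self.decoder(z), z

def neighbor_reconstruction_loss(model, x, neighbors):
    """x: (batch, vocab) documents; neighbors: list of (n_i, vocab) tensors of linked documents."""
    recon, _ = model(x)
    loss = 0.0
    for i, nbrs in enumerate(neighbors):
        targets = torch.cat([x[i:i + 1], nbrs], dim=0)   # reconstruct self plus adjacent docs
        loss = loss + ((recon[i] - targets) ** 2).sum()
    return loss / x.shape[0]
```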


2020 ◽  
Vol 33 ◽  
Author(s):  
Juliana Aparecida Elias Fernandes ◽  
Marília Miranda Forte Gomes ◽  
Bruna da Silva Sousa ◽  
Juliana de Faria Fracon e Romão ◽  
Diana Lúcia Moura Pinho ◽  
...  

Abstract Introduction: The course pedagogical projects (CPPs) of physical therapy programs in Brazil are based on the National Curriculum Guidelines for Physiotherapy (NCGP) and the principles of the National Health System (SUS). The CPPs that guide professional training tend to use a biopsychosocial approach and propose familiarizing undergraduate students with the International Classification of Functionality, Disability and Health (ICF); as such, they should include the use of this instrument. Objective: Assess CPPs by exploratory document analysis and determine whether they propose teaching and using the ICF in student training. Method: Qualitative-quantitative study with document analysis of CPPs for physical therapy courses in Midwest Brazil, from which information related to the ICF was extracted. Results: The biopsychosocial model and NCGP were identified in the 10 CPPs analyzed and the ICF was found in the curriculum outline of 6 of these, indicating the incorporation of this framework in student training. However, the ICF was only identified in the course objectives and literature references of 4 and 2 CPPs, respectively, suggesting possible shortcomings in its application in these documents. Conclusion: The inclusion of the ICF in some CPPs indicates a positive change and favors understanding of functioning, but does not preclude the need for a broader approach to teaching this classification framework in the remaining CPPs in order to provide student training within a biopsychosocial context.


2020 ◽  
Vol 13 (2) ◽  
pp. 94-109
Author(s):  
Brij B. Gupta ◽  
Ankit Kumar Jain

The language used in the textual content of a webpage is a barrier for most existing anti-phishing methods, as most of them can identify fake webpages written in the English language only. Therefore, we present a search engine-based method in this article, which identifies phishing webpages accurately regardless of the textual language used within the webpage. The proposed search engine-based method uses a lightweight, consistent and language-independent search query to detect the legality of the suspicious URL. We have also integrated five heuristics with the search engine-based mechanism to improve the detection accuracy, as some newly created legitimate sites may not yet appear in the search engine. The proposed method can also correctly classify newly created legitimate sites that are not classified by available search engine-based methods. Evaluation results show that our method outperforms the available search-based techniques and achieves a TPR of 98.15% with an FPR of only 0.05%.
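
A minimal sketch of the overall decision logic might look like the following: issue a lightweight, language-independent query built from the suspicious URL's host, accept the page if its domain appears among the top search results, and otherwise fall back to lexical heuristics so that newly created legitimate sites are not misclassified. The placeholder `search_top_domains` callable and the specific heuristics shown are illustrative assumptions, not the five heuristics used in the article.

```python
# Minimal sketch of a search-engine-based phishing check with heuristic fallback (illustrative only).
import re
from urllib.parse import urlparse

def looks_suspicious_by_heuristics(url):
    """Simple lexical heuristics often applied to pages not yet indexed by search engines."""
    host = urlparse(url).hostname or ""
    return any([
        re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,  # raw IP address as host
        "@" in url,                 # user-info trick hiding the real domain
        host.count(".") > 4,        # excessive subdomain nesting
        len(url) > 100,             # unusually long URL
    ])

def classify_url(url, search_top_domains):
    """Return 'legitimate' or 'phishing' for a suspicious URL."""
    host = urlparse(url).hostname or ""
    query = host                     # lightweight, language-independent query
    if host in search_top_domains(query):
        return "legitimate"          # indexed domain appears in the top results
    # Newly created legitimate sites may not be indexed yet, so fall back to
    # heuristics before labelling the page as phishing.
    return "phishing" if looks_suspicious_by_heuristics(url) else "legitimate"
```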

