A Classification Framework of Identifying Major Documents With Search Engine Suggestions and Unsupervised Subtopic Clustering

Author(s):  
Chen Zhao ◽  
Takehito Utsuro ◽  
Yasuhide Kawada

This paper addresses the problem of automatically recognizing out-of-topic documents within a small set of similar documents that are expected to share a common topic. The objective is to remove noise documents from the set. A topic-model-based classification framework is proposed for the task of discovering out-of-topic documents. The paper introduces a new concept of annotated search engine suggests, in which whichever search queries were used to reach a page are taken as representations of that page's content. Word embeddings are adopted to create distributed representations of words and documents, and similarity comparisons are performed over search engine suggests. It is shown that search engine suggests can be highly accurate semantic representations of textual content, and that the proposed document analysis algorithm, which uses this representation as a relevance measure, gives satisfactory performance on in-topic content filtering compared to the baseline technique of topic probability ranking.
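
As a rough illustration of the relevance-measure step, the sketch below averages the word embeddings of the search engine suggests annotated to each page, computes a centroid over the document set, and flags documents whose cosine similarity to that centroid falls below a threshold. The embedding lookup (`word_vectors`), the threshold value, and the data layout are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of suggest-based out-of-topic filtering (illustrative, not the paper's exact method).
import numpy as np

def embed_suggests(suggests, word_vectors):
    """Average the embeddings of all tokens in a page's annotated suggest queries."""
    tokens = [tok for query in suggests for tok in query.split() if tok in word_vectors]
    if not tokens:
        return None
    return np.mean([word_vectors[tok] for tok in tokens], axis=0)

def filter_out_of_topic(doc_suggests, word_vectors, threshold=0.4):
    """Flag documents whose suggest-based vector is far from the set centroid."""
    vecs = {doc: embed_suggests(sug, word_vectors) for doc, sug in doc_suggests.items()}
    vecs = {doc: v for doc, v in vecs.items() if v is not None}
    centroid = np.mean(list(vecs.values()), axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Documents below the similarity threshold are treated as out-of-topic noise.
    return [doc for doc, v in vecs.items() if cosine(v, centroid) < threshold]
```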


2016 ◽  
Author(s):  
Timothy N. Rubin ◽  
Oluwasanmi Koyejo ◽  
Krzysztof J. Gorgolewski ◽  
Michael N. Jones ◽  
Russell A. Poldrack ◽  
...  

Abstract A central goal of cognitive neuroscience is to decode human brain activity, i.e., to infer mental processes from observed patterns of whole-brain activation. Previous decoding efforts have focused on classifying brain activity into a small set of discrete cognitive states. To attain maximal utility, a decoding framework must be open-ended, systematic, and context-sensitive, i.e., capable of interpreting numerous brain states, presented in arbitrary combinations, in light of prior information. Here we take steps towards this objective by introducing a Bayesian decoding framework based on a novel topic model, Generalized Correspondence Latent Dirichlet Allocation, that learns latent topics from a database of over 11,000 published fMRI studies. The model produces highly interpretable, spatially circumscribed topics that enable flexible decoding of whole-brain images. Importantly, the Bayesian nature of the model allows one to "seed" decoder priors with arbitrary images and text, enabling researchers, for the first time, to generate quantitative, context-sensitive interpretations of whole-brain patterns of brain activity.
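
As a rough sketch of how a topic-based Bayesian decoder with seedable priors can work, the snippet below treats each topic as a spatial distribution over voxels and scores an observed activation image with a posterior proportional to likelihood times prior; passing a non-uniform `topic_prior` plays the role of seeding the decoder with contextual information. The array shapes, the multinomial-style likelihood, and the function names are simplifying assumptions, not the GC-LDA model itself.

```python
# Minimal sketch of topic-based Bayesian decoding (illustrative assumptions throughout).
import numpy as np

def decode_topics(activation, topic_voxel_dist, topic_prior=None):
    """
    activation:       (n_voxels,) non-negative activation values for one image.
    topic_voxel_dist: (n_topics, n_voxels) rows are p(voxel | topic).
    topic_prior:      (n_topics,) prior p(topic); uniform if None, or "seeded" with context.
    Returns the posterior p(topic | activation).
    """
    n_topics = topic_voxel_dist.shape[0]
    prior = np.full(n_topics, 1.0 / n_topics) if topic_prior is None else topic_prior
    # Log-likelihood of the image under each topic (multinomial up to a constant).
    log_lik = activation @ np.log(topic_voxel_dist + 1e-12).T
    log_post = log_lik + np.log(prior + 1e-12)
    log_post -= log_post.max()            # numerical stability before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```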


2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

This article addresses the issue of how to provide an overview of the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs of a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify these redundant search engine suggests based on a topic model. However, one limitation of the topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this coarse-grained classification, this article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and further classify the suggests within a single topic into finer-grained subtopics based on that similarity. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
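
A minimal sketch of the finer-grained step might look like the following: embed each suggest by averaging its token vectors, then cluster the suggests assigned to one coarse topic into subtopics. The choice of k-means, the number of subtopics, and the `word_vectors` mapping (assumed trained elsewhere, e.g., on the topic-model training pages plus Wikipedia text) are illustrative assumptions rather than the authors' exact procedure.

```python
# Minimal sketch of embedding-based subtopic clustering of suggests within one coarse topic.
import numpy as np
from sklearn.cluster import KMeans

def suggest_vector(suggest, word_vectors):
    """Represent one suggest query as the mean of its token embeddings."""
    toks = [t for t in suggest.split() if t in word_vectors]
    return np.mean([word_vectors[t] for t in toks], axis=0) if toks else None

def cluster_suggests(suggests, word_vectors, n_subtopics=5):
    """Split the suggests of a single coarse topic into finer-grained subtopics."""
    pairs = [(s, suggest_vector(s, word_vectors)) for s in suggests]
    pairs = [(s, v) for s, v in pairs if v is not None]
    X = np.vstack([v for _, v in pairs])
    labels = KMeans(n_clusters=n_subtopics, n_init=10).fit_predict(X)
    subtopics = {}
    for (s, _), lab in zip(pairs, labels):
        subtopics.setdefault(lab, []).append(s)
    return subtopics
```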


2018 ◽  
Vol 5 (2) ◽  
pp. 28
Author(s):  
Fatima Dar

The study addressed a cognitive-affective gap in the textual content of a primary English curriculum. The research design was qualitative in nature. In the first part of the study, document analysis of the textbooks from grades 1-5 was conducted to show that empathetic and pro-social themes were underrepresented in them. The second part of the study was an intervention in which teachers were briefed to highlight empathetic and pro-social themes in the texts and teach them. The third part of the study examined whether the use of cognitive-affective texts raised awareness among students about the said themes and significantly affected their interest in academic work. The findings from document analysis, observations and interviews indicated that empathetic and pro-social themes were underrepresented in the textual content. The observations of integrated cognitive-affective lessons showed a significant increase in student interest in academic work and raised awareness about the stated themes. This was also corroborated by teachers and students in focus-group interviews. The study was significant in raising the importance of the stated skills at the primary level and in showing that cognitive-affective use of textual content in schools could raise awareness about affective skills and prepare helpful and caring individuals for society.

Keywords: cognitive-affective, curriculum, empathy, social-emotional learning, textual content


2006 ◽  
Vol 75 (1) ◽  
pp. 73-85 ◽  
Author(s):  
Arnaud Gaudinat ◽  
Patrick Ruch ◽  
Michel Joubert ◽  
Philippe Uziel ◽  
Anne Strauss ◽  
...  

2020 ◽  
Vol 34 (04) ◽  
pp. 6737-6745
Author(s):  
Ce Zhang ◽  
Hady W. Lauw

Oftentimes documents are linked to one another in a network structure, e.g., academic papers cite other papers, Web pages link to other pages. In this paper we propose a holistic topic model to learn meaningful and unified low-dimensional representations for networked documents that seek to preserve both textual content and network structure. On the basis of reconstructing not only the input document but also its adjacent neighbors, we develop two neural encoder architectures. Adjacent-Encoder, or AdjEnc, induces competition among documents for topic propagation, and reconstruction among neighbors for semantic capture. Adjacent-Encoder-X, or AdjEnc-X, extends this to also encode the network structure in addition to document content. We evaluate our models on real-world document networks quantitatively and qualitatively, outperforming comparable baselines comprehensively.
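
To make the neighbor-reconstruction idea concrete, the sketch below shows a toy autoencoder whose latent code for a document is trained to reconstruct both the document's own bag-of-words vector and those of its linked neighbors. The layer sizes, activation, and squared-error loss are simplifying assumptions, not the exact AdjEnc or AdjEnc-X architecture.

```python
# Toy sketch of an autoencoder that reconstructs a document and its adjacent neighbors.
import torch
import torch.nn as nn

class AdjacentEncoderSketch(nn.Module):
    def __init__(self, vocab_size, n_topics):
        super().__init__()
        self.encoder = nn.Linear(vocab_size, n_topics)
        self.decoder = nn.Linear(n_topics, vocab_size)

    def forward(self, x):
        z = torch.sigmoid(self.encoder(x))   # low-dimensional "topic" code per document
        return self.decoder(z), z

def neighbor_reconstruction_loss(model, x, neighbors):
    """x: (batch, vocab) documents; neighbors: list of (n_i, vocab) tensors of linked documents."""
    recon, _ = model(x)
    loss = 0.0
    for i, nbrs in enumerate(neighbors):
        targets = torch.cat([x[i:i + 1], nbrs], dim=0)   # reconstruct self plus adjacent docs
        loss = loss + ((recon[i] - targets) ** 2).sum()
    return loss / x.shape[0]
```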


2020 ◽  
Vol 33 ◽  
Author(s):  
Juliana Aparecida Elias Fernandes ◽  
Marília Miranda Forte Gomes ◽  
Bruna da Silva Sousa ◽  
Juliana de Faria Fracon e Romão ◽  
Diana Lúcia Moura Pinho ◽  
...  

Abstract Introduction: The course pedagogical projects (CPPs) of physical therapy programs in Brazil are based on the National Curriculum Guidelines for Physiotherapy (NCGP) and the principles of the National Health System (SUS). The CPPs that guide professional training tend to use a biopsychosocial approach and propose familiarizing undergraduate students with the International Classification of Functionality, Disability and Health (ICF); as such, they should include the use of this instrument. Objective: Assess CPPs by exploratory document analysis and determine whether they propose teaching and using the ICF in student training. Method: Qualitative-quantitative study with document analysis of CPPs for physical therapy courses in Midwest Brazil, from which information related to the ICF was extracted. Results: The biopsychosocial model and NCGP were identified in the 10 CPPs analyzed and the ICF was found in the curriculum outline of 6 of these, indicating the incorporation of this framework in student training. However, the ICF was only identified in the course objectives and literature references of 4 and 2 CPPs, respectively, suggesting possible shortcomings in its application in these documents. Conclusion: The inclusion of the ICF in some CPPs indicates a positive change and favors understanding of functioning, but does not preclude the need for a broader approach to teaching this classification framework in the remaining CPPs in order to provide student training within a biopsychosocial context.


2020 ◽  
Vol 13 (2) ◽  
pp. 94-109
Author(s):  
Brij B. Gupta ◽  
Ankit Kumar Jain

The language used in the textual content of a webpage is a barrier for most existing anti-phishing methods, as most of them can identify fake webpages written in the English language only. Therefore, we present a search engine-based method in this article, which identifies phishing webpages accurately regardless of the textual language used within the webpage. The proposed search engine-based method uses a lightweight, consistent and language-independent search query to detect the legality of the suspicious URL. We have also integrated five heuristics with the search engine-based mechanism to improve the detection accuracy, as some newly created legitimate sites may not yet appear in the search engine. The proposed method can also correctly classify newly created legitimate sites that are not classified by available search engine-based methods. Evaluation results show that our method outperforms the available search-based techniques and achieves a TPR of 98.15% with an FPR of only 0.05%.
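
A minimal sketch of the overall decision logic might look like the following: issue a lightweight, language-independent query built from the suspicious URL's host, accept the page if its domain appears among the top search results, and otherwise fall back to lexical heuristics so that newly created legitimate sites are not misclassified. The placeholder `search_top_domains` callable and the specific heuristics shown are illustrative assumptions, not the five heuristics used in the article.

```python
# Minimal sketch of a search-engine-based phishing check with heuristic fallback (illustrative only).
import re
from urllib.parse import urlparse

def looks_suspicious_by_heuristics(url):
    """Simple lexical heuristics often applied to pages not yet indexed by search engines."""
    host = urlparse(url).hostname or ""
    return any([
        re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,  # raw IP address as host
        "@" in url,                 # user-info trick hiding the real domain
        host.count(".") > 4,        # excessive subdomain nesting
        len(url) > 100,             # unusually long URL
    ])

def classify_url(url, search_top_domains):
    """Return 'legitimate' or 'phishing' for a suspicious URL."""
    host = urlparse(url).hostname or ""
    query = host                     # lightweight, language-independent query
    if host in search_top_domains(query):
        return "legitimate"          # indexed domain appears in the top results
    # Newly created legitimate sites may not be indexed yet, so fall back to
    # heuristics before labelling the page as phishing.
    return "phishing" if looks_suspicious_by_heuristics(url) else "legitimate"
```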

