RISING OF THE TEXT DOCUMENTS SEARCH PRECISION BY USING THE ADAPTIVE ONTOLOGY

2014 ◽  
pp. 51-58
Author(s):  
Romana Darevych

Conceptual graphs are an effective tool for representing both the semantic content of text documents and domain ontologies. This article proposes a new method for evaluating the content similarity of text documents. The method represents the compared texts as weighted conceptual graphs, supplemented with related context from the domain ontology, and estimates the distance between the semantic weight centers of these graphs. It is shown that the method satisfies the axioms of a metric. Procedures for automatically tuning the ontology to a specified domain and to the information needs of the user are also developed. Experimental results show that taking into account the semantics of the concepts used, the assertions, and the significance coefficients from the adaptive ontology during text processing raises search precision by 20% on average.
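A minimal sketch of the distance computation the abstract describes. It assumes each concept in a graph carries a weight and coordinates in a shared concept space; the concepts, weights, and coordinates below are illustrative, not taken from the paper. The distance between two documents is then the Euclidean distance between the weighted centers of their graphs, which is symmetric and zero for identical graphs, consistent with the metric axioms the paper mentions.

```python
import math

def semantic_center(graph):
    """Weighted mean of concept coordinates; graph maps concept -> (weight, coords)."""
    total = sum(w for w, _ in graph.values())
    dim = len(next(iter(graph.values()))[1])
    center = [0.0] * dim
    for w, coords in graph.values():
        for i, c in enumerate(coords):
            center[i] += w * c / total
    return center

def center_distance(g1, g2):
    """Euclidean distance between the semantic weight centers of two graphs."""
    return math.dist(semantic_center(g1), semantic_center(g2))

# Hypothetical two-concept documents in a 2-D concept space.
doc_a = {"ontology": (2.0, (1.0, 0.0)), "search": (1.0, (0.0, 1.0))}
doc_b = {"ontology": (1.0, (1.0, 0.0)), "index":  (1.0, (0.0, 1.0))}
```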

Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 275
Author(s):  
Igor A. Bessmertny ◽  
Xiaoxi Huang ◽  
Aleksei V. Platonov ◽  
Chuqiao Yu ◽  
Julia A. Koroleva

Search engines are able to find documents containing patterns from a query. This approach works for alphabetic languages such as English. Chinese, however, is highly dependent on context. A significant problem in Chinese text processing is the absence of blanks between words, so the text must be segmented into words before any other action. Algorithms for Chinese text segmentation should consider context; that is, the word segmentation process depends on the surrounding ideograms. As the existing segmentation algorithms are imperfect, we have considered an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to rank Chinese text documents by their relevancy to the query. In particular, the approach uses Bell's test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
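The HAL step the abstract mentions can be sketched as a directed co-occurrence matrix in which nearer tokens receive higher weights. This is a simplified illustration of HAL only (the weighting scheme shown, window minus distance plus one, is one common variant); the Bell's-test ranking itself is beyond this sketch, and the example tokens are invented.

```python
from collections import defaultdict

def hal_matrix(tokens, window=3):
    """HAL-style matrix: weight of (w, following word) decreases with distance."""
    m = defaultdict(float)
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                m[(w, tokens[i + d])] += window - d + 1
    return m

# Pre-segmented toy token sequence (segmentation itself is the hard step).
toks = ["搜索", "引擎", "查询", "搜索", "文档"]
M = hal_matrix(toks, window=2)
```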


2020 ◽  
Vol 38 (02) ◽  
Author(s):  
TẠ DUY CÔNG CHIẾN

Question answering systems have been applied to many different fields in recent years, such as education, business, and surveys. The purpose of these systems is to automatically answer users' questions or queries about some problem. This paper introduces a question answering system built on a domain-specific ontology. This ontology, which contains the data and vocabulary related to the computing domain, is built from text documents of the ACM Digital Library. Consequently, the system only answers questions pertaining to information technology domains such as databases, networks, machine learning, etc. We use natural language processing methodologies together with the domain ontology to build this system. To increase performance, we store the computing ontology in a graph database and use a NoSQL database for querying it.
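A toy illustration of the lookup pattern such a system relies on, with a plain dictionary standing in for the graph database; the node names, relations, and `answer` helper are all hypothetical, not the paper's schema or API.

```python
# In-memory stand-in for the graph database holding the computing ontology.
# Concepts map to relation -> list of related concepts.
ontology = {
    "machine_learning": {"is_a": ["computing"], "includes": ["supervised_learning"]},
    "database": {"is_a": ["computing"], "includes": ["relational_database"]},
}

def answer(concept, relation):
    """Mimic a graph-database traversal: follow one relation from a concept."""
    return ontology.get(concept, {}).get(relation, [])
```

In a real deployment the dictionary lookup would be replaced by a query against the graph store, but the traversal shape is the same.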


2021 ◽  
Vol 12 (3) ◽  
pp. 1483-1491
Author(s):  
Syopiansyah Jaya Putra et al.

Text categorization plays an important role in clustering the rapidly growing, yet unstructured, Indonesian text in digital format. It is deemed even more important now that access to digital-format text has become more necessary and widespread. There are many clustering algorithms used for text categorization. Unfortunately, they cannot easily cluster Indonesian texts because of the imperfect stemming and stopword removal for the language. This paper presents an intelligent system that categorizes Indonesian text documents into meaningful cluster labels. The Label Induction Grouping Algorithm (LINGO) and Bisecting K-means are applied in five phases: pre-processing, frequent phrase extraction, cluster label induction, content discovery, and final cluster formation. Experimental results showed that the system could categorize Indonesian text, reaching 93%. Furthermore, clustering quality evaluation indicates that categorization with LINGO has high precision and recall, with values of 0.85 and 1 respectively, compared to Bisecting K-means with 0.78 and 0.99. The results therefore show that LINGO is suitable for categorizing Indonesian text. The main contribution of this study is optimizing the clustering results by applying and maximizing text processing with an Indonesian stemmer and stopword list.
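As a sketch of the baseline algorithm the paper compares against, here is a minimal bisecting k-means: repeatedly split the largest cluster with a plain 2-means (Lloyd) step. The data points and the largest-cluster split rule are illustrative assumptions, not the paper's configuration.

```python
import random

def two_means(points, iters=20, seed=0):
    """Plain 2-means (Lloyd's algorithm) on a list of coordinate tuples."""
    rng = random.Random(seed)
    c = rng.sample(points, 2)          # two distinct points as initial centroids
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d0 = sum((a - b) ** 2 for a, b in zip(p, c[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, c[1]))
            groups[0 if d0 <= d1 else 1].append(p)
        c = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c[i]
             for i, g in enumerate(groups)]
    return groups

def bisecting_kmeans(points, k):
    """Split the largest cluster in two until k clusters remain."""
    clusters = [points]
    while len(clusters) < k:
        big = max(clusters, key=len)
        clusters.remove(big)
        clusters.extend(two_means(big))
    return clusters

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
out = bisecting_kmeans(data, 3)
```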


2011 ◽  
Vol 7 (3) ◽  
pp. 11-26 ◽  
Author(s):  
Ulrich Reimer ◽  
Edith Maier ◽  
Stephan Streit ◽  
Thomas Diggelmann ◽  
Manfred Hoffleisch

The paper introduces a web-based eHealth platform currently being developed that will assist patients with certain chronic diseases. The ultimate aim is behavioral change. This is supported by online assessment and feedback which visualizes actual behavior in relation to target behavior. Disease-specific information is provided through an information portal that utilizes lightweight ontologies (associative networks) in combination with text mining. The paper argues that classical word-based information retrieval is often not sufficient for providing patients with relevant information, but that their information needs are better addressed by concept-based retrieval. The focus of the paper is on the semantic retrieval component and the learning of a lightweight ontology from text documents, which is achieved by using a biologically inspired neural network. The paper concludes with preliminary results of the evaluation of the proposed approach in comparison with traditional approaches.
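Concept-based retrieval over an associative network is often realized by spreading activation from the query concepts to their neighbours; the following is a minimal sketch under that assumption (the network, the decay factor, and the health-related concept names are invented for illustration, and this is not the paper's biologically inspired neural network).

```python
def spread_activation(network, seeds, decay=0.5, steps=2):
    """Propagate activation from seed concepts; network maps concept -> neighbours."""
    act = {s: 1.0 for s in seeds}
    frontier = dict(act)
    for _ in range(steps):
        nxt = {}
        for node, a in frontier.items():
            for nb in network.get(node, []):
                nxt[nb] = max(nxt.get(nb, 0.0), a * decay)  # decay per hop
        for n, a in nxt.items():
            if a > act.get(n, 0.0):
                act[n] = a
        frontier = nxt
    return act

net = {"diabetes": ["insulin", "diet"], "diet": ["exercise"]}
act = spread_activation(net, ["diabetes"])
```

Documents indexed under "exercise" would then match a "diabetes" query with a lower but nonzero score, which is the advantage over word-based retrieval the paper argues for.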


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Rapeeporn Chamchong ◽  
Chun Che Fung

Challenges for text processing in ancient document images are mainly due to the high degree of variation in foreground and background. Image binarization is an image segmentation technique used to separate the image into text and background components. Although several techniques for binarizing text documents have been proposed, their performance varies and depends on the image characteristics. Therefore, selecting among binarization techniques can be key to achieving improved results. This paper proposes a framework for selecting binarization techniques for palm leaf manuscripts using Support Vector Machines (SVMs). The overall process is divided into three steps: (i) feature extraction: feature patterns are extracted from grayscale images based on global intensity, local contrast, and intensity; (ii) treatment of imbalanced data: the imbalanced dataset is balanced using the Synthetic Minority Oversampling Technique to improve prediction performance; and (iii) selection: SVM is applied to select the appropriate binarization technique. The proposed framework has been evaluated on palm leaf manuscript images and on a benchmarking dataset from the DIBCO series, comparing prediction performance between the imbalanced and balanced datasets. Experimental results showed that the proposed framework can be used as an integral part of an automatic selection process.
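For context on what the candidate techniques look like, here is one classic global binarization method, Otsu's thresholding, which picks the threshold maximizing between-class variance. This is a generic illustration of a binarization candidate, not one of the specific techniques the paper's SVM selects among; the pixel data is synthetic.

```python
def otsu_threshold(pixels):
    """Return the gray level maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_b, sum_b = 0, -1.0, 0, 0.0
    for t in range(256):
        w_b += hist[t]                       # background weight up to t
        if w_b == 0:
            continue
        w_f = total - w_b                    # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                    # background mean
        m_f = (total_sum - sum_b) / w_f      # foreground mean
        var = w_b * w_f * (m_b - m_f) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

img = [10] * 50 + [200] * 50                 # dark text, bright background
t = otsu_threshold(img)
binary = [0 if p <= t else 255 for p in img]
```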


2013 ◽  
Vol 336-338 ◽  
pp. 2217-2220
Author(s):  
Cai Yun Xie ◽  
Xiao Rong Hu

This paper proposes a classification algorithm for news pages based on a domain ontology. To address the shortcoming of current classification algorithms, which consider only content similarity, it presents a semantic classification method that considers both content similarity and structural correlation. First, the algorithm parses the ontology to obtain an ontology category vector, extracts keywords from the news page texts, and reduces the semantic dimensionality; the vocabulary shared between the page texts and the ontology category vector then constitutes the text expectation vector, and the content similarity between the ontology category vector and the text expectation vector is calculated using cosine similarity. Second, the common vocabulary is mapped onto the ontology hierarchy chart, and the structural relevancy is obtained by calculating a weighted path over this directed acyclic graph. Finally, the algorithm combines both measures to calculate the correlation degree between a news page and the ontology, and determines the category of the news page by comparing the result with an initial threshold value.
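The content-similarity step reduces to the standard cosine similarity between two term-weight vectors; a minimal sketch follows, with invented three-dimensional vectors standing in for the ontology category vector and the text expectation vector.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

category = [1.0, 2.0, 0.0]   # hypothetical ontology category vector
text_vec = [2.0, 4.0, 0.0]   # hypothetical text expectation vector
sim = cosine_similarity(category, text_vec)
```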


2020 ◽  
Vol 24 ◽  
pp. 3-14
Author(s):  
Adrian Fonseca Bruzón ◽  
Aurelio López-López ◽  
José E. Medina Pagola

Humans tend to organize information in documents in a logical and intentional way. This organization, which we call textual structure, is commonly expressed in sections, chapters, paragraphs, or sentences. It facilitates the understanding of the content we want to transmit to readers. However, this structure, in which we usually encode the semantic content of information, is rarely exploited by filtering methods when constructing a user profile. In this work, we propose using term relations at different context levels to enhance document filtering. We propose methods for obtaining this representation that account for the imbalance between the documents that satisfy users' information needs, as well as for the cold-start problem (having scarce information) during the initial construction of the user profile. The experiments carried out allowed us to assess the impact of the proposed representation on the filtering task in terms of the T11SU measure.
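One simple way to realize "term relations at different context levels" is to count term co-occurrence separately within sentences and within paragraphs; the helper and toy units below are illustrative assumptions, not the paper's method.

```python
from itertools import combinations
from collections import Counter

def pair_counts(units):
    """Count co-occurring term pairs within each text unit (sentence, paragraph...)."""
    c = Counter()
    for terms in units:
        for a, b in combinations(sorted(set(terms)), 2):
            c[(a, b)] += 1
    return c

# Same document viewed at two context levels.
sent_units = [["user", "profile"], ["profile", "filter"]]
para_units = [["user", "profile", "filter"]]
sc = pair_counts(sent_units)   # sentence-level relations
pc = pair_counts(para_units)   # paragraph-level relations
```

Note that ("filter", "user") is related at the paragraph level but not at the sentence level, which is exactly the extra signal the multi-level representation captures.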

