Exploiting Semantic Annotations and Q-Learning for Constructing an Efficient Hierarchy/Graph Texts Organization

Tremendous growth in the number of textual documents has created a daily need for effective ways to explore, analyze, and discover knowledge from them. Conventional text mining and management systems mainly use the presence or absence of keywords to discover and analyze useful information in textual documents. However, simple word counts and frequency distributions of term appearances do not capture the meaning behind the words, which limits the ability to mine the texts. This paper proposes an efficient methodology for constructing a hierarchy/graph-based text organization and representation scheme based on semantic annotation and Q-learning. The methodology uses semantic notions to represent the text in documents, to infer unknown dependencies and relationships among concepts in a text, to measure the relatedness between text documents, and to apply mining processes using the representation and the relatedness measure. The representation scheme reflects the existing relationships among concepts and facilitates accurate relatedness measurements, which results in better mining performance. An extensive experimental evaluation is conducted on real datasets from various domains, indicating the importance of the proposed approach.
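
The abstract names Q-learning but gives neither the update rule nor the state and action design. As a minimal sketch of the standard tabular Q-learning update only, assuming hypothetical states (the current concept) and actions (a candidate link to another concept), one step could look like the following. The names `q_update`, `alpha`, `gamma`, and the toy reward are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action available from the next state (0.0 if none).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Q-table with a default value of 0.0 for unseen (state, action) pairs.
q = defaultdict(float)

# Hypothetical toy step: linking concept "c1" to "c2" earns a reward of 1.0.
first = q_update(q, "c1", "link_c2", 1.0, "c2", ["link_c3"])
```

Repeating the same step moves the stored value closer to the target, which is the behaviour an agent would rely on when learning which concept links are worth keeping.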

IoTSAS: An Integrated System for Real-Time Semantic Annotation and Interpretation of IoT Sensor Stream Data

Computers ◽

10.3390/computers10100127 ◽

2021 ◽

Vol 10 (10) ◽

pp. 127

Author(s):

Besmir Sejdiu ◽

Florije Ismaili ◽

Lule Ahmedi

Keyword(s):

Air Quality ◽

Real Time ◽

Contextual Information ◽

Semantic Annotation ◽

Weather Conditions ◽

Water Quality Monitoring ◽

Integrated System ◽

Quality Monitoring ◽

Stream Data ◽

Semantic Annotations

Sensors and other Internet of Things (IoT) technologies are increasingly finding application in various fields, such as air quality monitoring, weather alerts monitoring, water quality monitoring, healthcare monitoring, etc. IoT sensors continuously generate large volumes of observed stream data; therefore, processing requires a special approach. Extracting the contextual information essential for situational knowledge from sensor stream data is very difficult, especially when processing and interpretation of these data are required in real time. This paper focuses on processing and interpreting sensor stream data in real time by integrating different semantic annotations. In this context, a system named IoT Semantic Annotations System (IoTSAS) is developed. Furthermore, the performance of the IoTSAS System is presented by testing air quality and weather alerts monitoring IoT domains by extending the Open Geospatial Consortium (OGC) standards and the Sensor Observations Service (SOS) standards, respectively. The developed system provides information in real time to citizens about the health implications from air pollution and weather conditions, e.g., blizzard, flurry, etc.

Use of text mining techniques for unsupervised organization of digital procedural acts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.83581 ◽

2018 ◽

Vol 25 (4) ◽

pp. 74

Author(s):

Alfredo Silveira Araújo Neto ◽

Marcos Negreiros

Keyword(s):

Data Mining ◽

Text Mining ◽

Text Documents ◽

Digital Format ◽

Large Databases ◽

Context Data ◽

Self Discovery ◽

The Many ◽

Many Sources ◽

And Storage

Rapid advances in technologies for capturing and storing data in digital format have allowed organizations to accumulate an extremely high volume of information, most of it in unstructured format, represented by texts. However, retrieving useful information from these large repositories has proved very challenging. In this context, data mining is presented as a self-discovery process that acts on large databases and enables knowledge extraction from raw text documents. Among the many sources of textual documents are electronic diaries of justice, which are intended to make all the acts of the Judiciary officially public. Although publication in digital form has removed imperfections associated with dissemination in printed format, applying data mining methods could make the analysis of its contents faster. In this sense, this article establishes a tool capable of automatically grouping and categorizing digital procedural acts, based on an evaluation of text mining techniques applied to the task of determining groups. In addition, the strategy for defining the group descriptors, which is usually based on the most frequent words in the documents, was evaluated and remodeled to use the most regularly identified concepts in the texts instead of words.
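
The difference between word-based and concept-based group descriptors can be illustrated with a small sketch. It assumes a hypothetical word-to-concept map (`CONCEPT_OF`) and toy documents; the abstract does not say how concepts are identified, so this is only one plausible reading, not the article's tool.

```python
from collections import Counter

# Hypothetical word-to-concept map; a real system would use a thesaurus or ontology.
CONCEPT_OF = {"appeal": "APPEAL", "appeals": "APPEAL",
              "ruling": "DECISION", "judgment": "DECISION"}

def descriptors(docs, top_n, use_concepts=False):
    """Return the top_n most frequent words (or concepts) in a group of documents."""
    counts = Counter()
    for doc in docs:
        for token in doc.lower().split():
            # Words without a known concept fall back to the word itself.
            key = CONCEPT_OF.get(token, token) if use_concepts else token
            counts[key] += 1
    return [term for term, _ in counts.most_common(top_n)]

# Toy group of documents, assumed to have been clustered together already.
group = ["appeal ruling", "appeals judgment", "appeal denied"]
word_labels = descriptors(group, 1)
concept_labels = descriptors(group, 2, use_concepts=True)
```

In this toy group the word counts split "ruling" and "judgment" into separate low-frequency entries, while the concept counts merge them, so a shared concept can surface as a descriptor even when no single word is frequent.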

Incorporating Text OLAP in Business Intelligence

Business Intelligence Applications and the Web - Advances in Business Information Systems and Analytics ◽

10.4018/978-1-61350-038-5.ch004 ◽

2011 ◽

pp. 77-101 ◽

Cited By ~ 1

Author(s):

Byung-Kwon Park ◽

Il-Yeol Song

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Business Intelligence ◽

Multidimensional Analysis ◽

Web Pages ◽

Data Types ◽

Text Documents ◽

Text Data ◽

Platform Architecture ◽

Unstructured Text

As the amount of data grows very fast inside and outside of an enterprise, analyzing all of it is becoming important for total business intelligence. The data can be classified into two categories: structured and unstructured. To obtain total business intelligence, it is important to analyze both seamlessly. In particular, because most business data are unstructured text documents, including Web pages on the Internet, we need a Text OLAP solution that performs multidimensional analysis of text documents in the same way as for structured relational data. We first survey representative works that demonstrate how text mining and information retrieval, the major technologies for handling text data, can be applied to multidimensional analysis of text documents. We then survey representative works that demonstrate how unstructured text documents and structured relational data can be associated and consolidated to obtain total business intelligence. Finally, we present a future business intelligence platform architecture as well as related research topics. We expect that the proposed total heterogeneous business intelligence architecture, which integrates information retrieval, text mining, and information extraction technologies together with relational OLAP technologies, would make a better platform for total business intelligence.

Document Clustering

Pattern and Data Analysis in Healthcare Settings - Advances in Medical Technologies and Clinical Practice ◽

10.4018/978-1-5225-0536-5.ch013 ◽

2017 ◽

pp. 264-281

Author(s):

Harsha Patil ◽

R. S. Thakur

Keyword(s):

Text Mining ◽

Clustering Algorithms ◽

Document Clustering ◽

Web Pages ◽

Digital Form ◽

Search Query ◽

Text Documents ◽

Keen Interest ◽

Use Of Internet

The use of the Internet is flourishing at full speed and in all dimensions. The enormous availability of text documents in digital form (email, web pages, blog posts, news articles, ebooks, and other text files) on the Internet challenges technology to retrieve the appropriate documents in response to a search query. As a result, there has been an eruption of interest in mining these vast resources and classifying them properly. This has invigorated researchers and developers to work on numerous approaches to document clustering, and researchers have taken a keen interest in this text mining problem. The aim of this chapter is to summarize the different document clustering algorithms used by researchers.

Semantic Annotation of Objects

Handbook of Research on Social Dimensions of Semantic Technologies and Web Services ◽

10.4018/978-1-60566-650-1.ch011 ◽

2011 ◽

pp. 223-238

Author(s):

Petr Kremen ◽

Miroslav Blaško ◽

Zdenek Kouba

Keyword(s):

Knowledge Management ◽

Semantic Annotation ◽

Description Logics ◽

Practical Experience ◽

Video Clips ◽

Semantic Annotations ◽

Multimedia Resources ◽

Audio Video ◽

Representation Techniques ◽

Reasoning Algorithm

Compared to traditional ways of annotating multimedia resources (textual documents, photographs, audio/video clips, etc.) with keywords in the form of text fragments, semantic annotations are based on tagging such resources with the meaning of the objects (such as cultural/historical artifacts) that the resource deals with. The search for multimedia resources stored in a repository enriched with semantic annotations makes use of an appropriate reasoning algorithm. The knowledge management and Semantic Web communities have developed a number of relevant formalisms and methods. This chapter is motivated by practical experience with authoring semantic annotations of cultural heritage related resources/objects. Keeping this experience in mind, the chapter compares various knowledge representation techniques, such as frame-based formalisms, RDF(S), and description logics based formalisms, from the viewpoint of their appropriateness for resource annotations and their ability to automatically support the semantic annotation process through advanced inference services, such as error explanations and expressive construct modeling, namely n-ary relations.

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Machine Learning and Knowledge Extraction ◽

10.3390/make1020034 ◽

2019 ◽

Vol 1 (2) ◽

pp. 575-589 ◽

Cited By ~ 1

Author(s):

Blaž Škrlj ◽

Jan Kralj ◽

Nada Lavrač ◽

Senja Pollak

Keyword(s):

Text Mining ◽

Language Processing ◽

Text Classification ◽

Deep Neural Networks ◽

Semantic Knowledge ◽

Text Documents ◽

Neural Architecture ◽

Classification Tasks ◽

And Gender ◽

Semantic Resources

Deep neural networks are becoming ubiquitous in text mining and natural language processing, but semantic resources, such as taxonomies and ontologies, are yet to be fully exploited in a deep learning setting. This paper presents an efficient semantic text mining approach, which converts semantic information related to a given set of documents into a set of novel features that are used for learning. The proposed Semantics-aware Recurrent deep Neural Architecture (SRNA) enables the system to learn simultaneously from the semantic vectors and from the raw text documents. We test the effectiveness of the approach on three text classification tasks: news topic categorization, sentiment analysis and gender profiling. The experiments show that the proposed approach outperforms the approach without semantic knowledge, with highest accuracy gain (up to 10%) achieved on short document fragments.

Extracting kinetic information from literature with KineticRE

Journal of Integrative Bioinformatics ◽

10.1515/jib-2015-282 ◽

2015 ◽

Vol 12 (4) ◽

pp. 56-68

Author(s):

Ana Alão Freitas ◽

Hugo Costa ◽

Isabel Rocha

Keyword(s):

Text Mining ◽

Metabolic Networks ◽

Scientific Literature ◽

Kluyveromyces Lactis ◽

Relevant Information ◽

Text Documents ◽

Kinetic Information ◽

Mining Tool ◽

Text Mining Tool

Summary. To better understand the dynamic behavior of metabolic networks in a wide variety of conditions, the field of Systems Biology has increased its interest in the use of kinetic models. The databases currently available do not contain enough data on this topic. Given that a significant part of the relevant information for developing such models is still scattered across the literature, it becomes essential to develop specific and powerful text mining tools to collect these data. In this context, the main objective of this work is to develop a text mining tool that extracts kinetic parameters, their respective values, and their relations with enzymes and metabolites from the scientific literature. The proposed approach integrates the development of a novel plug-in for the text mining framework @Note2. Finally, the developed pipeline was validated with a case study on Kluyveromyces lactis, spanning the analysis and results of 20 full-text documents.

Dual Scaling in Data Mining from Text Databases

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2006.p0451 ◽

2006 ◽

Vol 10 (4) ◽

pp. 451-457 ◽

Cited By ~ 3

Author(s):

Junzo Watada ◽

Keisuke Aoki ◽

Masahiro Kawano ◽

Muhammad Suzuri Hitam ◽

...

Keyword(s):

Multivariate Analysis ◽

Text Mining ◽

Kansei Engineering ◽

Semantic Meaning ◽

Dual Scaling ◽

Text Documents ◽

Text Data ◽

Text Document ◽

Text Information ◽

Quantification Model

The availability of multimedia text document information has spread text mining among researchers. Text documents integrate numerical and linguistic data, making text mining interesting and challenging. We propose text mining based on a fuzzy quantification model and a fuzzy thesaurus. In text mining, we focus on: 1) breaking sentences in Japanese text down into words; 2) a fuzzy thesaurus for finding words that match keywords in text; 3) fuzzy multivariate analysis to analyze semantic meaning in predefined case studies. We use a fuzzy thesaurus to translate words written in Chinese and Japanese characters into keywords. This speeds up processing without requiring a dictionary to separate words. Fuzzy multivariate analysis is used to analyze the processed data and to extract latent, mutually related structures in text data, i.e., to extract otherwise obscured knowledge. We apply dual scaling to mining library and Web page text information, and propose integrating the result into Kansei engineering for possible application in sales, marketing, and production.

SEMANTIC ANNOTATIONS ON HERITAGE MODELS: 2D/3D APPROACHES AND FUTURE RESEARCH CHALLENGES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b2-2020-829-2020 ◽

2020 ◽

Vol XLIII-B2-2020 ◽

pp. 829-836

Author(s):

V. Croce ◽

G. Caroti ◽

L. De Luca ◽

A. Piemonte ◽

P. Véron

Keyword(s):

Cultural Heritage ◽

Contextual Information ◽

Semantic Annotation ◽

External Information ◽

Future Research ◽

Digital Information ◽

Semantic Annotations ◽

Related Information ◽

Research Challenges ◽

Application Fields

Abstract. Research in the field of Cultural Heritage is increasingly moving towards the creation of digital information systems, in which the geometric representation of an artifact is linked to some external information through meaningful tags. The process of attributing additional and structured information to various elements in a given digital model is customarily identified with the term semantic annotation; the added contextual information is associated, for instance, with analysis and conservation terms. Starting from the existing literature, the aim of this work is to discuss how semantic annotations are used in digital architectural heritage models to link the geometrical representation of an artifact with knowledge-related information. The most consolidated methods, such as traditional mapping on 2D media, are compared with more recent approaches that make the most of 3D representation. Reference is made, in particular, to Heritage-BIM techniques and to collaborative reality-based platforms, such as Aïoli (http://aioli.cloud). The potential and limits of the different solutions proposed in the literature are critically discussed, also addressing future research challenges in Cultural Heritage application fields.

Research of Clustering Algorithms using Enhanced Feature Selection

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b5115.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4612-4615

Keyword(s):

Feature Selection ◽

Text Mining ◽

Clustering Algorithms ◽

Present Situation ◽

Features Selection ◽

Entity Extraction ◽

Text Documents ◽

Clustering Techniques ◽

Selection For ◽

Video And Audio

At present, a huge quantity of data is recorded in a variety of forms such as text, image, video, and audio, and it is expected to grow in the future. The major text-related tasks, namely entity extraction, information extraction, entity relation modeling, and document summarization, are performed using text mining. This paper focuses on document clustering, a subtask of text mining, and measures the performance of different clustering techniques. We use an enhanced feature selection for clustering text documents to show that it produces better results than traditional feature selection.
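
The abstract does not describe what the enhanced feature selection consists of. As a rough sketch of the general idea only, the following assumes a simple document-frequency filter (`min_df`, a hypothetical parameter) followed by cosine similarity between term-count vectors, which is the kind of similarity a clustering algorithm would then consume. The toy documents and function names are illustrative.

```python
import math
from collections import Counter

def select_features(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (sorted for stable order)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # count each term once per document
    return sorted(term for term, n in df.items() if n >= min_df)

def vectorize(doc, features):
    """Term-count vector of a document restricted to the selected features."""
    tf = Counter(doc.split())
    return [tf[term] for term in features]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["text mining clusters text", "mining text documents",
        "audio video data", "video data streams"]
features = select_features(docs)
vectors = [vectorize(d, features) for d in docs]
same_topic = cosine(vectors[0], vectors[1])
cross_topic = cosine(vectors[0], vectors[2])
```

With the rare terms filtered out, the two text-mining documents remain highly similar while documents from different topics share no selected features, which is the separation a clustering step depends on.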