Information retrieval for unstructured text documents in Serbian into the crime domain

Incorporating Text OLAP in Business Intelligence

2011 ◽ pp. 77-101 ◽ Cited By ~ 1

Author(s): Byung-Kwon Park ◽ Il-Yeol Song

Keyword(s): Information Retrieval ◽ Text Mining ◽ Business Intelligence ◽ Multidimensional Analysis ◽ Web Pages ◽ Data Types ◽ Text Documents ◽ Text Data ◽ Platform Architecture ◽ Unstructured Text

As the amount of data inside and outside an enterprise grows rapidly, it is becoming important to analyze all of it seamlessly for total business intelligence. The data fall into two categories, structured and unstructured, and total business intelligence requires analyzing both together. In particular, because most business data are unstructured text documents, including Web pages on the Internet, we need a Text OLAP solution that performs multidimensional analysis of text documents in the same way as for structured relational data. We first survey representative works that demonstrate how text mining and information retrieval, the major technologies for handling text data, can be applied to multidimensional analysis of text documents. We then survey representative works that demonstrate how unstructured text documents and structured relational data can be associated and consolidated to obtain total business intelligence. Finally, we present a future business intelligence platform architecture along with related research topics. We expect that the proposed total heterogeneous business intelligence architecture, which integrates information retrieval, text mining, and information extraction technologies with relational OLAP technologies, will make a better platform for total business intelligence.
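
The idea of multidimensional analysis over text can be illustrated with a minimal sketch. It is not the architecture surveyed in the chapter: the toy corpus, the `year` and `region` dimensions, and the `roll_up` helper are all assumptions made for illustration, and plain term counts stand in for whatever text measure a real Text OLAP system would use.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document carries structured dimensions
# (year, region) alongside its unstructured text.
DOCS = [
    {"year": 2010, "region": "EU", "text": "sales growth in retail sales"},
    {"year": 2010, "region": "US", "text": "retail demand falls"},
    {"year": 2011, "region": "EU", "text": "sales recover as demand grows"},
]


def roll_up(docs, dims):
    """Aggregate term counts for every cell of the chosen dimensions."""
    cube = defaultdict(Counter)
    for doc in docs:
        # A cell is identified by the document's values on the chosen dimensions.
        cell = tuple(doc[d] for d in dims)
        cube[cell].update(doc["text"].lower().split())
    return dict(cube)


# Coarse view (by year) and finer view (by year and region) of the same text.
by_year = roll_up(DOCS, ["year"])
by_year_region = roll_up(DOCS, ["year", "region"])
print(by_year[(2010,)].most_common(2))
print(by_year_region[(2010, "EU")].most_common(2))
```

Passing a shorter list of dimensions corresponds to a roll-up and a longer list to a drill-down, which mirrors how the same operations work on structured relational data.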

A NEW METHODOLOGY FOR DOMAIN ONTOLOGY CONSTRUCTION FROM THE WEB

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213011000565 ◽ 2011 ◽ Vol 20 (06) ◽ pp. 1157-1170 ◽ Cited By ~ 8

Author(s): BOUCHRA FRIKH ◽ AHMED SAID DJAANFAR ◽ BRAHIM OUHBI

Keyword(s): Information Retrieval ◽ Language Processing ◽ Retrieval System ◽ Domain Ontology ◽ Retrieval Performance ◽ Text Documents ◽ Unstructured Text ◽ Processing Information ◽ The Web ◽ Extract Information

Resources such as ontologies are used in a number of applications, including natural language processing and information retrieval (especially from the Internet). Different methods have been proposed to build such resources. This paper proposes a new method that extracts information from the Web to build a taxonomy of terms and Web resources for a given domain. First, the CHIR method is used to identify candidate terms. Then a similarity measure (SIM) is introduced to select the relevant concepts for building the ontology. Our new algorithm, called CHIRSIM, is easy to implement and can be efficiently integrated into an information retrieval system to help improve retrieval performance. Experimental results show that the proposed approach can effectively and efficiently construct a cancer domain ontology from unstructured text documents.
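
The candidate-term step can be illustrated with a rough sketch. The abstract does not define CHIR or SIM, so the code below substitutes the plain chi-square statistic on a 2x2 term/domain contingency table; it is not the authors' CHIRSIM algorithm. The function names, the example documents, and the rule of keeping only positively associated terms are all assumptions.

```python
def chi_square(a, b, c, d):
    """Plain chi-square statistic for a 2x2 term/domain table.

    a: in-domain documents containing the term
    b: out-of-domain documents containing the term
    c: in-domain documents without the term
    d: out-of-domain documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def candidate_terms(domain_docs, other_docs, top_k=3):
    """Rank terms by chi-square association with the domain corpus."""
    domain_sets = [set(doc.lower().split()) for doc in domain_docs]
    other_sets = [set(doc.lower().split()) for doc in other_docs]
    vocab = set().union(*domain_sets)
    scores = {}
    for term in vocab:
        a = sum(term in s for s in domain_sets)
        b = sum(term in s for s in other_sets)
        c = len(domain_sets) - a
        d = len(other_sets) - b
        # Assumption: keep only terms positively associated with the domain.
        if a * d > b * c:
            scores[term] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical mini-corpora, for illustration only.
domain = ["tumor cells grow", "tumor therapy trial", "cancer tumor study"]
other = ["stock market study", "football match report"]
print(candidate_terms(domain, other, top_k=3))
```

A second pass that compares the surviving terms with a similarity measure would then choose which candidates become ontology concepts; that step is left out here because the abstract gives no details about it.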

Term weighting using contextual information for categorization of unstructured text documents

2015 Annual IEEE India Conference (INDICON) ◽ 10.1109/indicon.2015.7443216 ◽ 2015 ◽ Cited By ~ 1

Author(s): Anagha Kulkarni ◽ Vrinda Tokekar ◽ Parag Kulkarni

Keyword(s): Contextual Information ◽ Term Weighting ◽ Text Documents ◽ Unstructured Text

Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation (Preprint)

10.2196/preprints.17376 ◽ 2019 ◽ Cited By ~ 1

Author(s): Sijia Liu ◽ Yanshan Wang ◽ Andrew Wen ◽ Liwei Wang ◽ Na Hong ◽ ...

Keyword(s): Information Retrieval ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Data Model ◽ Structured Data ◽ Common Data Model ◽ Concept System ◽ Unstructured Text ◽ Electronic Health

BACKGROUND Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in extracting the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS CREATE is a proof-of-concept system that combines structured queries with information retrieval techniques applied to natural language processing results to improve cohort retrieval performance, and it uses the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and the document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, outperforming systems that use only structured data or only unstructured text, which reach mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS The implementation and its evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text on complex textual cohort queries.
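
The precision at 5 metric named in the results can be written down as a short sketch. The function names and the patient identifiers below are hypothetical, and the example numbers are not taken from the study. The sketch follows the usual convention of dividing by k even when fewer than k results are returned.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked items that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=5):
    """Average precision at k over (ranking, relevant set) pairs, one per query."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Hypothetical rankings for two cohort queries (IDs are made up).
query_a = (["p1", "p2", "p3", "p4", "p5", "p6"], {"p1", "p3", "p4", "p9"})
query_b = (["p7", "p8", "p9", "p10", "p11"], {"p7", "p8", "p9", "p10", "p11"})

print(precision_at_k(*query_a))
print(mean_precision_at_k([query_a, query_b]))
```

The same function covers patient-level and document-level evaluation; only the identifiers in the ranking and in the relevant set change.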

Automatic Keyword Extraction From Text Documents

Digital Technology Advancements in Knowledge Management - Advances in Knowledge Acquisition, Transfer, and Management ◽ 10.4018/978-1-7998-6792-0.ch004 ◽ 2021 ◽ pp. 71-91

Author(s): Furkan Goz ◽ Alev Mutlu

Keyword(s): Information Retrieval ◽ State Of The Art ◽ Online News ◽ Evaluation Metrics ◽ Keyword Extraction ◽ Feature Engineering ◽ Extraction Techniques ◽ Text Documents ◽ Scientific Papers ◽ Benchmark Datasets

Keyword indexing is the problem of assigning keywords to text documents. It is an important task because keywords play crucial roles in several information retrieval tasks. The problem is also challenging because the number of text documents is increasing and such documents come in different forms (e.g., scientific papers, online news articles, and microblog posts). This chapter provides an overview of keyword indexing and elaborates on keyword extraction techniques. The authors provide the general motivations behind supervised and unsupervised keyword extraction and enumerate several pioneering and state-of-the-art techniques. Feature engineering, evaluation metrics, and benchmark datasets used to evaluate the performance of keyword extraction systems are also discussed.
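
As a point of reference for the unsupervised family, a TF-IDF ranking is one simple baseline. The sketch below is an illustration only and is not a technique attributed to the chapter. The tokenizer, the function names, and the example documents are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of one document by term frequency times log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    tf = Counter(tokenized[doc_index])
    scores = {t: count * math.log(n_docs / doc_freq[t]) for t, count in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical documents, for illustration only.
docs = [
    "clustering groups documents and clustering needs a distance",
    "news documents and posts",
    "a short post about documents",
]
print(tfidf_keywords(docs, doc_index=0))
```

Terms that occur in every document, such as "documents" here, score zero and fall to the bottom of the ranking. Supervised approaches would instead learn from labeled keywords using engineered features.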

Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA

Text, Speech and Dialogue - Lecture Notes in Computer Science ◽ 10.1007/3-540-46154-x_13 ◽ 2002 ◽ pp. 99-106

Author(s): Jan Žižka ◽ Aleš Bourek

Keyword(s): Text Documents ◽ Unstructured Text ◽ Large Numbers

Enrichment of text documents using information retrieval techniques in a distributed environment

Expert Systems with Applications ◽ 10.1016/j.eswa.2010.05.048 ◽ 2010 ◽ Vol 37 (12) ◽ pp. 8348-8358 ◽ Cited By ~ 2

Author(s): Francisco Bueno ◽ Ana García-Serrano ◽ José L. Martínez-Fernández

Keyword(s): Information Retrieval ◽ Distributed Environment ◽ Text Documents

A Framework To Automatically Categorize The Unstructured Text Documents

Indian Journal of Science and Technology ◽ 10.17485/ijst/2017/v10i8/10947 ◽ 2017 ◽ Vol 10 (1) ◽ pp. 1-8

Author(s): Anshika Singh

Keyword(s): Text Documents ◽ Unstructured Text

DOCUMENT CLUSTERING BY DYNAMIC HIERARCHICAL ALGORITHM BASED ON FUZZY SET TYPE-II FROM FREQUENT ITEMSET

Jurnal Ilmu Komputer dan Informasi ◽ 10.21609/jiki.v9i2.383 ◽ 2016 ◽ Vol 9 (2) ◽ pp. 88

Author(s): Saiful Bahri Musa ◽ Andi Baso Kaswar ◽ Supria Supria ◽ Susiana Sari

Keyword(s): Information Retrieval ◽ Fuzzy Set ◽ Document Clustering ◽ Static Method ◽ Frequent Itemset ◽ Type II ◽ Retrieval Process ◽ Text Documents ◽ F Measure

One way to facilitate information retrieval is to cluster the existing document collection. Existing text documents are often unstructured: their forms vary and their groupings are ambiguous, which makes information retrieval difficult. Moreover, new documents emerge every second and need to be clustered. Static document clustering methods generally cluster documents only after the whole collection has been gathered, so re-clustering the whole collection whenever a new document arrives is inefficient. In this paper, we propose a new document clustering method that uses a dynamic hierarchical algorithm based on type-II fuzzy sets from frequent itemsets. The method has three main phases: key-term determination, extraction of candidate clusters, and construction of the cluster hierarchy. In our experiments, it achieved F-measure values of 0.40 for Newsgroup, 0.62 for Classic, and 0.38 for Reuters. The computation time when a new document is added is lower than that of the previous static method. The results show that the method is suitable for producing hierarchical clusterings effectively and efficiently in a dynamic environment. The method also gives accurate clustering results.
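
The abstract reports F-measure values without saying which variant was used. The sketch below assumes one common definition for clustering evaluation: for each true class, take the best F-score over all clusters, then average these weighted by class size. The function name and the example labels are hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Class-weighted F-measure: best F-score per true class, weighted by class size."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_cls / n * best
    return total


# Hypothetical gold labels and two candidate clusterings.
gold = ["a", "a", "b", "b"]
print(clustering_f_measure(gold, [0, 0, 1, 1]))  # clusters match the classes
print(clustering_f_measure(gold, [0, 0, 0, 0]))  # everything in one cluster
```

Under this definition a clustering that reproduces the classes exactly scores 1, and merging everything into one cluster is penalized through lower precision.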

Clustering Unstructured Text Documents Using Naive Bayesian Concept and Shape Pattern Matching

International Journal of Advancements in Computing Technology ◽ 10.4156/ijact.vol1.issue1.8 ◽ 2009 ◽ Vol 1 (1) ◽ pp. 52-63 ◽ Cited By ~ 2

Author(s): Durga Toshniwal ◽ Rishiraj Saha Roy

Keyword(s): Pattern Matching ◽ Text Documents ◽ Naive Bayesian ◽ Unstructured Text ◽ Naïve Bayesian ◽ Shape Pattern
