From Frequencies to Vectors

Current databases can store several terabytes of free-text documents. From the user's viewpoint, the main purpose of a database is efficient information retrieval. For textual data, information retrieval mostly concerns the selection and ranking of documents. We present here Oracle's particular solution: to make full-text querying more efficient, a special engine was developed that prepares full-text queries and provides a set of language-specific and semantic query operators.

Document Retrieval Using Efficient Indexing Techniques

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch079 ◽

2018 ◽

pp. 1745-1764 ◽

Cited By ~ 1

Author(s):

Shweta Gupta ◽

Sunita Yadav ◽

Rajesh Prasad

Keyword(s):

Data Structure ◽

Full Text ◽

Crucial Role ◽

Document Retrieval ◽

Inverted Index ◽

Text Compression ◽

Text Documents ◽

Unique Data ◽

Indexing Techniques ◽

Key Terms

Document retrieval plays a crucial role in retrieving relevant documents. Relevancy depends upon the occurrences of query keywords in a document. Several documents may include similar key terms, and hence they need to be indexed. Most indexing techniques are based on either an inverted index or a full-text index. An inverted index creates lists and supports word-based pattern queries, while a full-text index handles queries comprising any sequence of characters rather than just words. Problems arise when text cannot be separated into words in some western languages. Also, the space used by compressed versions of full-text indexes causes difficulties. Recently, a data structure called the wavelet tree has become popular in text compression and indexing. It indexes the words or characters of text documents and helps retrieve top-ranked documents more efficiently. This paper presents a review of the most recent efficient indexing techniques used in document retrieval.
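
The word-based inverted index mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `build_inverted_index` and `search_all`, the tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict
import re

def build_inverted_index(docs):
    """Map each lowercase word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumed tokenizer: runs of ASCII letters and digits count as words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search_all(index, query):
    """Return ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    postings = [set(index.get(word, ())) for word in words]
    return sorted(set.intersection(*postings))

# Hypothetical sample documents.
docs = [
    "wavelet tree for text compression",
    "inverted index for document retrieval",
    "text retrieval with an inverted index",
]
index = build_inverted_index(docs)
matches = search_all(index, "inverted index")
```

Because the lookup is keyed on whole words, a query for an arbitrary character sequence such as "vert" finds nothing here; that is the gap a full-text index is meant to close.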

Document Retrieval using Efficient Indexing Techniques

International Journal of Business Analytics ◽

10.4018/ijban.2016100104 ◽

2016 ◽

Vol 3 (4) ◽

pp. 64-82 ◽

Cited By ~ 2

Author(s):

Shweta Gupta ◽

Sunita Yadav ◽

Rajesh Prasad

Keyword(s):

Data Structure ◽

Full Text ◽

Crucial Role ◽

Document Retrieval ◽

Inverted Index ◽

Text Compression ◽

Text Documents ◽

Unique Data ◽

Indexing Techniques ◽

Key Terms

Document retrieval plays a crucial role in retrieving relevant documents. Relevancy depends upon the occurrences of query keywords in a document. Several documents may include similar key terms, and hence they need to be indexed. Most indexing techniques are based on either an inverted index or a full-text index. An inverted index creates lists and supports word-based pattern queries, while a full-text index handles queries comprising any sequence of characters rather than just words. Problems arise when text cannot be separated into words in some western languages. Also, the space used by compressed versions of full-text indexes causes difficulties. Recently, a data structure called the wavelet tree has become popular in text compression and indexing. It indexes the words or characters of text documents and helps retrieve top-ranked documents more efficiently. This paper presents a review of the most recent efficient indexing techniques used in document retrieval.

The research of semantic information retrieval for Mongolian data structure

Advanced Materials and Information Technology Processing ◽

10.2495/amitp130781 ◽

2014 ◽

Author(s):

Yila Su ◽

Chuan Fan ◽

Hongbo Chen ◽

Xiulan Xie ◽

Haitao Liu

Keyword(s):

Information Retrieval ◽

Data Structure ◽

Semantic Information ◽

Semantic Information Retrieval

Convolutional Neural Network for Customer’s Opinion on Amazon Products

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5670.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 6634-6643 ◽

Cited By ~ 1

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sentiment Analysis ◽

Latent Dirichlet Allocation ◽

Opinion Mining ◽

Text Documents ◽

Customer Churn ◽

Learning Classifier ◽

Review Spam

Opinion mining and sentiment analysis are valuable for extracting useful subjective information from text documents. Predicting customers' opinions on Amazon products has several benefits, such as reducing customer churn, agent monitoring, handling multiple customers, tracking overall customer satisfaction, quick escalations, and upselling opportunities. However, identifying users' sentiments in large datasets is challenging because the data are unstructured and contain slang, misspellings, and abbreviations. To address this problem, a new system is proposed in this study. The proposed system comprises four major phases: data collection, pre-processing, keyword extraction, and classification. Initially, the input data were collected from the Amazon customer review dataset. After data collection, pre-processing was carried out to enhance the quality of the collected data. The pre-processing phase comprises three steps: lemmatization, review spam detection, and removal of stop-words and URLs. Then, a topic modelling approach, Latent Dirichlet Allocation (LDA), along with modified Possibilistic Fuzzy C-Means (PFCM), was applied to extract the keywords and to help identify the topics concerned. The extracted keywords were classified into three classes (positive, negative, and neutral) by applying a machine learning classifier, a Convolutional Neural Network (CNN). The experimental outcome showed that the proposed system improved sentiment analysis accuracy by up to 6-20% compared with existing systems.
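
One of the pre-processing steps named in this abstract, the removal of stop-words and URLs, can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives no stop-word list or URL pattern, so `STOP_WORDS`, `URL_PATTERN`, and the function name `preprocess` are hypothetical, and lemmatization and review spam detection are left out.

```python
import re

# Illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "of", "to", "at"}

# Assumed URL pattern: anything starting with http://, https:// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def preprocess(review):
    """Lowercase a review, drop URLs, then drop stop-words; returns the tokens."""
    without_urls = URL_PATTERN.sub(" ", review.lower())
    tokens = re.findall(r"[a-z']+", without_urls)
    return [token for token in tokens if token not in STOP_WORDS]

tokens = preprocess("This is the best phone, see https://example.com/item")
```

URLs are stripped before tokenizing so that fragments of a link (such as a domain name) do not survive as spurious keywords.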

Learning emotional word embeddings for sentiment analysis

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201993 ◽

2021 ◽

pp. 1-13

Author(s):

Qingtian Zeng ◽

Xishi Zhao ◽

Xiaohui Hu ◽

Hua Duan ◽

Zhongying Zhao ◽

...

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

State Of The Art ◽

Research Problem ◽

Emotional Word ◽

Classification Model ◽

Data Sets ◽

Word Embeddings ◽

Real World Data ◽

Text Documents

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
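
The idea of building a document vector as a linear weighting of pre-trained word vectors can be sketched as below. The abstract does not say which two weighting methods the authors use, so the uniform mean and the per-token weights shown here are assumptions, as are the function name `document_vector` and the toy two-dimensional vectors.

```python
def document_vector(tokens, word_vectors, weights=None):
    """Weighted average of the vectors of the tokens found in word_vectors.

    weights maps token -> non-negative weight; tokens without an entry get 1.0,
    so weights=None yields the plain mean. word_vectors must be non-empty.
    """
    weights = weights or {}
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary tokens are skipped
        w = weights.get(token, 1.0)
        weight_sum += w
        for i, x in enumerate(vec):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]

# Hypothetical toy embeddings standing in for pre-trained word vectors.
toy_vectors = {"good": [1.0, 0.0], "phone": [0.0, 1.0]}
mean_vec = document_vector(["good", "phone", "unknown"], toy_vectors)
weighted_vec = document_vector(["good", "phone"], toy_vectors, {"good": 3.0})
```

The resulting fixed-length vectors are what a downstream sentiment classifier would consume; the classifier itself and the propagation of polarity back into the word vectors are outside this sketch.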

Overview of information retrieval in a hospital-based health technology assessment center in a Swedish region

International Journal of Technology Assessment in Health Care ◽

10.1017/s0266462321000106 ◽

2021 ◽

Vol 37 (1) ◽

Author(s):

Ida Stadig ◽

Therese Svanberg

Keyword(s):

Information Retrieval ◽

Literature Review ◽

Health Technology Assessment ◽

Technology Assessment ◽

Full Text ◽

Health Technology ◽

University Hospital ◽

Literature Searching ◽

Medical Library ◽

Full Text Screening

Objectives: This article aims to provide a brief review of information retrieval and hospital-based health technology assessment (HB-HTA) and describe library experiences and working methods at a regional HB-HTA center from the center's inception to the present day. Methods: For this brief literature review, searches in PubMed and LISTA were conducted to identify studies reporting on HB-HTA and information retrieval. The description of the library's involvement in the HTA center and its working methods is based on the authors’ experience and internal and/or unpublished documents. Results: Region Västra Götaland is the second largest healthcare region in Sweden and has had a regional HB-HTA center since 2007 (HTA-centrum). Assessments are performed by clinicians supported by HTA methodologists. The medical library at Sahlgrenska University Hospital works closely with HTA-centrum, with one HTA librarian responsible for coordinating the work. Conclusion: In the literature on HB-HTA, we found limited descriptions of the role librarians and information specialists play in different units. The librarians at HTA-centrum play an important role, not only in literature searching but also in abstract and full-text screening.

Increasing the Reliability of Full Text Documents Based on the Use of Mechanisms for Extraction of Statistical and Semantic Links of Elements

2020 International Conference on Information Science and Communications Technologies (ICISCT) ◽

10.1109/icisct50599.2020.9351397 ◽

2020 ◽

Author(s):

Jumanov Isroil ◽

Karshiev Khusan

Keyword(s):

Full Text ◽

Text Documents

Understanding the nature and scope of clinical research commentaries in PubMed

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz209 ◽

2019 ◽

Vol 27 (3) ◽

pp. 449-456

Author(s):

James R Rogers ◽

Hollis Mills ◽

Lisa V Grossman ◽

Andrew Goldstein ◽

Chunhua Weng

Keyword(s):

Clinical Research ◽

Sentiment Analysis ◽

Random Sample ◽

Full Text ◽

Topic Modeling ◽

Critical Appraisal ◽

Research Articles ◽

Disease Specific ◽

Research Studies

Scientific commentaries are expected to play an important role in evidence appraisal, but it is unknown whether this expectation has been fulfilled. This study aims to better understand the role of scientific commentary in evidence appraisal. We queried PubMed for all clinical research articles with accompanying comments and extracted corresponding metadata. Five percent of clinical research studies (N = 130 629) received postpublication comments (N = 171 556), resulting in 178 882 comment–article pairings, with 90% published in the same journal. We obtained 5197 full-text comments for topic modeling and exploratory sentiment analysis. Topics were generally disease specific with only a few topics relevant to the appraisal of studies, which were highly prevalent in letters. Of a random sample of 518 full-text comments, 67% had a supportive tone. Based on our results, published commentary, with the exception of letters, most often highlights or endorses previous publications rather than serving as a prominent mechanism for critical appraisal.

DIGITAL CONTENT OF PERIODICALS IN THE ELECTRONIC CATALOGUE OF THE CENTRAL SCIENCE LIBRARY OF THE NAS OF BELARUS

БИБЛИОТЕКИ В ИНФОРМАЦИОННОМ ОБЩЕСТВЕ: СОХРАНЕНИЕ ТРАДИЦИЙ И РАЗВИТИЕ НОВЫХ ТЕХНОЛОГИЙ [Libraries in the Information Society: Preserving Traditions and Developing New Technologies] ◽

10.47612/978-985-884-010-5-2020-287-295 ◽

2020 ◽

Author(s):

I. P. Komenda

Keyword(s):

Full Text ◽

Digital Content ◽

Text Documents ◽

Bibliographic Records

The publication deals with the initial stages of including in the electronic catalogue the bibliographic records of electronic periodicals from the eLIBRARY.RU platform and of electronic serials to which the Central Science Library of the NAS of Belarus subscribes. The activities on adding full-text documents and tables of contents of periodicals to bibliographic records are considered.
