document databases Latest Research Papers

Graphical Notation for Document Database Modeling

Goals and objectives. Graphical models have proven to be a reliable, clear and convenient tool for creating sketch models of databases. Most existing notations are designed for the relational data model, which has been dominant for the last thirty years. However, the development of information technologies has made non-relational data models, primarily the document model, increasingly popular. One obstacle to applying the document model in practice is the lack of suitable tools for graphically modeling a database at the logical design stage in a way that accounts for the features of the document model. Developing such tools is an important and timely task: applying them in practical research makes it possible to identify, classify and analyze typical modeling errors, which helps designers reduce the risk of making those errors in the future. The purpose of this article is to develop a graphical notation that is convenient for the designer, on the one hand, and takes into account the specifics of how NoSQL document stores are built and operate, on the other.

Materials and methods. The materials for the study were numerous publications devoted to the development of graphical notations and their application to database design for various information systems. The selected materials were analyzed, and the main graphical notations used to describe the relational data model were identified. From these, three notations whose sets of graphic stereotypes differed most from one another were selected; analyzing them allowed us to identify the main patterns for depicting the components of the relational model. The resulting patterns were then applied to the main elements of a document database, which were obtained by analyzing the documentation of the popular MongoDB DBMS.

Results. The result of the research is a new tool for modeling document databases at the logical level, consisting of a set of graphic stereotypes and rules for their application. On the one hand, the notation will feel familiar to practitioners who have previously worked with relational data models, since its development drew on many years of experience with graphical models in relational database design; on the other hand, it reflects the structural features of the document model.

Conclusion. Practical application of the developed notation has shown that it is convenient both for designing document databases and for teaching students in this subject area. Graphical models built in the proposed notation will allow researchers to create and illustrate typical patterns of document databases, which will undoubtedly have a positive impact on the development of promising data storage technologies.

Document Ranking for Curated Document Databases Using BERT and Knowledge Graph Embeddings: Introducing GRAB-Rank

DOI: 10.1007/978-3-030-86534-4_10 · 2021 · pp. 116-127

Author(s): Iqra Muhammad, Danushka Bollegala, Frans Coenen, Carrol Gamble, Anna Kearney, ...

Keyword(s): Knowledge Graph, Graph Embeddings, Document Ranking, Document Databases

Designing Document Databases: A Comprehensive Requirements Perspective

DOI: 10.1007/978-3-030-88358-4_2 · 2021 · pp. 15-25

Author(s): Noa Roy-Hubara, Arnon Sturm, Peretz Shoval

Keyword(s): Document Databases

Ranking Top Similar Documents for User Query Based on Normalized Vector Cosine Similarity Model

Journal of Computational and Theoretical Nanoscience · DOI: 10.1166/jctn.2020.9330 · 2020 · Vol 17 (9) · pp. 4531-4534

Author(s): Deepa Yogish, T. N. Manjunath, H. K. Yogish, Ravindra S. Hegadi

Keyword(s): Text Categorization, Question Answering, Cosine Similarity, Similarity Function, Interesting Problem, Question Answering System, Digital World, User Query, Novel Approach, Document Databases

As technology develops, the volume of information in every field, such as literature, technology, science and medicine, is also growing at a high pace. Extracting relevant documents from a huge collection based on a user query is an interesting problem in the digital world. Document similarity techniques are used in many applications, such as text categorization, plagiarism detection, document clustering, information retrieval, machine translation and question answering systems. Many algorithms have been developed for this purpose that take a document or an input query and match it against document databases. This paper proposes a novel approach that vectorizes each document and the query with the normalized TF-IDF method and applies the cosine similarity function to extract the top 3 documents for a user query.
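The retrieval step described in this abstract can be illustrated with a minimal standard-library Python sketch. It is not the authors' code: the tokenizer (lowercase, split on non-alphanumeric characters), the smoothed IDF formula log((1 + N) / (1 + df)) + 1 and the function names tokenize, tf_idf, l2_normalize and top_k are assumptions made for illustration. Since every vector is L2-normalized, the cosine similarity between a query and a document reduces to a dot product, and the top 3 documents are simply the three highest scores.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep runs of letters and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term count times IDF; terms never seen in the corpus are dropped.
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def l2_normalize(vec: dict[str, float]) -> dict[str, float]:
    # Scale the vector to unit length (an empty vector stays empty).
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def top_k(query: str, corpus: list[str], k: int = 3) -> list[int]:
    """Return the indices of the k documents most similar to the query."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    # Smoothed IDF (an assumption): terms found in every document keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    doc_vecs = [l2_normalize(tf_idf(doc, idf)) for doc in docs]
    q_vec = l2_normalize(tf_idf(tokenize(query), idf))
    # With unit-length vectors, cosine similarity is just the dot product.
    scores = [sum(w * v.get(t, 0.0) for t, w in q_vec.items()) for v in doc_vecs]
    ranked = sorted(range(n), key=lambda i: (-scores[i], i))
    return ranked[:k]


CORPUS = [
    "MongoDB stores JSON-like documents in collections",
    "Relational databases store rows in tables",
    "Cosine similarity compares TF-IDF document vectors",
    "Document databases such as MongoDB group documents into collections",
    "Handwritten document retrieval uses writer features",
]

if __name__ == "__main__":
    print(top_k("MongoDB document collections", CORPUS))
```

Ties are broken by document position. In a real system the document vectors would normally be computed once and stored, rather than rebuilt for every query as this sketch does.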

Intelligent information extraction from scholarly document databases

Journal of Intelligence Studies in Business · DOI: 10.37380/jisib.v10i2.584 · 2020 · Vol 10 (2)

Author(s): Fernando Vegas Fernandez

Keyword(s): Literature Review, Information Extraction, Semantic Search, Specific Information, Information Requirements, Research Project, Importance Index, Intelligent Information, Document Databases

Extracting knowledge from big document databases has long been a challenge. Most researchers do a literature review and manage their document databases with tools that just provide a bibliography, and when it comes to retrieving information (a list of concepts and ideas), there is a severe lack of functionality. Researchers need to extract specific information from their scholarly document databases according to their predefined breakdown structure. These databases usually contain a few hundred documents, information requirements differ from one research project to another, and algorithmic techniques are not always the answer. Because most retrieval and information extraction algorithms require manual training, supervision and tuning, it could be quicker and more efficient to do the work by hand and to dedicate time and effort to defining an effective semantic search list, which is the key to obtaining the desired results. Defining a robust relative importance index is the final step toward a ranked list of concepts by importance, which helps both to measure trends and to find a quick path to the most appropriate paper in each case.
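The abstract does not spell out how the relative importance index is computed, so the following Python sketch is only one hypothetical reading of it. It assumes the hand-built semantic search list maps each concept to a few trigger phrases, scores a concept by the number of papers mentioning any of its phrases (scaled so the top concept scores 1.0), and picks the paper that matches the most phrases as the quick path to reading. The names relative_importance, best_paper, PAPERS and SEARCH_LIST are illustrative.

```python
from collections import Counter


def relative_importance(documents: dict[str, str],
                        search_list: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank concepts by a hypothetical relative importance index.

    documents maps a paper id to its text; search_list maps each concept
    to the hand-written phrases that signal it (the semantic search list).
    """
    hits = Counter()
    for text in documents.values():
        lowered = text.lower()
        for concept, phrases in search_list.items():
            # Count each paper at most once per concept (document frequency).
            if any(p.lower() in lowered for p in phrases):
                hits[concept] += 1
    top = max(hits.values(), default=0)
    # Assumed index: document frequency scaled so the most frequent concept scores 1.0.
    ranked = [(c, hits[c] / top if top else 0.0) for c in search_list]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


def best_paper(documents: dict[str, str], phrases: list[str]) -> str:
    """Return the paper id matching the most phrases (the first one wins on ties)."""
    return max(documents,
               key=lambda pid: sum(p.lower() in documents[pid].lower() for p in phrases))


PAPERS = {
    "p1": "Semantic search improves information extraction from scholarly papers.",
    "p2": "A literature review of information extraction methods.",
    "p3": "Ranking papers by citation count.",
}
SEARCH_LIST = {
    "information extraction": ["information extraction", "text mining"],
    "semantic search": ["semantic search"],
    "ranking": ["ranking", "relative importance"],
}

if __name__ == "__main__":
    print(relative_importance(PAPERS, SEARCH_LIST))
    print(best_paper(PAPERS, ["semantic search", "information extraction"]))
```

Plain substring matching keeps the sketch short; it will also match phrases inside longer words, which a real search list would need to guard against.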

Creating Collections with Embedded Documents for Document Databases Taking into Account the Queries

Computation · DOI: 10.3390/computation8020045 · 2020 · Vol 8 (2) · pp. 45

Author(s): Yulia Shichkina, Muon Ha

Keyword(s): Initial Data, Set Theory, Relational Database, Document Database, Execution Speed, Document Databases

In this article, we describe a new formalized method for constructing a NoSQL MongoDB document database that takes into account the structure of the queries planned for execution against the database. The method is based on set theory. The initial data are the properties of the objects about which information is stored in the database, and the set of queries that are executed most often or whose execution speed should be maximal. To determine whether embedded documents need to be created, our method uses the type of relationship between tables in a relational database. Our studies have shown that this method complements the method of creating collections without embedded documents. In the article, we also describe a methodology for determining which method should be used in which cases to make working with databases more efficient. This approach can also be used for translating data from MySQL to MongoDB and for consolidating these databases.
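The idea of letting relationship types decide whether to embed can be sketched as a simple rule of thumb. This is not the set-theoretic method of the paper: the sketch assumes that a child table is embedded in its parent only when the relationship is 1:1 or 1:N and at least one frequent query reads both tables together, while M:N relationships stay as separate collections. The Cardinality, Relationship and plan_collections names, and the example tables, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Cardinality(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_MANY = "M:N"


@dataclass(frozen=True)
class Relationship:
    parent: str  # table on the "one" side
    child: str   # table on the other side
    cardinality: Cardinality


def _root(table: str, embedded_into: dict[str, str]) -> str:
    # Follow the embedding chain up to the table that becomes a top-level collection.
    while table in embedded_into:
        table = embedded_into[table]
    return table


def plan_collections(tables: list[str], relationships: list[Relationship],
                     queries: list[set[str]]) -> dict[str, list[str]]:
    """Map each top-level collection to the tables embedded inside it."""
    embedded_into: dict[str, str] = {}
    for rel in relationships:
        read_together = any({rel.parent, rel.child} <= q for q in queries)
        if (rel.cardinality is not Cardinality.MANY_TO_MANY
                and read_together
                and rel.child not in embedded_into
                and _root(rel.parent, embedded_into) != rel.child):  # avoid cycles
            embedded_into[rel.child] = rel.parent
    collections = {t: [] for t in tables if t not in embedded_into}
    for child in embedded_into:
        collections[_root(child, embedded_into)].append(child)
    return collections


TABLES = ["customers", "orders", "order_items", "products"]
RELATIONSHIPS = [
    Relationship("customers", "orders", Cardinality.ONE_TO_MANY),
    Relationship("orders", "order_items", Cardinality.ONE_TO_MANY),
    Relationship("orders", "products", Cardinality.MANY_TO_MANY),
]
QUERIES = [{"orders", "order_items"}, {"customers"}, {"orders", "products"}]

if __name__ == "__main__":
    print(plan_collections(TABLES, RELATIONSHIPS, QUERIES))
```

Here order_items is embedded in orders because a frequent query reads them together, while customers and products remain separate collections, the former because no listed query reads it with orders and the latter because its relationship with orders is many-to-many.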

Score Level Fusion for Improving Writer Retrieval in Handwritten Document Databases

2020 1st International Conference on Communications, Control Systems and Signal Processing (CCSSP) · DOI: 10.1109/ccssp49278.2020.9151717 · 2020

Author(s): Mohamed Lamine Bouibed, Hassiba Nemmour, Sara Derdouche, Asma Leslous, Youcef Chibani

Keyword(s): Score Level Fusion, Handwritten Document, Level Fusion, Document Databases

Maintaining Curated Document Databases Using a Learning to Rank Model: The ORRCA Experience

Lecture Notes in Computer Science - Artificial Intelligence XXXVII · DOI: 10.1007/978-3-030-63799-6_26 · 2020 · pp. 345-357

Author(s): Iqra Muhammad, Danushka Bollegala, Frans Coenen, Carrol Gamble, Anna Kearney, ...

Keyword(s): Learning To Rank, Document Databases

Determining the Composition of Collections for Key-Document Databases Based on a Given Set of Object Properties and Database Queries

Computer Tools in Education · DOI: 10.32603/2071-2340-2019-3-15-28 · 2019 · pp. 15-28

Author(s): Van Muon Ha, Yulia A. Shichkina, Sergey V. Kostichev, ...

Keyword(s): Initial Data, Set Theory, Relational Databases, Data Organization, Object Properties, Document Databases

The need to transform a database from one format to another periodically arises in different organizations for various reasons. Today, the mechanism for changing the format of relational databases is well developed. However, with the advent of new types of databases, such as NoSQL, this problem has become pressing because data are organized in radically different ways in different databases. This article discusses a formalized method, based on set theory, for choosing the number and composition of collections for a key-value database. The initial data are the properties of the objects about which information is stored in the database, and the set of queries that are executed most frequently. The method can be applied not only when creating a new key-value database, but also when transforming an existing one, when moving from relational databases to NoSQL, and when consolidating databases.
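A set-based way to choose the number and composition of collections can be sketched as follows, under the assumption (not taken from the paper) that each frequent query is described by the set of object properties it reads and that queries sharing any property should be served by the same collection. Overlapping query sets are merged, which amounts to finding connected components, and properties that no frequent query uses are placed in one extra collection. The function name group_properties and the sample properties are hypothetical.

```python
def group_properties(properties: set[str], queries: list[set[str]]) -> list[set[str]]:
    """Group properties into collections so each frequent query touches one collection."""
    collections: list[set[str]] = []
    for query in queries:
        needed = set(query) & properties  # ignore properties outside the model
        if not needed:
            continue
        overlapping = [c for c in collections if c & needed]
        # Union of the query with every existing collection it shares a property with.
        merged = needed.union(*overlapping)
        collections = [c for c in collections if not (c & needed)] + [merged]
    # Properties that no frequent query reads go into one remaining collection.
    unused = properties - set().union(*collections)
    if unused:
        collections.append(unused)
    return collections


PROPERTIES = {"id", "name", "email", "order_id", "total", "sku", "title"}
QUERIES = [{"id", "name"}, {"name", "email"}, {"order_id", "total"}]

if __name__ == "__main__":
    for collection in group_properties(PROPERTIES, QUERIES):
        print(sorted(collection))
```

This version never duplicates a property across collections; a real key-value or document design might deliberately denormalize data to speed up particular queries, which this sketch does not model.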

A three-phase MapReduce-based algorithm for searching biomedical document databases

IJEEC - International Journal of Electrical Engineering and Computing · DOI: 10.7251/ijeec1901001g · 2019 · Vol 3 (1)

Author(s): Milana Grbić

Keyword(s): Parallel Algorithm, Second Phase, High Quality, Inverse Document Frequency, Third Phase, Ranking Criteria, Document Frequency, Three Phase, Query Word, Document Databases

Retrieving information from large document databases has been a focus of scientific research in recent years. In this paper, a parallel algorithm for searching biomedical documents based on the MapReduce technique is presented. The algorithm consists of three phases: a preprocessing phase, a document representation phase, and a searching phase. In the first phase, lemmatization and stop-word elimination are performed. In the second phase, each document is represented as a list of pairs (word, tf-idf index of the word). The third phase is the main searching procedure. It uses a specially designed ranking criterion based on a combination of the term frequency-inverse document frequency (tf-idf) index and the indicator function for each query word. Four different versions of the ranking criterion are proposed and analyzed. The algorithm's performance is tested on different subsets of the large and well-known PubMed biomedical document database. The experimental results indicate that the proposed parallel algorithm finds high-quality results in a reasonable time. Compared to the sequential variant of the algorithm, the parallel algorithm is more efficient, finding high-quality solutions in significantly less time.
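The three phases can be simulated in a single Python process to show how data flows between them. This is an illustrative sketch, not the paper's algorithm: a small stop-word list and lowercasing stand in for lemmatization, plain generators play the role of distributed map tasks, and the ranking criterion (the sum of tf-idf weights of the matched query words plus alpha times the number of matched words) is only one assumed way of combining tf-idf with an indicator function, since the four variants studied in the paper are not given in the abstract. The weight alpha and all function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import chain

# Hypothetical stop-word list; a real pipeline would use a full list and a lemmatizer.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "with"}


def map_preprocess(doc_id, text):
    """Phase 1 (map): emit (word, doc_id) for every non-stop word."""
    for raw in text.lower().split():
        word = raw.strip(".,;:()")  # lowercasing + stripping stands in for lemmatization
        if word and word not in STOP_WORDS:
            yield word, doc_id


def shuffle(pairs):
    """Group mapped values by key, as the MapReduce shuffle step does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def build_index(corpus):
    """Phase 2 (reduce): represent each document as {word: tf-idf}."""
    postings = shuffle(chain.from_iterable(map_preprocess(d, t) for d, t in corpus.items()))
    n = len(corpus)
    index = defaultdict(dict)
    for word, doc_ids in postings.items():
        idf = math.log(n / len(set(doc_ids)))
        for doc_id in set(doc_ids):
            # doc_ids holds one entry per occurrence, so count() is the raw term frequency.
            index[doc_id][word] = doc_ids.count(doc_id) * idf
    return index


def search(index, query, alpha=1.0, k=3):
    """Phase 3: score = sum of tf-idf + alpha * number of query words present (assumed form)."""
    words = [w for w, _ in map_preprocess(None, query)]
    scored = []
    for doc_id, vec in index.items():
        matched = [w for w in words if w in vec]
        if matched:
            scored.append((sum(vec[w] for w in matched) + alpha * len(matched), doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[:k]]


CORPUS = {
    "pmid1": "Insulin resistance in type 2 diabetes.",
    "pmid2": "Diabetes and cardiovascular risk.",
    "pmid3": "Gene expression in breast cancer.",
}

if __name__ == "__main__":
    print(search(build_index(CORPUS), "insulin diabetes"))
```

In a real MapReduce deployment, map_preprocess would run in parallel over shards of the corpus and the shuffle step would be performed by the framework.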
