document databases
Recently Published Documents


TOTAL DOCUMENTS

90
(FIVE YEARS 11)

H-INDEX

11
(FIVE YEARS 1)

2021 ◽  
Vol 25 (5) ◽  
pp. 50-60
Author(s):  
M. V. Smirnov ◽  
R. S. Tolmasov

Goals and objectives. Graphical models have proven to be a reliable, clear and convenient tool for sketching database models. Most existing notations are designed for the relational data model, which has been the dominant data model for the last thirty years. However, the development of information technologies has increased the popularity of non-relational data models, primarily the document model. One obstacle to applying it in practice is the lack of suitable tools for graphical modeling of a database at the logical design stage that take the features of the document model into account. Developing such tools is an important and timely task, since applying them in practical research makes it possible to identify, classify and analyze typical modeling errors, which helps the designer reduce the risk of repeating them. The purpose of this article is to develop a graphical notation that, on the one hand, is convenient for the designer and, on the other hand, takes into account the peculiarities of how the NoSQL document storage model is created and operates.
Materials and methods. The materials for the study were numerous publications devoted to the development of graphical notations and their application to database design for various information systems. The selected materials were analyzed, and the main graphical notations used to describe the relational data model were identified. From these, the three notations whose sets of graphic stereotypes differed most from one another were selected; analyzing them made it possible to identify the main visual patterns for the components of the relational model. The resulting patterns were applied to the main elements of a document database, which were obtained by analyzing the documentation of the popular MongoDB DBMS.
Results. The result of the research is a new tool for modeling document databases at the logical level, consisting of a set of graphic stereotypes and rules for their application. On the one hand, the notation will be familiar to practitioners who have previously worked with relational data models, since its development drew on many years of experience with graphical models in relational database design; on the other hand, it reflects the structural features of the document model.
Conclusion. Practical application of the developed model has shown that it is convenient both for designing document databases and for teaching students in this subject area. Graphical models built in the proposed notation will allow researchers to create and illustrate typical patterns of document databases, which should have a positive impact on the development of promising data storage technologies.


2021 ◽  
pp. 116-127
Author(s):  
Iqra Muhammad ◽  
Danushka Bollegala ◽  
Frans Coenen ◽  
Carrol Gamble ◽  
Anna Kearney ◽  
...  

2021 ◽  
pp. 15-25
Author(s):  
Noa Roy-Hubara ◽  
Arnon Sturm ◽  
Peretz Shoval

2020 ◽  
Vol 17 (9) ◽  
pp. 4531-4534
Author(s):  
Deepa Yogish ◽  
T. N. Manjunath ◽  
H. K. Yogish ◽  
Ravindra S. Hegadi

As technology develops, information in every field, such as literature, technology, science and medicine, is growing at a high pace. Extracting related documents from a huge collection based on a user query is an interesting problem in the digital world. Document similarity techniques are used in many applications, such as text categorization, plagiarism detection, document clustering, information retrieval, machine translation and question answering systems. Many algorithms have been developed for this purpose that take a document or input query and match it against document databases. This paper proposes a novel approach that vectorizes each document and query with the normalized TF-IDF method and applies the cosine similarity function to extract the top 3 documents for a user query.
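
The approach described above (normalized TF-IDF vectors ranked by cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, and the function names `tokenize`, `build_idf`, `tfidf_vector`, `cosine` and `top_k` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency for every term in the collection."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unknown terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def top_k(query, docs, k=3):
    """Return up to k (document index, similarity) pairs, best match first."""
    tokenized = [tokenize(d) for d in docs]
    idf = build_idf(tokenized)
    doc_vecs = [tfidf_vector(t, idf) for t in tokenized]
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, dv), i) for i, dv in enumerate(doc_vecs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties are broken by document index
    return [(i, score) for score, i in scored[:k]]
```

Calling `top_k("document databases", docs)` on a list of strings returns the indices and scores of the three closest documents. Because both vectors are normalized to unit length, the cosine reduces to a dot product over the terms they share.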


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Fernando Vegas Fernandez

Extracting knowledge from big document databases has long been a challenge. Most researchers do a literature review and manage their document databases with tools that just provide a bibliography, and when retrieving information (a list of concepts and ideas), there is a severe lack of functionality. Researchers do need to extract specific information from their scholarly document databases depending on their predefined breakdown structure. Those databases usually contain a few hundred documents, information requirements are distinct in each research project, and technique algorithms are not always the answer. As most retrieving and information extraction algorithms require manual training, supervision, and tuning, it could be shorter and more efficient to do it by hand and dedicate time and effort to perform an effective semantic search list definition that is the key to obtain the desired results. A robust relative importance index definition is the final step to obtain a ranked importance concept list that will be helpful both to measure trends and to find a quick path to the most appropriate paper in each case.


Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 45
Author(s):  
Yulia Shichkina ◽  
Muon Ha

In this article, we describe a new formalized method for constructing a MongoDB NoSQL document database that takes into account the structure of the queries planned for execution against the database. The method is based on set theory. The initial data are the properties of the objects about which information is stored in the database and the set of queries that are executed most often or whose execution speed should be maximal. To determine whether embedded documents are needed, our method uses the type of relationship between tables in a relational database. Our studies have shown that this method complements the method of creating collections without embedded documents. In the article, we also describe a methodology for determining which method should be used in which cases to make working with databases more efficient. It should be noted that this approach can be used for translating data from MySQL to MongoDB and for consolidating these databases.
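
The idea of deciding on embedded documents from the type of relationship between relational tables can be illustrated with a small Python sketch. The abstract does not state the decision rule, so the rule used here (embed for one-to-one and one-to-many, keep separate collections otherwise) is an assumed common heuristic rather than the authors' set-theoretic method, and it ignores the query structure that their method also considers. The function name `design_collections` and the row layout are hypothetical.

```python
from copy import deepcopy

# Assumed heuristic, not taken from the article: these relationship types embed.
EMBED_RELATIONS = {"one-to-one", "one-to-many"}


def design_collections(parent_name, parent_rows, child_name, child_rows,
                       foreign_key, relation):
    """Return {collection_name: [documents]} for two related tables.

    Rows are dicts with an "id" key; `foreign_key` is the child column that
    points at the parent's "id".
    """
    if relation not in EMBED_RELATIONS:
        # Keep two flat collections; the foreign key acts as a reference.
        return {parent_name: deepcopy(parent_rows), child_name: deepcopy(child_rows)}

    documents = []
    for parent in parent_rows:
        doc = dict(parent)
        # Copy matching child rows without the now-redundant foreign key.
        matches = [
            {k: v for k, v in child.items() if k != foreign_key}
            for child in child_rows
            if child[foreign_key] == parent["id"]
        ]
        if relation == "one-to-one":
            doc[child_name] = matches[0] if matches else None
        else:
            doc[child_name] = matches
        documents.append(doc)
    return {parent_name: documents}
```

With an `orders` table and an `items` table linked by `order_id`, passing `relation="one-to-many"` yields a single `orders` collection whose documents carry an `items` array, while `relation="many-to-many"` keeps both collections and leaves `order_id` in place as a reference.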


Author(s):  
Iqra Muhammad ◽  
Danushka Bollegala ◽  
Frans Coenen ◽  
Carrol Gamble ◽  
Anna Kearney ◽  
...  

2019 ◽  
pp. 15-28
Author(s):  
Van Muon Ha ◽  
Yulia A. Shichkina ◽  
Sergey V. Kostichev ◽  
...  

The task of transforming a database from one format to another arises periodically in different organizations for various reasons. Today, the mechanisms for changing the format of relational databases are well developed. However, with the advent of new types of databases, such as NoSQL, the problem has become more pressing because of the radically different ways in which data are organized in the various databases. This article discusses a formalized method, based on set theory, for choosing the number and composition of collections for a key-value database. The initial data are the properties of the objects about which information is stored in the database and the set of most frequently executed queries. The method can be applied not only when creating a new key-value database, but also when transforming an existing one, when moving from relational databases to NoSQL, and when consolidating databases.


Author(s):  
Milana Grbić

Retrieving information from large document databases has been a focus of scientific research in recent years. In this paper, a parallel algorithm for searching biomedical documents based on the MapReduce technique is presented. The algorithm consists of three phases: a preprocessing phase, a document representation phase, and a searching phase. In the first phase, lemmatization and stop-word elimination are performed. In the second phase, each document is represented as a list of pairs (word, tf-idf index of the word). The third phase is the main searching procedure. It uses a specially designed ranking criterion based on a combination of the term frequency - inverse document frequency (tf-idf) index and the indicator function for each query word. Four different versions of the ranking criterion are proposed and analyzed. The algorithm's performance is tested on different subsets of the large and well-known PubMed biomedical document database. The experimental results indicate that the proposed parallel algorithm finds high-quality results in a reasonable time. Compared with the sequential variant of the algorithm, the parallel algorithm is more efficient, since it finds high-quality solutions in significantly less time.
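
The three phases described above can be sketched in plain Python with the map, shuffle and reduce steps simulated in a single process. This is an illustration under stated assumptions, not the paper's algorithm: lemmatization is omitted, the stop-word list is a tiny placeholder, and the ranking used in `search` (summed tf-idf plus a fixed `bonus` for each query word present, standing in for the indicator function) is a hypothetical criterion, since the abstract does not specify any of its four versions.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # tiny illustrative list


def preprocess(text):
    """Phase 1: tokenize and drop stop words (lemmatization is omitted here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def map_doc(doc_id, text):
    """Map step: emit one (word, (doc_id, term_frequency)) pair per distinct word."""
    words = preprocess(text)
    if not words:
        return []
    return [(w, (doc_id, c / len(words))) for w, c in Counter(words).items()]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework would between the steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word(word, postings, n_docs):
    """Reduce step: turn one word's postings into (doc_id, (word, tf-idf)) pairs."""
    idf = math.log(n_docs / len(postings))
    return [(doc_id, (word, tf * idf)) for doc_id, tf in postings]


def build_index(docs):
    """Phase 2: represent each document as a list of (word, tf-idf) pairs."""
    mapped = [pair for doc_id, text in docs.items() for pair in map_doc(doc_id, text)]
    reduced = [p for word, postings in shuffle(mapped).items()
               for p in reduce_word(word, postings, len(docs))]
    return shuffle(reduced)


def search(index, query, bonus=1.0):
    """Phase 3: rank by summed tf-idf plus `bonus` per query word present."""
    q_words = set(preprocess(query))
    scores = {}
    for doc_id, pairs in index.items():
        weights = dict(pairs)
        hits = {w for w in q_words if w in weights}
        if hits:
            scores[doc_id] = sum(weights[w] + bonus for w in hits)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In a real MapReduce deployment, `map_doc` and `reduce_word` would run in parallel across workers; here they run sequentially so that the data flow between the phases stays visible.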

