efficient retrieval
Recently Published Documents

TOTAL DOCUMENTS: 293 (FIVE YEARS: 22)
H-INDEX: 21 (FIVE YEARS: 0)

2021 ◽  
Vol 14 (1) ◽  
pp. 19
Author(s):  
Zineddine Kouahla ◽  
Ala-Eddine Benrazek ◽  
Mohamed Amine Ferrag ◽  
Brahim Farou ◽  
Hamid Seridi ◽  
...  

The past decade has been characterized by growing volumes of data due to the widespread use of Internet of Things (IoT) applications, which has introduced many challenges for efficient data storage and management. Efficient indexing and searching of large data collections is therefore a topical and urgent issue. Such solutions can provide users with valuable information about IoT data. However, efficient retrieval and management of this information, in terms of index size and search time, require optimization of indexing schemes, which is rather difficult to achieve. The purpose of this paper is to examine and review existing indexing techniques for large-scale data. A taxonomy of indexing techniques is proposed to enable researchers to understand and select the techniques that can serve as a basis for designing a new indexing scheme. Real-world applications of the existing indexing techniques in different areas, such as health, business, scientific experiments, and social networks, are presented. Open problems and research challenges, e.g., privacy and large-scale data mining, are also discussed.
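The index-size/search-time trade-off the survey discusses can be seen in miniature with a sorted-array index versus a plain scan. The sketch below is purely illustrative (the data and function names are hypothetical), using Python's standard `bisect` module:

```python
import bisect

def linear_search(records, key):
    """O(n) scan: no index to build or maintain, but every record is touched."""
    for i, r in enumerate(records):
        if r == key:
            return i
    return -1

def indexed_search(sorted_records, key):
    """O(log n) lookup: fast, but the index (here, sort order) must be kept current."""
    i = bisect.bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return -1

data = [42, 7, 93, 15, 4, 58]
index = sorted(data)             # building the index costs O(n log n) up front
print(linear_search(data, 15))   # position 3 in the raw data
print(indexed_search(index, 15)) # position 2 in the sorted index
```

Real indexing schemes (B-trees, hash indexes, inverted files) refine the same trade: pay storage and maintenance cost up front to make each query cheap.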



Author(s):  
Mohamed Minhaj

Wikipedia is among the most prominent and comprehensive sources of information available on the WWW. However, its unstructured form impedes direct interpretation by machines. Knowledge base (KB) creation is a line of research that enables machines to interpret the knowledge concealed in Wikipedia. Given the efficacy of KBs for storing and efficiently retrieving the semantic information required to power IT applications such as question-answering systems, many large-scale knowledge bases have been developed. These KBs employ different approaches to data curation and storage, provide different retrieval mechanisms, and differ in the depth and breadth of their knowledge. This paper endeavours to explicate the process of KB creation using Wikipedia and to compare the prominent KBs built from Wikipedia's big data.
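The core idea behind such KBs is turning semi-structured Wikipedia content (infoboxes, categories) into machine-queryable facts. A minimal sketch, with a hand-written toy "infobox" standing in for real extraction, represents facts as subject-predicate-object triples:

```python
# Toy, hypothetical infobox data; real KBs extract this from Wikipedia markup.
infobox = {
    "Alan Turing": {"born": "1912", "field": "Computer science"},
    "Ada Lovelace": {"born": "1815", "field": "Mathematics"},
}

# Flatten the infoboxes into subject-predicate-object triples.
triples = [(subj, pred, obj)
           for subj, attrs in infobox.items()
           for pred, obj in attrs.items()]

def query(subject=None, predicate=None):
    """Return all triples matching the given subject and/or predicate."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)]

print(query(subject="Alan Turing", predicate="born"))
# [('Alan Turing', 'born', '1912')]
```

Production KBs differ mainly in how these triples are curated, stored, and indexed, which is exactly the axis along which the paper compares them.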



2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yu Zhao

A new document image retrieval algorithm is proposed to address the inefficient retrieval of information resources in digital libraries. First, to characterize texture accurately and enhance the algorithm's ability to differentiate images, this paper proposes a statistical feature method based on the dual-tree complex wavelet transform. Second, using this statistical feature method combined with the visual characteristics of the human eye, the edge information in the document image is extracted. On this basis, meaningful texture features are constructed and used to define feature descriptors for document images. Taking these descriptors as the retrieval clue, the content characteristics of the document image are combined organically, and appropriate similarity measurement criteria are applied for efficient retrieval. Experimental results show that the algorithm not only achieves high retrieval efficiency but also reduces the complexity of traditional document image retrieval algorithms.
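The general pipeline of "wavelet subbands, then statistics, then distance-based matching" can be sketched without the paper's actual transform. The example below substitutes a single-level Haar-like average/difference step for the dual-tree complex wavelet (an assumption for brevity; the images and the descriptor are toy-scale):

```python
import math

def haar_details(img):
    """One-level Haar-like 2D step: detail coefficients of a 2D list with even
    dimensions, standing in for the wavelet subbands used in the paper."""
    h, w = len(img), len(img[0])
    details = []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            details += [(a - b + c - d) / 4,   # horizontal detail
                        (a + b - c - d) / 4,   # vertical detail
                        (a - b - c + d) / 4]   # diagonal detail
    return details

def descriptor(img):
    """Statistical texture descriptor: mean and std of detail coefficients."""
    coeffs = haar_details(img)
    mean = sum(coeffs) / len(coeffs)
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))
    return (mean, std)

def distance(d1, d2):
    """Euclidean similarity measure between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d1, d2)))

flat = [[5, 5], [5, 5]]        # uniform region: no texture
edgy = [[9, 0], [0, 9]]        # strong diagonal edge
print(distance(descriptor(flat), descriptor(edgy)))  # nonzero: textures differ
```

Retrieval then reduces to ranking stored images by descriptor distance to the query image.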



2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Mary Subaja Christo ◽  
V. Elizabeth Jesi ◽  
Uma Priyadarsini ◽  
V. Anbarasu ◽  
Hridya Venugopal ◽  
...  

Hospital data management is one of the functional parts of operations to store and access healthcare data. Nowadays, protecting these data from hacking is one of the most difficult tasks in the healthcare system. Because the user data collected in healthcare are very sensitive, adequate security measures must be taken to protect the networks, and an effective encryption technology must be utilised. This paper focuses on implementing the elliptic curve cryptography (ECC) technique, a lightweight authentication approach, to share data effectively. Among the many lines of research on sharing data wirelessly, this work uses an Electronic Medical Card (EMC) to store healthcare data. The work addresses two important data security issues: data authentication and data confidentiality. To ensure data authentication, the proposed system employs a secure mechanism to encrypt and decrypt the data with a 512-bit key. Data confidentiality is ensured by using a Blockchain ledger, which allows only authorized users to access the data. Finally, the encrypted data are stored on the edge device. Edge computing technology is used to store medical reports within the edge network so that the data can be accessed very quickly. An authenticated user can decrypt and process the data at optimum speed. After processing, the updated data are stored in the Blockchain and in the cloud server. The proposed method ensures secure maintenance and efficient retrieval of medical data and reports.
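The ledger side of such a design can be illustrated independently of ECC. The sketch below (not the authors' implementation; record fields like `patient` are hypothetical) shows the hash-chaining that makes a Blockchain-style ledger tamper-evident, using only the standard library:

```python
import hashlib, json

def block_hash(block):
    """Hash a block's contents together with the previous block's hash, so
    tampering with any earlier record invalidates every later link."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_record(chain, record):
    """Append a record, linking it to the current chain head."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"prev": prev, "record": record}
    block["hash"] = block_hash({"prev": prev, "record": record})
    chain.append(block)

def verify(chain):
    """Re-hash every block and check the prev-hash links."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        if block_hash({"prev": block["prev"], "record": block["record"]}) != block["hash"]:
            return False
        prev = block["hash"]
    return True

ledger = []
append_record(ledger, {"patient": "EMC-001", "report": "ciphertext..."})
append_record(ledger, {"patient": "EMC-002", "report": "ciphertext..."})
print(verify(ledger))                          # True
ledger[0]["record"]["report"] = "tampered"
print(verify(ledger))                          # False: chain detects the edit
```

In the paper's setting, the `report` values would be ECC-encrypted payloads; the ledger only guarantees that stored ciphertexts have not been silently altered.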



2021 ◽  
Vol 13 (16) ◽  
pp. 3208
Author(s):  
Yinyi Cheng ◽  
Kefa Zhou ◽  
Jinlin Wang ◽  
Philippe De Maeyer ◽  
Tim Van de Voorde ◽  
...  

The spatial calculation of vector data is crucial for geochemical analysis in geological big data, but large volumes of geochemical data make for inefficient management. This study therefore proposed a shapefile storage method based on MongoDB in GeoJSON form (SSMG) and a shapefile storage method based on PostgreSQL with Open Location Code (OLC) geocoding (SSPOG) to address the low efficiency of electronic form management. The SSMG method consists of a JSONification tier and a cloud storage tier, while the SSPOG method consists of a geocoding tier, an extension tier, and a storage tier. Using MongoDB and PostgreSQL as databases, this study achieved two different high-throughput, high-efficiency methods for geochemical data storage and retrieval. Xinjiang, the largest province in China, was selected as the study area in which to test the proposed methods. Using geochemical data from shapefiles as the data source, several experiments were performed to improve storage efficiency and achieve efficient retrieval. Measured by time consumed and data compression ratio (DCR), the SSMG and SSPOG methods improve geochemical data storage under different architectures and enable efficient organization and management of geochemical data, so as to better support geological big data. The purpose of this study was to build a storage method that improves the speed of geochemical data insertion and retrieval by applying proven big data technology, thereby helping to solve the problem of geochemical data preprocessing efficiently and providing support for geochemical analysis.
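The geocoding tier of SSPOG rests on Open Location Codes, which encode a latitude/longitude pair as a short string whose prefix length controls spatial resolution. A simplified 10-digit encoder (a sketch of the published OLC algorithm, not the paper's SSPOG code) looks like this:

```python
OLC_ALPHABET = "23456789CFGHJMPQRVWX"  # the 20-symbol OLC digit set

def encode_olc(lat, lng):
    """Simplified 10-digit Open Location Code (plus code) encoder.
    Each of the five digit pairs refines the cell by a factor of 20
    in both latitude and longitude."""
    lat = min(max(lat + 90.0, 0.0), 180.0 - 1e-10)  # shift and clamp latitude
    lng = (lng + 180.0) % 360.0                     # shift and wrap longitude
    code, res = "", 20.0
    for _ in range(5):
        d_lat, d_lng = int(lat // res), int(lng // res)
        code += OLC_ALPHABET[d_lat] + OLC_ALPHABET[d_lng]
        lat -= d_lat * res
        lng -= d_lng * res
        res /= 20.0
    return code[:8] + "+" + code[8:]   # '+' separates the final refinement pair

print(encode_olc(0.0, 0.0))  # 6FG22222+22
```

Because nearby points share code prefixes, a plain B-tree index on the code column in PostgreSQL already supports efficient prefix-based spatial retrieval, which is the appeal of geocoding as a storage tier.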



2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Yidi Cui ◽  
Bo Gao ◽  
Lihong Liu ◽  
Jing Liu ◽  
Yan Zhu

Abstract Background Formulas are an important means by which traditional Chinese medicine (TCM) treats diseases and are of great research significance. Many formula databases exist, but accessing their rich information efficiently is difficult because the data are small-scale and intelligent search engines are lacking. Methods We selected 38,000 formulas from a semi-structured database, then segmented the text, extracted information, and standardized terms. After that, we constructed a structured formula database based on an ontology and an intelligent retrieval engine that calculates weights for the decoction pieces of each formula. Results The intelligent retrieval system named AMFormulaS (for Ancient and Modern Formula System) was constructed on top of the structured database, the ontology, and the intelligent retrieval engine, realizing retrieval and statistical analysis of formulas and decoction pieces. Conclusions AMFormulaS is a large-scale intelligent retrieval system that includes a large body of formula data, an efficient information-extraction pipeline, and a search engine. AMFormulaS provides users with efficient retrieval and comprehensive data support. At the same time, the system's statistical analysis can inspire scientific research ideas and support patent review as well as new drug research and development.
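The abstract does not specify how decoction-piece weights are computed, so the sketch below uses a standard IDF-style weighting as a stand-in (the formula names and pieces are invented): pieces that occur in fewer formulas receive higher weight, and formulas are ranked by the total weight of matched query pieces.

```python
import math

# Hypothetical mini-corpus: each formula is a list of its decoction pieces.
formulas = {
    "Formula A": ["ginseng", "licorice", "ginger"],
    "Formula B": ["licorice", "ginger"],
    "Formula C": ["ginseng", "astragalus"],
}

def piece_weights(formulas):
    """IDF-style weight: pieces appearing in fewer formulas weigh more."""
    n = len(formulas)
    df = {}
    for pieces in formulas.values():
        for p in set(pieces):
            df[p] = df.get(p, 0) + 1
    return {p: math.log(n / c) for p, c in df.items()}

def search(query_pieces, formulas, weights):
    """Rank formulas by the total weight of the query pieces they contain."""
    scores = {name: sum(weights.get(p, 0.0) for p in query_pieces if p in pieces)
              for name, pieces in formulas.items()}
    return sorted((s, n) for n, s in scores.items() if s > 0)[::-1]

w = piece_weights(formulas)
print(search(["ginseng", "astragalus"], formulas, w))
# Formula C ranks first: it matches both pieces, including the rare astragalus
```

AMFormulaS presumably layers ontology-based term standardization on top of this kind of weighting, so that synonymous piece names map to one entry before scoring.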



2021 ◽  
Vol 15 (02) ◽  
pp. 189-213
Author(s):  
Petra Budikova ◽  
Jan Sedmidubsky ◽  
Jan Horvath ◽  
Pavel Zezula

With the increasing availability of human motion data captured in the form of 2D or 3D skeleton sequences, more complex motion recordings need to be processed. In this paper, we focus on similarity-based indexing and efficient retrieval of motion episodes: medium-sized skeleton sequences that consist of multiple semantic actions and correspond to some logical motion unit (e.g., a figure skating performance). As a first step toward efficient retrieval, we apply the motion-word technique to transform spatio-temporal skeleton sequences into compact text-like documents. Based on these documents, we introduce a two-phase retrieval scheme that first finds a set of candidate query results and then re-ranks these candidates with more expensive application-specific methods. We further index the motion-word documents using inverted files, which allows us to retrieve the candidate documents in an efficient and scalable manner. We also propose additional query-reduction techniques that accelerate both retrieval phases by removing semantically irrelevant parts of the motion query. Experimental evaluation is used to analyze the effects of the individual proposed techniques on the retrieval efficiency and effectiveness.
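Once episodes are text-like documents, classic text-retrieval machinery applies directly. A minimal sketch of the two-phase scheme (the episode names, motion words, and Jaccard re-ranker below are all illustrative stand-ins for the paper's application-specific methods):

```python
from collections import defaultdict

# Hypothetical motion-word documents: each episode is a sequence of
# quantized pose "words" produced by the motion-word transformation.
episodes = {
    "skate1": ["spin", "jump", "glide", "spin"],
    "skate2": ["glide", "glide", "jump"],
    "dance1": ["step", "turn", "step"],
}

# Index the documents with inverted files (motion word -> episode ids).
inverted = defaultdict(set)
for eid, words in episodes.items():
    for word in words:
        inverted[word].add(eid)

def retrieve(query, k=2):
    # Phase 1: cheap candidate generation -- any episode sharing a query word.
    candidates = set().union(*(inverted.get(w, set()) for w in query))
    # Phase 2: re-rank candidates with a more expensive measure (here, Jaccard
    # over word sets; the paper plugs in application-specific similarity).
    def jaccard(eid):
        a, b = set(episodes[eid]), set(query)
        return len(a & b) / len(a | b)
    return sorted(candidates, key=jaccard, reverse=True)[:k]

print(retrieve(["spin", "jump"]))  # ['skate1', 'skate2']
```

Query reduction would shrink the `query` word list before phase 1, cutting both the number of posting lists touched and the cost of each re-ranking comparison.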



2021 ◽  
Vol 36 (3) ◽  
pp. 693-706
Author(s):  
Da-Yu Jia ◽  
Jun-Chang Xin ◽  
Zhi-Qiong Wang ◽  
Han Lei ◽  
Guo-Ren Wang


2021 ◽  
Vol 70 ◽  
pp. 1441-1479
Author(s):  
Dung D. Le ◽  
Hady Lauw

Top-k recommendation seeks to deliver a personalized list of k items to each individual user. Matrix factorization (MF), which represents users and items as vectors in a low-dimensional space, is an established and effective approach to recommender systems thanks to its superior recommendation quality and scalability. A typical matrix factorization recommender system has two main phases: preference elicitation and recommendation retrieval. The former analyzes user-generated data to learn user preferences and item characteristics in the form of latent feature vectors, whereas the latter ranks the candidate items based on the learnt vectors and returns the top-k items from the ranked list. For preference elicitation, there have been numerous works on building accurate MF-based recommendation algorithms that can learn from large datasets. For the recommendation retrieval phase, however, naively scanning a large number of items to identify the few most relevant ones may inhibit truly real-time applications. In this work, we survey recent advances and state-of-the-art approaches that enable fast and accurate retrieval for MF-based personalized recommendations. We also include analytical discussions of the approaches along different dimensions to give readers a more comprehensive understanding of the surveyed works.
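The naive retrieval baseline that the surveyed methods try to beat is easy to state concretely: score every item by inner product with the user's latent vector, then keep the k best. A toy sketch (the vectors and item names are invented, not learned from data):

```python
import heapq

def top_k(user_vec, item_vecs, k):
    """Naive recommendation retrieval: a linear scan over all items.
    Fast-retrieval methods (tree/graph/hash indexes for maximum inner
    product search) exist precisely to avoid this O(n) scan."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = ((dot(user_vec, v), item) for item, v in item_vecs.items())
    return [item for _, item in heapq.nlargest(k, scores)]

# Hypothetical latent vectors from the preference-elicitation phase.
user = [0.9, 0.1]
items = {"book": [1.0, 0.0], "movie": [0.0, 1.0], "game": [0.7, 0.7]}
print(top_k(user, items, 2))  # ['book', 'game']
```

With millions of items and strict latency budgets, this per-user scan becomes the bottleneck, which is why the survey focuses on sublinear maximum-inner-product search structures.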


