An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

In the context of big data and the era of the fourth industrial revolution, improving the efficiency of document/information retrieval frameworks is a must if they are to handle the ever-growing volume of text data in an increasingly digital world. This article describes a two-stage system for document/information retrieval. First, a Lucene-based document retrieval tool is implemented, and two query expansion techniques, one using a comparable corpus (Wikipedia) and one using word embeddings, are proposed and tested. Second, a retention-fidelity summarization protocol is applied to the retrieved documents to create a short, accurate, and fluent extract of a single longer retrieved document (or of a set of top retrieved documents). The results show that using word embeddings is an excellent way to achieve higher precision and retrieve more accurate documents. The resulting summaries also satisfy the retention and fidelity criteria for relevant summaries.
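The abstract above does not give the details of its embedding-based query expansion, so the following is only a minimal sketch of the general idea: each query term is expanded with its nearest neighbours in an embedding space, ranked by cosine similarity. The vectors in `EMBEDDINGS`, the function name `expand_query`, and the parameters `k` and `min_sim` are all hypothetical and chosen for illustration; a real system would load pretrained embeddings and pass the expanded terms to the Lucene query.

```python
import math

# Toy 3-dimensional vectors. These values are made up for illustration;
# they are not taken from the article or from any trained model.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.05, 0.2, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.8):
    """Append up to k neighbours per query term whose similarity >= min_sim.

    Terms without an embedding are kept but not expanded.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        scored = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word not in expanded
        ]
        scored.sort(reverse=True)  # most similar first
        for sim, word in scored[:k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_query(["car"], EMBEDDINGS))
```

The similarity threshold is one simple way to keep unrelated neighbours out of the expanded query; whether the article uses a threshold, a fixed number of neighbours, or another criterion is not stated.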

2020 ◽  
Vol 10 (2) ◽  
pp. 1-4
Author(s):  
Evgeny Soloviov ◽  
Alexander Danilov

The word "phygital" combines "physical" and "digital" and refers to the combined application of physical and digital technology. This paper describes the phygital world in detail, explains why it matters in the world of technology, and discusses its advantages and disadvantages. Phygital technology is the bridge between the physical and digital worlds, and it brings a unique experience to users. It is a 21st-century technology that brings smart data, as opposed to big data, and addresses a broader array of learning styles. It can bring new experiences to almost every sector, such as retail, medicine, aviation, and education, and maintain some reality in a world where technology develops day by day. It is a general reboot that can keep the economy moving and safeguard future wellbeing both online and offline.


2021 ◽  
Vol 1839 (1) ◽  
pp. 012004
Author(s):  
W Sardjono ◽  
G Rama Putra ◽  
E Selviyanti ◽  
A Cholidin ◽  
G Salim

2021 ◽  
pp. 1-11
Author(s):  
Zhinan Gou ◽  
Yan Li

With the development of Web 2.0 communities, information retrieval based on collaborative tagging systems has been widely applied. However, users often issue brief queries of only one or two keywords, which leads to problems such as inaccurate query words, information overload, and information disorientation. Query expansion addresses this issue by reformulating each search query with additional words. After analyzing the limitations of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method for search in folksonomy, based on a user profile and a topic model. Specifically, the topic model is first constructed with a variational autoencoder and Word2Vec. Query expansion is then conducted using the user profile and the topic model. Finally, the proposed method is evaluated on a real dataset. The evaluation results show that the proposed method outperforms the baseline methods.


2014 ◽  
Vol 28 (4) ◽  
pp. 344-359 ◽  
Author(s):  
Gyeong June Hahm ◽  
Mun Yong Yi ◽  
Jae Hyun Lee ◽  
Hyo Won Suh

2008 ◽  
Author(s):  
Makoto Terao ◽  
Takafumi Koshinaka ◽  
Shinichi Ando ◽  
Ryosuke Isotani ◽  
Akitoshi Okumura

2017 ◽  
Vol 139 (11) ◽  
Author(s):  
Feng Shi ◽  
Liuqing Chen ◽  
Ji Han ◽  
Peter Childs

With the advent of the big-data era, the massive amount of information stored in electronic and digital form on the internet has become a valuable resource for knowledge discovery in engineering design. Traditional document retrieval based on document indexing focuses on retrieving individual documents related to the query, but it cannot discover the various associations between individual knowledge concepts. Ontology-based technologies, which extract the inherent relationships between concepts using advanced text mining tools, can be applied to improve design information retrieval in large-scale unstructured textual data. However, few publicly available ontology databases take a design and engineering perspective when establishing the relations between knowledge concepts. This paper develops a “WordNet” focused on design and engineering associations by integrating text mining approaches to construct an ontology network through unsupervised learning. Probability and velocity network analyses, which exhibit different statistical behaviors, are then applied to evaluate the degree of correlation between concepts for design information retrieval. The validation results show that probability and velocity analysis on the constructed ontology network can help recognize highly related, complex design and engineering associations between elements. Finally, an engineering design case study demonstrates the use of the constructed semantic network for design relations retrieval in a real-world project.


2015 ◽  
Vol 5 (4) ◽  
pp. 31-45 ◽  
Author(s):  
Jagendra Singh ◽  
Aditi Sharan

Pseudo-relevance feedback (PRF) is a relevance feedback approach to query expansion that treats the top-ranked retrieved documents as relevance feedback. In this paper, the authors examine the limitations of co-occurrence-based and PRF-based query expansion and propose a hybrid method that improves PRF-based query expansion by combining query term co-occurrence with the contextual information of query terms, both drawn from the corpus of top feedback documents retrieved in the first pass. First, the paper suggests a query term co-occurrence approach, based on the top retrieved feedback documents, to select an optimal combination of query terms from a pool of terms obtained using PRF-based query expansion. Second, a contextual-window-based approach is used to select terms related to the query context from the top feedback documents. Third, the baseline, co-occurrence, and contextual-window-based approaches are compared using different performance evaluation metrics. The experiments were performed on benchmark data, and the results show significant improvement over the baseline approach.
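As a rough illustration of the PRF step described above, the sketch below runs a first-pass ranking, treats the top-ranked documents as feedback, and adds the most frequent non-query terms from them to the query. This is plain frequency-based PRF only, not the authors' hybrid co-occurrence and contextual-window method. The scoring function, the stopword list, the sample documents, and the names `first_pass` and `prf_expand` are assumptions made for the example.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "for", "with"}

# Hypothetical mini-corpus used only to exercise the functions.
DOCS = [
    "lucene index search engine",
    "search engine ranking with lucene index",
    "banana bread recipe",
]


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def first_pass(query_terms, docs):
    """Rank documents by the number of query-term occurrences they contain.

    Returns indices of matching documents, best first; ties keep corpus order.
    """
    scored = []
    for idx, doc in enumerate(docs):
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in query_terms)
        scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in scored if score > 0]


def prf_expand(query, docs, top_docs=2, n_terms=2):
    """Expand the query with the n_terms most frequent feedback terms."""
    query_terms = tokenize(query)
    feedback = first_pass(query_terms, docs)[:top_docs]
    pool = Counter()
    for idx in feedback:
        pool.update(t for t in tokenize(docs[idx]) if t not in query_terms)
    # Most frequent first; alphabetical order breaks ties deterministically.
    candidates = sorted(pool.items(), key=lambda pair: (-pair[1], pair[0]))
    return query_terms + [term for term, _ in candidates[:n_terms]]


if __name__ == "__main__":
    print(prf_expand("lucene search", DOCS))
```

A hybrid method along the lines of the abstract would replace the raw frequency ranking of `candidates` with scores based on co-occurrence with the query terms and on proximity within a context window.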

