web information retrieval
Recently Published Documents


TOTAL DOCUMENTS

233
(FIVE YEARS 12)

H-INDEX

14
(FIVE YEARS 1)

2021 ◽  
Vol 50 (02) ◽  
Author(s):  
TẠ DUY CÔNG CHIẾN

In recent years, many applications in the semantic web, information retrieval, information extraction, and question answering have applied ontologies. To avoid conceptual and terminological confusion, an ontology is built as a taxonomy that identifies and distinguishes concepts as well as terminology. It accomplishes this by specifying a set of generic concepts that characterize the domain, together with their definitions and interrelationships. Ontologies can be represented in several ways, such as Resource Description Framework (RDF), Web Ontology Language (OWL), or databases, depending on the characteristics of the data. RDF and OWL are usually used when the data consists of objects whose relationships are simple. When the relationships among the objects are more complex, storing the ontology in a database is a better approach. However, relational databases do not sufficiently support semantic-oriented search through Structured Query Language (SQL), and searching is slow. Therefore, this paper introduces an approach to extending query sentences for semantic-oriented search on a knowledge graph.


Author(s):  
Suruchi Chawla

The convolutional neural network (CNN) is one of the most popular deep learning methods and has been used for applications such as image recognition, computer vision, and natural language processing. This chapter explains the application of CNNs to web query session mining for effective information retrieval. CNNs have been used in document analysis to capture the rich contextual structure of a search query or of document content. The document content, represented in matrix form using Word2Vec, is passed to the CNN, where convolution and max-pooling operations generate a fixed-length document feature vector. This fixed-length feature vector is the input to a fully connected neural network (FNN), which generates the semantic document vector. The semantic document vectors are then clustered to group similar documents for effective web information retrieval. An experiment was performed on a data set of web query sessions, and the results confirm the effectiveness of CNNs in web query session mining for information retrieval.
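The convolution and max-pooling step described above can be illustrated with a minimal pure-Python sketch. It is not the chapter's implementation: the random vectors stand in for Word2Vec embeddings, the filter weights are untrained, and the function name `conv_max_pool` and all sizes are assumptions made for illustration. The point it shows is that documents of different lengths map to feature vectors of the same length (one value per filter); the fully connected layer and the clustering step are omitted.

```python
import random

def conv_max_pool(doc_matrix, filters, window):
    """Slide each filter over `window` consecutive word vectors, apply ReLU,
    and keep the largest response per filter (max-over-time pooling)."""
    n_words = len(doc_matrix)
    features = []
    for filt in filters:
        best = 0.0  # ReLU floor: negative responses are clipped to zero
        for start in range(n_words - window + 1):
            # Flatten the window of word vectors into a single vector.
            patch = [x for vec in doc_matrix[start:start + window] for x in vec]
            response = sum(w * x for w, x in zip(filt, patch))
            best = max(best, response)
        features.append(best)
    return features

rng = random.Random(0)
dim, window, n_filters = 4, 2, 3  # hypothetical toy sizes

# Untrained random filters; a real model would learn these weights.
filters = [[rng.uniform(-1, 1) for _ in range(dim * window)]
           for _ in range(n_filters)]

# Random stand-ins for Word2Vec matrices of a 5-word and a 12-word document.
short_doc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
long_doc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(12)]

short_vec = conv_max_pool(short_doc, filters, window)
long_vec = conv_max_pool(long_doc, filters, window)
print(len(short_vec), len(long_vec))
```

Both documents yield a vector with one entry per filter, which is what allows a fixed-size downstream network to accept documents of any length.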


2020 ◽  
Vol 4 (2) ◽  
pp. 14-25 ◽  
Author(s):  
Sandeep Suri ◽  
Arushi Gupta ◽  
Kapil Sharma

As technology evolves, huge amounts of data are being generated, and extracting the necessary data from such large volumes is a significantly complex process. The web generally contains a bulk of raw data, and a mining process can be performed to convert this data into information. Whenever a user submits a query to a particular web search tool, results are produced in response to the request, and they depend on the weight that web information retrieval tools compute for each document. The results are obtained through calculations and the implementation of well-written algorithms. Well-known web search tools such as Google and other engines each have their own way of computing page rank, so different search engines return different results for the same query, because the method for deciding the importance of sites differs among algorithms. This research attempts to analyze well-known page ranking algorithms on the basis of their strengths and shortcomings. The paper sheds light on some of the most popular ranking algorithms and attempts to find a better arrangement that can reduce the time spent looking through the list of sites.
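As a concrete reference point for the kind of algorithm surveyed above, the following is a minimal sketch of the classic PageRank power iteration. It is not taken from the paper and does not describe how any particular search engine ranks pages; the toy link graph, the function name `pagerank`, and the damping factor of 0.85 are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                # Each page splits its damped rank evenly among its out-links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical four-page web: D has no in-links, C is linked from A, B, and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph the page with the most in-links (C) ends up ranked highest and the page nobody links to (D) ranked lowest, which is the intuition link-based ranking algorithms share.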


Research in the field of Information Retrieval (IR) has a long history and has thrived over time, and most studies use the available standard datasets for testing and evaluation. In line with that, new datasets have also appeared to meet the needs of their respective studies. However, to the best of our knowledge, there is no dataset collected from web documents that focuses on the fruit domain. Therefore, in this paper we contribute to this field by publishing a dataset of web documents about fruit, focusing on durian. This durian dataset is suitable for query reformulation experiments, search systems, web information retrieval, and any search engine experiment. The dataset contains a collection of web documents about fruit and durian, a collection of queries, and a set of relevance judgements. In addition, we publish a list of frequently asked queries regarding durian and an extended list of query characteristic categories.


Information retrieval is a key technology for accessing the vast amount of data present on today’s World Wide Web. Numerous challenges arise at various stages of information retrieval from the web, such as missing many relevant documents, static user queries, and an ever-changing, enormous document collection. Therefore, more powerful strategies are required to search for relevant documents. In this paper, a particle swarm optimization (PSO) methodology hybridized with Simulated Annealing (SA) is proposed with the aim of optimizing the Web Information Retrieval (WIR) process. The hybridized PSO substantially reduces the query response time of the system and hence improves system efficiency. A novel similarity measure called SMDR acts as the fitness function in the hybridized PSO-SA algorithm. Evaluation measures such as accuracy, MRR, MAP, DCG, IDCG, F-measure, and specificity are used to measure the effectiveness of the proposed system and to compare it with an existing system. Experiments are carried out extensively on the large RCV1 collection. The precision-recall rates achieved demonstrate that the proposed system is considerably more effective than the existing one.
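Several of the evaluation measures named above have standard textbook definitions, sketched below for a single ranked result list. This is a minimal illustration, not the paper's evaluation code: the function names, document identifiers, and relevance labels are hypothetical, and the DCG variant shown uses the common gain / log2(rank + 1) form.

```python
import math

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant doc appears."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def dcg(gains):
    """Discounted cumulative gain: gain at rank i divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG normalized by the ideal DCG (IDCG) of the best possible ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical result list for one query; d1 and d2 are the relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
gains = [1 if doc in relevant else 0 for doc in ranked]

print(reciprocal_rank(ranked, relevant))    # first relevant doc is at rank 2
print(average_precision(ranked, relevant))  # (1/2 + 2/4) / 2
print(round(ndcg(gains), 4))
```

MRR and MAP are then the means of `reciprocal_rank` and `average_precision` over all queries in the test collection.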

