International Journal of Information Retrieval Research
Latest Publications


TOTAL DOCUMENTS

227
(FIVE YEARS 140)

H-INDEX

7
(FIVE YEARS 4)

Published by IGI Global

2155-6385, 2155-6377
Updated Friday, 05 November 2021

2022 ◽  
Vol 12 (1) ◽  
pp. 1-18
Author(s):  
Umamageswari Kumaresan ◽  
Kalpana Ramanujam

This research aims to develop an automated web scraping system capable of extracting structured data records embedded in semi-structured web pages. Most automated extraction techniques in the literature capture repeated patterns among a set of similarly structured web pages, deduce the template used to generate those pages, and then extract the data records. All of these techniques rely on computationally intensive operations such as string pattern matching or DOM tree matching and then require manual labeling of the extracted data records. The technique discussed in this paper departs from the state-of-the-art approaches by determining informative sections in the web page through repetition of informative content rather than syntactic structure. In the experiments, the system identified the data-rich region with 100% precision for web sites belonging to different domains. The experiments conducted on real-world web sites demonstrate the effectiveness and versatility of the proposed approach.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The Cubic Cell Formation Problem (CCFP) in cellular manufacturing systems consists of decomposing a production system into a set of manufacturing cells and assigning workers, as well as parts and machines, to cells. The major objective is to obtain manageable cells, that is, cells with a minimum value of inter-cell moves of parts and workers and a minimum value of heterogeneity within cells. This paper proposes a solution methodology based on a modified simulated annealing heuristic with a new neighbourhood search procedure. The methodology allows multiple configurations to be built by giving the decision-maker control over some parameters. Experimental results show that the proposed algorithm gives promising performance on all problem instances found in the literature.
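
As a rough illustration of the kind of heuristic described above, the following sketch applies a plain simulated annealing acceptance rule to a simplified two-dimensional (machine-part) cell formation problem. It is not the modified heuristic or neighbourhood procedure of the paper: the cost function (a count of inter-cell part moves), the move-one-element neighbourhood, the cooling parameters, and all function names are assumptions made for illustration. Workers and within-cell heterogeneity are left out, so this toy objective can be minimised trivially by a single cell.

```python
import math
import random


def exceptional_elements(incidence, machine_cell, part_cell):
    """Count operations where a part needs a machine assigned to another cell."""
    return sum(
        1
        for m, row in enumerate(incidence)
        for p, used in enumerate(row)
        if used and machine_cell[m] != part_cell[p]
    )


def anneal(incidence, n_cells, t_start=5.0, t_end=0.01, alpha=0.95,
           moves_per_temp=50, seed=0):
    """Simulated annealing over machine and part cell assignments (toy version)."""
    rng = random.Random(seed)
    machine_cell = [rng.randrange(n_cells) for _ in incidence]
    part_cell = [rng.randrange(n_cells) for _ in incidence[0]]
    cost = exceptional_elements(incidence, machine_cell, part_cell)
    best = (cost, machine_cell[:], part_cell[:])
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Neighbour move (assumed): reassign one machine or one part.
            target = machine_cell if rng.random() < 0.5 else part_cell
            i = rng.randrange(len(target))
            old = target[i]
            target[i] = rng.randrange(n_cells)
            new_cost = exceptional_elements(incidence, machine_cell, part_cell)
            delta = new_cost - cost
            # Accept improvements always, worse moves with probability exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best[0]:
                    best = (cost, machine_cell[:], part_cell[:])
            else:
                target[i] = old  # reject: undo the move
        t *= alpha  # geometric cooling (assumed schedule)
    return best
```

The parameters `t_start`, `alpha`, and `moves_per_temp` stand in for the kind of settings a decision-maker might tune; the paper's actual controllable parameters are not specified in the abstract.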


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

This paper proposes a novel hybrid framework with a BWO-based feature reduction technique that combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant and unique features among those extracted by the proposed approach, which can be eliminated by adopting an effective feature reduction technique. With the proposed BWO approach, the feature-set size is reduced by up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms the commonly used PSO- and GA-based feature selection techniques, with a reduced computation time of 21 sec. Moreover, the sentiment analysis approach is evaluated using performance metrics such as precision, recall, F-measure, and computation time. Organizations can use these online reviews to make well-informed decisions about users’ interests and preferences, to enhance customer satisfaction and product quality, and to find aspects of products to improve, thereby generating more profit.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The traditional frequency-based approach to creating multi-document extractive summaries ranks sentences by scores computed by summing the TF*IDF weights of the words they contain. In this approach, TF, or term frequency, is calculated from how frequently a term (word) occurs in the input, and TF calculated in this way does not take into account the semantic relations among terms. In this paper, we propose methods that exploit semantic term relations to improve the sentence ranking and redundancy removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer improves significantly when the distributional term similarity measure is used for finding semantic term relations. Our multi-document text summarizer also outperforms some well-known summarization baselines to which it is compared.
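
The traditional baseline described above, in which each sentence is scored by summing the TF*IDF weights of its words, can be sketched as follows. This is only the frequency-based baseline, not the semantic method the paper proposes. The whitespace tokenisation, the `log(N/df) + 1` form of IDF, and the function name are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_sentence_scores(documents):
    """Rank sentences by the sum of TF*IDF weights of their words.

    documents: list of documents, each given as a list of sentence strings.
    TF is the term count over the whole input; IDF is computed over documents.
    Returns (score, sentence) pairs sorted from highest to lowest score.
    """
    tokenized = [[s.lower().split() for s in doc] for doc in documents]
    # Term frequency over the entire input collection.
    tf = Counter(w for doc in tokenized for sent in doc for w in sent)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in tokenized:
        df.update({w for sent in doc for w in sent})
    n_docs = len(documents)
    # The "+ 1" keeps terms that occur in every document from scoring zero
    # (an assumed smoothing choice).
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in df}
    scores = []
    for d, doc in enumerate(tokenized):
        for s, sent in enumerate(doc):
            scores.append((sum(tf[w] * idf[w] for w in sent), documents[d][s]))
    return sorted(scores, key=lambda pair: pair[0], reverse=True)
```

Because TF here is a raw count, two different words with the same meaning are weighted independently, which is the limitation the abstract points to.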


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Understanding the user's actual need from a question is crucial in non-factoid why-question answering, as why-questions are complex and involve ambiguity and redundancy. The precise requirement is to determine the focus of the question and reformulate it accordingly to retrieve the expected answers. The paper analyzes different types of why-questions and proposes an algorithm for each class to determine the focus and reformulate the question into a query by appending focal terms and the cue phrase ‘because’. Further, a user interface is implemented that takes a why-question as input, applies the different components to the question, reformulates it, and finally retrieves web pages by posing the query to the Google search engine. To measure the accuracy of the process, user feedback is collected: users assign a score from 1 to 10 for how relevant the retrieved web pages are according to their understanding. The results show a maximum precision of 89% for informational why-questions and a minimum of 48% for opinionated why-questions.
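
The reformulation step, in which focal terms are combined with the cue phrase ‘because’ to form a query, might look roughly like the sketch below. The paper's per-class focus-detection algorithms are not given in the abstract, so the focus is approximated here by simple stopword removal. The stopword list and the function name are hypothetical.

```python
import re

# Hypothetical stopword list for illustration only; not taken from the paper.
STOPWORDS = {"why", "is", "are", "was", "were", "do", "does", "did",
             "the", "a", "an", "of", "to", "in", "on"}


def reformulate_why_question(question):
    """Turn a why-question into a query: focal terms followed by 'because'."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    # Assumed focus heuristic: keep every token that is not a stopword.
    focal_terms = [t for t in tokens if t not in STOPWORDS]
    return " ".join(focal_terms + ["because"])


print(reformulate_why_question("Why is the sky blue?"))  # sky blue because
```

The resulting string would then be posed to a search engine; that retrieval step is omitted here.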


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Improving the quality of education is a challenging activity in every educational institution. This research paper proposes a model representing the challenges of managing the trade-off involved in maintaining the philosophy of continuous quality improvement and strict control in Higher Education Institutions (HEIs). Several standard criteria, performance parameters, and Key Performance Indicators are studied and suggested for a quality self-assessment approach. After the data is collected, significant features are selected for analysis using dedicated gain, which is designed by integrating information gain with dedicated weight constants. Next, deep learning methodologies such as regression analysis, the artificial neural network, and the Matlab model are used to evaluate the academic quality of institutions. Finally, areas of development are recommended to the administrators of the institutions using the probabilistic model, based on the prediction made by a deep neural network.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The irregularity of the Indian grid system increases with increasing power demand. The quality of power supplied by the grid is also poor due to continuous variation in frequency and voltage. To overcome this power deficit, the installed capacity of Captive Power Plants (CPPs) has grown at a faster rate. Here, short-term load forecasting of the Yara Fertilizers India Private Limited plant installed at Babrala, Uttar Pradesh, is performed using a multi-layer feed-forward neural network in MATLAB. The algorithm used is the Levenberg-Marquardt algorithm. Training the ANN is very fast, and its results are accurate. The inputs given to the neural network are time, ambient air temperature from the compressor, cool air temperature at the compressor, and IGV opening. The need for, benefits of, and growth of CPPs in India, and the use of ANNs for short-term load forecasting of CPPs, are explained in detail in the paper.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

A new deep learning-based classification model called the Stochastic Dilated Residual Ghost (SDRG) is proposed in this work for categorizing histopathology images of breast cancer. The SDRG model uses the proposed Multiscale Stochastic Dilated Convolution (MSDC) model, a ghost unit, and stochastic upsampling and downsampling units to categorize breast cancer accurately. This study addresses four primary issues. First, stain normalization is used to manage color divergence, and data augmentation with several factors is used to handle overfitting. The second challenge is extracting and enhancing tiny and low-level information such as edge, contour, and color accuracy; this is done by the proposed multiscale stochastic and dilation unit. The third contribution is to remove redundant or similar information from the convolutional neural network using a ghost unit. According to the assessment findings, the SDRG model achieved an overall accuracy of 95.65 percent in categorizing images, with a precision of 99.17 percent, superior to state-of-the-art approaches.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Time-evolving networks tend to have an element of regularity. This regularity is characterized by the existence of repetitive patterns in the data sequences of the graph metrics. To our knowledge, the relevance of such regular patterns to the network has not been adequately explored. Such patterns in certain data sequences are indicative of properties like popularity and activeness, which are of vital significance for any network. These properties are closely indicated by the data sequences of three graph metrics: degree prestige, degree centrality and occurrence. In this paper, (a) an improved mining algorithm is used to extract regular patterns in these sequences, and (b) a methodology is proposed to quantitatively analyse the behavior of the obtained patterns. To analyze this behavior, a quantification measure called "Sumscore" is defined to compare the relative significance of such patterns. The patterns are ranked according to their Sumscores, and insights are then drawn from the ranking. The efficacy of this method is demonstrated by experiments on two real-world datasets.
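
To make the notion of a "data sequence of a graph metric" concrete, the sketch below computes the degree centrality of one node across a series of network snapshots. It covers only this first step; the pattern mining algorithm and the Sumscore measure are not defined in the abstract and are not reproduced. The undirected edge-list representation, the fixed node count, and the function name are assumptions made for illustration.

```python
def degree_centrality_sequence(snapshots, node, n_nodes):
    """Degree centrality of `node` in each snapshot: degree / (n_nodes - 1).

    snapshots: list of edge lists, one per time step; each edge is a (u, v)
    pair and is treated as undirected (an assumption).
    """
    sequence = []
    for edges in snapshots:
        neighbours = set()
        for u, v in edges:
            if u == node and v != node:
                neighbours.add(v)
            elif v == node and u != node:
                neighbours.add(u)
        sequence.append(len(neighbours) / (n_nodes - 1))
    return sequence


snapshots = [[(0, 1), (0, 2)], [(1, 2)], [(0, 1), (0, 2), (0, 3)]]
print(degree_centrality_sequence(snapshots, node=0, n_nodes=4))
```

A sequence like this, one value per time step, is the kind of input in which repetitive patterns would then be searched for.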


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Landsat 7 Enhanced Thematic Mapper Plus satellite images are an important data source for many remote sensing applications. An effective image restoration method is proposed to fill in the missing information in these images. The satellite images are segmented to find the SLIC superpixels and then the image segments. Boundary reconstruction is performed using edge matching to find the area of the missing region. Peak Signal to Noise Ratio (PSNR) and Root Mean Square Error (RMSE) are computed with and without boundary reconstruction to evaluate the quality and the error rate of the satellite images. The results show that the method can predict the missing values accurately in terms of quality and time, without the need for external information. In the red channel of an image, PSNR changed from 25 to 90 and RMSE from 180 to 4, indicating high image quality and a low error rate.
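
For reference, the two evaluation metrics named above can be computed for a single image channel as in the following sketch. It uses the standard textbook definitions; the flattened list representation of a channel and the default peak value of 255 (8-bit data) are assumptions, and the scale used in the paper's reported figures is not stated in the abstract.

```python
import math


def rmse(original, restored):
    """Root mean square error between two equally sized pixel sequences."""
    if len(original) != len(restored) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    return math.sqrt(mse)


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = rmse(original, restored)
    if err == 0:
        return math.inf
    # Equivalent to 10 * log10(max_value**2 / MSE).
    return 20 * math.log10(max_value / err)
```

A lower RMSE and a higher PSNR both indicate that the restored channel is closer to the reference.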

