Real-time feedback query expansion technique for supporting scholarly search using citation network analysis

2019 ◽  
pp. 016555151986334 ◽  
Author(s):  
Shah Khalid ◽  
Shengli Wu ◽  
Aftab Alam ◽  
Irfan Ullah

Scholars routinely search for relevant papers to discover and put a new idea into proper context. Despite ongoing advances in scholarly retrieval technologies, locating relevant papers through keyword queries remains challenging due to the massive growth of research paper repositories. To tackle this problem, we propose a novel real-time feedback query expansion technique, a two-stage interactive scholarly search process. Upon receiving the initial search query, the retrieval system provides a ranked list of results. In the second stage, the user selects a few relevant papers, from which useful terms are extracted for query expansion. The expanded query is then run against the index in real time to generate the final list of research papers. In both stages, citation analysis is used to further improve the quality of the results. The novelty of the approach lies in the combined exploitation of query expansion and citation analysis, which can bring the most relevant papers to the top of the search results list. Experimental results on the Association of Computational Linguistics (ACL) Anthology Network data set demonstrate that the technique is effective and robust, outperforming several state-of-the-art approaches in terms of normalised discounted cumulative gain (nDCG), precision and recall.
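The abstract evaluates ranking quality with nDCG. As a quick illustration of how that metric works (using hypothetical graded relevance scores, not data from the paper), nDCG discounts each document's relevance by the logarithm of its rank position and normalises by the ideal ordering:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance discounted by log2 of rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalise by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance (0-3) of the top five retrieved papers.
ranked = [3, 2, 3, 0, 1]
score = ndcg(ranked)
```

A perfectly ordered result list yields an nDCG of 1.0; swapping relevant and irrelevant papers lowers the score.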

2019 ◽  
Vol 37 (3) ◽  
pp. 429-446 ◽  
Author(s):  
Michal Kačmařík ◽  
Jan Douša ◽  
Florian Zus ◽  
Pavel Václavovic ◽  
Kyriakos Balidakis ◽  
...  

Abstract. An analysis of the impact of processing settings on estimated tropospheric gradients is presented. The study is based on the benchmark data set collected within the COST GNSS4SWEC action, with observations from 430 Global Navigation Satellite Systems (GNSS) reference stations in central Europe for May and June 2013. Tropospheric gradients were estimated in eight different variants of GNSS data processing using precise point positioning (PPP) with the G-Nut/Tefnut software. The impacts of the gradient mapping function, elevation cut-off angle, GNSS constellation, observation elevation-dependent weighting and real-time versus post-processing mode were assessed by comparing the variants with each other and by evaluating them with respect to tropospheric gradients derived from two numerical weather models (NWMs). Tropospheric gradients estimated in post-processing GNSS solutions using final products were in good agreement with NWM outputs. Estimating high-resolution gradients in (near-)real-time PPP analysis remains challenging because of the quality of the real-time orbit and clock corrections. Comparisons of GNSS and NWM gradients suggest a 3° elevation angle cut-off and the GPS+GLONASS constellation for obtaining optimal gradient estimates, provided that precise models for antenna phase centre offsets and variations, as well as tropospheric mapping functions, are applied for low-elevation observations. Finally, systematic errors can affect the gradient components solely due to the use of different gradient mapping functions, with the magnitude further depending on the observation elevation-dependent weighting. A latitudinal tilting of the troposphere on a global scale causes a systematic difference of up to 0.3 mm in the north gradient component, while large local gradients, usually pointing in the direction of increasing humidity, can cause differences of up to 1.0 mm (or even more in extreme cases) in either component, depending on the actual direction of the gradient.
Although the Bar-Sever gradient mapping function provided slightly better results in some respects, no strong recommendation can be made on the choice of gradient mapping function.
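To make the role of the gradient mapping function concrete, the following sketch computes the azimuth-dependent part of the slant tropospheric delay using the widely used Chen and Herring (1997) gradient mapping function (with C = 0.0032, the value typically used for wet gradients). The elevation, azimuth and gradient values are illustrative, not taken from the study:

```python
import math

def chen_herring_grad_mf(elev_rad, c=0.0032):
    # Chen & Herring (1997) gradient mapping function, 1 / (sin(e) tan(e) + C).
    return 1.0 / (math.sin(elev_rad) * math.tan(elev_rad) + c)

def gradient_delay_mm(elev_deg, azim_deg, g_north_mm, g_east_mm):
    # Azimuth-dependent slant delay contribution:
    # mf_g(e) * (G_N * cos(A) + G_E * sin(A)).
    e = math.radians(elev_deg)
    a = math.radians(azim_deg)
    return chen_herring_grad_mf(e) * (
        g_north_mm * math.cos(a) + g_east_mm * math.sin(a)
    )

# A 1 mm north gradient observed at 5 degrees elevation towards north
# amplifies to tens of millimetres of slant delay.
low_elev_delay = gradient_delay_mm(5.0, 0.0, 1.0, 0.0)
```

This illustrates why low-elevation observations (e.g. a 3° cut-off) are so valuable for gradient estimation: the mapping factor grows rapidly as the elevation angle decreases, so small horizontal gradients leave a measurable signature.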


2020 ◽  
Author(s):  
Ashok G V ◽  
Dr.Vasanthi Kumari P

Telecom networks generate large and varied data sets related to networks, applications, users, network operations and real-time call processing (Call Detail Records (CDRs)). These data can yield valuable business insights, for example real-time user quality of service, network issues, call-drop issues, customer satisfaction index, customer churn, network capacity forecasts and many more revenue-impacting insights. Simply setting up more towers for better coverage would also directly affect the health of the inhabitants around them. In this paper, the overall condition of call drops is reviewed, along with possible ways to minimise network call drops. A linear regression algorithm, a widely used type of predictive analysis, is applied. Regression analysis has three major uses: determining the strength of predictors, forecasting an effect, and trend forecasting. This paper helps telecom service providers to improve their networks, minimise network call drops with security, and deliver quality of service to their subscribers using advanced technologies with accurate algorithms.
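As a minimal sketch of the linear regression approach described above (with entirely hypothetical CDR-derived numbers, since the paper's data are not given), an ordinary least-squares fit of call-drop rate against cell load can be used both to gauge the strength of the predictor and to forecast drop rates:

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b*x (single predictor).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical CDR-derived data: cell load (%) vs observed call-drop rate (%).
load = [40, 55, 60, 75, 85, 95]
drops = [0.5, 0.9, 1.1, 1.6, 2.1, 2.6]
a, b = fit_linear(load, drops)
predicted = a + b * 70  # forecast the drop rate at 70 % load
```

A positive slope `b` would confirm load as a predictor of call drops; the fitted line then supports capacity forecasting before drops degrade service.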


Author(s):  
Ganesh Chandra ◽  
Sanjay K. Dwivedi

The quality of retrieved documents in CLIR is often poor compared to that of IR systems due to (1) query mismatch, (2) multiple representations of query terms, and (3) untranslated query terms. Inappropriate translation may lead to poor-quality results. Hence, automated query translation is performed using the back-translation approach to improve query translation. This chapter mainly focuses on query expansion (Q.E.) and proposes an algorithm to address the query drift issue for Hindi-English CLIR. The system uses FIRE datasets and a set of 50 Hindi-language queries for evaluation. The purpose of the term-ordering-based algorithm is to resolve the query drift issue in Q.E. The results show that the relevance of Hindi-English CLIR is improved by performing Q.E. using the term-ordering-based algorithm, which achieved 60.18% accuracy, compared with 57.46% for Q.E. without term ordering.
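The chapter's exact term-ordering algorithm is not reproduced here, but the idea of ordering candidate expansion terms to limit query drift can be sketched as follows: rank candidates by how strongly they co-occur with the original query terms in the retrieved documents, so that off-topic terms fall to the bottom. The query, candidates and documents below are invented for illustration:

```python
from collections import Counter

def order_expansion_terms(query_terms, candidate_terms, documents):
    # Score each candidate expansion term by how often it co-occurs with the
    # original query terms inside the same document; terms closest to the
    # query intent are appended to the expanded query first.
    scores = Counter()
    query_set = set(query_terms)
    for doc in documents:
        words = set(doc.lower().split())
        overlap = len(words & query_set)
        for term in candidate_terms:
            if term in words:
                scores[term] += overlap
    return [term for term, _ in scores.most_common()]

docs = [
    "monsoon rainfall prediction using satellite data",
    "rainfall statistics for the monsoon season",
    "cricket match schedule for the season",
]
ordered = order_expansion_terms(
    ["monsoon", "rainfall"], ["satellite", "season", "cricket"], docs
)
```

Here "cricket" only co-occurs with zero query terms, so it ranks last; expanding with it would drift the query away from the weather topic.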


2018 ◽  
Author(s):  
Michal Kačmařík ◽  
Jan Douša ◽  
Florian Zus ◽  
Pavel Václavovic ◽  
Kyriakos Balidakis ◽  
...  

Abstract. An analysis of the impact of processing settings on estimated tropospheric gradients is presented. The study is based on the benchmark data set collected within the COST GNSS4SWEC action, with observations from 430 GNSS reference stations in central Europe for May and June 2013. Tropospheric gradients were estimated in eight different variants of GNSS data processing using precise point positioning with the G-Nut/Tefnut software. The impacts of the gradient mapping function, elevation cut-off angle, GNSS constellation and real-time versus post-processing mode were assessed by comparing the variants with each other and by evaluating them with respect to tropospheric gradients derived from two numerical weather prediction models. Generally, all the solutions in the post-processing mode provided robust tropospheric gradient estimates with a clear relation to real weather conditions. The quality of tropospheric gradient estimates in real-time mode mainly depends on the actual quality of the real-time orbits and clocks. The best results were achieved using a 3° elevation angle cut-off and a combined GPS + GLONASS constellation. Systematic effects of up to 0.3 mm were observed in estimated tropospheric gradients when using different gradient mapping functions, depending on the applied observation elevation-dependent weighting. While latitudinal tilting of the troposphere causes a systematic difference in the north gradient component on a global scale, large local wet gradients pointing in the direction of increased humidity cause systematic differences in both gradient components, depending on the gradient direction.


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8160
Author(s):  
Meijing Gao ◽  
Yang Bai ◽  
Zhilong Li ◽  
Shiyu Li ◽  
Bozhi Zhang ◽  
...  

In recent years, jellyfish outbreaks have frequently occurred in offshore areas worldwide, posing a significant threat to marine fisheries, tourism, coastal industry, and personal safety. Effective monitoring of jellyfish is a vital method to solve the above problems. However, optical detection methods for jellyfish are still at a primary stage. Therefore, this paper studies a jellyfish detection method based on convolutional neural network theory and digital image processing technology. We also study underwater image preprocessing algorithms, because the quality of underwater images directly affects the detection results. The results show that image quality is better after applying three algorithms, namely dark channel prior defogging, adaptive histogram equalization, and multi-scale retinex enhancement, which is more conducive to detection. We establish a data set containing seven species of jellyfish and fish, comprising 2141 images in total. The YOLOv3 algorithm is used to detect jellyfish, and its feature extraction network Darknet53 is optimized to enable real-time detection. In addition, we introduce label smoothing and a cosine annealing learning rate schedule during the training process. The experimental results show that the improved algorithms increase the detection accuracy of jellyfish while maintaining detection speed. This paper lays a foundation for the construction of an underwater jellyfish optical imaging real-time monitoring system.
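The two training tricks mentioned above are both simple to state. A minimal sketch (with illustrative hyperparameter values, since the paper's settings are not given here): cosine annealing decays the learning rate along half a cosine cycle, and label smoothing softens the hard one-hot classification targets:

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    # Cosine annealing: decay the learning rate from lr_max to lr_min
    # along half a cosine cycle over the training run.
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term

def smooth_labels(one_hot, epsilon=0.1):
    # Label smoothing: replace hard 0/1 targets with softened values
    # so the network is not pushed toward over-confident predictions.
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]
```

For a 7-class jellyfish data set, the smoothed target for the true class becomes 1 - epsilon + epsilon/7, with the remainder spread over the other classes; the learning rate decays smoothly rather than in steps.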


Author(s):  
Brad Morantz

Mining a large data set can be time consuming, and without constraints, the process can generate sets or rules that are invalid or redundant. Some methods, for example clustering, are effective but can be extremely time consuming for large data sets; as the set grows in size, the processing time grows exponentially. In other situations, without guidance via constraints, the data mining process might find morsels that have no relevance to the topic, or are trivial and hence worthless. The knowledge extracted must be comprehensible to experts in the field (Pazzani, 1997). With time-ordered data, finding things that are in reverse chronological order might produce an impossible rule: certain actions always precede others, some things happen together while others are mutually exclusive, and sometimes there are maximum or minimum values that cannot be violated. Must an observation fit all of the requirements, or just most of them? And how many is "most"? Constraints attenuate the amount of output (Hipp & Guntzer, 2002). By doing a first-stage constrained mining, that is, going through the data and finding records that fulfill certain requirements before the next processing stage, time can be saved and the quality of the results improved. The second stage might also contain constraints to further refine the output. Constraints help to focus the search or mining process and reduce the computational time. This has been empirically shown to improve cluster purity (Wagstaff & Cardie, 2000; Hipp & Guntzer, 2002). The theory behind these results is that the constraints help guide the clustering, showing where to connect and what to avoid. The application of user-provided knowledge, in the form of constraints, reduces the hypothesis space and can reduce the processing time and improve the learning quality.
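The first-stage constrained mining described above can be sketched as a simple pre-filter that enforces value-range and chronological-order constraints before any expensive clustering runs. The record structure and constraint names below are illustrative, not from the original text:

```python
def first_stage_filter(records, min_value=None, max_value=None, require_order=False):
    # Stage 1: keep only records satisfying the user-supplied constraints,
    # so the expensive second-stage mining never sees invalid data.
    kept = []
    for rec in records:
        values = rec["events"]
        if min_value is not None and min(values) < min_value:
            continue
        if max_value is not None and max(values) > max_value:
            continue
        # Chronological-order constraint: event values must be non-decreasing,
        # ruling out impossible reverse-ordered rules.
        if require_order and any(a > b for a, b in zip(values, values[1:])):
            continue
        kept.append(rec)
    return kept

data = [
    {"id": 1, "events": [1, 2, 3]},
    {"id": 2, "events": [3, 2, 1]},   # violates the chronological order
    {"id": 3, "events": [1, 2, 99]},  # violates the maximum-value constraint
]
subset = first_stage_filter(data, max_value=10, require_order=True)
```

Only the valid record survives to the second stage, shrinking both the hypothesis space and the clustering workload.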


2020 ◽  
Vol 10 (4) ◽  
pp. 6102-6108
Author(s):  
S. Khalid ◽  
S. Wu

Published scholarly articles have increased exponentially in recent years. This growth has brought challenges for academic researchers in locating the most relevant papers in their fields of interest. The reasons for this vary. There is the fundamental problem of synonymy and polysemy, and the query terms might be too short to distinguish between papers. Also, a new researcher has limited knowledge and often is not sure what she is looking for until the results are displayed. These issues hinder scholarly retrieval systems in locating highly relevant publications for a given search query. Researchers seek to tackle these issues; however, the user's intent cannot be addressed entirely by a direct information retrieval technique. In this paper, a novel approach is proposed which combines query expansion and citation analysis for supporting scholarly search. It is a two-stage academic search process. Upon receiving the initial search query, in the first stage, the retrieval system provides a ranked list of results. In the second stage, the highest-scoring Term Frequency-Inverse Document Frequency (TF-IDF) terms are obtained from a few top-ranked papers for query expansion behind the scenes. In both stages, citation analysis is used to further refine the quality of the academic search. The originality of the approach lies in the combined exploitation of query expansion by pseudo-relevance feedback and citation network analysis, which can bring the most relevant papers to the top of the search results list. The approach is evaluated on the ACL dataset. The experimental results reveal that the technique is effective and robust for locating relevant papers in terms of normalized Discounted Cumulative Gain (nDCG), precision, and recall.
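The second stage described above, extracting the highest-scoring TF-IDF terms from a few top-ranked papers, can be sketched in a few lines. The toy collection below is invented; a real system would of course work against the full index:

```python
import math
from collections import Counter

def top_tfidf_terms(top_docs, all_docs, k=3):
    # Pseudo-relevance feedback: score terms from the top-ranked documents
    # by TF-IDF against the whole collection, and return the k best as
    # candidate expansion terms.
    n = len(all_docs)
    df = Counter()
    for doc in all_docs:
        df.update(set(doc.split()))
    scores = Counter()
    for doc in top_docs:
        tf = Counter(doc.split())
        for term, freq in tf.items():
            scores[term] += freq * math.log(n / df[term])
    return [term for term, _ in scores.most_common(k)]

collection = [
    "citation network analysis for scholarly search",
    "query expansion improves scholarly search",
    "citation analysis of scholarly search results",
    "weather forecast for scholarly search today",
]
# Pretend the first three documents were the top-ranked results.
expansion_terms = top_tfidf_terms(collection[:3], collection, k=3)
```

Terms that occur in every document (here "scholarly" and "search") get zero IDF weight, so the expansion favours discriminative vocabulary such as "citation".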


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Miguel Angel M. A. Sanchez-Tena ◽  
Cristina C. Alvarez-Peregrina ◽  
Cesar C. Villa-Collar

Introduction. Dry eye is one of the most frequent eye problems, with prevalence and incidence figures ranging from 5% to 50%. Citation network analysis allows us to simplify information visually and provides a better understanding of the research done in a specific field. The objective of this paper is to quantify and analyse the relationships among the scientific literature in this field using citation network analysis. Materials and Methods. The program used to analyse the citations was CitNetExplorer®. Papers published in the research field during a predefined period were first retrieved using keywords defined in Web of ScienceTM (WOS). Results. Using the keyword "dry eye" for the period 2007 to 2018, the most cited paper is by Lemp, MA (2007), with a citation index score of 913 in our citation network containing the 6,500 most relevant papers. Through clustering, we found 5 relevant groups that match the main areas of research in this field: definition and classification, treatment, retina, refractive surgery, and quality of vision. The Core Publication set comprises 64% of the papers in the network, a high percentage that indicates a clear focus in the research carried out in this field. Conclusions. This citation network analysis shows definition and classification of dry eye to be the most researched area in this field, followed by treatment.
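The citation index score used above is, at its core, an in-network citation count. A minimal sketch (with a toy network whose paper identifiers are made up; only "Lemp2007" echoes the abstract's example) of how such scores are tallied from a citation graph:

```python
from collections import Counter

def citation_scores(citations):
    # citations: mapping of each paper to the list of papers it cites.
    # A paper's score is how often it is cited within the network.
    counts = Counter()
    for cited_list in citations.values():
        counts.update(cited_list)
    return counts

# Toy citation network: later papers cite earlier ones.
network = {
    "Lemp2007": [],
    "PaperA": ["Lemp2007"],
    "PaperB": ["Lemp2007", "PaperA"],
    "PaperC": ["Lemp2007", "PaperB"],
}
scores = citation_scores(network)
```

Tools such as CitNetExplorer build on exactly this kind of graph, adding clustering and visualisation on top of the raw citation links.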


2018 ◽  
Vol 37 (1) ◽  
pp. 39-51
Author(s):  
Kuo-Chung Chu ◽  
Hsin-Ke Lu ◽  
Wen-I Liu

Online e-journal databases enable scholars to search the literature in a research domain, or to cross-search an interdisciplinary field. The key literature can thereby be efficiently mapped out. This study builds a Web-based citation analysis system consisting of four modules: (1) literature search; (2) statistics; (3) articles analysis; and (4) co-citation analysis. The system focuses on the PubMed Central dataset and facilitates specific keyword searches in each research domain in terms of authors, journals, and core issues. In addition, we use data mining techniques for co-citation analysis. The results could assist researchers to develop an in-depth understanding of the research domain. An automated system for co-citation analysis promises to facilitate understanding of the changing trends that affect the journal structure of research domains. The proposed system has the potential to become a value-added database of the healthcare domain, which will benefit researchers.
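The co-citation analysis module described above rests on a simple counting idea: two papers are co-cited whenever they appear together in the reference list of the same citing article. A minimal sketch with invented reference lists (the PubMed Central pipeline itself is not reproduced here):

```python
from collections import Counter
from itertools import combinations

def co_citation_counts(reference_lists):
    # Count, for every pair of papers, how many citing articles
    # list both of them in their references.
    pairs = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical reference lists of three citing articles.
reference_lists = [
    ["P1", "P2", "P3"],
    ["P1", "P2"],
    ["P2", "P3"],
]
pairs = co_citation_counts(reference_lists)
```

Frequently co-cited pairs suggest topical relatedness, which is what allows co-citation counts to reveal the journal and topic structure of a research domain.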

