Named Entity Based Ranking with Term Proximity for XML Retrieval

2018 ◽  
Vol 8 (2) ◽  
pp. 57-77 ◽  
Author(s):  
Abubakar Roko ◽  
Shyamala Doraisamy ◽  
Azreen Azman ◽  
Azrul Hazri Jantan

This article proposes an indexing scheme that records the named entity category of each indexed term. Two methods build on this scheme: the first infers the semantics of an XML element from its data content, expressed as the confidence value of the element, and the second computes the proximity score of the query terms. The confidence value of an element is obtained from the probability of a named entity category in the data content of the underlying XML element. The proximity score measures the proximity and ordering of the query terms within an XML element. The article then shows how a ranking function uses the confidence value of an XML element and the proximity score to mitigate the impact of high-frequency terms and to compute the relevance between a keyword query and an XML fragment. Finally, a keyword search system is introduced, and experiments show that the proposed system outperforms existing approaches in search quality and achieves higher efficiency.
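The abstract does not give the formulas, so the following is only a minimal sketch of the two quantities it names, under stated assumptions: the confidence value is estimated as the share of an element's terms tagged with a given named entity category, and the proximity score rewards query terms that occur close together and in query order. The function names, the category labels, and the 0.5 penalty for out-of-order terms are illustrative choices, not the authors' definitions.

```python
from collections import Counter


def confidence_value(term_categories, category):
    """Assumed estimate: fraction of an element's terms tagged with `category`."""
    if not term_categories:
        return 0.0
    return Counter(term_categories)[category] / len(term_categories)


def proximity_score(element_terms, query_terms):
    """Toy score in [0, 1]: higher when query terms are close and in query order.

    Returns 0.0 if any query term is missing from the element.
    Only the first occurrence of each query term is considered.
    """
    positions = []
    for term in query_terms:
        if term not in element_terms:
            return 0.0
        positions.append(element_terms.index(term))
    if len(set(query_terms)) < 2:
        return 1.0
    # Window covering all matched terms; equals the number of distinct
    # query terms when they are adjacent.
    span = max(positions) - min(positions) + 1
    closeness = len(set(query_terms)) / span
    in_order = positions == sorted(positions)
    # Hypothetical penalty: halve the score when the order differs.
    return closeness if in_order else 0.5 * closeness


# Hypothetical element content and its per-term category tags.
terms = ["tom", "hanks", "stars", "in", "forrest", "gump"]
tags = ["PERSON", "PERSON", "O", "O", "MOVIE", "MOVIE"]

person_confidence = confidence_value(tags, "PERSON")
adjacent = proximity_score(terms, ["tom", "hanks"])
reversed_order = proximity_score(terms, ["hanks", "tom"])
far_apart = proximity_score(terms, ["tom", "gump"])
```

A ranking function could combine the two values, for example by multiplying them, but the actual combination used in the article is not stated in the abstract.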

2015 ◽  
Vol 11 (1) ◽  
pp. 33-53
Author(s):  
Abubakar Roko ◽  
Shyamala Doraisamy ◽  
Azrul Hazri Jantan ◽  
Azreen Azman

Purpose – The purpose of this paper is to propose and evaluate XKQSS, a query structuring method that shifts the task of generating structured queries from the user to the search engine while retaining the simple keyword search interface. Structured queries are a more effective way to search an XML database. However, expressing queries in a query language proves difficult for most users, since it requires learning the language and knowing the underlying data schema. On the other hand, the success of Web search engines has made many users familiar with keyword search, and they therefore prefer a keyword search interface for searching XML data. Design/methodology/approach – Existing query structuring approaches require users to provide structural hints in their input keyword queries even though their interface is keyword-based. Other problems with existing systems include their inability to take keyword query ambiguities into account during query structuring and the difficulty of selecting, among the generated structured queries, the one that best represents a given keyword query. To address these problems, this study allows users to submit a schema-independent keyword query, uses named entity recognition (NER) to categorize query keywords and so resolve query ambiguities, and computes semantic information for a node from its data content. Algorithms are proposed that find user search intentions and convert them into a set of ranked structured queries. Findings – Experiments with the Sigmod and IMDB datasets were conducted to evaluate the effectiveness of the method. The experimental results show that XKQSS is about 20 per cent more effective at identifying return nodes than XReal, a state-of-the-art system for XML retrieval. Originality/value – Existing systems do not take keyword query ambiguities into account. XKQSS consists of two guidelines based on NER that help to resolve these ambiguities before converting the submitted query. It also includes a ranking function that computes a score for each generated query using both semantic information and data statistics, as opposed to the data-statistics-only approach used by existing systems.


Neurology ◽  
2021 ◽  
pp. 10.1212/WNL.0000000000011892
Author(s):  
Yeonwoo Kim ◽  
Erica Twardzik ◽  
Suzanne E. Judd ◽  
Natalie Colabianchi

Objective: To summarize overall patterns of the impact of neighborhood socioeconomic status (nSES) on incident stroke and uncover potential gaps in the literature, we conducted a systematic review of studies examining the association between nSES and incident stroke, independent of individual socioeconomic status (SES). Methods: Four electronic databases and reference lists of included articles were searched, and corresponding authors were contacted to locate additional studies. A keyword search strategy included the three broad domains of neighborhood, SES, and stroke. Eight studies met our inclusion criteria (e.g., nSES as an exposure, individual SES as a covariate, and incident stroke as an outcome). We coded study methodology and findings across the eight studies. Results: The results provide evidence for the overall association between nSES and incident stroke in Sweden and Japan, but not within the United States. Findings were inconclusive when examining the nSES-incident stroke association stratified by race. We found evidence for the mediating role of biological factors in the nSES-incident stroke association. Conclusions: Higher neighborhood disadvantage was found to be associated with higher stroke risk, but the association was not significant in all the studies. The relationship between nSES and stroke risk within different racial groups in the United States was inconclusive. Inconsistencies may be driven by differences in covariate adjustment (e.g., individual-level sociodemographic characteristics, neighborhood-level racial composition). Additional research is needed to investigate potential intermediate and modifiable factors of the nSES and incident stroke association, which could serve as intervention points.


Author(s):  
Luiz Henrique Bonifacio ◽  
Paulo Arantes Vilela ◽  
Gustavo Rocha Lobato ◽  
Eraldo Rezende Fernandes

2021 ◽  
pp. 1-10
Author(s):  
Zhucong Li ◽  
Zhen Gan ◽  
Baoli Zhang ◽  
Yubo Chen ◽  
Jing Wan ◽  
...  

This paper describes our approach to the Chinese medical named entity recognition (MER) task organized by the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS) competition. The task is to identify the entity boundaries and category labels of six entity types in Chinese electronic medical records (EMR). We construct a hybrid system composed of a semi-supervised noisy label learning model based on adversarial training and a rule-based post-processing module. The core idea of the hybrid system is to reduce the impact of data noise by optimizing the model results. In addition, we use post-processing rules to correct three kinds of errors in the model predictions: redundant labeling, missing labeling, and wrong labeling. The proposed method scored 0.9156 under the strict criterion and 0.9660 under the relaxed criterion on the final test set, ranking first.


The movement of users from desktops to mobile devices marks a major stage in the mobile industry. Upcoming technologies, components, and software are largely designed around mobile devices. Because mobility is an unavoidable user requirement, software designed for lower battery consumption is welcome. Battery consumption is proportional to the amount of computation, and computation on a mobile device seriously affects its battery life. Moving the computation to the cloud therefore does a great deal to reduce battery consumption. Delegating query computation is an efficient way to preserve the battery of mobile devices. Even the encryption and decryption of files consumes power, so IOPE is proposed for encrypting the file, which is a basic plan


2021 ◽  
pp. 1-12
Author(s):  
Anita Ramalingam ◽  
Subalalitha Chinnaudayar Navaneethakrishnan

Thirukkural, a classic of Tamil literature written in 300 BCE, is a didactic work. Although Thirukkural comprises 1330 couplets organized into three sections and 133 chapters, search systems need a better organization of the text in order to retrieve meaningful Thirukkural couplets for a given query. This paper lays such a foundation by classifying the Thirukkural into ten new categories, called superclasses, which helps in building a better Information Retrieval (IR) system. The classifier is trained using the Multinomial Naïve Bayes algorithm. Each superclass is further classified into two subcategories based on the didactic information. The proposed classification framework is evaluated using precision, recall and F-score metrics and achieves an overall F-score of 82.33%; it is also compared with the Support Vector Machine, Logistic Regression and Random Forest algorithms. An IR system is built on top of the proposed framework, and its performance is compared with Google search and a locally built keyword search. The proposed classification framework achieves a mean average precision of 89%, whereas Google search and keyword search yield 59% and 68% respectively.
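The retrieval comparison above is reported as mean average precision (MAP). As a reminder of how that metric is usually computed, here is a minimal sketch using the standard definition: average precision for one query is the mean of the precision values at each rank where a relevant item appears, divided by the number of relevant items, and MAP is the mean over queries. The couplet identifiers and relevance judgments are invented for illustration and are not data from the paper; each ranking is assumed to contain no duplicate identifiers.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query, given a ranked list and the relevant set."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical couplet ids returned for two queries, with assumed judgments.
runs = [
    ([1, 2, 3, 4], [1, 3]),  # relevant items at ranks 1 and 3
    ([5, 1], [1]),           # relevant item at rank 2
]
map_score = mean_average_precision(runs)
```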

