term weighting
Recently Published Documents


TOTAL DOCUMENTS: 380 (FIVE YEARS 86)

H-INDEX: 26 (FIVE YEARS 4)

2022 ◽  
Vol 12 (1) ◽  
pp. 86
Author(s):  
Shang-Ming Zhou ◽  
Ronan A. Lyons ◽  
Muhammad A. Rahman ◽  
Alexander Holborow ◽  
Sinead Brophy

(1) Background: This study investigates influential risk factors for predicting 30-day readmission to hospital for Campylobacter infections (CI). (2) Methods: We linked general practitioner and hospital admission records of 13,006 patients with CI in Wales (1990–2015). A technique called TF-zR (term frequency-zRelevance) was presented to evaluate how relevant a clinical term is to a patient in a cohort characterized by coded health records. zR is a supervised term-weighting metric that assigns a weight to a term based on the relative frequencies of the term across different classes. A cost-sensitive classifier with swarm optimization and weighted subset learning was integrated to identify influential clinical signals as predictors and an optimal model for readmission prediction. (3) Results: From a pool of up to 17,506 variables, the 33 most predictive factors were identified, including age, gender, Townsend deprivation quintiles, comorbidities, medications, and procedures. The predictive model predicted readmission with 73% sensitivity and 54% specificity. Variables associated with readmission included male gender, recurrent tonsillitis, non-healing open wounds, and operations for ingrown toenails. Cystitis, paracetamol/codeine use, age (21–25), and heliclear triple pack use were associated with a lower risk of readmission. (4) Conclusions: This study gives a profile of clustered variables that are predictive of readmission associated with campylobacteriosis.
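The abstract describes zR only as a supervised weight derived from a term's relative frequencies across classes and does not give its formula. The sketch below is therefore an illustration of the general idea, not the authors' method: it uses a two-proportion z-statistic as a stand-in relevance score and multiplies it by the raw term frequency. The function names `class_relevance` and `weight_document`, the binary labels, and the choice of statistic are all assumptions made for this example.

```python
import math
from collections import Counter


def class_relevance(docs, labels):
    """Per-term two-proportion z-statistic between class 1 and class 0.

    Stand-in relevance score (an assumption, NOT the zR formula from the paper).
    docs: list of token lists; labels: list of 0/1 class labels.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (df_pos if y == 1 else df_neg).update(set(doc))

    weights = {}
    for term in set(df_pos) | set(df_neg):
        p_pos = df_pos[term] / n_pos
        p_neg = df_neg[term] / n_neg
        pooled = (df_pos[term] + df_neg[term]) / (n_pos + n_neg)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_pos + 1 / n_neg))
        # A term present in every document carries no class signal.
        weights[term] = (p_pos - p_neg) / se if se > 0 else 0.0
    return weights


def weight_document(doc, relevance):
    """Term frequency times the supervised relevance score for one document."""
    tf = Counter(doc)
    return {term: count * relevance.get(term, 0.0) for term, count in tf.items()}


# Toy cohort: each "document" is the list of coded terms for one patient.
docs = [["a", "b"], ["a", "c"], ["b", "c"], ["c"]]
labels = [1, 1, 0, 0]
relevance = class_relevance(docs, labels)
print(weight_document(["a", "a", "b"], relevance))
```

Positive weights mark terms that are over-represented in class 1 and negative weights mark terms over-represented in class 0; a real pipeline would substitute the published zR definition for the stand-in statistic.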


Author(s):  
Hongyu Jiang ◽  
Zhiqi Lei ◽  
Yanghui Rao ◽  
Haoran Xie ◽  
Fu Lee Wang

Author(s):  
Anbuselvan Sangodiah ◽  
Yong Tien Fui ◽  
Lim Ean Heng ◽  
Norazira A Jalil ◽  
Ramesh Kumar Ayyasamy ◽  
...  

Author(s):  
Mohammed Rais ◽  
Mohammed Bekkali ◽  
Abdelmonaime Lachkar

Searching for the best sense for a polysemous word remains one of the greatest challenges in the representation of biomedical text. To this end, Word Sense Disambiguation (WSD) algorithms mostly rely on an external source of knowledge, such as a thesaurus or ontology, to automatically select the proper concept of an ambiguous term in a given window of context using semantic similarity and relatedness measures. In this paper, we propose a Web-based kernel function for measuring the semantic relatedness between concepts in order to disambiguate an expression against multiple candidate concepts. This measure uses the large volume of documents returned by the PubMed search engine to determine the broader context of a biomedical short text through a new term weighting scheme based on Rough Set Theory (RST). To illustrate the efficiency of our proposed method, we evaluate a WSD algorithm based on this measure on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. The obtained results demonstrate promising improvements.


Informatica ◽  
2021 ◽  
Vol 45 (3) ◽  
Author(s):  
Surender Singh Samant ◽  
NL Bhanu Murthy ◽  
Aruna Malapati

Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 360
Author(s):  
Pablo Ormeño ◽  
Marcelo Mendoza ◽  
Carlos Valle

Ad hoc information retrieval (ad hoc IR) is a challenging task consisting of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. One limitation of these methods is their inability to handle polysemous concepts. In addition, these methods introduce spurious orthogonalities between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) allow building representations of text documents in the latent space of topics, which models polysemy better and avoids generating orthogonal representations for related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection follows the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. Then, we merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal improves precision and recall, outperforming classic IR models and strong baselines based on topic models.
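The abstract does not say which top-k list fusion rule is used, so the following is only a sketch of how ranked lists from several weak IR experts could be merged. It swaps in reciprocal rank fusion, a common general-purpose rule, as an assumed stand-in; the function name `fuse_rankings`, the parameters `top_k` and `k_const`, and the document identifiers are hypothetical.

```python
from collections import defaultdict


def fuse_rankings(rankings, top_k=10, k_const=60):
    """Merge ranked lists with reciprocal rank fusion (assumed stand-in rule).

    rankings: list of ranked lists of document ids, best first.
    Only the top_k entries of each list contribute to the fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking[:top_k], start=1):
            # Documents ranked highly by many models accumulate the most score.
            scores[doc_id] += 1.0 / (k_const + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical topic-model rankers voting on four documents.
model_rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
print(fuse_rankings(model_rankings, top_k=3))
```

Rank-based fusion needs no score calibration across models, which is convenient when each ensemble member scores documents on a different scale.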

