Relatedness and TBox-Driven Rule Learning in Large Knowledge Bases

Giuseppe Pirrò

doi:10.1609/aaai.v34i03.5690

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5690 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2975-2982

Author(s):

Giuseppe Pirrò

Keyword(s):

Quality Assessment ◽

Rule Learning ◽

Semantic Relatedness ◽

Knowledge Bases ◽

The Body ◽

Traversal Algorithm

We present RARL, an approach to discover rules of the form body ⇒ head in large knowledge bases (KBs) that typically include a set of terminological facts (TBox) and a set of TBox-compliant assertional facts (ABox). RARL's main intuition is to learn rules by leveraging TBox information and the semantic relatedness between the predicate(s) in the atoms of the body and the predicate in the head. RARL uses an efficient relatedness-driven TBox traversal algorithm which, given an input rule head, generates the set of most semantically related candidate rule bodies. Then, rule confidence is computed in the ABox based on a set of positive and negative examples. Decoupling candidate generation and rule quality assessment offers greater flexibility than previous work.
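
The confidence step described above can be illustrated with a minimal sketch. It assumes a toy ABox of triples, a chain-shaped rule body, and the common confidence ratio (body matches for which the head also holds, divided by all body matches). The abstract does not say how RARL selects negative examples, so the closed-world reading used here is an assumption, and all names (`ABOX`, `pairs_for`, `body_pairs`, `rule_confidence`) are hypothetical.

```python
from collections import defaultdict

# Toy ABox as (subject, predicate, object) triples; purely illustrative data.
ABOX = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("acme", "locatedIn", "rome"),
    ("initech", "locatedIn", "paris"),
    ("alice", "livesIn", "rome"),
    ("bob", "livesIn", "milan"),
    ("carol", "livesIn", "paris"),
]


def pairs_for(abox, predicate):
    """Return the set of (subject, object) pairs asserted for a predicate."""
    return {(s, o) for s, p, o in abox if p == predicate}


def body_pairs(abox, body):
    """Evaluate a chain body p1(x, z1), p2(z1, z2), ... and return (x, y) pairs."""
    current = pairs_for(abox, body[0])
    for predicate in body[1:]:
        by_subject = defaultdict(set)
        for s, o in pairs_for(abox, predicate):
            by_subject[s].add(o)
        # Join on the shared variable between consecutive atoms.
        current = {(x, y) for x, z in current for y in by_subject[z]}
    return current


def rule_confidence(abox, body, head):
    """Confidence = positives / (positives + negatives).

    Assumption: a body match (x, y) counts as a positive example if head(x, y)
    is in the ABox and as a negative example otherwise (closed-world reading).
    """
    matches = body_pairs(abox, body)
    if not matches:
        return 0.0
    positives = matches & pairs_for(abox, head)
    return len(positives) / len(matches)


# Candidate rule: worksAt(x, z), locatedIn(z, y) => livesIn(x, y)
confidence = rule_confidence(ABOX, ["worksAt", "locatedIn"], "livesIn")
print(f"{confidence:.2f}")
```

Because the candidate bodies are passed in as plain lists, the scoring function is independent of how candidates are generated, which mirrors the decoupling the abstract mentions.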

A statistical quality assessment method for longitudinal observations in electronic health record data with an application to the VA million veteran program

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01643-2 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Hui Wang ◽

Ilana Belitskaya-Levy ◽

Fan Wu ◽

Jennifer S. Lee ◽

Mei-Chiung Shih ◽

...

Keyword(s):

Electronic Health Record ◽

Quality Assessment ◽

The Body ◽

Quality Data ◽

Health Record ◽

Continuous Variables ◽

Electronic Health Record Data ◽

Thresholding Method ◽

Electronic Health ◽

Weight Data

Background: To describe an automated method for assessing the plausibility of continuous variables collected in electronic health record (EHR) data for real-world evidence research. Methods: The most widely used approach to quality assessment (QA) for continuous variables is to detect implausible values using prespecified thresholds. To augment the thresholding method, we developed a score-based method that leverages the longitudinal characteristics of EHR data to detect observations that are inconsistent with a patient's history. The method was applied to the height and weight data in the EHR from the Million Veteran Program data of the Veteran’s Healthcare Administration (VHA). A validation study was also conducted. Results: On receiver operating characteristic (ROC) metrics, the developed method outperforms the widely used thresholding method. We also demonstrate that different quality assessment methods have a non-ignorable impact on the body mass index (BMI) classification calculated from height and weight data in the VHA’s database. Conclusions: The score-based method enables automated, scalable detection of problematic data points in health care big data while allowing investigators to select high-quality data based on their needs. Leveraging the longitudinal characteristics of EHR data will significantly improve QA performance.
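
The contrast between the two approaches can be sketched as follows. This is only an illustration: the abstract does not define the score, so the sketch substitutes a leave-one-out robust z-score (median and median absolute deviation of the patient's other measurements). The plausibility limits, the cutoff, the sample weights, and the function names are all hypothetical.

```python
from statistics import median


def threshold_flags(values, low, high):
    """Flag values that fall outside prespecified plausibility limits."""
    return [not (low <= v <= high) for v in values]


def history_scores(values):
    """Score each value by its distance from the patient's other values.

    Assumption: a leave-one-out robust z-score stands in for the paper's
    score, which the abstract does not define.
    """
    scores = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        if len(others) < 2:
            scores.append(0.0)  # too little history to judge
            continue
        med = median(others)
        mad = median(abs(x - med) for x in others)
        scale = mad if mad > 0 else 1.0  # avoid division by zero
        scores.append(abs(v - med) / scale)
    return scores


# Hypothetical weights (kg) for one patient. 128.0 is plausible in general
# but inconsistent with this patient's history.
weights = [81.0, 82.5, 80.0, 128.0, 81.5, 83.0]

CUTOFF = 5.0  # illustrative cutoff, not taken from the paper
by_threshold = threshold_flags(weights, low=30.0, high=300.0)
by_history = [s > CUTOFF for s in history_scores(weights)]

print(by_threshold)
print(by_history)
```

In this toy series the fixed limits flag nothing, while the history-based score singles out the fourth measurement, which is the kind of observation the longitudinal method is meant to catch.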

Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback

10.29007/ppgx ◽

2020 ◽

Author(s):

Yan Wu ◽

Jinchuan Chen ◽

Plarent Haxhidauti ◽

Vinu Ellampallil Venugopal ◽

Martin Theobald

Keyword(s):

Logic Programming ◽

Inductive Logic Programming ◽

Large Scale ◽

Inductive Logic ◽

Rule Learning ◽

Knowledge Bases ◽

User Feedback ◽

Learning Approaches ◽

Guided Inductive

Domain-oriented knowledge bases (KBs) such as DBpedia and YAGO are largely constructed by applying a set of predefined extraction rules to the semi-structured contents of Wikipedia articles. Although both of these large-scale KBs achieve very high average precision values (above 95% for YAGO3), subtle mistakes in a few of the underlying extraction rules may still cause a substantial number of systematic extraction mistakes for specific relations. For example, by applying the same regular expressions to extract person names of both Asian and Western nationality, YAGO erroneously swaps most of the family and given names of Asian person entities. For traditional rule-learning approaches based on Inductive Logic Programming (ILP), it is very difficult to detect these systematic extraction mistakes, since they usually occur only in a relatively small subdomain of the relations’ arguments. In this paper, we thus propose a guided form of ILP, coined “GILP”, that iteratively asks for small amounts of user feedback over a given KB to learn a set of data-cleaning rules that (1) best match the feedback and (2) also generalize to a larger portion of facts in the KB. We propose both algorithms and respective metrics to automatically assess the quality of the learned rules with respect to the user feedback.

Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation

10.1101/039008 ◽

2016 ◽

Author(s):

Neil R Smalheiser ◽

Gary Bonifield

Keyword(s):

Scientific Discovery ◽

Semantic Relatedness ◽

Mesh Term ◽

The Body ◽

Similarity Metrics ◽

Name Disambiguation ◽

Medical Subject Headings ◽

Mesh Terms ◽

Author Name Disambiguation ◽

Subject Headings

In the present paper, we have created and characterized several similarity metrics for relating any two Medical Subject Headings (MeSH terms) to each other. The article-based metric measures the tendency of two MeSH terms to appear in the MEDLINE record of the same article. The author-based metric measures the tendency of two MeSH terms to appear in the body of articles written by the same individual (using the 2009 Author-ity author name disambiguation dataset as a gold standard). The two metrics are only modestly correlated with each other (r = 0.50), indicating that they capture different aspects of term usage. The article-based metric provides a measure of semantic relatedness, and MeSH term pairs that co-occur more often than expected by chance may reflect relations between the two terms. In contrast, the author metric is indicative of how individuals practice science, and may have value for author name disambiguation and studies of scientific discovery. We have calculated article metrics for all MeSH terms appearing in at least 25 articles in MEDLINE (as of 2014) and author metrics for MeSH terms published as of 2009. The dataset is freely available for download and can be queried at http://arrowsmith.psych.uic.edu/arrowsmith_uic/mesh_pair_metrics.html.
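
The idea of term pairs that "co-occur more often than expected by chance" can be sketched with an observed-to-expected ratio. The abstract does not give the formula of the article-based metric, so the independence-based expectation below is an assumption, and the article data and the name `cooccurrence_ratios` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical articles, each represented by its set of MeSH terms.
ARTICLES = [
    {"Humans", "Neoplasms", "Mutation"},
    {"Humans", "Neoplasms"},
    {"Humans", "Mutation"},
    {"Mice", "Mutation"},
]


def cooccurrence_ratios(articles):
    """Return observed / expected co-occurrence for every co-occurring pair.

    Assumption: the expected count under independence is
    count(a) * count(b) / number_of_articles.
    """
    n = len(articles)
    term_counts = Counter(term for article in articles for term in article)
    pair_counts = Counter(
        pair for article in articles for pair in combinations(sorted(article), 2)
    )
    return {
        (a, b): observed / (term_counts[a] * term_counts[b] / n)
        for (a, b), observed in pair_counts.items()
    }


ratios = cooccurrence_ratios(ARTICLES)
for pair, ratio in sorted(ratios.items()):
    print(pair, round(ratio, 2))
```

A ratio above 1 marks a pair that appears together more often than independence would predict; a ratio below 1 marks the opposite.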

MODELING OF NATURAL GAS QUALITY ASSESSMENT USING FUZZY KNOWLEDGE BASES

Modern technology materials and design in construction ◽

10.31649/2311-1429-2019-2-114-122 ◽

2020 ◽

Vol 27 (2) ◽

pp. 114-122

Author(s):

K. Predun ◽

O. Obodianska ◽

Y. Franchuk ◽

...

Keyword(s):

Natural Gas ◽

Quality Assessment ◽

Knowledge Bases ◽

Fuzzy Knowledge Bases ◽

Fuzzy Knowledge

Guarded hybrid knowledge bases

Theory and Practice of Logic Programming ◽

10.1017/s1471068407003201 ◽

2008 ◽

Vol 8 (3) ◽

pp. 411-429 ◽

Cited By ~ 12

Author(s):

STIJN HEYMANS ◽

JOS DE BRUIJN ◽

LIVIA PREDOIU ◽

CRISTINA FEIER ◽

DAVY VAN NIEWENBORGH

Keyword(s):

Description Logic ◽

Description Logics ◽

Logic Program ◽

Knowledge Bases ◽

The Body ◽

Knowledge Representation And Reasoning ◽

Advantages And Disadvantages ◽

Hybrid Knowledge ◽

Satisfiability Checking ◽

Answer Set Semantics

Recently, there has been a lot of interest in the integration of Description Logics (DL) and rules on the Semantic Web. We define guarded hybrid knowledge bases (or g-hybrid knowledge bases) as knowledge bases that consist of a Description Logic knowledge base and a guarded logic program, similar to the $\mathcal{DL}$+log knowledge bases from Rosati (In Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning, AAAI Press, Menlo Park, CA, 2006, pp. 68–78). g-Hybrid knowledge bases enable an integration of Description Logics and Logic Programming where, unlike in other approaches, variables in the rules of a guarded program do not need to appear in positive non-DL atoms of the body, i.e., DL atoms can act as guards as well. Decidability of satisfiability checking of g-hybrid knowledge bases is shown for the particular DL $\mathcal{DLRO}^{-\{\leq\}}$, which is close to OWL DL, by a reduction to guarded programs under the open answer set semantics. Moreover, we show 2-Exptime-completeness for satisfiability checking of such g-hybrid knowledge bases. Finally, we discuss advantages and disadvantages of our approach compared with $\mathcal{DL}$+log knowledge bases.

Efficient Weighted Semantic Score Based on the Huffman Coding Algorithm and Knowledge Bases for Word Sequences Embedding

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2020040107 ◽

2020 ◽

Vol 16 (2) ◽

pp. 126-142

Author(s):

Nada Ben-Lhachemi ◽

El Habib Nfaoui

Keyword(s):

Language Processing ◽

Recommendation System ◽

Semantic Relatedness ◽

Knowledge Bases ◽

Word Embedding ◽

Huffman Coding ◽

Text Representation ◽

Text Data ◽

New Feature ◽

Embedding Methods

Learning text representations is at the core of numerous natural language processing applications. Word embedding is a type of text representation that allows words with similar meanings to have similar representations. Word embedding techniques capture semantic similarities between linguistic items based on their distributional properties in large samples of text data. Although these techniques are very efficient, handling semantic and pragmatic ambiguity with high accuracy is still a challenging research task. In this article, we propose a new feature, a semantic score, which handles ambiguities between words. We use external knowledge bases and the Huffman Coding algorithm to compute this score, which depicts the semantic relatedness between all fragments composing a given text. We combine this feature with word embedding methods to improve text representation. We evaluate our method on a hashtag recommendation system in Twitter, where text is noisy and short. The experimental results demonstrate that, compared with state-of-the-art algorithms, our method achieves good results.

Combining Fact Extraction and Verification with Neural Semantic Matching Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016859 ◽

2019 ◽

Vol 33 ◽

pp. 6859-6866 ◽

Cited By ~ 7

Author(s):

Yixin Nie ◽

Haonan Chen ◽

Mohit Bansal

Keyword(s):

Vector Space ◽

Semantic Relatedness ◽

Document Retrieval ◽

Knowledge Bases ◽

Semantic Matching ◽

Matching Problem ◽

Matching Networks ◽

Three Stages ◽

Matching Models ◽

Semantic Awareness

The increasing concern with misinformation has stimulated research efforts on automatic fact checking. The recently released FEVER dataset introduced a benchmark fact-verification task in which a system is asked to verify a claim using evidential sentences from Wikipedia documents. In this paper, we present a connected system consisting of three homogeneous neural semantic matching models that conduct document retrieval, sentence selection, and claim verification jointly for fact extraction and verification. For evidence retrieval (document retrieval and sentence selection), unlike traditional vector space IR models in which queries and sources are matched in some pre-designed term vector space, we develop neural models to perform deep semantic matching from raw textual input, assuming no intermediate term representation and no access to structured external knowledge bases. We also show that Pageview frequency can help improve the performance of evidence retrieval results, which can later be matched using our neural semantic matching network. For claim verification, unlike previous approaches that simply feed upstream retrieved evidence and the claim to a natural language inference (NLI) model, we further enhance the NLI model by providing it with internal semantic relatedness scores (hence integrating it with the evidence retrieval modules) and ontological WordNet features. Experiments on the FEVER dataset indicate that (1) our neural semantic matching method outperforms popular TF-IDF and encoder models by significant margins on all evidence retrieval metrics, (2) the additional relatedness score and WordNet features improve the NLI model via better semantic awareness, and (3) by formalizing all three subtasks as a similar semantic matching problem and improving on all three stages, the complete model is able to achieve state-of-the-art results on the FEVER test set (two times greater than baseline results).

Two-Way Impacts Between Macrophages on Vascular Endothelium and Characteristics of TCM Syndromes in Dyslipidemic Mice with the Phlegm-Dampness Retention syndrome and the Spleen and Kidney Yang Deficiency syndrome Using RNA-Seq

10.21203/rs.3.rs-259192/v1 ◽

2021 ◽

Author(s):

Jing Chen ◽

Chao Ye ◽

Zheng Yang ◽

Tieshan Wang ◽

Bing Xu ◽

...

Keyword(s):

Quality Assessment ◽

The Body ◽

Transcriptomic Analysis ◽

Differentially Expressed ◽

Test Model ◽

Biological Processes ◽

Rna Seq ◽

Pathogenic Factors ◽

Kidney Yang Deficiency ◽

Protective Processes

Background: ‘Treating the same disease with different methods’ is a Traditional Chinese Medicine (TCM) therapeutic concept. It means that although patients are diagnosed with the same disease, they may have different syndromes that require distinct drug administrations. This study aimed to identify the differentially expressed genes and related biological processes in dyslipidemia with the Phlegm-Dampness Retention (PDR) syndrome and the Spleen and Kidney Yang Deficiency (SKYD) syndrome using transcriptomic analysis. Methods: Ten ApoE knockout (ApoE-/-) mice were used to establish dyslipidemic disease-syndrome models via multifactor-hybrid modeling, with 5 in the PDR group and 5 in the SKYD group. Five C57BL/6J mice served as the normal control (NC) group. The quality of the models was tested. Aortic endothelial macrophages in the mice were screened using flow cytometry. Transcriptomic analysis of the macrophages was performed using RNA-Seq. Results: ① The quality assessment of the disease-syndrome model showed that TG, TC, and LDL-C levels were significantly increased in the PDR and SKYD groups versus the NC group (P < 0.05). Combined with HE staining of the aorta, this indicated that the disease model was successfully established. ② The quality assessment of the syndrome models showed that mice in the PDR group presented with typical manifestations of the PDR syndrome, and mice in the SKYD group had the related manifestations of the SKYD syndrome, indicating that the syndrome models were successfully constructed. ③ Comparing gene expression in macrophages from dyslipidemic mice with different syndromes identified 4142 differentially expressed genes (DEGs) with statistical significance (P < 0.05). The Gene Ontology (GO) analysis of the DEGs showed that the biological processes that differ between the PDR group and the SKYD group include both adverse and protective processes. Conclusion: The DEGs between the PDR syndrome and the SKYD syndrome indicate different biological mechanisms behind the onset of the two syndromes. They have distinctive biological processes, including adverse and protective processes, corresponding to the invasion of pathogenic factors into the body and the fight of healthy qi against pathogenic factors, respectively, in TCM theory. Our results demonstrate the biological evidence behind ‘treating the same disease with different treatments’ in TCM.

Assessment of relation between Dhatusarata and Dehabala w.s.r. to Harvard Step Test

Journal of Ayurveda and Integrated Medical Sciences (JAIMS) ◽

10.21760/jaims.5.6.11 ◽

2020 ◽

Vol 5 (06) ◽

pp. 87-90

Author(s):

Amar Baliram Abhrange ◽

Archana Amar Abhrange ◽

Sachin S. Waghmare

Keyword(s):

Physical Fitness ◽

Quality Assessment ◽

Human Body ◽

The Body ◽

Step Test ◽

Practical Application ◽

Tissue Quality ◽

Physical Exercises ◽

Five Elements ◽

Bodily Movements

The growth and existence of the human body depend on the seven Dhatus. These seven Dhatus are composed of five elements, or Panchmahabhutas. Dhatu Sarata, or tissue excellence, is a quality assessment of the seven Dhatus. Examination of Dhatu Sarata is done at the physical and psychological levels. For determining the Dhatu Sarata, when the positive features are present above 75%, it will be considered the best tissue quality (Uttam Sarata). When the positive features are present between 75% and 25%, it will be considered moderate tissue quality (Madhyam Sarata). When positive features are present below 25%, it will be labeled as poor tissue quality (Heen Sarata). The bodily movements that are meant to produce firmness and strength in the body are known as Vyayama, or physical exercises. ‘Dehabala’ (physical fitness) of subjects will be determined by the Harvard step test. The Harvard Step Test is a practical application of the Ayurvedic principle that “Bala should be measured by Vyamshakti” (Balam Vyayamshakty Parikshet). The person should be examined with reference to his capacity for exercise, which is determined by one’s ability to perform work. Therefore, this study will estimate Dehabala, assess the Dhatusarata, and examine the association between them.
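
The three grading bands above can be written as a small function. This is a sketch only: the abstract does not say which band the exact boundary values 75% and 25% belong to, so assigning both to the moderate band is an assumption, and the name `classify_sarata` is hypothetical.

```python
def classify_sarata(positive_percent):
    """Map the percentage of positive features to a Dhatu Sarata grade.

    Assumption: exactly 75% and exactly 25% fall in the moderate band,
    since the text only says "above 75", "between 75 and 25", "below 25".
    """
    if not 0 <= positive_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if positive_percent > 75:
        return "Uttam Sarata"    # best tissue quality
    if positive_percent >= 25:
        return "Madhyam Sarata"  # moderate tissue quality
    return "Heen Sarata"         # poor tissue quality


for percent in (90, 75, 40, 10):
    print(percent, classify_sarata(percent))
```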
