Entity Type Recognition for Heterogeneous Semantic Graphs

AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 75-86 ◽  
Author(s):  
Jennifer Sleeman ◽  
Tim Finin ◽  
Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible, or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and ArnetMiner, using DBpedia as the background knowledge base.
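The mapping the abstract describes can be sketched as a conventional supervised classifier over an instance's attribute and relation names. The instances, attribute names, and DBpedia-style type labels below are invented for illustration and are not the authors' data or code:

```python
# Minimal sketch: fine-grained entity typing as supervised classification
# over bags of attribute/relation names drawn from a background KB vocabulary.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training instances: attribute/relation names -> entity type
train = [
    ({"birthDate": 1, "almaMater": 1, "knownFor": 1}, "dbo:Scientist"),
    ({"birthDate": 1, "team": 1, "position": 1}, "dbo:Athlete"),
    ({"foundingYear": 1, "industry": 1, "keyPerson": 1}, "dbo:Company"),
    ({"birthDate": 1, "doctoralAdvisor": 1, "field": 1}, "dbo:Scientist"),
]
X_dicts, y = zip(*train)
vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# An unseen instance whose attributes overlap the "scientist" signature
query = vec.transform([{"birthDate": 1, "almaMater": 1, "field": 1}])
print(clf.predict(query)[0])
```

In the paper's setting the attribute vocabulary would come from the background knowledge base (DBpedia) rather than a hand-written list.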

2021 ◽  
Vol 102 ◽  
pp. 02001
Author(s):  
Anja Wilhelm ◽  
Wolfgang Ziegler

The primary focus of technical communication (TC) in the past decade has been the system-assisted generation and utilization of standardized, structured, and classified content for dynamic output solutions. Nowadays, machine learning (ML) approaches offer a new opportunity to integrate unstructured data into existing knowledge bases without the need to manually organize information into topic-based content enriched with semantic metadata. To make the field of artificial intelligence (AI) more accessible to technical writers and content managers, cloud-based machine learning as a service (MLaaS) solutions provide a starting point for domain-specific ML modelling while relieving the modelling process of extensive coding, data processing, and storage demands. Information architects can therefore focus on information extraction tasks and on prospects for including pre-existing knowledge from other systems in the ML modelling process. In this paper, the capability and performance of a cloud-based ML service, IBM Watson, are analysed to assess their value for semantic context analysis. The ML model is based on a supervised learning method and features deep learning (DL) and natural language processing (NLP) techniques. The subject of the analysis is a corpus of scientific publications on the 2019 coronavirus disease. The analysis focuses on information extraction regarding preventive measures and the effects of the pandemic on healthcare workers.
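The extraction task described here can be illustrated locally with a small supervised sentence classifier; this is a stand-in sketch, not IBM Watson's API, and the sentences and label names are invented:

```python
# Sketch: supervised sentence classification for information extraction,
# e.g. flagging sentences about preventive measures vs. effects on workers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

sentences = [
    "Masks and hand hygiene reduced transmission among staff.",
    "Social distancing was enforced in all hospital wards.",
    "Nurses reported high levels of burnout and anxiety.",
    "Healthcare workers faced increased workload during the surge.",
]
labels = ["preventive_measure", "preventive_measure",
          "worker_effect", "worker_effect"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)
print(model.predict(["Hand hygiene and masks prevent transmission."])[0])
```

A cloud MLaaS offering wraps the same train-then-predict workflow behind a hosted API, which is what removes the coding and storage burden the abstract mentions.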


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ritaban Dutta ◽  
Cherry Chen ◽  
David Renshaw ◽  
Daniel Liang

Abstract: Extraordinary shape recovery capabilities of shape memory alloys (SMAs) have made them a crucial building block for the development of next-generation soft robotic systems and associated cognitive robotic controllers. In this study we investigated whether combining video data analysis with machine learning could yield a computer-vision-based predictive system that accurately predicts the force generated by the movement of an SMA body capable of multi-point actuation. We found that rapidly capturing video of the bending movements of an SMA body under external electrical excitation, and feeding that characterisation through a computer vision approach into a machine learning model, can accurately predict the actuation force generated by the body. This is fundamental to achieving superior control of the actuation of SMA bodies. We demonstrate that a supervised machine learning framework, trained with Restricted Boltzmann Machine (RBM) inspired features extracted from 45,000 digital thermal infrared video frames captured during excitation of various SMA shapes, can estimate and predict force and stress with 93% global accuracy, very low false negatives, and a high level of predictive generalisation.
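The pipeline shape (RBM-derived features from flattened thermal frames feeding a supervised predictor) can be sketched as below. The synthetic random "frames" stand in for the 45,000 infrared video frames; none of this is the authors' code or data:

```python
# Sketch: unsupervised RBM feature extraction over flattened "thermal frames",
# followed by a supervised classifier, mirroring the framework described above.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
# Synthetic 8x8 "frames": class-1 frames carry a hot band of pixels
frames = rng.random((200, 64))
labels = np.zeros(200, dtype=int)
frames[100:, 18:22] += 0.8
labels[100:] = 1
frames = np.clip(frames, 0, 1)  # BernoulliRBM expects values in [0, 1]

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=16, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(frames, labels)
print(f"training accuracy: {model.score(frames, labels):.2f}")
```

The real system predicts continuous force/stress values; a regressor would replace the classifier, but the feature-extraction stage is the same idea.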


2020 ◽  
pp. 1-26
Author(s):  
Joshua Eykens ◽  
Raf Guns ◽  
Tim C.E. Engels

We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, to classify social science articles using textual data. The high level of granularity of the classification scheme and the possibility that multiple categories may be assigned to a document make this task challenging. To collect training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chain model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data and can therefore be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social science publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social science documents.
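The chained multi-label setup can be sketched with scikit-learn's `ClassifierChain` around a Gradient Boosting base learner. The four toy documents and three discipline labels below are invented, not the authors' 113,909-record corpus:

```python
# Sketch: a classifier chain over Gradient Boosting base learners, so each
# document may receive any number of labels (including several at once).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import ClassifierChain
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "survey data on voting behaviour and party preference",
    "ethnographic study of urban migration and identity",
    "electoral turnout, migration background and political trust",
    "classroom interaction and teacher expectations",
]
# Label columns: [political science, sociology, education]; rows may have
# several 1s, e.g. the third document belongs to two subdisciplines.
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])

X = TfidfVectorizer().fit_transform(docs).toarray()
chain = ClassifierChain(GradientBoostingClassifier(n_estimators=50,
                                                   random_state=0))
chain.fit(X, Y)
print(chain.predict(X).astype(int))
```

The chain trains one binary classifier per label, feeding earlier predictions into later ones, which is what lets label correlations (e.g. political science co-occurring with sociology) be exploited.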


Cancers ◽  
2021 ◽  
Vol 13 (14) ◽  
pp. 3611
Author(s):  
Risa K. Kawaguchi ◽  
Masamichi Takahashi ◽  
Mototaka Miyake ◽  
Manabu Kinoshita ◽  
Satoshi Takahashi ◽  
...  

Radiogenomics uses non-invasively obtained imaging data, such as magnetic resonance imaging (MRI), to predict critical biomarkers of patients. Developing an accurate machine learning (ML) technique for MRI requires data from hundreds of patients, which cannot be gathered from any single local hospital. Hence, a model universally applicable to multiple cohorts/hospitals is required. We applied various ML and image pre-processing procedures to a glioma dataset from The Cancer Imaging Archive (TCIA, n = 159). The models that achieved high accuracy in distinguishing glioblastoma from WHO Grade II and III glioma on the TCIA dataset were then tested on data from the National Cancer Center Hospital, Japan (NCC, n = 166) to determine whether they maintained similar accuracy. We confirmed that our ML procedure achieved a level of accuracy (AUROC = 0.904) comparable to that reported previously for deep-learning methods on TCIA. However, when we applied the model directly to the NCC dataset, its AUROC dropped to 0.383. Introducing standardization and dimension reduction procedures before classification, without re-training, improved the prediction accuracy on NCC (0.804) without a loss in prediction accuracy on the TCIA dataset. Furthermore, we confirmed the same tendency in a model for IDH1/2 mutation prediction, where standardization and dimension reduction likewise made the model applicable to multiple hospitals. Our results demonstrate that overfitting may occur when an ML method providing the highest accuracy on a small training dataset is applied to different heterogeneous data sets, and they suggest a promising process for developing an ML method applicable to multiple cohorts.
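The cross-cohort fix described here (per-cohort standardization plus dimension reduction, with no re-training of the classifier) can be illustrated on synthetic data; the cohort names, feature construction, and offsets below are invented to mimic scanner/site differences:

```python
# Conceptual sketch: a classifier trained on one cohort transfers to a
# shifted, rescaled external cohort once each cohort is standardised with
# its own statistics and projected through a shared dimension reduction.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_cohort(shift, scale, n=150):
    # Two classes; the signal is shared across the first four features,
    # with a cohort-specific offset/scale mimicking acquisition differences
    y = (rng.random(n) > 0.5).astype(int)
    X = rng.normal(size=(n, 10))
    X[:, :4] += 2.0 * y[:, None]
    return X * scale + shift, y

X_a, y_a = make_cohort(shift=0.0, scale=1.0)   # "TCIA-like" training cohort
X_b, y_b = make_cohort(shift=5.0, scale=2.0)   # "NCC-like" external cohort

# Standardise each cohort with its own statistics; PCA and the classifier
# are fitted once on the training cohort and reused without re-training.
Za = StandardScaler().fit_transform(X_a)
pca = PCA(n_components=3, random_state=0).fit(Za)
clf = LogisticRegression().fit(pca.transform(Za), y_a)

Zb = StandardScaler().fit_transform(X_b)
acc = clf.score(pca.transform(Zb), y_b)
print(f"external-cohort accuracy: {acc:.2f}")
```

Without the per-cohort standardization step, the external cohort's offset and scale would place it far outside the classifier's training distribution, which is the failure mode (AUROC 0.383) the paper reports.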


Metabolites ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 243 ◽  
Author(s):  
Ulf W. Liebal ◽  
An N. T. Phan ◽  
Malvika Sudhakar ◽  
Karthik Raman ◽  
Lars M. Blank

The metabolome of an organism depends on environmental factors and intracellular regulation, and it provides information about physiological conditions. Metabolomics helps to understand disease progression in clinical settings or to estimate metabolite overproduction for metabolic engineering. The most popular analytical metabolomics platform is mass spectrometry (MS). However, MS metabolome data analysis is complicated, since metabolites interact nonlinearly and the data structures themselves are complex. Machine learning methods have become immensely popular for statistical analysis due to their inherently nonlinear data representation and their ability to process large and heterogeneous data rapidly. In this review, we address recent developments in using machine learning for processing MS spectra and show how machine learning generates new biological insights. In particular, supervised machine learning has great potential in metabolomics research because of its ability to supply quantitative predictions. We review commonly used tools, such as random forests, support vector machines, artificial neural networks, and genetic algorithms. During processing steps, supervised machine learning methods assist with peak picking, normalization, and missing data imputation. For knowledge-driven analysis, machine learning contributes to biomarker detection, classification and regression, biochemical pathway identification, and carbon flux determination. Of particular relevance is the combination of different omics data to identify the contributions of the various regulatory levels. Our overview of the recent literature also highlights that data quality determines analysis quality, which adds to the challenge of choosing the right model for the data. Machine learning methods applied to MS-based metabolomics ease data analysis and can support clinical decisions, guide metabolic engineering, and stimulate fundamental biological discoveries.
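Two of the roles mentioned above, missing-data imputation during processing and random-forest classification for knowledge-driven analysis, can be combined in one short pipeline. The synthetic "peak intensity" table and biomarker effect below are invented stand-ins for real MS metabolome data:

```python
# Sketch: median imputation of missing peak intensities followed by a
# random forest classifier (e.g. case vs. control), two common supervised
# ML steps in MS-based metabolomics workflows.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(120, 30))  # "peak intensities"
y = (rng.random(120) > 0.5).astype(int)                 # case vs. control
X[y == 1, :5] *= 2.5                                    # "biomarker" peaks
X[rng.random(X.shape) < 0.1] = np.nan                   # ~10% missing values

model = make_pipeline(SimpleImputer(strategy="median"),
                      RandomForestClassifier(n_estimators=100,
                                             random_state=0))
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.2f}")
```

In practice the random forest's feature importances would then point back to the candidate biomarker peaks, connecting the processing and biomarker-detection steps the review discusses.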


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Fanghuai Hu ◽  
Zhiqing Shao ◽  
Tong Ruan

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised-learning-based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and to enrich the learnt ontology, we also apply several machine-learning-based methods. First, we prove, both statistically and experimentally, that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train several self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two respects, scale and precision: manual evaluation shows that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other well-known ontologies and knowledge bases; the experimental results also indicate that the self-supervised models markedly enrich SSCO.
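The self-supervised idea, generating training pairs automatically from encyclopedia structure and then training a conventional classifier on them, can be sketched in a highly simplified English-language form. The context sentences and labels below are invented, and the real system works on Chinese text with SVM/CRF models:

```python
# Sketch: relation classification where positives come "for free" from
# taxonomy links (a category page implies hyponymy) and negatives from
# random pairs; the classifier learns lexical patterns between the terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Auto-generated examples, represented by the text connecting term X to
# term Y in a sentence.
contexts = [
    "X is a kind of Y",
    "X is a type of Y",
    "X belongs to the class of Y",
    "X was visited by Y",
    "X is located near Y",
    "X wrote a letter to Y",
]
labels = ["hyponymy", "hyponymy", "hyponymy", "other", "other", "other"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(contexts, labels)
print(clf.predict(["X is a species of Y"])[0])
```

The key property is that no example was labelled by hand: the encyclopedia's own structure supplies the supervision signal.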



Author(s):  
KRISZTIÁN BALÁZS ◽  
LÁSZLÓ T. KÓCZY

In this paper, a family of new methods is proposed for constructing hierarchical-interpolative fuzzy rule bases within the framework of a fuzzy-rule-based supervised machine learning system that models black box systems defined by input-output pairs. The resulting hierarchical rule base is constructed using structure-building pure evolutionary and memetic techniques, namely Genetic and Bacterial Programming Algorithms and their memetic variants containing local search steps. Applying hierarchical-interpolative fuzzy rule bases is an efficient way of reducing the complexity of knowledge bases, whereas evolutionary methods (including memetic techniques) ensure relatively fast convergence in the learning process. As presented in the paper, by applying a newly proposed representation schema, these approaches can be combined to form hierarchical-interpolative machine learning systems.
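The interpolative part of such a rule base can be illustrated in a drastically reduced form: when an observation falls between two rules of a sparse rule base, the consequent is interpolated from the neighbouring rules. The sketch below uses crisp rule centres only, a simplification of fuzzy (KH-style) rule interpolation, with rule values invented for illustration:

```python
# Toy sketch: linear interpolation between two neighbouring fuzzy rules,
# represented here only by their antecedent and consequent centre points.
def interpolate(obs, a1, b1, a2, b2):
    """Interpolate a consequent for observation `obs` lying between
    rule 1 (antecedent centre a1 -> consequent b1) and
    rule 2 (antecedent centre a2 -> consequent b2)."""
    t = (obs - a1) / (a2 - a1)   # relative position between the two rules
    return b1 + t * (b2 - b1)

# Rule 1: "if x is around 0 then y is around 10"
# Rule 2: "if x is around 4 then y is around 30"
print(interpolate(1.0, 0.0, 10.0, 4.0, 30.0))  # -> 15.0
```

In the paper's system the rule structure itself (hierarchy, rule centres, set shapes) is what the genetic/bacterial programming search evolves; interpolation is what lets a sparse evolved rule base still answer everywhere in the input space.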

