Entity Type Recognition for Heterogeneous Semantic Graphs

AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 75-86 ◽  
Author(s):  
Jennifer Sleeman ◽  
Tim Finin ◽  
Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible, or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and ArnetMiner, using DBpedia as the background knowledge base.
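The mapping the abstract describes can be sketched as a conventional supervised classifier over an instance's attribute and relation names. The instances, attribute names, and DBpedia-style type labels below are invented for illustration and are not the authors' data or code:

```python
# Minimal sketch: fine-grained entity typing as supervised classification
# over bags of attribute/relation names drawn from a background KB vocabulary.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training instances: attribute/relation names -> entity type
train = [
    ({"birthDate": 1, "almaMater": 1, "knownFor": 1}, "dbo:Scientist"),
    ({"birthDate": 1, "team": 1, "position": 1}, "dbo:Athlete"),
    ({"foundingYear": 1, "industry": 1, "keyPerson": 1}, "dbo:Company"),
    ({"birthDate": 1, "doctoralAdvisor": 1, "field": 1}, "dbo:Scientist"),
]
X_dicts, y = zip(*train)
vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# An unseen instance whose attributes overlap the "scientist" signature
query = vec.transform([{"birthDate": 1, "almaMater": 1, "field": 1}])
print(clf.predict(query)[0])
```

In the paper's setting the attribute vocabulary would come from the background knowledge base (DBpedia) rather than a hand-written list.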

2021 ◽  
Vol 102 ◽  
pp. 02001
Author(s):  
Anja Wilhelm ◽  
Wolfgang Ziegler

The primary focus of technical communication (TC) in the past decade has been the system-assisted generation and utilization of standardized, structured, and classified content for dynamic output solutions. Nowadays, machine learning (ML) approaches offer a new opportunity to integrate unstructured data into existing knowledge bases without the need to manually organize information into topic-based content enriched with semantic metadata. To make the field of artificial intelligence (AI) more accessible to technical writers and content managers, cloud-based machine learning as a service (MLaaS) solutions provide a starting point for domain-specific ML modelling while relieving the modelling process of extensive coding, data processing, and storage demands. Information architects can therefore focus on information extraction tasks and on prospects for including pre-existing knowledge from other systems in the ML modelling process. In this paper, the capability and performance of a cloud-based ML service, IBM Watson, are analysed to assess their value for semantic context analysis. The ML model is based on a supervised learning method and features deep learning (DL) and natural language processing (NLP) techniques. The subject of the analysis is a corpus of scientific publications on the 2019 coronavirus disease. The analysis focuses on information extraction regarding preventive measures and the effects of the pandemic on healthcare workers.
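The extraction task described here can be illustrated locally with a small supervised sentence classifier; this is a stand-in sketch, not IBM Watson's API, and the sentences and label names are invented:

```python
# Sketch: supervised sentence classification for information extraction,
# e.g. flagging sentences about preventive measures vs. effects on workers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

sentences = [
    "Masks and hand hygiene reduced transmission among staff.",
    "Social distancing was enforced in all hospital wards.",
    "Nurses reported high levels of burnout and anxiety.",
    "Healthcare workers faced increased workload during the surge.",
]
labels = ["preventive_measure", "preventive_measure",
          "worker_effect", "worker_effect"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)
print(model.predict(["Hand hygiene and masks prevent transmission."])[0])
```

A cloud MLaaS offering wraps the same train-then-predict workflow behind a hosted API, which is what removes the coding and storage burden the abstract mentions.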


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ritaban Dutta ◽  
Cherry Chen ◽  
David Renshaw ◽  
Daniel Liang

Abstract: Extraordinary shape recovery capabilities of shape memory alloys (SMAs) have made them a crucial building block for the development of next-generation soft robotic systems and associated cognitive robotic controllers. In this study we investigated whether combining video data analysis with machine learning could yield a computer-vision-based predictive system that accurately predicts the force generated by the movement of an SMA body capable of multi-point actuation. We found that rapidly capturing video of the bending movements of an SMA body under external electrical excitation, and feeding that characterisation through a computer vision approach into a machine learning model, can accurately predict the actuation force generated by the body. This is fundamental to achieving superior control of the actuation of SMA bodies. We demonstrate that a supervised machine learning framework, trained with Restricted Boltzmann Machine (RBM) inspired features extracted from 45,000 digital thermal infrared video frames captured during excitation of various SMA shapes, can estimate and predict force and stress with 93% global accuracy, very low false negatives, and a high level of predictive generalisation.
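The pipeline shape (RBM-derived features from flattened thermal frames feeding a supervised predictor) can be sketched as below. The synthetic random "frames" stand in for the 45,000 infrared video frames; none of this is the authors' code or data:

```python
# Sketch: unsupervised RBM feature extraction over flattened "thermal frames",
# followed by a supervised classifier, mirroring the framework described above.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
# Synthetic 8x8 "frames": class-1 frames carry a hot band of pixels
frames = rng.random((200, 64))
labels = np.zeros(200, dtype=int)
frames[100:, 18:22] += 0.8
labels[100:] = 1
frames = np.clip(frames, 0, 1)  # BernoulliRBM expects values in [0, 1]

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=16, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(frames, labels)
print(f"training accuracy: {model.score(frames, labels):.2f}")
```

The real system predicts continuous force/stress values; a regressor would replace the classifier, but the feature-extraction stage is the same idea.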


2020 ◽  
pp. 1-26
Author(s):  
Joshua Eykens ◽  
Raf Guns ◽  
Tim C.E. Engels

We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, to classify social science articles using textual data. The high level of granularity of the classification scheme and the possibility that multiple categories may be assigned to a document make this task challenging. To collect training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chain model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data and can therefore be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social science publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social science documents.
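The chained multi-label setup can be sketched with scikit-learn's `ClassifierChain` around a Gradient Boosting base learner. The four toy documents and three discipline labels below are invented, not the authors' 113,909-record corpus:

```python
# Sketch: a classifier chain over Gradient Boosting base learners, so each
# document may receive any number of labels (including several at once).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import ClassifierChain
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "survey data on voting behaviour and party preference",
    "ethnographic study of urban migration and identity",
    "electoral turnout, migration background and political trust",
    "classroom interaction and teacher expectations",
]
# Label columns: [political science, sociology, education]; rows may have
# several 1s, e.g. the third document belongs to two subdisciplines.
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])

X = TfidfVectorizer().fit_transform(docs).toarray()
chain = ClassifierChain(GradientBoostingClassifier(n_estimators=50,
                                                   random_state=0))
chain.fit(X, Y)
print(chain.predict(X).astype(int))
```

The chain trains one binary classifier per label, feeding earlier predictions into later ones, which is what lets label correlations (e.g. political science co-occurring with sociology) be exploited.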


Cancers ◽  
2021 ◽  
Vol 13 (14) ◽  
pp. 3611
Author(s):  
Risa K. Kawaguchi ◽  
Masamichi Takahashi ◽  
Mototaka Miyake ◽  
Manabu Kinoshita ◽  
Satoshi Takahashi ◽  
...  

Radiogenomics uses non-invasively obtained imaging data, such as magnetic resonance imaging (MRI), to predict critical biomarkers of patients. Developing an accurate machine learning (ML) technique for MRI requires data from hundreds of patients, which cannot be gathered from any single local hospital. Hence, a model universally applicable to multiple cohorts/hospitals is required. We applied various ML and image pre-processing procedures to a glioma dataset from The Cancer Imaging Archive (TCIA, n = 159). The models that achieved high accuracy in distinguishing glioblastoma from WHO Grade II and III glioma on the TCIA dataset were then tested on data from the National Cancer Center Hospital, Japan (NCC, n = 166) to determine whether they maintained similar accuracy. We confirmed that our ML procedure achieved a level of accuracy (AUROC = 0.904) comparable to that reported previously for deep-learning methods on TCIA. However, when we applied the model directly to the NCC dataset, its AUROC dropped to 0.383. Introducing standardization and dimension reduction procedures before classification, without re-training, improved the prediction accuracy on NCC (0.804) without a loss in prediction accuracy on the TCIA dataset. Furthermore, we confirmed the same tendency in a model for IDH1/2 mutation prediction, where standardization and dimension reduction likewise made the model applicable to multiple hospitals. Our results demonstrate that overfitting may occur when an ML method providing the highest accuracy on a small training dataset is applied to different heterogeneous data sets, and they suggest a promising process for developing an ML method applicable to multiple cohorts.
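The cross-cohort fix described here (per-cohort standardization plus dimension reduction, with no re-training of the classifier) can be illustrated on synthetic data; the cohort names, feature construction, and offsets below are invented to mimic scanner/site differences:

```python
# Conceptual sketch: a classifier trained on one cohort transfers to a
# shifted, rescaled external cohort once each cohort is standardised with
# its own statistics and projected through a shared dimension reduction.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_cohort(shift, scale, n=150):
    # Two classes; the signal is shared across the first four features,
    # with a cohort-specific offset/scale mimicking acquisition differences
    y = (rng.random(n) > 0.5).astype(int)
    X = rng.normal(size=(n, 10))
    X[:, :4] += 2.0 * y[:, None]
    return X * scale + shift, y

X_a, y_a = make_cohort(shift=0.0, scale=1.0)   # "TCIA-like" training cohort
X_b, y_b = make_cohort(shift=5.0, scale=2.0)   # "NCC-like" external cohort

# Standardise each cohort with its own statistics; PCA and the classifier
# are fitted once on the training cohort and reused without re-training.
Za = StandardScaler().fit_transform(X_a)
pca = PCA(n_components=3, random_state=0).fit(Za)
clf = LogisticRegression().fit(pca.transform(Za), y_a)

Zb = StandardScaler().fit_transform(X_b)
acc = clf.score(pca.transform(Zb), y_b)
print(f"external-cohort accuracy: {acc:.2f}")
```

Without the per-cohort standardization step, the external cohort's offset and scale would place it far outside the classifier's training distribution, which is the failure mode (AUROC 0.383) the paper reports.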


Metabolites ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 243 ◽  
Author(s):  
Ulf W. Liebal ◽  
An N. T. Phan ◽  
Malvika Sudhakar ◽  
Karthik Raman ◽  
Lars M. Blank

The metabolome of an organism depends on environmental factors and intracellular regulation, and it provides information about physiological conditions. Metabolomics helps to understand disease progression in clinical settings or to estimate metabolite overproduction for metabolic engineering. The most popular analytical metabolomics platform is mass spectrometry (MS). However, MS metabolome data analysis is complicated, since metabolites interact nonlinearly and the data structures themselves are complex. Machine learning methods have become immensely popular for statistical analysis due to their inherently nonlinear data representation and their ability to process large and heterogeneous data rapidly. In this review, we address recent developments in using machine learning for processing MS spectra and show how machine learning generates new biological insights. In particular, supervised machine learning has great potential in metabolomics research because of its ability to supply quantitative predictions. We review commonly used tools, such as random forests, support vector machines, artificial neural networks, and genetic algorithms. During processing steps, supervised machine learning methods assist with peak picking, normalization, and missing data imputation. For knowledge-driven analysis, machine learning contributes to biomarker detection, classification and regression, biochemical pathway identification, and carbon flux determination. Of particular relevance is the combination of different omics data to identify the contributions of the various regulatory levels. Our overview of the recent literature also highlights that data quality determines analysis quality, which adds to the challenge of choosing the right model for the data. Machine learning methods applied to MS-based metabolomics ease data analysis and can support clinical decisions, guide metabolic engineering, and stimulate fundamental biological discoveries.
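Two of the roles mentioned above, missing-data imputation during processing and random-forest classification for knowledge-driven analysis, can be combined in one short pipeline. The synthetic "peak intensity" table and biomarker effect below are invented stand-ins for real MS metabolome data:

```python
# Sketch: median imputation of missing peak intensities followed by a
# random forest classifier (e.g. case vs. control), two common supervised
# ML steps in MS-based metabolomics workflows.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(120, 30))  # "peak intensities"
y = (rng.random(120) > 0.5).astype(int)                 # case vs. control
X[y == 1, :5] *= 2.5                                    # "biomarker" peaks
X[rng.random(X.shape) < 0.1] = np.nan                   # ~10% missing values

model = make_pipeline(SimpleImputer(strategy="median"),
                      RandomForestClassifier(n_estimators=100,
                                             random_state=0))
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.2f}")
```

In practice the random forest's feature importances would then point back to the candidate biomarker peaks, connecting the processing and biomarker-detection steps the review discusses.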


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Fanghuai Hu ◽  
Zhiqing Shao ◽  
Tong Ruan

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised-learning-based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and to enrich the learnt ontology, we also apply several machine-learning-based methods. First, we prove, both statistically and experimentally, that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train several self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two respects, scale and precision: manual evaluation shows that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other well-known ontologies and knowledge bases; the experimental results also indicate that the self-supervised models markedly enrich SSCO.
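The self-supervised idea, generating training pairs automatically from encyclopedia structure and then training a conventional classifier on them, can be sketched in a highly simplified English-language form. The context sentences and labels below are invented, and the real system works on Chinese text with SVM/CRF models:

```python
# Sketch: relation classification where positives come "for free" from
# taxonomy links (a category page implies hyponymy) and negatives from
# random pairs; the classifier learns lexical patterns between the terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Auto-generated examples, represented by the text connecting term X to
# term Y in a sentence.
contexts = [
    "X is a kind of Y",
    "X is a type of Y",
    "X belongs to the class of Y",
    "X was visited by Y",
    "X is located near Y",
    "X wrote a letter to Y",
]
labels = ["hyponymy", "hyponymy", "hyponymy", "other", "other", "other"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(contexts, labels)
print(clf.predict(["X is a species of Y"])[0])
```

The key property is that no example was labelled by hand: the encyclopedia's own structure supplies the supervision signal.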



Author(s):  
KRISZTIÁN BALÁZS ◽  
LÁSZLÓ T. KÓCZY

In this paper, a family of new methods is proposed for constructing hierarchical-interpolative fuzzy rule bases within the framework of a fuzzy-rule-based supervised machine learning system that models black box systems defined by input-output pairs. The resulting hierarchical rule base is constructed using structure-building pure evolutionary and memetic techniques, namely Genetic and Bacterial Programming Algorithms and their memetic variants containing local search steps. Applying hierarchical-interpolative fuzzy rule bases is an efficient way of reducing the complexity of knowledge bases, whereas evolutionary methods (including memetic techniques) ensure relatively fast convergence in the learning process. As presented in the paper, by applying a newly proposed representation schema, these approaches can be combined to form hierarchical-interpolative machine learning systems.
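The interpolative part of such a rule base can be illustrated in a drastically reduced form: when an observation falls between two rules of a sparse rule base, the consequent is interpolated from the neighbouring rules. The sketch below uses crisp rule centres only, a simplification of fuzzy (KH-style) rule interpolation, with rule values invented for illustration:

```python
# Toy sketch: linear interpolation between two neighbouring fuzzy rules,
# represented here only by their antecedent and consequent centre points.
def interpolate(obs, a1, b1, a2, b2):
    """Interpolate a consequent for observation `obs` lying between
    rule 1 (antecedent centre a1 -> consequent b1) and
    rule 2 (antecedent centre a2 -> consequent b2)."""
    t = (obs - a1) / (a2 - a1)   # relative position between the two rules
    return b1 + t * (b2 - b1)

# Rule 1: "if x is around 0 then y is around 10"
# Rule 2: "if x is around 4 then y is around 30"
print(interpolate(1.0, 0.0, 10.0, 4.0, 30.0))  # -> 15.0
```

In the paper's system the rule structure itself (hierarchy, rule centres, set shapes) is what the genetic/bacterial programming search evolves; interpolation is what lets a sparse evolved rule base still answer everywhere in the input space.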

