Matching Large Scale Ontologies Based on Filter and Verification

2020
Vol 2020
pp. 1-10
Author(s):  
Yingxin Li ◽  
Zhou Jianhui ◽  
Jihong Liu ◽  
Yongzhu Hou

Ontology matching is an effective method for realizing intercommunication and interoperability between heterogeneous systems. The essence of ontology matching is to discover similar entity pairs between a source ontology and a target ontology, a process that calculates the similarity between entities in the ontologies. The similarity can be computed from various features of entity pairs, such as string similarity, structural similarity, and semantic similarity. The larger the ontology scale, the lower the efficiency and accuracy of ontology matching: as the scale increases, the number of entities grows and the ontologies become more heterogeneous. This paper proposes a method for matching large-scale ontologies based on filter and verification, which first reduces the heterogeneity of the large-scale ontologies in the filter phase and then matches the reduced ontologies in the verification phase. Before matching, large-scale ontologies are partitioned into several subontologies to reach a manageable scale. The Anatomy and Food benchmarks from OAEI are adopted to evaluate the proposed method, and the experimental results show that the recall rate is improved while efficiency and accuracy are retained.
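As a rough illustration of the filter-and-verify idea, the Python sketch below filters candidate entity pairs with a cheap string similarity and then verifies the survivors with a caller-supplied, more expensive measure. The function names, thresholds, and toy labels are illustrative assumptions, not the authors' implementation.

```python
# A minimal filter-and-verify sketch; entities are represented by labels.
from difflib import SequenceMatcher


def string_sim(a: str, b: str) -> float:
    """Cheap string similarity used in the filter phase."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_candidates(source, target, threshold=0.5):
    """Keep only entity pairs whose label similarity passes the threshold."""
    return [(s, t) for s in source for t in target
            if string_sim(s, t) >= threshold]


def verify(candidates, semantic_sim, threshold=0.8):
    """Verify surviving pairs with a more expensive (e.g. structural or
    semantic) similarity measure supplied by the caller."""
    return [(s, t) for s, t in candidates if semantic_sim(s, t) >= threshold]


# Example usage with a trivial stand-in for the verification measure
source = ["Heart", "LeftVentricle", "Aorta"]
target = ["heart", "left ventricle", "aortic arch"]
pairs = verify(filter_candidates(source, target),
               semantic_sim=lambda s, t: string_sim(s, t))
print(pairs)
```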

2021
Vol 32 (9)
pp. 2367-2380
Author(s):  
Guangming Tan ◽  
Chaoyang Shui ◽  
Yinshan Wang ◽  
Xianzhi Yu ◽  
Yujin Yan

2021
Vol 20 (1)
Author(s):  
Margaret E. Stevenson ◽  
Monika Kumpan ◽  
Franz Feichtinger ◽  
Andreas Scheidl ◽  
Alexander Eder ◽  
...  

2021
Vol 2021
pp. 1-5
Author(s):  
Hai Zhu ◽  
Jie Zhang ◽  
Xingsi Xue

Sensor ontologies model sensor information and knowledge in a machine-understandable way, aiming to address the data heterogeneity problem on the Internet of Things (IoT). However, existing sensor ontologies are maintained independently for different requirements and might define the same concept with different terms or contexts, which yields the heterogeneity issue. Because of the complex semantic relationships among sensor concepts and the large number of entities involved, finding identical entity correspondences is an error-prone task. To effectively determine sensor entity correspondences, this work proposes a semisupervised learning-based sensor ontology matching technique. First, we borrow the idea of “centrality” from social network analysis to construct the training examples; then, we present an evolutionary algorithm (EA)-based meta-matching technique to train the model that aggregates different similarity measures; finally, we use the trained model to match the remaining entities. The experiments use the benchmark as well as three real sensor ontologies to test our proposal’s performance. The experimental results show that our approach is able to determine high-quality sensor entity correspondences in all matching tasks.
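The weight-tuning step can be pictured with a minimal sketch: a simple (1+1)-style evolutionary loop in Python searches for weights that aggregate several similarity scores so that labelled training pairs are classified correctly. The number of measures, the mutation scheme, and the toy training data are assumptions for illustration only.

```python
# A minimal sketch of EA-based tuning of similarity-aggregation weights.
import random


def aggregate(sims, weights):
    """Weighted sum of the individual similarity scores of one entity pair."""
    return sum(w * s for w, s in zip(weights, sims))


def fitness(weights, training_pairs):
    """Fraction of labelled pairs (sims, is_match) classified correctly."""
    correct = sum(1 for sims, is_match in training_pairs
                  if (aggregate(sims, weights) >= 0.5) == is_match)
    return correct / len(training_pairs)


def evolve(training_pairs, n_measures=3, generations=200):
    weights = [1.0 / n_measures] * n_measures
    best = fitness(weights, training_pairs)
    for _ in range(generations):
        child = [max(0.0, w + random.gauss(0, 0.1)) for w in weights]
        total = sum(child) or 1.0
        child = [w / total for w in child]      # keep weights normalised
        f = fitness(child, training_pairs)
        if f >= best:
            weights, best = child, f
    return weights


# Example: three similarity measures (string, structural, semantic) per pair
train = [([0.9, 0.7, 0.8], True), ([0.2, 0.3, 0.1], False),
         ([0.6, 0.8, 0.7], True), ([0.4, 0.2, 0.3], False)]
print(evolve(train))
```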


2021
Vol 18 (1)
pp. 34-57
Author(s):  
Weifeng Pan ◽  
Xinxin Xu ◽  
Hua Ming ◽  
Carl K. Chang

Mashup technology has become a promising way to develop and deliver applications on the web. Automatically organizing Mashups into functionally similar clusters helps improve the performance of Mashup discovery. Although many approaches aim to cluster Mashups, they focus solely on semantic similarities to guide the clustering process and are unable to exploit both the structural and the semantic information in Mashup profiles. In this paper, a novel approach to clustering Mashups into groups is proposed, which integrates structural similarity and semantic similarity using fuzzy AHP (fuzzy analytic hierarchy process). The structural similarity is computed from the usage histories between Mashups and Web APIs using the SimRank algorithm. The semantic similarity is computed from the descriptions and tags of Mashups using LDA (latent Dirichlet allocation). A clustering algorithm based on the genetic algorithm is employed to cluster the Mashups. Comprehensive experiments are performed on a real dataset collected from ProgrammableWeb. The results show the effectiveness of the approach compared with two kinds of conventional approaches.
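To make the fusion step concrete, the Python sketch below combines a structural (SimRank-style) similarity matrix and a semantic (LDA-based) similarity matrix with fixed weights standing in for the fuzzy-AHP-derived priorities. The matrices and weights are toy values, not the paper's data.

```python
# A minimal sketch of fusing structural and semantic Mashup similarities.
import numpy as np


def combine_similarities(structural, semantic, w_struct=0.4, w_sem=0.6):
    """Weighted fusion of two Mashup-by-Mashup similarity matrices."""
    return w_struct * structural + w_sem * semantic


# Toy 3x3 similarity matrices for three Mashups
structural = np.array([[1.0, 0.6, 0.1],
                       [0.6, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])
semantic = np.array([[1.0, 0.7, 0.3],
                     [0.7, 1.0, 0.4],
                     [0.3, 0.4, 1.0]])

fused = combine_similarities(structural, semantic)
print(fused)   # this fused matrix would then be fed to the clustering step
```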


2021
Vol 25 (5)
pp. 1153-1168
Author(s):  
Bentian Li ◽  
Dechang Pi ◽  
Yunxia Lin ◽  
Izhar Ahmed Khan

Biological network classification is an eminently challenging task in the domain of data mining, since the networks contain complex structural information. Conventional biochemical experimental methods and existing intelligent algorithms still suffer from limitations such as immense experimental cost and inferior accuracy. To address these problems, this paper proposes a novel framework for biological graph classification named Biogc, which is specifically developed to predict the labels of both small-scale and large-scale biological network data flexibly and efficiently. The framework first applies a simplified graph kernel method to capture the structural information of each graph. The resulting informative features are then used to train classifiers oriented to biological network data of different scales, which together form the prediction model. Extensive experiments on five benchmark biological network datasets for the graph classification task show that the proposed Biogc model outperforms state-of-the-art methods, with an accuracy of 98.90% on a larger dataset and 99.32% on a smaller dataset.
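The overall pipeline shape can be sketched in a few lines of Python: each graph is turned into a fixed-length feature vector (a simple degree histogram standing in for the simplified graph kernel) and a standard classifier is trained on those vectors. The feature choice and the toy graphs are assumptions, not the Biogc kernel itself.

```python
# A minimal graph-to-features-to-classifier sketch with toy data.
from collections import Counter
from sklearn.svm import SVC


def degree_histogram(adjacency, max_degree=5):
    """Fixed-length vector counting how many nodes have each degree."""
    degrees = Counter(len(neighbors) for neighbors in adjacency.values())
    return [degrees.get(d, 0) for d in range(max_degree + 1)]


# Two toy graphs given as adjacency lists, with binary labels
graphs = [
    ({0: [1, 2], 1: [0], 2: [0]}, 1),           # star-like graph
    ({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 0)  # path graph
]
X = [degree_histogram(g) for g, _ in graphs]
y = [label for _, label in graphs]

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X))
```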


2021
Author(s):  
Darel Emmot ◽  
Ryan Menhusen ◽  
Daniel Dauwe ◽  
Vipin Kumar Kukkala ◽  
Kirk Bresniker

2010
pp. 1518-1542
Author(s):  
Janina Fengel ◽  
Heiko Paulheim ◽  
Michael Rebstock

Despite the development of e-business standards, the integration of business processes and business information systems remains a non-trivial issue when business partners use different e-business standards for formatting and describing the information to be processed. Since those standards can be understood as ontologies, ontological engineering technologies can be applied to process them, especially ontology matching for reconciling them. However, as e-business standards tend to be rather large-scale ontologies, scalability is a crucial requirement. To serve this demand, we present our ORBI Ontology Mediator. It is linked with our Malasco system for partition-based ontology matching, which makes use of currently available matching systems that on their own do not scale well, if at all. In our case study we show how to provide dynamic semantic synchronization between business partners using different e-business standards without initial ramp-up effort, based on ontological mapping technology combined with interactive user participation.
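A minimal Python sketch of the partition-based idea: each large ontology is split into smaller blocks, and the full (expensive) matcher is only run on block pairs that share an anchor token. The block size and the anchoring rule are illustrative assumptions, not Malasco's actual partitioning strategy.

```python
# A minimal partition-then-match sketch over entity label lists.
def partition(entities, block_size=100):
    """Split an ordered list of entity labels into fixed-size blocks."""
    return [entities[i:i + block_size]
            for i in range(0, len(entities), block_size)]


def match_blocks(source_blocks, target_blocks, matcher):
    """Run the full matcher only on block pairs that share a label token."""
    alignments = []
    for sb in source_blocks:
        s_tokens = {tok for e in sb for tok in e.lower().split()}
        for tb in target_blocks:
            t_tokens = {tok for e in tb for tok in e.lower().split()}
            if s_tokens & t_tokens:               # cheap anchor test
                alignments.extend(matcher(sb, tb))
    return alignments


# Example: a trivial matcher pairing entities with identical lower-cased labels
src = partition(["Purchase Order", "Invoice", "Delivery Note"], block_size=2)
tgt = partition(["purchase order", "invoice number", "shipment"], block_size=2)
naive = lambda a, b: [(x, y) for x in a for y in b if x.lower() == y.lower()]
print(match_blocks(src, tgt, naive))
```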


2020
Vol 2020
pp. 1-12
Author(s):  
Chunbo Liu ◽  
Lanlan Pan ◽  
Zhaojun Gu ◽  
Jialiang Wang ◽  
Yitong Ren ◽  
...  

System logs record the system status and important events during system operation in detail. Detecting anomalies in system logs is a common approach to monitoring modern large-scale distributed systems. Yet threshold-based classification models used for anomaly detection output only two values, normal or abnormal, and give no probability estimate of whether a prediction is correct. In this paper, a statistical learning algorithm, the Venn-Abers predictor, is adopted to evaluate the confidence of prediction results in the field of system log anomaly detection. It can calculate the probability distribution of labels for a set of samples and thus provides a quality assessment of the predicted labels. Two Venn-Abers predictors, LR-VA and SVM-VA, are implemented on top of logistic regression and support vector machine models, respectively. Then, to exploit the differences among the algorithms, a multimodel fusion algorithm is built by stacking, and a Venn-Abers predictor based on this stacking algorithm, called Stacking-VA, is implemented. The performance of four types of algorithms (unimodel, Venn-Abers predictor based on a unimodel, multimodel, and Venn-Abers predictor based on a multimodel) is compared in terms of validity and accuracy. Experiments are carried out on a log dataset of the Hadoop Distributed File System (HDFS). In the comparative experiments on unimodels, the results show that the validities of LR-VA and SVM-VA are better than those of the two corresponding underlying models. The accuracy of the SVM-VA predictor is better than that of the LR-VA predictor, and, more significantly, the recall rate increases from 81% to 94% compared with the underlying model. In the experiments on multiple models, the stacking-based multimodel fusion algorithm is significantly superior to the underlying classifiers. The average accuracy of Stacking-VA exceeds 0.95, and its predictions are more stable than those of LR-VA and SVM-VA. The experimental results show that the Venn-Abers predictor is a flexible tool that can make accurate and valid probability predictions in the field of system log anomaly detection.
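A minimal sketch of the core Venn-Abers step, under the usual inductive formulation: the underlying model's score for a test sample is calibrated twice with isotonic regression, once assuming label 0 and once assuming label 1, which yields a probability interval [p0, p1]. The scores and labels below are toy values, and this is not the paper's full LR-VA/SVM-VA pipeline.

```python
# A minimal inductive Venn-Abers interval for one test score.
import numpy as np
from sklearn.isotonic import IsotonicRegression


def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Return (p0, p1) for one test score given calibration scores/labels."""
    interval = []
    for assumed_label in (0, 1):
        # Add the test point with the assumed label, then calibrate.
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, assumed_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        interval.append(float(iso.predict([test_score])[0]))
    return tuple(interval)


# Toy calibration set: underlying model scores and true labels (1 = anomaly)
cal_scores = np.array([0.1, 0.3, 0.4, 0.6, 0.8, 0.9])
cal_labels = np.array([0, 0, 1, 0, 1, 1])
print(venn_abers_interval(cal_scores, cal_labels, test_score=0.7))
```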


2020
Vol 10 (7)
pp. 2634
Author(s):  
JunWeon Yoon ◽  
TaeYoung Hong ◽  
ChanYeol Park ◽  
Seo-Young Noh ◽  
HeonChang Yu

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity to solve large-scale and complex problems. On a supercomputer, the job scheduler, HPC’s flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. In our experiments, the main root cause of job delay turns out to be waiting for resources: when a large-scale job is submitted, resources must be held idle until enough of them are available, which significantly delays the execution of the entire job. A backfilling algorithm can exploit these idle resources and help reduce job execution time. We therefore propose a backfilling algorithm that can be applied to the supercomputer. The experimental results show that the overall execution time is reduced.
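As a rough illustration of the backfilling idea, the Python sketch below lets a smaller job from further back in an FCFS queue start early, but only if it fits on the currently free nodes and finishes before the reservation time of the job at the head of the queue. The job tuples and the reservation rule are simplified assumptions, not the scheduler's actual policy.

```python
# A minimal backfilling sketch over a simplified job queue.
from collections import namedtuple

Job = namedtuple("Job", "name nodes runtime")


def backfill(queue, free_nodes, now, head_start_time):
    """Start queued jobs that fit now and end before the head job's reservation."""
    started = []
    for job in list(queue):
        fits = job.nodes <= free_nodes
        harmless = now + job.runtime <= head_start_time
        if fits and harmless:
            started.append(job)
            free_nodes -= job.nodes
            queue.remove(job)
    return started


# Example: 4 free nodes, head job reserved to start at t=60
queue = [Job("small-A", nodes=2, runtime=30), Job("small-B", nodes=8, runtime=120)]
print(backfill(queue, free_nodes=4, now=0, head_start_time=60))
```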

