Multi-user Feedback for Large-scale Cross-lingual Ontology Matching

Author(s): Mamoun Abu Helou, Matteo Palmonari
2021, Vol 2021, pp. 1-5
Author(s): Hai Zhu, Jie Zhang, Xingsi Xue

A sensor ontology models sensor information and knowledge in a machine-understandable way, with the aim of addressing the data heterogeneity problem on the Internet of Things (IoT). However, existing sensor ontologies are maintained independently for different requirements and may define the same concept with different terms or in different contexts, which itself creates heterogeneity. Because matching must cope both with complex semantic relationships between sensor concepts and with large numbers of entities, finding identical entity correspondences is an error-prone task. To determine sensor entity correspondences effectively, this work proposes a semisupervised learning-based sensor ontology matching technique. First, we borrow the idea of “centrality” from social network analysis to construct the training examples; then, we present an evolutionary algorithm (EA)-based metamatching technique that trains a model for aggregating different similarity measures; finally, we use the trained model to match the remaining entities. The experiments use the benchmark as well as three real sensor ontologies to test our proposal’s performance, and the results show that our approach determines high-quality sensor entity correspondences in all matching tasks.
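The three steps of this semisupervised pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two base similarity measures (difflib character similarity and token Jaccard), degree centrality as the notion of "centrality", the small labeled reference set used to label pairs of central entities, and the simple elitist EA over aggregation weights plus a threshold are all assumptions chosen for illustration.

```python
# Hypothetical sketch of centrality-based training + EA metamatching;
# measure choices, EA operators and parameters are illustrative assumptions.
import random
import re
from difflib import SequenceMatcher


def tokens(label):
    """Split camelCase / snake_case labels into a set of lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label).replace("_", " ")
    return {t.lower() for t in spaced.split()}


def edit_sim(a, b):
    """Character-level similarity in [0, 1] (difflib ratio on lowercased labels)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_sim(a, b):
    """Jaccard similarity of the two labels' token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


MEASURES = [edit_sim, jaccard_sim]  # hypothetical choice of base measures


def most_central(edges, k):
    """Step 1: the k entities with the highest degree centrality."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(degree, key=lambda n: (-degree[n], n))[:k]


def training_pairs(src_edges, tgt_edges, gold, k):
    """Label every pair of central entities against a small reference set."""
    src, tgt = most_central(src_edges, k), most_central(tgt_edges, k)
    return [(s, t, (s, t) in gold) for s in src for t in tgt]


def aggregate(weights, a, b):
    """Weighted sum of the base similarity measures."""
    return sum(w * m(a, b) for w, m in zip(weights, MEASURES))


def f_measure(weights, threshold, pairs):
    """F1 of 'aggregate >= threshold' predictions on labeled (a, b, gold) pairs."""
    tp = fp = fn = 0
    for a, b, gold in pairs:
        predicted = aggregate(weights, a, b) >= threshold
        tp += predicted and gold
        fp += predicted and not gold
        fn += gold and not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def normalize(ws):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(ws)
    return [w / total for w in ws] if total > 0 else [1 / len(ws)] * len(ws)


def evolve(pairs, generations=30, pop_size=20, seed=0):
    """Step 2: elitist EA over (weights, threshold); fitness = training F1."""
    rng = random.Random(seed)

    def fitness(ind):
        return f_measure(ind[0], ind[1], pairs)

    pop = [(normalize([rng.random() for _ in MEASURES]), rng.uniform(0.3, 0.9))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # the fitter half survives unchanged
        # Gaussian mutation: clip weights at 0 and renormalize, clamp threshold to [0, 1].
        children = [(normalize([max(0.0, w + rng.gauss(0, 0.1)) for w in ws]),
                     min(1.0, max(0.0, t + rng.gauss(0, 0.05))))
                    for ws, t in elite]
        pop = elite + children
    return max(pop, key=fitness)


def match(source, target, weights, threshold):
    """Step 3: keep each source entity's best target if it clears the threshold."""
    result = []
    for s in source:
        best = max(target, key=lambda t: aggregate(weights, s, t))
        if aggregate(weights, s, best) >= threshold:
            result.append((s, best))
    return result
```

For example, `evolve(training_pairs(src_edges, tgt_edges, gold, k=3))` returns a `(weights, threshold)` pair that `match` then applies to the non-central entities. Restricting labels to central entities keeps the labeling effort small, and an EA suits this search because the F-measure being optimized is not differentiable in the weights.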


2010, pp. 1518-1542
Author(s): Janina Fengel, Heiko Paulheim, Michael Rebstock

Despite the development of e-business standards, integrating business processes and business information systems is still a non-trivial issue when business partners use different e-business standards to format and describe the information to be processed. Since those standards can be understood as ontologies, ontological engineering technologies can be applied to process them, in particular ontology matching to reconcile them. However, as e-business standards tend to be rather large-scale ontologies, scalability is a crucial requirement. To meet this demand, we present our ORBI Ontology Mediator. It is linked with our Malasco system, which performs partition-based ontology matching with currently available matching systems, which so far scale poorly, if at all. In our case study, we show how to provide dynamic semantic synchronization between business partners using different e-business standards without initial ramp-up effort, based on ontology mapping technology combined with interactive user participation.
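The abstract does not describe how Malasco partitions ontologies, so the following is only a hypothetical sketch of the general partition-based idea: split each ontology graph into connected blocks of bounded size (here with a greedy breadth-first strategy, an assumption), run an existing matcher on each pair of blocks, and union the results. The `exact_label_matcher` function is a hypothetical stand-in for a real matching system.

```python
# Hypothetical sketch of partition-based matching; the BFS partitioning
# strategy and the stand-in base matcher are illustrative assumptions.
from collections import deque


def partition(nodes, edges, max_size):
    """Greedy BFS partitioning into connected blocks of at most max_size nodes."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    assigned, blocks = set(), []
    for start in sorted(nodes):
        if start in assigned:
            continue
        block, queue, seen = [], deque([start]), {start}
        while queue and len(block) < max_size:
            node = queue.popleft()
            assigned.add(node)
            block.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in assigned and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        blocks.append(block)  # nodes still queued are picked up by later blocks
    return blocks


def partitioned_match(src_blocks, tgt_blocks, base_matcher):
    """Run a (possibly non-scalable) base matcher on each block pair; union results."""
    alignment = set()
    for src_block in src_blocks:
        for tgt_block in tgt_blocks:
            alignment |= base_matcher(src_block, tgt_block)
    return alignment


def exact_label_matcher(src_block, tgt_block):
    """Hypothetical stand-in for an existing matching system."""
    return {(s, t) for s in src_block for t in tgt_block if s.lower() == t.lower()}
```

Note that matching every block pair does not reduce the total number of candidate comparisons; what it bounds is the input size of each call to the underlying matcher, which is what lets matchers that do not scale run at all. Structure-based matchers may lose context at block boundaries, which is a known trade-off of partitioning.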


2020, pp. 1-51
Author(s): Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, ...

We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 crosslingual semantic similarity data sets. Because of its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and crosslingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and crosslingual representation models, including static and contextualized word embeddings (such as fastText, monolingual and multilingual BERT, and XLM), externally informed lexical representations, and fully unsupervised and (weakly) supervised crosslingual word embeddings. We also present a step-by-step protocol for creating consistent, Multi-SimLex-style data sets for additional languages. We make these contributions (the public release of the Multi-SimLex data sets, their creation protocol, strong baseline results, and in-depth analyses that can help guide future developments in multilingual lexical semantics and representation learning) available via a website that will encourage community effort to expand Multi-SimLex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.
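The abstract does not name the evaluation metric. Word-similarity benchmarks in the SimLex tradition are commonly scored with Spearman's rank correlation between model similarity scores and human ratings, so the minimal sketch below assumes that convention, using cosine similarity over static word vectors. The toy vectors and ratings in any usage are hypothetical, not Multi-SimLex data. The sketch also records the arithmetic behind the 66 crosslingual sets: one per unordered pair of the 12 languages.

```python
# Hypothetical sketch: scoring a word-similarity benchmark with Spearman's rho.
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (sx * sy)


def evaluate(pairs, embeddings):
    """pairs: (word1, word2, human_score) triples; embeddings: word -> vector."""
    predicted = [cosine(embeddings[w1], embeddings[w2]) for w1, w2, _ in pairs]
    gold = [score for _, _, score in pairs]
    return spearman(predicted, gold)


# Every unordered pair of the 12 languages gives one crosslingual data set.
CROSSLINGUAL_DATASETS = math.comb(12, 2)  # 12 * 11 / 2 = 66
```

Rank correlation is used rather than raw Pearson correlation because only the ordering of pairs matters: a model whose cosine scores are a monotone rescaling of human ratings scores perfectly.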


IRBM, 2013, Vol 34 (1), pp. 56-59
Author(s): M. Ba, G. Diallo

Author(s): Tarek Saier, Michael Färber, Tornike Tsereteli

Citation information in scholarly data is an important source of insight into the reception of publications and the scholarly discourse. Outcomes of citation analyses and the applicability of citation-based machine learning approaches heavily depend on the completeness of such data. One particular shortcoming of scholarly data nowadays is that non-English publications are often not included in data sets, or that language metadata is not available. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on over one million English papers, spanning three scientific disciplines and a time span of three decades. Our investigation covers differences between cited languages and disciplines, trends over time, and the usage characteristics as well as impact of cross-lingual citations. Among our findings are an increasing rate of citations to publications written in Chinese, citations being primarily to local non-English languages, and consistency in citation intent between cross- and monolingual citations. To facilitate further research, we make our collected data and source code publicly available.
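Two of the measurements the abstract mentions, the trend of cross-lingual citations over time and the distribution of cited languages, can be sketched in a few lines once every citation carries language metadata for both papers. The `(year, citing_lang, cited_lang)` record layout below is a hypothetical assumption; the paper's actual data format and language-identification pipeline are not described here.

```python
# Hypothetical sketch; the citation record layout is an illustrative assumption.
from collections import Counter


def cross_lingual_rate_by_year(citations):
    """citations: (year, citing_lang, cited_lang) triples.

    Returns {year: share of that year's citations whose cited paper is written
    in a different language from the citing paper}.
    """
    totals, cross = Counter(), Counter()
    for year, citing_lang, cited_lang in citations:
        totals[year] += 1
        if citing_lang != cited_lang:
            cross[year] += 1
    return {year: cross[year] / totals[year] for year in sorted(totals)}


def cited_language_shares(citations):
    """Distribution of cited languages among cross-lingual citations only."""
    counts = Counter(cited for _, citing, cited in citations if citing != cited)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}
```

Normalizing per year, rather than counting raw cross-lingual citations, keeps the trend from simply mirroring growth in the overall number of publications.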

