Mapping of ImageNet and Wikidata for Knowledge Graphs Enabled Computer Vision

2021 ◽  
pp. 151-161
Author(s):  
Dominik Filipiak ◽  
Anna Fensel ◽  
Agata Filipowska

Knowledge graphs are used as a source of prior knowledge in numerous computer vision tasks. However, such an approach requires a mapping between ground-truth data labels and the target knowledge graph. We linked the labels of the ILSVRC 2012 dataset (often simply referred to as ImageNet) to Wikidata entities. This enables the use of rich knowledge graph structure and contextual information for several computer vision tasks traditionally benchmarked with ImageNet and its variations. For instance, in few-shot classification scenarios with neural networks, this mapping can be leveraged for weight initialisation, which can improve the final performance metrics. We mapped all 1000 ImageNet labels: 461 were already directly linked with the exact match property (P2888), 467 have exact match candidates, and 72 cannot be matched directly. For these 72 labels, we discuss the different problem categories stemming from the inability to find an exact match. Semantically close non-exact match candidates are presented as well. The mapping is publicly available at https://github.com/DominikFilipiak/imagenet-to-wikidata-mapping.
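As a rough illustration of the kind of lookup such a mapping relies on, the following hedged Python sketch queries the public Wikidata SPARQL endpoint for items whose exact match (P2888) statement mentions a given WordNet synset offset (ImageNet labels are WordNet synsets). The example offset, the string-matching filter, and the endpoint usage are illustrative assumptions rather than the authors' pipeline, and the unanchored string filter can be slow on the live endpoint.

```python
# Minimal sketch: find Wikidata items whose "exact match" (P2888) value
# mentions a given WordNet synset offset. Offset "02123045" and the
# CONTAINS-based filter are illustrative assumptions, not the published mapping.
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def exact_match_candidates(synset_offset: str):
    query = f"""
    SELECT ?item ?itemLabel ?match WHERE {{
      ?item wdt:P2888 ?match .
      FILTER(CONTAINS(STR(?match), "{synset_offset}"))
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT 10
    """
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "imagenet-wikidata-demo/0.1"},
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json()["results"]["bindings"]
    return [(r["item"]["value"], r["itemLabel"]["value"]) for r in rows]

if __name__ == "__main__":
    for uri, label in exact_match_candidates("02123045"):
        print(uri, label)
```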

2019 ◽  
Vol 9 (1) ◽  
pp. 17-30
Author(s):  
Noa Garcia ◽  
Benjamin Renoust ◽  
Yuta Nakashima

In automatic art analysis, models that represent not only the visual elements of an artwork but also the relationships between its different artistic attributes can be very informative. Such relationships, however, usually appear in a very subtle way and are extremely difficult to detect with standard convolutional neural networks. In this work, we propose to capture contextual artistic information from fine-art paintings with a dedicated ContextNet network. As context can be obtained from multiple sources, we explore two modalities of ContextNets: one based on multitask learning and another based on knowledge graphs. Once the contextual information is obtained, we use it to enhance visual representations computed with a neural network. In this way, we are able to (1) capture information about the content and the style with the visual representations and (2) encode relationships between different artistic attributes with the ContextNet. We evaluate our models on both painting classification and retrieval, and by visualising the resulting embeddings on a knowledge graph, we confirm that our models represent specific stylistic aspects present in the data.
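To make the multitask flavour of this idea concrete, here is a minimal, hedged PyTorch sketch: a shared visual backbone whose embedding feeds two attribute heads (style and author are assumed attributes here). Layer sizes, the backbone choice, and the attribute set are illustrative assumptions, not the authors' exact ContextNet architecture.

```python
# Hedged sketch of a multitask "context" network: shared visual embedding,
# two attribute heads. Not the paper's exact architecture.
import torch
import torch.nn as nn
from torchvision import models

class MultitaskContextNet(nn.Module):
    def __init__(self, n_styles: int, n_authors: int, emb_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(weights=None)   # torchvision >= 0.13; pretrained weights optional
        backbone.fc = nn.Identity()                # keep the 2048-d pooled features
        self.backbone = backbone
        self.embed = nn.Linear(2048, emb_dim)      # shared visual embedding
        self.style_head = nn.Linear(emb_dim, n_styles)
        self.author_head = nn.Linear(emb_dim, n_authors)

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)
        emb = torch.relu(self.embed(feats))
        return emb, self.style_head(emb), self.author_head(emb)

# The joint loss would sum the per-attribute cross-entropies, and the shared
# embedding `emb` is what would be reused for retrieval.
model = MultitaskContextNet(n_styles=25, n_authors=350)
emb, style_logits, author_logits = model(torch.randn(2, 3, 224, 224))
```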


2021 ◽  
Vol 13 (13) ◽  
pp. 2619
Author(s):  
Joao Fonseca ◽  
Georgios Douzas ◽  
Fernando Bacao

In remote sensing, Active Learning (AL) has become an important technique to collect informative ground truth data "on demand" for supervised classification tasks. Despite its effectiveness, it is still significantly reliant on user interaction, which makes it both expensive and time-consuming to implement. Most of the current literature focuses on the optimization of AL by modifying the selection criteria and the classifiers used. Although improvements in these areas will result in more effective data collection, the use of artificial data sources to reduce human-computer interaction remains unexplored. In this paper, we introduce a new component to the typical AL framework, the data generator, a source of artificial data to reduce the amount of user-labeled data required in AL. The implementation of the proposed AL framework is done using Geometric SMOTE as the data generator. We compare the new AL framework to the original one using similar acquisition functions and classifiers over three AL-specific performance metrics in seven benchmark datasets. We show that this modification of the AL framework significantly reduces cost and time requirements for a successful AL implementation in all of the datasets used in the experiment.
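The loop below is a hedged sketch of the general idea: an active-learning cycle whose small labeled pool is augmented by an artificial-data generator before each retraining step. Here imbalanced-learn's standard SMOTE stands in for the Geometric SMOTE generator used in the paper, and the dataset, classifier, batch size, and acquisition rule are illustrative assumptions.

```python
# Hedged sketch: AL with a data generator augmenting the labeled pool.
# imblearn's SMOTE stands in for Geometric SMOTE; everything is a toy setup.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=60, replace=False))   # seed labels
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(5):                                           # AL iterations
    # 1) generator: oversample the small labeled pool with artificial data
    X_aug, y_aug = SMOTE(k_neighbors=3, random_state=0).fit_resample(
        X[labeled], y[labeled])
    # 2) train the classifier on real + artificial samples
    clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
    # 3) uncertainty sampling: query the least confident unlabeled points
    proba = clf.predict_proba(X[unlabeled])
    uncertainty = 1.0 - proba.max(axis=1)
    query = [unlabeled[i] for i in np.argsort(uncertainty)[-10:]]
    labeled += query                                         # "oracle" labels them
    unlabeled = [i for i in unlabeled if i not in set(query)]
```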


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 503
Author(s):  
Hassan El-Hajj ◽  
Matteo Valleriani

The development of the field of digital humanities in recent years has led to the increased use of knowledge graphs within the community. Many digital humanities projects tend to model their data based on the CIDOC-CRM ontology, which offers a wide array of classes appropriate for storing humanities and cultural heritage data. The CIDOC-CRM ontology model leads to a knowledge graph structure in which many entities are often linked to each other through chains of relations, which means that relevant information often lies many hops away from the entities it describes. In this paper, we present a method based on graph walks and text processing to extract entity information and provide semantically relevant embeddings. In the process, we were able to generate similarity recommendations as well as explore the underlying data structure. This approach was then demonstrated on the Sphaera Dataset, which is modeled according to the CIDOC-CRM data structure.
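The following hedged Python sketch shows the general recipe such methods build on (not the authors' exact pipeline): random walks over an RDF graph produce "sentences" of node and property identifiers, and a word-embedding model turns them into entity embeddings whose nearest neighbours act as similarity recommendations. The file name, walk length, and hyperparameters are assumptions.

```python
# Hedged sketch: graph walks over a CIDOC-CRM-style RDF graph + word embeddings.
import random
import rdflib
from gensim.models import Word2Vec

g = rdflib.Graph()
g.parse("sphaera_subset.ttl", format="turtle")   # hypothetical RDF export

def random_walk(graph, start, length=8):
    """One walk: follow outgoing triples, recording node/property identifiers."""
    walk, node = [str(start)], start
    for _ in range(length):
        triples = list(graph.triples((node, None, None)))
        if not triples:
            break
        _, p, o = random.choice(triples)
        walk += [str(p), str(o)]
        node = o
    return walk

subjects = list(set(g.subjects()))
walks = [random_walk(g, s) for s in subjects for _ in range(10)]
model = Word2Vec(sentences=walks, vector_size=100, window=5, min_count=1, sg=1)

# Nearest neighbours in the embedding space serve as similarity recommendations:
# print(model.wv.most_similar(str(subjects[0])))
```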


2020 ◽  
Vol 34 (03) ◽  
pp. 3041-3048 ◽  
Author(s):  
Chuxu Zhang ◽  
Huaxiu Yao ◽  
Chao Huang ◽  
Meng Jiang ◽  
Zhenhui Li ◽  
...  

Knowledge graphs (KGs) serve as useful resources for various natural language processing applications. Previous KG completion approaches require a large number of training instances (i.e., head-tail entity pairs) for every relation. In reality, however, very few entity pairs are available for most relations. Existing one-shot learning work limits generalizability to few-shot scenarios and does not fully use the supervisory information; few-shot KG completion has not yet been well studied. In this work, we propose a novel few-shot relation learning model (FSRL) that aims at discovering facts of new relations with few-shot references. FSRL can effectively capture knowledge from heterogeneous graph structure, aggregate representations of few-shot references, and match similar entity pairs against the reference set for every relation. Extensive experiments on two public datasets demonstrate that FSRL outperforms the state of the art.
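A hedged toy sketch of the matching idea behind few-shot KG completion follows: embed the few reference (head, tail) pairs of a new relation, aggregate them into a relation prototype, and rank candidate pairs by similarity to it. The embedding sizes, the mean aggregator, and the cosine scorer are simplifying assumptions and are not FSRL itself.

```python
# Hedged toy sketch of few-shot relation matching (not the FSRL model).
import torch
import torch.nn.functional as F

n_entities, dim = 1000, 64
entity_emb = torch.nn.Embedding(n_entities, dim)           # learned in practice

def pair_embedding(heads: torch.Tensor, tails: torch.Tensor) -> torch.Tensor:
    return torch.cat([entity_emb(heads), entity_emb(tails)], dim=-1)

def score_candidates(reference_pairs, candidate_pairs):
    ref = pair_embedding(*reference_pairs)                  # (k, 2*dim)
    prototype = ref.mean(dim=0, keepdim=True)               # aggregate the few shots
    cand = pair_embedding(*candidate_pairs)                 # (n, 2*dim)
    return F.cosine_similarity(cand, prototype)             # higher = more plausible

# Three reference pairs for a new relation, ranked against five candidates.
refs = (torch.tensor([1, 2, 3]), torch.tensor([10, 20, 30]))
cands = (torch.tensor([4, 5, 6, 7, 8]), torch.tensor([40, 50, 60, 70, 80]))
print(score_candidates(refs, cands))
```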


2021 ◽  
Vol 13 (17) ◽  
pp. 3509
Author(s):  
Andrea De Giorgi ◽  
David Solarna ◽  
Gabriele Moser ◽  
Deodato Tapete ◽  
Francesca Cigna ◽  
...  

The aim of this paper is to address the monitoring of the recovery phase in the aftermath of Hurricane Matthew (28 September–10 October 2016) in the town of Jérémie, southwestern Haiti. This is accomplished via a novel change detection method that has been formulated, in a data fusion perspective, in terms of multitemporal supervised classification. The availability of very high resolution images provided by last-generation satellite synthetic aperture radar (SAR) and optical sensors makes this analysis promising from an application perspective and simultaneously challenging from a processing viewpoint. Indeed, pursuing such a goal requires the development of novel methodologies able to exploit the large amount of detailed information provided by this type of data. To take advantage of the temporal and spatial information associated with such images, the proposed method integrates multisensor, multisource, and contextual information. Markov random field modeling is adopted here to integrate the spatial context and the temporal correlation associated with images acquired at different dates. Moreover, the adoption of a region-based approach allows for the characterization of the geometrical structures in the images through multiple segmentation maps at different scales and times. The performances of the proposed approach are evaluated on multisensor pairs of COSMO-SkyMed SAR and Pléiades optical images acquired over Jérémie, in the aftermath of and during the three years after Hurricane Matthew. The effectiveness of the change detection results is analyzed both quantitatively, through the computation of accuracy measures on a test set, and qualitatively, by visual inspection of the classification maps. The robustness of the proposed method with respect to different algorithmic choices is also assessed, and the detected changes are discussed in relation to the recovery endeavors in the area and ground-truth data collected in the field in April 2019.
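As a heavily simplified, hedged illustration of change detection cast as multitemporal supervised classification, the sketch below classifies each date from stacked features and flags pixels whose class changes. The Markov random field modeling and region-based multiscale fusion of the paper are omitted, and all arrays, shapes, and labels are toy placeholders.

```python
# Hedged, heavily simplified sketch: per-date classification + per-pixel change map.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

h, w, n_features = 100, 100, 6                       # toy image size
rng = np.random.default_rng(0)
img_t1 = rng.normal(size=(h, w, n_features))         # stacked SAR/optical features, date 1
img_t2 = rng.normal(size=(h, w, n_features))         # stacked SAR/optical features, date 2
train_mask = rng.random((h, w)) < 0.05               # sparse ground-truth pixels
labels = rng.integers(0, 4, size=(h, w))             # toy class labels

def classify(image):
    clf = RandomForestClassifier(random_state=0)
    clf.fit(image[train_mask], labels[train_mask])
    return clf.predict(image.reshape(-1, n_features)).reshape(h, w)

map_t1, map_t2 = classify(img_t1), classify(img_t2)
change_map = map_t1 != map_t2                        # pixels whose class changed
print("changed pixels:", int(change_map.sum()))
```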


2021 ◽  
Vol 3 (4) ◽  
pp. 802-818
Author(s):  
M.V.P.T. Lakshika ◽  
H.A. Caldera

E-newspaper readers are overloaded with the massive volume of text in e-news articles, which often hinders readers trying to understand the information. Thus, there is an urgent need for technology that can automatically represent the gist of these e-news articles more quickly. Popular machine learning approaches have greatly improved representation accuracy compared to traditional methods, but they cannot incorporate contextual information to achieve a higher level of abstraction. Recent research efforts in knowledge representation using graph approaches are neither user-driven nor flexible to deviations in the data. Thus, there is growing interest in constructing knowledge graphs by combining background information related to the subjects of text documents. We propose an enhanced representation of a scalable knowledge graph built by automatically extracting information from a corpus of e-news articles, and we determine whether such a knowledge graph can be used as an efficient application for analyzing and generating knowledge representations from the extracted e-news corpus. This knowledge graph consists of a knowledge base built from triples that automatically produce knowledge representations from e-news articles. Overall, we observe that the proposed knowledge graph generates a comprehensive and precise knowledge representation for the corpus of e-news articles.
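One common way to build such a triple-based knowledge base is sketched below, under stated assumptions: a dependency parser extracts naive subject-verb-object triples from e-news sentences, which are then stored as a directed graph. The spaCy model name and the simplistic extraction rule are illustrative assumptions, not the proposed pipeline.

```python
# Hedged sketch: naive SVO triple extraction and graph construction.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")   # assumed small English model

def extract_triples(text: str):
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children if c.dep_ in ("dobj", "attr", "pobj")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.lemma_, token.lemma_, o.lemma_))
    return triples

graph = nx.DiGraph()
for subj, pred, obj in extract_triples("The central bank raised interest rates. "
                                       "Investors welcomed the decision."):
    graph.add_edge(subj, obj, predicate=pred)   # each triple becomes a labelled edge
print(list(graph.edges(data=True)))
```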


2020 ◽  
pp. 131
Author(s):  
D.A. Vélez-Alvarado ◽  
J. Álvarez-Mozos

Management practices adopted in protected natural areas often ignore the relevance of the territory surrounding the actual protected land (buffer area). These areas can be the source of impacts that threaten the protected ecosystems. This paper reports a case study in which a time series of Sentinel-1 imagery was used to classify land-use/land-cover and to evaluate its change between 2015 and 2018 in the buffer area around the Manglares Churute Ecological Reserve (REMCh) in Ecuador. Sentinel-1 scenes were processed and ground-truth data were collected, consisting of samples of the main land-use/land-cover classes in the region. A Random Forests (RF) classification algorithm was then built and optimized, following a five-fold cross-validation scheme using the training dataset (70% of the ground truth). The remaining 30% was used for validation, achieving an Overall Accuracy of 84%, a Kappa coefficient of 0.8, and successful class performance metrics for the main crops and land-use classes. Results were poorer for heterogeneous and minor classes; nevertheless, the performance of the classification was deemed sufficient for the targeted change analysis. Between 2015 and 2018, an increase in the area covered by intensive land uses was evidenced, such as shrimp farms and sugarcane, which replaced traditional crops (mainly rice and banana). Even though these changes only affected the land area around the natural reserve, they might affect its water quality due to the use of fertilizers and pesticides that can easily reach it. Therefore, it is recommended that these buffer areas around natural protected areas be taken into account when designing adequate environmental protection measures and policies.
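The classification protocol described above maps directly onto standard scikit-learn tooling; the hedged sketch below reproduces it at a toy scale with a 70/30 split, five-fold cross-validated tuning of a Random Forest, and overall accuracy and Kappa on the held-out 30%. The feature matrix (e.g., per-pixel Sentinel-1 time-series statistics) and the tuning grid are illustrative assumptions.

```python
# Hedged sketch: RF classification with 5-fold CV tuning and OA/Kappa validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 24))                  # toy backscatter time-series features
y = rng.integers(0, 6, size=3000)                # toy land-use/land-cover labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=5, n_jobs=-1)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print("Overall accuracy:", accuracy_score(y_test, y_pred))
print("Kappa:", cohen_kappa_score(y_test, y_pred))
```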


Data ◽  
2019 ◽  
Vol 4 (3) ◽  
pp. 127 ◽  
Author(s):  
Lucas Pereira

Datasets are important for researchers to build models and test how they perform, as well as to reproduce research experiments from others. This data paper presents the NILM Performance Evaluation dataset (NILMPEds), which is aimed primarily at research reproducibility in the field of non-intrusive load monitoring. This initial release of NILMPEds is dedicated to event detection algorithms and comprises ground-truth data for four test datasets, the specification of 47,950 event detection models, the power events returned by each model on the four test datasets, and the performance of each individual model according to 31 performance metrics.
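To illustrate what scoring event detection against ground truth can look like, here is a hedged Python sketch that matches each detected event to the nearest true event within a time tolerance and derives precision, recall, and F1. The tolerance value and the greedy matching rule are assumptions for illustration, not the dataset's official evaluation protocol or any of its 31 metrics in particular.

```python
# Hedged sketch: scoring detected power events against ground-truth timestamps.
def event_detection_scores(true_events, detected_events, tolerance=3.0):
    true_events = sorted(true_events)
    matched, tp = set(), 0
    for det in sorted(detected_events):
        # greedily match to the closest unmatched ground-truth event within tolerance
        candidates = [(abs(det - t), i) for i, t in enumerate(true_events)
                      if i not in matched and abs(det - t) <= tolerance]
        if candidates:
            matched.add(min(candidates)[1])
            tp += 1
    fp = len(detected_events) - tp
    fn = len(true_events) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Timestamps (seconds) of true vs. detected power events in a toy example.
print(event_detection_scores([10.0, 55.0, 120.0], [11.5, 60.0, 118.0, 300.0]))
```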


2019 ◽  
Author(s):  
David N. Nicholson ◽  
Daniel S. Himmelstein ◽  
Casey S. Greene

Knowledge graphs support multiple research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via some form of manual curation, which is difficult to scale with an increasing publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to automatically annotate textual data. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This makes populating a knowledge graph with multiple node and edge types practically infeasible. We sought to accelerate the label function creation process by evaluating the extent to which label functions can be re-used across multiple edge types. We used a subset of an existing knowledge graph centered on disease, compound, and gene entities to evaluate label function re-use. We determined the best label function combination by comparing a baseline database-only model with the same model augmented with either edge-specific or edge-mismatch label functions. We confirmed that adding edge-specific rather than edge-mismatch label functions often improves text annotation, and we show that this approach can incorporate novel edges into our source knowledge graph. We expect that continued development of this strategy has the potential to swiftly populate knowledge graphs with new discoveries, ensuring that these resources include cutting-edge results.
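The label-function idea behind data programming can be made concrete with the hedged toy sketch below: small heuristics vote POSITIVE/NEGATIVE/ABSTAIN on candidate sentences, and the votes are combined (here by simple majority) into noisy training labels. The rules, constants, and combiner are illustrative only; the study builds on a much larger set of functions and a generative label model rather than majority voting.

```python
# Hedged toy sketch of label functions for weak supervision (not the paper's set).
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_treats_keyword(sentence: str) -> int:
    """Edge-specific heuristic for compound-treats-disease sentences."""
    return POSITIVE if "treat" in sentence.lower() else ABSTAIN

def lf_list_mention(sentence: str) -> int:
    """Negative signal when the pair only co-occurs in a list-like mention."""
    return NEGATIVE if "including" in sentence.lower() else ABSTAIN

LABEL_FUNCTIONS = [lf_treats_keyword, lf_list_mention]

def weak_label(sentence: str) -> int:
    votes = [lf(sentence) for lf in LABEL_FUNCTIONS if lf(sentence) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return POSITIVE if votes.count(POSITIVE) >= votes.count(NEGATIVE) else NEGATIVE

print(weak_label("Methotrexate is used to treat rheumatoid arthritis."))
```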


2018 ◽  
Author(s):  
Fernando Meyer ◽  
Andreas Bremges ◽  
Peter Belmann ◽  
Stefan Janssen ◽  
Alice C. McHardy ◽  
...  

Taxonomic metagenome profilers predict the presence and relative abundance of microorganisms from shotgun sequence samples of DNA isolated directly from a microbial community. Over the past years, there has been explosive growth in software and algorithms for this task, resulting in a need for more systematic comparisons of these methods based on relevant performance criteria. Here, we present OPAL, a software package implementing commonly used performance metrics, including those of the first challenge of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI), together with convenient visualizations. In addition, OPAL implements diversity metrics from microbial ecology, as well as run time and memory efficiency measurements. By allowing users to customize the relative importance of metrics, OPAL facilitates in-depth performance comparisons, as well as the development of new methods and data analysis workflows. To demonstrate its application, we compared seven profilers on benchmark datasets of the first and second CAMI challenges using all metrics and performance measurements available in OPAL. The software is implemented in Python 3 and available under the Apache 2.0 license on GitHub (https://github.com/CAMI-challenge/OPAL).

Author summary
There are many computational approaches for inferring the presence and relative abundance of taxa (i.e., taxonomic profiling) from shotgun metagenome samples of microbial communities, making systematic performance evaluations an important task. However, until now there has been no computational framework in which profiler performances can be compared. This delays method development and applied studies, as researchers need to implement their own custom evaluation frameworks. Here, we present OPAL, a software package that facilitates standardized comparisons of taxonomic metagenome profilers. It implements a variety of performance metrics frequently employed in microbiome research, including runtime and memory usage, and generates comparison reports and visualizations. OPAL thus facilitates and accelerates benchmarking of taxonomic profiling techniques on ground-truth data. This enables researchers to arrive at informed decisions about which computational techniques to use for specific datasets and research questions.
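For orientation, the hedged sketch below computes two metrics of the kind such evaluations report: the L1 distance between predicted and ground-truth relative abundances at one taxonomic rank, and the Shannon diversity of a profile. The taxa names and numbers are made up, and OPAL's documentation should be consulted for its exact metric definitions.

```python
# Hedged sketch: two profile-comparison metrics on toy relative abundances.
import numpy as np

def l1_distance(truth: dict, prediction: dict) -> float:
    taxa = set(truth) | set(prediction)
    return float(sum(abs(truth.get(t, 0.0) - prediction.get(t, 0.0)) for t in taxa))

def shannon_diversity(profile: dict) -> float:
    p = np.array([v for v in profile.values() if v > 0.0])
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

truth = {"Bacteroides": 0.5, "Escherichia": 0.3, "Prevotella": 0.2}
prediction = {"Bacteroides": 0.45, "Escherichia": 0.4, "Clostridium": 0.15}

print("L1 distance:", l1_distance(truth, prediction))
print("Shannon diversity (truth):", shannon_diversity(truth))
```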

