Fine-Grained Cross-Modal Retrieval for Cultural Items with Focal Attention and Hierarchical Encodings

Computers ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 105
Author(s):  
Shurong Sheng ◽  
Katrien Laenen ◽  
Luc Van Gool ◽  
Marie-Francine Moens

In this paper, we target two tasks of fine-grained image–text alignment and cross-modal retrieval in the cultural heritage domain: (1) given an image fragment of an artwork, we retrieve the noun phrases that describe it; (2) given a noun phrase describing an artifact attribute, we retrieve the image fragment it refers to. To this end, we propose a weakly supervised alignment model: the correspondence between individual visual and textual fragments in the training data is unknown, but the image and text units that refer to the same artwork are treated as a positive pair. The model first projects the fragments of both modalities into a shared semantic space and then uses attention mechanisms to exploit the latent alignment between them; it is trained by increasing the image–text similarity of positive pairs in this common space. During this process, we encode the model inputs with hierarchical encodings and remove irrelevant fragments with different indicator functions. We also study techniques to augment the limited training data with synthetic relevant textual fragments and transformed image fragments. The model is then fine-tuned on a small set of image–text fragment pairs. We rank the test image fragments and noun phrases by their intermodal similarity in the learned common space. Extensive experiments on two benchmark datasets demonstrate that our proposed models outperform two state-of-the-art methods adapted to fine-grained cross-modal retrieval of cultural items.
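As an illustration of the ranking step, the sketch below ranks candidate noun phrases for an image-fragment query by cosine similarity in a shared space, scores an image–text pair under weak supervision, and applies a margin-based ranking loss. It assumes the fragments have already been projected into the common space; the max-over-fragments aggregation, the margin of 0.2, and all example vectors are illustrative assumptions, not the authors' focal-attention model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors in the shared space (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: Vector, candidates: List[Tuple[str, Vector]]) -> List[str]:
    """Return candidate labels sorted by decreasing similarity to the query."""
    scored = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [label for label, _ in scored]


def pair_score(image_frags: List[Vector], phrases: List[Vector]) -> float:
    """Assumed weak-supervision score: match each phrase to its most similar
    image fragment, then average the matches over all phrases."""
    return sum(max(cosine(p, f) for f in image_frags) for p in phrases) / len(phrases)


def hinge_loss(pos: float, neg: float, margin: float = 0.2) -> float:
    """Penalize a negative pair whose score comes within `margin` of the positive pair."""
    return max(0.0, margin - pos + neg)
```

For example, `rank([1.0, 0.0], [("gilded frame", [0.9, 0.1]), ("blue sky", [0.0, 1.0]), ("wooden panel", [0.6, 0.6])])` orders the phrases as gilded frame, wooden panel, blue sky.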

Author(s):  
Gilles Jacobs ◽  
Véronique Hoste

We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. Event processing is a highly productive research area, and various fine-grained event extraction corpora for the general domain are freely available, but economically focused resources are lacking. This work addresses a clear need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled, and an annotation scheme is developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE), so state-of-the-art event extraction systems can be readily applied. The result is a gold-standard dataset annotated with event triggers, participant arguments, event coreference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance is adequate, indicating that the annotation scheme produces consistent, high-quality event annotations. In an event detection pilot study, a satisfactory macro-averaged $F_1$-score of 59% was obtained, validating the dataset for machine learning purposes. The dataset thus provides a rich resource of event training data for supervised machine learning in economic and financial applications. The dataset and related source code are available at https://osf.io/8jec2/.
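For readers unfamiliar with macro-averaging, the sketch below computes a macro-averaged $F_1$ over per-item labels: per-class precision and recall are computed first, and the per-class $F_1$ scores are then averaged with equal weight. It treats detection as simple per-token labelling and lets a null label such as "O" be excluded; the label names are hypothetical, and this is not necessarily the exact evaluation protocol used in the pilot study.

```python
from collections import Counter
from typing import Collection, Hashable, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable],
             ignore: Collection[Hashable] = ()) -> float:
    """Unweighted mean of per-class F1 scores, skipping labels in `ignore`."""
    labels = (set(gold) | set(pred)) - set(ignore)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the gold label g
    scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Because every class contributes equally, rare event types weigh as much as frequent ones, which tends to make macro-$F_1$ lower than micro-$F_1$ on skewed data.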


2018 ◽  
Author(s):  
Roman Zubatyuk ◽  
Justin S. Smith ◽  
Jerzy Leszczynski ◽  
Olexandr Isayev

Atomic and molecular properties can be evaluated from the fundamental Schrödinger equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. On several benchmark datasets, the resulting model achieves state-of-the-art accuracy, comparable to that of DFT methods that are orders of magnitude more expensive. It can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we show a new dimension of transferability: the ability to learn new targets by utilizing multimodal information from previous training. The model can learn implicit solvation energy (like SMD) using only a fraction of the original training data, and achieves an MAD of 1.1 kcal/mol relative to experimental solvation free energies in the MNSol database.
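Multitarget training is commonly implemented by summing per-target losses with fixed weights, so one shared model is optimized for several properties at once. The sketch below shows only that generic loss; the target names ("energy", "charge") and the weights are hypothetical, and AIMNet's actual loss terms and weighting are not given in the abstract.

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error for one target property."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitarget_loss(preds: Dict[str, Sequence[float]],
                     targets: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of per-target MSEs; each target needs a prediction and a weight."""
    return sum(weights[name] * mse(preds[name], targets[name]) for name in targets)
```

Adding a new target later (e.g. a solvation term) then amounts to adding one more weighted entry, which is one way a shared representation can be reused for new properties.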


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xin Mao ◽  
Jun Kang Chow ◽  
Pin Siang Tan ◽  
Kuan-fu Liu ◽  
Jimmy Wu ◽  
...  

Automatic bird detection in ornithological analyses is limited by the accuracy of existing models, owing to the lack of training data and the difficulty of extracting the fine-grained features required to distinguish bird species. Here we apply the domain randomization strategy to enhance the accuracy of deep learning models in bird detection. Trained with virtual birds showing sufficient variation across different environments, the model tends to focus on the fine-grained features of birds and achieves higher accuracy. Based on 100 terabytes of continuous monitoring data of egrets collected over 2 months, our results recover the findings of conventional manual observation, e.g., the vertical stratification of egrets according to body size, and also open up opportunities for long-term bird surveys requiring intensive monitoring that is impractical with conventional methods, e.g., studying the influence of weather on egrets and the relationship between the migration schedules of great egrets and little egrets.
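Domain randomization works by sampling scene parameters at random for every synthetic training image, so that the detector cannot rely on any particular background or lighting. The sketch below covers only the sampling step; the parameter names, ranges, and background list are hypothetical assumptions, and rendering the virtual birds is out of scope.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical background categories; a real pipeline would use image assets.
BACKGROUNDS = ("mangrove", "mudflat", "treetop", "sky")


@dataclass
class SceneParams:
    background: str
    bird_scale: float    # bird size as a fraction of image height (assumed range)
    rotation_deg: float  # in-plane rotation of the bird
    brightness: float    # global lighting multiplier
    occlusion: float     # fraction of the bird hidden by foreground clutter


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration."""
    return SceneParams(
        background=rng.choice(BACKGROUNDS),
        bird_scale=rng.uniform(0.05, 0.5),
        rotation_deg=rng.uniform(-180.0, 180.0),
        brightness=rng.uniform(0.4, 1.6),
        occlusion=rng.uniform(0.0, 0.3),
    )


def sample_dataset(n: int, seed: int = 0) -> List[SceneParams]:
    """Reproducibly sample n scene configurations to be rendered later."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

Using a seeded `random.Random` instance keeps the synthetic dataset reproducible without touching global random state.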


2021 ◽  
Vol 807 ◽  
pp. 140873
Author(s):  
F. Khodabakhshi ◽  
A.P. Gerlich ◽  
D. Verma ◽  
M. Nosko ◽  
M. Haghshenas

2018 ◽  
Author(s):  
Nicholas J. Roberts ◽  
Bernhard T. Rabus ◽  
John J. Clague ◽  
Reginald L. Hermanns ◽  
Marco-Antonio Guzmán ◽  
...  

We characterize and compare creep preceding and following the 2011 Pampahasi landslide (∼40 Mm³ ± 50%) in the city of La Paz, Bolivia, using spaceborne radar interferometry (InSAR) that combines displacement records from both distributed and point scatterers. The failure remobilised deposits of an ancient landslide in weakly cemented, predominantly fine-grained sediments and affected ∼1.5 km² of suburban development. During the 30 months preceding failure, about half of the toe area was creeping at 3–8 cm/a and localized parts of the scarp area showed displacements of up to 14 cm/a. Changes in deformation in the 10 months following the landslide contradict the common assumption that stress release during a discrete failure increases stability. During that period, most of the landslide toe and areas near the headscarp accelerated to 4–14 and 14 cm/a, respectively. The extent of deformation increased to cover most, or probably all, of the 2011 landslide as well as adjacent parts of the slope and the plateau above. The InSAR-measured displacement patterns, supplemented by field observations and optical satellite images, indicate that kinematically complex, steady-state creep along pre-existing sliding surfaces temporarily accelerated in response to heavy rainfall, after which the slope quickly settled into a slightly faster and spatially expanded state of steady creep. This case study demonstrates that high-quality ground-surface motion fields derived from spaceborne InSAR can help characterize creep mechanisms, quantify spatial and temporal patterns of slope activity, and identify isolated small-scale instabilities. Characterizing slope instability before, during, and after the 2011 Pampahasi landslide is particularly important for understanding landslide hazard in La Paz, half of which is underlain by similar large paleolandslides.
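Creep rates such as 3–8 cm/a summarize a displacement time series as a single velocity. A minimal sketch, assuming a per-pixel series of cumulative displacement (in cm) at acquisition times (in years), fits that velocity by ordinary least squares; fitting pre- and post-failure windows separately would give rates to compare. The example series is synthetic, and the study's InSAR processing (combining distributed and point scatterers) is far more involved than this.

```python
from typing import Sequence


def creep_rate(times_yr: Sequence[float], disp_cm: Sequence[float]) -> float:
    """Least-squares slope of displacement vs. time, i.e. velocity in cm/a."""
    n = len(times_yr)
    if n < 2 or n != len(disp_cm):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times_yr) / n
    mean_d = sum(disp_cm) / n
    s_tt = sum((t - mean_t) ** 2 for t in times_yr)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times_yr, disp_cm))
    if s_tt == 0:
        raise ValueError("acquisition times must not all be identical")
    return s_td / s_tt
```

A least-squares fit is less sensitive to noise in individual acquisitions than differencing the first and last epochs.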


Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 830
Author(s):  
Seokho Kang

k-nearest neighbor (kNN) is a widely used algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter settings, including the number of nearest neighbors k, the distance function, and the weighting function. To improve robustness to these hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance as input and predicts the label of that instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search over the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated on various benchmark datasets for classification and regression tasks.
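To show the sensitivity that motivates kNNGNN, the sketch below implements a classic weighted kNN classifier with the three hyperparameters named above (k, distance function, weighting function) exposed as arguments. It is the baseline rule, not kNNGNN itself; the toy data and the specific distance and weighting choices are illustrative.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def uniform_weight(d: float) -> float:
    return 1.0


def inverse_distance_weight(d: float, eps: float = 1e-9) -> float:
    return 1.0 / (d + eps)


def knn_predict(train: List[Tuple[Point, Hashable]], query: Point, k: int = 3,
                distance: Callable[[Point, Point], float] = euclidean,
                weight: Callable[[float], float] = uniform_weight) -> Hashable:
    """Weighted majority vote among the k training points closest to `query`."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = defaultdict(float)
    for x, y in neighbours:
        votes[y] += weight(distance(x, query))
    return max(votes, key=votes.get)
```

With two points of class "a" near the query and three of class "b" far away, k=3 predicts "a", k=5 with uniform weights flips to "b", and k=5 with inverse-distance weights returns to "a"; this is exactly the hyperparameter dependence kNNGNN aims to learn away.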


2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Since a key to machine learning success is the availability of large training data, many datasets are being released and published online. From the point of view of a data consumer or manager, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, few practical ways to measure data quality are available today, especially for large-scale high-dimensional data such as images and videos. This paper proposes two data quality measures that compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are consistent with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
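The combination of random projections and bootstrapping can be illustrated with a simple two-class separability score: in each round, resample both classes with replacement, project the samples onto a random Gaussian direction, and compute a Fisher-style ratio of squared mean difference to summed variance. This measure is a hypothetical stand-in chosen for illustration; the paper's own separability and in-class variability measures are not specified in the abstract.

```python
import random
from statistics import fmean, pvariance
from typing import List, Sequence

Point = Sequence[float]


def _project(x: Point, w: Sequence[float]) -> float:
    """Scalar projection of x onto direction w (unnormalized dot product)."""
    return sum(a * b for a, b in zip(x, w))


def separability(class_a: List[Point], class_b: List[Point],
                 n_rounds: int = 200, seed: int = 0) -> float:
    """Hypothetical score: mean Fisher ratio over bootstrapped random 1-D projections."""
    rng = random.Random(seed)
    dim = len(class_a[0])
    scores = []
    for _ in range(n_rounds):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # random projection direction
        boot_a = rng.choices(class_a, k=len(class_a))   # bootstrap resample
        boot_b = rng.choices(class_b, k=len(class_b))
        pa = [_project(x, w) for x in boot_a]
        pb = [_project(x, w) for x in boot_b]
        spread = pvariance(pa) + pvariance(pb)
        scores.append((fmean(pa) - fmean(pb)) ** 2 / (spread + 1e-12))
    return fmean(scores)
```

Each round touches the data only through one-dimensional projections, which is what keeps this kind of estimate cheap for high-dimensional inputs such as images.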


2021 ◽  
Vol 17 (3) ◽  
pp. 1-20
Author(s):  
Vanh Khuyen Nguyen ◽  
Wei Emma Zhang ◽  
Adnan Mahmood

Intrusive Load Monitoring (ILM) is a method to measure and collect the energy consumption data of individual appliances via smart plugs or smart sockets. A major challenge of ILM is automatic appliance identification, in which the system automatically determines the label of the active appliance connected to the smart device. Existing ILM techniques depend on labels input by end users and usually follow the supervised learning scheme. In practice, however, labeling by end users is laborious, which leaves insufficient training data to fit supervised learning models. In this work, we propose a semi-supervised learning (SSL) method that leverages rich signals from the unlabeled dataset and jointly optimizes a classification loss on the labeled dataset and a consistency training loss on the unlabeled dataset. The samples used for consistency training are generated by a transformation built upon weighted versions of the DTW Barycenter Averaging algorithm. The work is inspired by two recent advances in SSL for computer vision and combines the advantages of both. We evaluate our method on a dataset collected from our own Internet-of-Things-based energy monitoring system in a smart home environment, and also examine its performance on 10 benchmark datasets. The proposed method outperforms other methods on our smart appliance datasets and on most of the benchmark datasets, while showing competitive results on the remaining datasets.
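DTW Barycenter Averaging computes an average time series under dynamic time warping (DTW) alignment, so DTW is its basic building block. The sketch below is a standard dynamic-programming DTW with squared-difference local cost on 1-D series such as power readings; the weighted DBA variant and the authors' transformation built on it are not reproduced here.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cumulative squared-difference cost of the optimal DTW alignment of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1] (stretch b)
                cost[i][j - 1],      # repeat a[i-1] (stretch a)
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]
```

Two load signatures that differ only by a time shift, such as `[0, 0, 1, 0]` and `[0, 1, 0, 0]`, have DTW cost 0 even though their pointwise squared difference is 2, which is why DTW-based averaging suits appliance signals with variable timing.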


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Yikui Zhai ◽  
He Cao ◽  
Wenbo Deng ◽  
Junying Gan ◽  
Vincenzo Piuri ◽  
...  

Because of the lack of discriminative face representations and the scarcity of labeled training data, facial beauty prediction (FBP), which aims to assess facial attractiveness automatically, has become a challenging pattern recognition problem. Inspired by recent promising work on fine-grained image classification that uses a multiscale architecture to increase the diversity of deep features, this paper proposes BeautyNet for unconstrained facial beauty prediction. First, a multiscale network is adopted to improve the discriminative power of face features. Second, to alleviate the computational burden of the multiscale architecture, MFM (max-feature-map) is used as the activation function, which not only makes the network lighter and speeds up its convergence but also improves performance. Finally, a transfer learning strategy is introduced to mitigate the overfitting caused by the scarcity of labeled facial beauty samples, further improving BeautyNet's performance. Extensive experiments on LSFBD demonstrate that the proposed scheme outperforms state-of-the-art methods, achieving 67.48% classification accuracy.
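The max-feature-map (MFM) activation, as introduced for lightweight CNNs, splits the input channels into two halves and keeps the element-wise maximum of each corresponding pair, so the output has half as many channels. That halving is where the lighter network comes from. The sketch below represents a feature map as a list of channels, each a flattened list of activations; where BeautyNet places MFM layers is not stated in the abstract.

```python
from typing import List


def max_feature_map(channels: List[List[float]]) -> List[List[float]]:
    """MFM: out[k] = elementwise max(channels[k], channels[k + C/2]) for k < C/2."""
    c = len(channels)
    if c % 2:
        raise ValueError("MFM needs an even number of channels")
    half = c // 2
    return [
        [max(x, y) for x, y in zip(channels[k], channels[k + half])]
        for k in range(half)
    ]
```

Unlike ReLU, MFM involves no fixed threshold: each output unit acts as a competition between two learned feature maps.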


Author(s):  
Peilian Zhao ◽  
Cunli Mao ◽  
Zhengtao Yu

Aspect-Based Sentiment Analysis (ABSA), a fine-grained opinion mining task that aims to extract the sentiment toward a specific target from text, is important in many real-world applications, especially in the legal field. In this paper, we therefore study two problems of End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field: the limited amount of labeled training data and the neglect of in-domain knowledge representation. We propose a new deep learning method, named Semi-ETEKGs, which applies an E2E framework with knowledge graph (KG) embeddings in the legal field after data augmentation (DA). Specifically, we pre-train BERT embeddings and in-domain KG embeddings on unlabeled data and on labeled data with case elements after DA, and then feed both embeddings into the E2E framework to classify the polarity of each target entity. Finally, we build a case-related dataset based on a popular ABSA benchmark to demonstrate the effectiveness of Semi-ETEKGs; experiments on this case-related dataset of microblog comments show that our proposed model significantly outperforms the compared methods.
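E2E-ABSA is often framed as sequence labelling with "unified" tags that encode both the target boundary and its polarity (e.g. `B-NEG`, `I-NEG`, `O`). Assuming such a BIO-style scheme, which the abstract does not confirm for Semi-ETEKGs, the sketch below decodes a tag sequence into (target, polarity) pairs. The example tokens and tags are hypothetical.

```python
from typing import List, Tuple


def decode_unified_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn BIO tags carrying a polarity suffix into (target span, polarity) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    polarity = ""
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != polarity)
        if starts_new:
            if current:
                spans.append((" ".join(current), polarity))
            current, polarity = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current target span
        else:  # "O": outside any target
            if current:
                spans.append((" ".join(current), polarity))
            current, polarity = [], ""
    if current:
        spans.append((" ".join(current), polarity))
    return spans
```

Because boundary and polarity share one tag, a single classifier predicts both jointly, which is what distinguishes the end-to-end setting from a pipeline that first extracts targets and then classifies their sentiment.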

