Learning Neural Audio Embeddings for Grounding Semantics in Auditory Perception

2017 ◽  
Vol 60 ◽  
pp. 1003-1030 ◽  
Author(s):  
Douwe Kiela ◽  
Stephen Clark

Multi-modal semantics, which aims to ground semantic representations in perception, has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics. After demonstrating the quality of such auditorily grounded representations, we show how they can be applied to tasks where auditory perception is relevant, including two unsupervised categorization experiments, and provide further analysis. We find that features transferred from deep neural networks outperform bag-of-audio-words approaches. To our knowledge, this is the first work to construct multi-modal models from a combination of textual information and auditory information extracted from deep neural networks, and the first work to evaluate the performance of tri-modal (textual, visual and auditory) semantic models.
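
To make the feature-transfer idea concrete, here is a minimal sketch of extracting an auditory embedding from a pretrained CNN and fusing it with a textual vector. The paper transfers features from networks trained on audio; the ImageNet ResNet-18, the mel-spectrogram settings, and the simple concatenation fusion below are stand-in assumptions, not the authors' exact pipeline.

```python
import numpy as np
import librosa
import torch
import torchvision.models as models

def audio_to_melspec(path, n_mels=128):
    """Load an audio file and convert it to a log-mel spectrogram."""
    y, sr = librosa.load(path)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

def dnn_audio_embedding(path):
    """Push the spectrogram through a pretrained CNN and take the
    pooled features as a fixed-size auditory embedding."""
    spec = torch.tensor(audio_to_melspec(path), dtype=torch.float32)
    # Replicate the single channel to match the 3-channel ImageNet input.
    x = spec.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)  # (1, 3, n_mels, T)
    cnn = models.resnet18(weights="IMAGENET1K_V1")
    cnn.fc = torch.nn.Identity()  # drop the classifier head
    cnn.eval()
    with torch.no_grad():
        return cnn(x).squeeze(0).numpy()  # (512,)

def multimodal_embedding(text_vec, audio_vec):
    """Fuse L2-normalised textual and auditory vectors by concatenation."""
    t = text_vec / np.linalg.norm(text_vec)
    a = audio_vec / np.linalg.norm(audio_vec)
    return np.concatenate([t, a])
```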

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 151
Author(s):  
Xintao Duan ◽  
Lei Li ◽  
Yao Su ◽  
Wenxin Wang ◽  
En Zhang ◽  
...  

Data hiding is the technique of embedding data into video or audio media. With the development of deep neural networks (DNNs), the quality of images generated by novel DNN-based data hiding methods keeps improving. However, there is still room to improve the similarity between the original images and the images generated by models trained under existing hiding frameworks, and it is hard for the receiver to verify whether the container image comes from the real sender. We propose a framework, named difference image grafting deep hiding (DIGDH), that introduces a key_img exploiting the over-fitting characteristic of DNNs and combines it symmetrically with difference image grafting. The key_img makes it easy to verify whether the container image comes from the real sender. The experimental results show that, without changing the structures of the networks, models trained under the proposed framework generate images with higher similarity to the original cover and secret images. According to the analysis results of the steganalysis tool StegExpose, the container images generated by the hiding model trained under the proposed framework are closer to a random distribution.
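
The abstract does not spell out the grafting operation, so the sketch below only illustrates the difference-image arithmetic that the framework's name suggests; `key_mask` is a hypothetical stand-in for the role the key_img plays, not the paper's actual construction.

```python
import numpy as np

def difference_image(cover, container):
    """Signed pixel-wise difference between the cover image and the
    DNN-generated container image."""
    return container.astype(np.int16) - cover.astype(np.int16)

def graft(container, diff, key_mask):
    """Add the difference back onto the container only where the
    binary key_mask selects, then return a valid 8-bit image."""
    grafted = container.astype(np.int16) + key_mask.astype(np.int16) * diff
    return np.clip(grafted, 0, 255).astype(np.uint8)
```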


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Fuyong Xing ◽  
Yuanpu Xie ◽  
Xiaoshuang Shi ◽  
Pingjun Chen ◽  
Zizhao Zhang ◽  
...  

Background: Nucleus or cell detection is a fundamental task in microscopy image analysis and supports many other quantitative studies such as object counting, segmentation, and tracking. Deep neural networks are emerging as a powerful tool for biomedical image computing; in particular, convolutional neural networks have been widely applied to nucleus/cell detection in microscopy images. However, almost all models are tailored to specific datasets, and their applicability to other microscopy image data remains unknown. Some existing studies casually learn and evaluate deep neural networks on multiple microscopy datasets, but several critical, open questions remain to be addressed.

Results: We analyze the applicability of deep models specifically for nucleus detection across a wide variety of microscopy image data. More specifically, we present a fully convolutional network-based regression model and extensively evaluate it on large-scale digital pathology and microscopy image datasets, which cover 23 organs (or cancer diseases) and come from multiple institutions. We demonstrate that for a specific target dataset, training with images from the same types of organs is usually necessary for nucleus detection. Although images can be visually similar due to the same staining technique and imaging protocol, deep models learned with images from different organs may not deliver desirable results and would require model fine-tuning to be on a par with those trained with target data. We also observe that training with a mixture of target and other/non-target data does not always yield higher nucleus detection accuracy; proper data manipulation during model training may be required to achieve good performance.

Conclusions: We conduct a systematic case study on deep models for nucleus detection in a wide variety of microscopy images, aiming to address several important but previously understudied questions. We present and extensively evaluate an end-to-end, pixel-to-pixel fully convolutional regression network and report several significant findings, some of which have not been reported in previous studies. The model performance analysis and observations should be helpful for nucleus detection in microscopy images.
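
As a rough illustration of a pixel-to-pixel fully convolutional regression network for nucleus detection, the PyTorch sketch below maps an image to a proximity map whose local maxima mark nuclei. The layer sizes, the MSE objective, and the Gaussian-smoothed dot targets are common choices assumed here, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

class FCNRegressor(nn.Module):
    """Minimal encoder-decoder FCN mapping an image to a same-size
    proximity map (assumes H and W are divisible by 4)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),  # one regression output channel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = FCNRegressor()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, proximity_maps):
    """Regress against Gaussian-smoothed dot annotations of nuclei."""
    opt.zero_grad()
    loss = loss_fn(model(images), proximity_maps)
    loss.backward()
    opt.step()
    return loss.item()
```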


Author(s):  
Yun-Peng Liu ◽  
Ning Xu ◽  
Yu Zhang ◽  
Xin Geng

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations, labels are easily corrupted, resulting in noisy labels. Designing algorithms that deal with noisy labels is therefore of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distributions. The boundary between clean labels and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm against state-of-the-art methods.
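
A simplified reading of the confidence estimation step: score each observed label by the probability mass the model's predicted label distribution assigns to it, then threshold to separate presumed-clean from presumed-noisy samples. The softmax-based distribution and the fixed threshold are assumptions; LDCE's actual estimator may differ.

```python
import torch
import torch.nn.functional as F

def label_confidence(logits, observed_labels):
    """Confidence of each observed label: the probability the model's
    predicted label distribution assigns to it."""
    dist = F.softmax(logits, dim=1)  # (N, C) label distribution
    return dist.gather(1, observed_labels.view(-1, 1)).squeeze(1)

def split_clean_noisy(logits, observed_labels, threshold=0.5):
    """Divide samples into presumed-clean and presumed-noisy sets
    by thresholding the confidence scores."""
    conf = label_confidence(logits, observed_labels)
    clean = conf >= threshold
    return clean, ~clean
```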


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Shazia Akbar ◽  
Mohammad Peikari ◽  
Sherine Salama ◽  
Azadeh Yazdan Panah ◽  
Sharon Nofech-Mozes ◽  
...  

The residual cancer burden index is an important quantitative measure used for assessing treatment response following neoadjuvant therapy for breast cancer. It has been shown to be predictive of overall survival and is composed of two key metrics: a qualitative assessment of lymph nodes and the percentage of invasive or in situ tumour cellularity (TC) in the tumour bed (TB). Currently, TC is assessed by eyeballing routine histopathology slides to estimate the proportion of tumour cells within the TB. With advances in the production of digitized slides and the increasing availability of slide scanners in pathology laboratories, there is potential to measure TC using automated algorithms with greater precision and accuracy. We describe two methods for automated TC scoring: 1) a traditional approach to image analysis development, whereby we mimic the pathologists' workflow, and 2) a recent development in artificial intelligence, in which features are learned automatically by deep neural networks using image data alone. We show strong agreement between automated and manual analysis of digital slides. Agreement between our trained deep neural networks and experts in this study (0.82) approaches the inter-rater agreement between pathologists (0.89). We also reveal properties that are captured when we apply deep neural networks to whole-slide images, and discuss the potential of using such visualisations to improve upon TC assessment in the future.
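
A small sketch of comparing automated and manual TC scores. Pearson correlation is used here purely as a stand-in agreement statistic, and the per-slide values are made up for illustration; the paper's reported agreements (0.82, 0.89) come from its own statistic and data.

```python
import numpy as np
from scipy import stats

def agreement(auto_tc, manual_tc):
    """Stand-in agreement statistic between automated and manual
    tumour-cellularity scores (the paper's exact statistic may differ)."""
    r, _ = stats.pearsonr(auto_tc, manual_tc)
    return r

# Hypothetical per-slide TC fractions from the network and a pathologist.
auto_tc = np.array([0.10, 0.45, 0.70, 0.30])
manual_tc = np.array([0.15, 0.40, 0.75, 0.25])
print(agreement(auto_tc, manual_tc))
```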


2018 ◽  
Vol 15 (9) ◽  
pp. 1451-1455 ◽  
Author(s):  
Grant J. Scott ◽  
Kyle C. Hagan ◽  
Richard A. Marcum ◽  
James Alex Hurt ◽  
Derek T. Anderson ◽  
...  

2021 ◽  
Vol 37 (2) ◽  
pp. 123-143
Author(s):  
Tuan Minh Luu ◽  
Huong Thanh Le ◽  
Tan Minh Hoang

Deep neural networks have been applied successfully to extractive text summarization tasks when large training datasets are available. However, when the training dataset is not large enough, these models reveal certain limitations that affect the quality of the system's summaries. In this paper, we propose an extractive summarization system based on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained multilingual BERT model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences are eliminated from the output summary by the Maximal Marginal Relevance method. Our system is evaluated on both English and Vietnamese, using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results than existing works using the same datasets, confirming that our approach can be effectively applied to summarize both English and Vietnamese texts.
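
Of the pipeline stages, the Maximal Marginal Relevance step is the most self-contained; a sketch follows. It assumes sentence vectors (e.g., the BERT + TF-IDF representations) and per-sentence relevance scores from the CNN/Fully Connected selector are already computed; the trade-off weight `lam` and summary length `k` are illustrative defaults.

```python
import numpy as np

def mmr_select(sent_vecs, scores, k=3, lam=0.7):
    """Maximal Marginal Relevance: greedily pick k sentences that are
    relevant (high model score) but not redundant with one another."""
    norms = np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    sims = (sent_vecs @ sent_vecs.T) / (norms @ norms.T)  # cosine similarity
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sims[i, j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices of chosen sentences, in selection order
```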


2021 ◽  
Author(s):  
Jason Munger ◽  
Carlos W. Morato

This project explores how raw image data obtained from AV cameras can provide a model with more spatial information than can be learned from simple RGB images alone. This paper leverages advances in deep neural networks to demonstrate steering-angle prediction for autonomous vehicles through an end-to-end multi-channel CNN model using only the image data provided by an onboard camera. The image data is processed through existing neural networks to produce pixel segmentation and depth estimates, which are fed, together with the raw input image, into a new neural network to provide enhanced feature signals from the environment. Various input combinations of multi-channel CNNs are evaluated, and their effectiveness is compared to single CNN networks using the individual data inputs. The model with the most accurate steering predictions is identified, and its performance is compared to previous neural networks.
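
A minimal sketch of assembling one multi-channel input: the raw RGB frame is stacked with the depth estimate and segmentation map produced by the existing networks. The five-channel layout and the crude normalisations are assumptions for illustration; the paper evaluates several input combinations.

```python
import torch

def build_multichannel_input(rgb, depth, seg):
    """Stack raw RGB, a depth estimate, and a segmentation map into
    one multi-channel tensor for the steering-prediction CNN.
    rgb: (3, H, W) floats; depth: (H, W) floats; seg: (H, W) class ids."""
    depth = depth.unsqueeze(0)                  # (1, H, W)
    seg = seg.unsqueeze(0).float() / seg.max()  # crude [0, 1] normalisation
    return torch.cat([rgb, depth, seg], dim=0)  # (5, H, W)
```

The stacked tensor would then feed a CNN whose single output neuron regresses the steering angle end to end.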


2021 ◽  
Author(s):  
Viktória Burkus ◽  
Attila Kárpáti ◽  
László Szécsi

Surface reconstruction for particle-based fluid simulation is a computational challenge on par with the simulation itself. In real-time applications, splatting-style rendering approaches based on forward rendering of particle impostors are prevalent, but they suffer from noticeable artifacts. In this paper, we present a technique that combines forward rendering of simulated features with deep-learning image manipulation to bring the rendering quality of splatting-style approaches perceptually close to ray-tracing solutions, circumventing the cost, complexity, and limitations of exact fluid surface rendering by replacing it with the flat cost of a neural network pass. Our solution is based on the idea of training generative deep neural networks with image pairs consisting of cheap particle impostor renders and high-quality ray-traced ground-truth images.
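
The training idea can be sketched as a pix2pix-style conditional generator update on (impostor render, ray-traced) pairs. The L1-plus-adversarial objective, the conditional critic, and the weight of 100 are standard image-to-image defaults assumed here, not the paper's exact losses.

```python
import torch
import torch.nn as nn

def generator_step(gen, disc, opt_g, impostor, ray_traced, l1_weight=100.0):
    """One generator update on a (cheap impostor render, ground-truth
    ray-traced image) pair, pix2pix style."""
    opt_g.zero_grad()
    fake = gen(impostor)
    # Conditional critic sees the input render concatenated with the output.
    adv = -disc(torch.cat([impostor, fake], dim=1)).mean()  # fool the critic
    rec = nn.functional.l1_loss(fake, ray_traced)           # stay close to GT
    loss = adv + l1_weight * rec
    loss.backward()
    opt_g.step()
    return loss.item()
```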


2021 ◽  
Vol 15 ◽  
Author(s):  
Xinglong Wu ◽  
Yuhang Tao ◽  
Guangzhi He ◽  
Dun Liu ◽  
Meiling Fan ◽  
...  

Deep convolutional neural networks (DCNNs) are widely utilized for the semantic segmentation of dense nerve tissues from light and electron microscopy (EM) image data; the goal of this technique is to achieve efficient and accurate three-dimensional reconstruction of the vasculature and neural networks in the brain. The success of these tasks heavily depends on the amount, and especially the quality, of the human-annotated labels fed into DCNNs. However, it is often difficult to acquire gold-standard human-annotated labels for dense nerve tissues; human annotations inevitably contain discrepancies or even errors, which substantially impact the performance of DCNNs. Thus, we propose a novel boosting framework to systematically improve the quality of the annotated labels and, in turn, segmentation performance. The framework consists of a DCNN for multilabel semantic segmentation with a customized Dice-logarithmic loss function, a fusion module combining the annotated labels with the corresponding DCNN predictions, and a boosting algorithm that sequentially updates the sample weights during network training iterations. The micro-optical sectioning tomography (MOST) dataset was then employed to assess the effectiveness of the proposed framework. The results indicated that the framework, even when trained with a dataset including some poor-quality human-annotated labels, achieved state-of-the-art performance in the segmentation of somata and vessels in the mouse brain. Thus, the proposed artificial intelligence technique could advance neuroscience research.
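
The abstract names a customized Dice-logarithmic loss without giving its form; a plausible minimal version takes the negative log of the per-channel soft Dice coefficient and averages over label channels (e.g., soma and vessel masks). The paper's exact customization is not reproduced here.

```python
import torch

def dice_log_loss(probs, targets, eps=1e-6):
    """Negative-log soft Dice for one label channel: penalises low
    overlap increasingly sharply as Dice approaches zero."""
    inter = (probs * targets).sum()
    dice = (2 * inter + eps) / (probs.sum() + targets.sum() + eps)
    return -torch.log(dice)

def multilabel_dice_log_loss(probs, targets):
    """Average the per-channel loss over all label channels.
    probs, targets: (N, C, H, W) tensors of probabilities and masks."""
    losses = [dice_log_loss(probs[:, c], targets[:, c])
              for c in range(probs.shape[1])]
    return torch.stack(losses).mean()
```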


10.29007/p655 ◽  
2018 ◽  
Author(s):  
Sai Prabhakar Pandi Selvaraj ◽  
Manuela Veloso ◽  
Stephanie Rosenthal

Significant advances in the performance of deep neural networks, such as Convolutional Neural Networks (CNNs) for image classification, have created a drive for understanding how they work. Different techniques have been proposed to determine which features (e.g., image pixels) are most important to a CNN's classification. However, the important features output by these techniques have typically been judged subjectively by a human, assessing whether they capture the features relevant to the classification rather than whether they were actually important to the classifier itself. We address the need for an objective measure to assess the quality of different feature importance measures. In particular, we propose measuring the ratio of a CNN's accuracy on the whole image compared to an image containing only the important features. We also consider scaling this ratio by the relative size of the important region in order to measure conciseness. We demonstrate that our measures correlate well with prior subjective comparisons of important features but, importantly, do not require human studies. We also demonstrate that features which multiple techniques agree are important have a higher impact on accuracy than features that only one technique finds.
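
The proposed measures reduce to straightforward arithmetic; the sketch below computes the accuracy ratio and a conciseness-scaled variant. `model.predict`, the array layouts, and the division by mean mask area are hypothetical interface and scaling choices, since the abstract does not pin them down.

```python
import numpy as np

def importance_accuracy_ratio(model, images, labels, masks):
    """Ratio of accuracy on masked images (important features only)
    to accuracy on the whole images.
    images: (N, H, W, C); masks: (N, H, W) binary importance maps."""
    acc_full = (model.predict(images) == labels).mean()
    masked = images * masks[..., None]  # zero out unimportant pixels
    acc_masked = (model.predict(masked) == labels).mean()
    return acc_masked / acc_full

def concise_ratio(model, images, labels, masks):
    """Scale the ratio by the relative size of the important region,
    rewarding explanations that are both faithful and small."""
    region_frac = masks.mean()
    return importance_accuracy_ratio(model, images, labels, masks) / region_frac
```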

