Voice And Stream: Perceptual And Computational Modeling Of Voice Separation

2008 ◽  
Vol 26 (1) ◽  
pp. 75-94 ◽  
Author(s):  
Emilios Cambouropoulos

Listeners are thought to be capable of perceiving multiple voices in music. This paper presents different views of what 'voice' means and of how the problem of voice separation can be systematically described, with the aim of better understanding the cognitive task of segregating voices in music. Well-established perceptual principles of auditory streaming are examined and then tailored to the more specific problem of voice separation in timbrally undifferentiated music. Adopting a perceptual view of musical voice, a computational prototype is developed that splits a musical score (symbolic musical data) into different voices. A single 'voice' may consist of one or more synchronous notes that are perceived as belonging to the same auditory stream. The proposed model is tested against a small dataset that acts as ground truth. The results support the theoretical viewpoint adopted in the paper.
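As a rough illustration of the pitch-proximity streaming principle such models draw on, the sketch below greedily attaches each note to the voice whose most recent pitch is nearest. The note format, the semitone threshold, and the greedy strategy are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch of pitch-proximity voice assignment (an assumption for
# illustration, not the paper's algorithm): each note joins the voice
# whose last pitch is closest, or starts a new voice if none is near.
from dataclasses import dataclass

@dataclass
class Note:
    onset: float  # onset time in beats
    pitch: int    # MIDI pitch number

def separate_voices(notes, new_voice_threshold=12):
    """Assign notes to voices by pitch proximity, in onset order."""
    voices = []  # each voice is a list of Notes forming one stream
    for note in sorted(notes, key=lambda n: n.onset):
        best, best_dist = None, float("inf")
        for voice in voices:
            dist = abs(voice[-1].pitch - note.pitch)
            if dist < best_dist:
                best, best_dist = voice, dist
        if best is None or best_dist > new_voice_threshold:
            voices.append([note])  # pitch too distant: start a new stream
        else:
            best.append(note)
    return voices
```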

Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems that involve people of various skills and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones. Purpose: To develop a method of assessing a contributor’s expected quality in community tagging systems, using only the generally unreliable and incomplete information provided by contributors (with ground truth tags unknown). Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing a contributor’s expected quality. The method is based on comparing the tag sets provided by different contributors for the same images; it is a modification of the pairwise comparison method, with the preference relation replaced by a special domination characteristic. Expected contributor quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of contributors in a community tagging system (provided that contributor behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on the coordinated efforts of a community (primarily, community tagging systems).
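The final step named in the abstract, extracting quality scores as the positive eigenvector of the domination matrix, can be sketched as below. How the domination characteristic is computed from shared tag sets is the paper's contribution and is taken as given here; the toy matrix is made up.

```python
# Sketch of expected quality as the positive (Perron) eigenvector of a
# pairwise domination matrix, computed by power iteration. The entries
# of D are assumed as given; the paper defines how they are derived
# from the contributors' overlapping tag sets.
import numpy as np

def expected_quality(domination: np.ndarray, iters: int = 1000) -> np.ndarray:
    """Principal eigenvector of a non-negative matrix via power iteration."""
    n = domination.shape[0]
    q = np.ones(n) / n
    for _ in range(iters):
        q_next = domination @ q
        q_next /= np.linalg.norm(q_next)
        if np.allclose(q, q_next, atol=1e-12):
            break
        q = q_next
    return q / q.sum()  # normalize so the qualities sum to 1

# toy example: contributor 0 dominates the others on most shared images
D = np.array([[1.0, 0.8, 0.9],
              [0.2, 1.0, 0.6],
              [0.1, 0.4, 1.0]])
print(expected_quality(D))
```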


2021 ◽  
Vol 6 (1) ◽  
pp. e000898
Author(s):  
Andrea Peroni ◽  
Anna Paviotti ◽  
Mauro Campigotto ◽  
Luis Abegão Pinto ◽  
Carlo Alberto Cutolo ◽  
...  

Objective: To develop and test a deep learning (DL) model for semantic segmentation of anatomical layers of the anterior chamber angle (ACA) in digital gonio-photographs. Methods and analysis: We used a pilot dataset of 274 ACA sector images, annotated by expert ophthalmologists to delineate five anatomical layers: iris root, ciliary body band, scleral spur, trabecular meshwork and cornea. Narrow depth of field and peripheral vignetting prevented clinicians from annotating part of each image with sufficient confidence, introducing a degree of subjectivity and feature correlation into the ground truth. To overcome these limitations, we present a DL model designed and trained to perform two tasks simultaneously: (1) maximise segmentation accuracy within the annotated region of each frame and (2) identify a region of interest (ROI) based on local image informativeness. Moreover, our calibrated model makes its results interpretable by returning pixel-wise classification uncertainty through Monte Carlo dropout. Results: The model was trained and validated in a 5-fold cross-validation experiment on ~90% of the available data, achieving ~91% average segmentation accuracy within the annotated part of each ground truth image of the hold-out test set. An appropriate ROI was successfully identified in all test frames. The uncertainty estimation module correctly located inaccuracies and errors in the segmentation outputs. Conclusion: The proposed model improves on the only previously published work on gonio-photograph segmentation and may be a valid support for the automatic processing of these images to evaluate local tissue morphology. Uncertainty estimation is expected to facilitate acceptance of this system in clinical settings.
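A minimal sketch of the Monte Carlo dropout uncertainty estimation the abstract names, assuming a PyTorch segmentation network as a stand-in; the paper's actual architecture and calibration are not reproduced.

```python
# Sketch of Monte Carlo dropout: keep dropout sampling at inference,
# average several stochastic forward passes, and read pixel-wise
# uncertainty off the predictive entropy. The model is a stand-in.
import torch

def enable_dropout(model: torch.nn.Module) -> None:
    """Leave dropout layers sampling while the rest of the model stays in eval mode."""
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()

def mc_dropout_segment(model: torch.nn.Module, image: torch.Tensor, passes: int = 20):
    """Return the mean segmentation map and its pixel-wise predictive entropy."""
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(image), dim=1)
                             for _ in range(passes)])
    mean = probs.mean(dim=0)  # (N, C, H, W) per-pixel class probabilities
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=1)
    return mean.argmax(dim=1), entropy
```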


2021 ◽  
Vol 263 (2) ◽  
pp. 4441-4445
Author(s):  
Hyunsuk Huh ◽  
Seungchul Lee

Audio data acquired at industrial manufacturing sites often include unexpected background noise. Because background noise can degrade the performance of data-driven models, it is important to remove it. Traditionally, there are two main noise-canceling techniques. One is Active Noise Canceling (ANC), which generates an inverted-phase copy of the sound to be removed. The other is Passive Noise Canceling (PNC), which physically blocks the noise. However, these methods require large devices and are expensive. We therefore propose a deep learning-based noise-canceling method, developed using an audio imaging technique and a deep learning segmentation network. The proposed model only needs to know whether the audio contains noise or not; in other words, unlike general segmentation techniques, it does not require a pixel-wise ground truth segmentation map. We evaluate the separation using the pump sounds of the MIMII dataset, an open-source dataset.
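The audio-imaging idea can be sketched as follows: the waveform becomes a time-frequency "image", a segmentation-style soft mask suppresses the noisy bins, and the inverse transform returns audio. The mask is taken as given here; the weakly supervised training of the network that produces it is not shown, and the sample rate and STFT parameters are illustrative assumptions.

```python
# Sketch of masking a spectrogram "image" and inverting it back to
# audio. The soft mask (values in [0, 1], same shape as the spectrogram)
# would come from the segmentation network, which is not shown here.
import numpy as np
from scipy.signal import stft, istft

def denoise_with_mask(waveform: np.ndarray, mask: np.ndarray, fs: int = 16000):
    """Suppress time-frequency bins the segmentation network flags as noise."""
    f, t, spec = stft(waveform, fs=fs, nperseg=512)
    masked = spec * mask  # elementwise soft masking of the spectrogram
    _, clean = istft(masked, fs=fs, nperseg=512)
    return clean
```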


Sensors ◽  
2019 ◽  
Vol 19 (22) ◽  
pp. 5000 ◽  
Author(s):  
Zhuangzhuang Zhou ◽  
Qinghua Lu ◽  
Zhifeng Wang ◽  
Haojie Huang

The detection of defects on irregular surfaces with specular reflection characteristics is an important part of the production process of sanitary equipment. Currently, defect detection algorithms for most irregular surfaces rely on the handcrafted extraction of shallow features, and their ability to recognize these defects is limited. To improve the detection accuracy of micro-defects on irregular surfaces in an industrial environment, we propose an improved Faster R-CNN model. Considering the variety of defect shapes and sizes, we selected the K-Means algorithm to generate the aspect ratios of the anchor boxes according to the sizes of the ground truth boxes, and feature matrices from different receptive fields are fused to improve the detection performance of the model. The experimental results show that the recognition accuracy of the improved model is 94.6% on a collected ceramic dataset. Compared with SVM (Support Vector Machine) and other deep learning-based models, the proposed model has better detection performance and robustness to illumination, which proves the practicability and effectiveness of the proposed method.
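A hedged sketch of anchor shape selection via K-Means as the abstract describes, in the spirit of YOLO-style anchor clustering; the ground-truth boxes and the number of clusters are illustrative assumptions.

```python
# Sketch of K-Means over ground-truth box sizes to pick anchor aspect
# ratios. The toy boxes and k=3 are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import KMeans

def anchor_ratios_from_gt(boxes_wh: np.ndarray, k: int = 3) -> np.ndarray:
    """Cluster ground-truth (width, height) pairs and return one aspect ratio per cluster."""
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(boxes_wh).cluster_centers_
    return centers[:, 0] / centers[:, 1]  # width / height per cluster

# toy ground-truth boxes (w, h) in pixels
gt = np.array([[30, 60], [32, 58], [80, 40], [78, 44], [50, 50], [52, 48]])
print(anchor_ratios_from_gt(gt))
```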


The recognition of Indian food can be considered a fine-grained visual recognition task, because photos of the same class may exhibit considerable variability, so an effective segmentation and classification method is needed to provide refined analysis. Using a CNN alone is limited by the absence of constraints such as shape and edge information, which causes segmentation outputs to be rough at their edges. To overcome this difficulty, a post-processing step is required; in this paper, we propose an EA-based DCNN model for effective segmentation. The EA is formulated directly with the DCNN approach, which allows the training step to benefit from both approaches for spatial data relationships. The EA helps produce better-refined output after receiving features from the powerful DCNN. The EA-DCNN training model contains convolution, rectified linear unit, and pooling layers, which is relevant and practical for optimizing the segmentation of food images. To evaluate the performance of the proposed model, we compare it against ground-truth data on several validation parameters.


Author(s):  
Darrell S. Rudmann ◽  
Jason S. McCarley ◽  
Arthur F. Kramer

Attending to a single voice when multiple voices are present is a challenging but common occurrence. An experiment was conducted to determine (a) whether presenting a video display of the target speaker aided speech comprehension in an environment with competing voices, and (b) whether the “ventriloquism effect” could be used to enhance comprehension, as found by Driver (1996), using ecologically valid stimuli. Participants listened for target words from videos of an actress reading while simultaneously ignoring the voices of 2 to 4 different actresses. Target-word detection declined as participants had to ignore more distracting voices; however, this decline was reduced when a video display of the target speaker was provided. Neither a signal-detection analysis of performance data nor a gaze-contingent analysis revealed a ventriloquism effect. Providing a video display of a speaker when competing voices are present improves comprehension, but obtaining the ventriloquism effect appears elusive in naturalistic circumstances. Actual or potential applications of this research include those circumstances in which a listener must filter a relevant stream of speech from among multiple, competing voices, such as air traffic control and military environments.


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Maher Ibrahim Sameen ◽  
Biswajeet Pradhan ◽  
Omar Saud Aziz

Classification of aerial photographs relying purely on spectral content is a challenging topic in remote sensing. A convolutional neural network (CNN) was developed to classify aerial photographs into seven land cover classes: building, grassland, dense vegetation, waterbody, barren land, road, and shadow. The classifier utilized the spectral and spatial content of the data to maximize the accuracy of the classification process. The CNN was trained from scratch with manually created ground truth samples. The architecture of the network comprised a single convolution layer of 32 filters with a kernel size of 3 × 3, a pooling size of 2 × 2, batch normalization, dropout, and a dense layer with Softmax activation. The design of the architecture and its hyperparameters were selected via sensitivity analysis and validation accuracy. The results showed that the proposed model could be effective for classifying the aerial photographs. The overall accuracy and Kappa coefficient of the best model were 0.973 and 0.967, respectively. In addition, the sensitivity analysis suggested that the use of dropout and batch normalization techniques in a CNN is essential to improving the generalization performance of the model. The CNN model without these techniques achieved the worst performance, with an overall accuracy and Kappa of 0.932 and 0.922, respectively. This research shows that CNN-based models are robust for land cover classification using aerial photographs. However, the architecture and hyperparameters of these models should be carefully selected and optimized.
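Since the abstract specifies the architecture almost completely, a plausible Keras reconstruction is sketched below; the input patch size, dropout rate, and exact layer ordering are assumptions not stated in the abstract.

```python
# Sketch of the described architecture: one 3x3 convolution with 32
# filters, 2x2 pooling, batch normalization, dropout, and a Softmax
# dense layer over the seven land cover classes. Patch size and
# dropout rate are illustrative.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),            # patch size is an assumption
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),      # seven land cover classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```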


2020 ◽  
Vol 21 (S1) ◽  
Author(s):  
Dina Abdelhafiz ◽  
Jinbo Bi ◽  
Reda Ammar ◽  
Clifford Yang ◽  
Sheida Nabavi

Background: Automatic segmentation and localization of lesions in mammogram (MG) images are challenging even with advanced methods such as deep learning (DL). We developed a new model based on the architecture of the semantic segmentation U-Net model to precisely segment mass lesions in MG images. The proposed end-to-end convolutional neural network (CNN) based model extracts contextual information by combining low-level and high-level features. We trained the proposed model using large publicly available databases (CBIS-DDSM, BCDR-01, and INbreast) and a private database from the University of Connecticut Health Center (UCHC). Results: We compared the performance of the proposed model with those of state-of-the-art DL models, including the fully convolutional network (FCN), SegNet, Dilated-Net, original U-Net, and Faster R-CNN models, and the conventional region growing (RG) method. The proposed Vanilla U-Net model significantly outperforms the Faster R-CNN model in terms of runtime and the Intersection over Union (IOU) metric. Trained on digitized film-based and fully digitized MG images, the proposed Vanilla U-Net model achieves a mean test accuracy of 92.6%. The proposed model achieves a mean Dice coefficient index (DI) of 0.951 and a mean IOU of 0.909, which show how close the output segments are to the corresponding lesions in the ground truth maps. Data augmentation was very effective in our experiments, increasing the mean DI from 0.922 to 0.951 and the mean IOU from 0.856 to 0.909. Conclusions: The proposed Vanilla U-Net based model can be used for precise segmentation of masses in MG images, because the segmentation process incorporates more multi-scale spatial context and captures more local and global context to predict a precise pixel-wise segmentation map of an input full MG image. These detected maps can help radiologists differentiate benign and malignant lesions according to lesion shape. We show that using transfer learning, introducing augmentation, and modifying the architecture of the original model results in better performance in terms of mean accuracy, mean DI, and mean IOU in detecting mass lesions compared with the other DL and conventional models.
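The two overlap metrics reported above, the Dice coefficient index (DI) and Intersection over Union (IOU), can be computed on binary masks as follows; the smoothing constant is a common implementation choice, not taken from the paper.

```python
# Sketch of the Dice coefficient and IOU between a predicted mask and
# a ground truth mask. eps guards against empty masks.
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7):
    """Return (Dice, IOU) for two binary segmentation masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = (2 * inter + eps) / (pred.sum() + truth.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, truth).sum() + eps)
    return dice, iou
```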


Sensors ◽  
2019 ◽  
Vol 19 (14) ◽  
pp. 3164 ◽  
Author(s):  
Mei Gao ◽  
Baosheng Kang ◽  
Xiangchu Feng ◽  
Wei Zhang ◽  
Wenjuan Zhang

Multiplicative speckle noise removal is a challenging task in image processing. Motivated by the performance of anisotropic diffusion in additive noise removal and by the structure of the standard deviation of a compressed speckle-noisy image, we address this problem with anisotropic diffusion theory. Firstly, an anisotropic diffusion model based on image statistics, including information on the image gradient, gray levels, and the noise standard deviation of the image, is proposed. Although the proposed model can effectively remove multiplicative speckle noise, it does not consider the noise at edges during the denoising process. Hence, we decompose the divergence term so that diffusion at an edge occurs along the boundary rather than perpendicular to it, and improve the model to meet our requirements. Secondly, in view of the lack of ground truth in real-image experiments, an iteration stopping criterion based on kurtosis and correlation is proposed. The optimal values of the parameters in the model are obtained by learning. To improve the denoising effect, post-processing is performed. Finally, simulation results show that the proposed model can effectively remove speckle noise and retain minute details of real ultrasound and RGB color images.
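For orientation, a single explicit step of classic Perona-Malik anisotropic diffusion, the baseline the abstract builds on, is sketched below; the paper's own model additionally uses gray levels and the noise standard deviation, which are not reproduced here, and kappa and dt are illustrative.

```python
# Sketch of one explicit Perona-Malik diffusion step: smooth where the
# local gradient is small, preserve edges where it is large.
import numpy as np

def diffusion_step(img: np.ndarray, kappa: float = 30.0, dt: float = 0.15) -> np.ndarray:
    """Advance a grayscale image by one anisotropic diffusion step."""
    # finite differences to the four neighbours
    n = np.roll(img, -1, axis=0) - img
    s = np.roll(img, 1, axis=0) - img
    e = np.roll(img, -1, axis=1) - img
    w = np.roll(img, 1, axis=1) - img
    # edge-stopping conduction coefficient (Perona-Malik exponential form)
    g = lambda d: np.exp(-(d / kappa) ** 2)
    return img + dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)
```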


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1047
Author(s):  
Hawazin Faiz Badawi ◽  
Fedwa Laamarti ◽  
Abdulmotaleb El Saddik

Digital twin (DT) technology has recently gained attention within the research community due to its potential to help build sustainable smart cities. However, there is a gap in the literature: no unified model for city services has yet been proposed that can guarantee interoperability across cities, capture each city’s unique characteristics, and act as a base for modeling digital twins. This research aims to fill that gap. In this work, we propose the DT-DNA model, in which we design a city services digital twin with the goal of reflecting the real state of development of a city’s services towards enhancing its citizens’ quality of life (QoL). As it was designed using ISO 37120, one of the leading international standards for city services, the model guarantees interoperability and allows for easy comparison of services within and across cities. To test our model, we built DT-DNA sequences of services in both Quebec City and Boston and then used a DNA alignment tool to determine the matching percentage between them. Results show that the DT-DNA sequences of services in the two cities are 46.5% identical. Ground truth comparisons show a similar result, which provides a preliminary proof-of-concept for the applicability of the proposed model and framework. These results also imply that one city performs better than the other. We therefore propose an algorithm to compare cities based on the proposed DT-DNA and, using Boston and Quebec City as a case study, demonstrate that Boston has better services towards enhancing QoL for its citizens.
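The matching step can be illustrated with a simple string-identity computation; the sequences below are made-up stand-ins, and the paper's actual ISO 37120 encoding and DNA alignment tool are not reproduced.

```python
# Sketch of comparing two encoded city-service "DNA" strings and
# reporting an identity percentage. The sequences are hypothetical.
from difflib import SequenceMatcher

def identity_percentage(seq_a: str, seq_b: str) -> float:
    """Similarity of two sequences as a percentage of matching content."""
    return 100.0 * SequenceMatcher(None, seq_a, seq_b).ratio()

quebec = "ACGTTGCAGT"  # hypothetical encoded service profile
boston = "ACGATGCTGT"  # hypothetical encoded service profile
print(f"{identity_percentage(quebec, boston):.1f}% identical")
```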

