Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients

Signals ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 637-661
Author(s):  
Sören Schulze ◽  
Johannes Leuschner ◽  
Emily J. King

We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.
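The idea of replacing an unusable backpropagation gradient with a policy gradient can be illustrated with a small sketch. This is not the authors' network or loss: the function names, the toy loss, and all hyperparameters below are assumptions chosen for illustration. A parameter is sampled from a Gaussian, the loss is piecewise constant in the sample (so its ordinary gradient is zero almost everywhere), and the mean of the Gaussian is updated with the score-function (REINFORCE) estimator.

```python
import random


def toy_loss(x: float) -> float:
    # Hypothetical loss: piecewise constant in x, so d(loss)/dx is zero
    # almost everywhere and backpropagation gives no useful signal.
    return float((round(x) - 3) ** 2)


def train_mean(steps: int = 300, batch: int = 256, sigma: float = 1.0,
               lr: float = 0.05, seed: int = 0) -> float:
    """Minimize E[toy_loss(x)] for x ~ N(mu, sigma^2) with a policy gradient."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        samples = [rng.gauss(mu, sigma) for _ in range(batch)]
        losses = [toy_loss(x) for x in samples]
        # Subtracting the batch mean as a baseline reduces variance.
        baseline = sum(losses) / batch
        # Score function of a Gaussian w.r.t. its mean: (x - mu) / sigma^2.
        grad = sum((loss - baseline) * (x - mu) / sigma ** 2
                   for loss, x in zip(losses, samples)) / batch
        mu -= lr * grad
    return mu


if __name__ == "__main__":
    print(train_mean())
```

In this toy setting the mean drifts toward the value that minimizes the expected loss even though the loss itself has no usable derivative, which is the property the abstract relies on for the stochastically modeled parameters.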

Author(s):  
Sören Schulze ◽  
Emily J. King

We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters which characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm to various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly models inharmonicity, its presence can still impede the performance of the sparse pursuit algorithm. In general, due to its pitch-invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that the dictionary that is constructed for one recording can be applied to a different recording with similar instruments without additional training.


Solid Earth ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 1851-1864
Author(s):  
Fabian Limberger ◽  
Michael Lindenfeld ◽  
Hagen Deckert ◽  
Georg Rümpker

Abstract. In this study, we determine spectral characteristics and amplitude decays of wind turbine induced seismic signals in the far field of a wind farm (WF) close to Uettingen, Germany. Average power spectral densities (PSDs) are calculated from 10 min time segments extracted from (up to) 6 months of continuous recordings at 19 seismic stations, positioned along an 8 km profile starting from the WF. We identify seven distinct PSD peaks in the frequency range between 1 and 8 Hz that can be observed to at least 4 km distance; lower-frequency peaks are detectable up to the end of the profile. At distances between 300 m and 4 km the PSD amplitude decay can be described by a power law with exponent b. The measured b values exhibit a linear frequency dependence and range from b=0.39 at 1.14 Hz to b=3.93 at 7.6 Hz. In a second step, the seismic radiation and amplitude decays are modeled using an analytical approach that approximates the surface wave field. Since we observe temporally varying phase differences between seismograms recorded directly at the base of the individual wind turbines (WTs), source signal phase information is included in the modeling approach. We show that phase differences between source signals have significant effects on the seismic radiation pattern and amplitude decays. Therefore, we develop a phase shift elimination method to handle the challenge of choosing representative source characteristics as an input for the modeling. To optimize the fitting of modeled and observed amplitude decay curves, we perform a grid search to constrain the two model parameters, i.e., the seismic shear wave velocity and quality factor. The comparison of modeled and observed amplitude decays for the seven prominent frequencies shows very good agreement and allows the constraint of shear velocities and quality factors for a two-layer model of the subsurface. The approach is generalized to predict amplitude decays and radiation patterns for WFs of arbitrary geometry.
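As an illustration of how a power-law decay exponent can be estimated, the sketch below fits A(r) = c · r^(−b) by ordinary least squares in log-log space. The sign convention, the function name `fit_power_law`, and the synthetic data are assumptions made for this example; the abstract does not state how the authors performed their fit.

```python
import math


def fit_power_law(distances, amplitudes):
    """Fit amplitudes ~ c * distances**(-b) by least squares on log-log data.

    Returns (b, c). Assumes all inputs are strictly positive.
    """
    xs = [math.log(r) for r in distances]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # In log-log space the slope equals -b and the intercept equals log(c).
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free amplitudes at distances (in metres) within the
    # 300 m to 4 km range mentioned in the abstract; the values are made up.
    distances = [300.0, 500.0, 1000.0, 2000.0, 4000.0]
    amplitudes = [2.0 * r ** -1.5 for r in distances]
    print(fit_power_law(distances, amplitudes))
```

With real PSD amplitudes the points scatter around the fitted line, so one such fit per spectral peak would yield a frequency-dependent set of b values of the kind reported above.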


2021 ◽  
Author(s):  
Fabian Limberger ◽  
Michael Lindenfeld ◽  
Hagen Deckert ◽  
Georg Rümpker

Abstract. In this study, we determine spectral characteristics and amplitude decays of wind turbine induced seismic signals in the far field of a wind farm (WF) close to Uettingen, Germany. Average power spectral densities (PSDs) are calculated from 10 min time segments extracted from (up to) 6 months of continuous recordings at 19 seismic stations, positioned along an 8 km profile starting from the WF. We identify 7 distinct PSD peaks in the frequency range between 1 Hz and 8 Hz that can be observed to at least 4 km distance; lower-frequency peaks are detectable up to the end of the profile. At distances between 300 m and 4 km the PSD amplitude decay can be described by a power law with exponent b. The measured b-values exhibit a linear frequency dependence and range from b = 0.39 at 1.14 Hz to b = 3.93 at 7.6 Hz. In a second step, the seismic radiation and amplitude decays are modeled using an analytical approach which approximates the surface-wave field. Since we observe temporally varying phase differences between seismograms recorded directly at the base of the individual wind turbines (WTs), source-signal phase information is included in the modeling approach. We show that phase differences between source signals have significant effects on the seismic radiation pattern and amplitude decays. Therefore, we develop a phase shift elimination method to handle the challenge of choosing representative source characteristics as an input for the modeling. To optimize the fitting of modeled and observed amplitude decay curves, we perform a grid search to constrain the two model parameters, i.e., the seismic shear wave velocity and quality factor. The comparison of modeled and observed amplitude decays for the 7 prominent frequencies shows very good agreement and allows us to constrain shear velocities and quality factors for a two-layer model of the subsurface. The approach is generalized to predict amplitude decays and radiation patterns for WFs of arbitrary geometry.


Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 1053
Author(s):  
Heekyung Yang ◽  
Kyungha Min

We present a saliency-based patch sampling strategy for recognizing artistic media from artwork images using a deep media recognition model, which is composed of several deep convolutional neural network-based recognition modules. The decisions from the individual modules are merged into the final decision of the model. To sample a suitable patch for the input of the module, we devise a strategy that samples patches with high probabilities of containing distinctive media stroke patterns for artistic media without distortion, as media stroke patterns are key for media recognition. We design this strategy by collecting human-selected ground truth patches and analyzing the distribution of the saliency values of the patches. From this analysis, we build a strategy that samples patches that have a high probability of containing media stroke patterns. We show that our strategy achieves the best performance among the existing patch sampling strategies and that its recognition and confusion patterns are consistent with those of the existing strategies.


2018 ◽  
Vol 9 (1) ◽  
pp. 76 ◽  
Author(s):  
Yuancheng Li ◽  
Yimeng Wang

Neural networks are very vulnerable to adversarial examples, which threaten their application in security systems such as face recognition and autopilot. In response to this problem, we propose a new defensive strategy. In our strategy, we propose a new deep denoising neural network, called UDDN, to remove the noise on adversarial samples. The standard denoiser suffers from the amplification effect, in which the small residual adversarial noise gradually increases and leads to misclassification. The proposed denoiser overcomes this problem by using a special loss function, which is defined as the difference between the model outputs activated by the original image and the denoised image. At the same time, we propose a new model training algorithm based on knowledge transfer, which can resist slight image disturbance and make the model generalize better around the training samples. Our proposed defensive strategy is robust against both white-box and black-box attacks. Moreover, the strategy is applicable to any deep neural network-based model. In the experiment, we apply the defensive strategy to a face recognition model. The experimental results show that our algorithm can effectively resist adversarial attacks and improve the accuracy of the model.


AI ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 444-463
Author(s):  
Daniel Weber ◽  
Clemens Gühmann ◽  
Thomas Seel

Inertial-sensor-based attitude estimation is a crucial technology in various applications, from human motion tracking to autonomous aerial and ground vehicles. Application scenarios differ in characteristics of the performed motion, presence of disturbances, and environmental conditions. Since state-of-the-art attitude estimators do not generalize well over these characteristics, their parameters must be tuned for the individual motion characteristics and circumstances. We propose RIANN, a ready-to-use, neural network-based, parameter-free, real-time-capable inertial attitude estimator, which generalizes well across different motion dynamics, environments, and sampling rates, without the need for application-specific adaptations. We gather six publicly available datasets of which we exploit two datasets for the method development and the training, and we use four datasets for evaluation of the trained estimator in three different test scenarios with varying practical relevance. Results show that RIANN outperforms state-of-the-art attitude estimation filters in the sense that it generalizes much better across a variety of motions and conditions in different applications, with different sensor hardware and different sampling frequencies. This is true even if the filters are tuned on each individual test dataset, whereas RIANN was trained on completely separate data and has never seen any of these test datasets. RIANN can be applied directly without adaptations or training and is therefore expected to enable plug-and-play solutions in numerous applications, especially when accuracy is crucial but no ground-truth data is available for tuning or when motion and disturbance characteristics are uncertain. We made RIANN publicly available.


2019 ◽  
Author(s):  
Kaname Kojima ◽  
Shu Tadaka ◽  
Fumiki Katsuoka ◽  
Gen Tamiya ◽  
Masayuki Yamamoto ◽  
...  

Genotype imputation estimates genotypes of unobserved variants from genotype data of other observed variants, and such estimation is enabled using haplotype data of a large number of other individuals. Although existing imputation methods explicitly use haplotype data, the accessibility of haplotype data is often limited because agreement is required from the donors of genome data. We propose a new imputation method that uses a bidirectional recurrent neural network. Haplotype data of a large number of individuals are encoded as its model parameters through the training step, and these parameters can be shared publicly because genotype data are difficult to restore at the individual level. In the performance evaluation using the phased genotype data in the 1000 Genomes Project, the imputation accuracy of the proposed method in R² is comparable to that of existing methods for variants with MAF ≥ 0.05 and is slightly worse than that of the existing methods for variants with MAF < 0.05. In a scenario of limited availability of haplotype data to the existing methods, the accuracy of the proposed method is higher than that of the existing methods at least for variants with MAF ≥ 0.005. Python code of our implementation for imputation is available at https://github.com/kanamekojima/rnnimp/.


2019 ◽  
Vol 118 (7) ◽  
pp. 101-110
Author(s):  
Ms.U.Sakthi Veeralakshmi ◽  
Dr.G. Venkatesan

This research aims to measure service quality in the public and private banking sectors and to identify its relationship to customer satisfaction and behavioral intention. The study was conducted among 500 bank customers using a revised SERVQUAL instrument with 26 items. The behavioral intention of the customers was measured using the behavioral intention battery. The researchers used a seven-point Likert scale to measure the expected and perceived service quality (performance) and the behavioral intention of the customers. The instrument was selected as the most reliable device for measuring the difference-score conceptualization. It is used to evaluate the service gap between the expectation and perception of service quality. Modifications were made to the SERVQUAL instrument to make it specific to the banking sector. Questions were added to the instrument, such as seating space for waiting (Tangibility), parking space at the bank (Tangibility), variety of products and schemes available (Tangibility), and the bank's sincere steps in handling customer grievances (Responsiveness). The findings of the study revealed that the customers' perception (performance) is lower than their expectation of the service quality rendered by banks. The Responsiveness and Assurance SQ dimensions were the most important dimensions in service quality and scored a smaller SQ gap. The study concluded that the individual service quality dimensions have a positive impact on overall satisfaction.
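The difference-score idea behind SERVQUAL can be sketched as follows: for each dimension, the gap is the mean perceived score minus the mean expected score, so a negative gap means perception falls short of expectation. The function name and the ratings are invented for illustration and are not data from the study.

```python
from statistics import mean


def gap_scores(expected, perceived):
    """Return the mean (perception - expectation) gap for each dimension.

    Both arguments map a dimension name to a list of seven-point Likert
    ratings; the lists are hypothetical example data.
    """
    return {dim: mean(perceived[dim]) - mean(expected[dim]) for dim in expected}


if __name__ == "__main__":
    expected = {"Tangibility": [6, 7, 6], "Responsiveness": [7, 6, 7]}
    perceived = {"Tangibility": [5, 5, 4], "Responsiveness": [6, 6, 6]}
    gaps = gap_scores(expected, perceived)
    for dimension, gap in gaps.items():
        print(f"{dimension}: {gap:+.2f}")
```

Ranking the dimensions by gap size then shows where perceived service is closest to, or furthest from, what customers expected.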


Author(s):  
Irina Mordous

The development of modern civilization attests to its decisive role in the progressive development of institutions. They determined the difference between Western civilization and the rest of the world. The early industrialization of the West confirmed its institutional advantages. The genesis and formation of institutionalism in its ideological and conceptual-methodological orientation occurs as a process alternative to neoclassicism in the context of world heterodoxy, which quickly spread in social science. Highlighting institutional education as a separate area of sociocultural activity is determined by the factor of differentiation of institutional theory as a whole. A feature of institutional education is its orientation toward the individual and his/her transformation into a personality. The content of institutional education is revealed through the analysis of the institution, which includes a set of established customs, traditions, ways of thinking, and behavioral stereotypes of individuals and social groups. The dynamics of socio-political and economic transformations in Ukraine require a review of the foundations of national education and determination of the prospects for its development in the 21st century in the context of institutionalism.


Author(s):  
O. M. Reva ◽  
V. V. Kamyshin ◽  
S. P. Borsuk ◽  
V. A. Shulhin ◽  
A. V. Nevynitsyn

The negative and persistent impact of the human factor on the statistics of aviation accidents and serious incidents makes proactive studies of the attitude of “front line” aviation operators (air traffic controllers, flight crewmembers) to dangerous actions or professional conditions a key component of the current paradigm of the ICAO safety concept. This “attitude” is determined through the indicators of the influence of the human factor on decision-making, which also include the systems of preferences of air traffic controllers on the indicators and characteristics of professional activity, illustrating both the individual perception of potential risks and dangers, and the peculiarities of generalized group thinking that have developed in a particular society. Preference systems are ordered (ranked) series of n = 21 errors, from the most dangerous to the least dangerous, and characterize only the danger preference of one error over another. The degree of this preference is determined only by the difference in the ranks of the errors and does not answer the question of how many times more dangerous one error is than another. The differential method for identifying the comparative danger of errors, as well as the multistep technology for identifying and filtering out marginal opinions, were applied. From the initial sample of m = 37 professional air traffic controllers, two subgroups of mB = 20 and mG = 7 people were identified whose within-group consistency of opinions is statistically significant at a high level of significance, α = 1%. Nonparametric optimization of the corresponding group preference systems resulted in Kemeny’s medians, in which tied (middle) ranks were absent. Based on these medians, weighted coefficients of error hazards were determined by the mathematical prioritization method. It is substantiated that, with the accepted accuracy of calculations, the results obtained at the second iteration of this method are more acceptable. The values of the error hazard coefficients, together with their ranks established in the preference systems, allow a more complete quantitative and qualitative analysis of the attitude of both individual air traffic controllers and their professional groups to hazardous actions or conditions.
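To make the notion of a Kemeny median concrete, the sketch below finds, by brute force, the ranking that minimizes the total number of pairwise disagreements with a set of individual strict rankings. It is a toy illustration under stated assumptions: the function names and the three-item rankings are invented, disagreement is counted with the Kendall tau distance, and exhaustive search is only feasible for a handful of items, not for the n = 21 errors of the study, whose optimization procedure is not described in the abstract.

```python
from itertools import combinations, permutations


def kendall_tau_distance(rank_a, rank_b):
    """Count the item pairs that the two strict rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_median(rankings):
    """Return the ranking with the smallest summed distance to all rankings.

    Exhaustive search over all permutations; practical only for few items.
    """
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda candidate: sum(
            kendall_tau_distance(candidate, r) for r in rankings
        ),
    )


if __name__ == "__main__":
    # Hypothetical rankings of three errors, most dangerous first.
    opinions = [("A", "B", "C"), ("A", "B", "C"), ("B", "A", "C")]
    print(kemeny_median(opinions))
```

The resulting consensus ranking contains no ties by construction, which matches the remark above that tied ranks were absent from the medians obtained.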

