Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

Signals, 2021, Vol 2 (3), pp. 508-526
Author(s): Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii

This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal, in contrast to most conventional ADT methods, which estimate frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder, which extracts latent features from a music signal, and a tatum-level decoder, which estimates a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult for a recurrent neural network (RNN) to learn, we introduce into the decoder a self-attention mechanism with tatum-synchronous positional encoding. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data, and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism, pretrained on an extensive collection of drum scores. The experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure, even when only a limited amount of paired data was available, a condition under which the non-regularized model underperformed the RNN-based model.
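To make the frame-to-tatum step more concrete, here is a minimal Python sketch, assuming the frame-level encoder output is a list of feature vectors and the tatum boundaries are already available as frame indices (how they are obtained is not stated here). Max pooling over each inter-tatum segment and the standard sinusoidal encoding indexed by tatum number rather than frame number are illustrative choices; `pool_to_tatums` and `tatum_positional_encoding` are hypothetical names, and the paper's actual pooling operator and encoding may differ.

```python
import math
from typing import List, Sequence


def pool_to_tatums(frames: Sequence[Sequence[float]],
                   tatum_frames: Sequence[int]) -> List[List[float]]:
    """Max-pool frame-level features over each interval between consecutive tatums.

    frames: T x D frame-level latent features (e.g., encoder outputs).
    tatum_frames: increasing frame indices of tatum boundaries; N + 1 boundaries
    yield N tatum-level vectors. (Max pooling is an assumed choice.)
    """
    pooled = []
    for start, end in zip(tatum_frames[:-1], tatum_frames[1:]):
        segment = frames[start:end]
        if not segment:
            raise ValueError(f"empty tatum segment [{start}, {end})")
        # Element-wise maximum over all frames inside this tatum interval.
        pooled.append([max(col) for col in zip(*segment)])
    return pooled


def tatum_positional_encoding(num_tatums: int, dim: int) -> List[List[float]]:
    """Sinusoidal encoding whose position index is the tatum number, not the frame."""
    pe = []
    for n in range(num_tatums):
        row = []
        for i in range(dim):
            # Each pair of dimensions shares a frequency; even -> sin, odd -> cos.
            freq = 1.0 / (10000 ** ((i - i % 2) / dim))
            row.append(math.sin(n * freq) if i % 2 == 0 else math.cos(n * freq))
        pe.append(row)
    return pe


# Toy example: 6 frames of 2-D features, tatum boundaries at frames 0, 2, 4, 6.
frames = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.3], [0.8, 0.1], [0.2, 0.5], [0.6, 0.6]]
tatums = pool_to_tatums(frames, [0, 2, 4, 6])
pe = tatum_positional_encoding(len(tatums), 2)
# Decoder input: pooled tatum features plus their tatum-synchronous encoding.
decoder_input = [[x + p for x, p in zip(t, q)] for t, q in zip(tatums, pe)]
print(tatums)  # [[0.4, 0.9], [0.8, 0.3], [0.6, 0.6]]
```

Because the encoding is indexed by tatum, two positions that are, say, one bar apart receive the same relative offset regardless of tempo, which is the kind of regularity a self-attention decoder can exploit.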

Sensors, 2020, Vol 20 (22), pp. 6493
Author(s): Song-Kyu Park, Joon-Hyuk Chang

In this paper, we propose a multi-channel cross-tower with attention mechanisms in latent domain network (Multi-TALK) that suppresses both acoustic echo and background noise. The proposed approach consists of a cross-tower network, a parallel encoder with an auxiliary encoder, and a decoder. For multi-channel processing, the parallel encoder extracts latent features from each microphone, and these latent features, which include spatial information, are compressed by a 1D convolution operation. In addition, the auxiliary encoder extracts latent features of the far-end signal, which are provided to the cross-tower network through an attention mechanism. The cross-tower network iteratively estimates the latent features of the acoustic echo and the background noise in each tower. To improve performance at each iteration, the output of each tower is passed as input to the neighboring tower in the next iteration. Before the decoder estimates the near-end speech, further attention mechanisms remove the estimated acoustic echo and background noise from the compressed mixture, which helps prevent speech distortion caused by over-suppression. Compared with conventional algorithms, the proposed algorithm effectively suppresses acoustic echo and background noise and significantly reduces speech distortion.
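The abstract does not spell out the attention formulation used to feed far-end features into the cross-tower network. As a minimal sketch, assuming standard scaled dot-product attention in which the compressed microphone features act as queries and the auxiliary encoder's far-end features act as keys and values, the core computation looks like this; `cross_attention` and the toy feature values are hypothetical, and Multi-TALK's actual layers (projections, heads, masking) are not reproduced.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query frame attends over all key frames.

    Assumed roles: queries = microphone latent features,
    keys/values = far-end latent features (hypothetical mapping).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output vector per query frame.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


mic_latent = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical compressed mic features
far_latent = [[1.0, 0.0], [0.0, 1.0]]              # hypothetical far-end features
context = cross_attention(mic_latent, far_latent, far_latent)
print(context)
```

Each output row is a convex combination of far-end feature vectors, so frames of the microphone mixture that resemble the far-end signal pull in more far-end information, which is the intuition behind using attention to locate echo components.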


2020, Vol 86 (5), pp. 299-315
Author(s): X. Wang, C. Heipke

Global structure from motion has recently gained many followers, mainly because of its computational speed. Most global methods take the parameters of relative orientation (ROs) as input and then perform averaging operations. Eliminating incorrect ROs is therefore of great significance for improving the robustness of global structure from motion. In this article, we propose a method to eliminate wrong ROs that result from repetitive structure and very short baselines. We present two corresponding criteria that indicate the quality of ROs: repetitive structure is detected based on counts of conjugate points of the various image pairs, while very short baselines are found by inspecting the intersection angles of corresponding image rays. By analyzing these two criteria, we detect and eliminate incorrect ROs. Because correct ROs of image pairs with a longer baseline nearly parallel to both viewing directions can be valuable, our approach also includes a method to identify and keep these ROs. We demonstrate the new method on various data sets, including public benchmarks as well as close-range images and images from unmanned aerial vehicles, by inserting our refined ROs into a global structure-from-motion pipeline. The experiments show that, compared with other methods, our approach generates the best results.
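The short-baseline criterion rests on the intersection angle of corresponding rays. A minimal sketch, assuming the camera centers and triangulated points are expressed in the frame of one relative orientation and that a pair is flagged when the median angle falls below a threshold; the 2° threshold, the median rule, and the function names are hypothetical, not the paper's criterion. Since RO baselines have arbitrary scale, note that the angle is scale invariant.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def intersection_angle_deg(c1: Vec3, c2: Vec3, x: Vec3) -> float:
    """Angle (degrees) at 3D point x between the rays from camera centers c1 and c2."""
    r1 = [xi - ci for xi, ci in zip(x, c1)]
    r2 = [xi - ci for xi, ci in zip(x, c2)]
    dot = sum(a * b for a, b in zip(r1, r2))
    n1 = math.sqrt(sum(a * a for a in r1))
    n2 = math.sqrt(sum(b * b for b in r2))
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return math.degrees(math.acos(cos_theta))


def is_short_baseline(c1: Vec3, c2: Vec3, points: Sequence[Vec3],
                      min_median_deg: float = 2.0) -> bool:
    """Flag an image pair whose median ray intersection angle is below a
    hypothetical threshold (2 degrees by default)."""
    return median(intersection_angle_deg(c1, c2, p) for p in points) < min_median_deg


pts = [(0.5, 0.0, 10.0), (0.0, 1.0, 12.0), (-1.0, 0.5, 9.0)]
print(is_short_baseline((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), pts))  # False
print(is_short_baseline((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), pts))  # True
```

Using the median rather than the mean keeps a few badly triangulated points from deciding the outcome; the paper may aggregate differently.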


2019
Author(s): Vuong Le, Thomas P. Quinn, Truyen Tran, Svetha Venkatesh

Technological advances in next-generation sequencing (NGS) and chromatographic assays [e.g., liquid chromatography mass spectrometry (LC-MS)] have made it possible to identify thousands of microbe and metabolite species and to measure their relative abundances. In this paper, we propose a sparse neural encoder-decoder network to predict metabolite abundances from microbe abundances. Using paired data from a cohort of inflammatory bowel disease (IBD) patients, we show that our neural encoder-decoder model outperforms linear univariate and multivariate methods in terms of accuracy, sparsity, and stability. Importantly, we show that our neural encoder-decoder model is not simply a black box designed to maximize predictive accuracy. Rather, the network’s hidden layer (i.e., the latent space, composed only of sparsely weighted microbe counts) captures key microbe-metabolite relationships that are themselves clinically meaningful. Although this hidden layer is learned without any knowledge of the patient’s diagnosis, we show that the learned latent features are structured in a way that predicts IBD and treatment status with high accuracy. By imposing a non-negative weights constraint, the network becomes a directed graph in which each downstream node is interpretable as the additive combination of the upstream nodes. Here, the middle layer comprises distinct microbe-metabolite axes that relate key microbial biomarkers to metabolite biomarkers. By pre-processing the microbiome and metabolome data using compositional data analysis methods, we ensure that our proposed multi-omics workflow will generalize to any pair of -omics data. To the best of our knowledge, this work is the first application of neural encoder-decoders to the interpretable integration of multi-omics biological data.
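The abstract names compositional data analysis without specifying the transform. One common choice is the centered log-ratio (CLR), which the sketch below assumes: each log abundance is expressed relative to the sample's log geometric mean, so only ratios between features matter. The pseudocount for zero counts and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def clr(abundances: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample's abundances.

    A pseudocount (hypothetical default) is added so that zero counts can be logged.
    """
    shifted = [a + pseudocount for a in abundances]
    logs = [math.log(a) for a in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [lg - mean_log for lg in logs]


sample = [120, 0, 35, 8]  # hypothetical microbe counts for one subject
print(clr(sample))
```

CLR values sum to zero and are unchanged when a whole sample is rescaled (for example, by sequencing depth), which is why such preprocessing lets the same model accept relative abundances from different -omics assays.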


GeroPsych, 2020, Vol 33 (4), pp. 235-244
Author(s): Boo Johansson, Marcus Praetorius Björk, Valgeir Thorvaldsson

In 1987, we administered a subjective memory questionnaire to 143 men aged 40, and 30 years later, 67 of them responded to the same questionnaire again at age 70. At the follow-up, we also instructed participants to answer the questionnaire as they thought they had answered it at age 40, and to perform a picture recognition test and a public event test. We employed confirmatory factor analysis to model a latent subjective memory construct. A single-factor solution provided acceptable model fit to the data (χ²(12) = 9.33, p = .68; χ²(12) = 10.48, p = .57) and decent reliability of the subjective memory measurements at both ages (omega = .82 and .93, respectively). Our longitudinal invariance testing revealed only partial weak invariance. We also fitted a latent change-score model to the data. As expected, participants on average rated their memory as poorer at age 70 than at age 40. Those who reported better overall health and less anxiety reported less memory decline up to age 70. Notably, this was also the case for those who rated their memory as worse at age 40. Higher stress and depression at age 70, however, were associated with worse subjective memory already at age 40. Correspondence between memory ratings and memory test performance was low. The correlation between the subjective memory factors at ages 40 and 70 was 0.58, while the correlation between the memory factor at age 70 and the retrospective subjective memory factor was 0.87. Our findings suggest that subjective memory is quite consistent, and that we are inclined to preserve the continuity of our own memory functioning over the adult lifespan.
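The reliability figures above are reported as omega, which in a confirmatory factor analysis setting usually denotes McDonald's omega. A minimal sketch of the single-factor formula, ω = (Σλᵢ)² / ((Σλᵢ)² + Σθᵢ), assuming standardized loadings so that each residual variance is θᵢ = 1 - λᵢ². The loadings in the example are invented for illustration and are not the study's estimates.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """McDonald's omega for a single-factor model with standardized loadings.

    Assumes each item's residual variance is 1 - loading**2 (standardized solution).
    """
    total = sum(loadings)
    residual = sum(1.0 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + residual)


# Hypothetical loadings for four questionnaire items.
print(round(mcdonald_omega([0.5, 0.5, 0.5, 0.5]), 4))  # 0.5714
```

Higher and more numerous loadings push omega toward 1, which is how a single subjective memory factor can yield reliabilities such as .82 and .93.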


2009, Vol 36 (2), pp. 231-243
Author(s): David G. Medway

Joseph Banks possessed the greater part of the zoological specimens collected on James Cook's three voyages round the world (1768–1780). In early 1792, Banks divided his zoological collection between John Hunter and the British Museum. It is probable that those donations together comprised most of the zoological specimens then in the possession of Banks, including such bird specimens as remained of those that had been collected by himself and Daniel Solander on Cook's first voyage, and those that had been presented to him from Cook's second and third voyages. The bird specimens included in the Banks donations of 1792 became part of a series of transactions during the succeeding 53 years which involved the British Museum, the Royal College of Surgeons of England, and William Bullock. It is a great pity that, of the extensive collection of bird specimens from Cook's voyages once possessed by Banks, only two are known with any certainty to survive.


2020, Vol 140 (12), pp. 1393-1401
Author(s): Hiroki Chinen, Hidehiro Ohki, Keiji Gyohten, Toshiya Takami

2020, Vol 4 (1), pp. 51-63
Author(s): Peter Neuhaus, Chris Jumonville, Rachel A. Perry, Roman Edwards, Jake L. Martin, ...

To assess the comparative similarity of squat data collected while wearing a robotic exoskeleton, female athletes (n = 14) completed two exercise bouts spaced 14 days apart. Data from the exoskeleton workout were compared with data from a session performed with free weights. Each squat workout followed a four-set, four-repetition paradigm with 60-second rest periods, and the sets in each workout used progressively heavier loads (22.5, 34, 45.5, and 57 kg). The same physiological, perceptual, and exercise performance dependent variables were measured in both workouts. For each dependent variable, Pearson correlation coefficients, t-tests, and Cohen's d effect sizes were used to compare the degree of similarity between values obtained from the exoskeleton and free-weight workouts. Peak O2, heart rate, and peak force data showed the least variability, whereas far more inter-workout variability was noted for peak velocity, peak power, and electromyography (EMG) values. Overall, data collected from the two workouts showed insufficient comparative similarity. Because of this limited similarity, the exoskeleton does not exhibit an acceptable degree of validity. The limited similarity was likely due to the brief familiarization subjects had with the exoskeleton before data collection. A familiarization session that accustomed subjects to exoskeleton squats before data collection might have considerably improved the validity of data obtained from that device.
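The three comparison statistics named above can be sketched for one dependent variable measured in both workouts. This is a minimal illustration under stated assumptions: the t statistic is the paired (dependent-samples) form, Cohen's d uses the pooled standard deviation of the two conditions (one of several common variants), and the data values are invented, not taken from the study.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def compare_workouts(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Pearson r, paired t statistic, Cohen's d) for paired measurements.

    a, b: the same variable per subject in the two workouts (assumed paired).
    Cohen's d uses the pooled SD of both conditions (an assumed variant).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    ma, mb = mean(a), mean(b)
    sa, sb = stdev(a), stdev(b)
    # Sample covariance divided by the product of sample SDs gives Pearson r.
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    r = cov / (sa * sb)
    # Paired t: mean difference over its standard error.
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    d = (ma - mb) / math.sqrt((sa ** 2 + sb ** 2) / 2)
    return r, t, d


exo = [100.0, 110.0, 120.0, 130.0]   # hypothetical peak force, exoskeleton workout
free = [102.0, 109.0, 123.0, 131.0]  # hypothetical peak force, free-weight workout
r, t, d = compare_workouts(exo, free)
print(f"r={r:.3f}, t={t:.3f}, d={d:.3f}")
```

A high r with a small |t| and |d| indicates that the two workouts rank and scale subjects similarly; variables such as peak velocity or EMG that fail these checks are the ones the abstract reports as showing more inter-workout variability.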

