Jazz Bass Transcription Using a U-Net Architecture

Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 670
Author(s):  
Jakob Abeßer ◽  
Meinard Müller

In this paper, we adapt a recently proposed U-Net deep neural network architecture from melody to bass transcription. We investigate pitch shifting and random equalization as data augmentation techniques. In a parameter importance study, we examine how the skip-connection strategy between the encoder and decoder layers, the data augmentation strategy, and the overall model capacity affect the system’s performance. Using a training set that covers various music genres and a validation set that includes jazz ensemble recordings, we obtain the best transcription performance for a downscaled version of the reference algorithm combined with skip connections that transfer intermediate activations between the encoder and decoder. The U-Net-based method outperforms previous knowledge-driven and data-driven bass transcription algorithms by around five percentage points in overall accuracy. In addition to improved pitch estimation, the voicing estimation performance is clearly enhanced.
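
As a hedged illustration of the two augmentation techniques named above, the following numpy sketch applies a pitch shift and a random equalization curve to a log-frequency magnitude spectrogram; the function names, bin layout, and smoothing choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pitch_shift_logfreq(spec, semitones, bins_per_semitone=1):
    """Shift a log-frequency spectrogram along the pitch axis.

    spec: (freq_bins, frames) magnitude spectrogram; a log-spaced frequency
    axis is assumed, so a pitch shift becomes a plain bin shift.
    """
    shift = int(round(semitones * bins_per_semitone))
    out = np.zeros_like(spec)
    if shift > 0:
        out[shift:] = spec[:-shift]
    elif shift < 0:
        out[:shift] = spec[-shift:]
    else:
        out[:] = spec
    return out

def random_equalization(spec, max_gain_db=6.0, n_nodes=8, rng=None):
    """Scale each frequency bin by a smooth random gain curve (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    n_bins = spec.shape[0]
    nodes = rng.uniform(-max_gain_db, max_gain_db, size=n_nodes)
    curve_db = np.interp(np.linspace(0, n_nodes - 1, n_bins),
                         np.arange(n_nodes), nodes)
    return spec * 10.0 ** (curve_db[:, None] / 20.0)

# usage on a dummy spectrogram (216 log-frequency bins, 400 frames)
spec = np.abs(np.random.randn(216, 400))
augmented = random_equalization(pitch_shift_logfreq(spec, semitones=2))
```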

Author(s):  
Kichang Kwak ◽  
Marc Niethammer ◽  
Kelly S. Giovanello ◽  
Martin Styner ◽  
Eran Dayan ◽  
...  

Abstract Mild cognitive impairment (MCI) is often considered the precursor of Alzheimer’s disease. However, MCI is associated with substantially variable progression rates, which are not well understood. Attempts to identify the mechanisms that underlie MCI progression have often focused on the hippocampus but have mostly overlooked its intricate structure and subdivisions. Here, we utilized deep learning to delineate the contribution of hippocampal subfields to MCI progression using a total sample of 1157 subjects (349 in the training set, 427 in the validation set, and 381 in the testing set). We propose a dense convolutional neural network architecture that differentiates stable and progressive MCI based on hippocampal morphometry. The proposed deep learning model predicted MCI progression with an accuracy of 75.85%. A novel implementation of occlusion analysis revealed marked differences in the contribution of hippocampal subfields to the performance of the model, with the presubiculum, CA1, subiculum, and molecular layer playing the most central roles. Moreover, the analysis revealed that 10.5% of the volume of the hippocampus was redundant in the differentiation between stable and progressive MCI. Our predictive model uncovers pronounced differences in the contribution of hippocampal subfields to the progression of MCI. The results may reflect the sparing of hippocampal structure in individuals with a slower progression of neurodegeneration.
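
The occlusion analysis described here can be sketched roughly as follows: mask one subfield at a time (using a subfield label map) and record the resulting drop in classification accuracy. The `model_predict` callable and the label-map layout are hypothetical placeholders for the trained dense CNN and the subfield segmentation, not the authors' code.

```python
import numpy as np

def subfield_occlusion_importance(volumes, labels, subfield_map, model_predict):
    """Estimate each hippocampal subfield's contribution to the classifier.

    volumes:       (N, X, Y, Z) hippocampal morphometry inputs
    labels:        (N,) binary stable/progressive MCI labels
    subfield_map:  (X, Y, Z) integer map assigning each voxel to a subfield id
    model_predict: hypothetical callable mapping a batch of volumes to
                   predicted labels (stand-in for the trained dense CNN)
    """
    baseline = np.mean(model_predict(volumes) == labels)
    importance = {}
    for sf in np.unique(subfield_map):
        occluded = volumes.copy()
        occluded[:, subfield_map == sf] = 0.0   # zero out one subfield
        acc = np.mean(model_predict(occluded) == labels)
        importance[sf] = baseline - acc         # accuracy drop = contribution
    return importance
```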


Author(s):  
Ben Saunders ◽  
Necati Cihan Camgoz ◽  
Richard Bowden

Abstract Sign languages are multi-channel visual languages, where signers use a continuous 3D space to communicate. Sign language production (SLP), the automatic translation from spoken to sign languages, must embody both the continuous articulation and the full morphology of sign to be truly understandable by the Deaf community. Previous deep learning-based SLP works have produced only a concatenation of isolated signs, focusing primarily on the manual features and leading to robotic and non-expressive production. In this work, we propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D multi-channel sign pose sequences in an end-to-end manner. Our transformer network architecture introduces counter decoding, which enables variable-length continuous sequence generation by tracking the production progress over time and predicting the end of the sequence. We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a mixture density network (MDN) formulation to produce realistic and expressive sign pose sequences. We propose a back-translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging PHOENIX14T dataset and setting baselines for future research. We further provide a user evaluation of our SLP model to understand the Deaf reception of our sign pose productions.
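
The counter-decoding idea can be illustrated with a short sketch: alongside each predicted pose the decoder emits a counter in [0, 1] that tracks production progress, and generation stops once the counter signals the end of the sequence. The `decode_step` callable below is a hypothetical stand-in for the Progressive Transformer decoder, not the authors' interface.

```python
import numpy as np

def counter_decode(decode_step, spoken_embedding, max_len=300, counter_end=0.95):
    """Generate a variable-length continuous sign pose sequence.

    decode_step: hypothetical callable (spoken_embedding, poses_so_far) ->
                 (next_pose, counter), where counter in [0, 1] tracks
                 production progress (0 = start, 1 = end of sequence).
    Returns the generated (T, pose_dim) sequence.
    """
    poses = []
    for _ in range(max_len):
        next_pose, counter = decode_step(spoken_embedding, poses)
        poses.append(next_pose)
        if counter >= counter_end:   # predicted end of sequence
            break
    return np.stack(poses)
```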


2021 ◽  
Vol 2094 (3) ◽  
pp. 032037
Author(s):  
M G Dorrer ◽  
S E Golovenkin ◽  
S Yu Nikulina ◽  
Yu V Orlova ◽  
E Yu Pelipeckaya ◽  
...  

Abstract The article addresses the problem of creating models for predicting the course and complications of cardiovascular diseases. Artificial neural networks based on the Keras library are used. The original dataset includes 1700 case histories; in addition, a dataset augmentation procedure was applied. As a result, the overall accuracy exceeded 84%. Furthermore, optimizing the network architecture and the dataset increased the overall accuracy by 17% and the precision by 7%.
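
Since the study builds its models on the Keras library, a minimal sketch of such a binary classifier for encoded case-history features is given below; the feature count, layer sizes, and training settings are illustrative assumptions rather than the configuration reported in the article.

```python
import tensorflow as tf

n_features = 60  # assumed number of encoded case-history attributes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # complication: yes / no
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.Precision(name="precision")],
)
# model.fit(x_train, y_train, validation_split=0.2, epochs=50, batch_size=32)
```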


Author(s):  
Chao Gao ◽  
Martin Müller ◽  
Ryan Hayward

AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output provides prior action probabilities and the state-value estimate is used for leaf node evaluation. We propose a three-head neural network architecture with policy, state-value, and action-value outputs, which can lead to more efficient MCTS, since a neural leaf estimate can still be backed up through the tree while node expansion and evaluation are delayed. To effectively train the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent and child nodes for data augmentation and regularization. In our experiments on the game of Hex, the action-value head achieves an error similar to that of the state-value prediction of a two-head architecture. The resulting neural network models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural network evaluations, PV-MCTS with three-head neural networks consistently performs better than with two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.
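
A three-head network of this kind can be sketched with the Keras functional API: a shared convolutional trunk feeding a policy head, a scalar state-value head, and a per-move action-value head. The board encoding, layer widths, and losses below are assumptions for illustration, not the authors' training setup.

```python
import tensorflow as tf
from tensorflow.keras import layers

board_size = 13                          # assumed Hex board size
n_actions = board_size * board_size

board_in = layers.Input(shape=(board_size, board_size, 3))  # assumed input planes
x = layers.Conv2D(64, 3, padding="same", activation="relu")(board_in)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
flat = layers.Flatten()(x)

policy = layers.Dense(n_actions, activation="softmax", name="policy")(flat)
state_value = layers.Dense(1, activation="tanh", name="state_value")(flat)
action_value = layers.Dense(n_actions, activation="tanh", name="action_value")(flat)

three_head_net = tf.keras.Model(board_in, [policy, state_value, action_value])
three_head_net.compile(
    optimizer="adam",
    loss={"policy": "categorical_crossentropy",
          "state_value": "mse",
          "action_value": "mse"},
)
```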


2020 ◽  
Author(s):  
Rocío Mercado ◽  
Tobias Rastemo ◽  
Edvard Lindelöf ◽  
Günter Klambauer ◽  
Ola Engkvist ◽  
...  

Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.
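
The bond-at-a-time generation loop can be sketched as follows: repeatedly sample an action ("add atom", "add bond", or "terminate") from the GNN's predicted distribution and apply it to the growing graph. Both `action_probabilities` and `apply_action` are hypothetical stand-ins, not GraphINVENT's actual API.

```python
import numpy as np

def sample_molecule(action_probabilities, apply_action, max_steps=100, rng=None):
    """Probabilistically build a molecular graph one action at a time.

    action_probabilities: hypothetical callable graph -> (actions, probs),
        the GNN's distribution over graph-building actions.
    apply_action: hypothetical callable (graph, action) -> new graph.
    """
    rng = np.random.default_rng() if rng is None else rng
    graph = {"atoms": [], "bonds": []}   # minimal graph representation
    for _ in range(max_steps):
        actions, probs = action_probabilities(graph)
        action = actions[rng.choice(len(actions), p=probs)]
        if action == "terminate":        # the model decides when to stop
            break
        graph = apply_action(graph, action)
    return graph
```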


2021 ◽  
Author(s):  
Hannes Staerk ◽  
Christian Dallago ◽  
Michael Heinzinger ◽  
Burkhard Rost

Although knowing where a protein functions in a cell is important for characterizing biological processes, this information remains unavailable for most known proteins. Machine learning narrows this gap with predictions from expertly chosen input features that leverage evolutionary information, which is resource-expensive to generate. We showcase the use of embeddings from protein language models for competitive localization prediction without relying on evolutionary information. Our lightweight deep neural network architecture uses a softmax-weighted aggregation mechanism with linear complexity in sequence length, referred to as light attention (LA). The method significantly outperformed the state of the art for ten localization classes by about eight percentage points (Q10). The novel models are available as a web service and as a stand-alone application at embed.protein.properties.
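
The softmax-weighted aggregation can be sketched in a few lines of numpy: project the per-residue embeddings into attention scores and values, softmax the scores over the sequence, and pool a weighted sum together with a channel-wise maximum into a fixed-size vector. The paper uses 1D convolutions for the projections; the plain matrix products, random placeholder weights, and the embedding dimension below are simplifying assumptions.

```python
import numpy as np

def light_attention_pool(residue_emb, w_att, w_val):
    """Aggregate per-residue embeddings into a fixed-size protein vector.

    residue_emb: (L, d) protein language model embeddings (L = sequence length)
    w_att, w_val: (d, d) learned projections (random placeholders below)
    Complexity is linear in L: one projection and one softmax per channel.
    """
    scores = residue_emb @ w_att                    # (L, d) attention logits
    values = residue_emb @ w_val                    # (L, d) values
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)   # softmax over the sequence
    attended = (weights * values).sum(axis=0)       # (d,) weighted sum
    pooled_max = values.max(axis=0)                 # (d,) channel-wise max pool
    return np.concatenate([attended, pooled_max])   # (2d,) -> small classifier MLP

rng = np.random.default_rng(0)
emb = rng.standard_normal((250, 1024))              # assumed embedding size
w_a, w_v = rng.standard_normal((1024, 1024)), rng.standard_normal((1024, 1024))
protein_vec = light_attention_pool(emb, w_a, w_v)   # feeds a 10-class localization head
```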


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e16093-e16093
Author(s):  
Mingjun Ding ◽  
Hui Cui ◽  
Butuo Li ◽  
Bing Zou ◽  
Yiyue Xu ◽  
...  

Background: Lymph node (LN) metastasis is the most important factor for decision making in esophageal squamous cell carcinoma (ESCC). A more accurate prediction model for LN metastatic status in ESCC patients is needed. Methods: In this retrospective study, 397 ESCC patients who underwent contrast-enhanced CT (CECT) within 15 days before surgery between October 2013 and November 2018 were collected. A total of 924 LNs (798 negative and 126 positive) with pathologically confirmed status after surgery were included. All LNs were randomly divided into a training set (n = 663) and a validation set (n = 185). Data augmentation including shifting and rotation was performed on the training set, resulting in 1326 negative and 1140 positive LN samples. The GACNN model was trained on CT volumetric patches centred at manually segmented LN samples. GACNN is composed of a 3D UNet encoder to extract deep features and a graph attention layer to integrate morphological features extracted from the segmented LNs. The model was validated on the validation set (135 negative and 50 positive) and evaluated by area under the ROC curve (AUC), sensitivity (Sen), and specificity (Spe). Results: GACNN achieved better AUC, sensitivity, and specificity (0.802, 0.765, and 0.826, respectively) than three other models: a CT radiomics model (AUC 0.733, Sen 0.689, Spe 0.765), the 3D UNet encoder alone (AUC 0.778, Sen 0.722, Spe 0.767), and our model without morphological features (AUC 0.796, Sen 0.754, Spe 0.803). The improvement was statistically significant (p < 0.001). Conclusions: Our model improved the prediction of LN metastasis and has the potential to assist LN metastasis risk evaluation and personalized treatment planning in ESCC patients undergoing surgery or radiotherapy.
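
The graph attention step can be illustrated with a generic GAT-style layer in numpy: each lymph-node sample is a graph node whose features (deep 3D-UNet features concatenated with morphological features, as an assumed layout) are re-weighted by learned attention over its neighbours. This is a textbook sketch, not the authors' GACNN implementation.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def graph_attention_layer(h, adj, W, a):
    """Single GAT-style attention layer over lymph-node feature vectors.

    h:   (N, F) node features (e.g. deep + morphological features per LN)
    adj: (N, N) binary adjacency; assumed to include self-loops
    W:   (F, F') learned projection, a: (2F',) learned attention vector
    """
    z = h @ W                                       # (N, F') projected features
    f_out = z.shape[1]
    # attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj > 0, e, -1e9)                  # keep only graph neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)       # softmax over neighbours
    return alpha @ z                                # (N, F') attended features
```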


2018 ◽  
Vol 7 (4.11) ◽  
pp. 90 ◽  
Author(s):  
Mohamad Aqib Haqmi Abas ◽  
Nurlaila Ismail ◽  
Ahmad Ihsan Mohd Yassin ◽  
Mohd Nasir Taib

This paper discusses the potential of applying the VGG16 model architecture to plant classification. Flower images are used instead of leaf images, which are common in other plant recognition models, because leaves tend to be similar in shape and color across species. This can be a disadvantage when leaf images alone are used as the sole feature for classifying species. Previous work has demonstrated the effectiveness of transfer learning, dropout, and data augmentation in reducing the overfitting of convolutional neural network models trained on limited image data. We successfully built and trained the VGG16 model on 2800 flower images. The model achieves a classification accuracy of 96.25% on the training set, 93.93% on the validation set, and 89.96% on the testing set.
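
A minimal Keras sketch of the recipe described here (frozen VGG16 trunk, data augmentation, and dropout before a softmax classifier) is given below; the number of classes, image size, and augmentation settings are assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_classes = 4  # assumed number of flower species

# light data augmentation to reduce overfitting on the small image set
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: keep the pretrained trunk frozen

inputs = layers.Input(shape=(224, 224, 3))
# images are assumed to be preprocessed with
# tf.keras.applications.vgg16.preprocess_input in the input pipeline
x = augment(inputs)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(n_classes, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```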


2019 ◽  
Vol 79 (11) ◽  
Author(s):  
Stefano Carrazza ◽  
Frédéric A. Dreyer

Abstract We introduce a generative model to simulate radiation patterns within a jet using the Lund jet plane. We show that, using an appropriate neural network architecture with stochastic generation of images, it is possible to construct a generative model that retrieves the underlying two-dimensional distribution to within a few percent. We compare our model with several alternative state-of-the-art generative techniques. Finally, we show how a mapping can be created between different categories of jets, and use this method to retroactively change simulation settings or the underlying process on an existing sample. These results provide a framework for significantly reducing simulation times through fast inference of the neural network, as well as for data augmentation of physical measurements.
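
As a hedged illustration only, the stochastic image generation could resemble a small convolutional generator that maps latent noise to a binned Lund-plane image; the latent size, binning, and layer shapes below are assumptions and do not reproduce the paper's models.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 100
# assumed Lund-plane binning: 24 x 24 emission-density image
generator = tf.keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(6 * 6 * 64, activation="relu"),
    layers.Reshape((6, 6, 64)),
    layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="relu"),
])

noise = tf.random.normal((16, latent_dim))
fake_lund_images = generator(noise)   # (16, 24, 24, 1) sampled radiation patterns
```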

