Jazz Bass Transcription Using a U-Net Architecture

Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 670
Author(s):  
Jakob Abeßer ◽  
Meinard Müller

In this paper, we adapt a recently proposed U-Net deep neural network architecture from melody to bass transcription. We investigate pitch shifting and random equalization as data augmentation techniques. In a parameter importance study, we examine how the skip-connection strategy between the encoder and decoder layers, the data augmentation strategy, and the overall model capacity affect the system’s performance. Using a training set that covers various music genres and a validation set that includes jazz ensemble recordings, we obtain the best transcription performance for a downscaled version of the reference algorithm combined with skip connections that transfer intermediate activations between the encoder and decoder. The U-Net-based method outperforms previous knowledge-driven and data-driven bass transcription algorithms by around five percentage points in overall accuracy. In addition to improved pitch estimation, the voicing estimation performance is clearly enhanced.
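
As a hedged illustration of the two augmentation techniques named above, the following numpy sketch applies a pitch shift and a random equalization curve to a log-frequency magnitude spectrogram; the function names, bin layout, and smoothing choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pitch_shift_logfreq(spec, semitones, bins_per_semitone=1):
    """Shift a log-frequency spectrogram along the pitch axis.

    spec: (freq_bins, frames) magnitude spectrogram; a log-spaced frequency
    axis is assumed, so a pitch shift becomes a plain bin shift.
    """
    shift = int(round(semitones * bins_per_semitone))
    out = np.zeros_like(spec)
    if shift > 0:
        out[shift:] = spec[:-shift]
    elif shift < 0:
        out[:shift] = spec[-shift:]
    else:
        out[:] = spec
    return out

def random_equalization(spec, max_gain_db=6.0, n_nodes=8, rng=None):
    """Scale each frequency bin by a smooth random gain curve (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    n_bins = spec.shape[0]
    nodes = rng.uniform(-max_gain_db, max_gain_db, size=n_nodes)
    curve_db = np.interp(np.linspace(0, n_nodes - 1, n_bins),
                         np.arange(n_nodes), nodes)
    return spec * 10.0 ** (curve_db[:, None] / 20.0)

# usage on a dummy spectrogram (216 log-frequency bins, 400 frames)
spec = np.abs(np.random.randn(216, 400))
augmented = random_equalization(pitch_shift_logfreq(spec, semitones=2))
```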

Author(s):  
Kichang Kwak ◽  
Marc Niethammer ◽  
Kelly S. Giovanello ◽  
Martin Styner ◽  
Eran Dayan ◽  
...  

Abstract Mild cognitive impairment (MCI) is often considered the precursor of Alzheimer’s disease. However, MCI is associated with substantially variable progression rates, which are not well understood. Attempts to identify the mechanisms that underlie MCI progression have often focused on the hippocampus but have mostly overlooked its intricate structure and subdivisions. Here, we utilized deep learning to delineate the contribution of hippocampal subfields to MCI progression using a total sample of 1157 subjects (349 in the training set, 427 in the validation set, and 381 in the testing set). We propose a dense convolutional neural network architecture that differentiates stable and progressive MCI based on hippocampal morphometry. The proposed deep learning model predicted MCI progression with an accuracy of 75.85%. A novel implementation of occlusion analysis revealed marked differences in the contribution of hippocampal subfields to the performance of the model, with the presubiculum, CA1, subiculum, and molecular layer playing the most central roles. Moreover, the analysis revealed that 10.5% of the volume of the hippocampus was redundant in the differentiation between stable and progressive MCI. Our predictive model uncovers pronounced differences in the contribution of hippocampal subfields to the progression of MCI. The results may reflect the sparing of hippocampal structure in individuals with a slower progression of neurodegeneration.
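
The occlusion analysis described here can be sketched roughly as follows: mask one subfield at a time (using a subfield label map) and record the resulting drop in classification accuracy. The `model_predict` callable and the label-map layout are hypothetical placeholders for the trained dense CNN and the subfield segmentation, not the authors' code.

```python
import numpy as np

def subfield_occlusion_importance(volumes, labels, subfield_map, model_predict):
    """Estimate each hippocampal subfield's contribution to the classifier.

    volumes:       (N, X, Y, Z) hippocampal morphometry inputs
    labels:        (N,) binary stable/progressive MCI labels
    subfield_map:  (X, Y, Z) integer map assigning each voxel to a subfield id
    model_predict: hypothetical callable mapping a batch of volumes to
                   predicted labels (stand-in for the trained dense CNN)
    """
    baseline = np.mean(model_predict(volumes) == labels)
    importance = {}
    for sf in np.unique(subfield_map):
        occluded = volumes.copy()
        occluded[:, subfield_map == sf] = 0.0   # zero out one subfield
        acc = np.mean(model_predict(occluded) == labels)
        importance[sf] = baseline - acc         # accuracy drop = contribution
    return importance
```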


Author(s):  
Ben Saunders ◽  
Necati Cihan Camgoz ◽  
Richard Bowden

Abstract Sign languages are multi-channel visual languages, where signers use a continuous 3D space to communicate. Sign language production (SLP), the automatic translation from spoken to sign languages, must embody both the continuous articulation and the full morphology of sign to be truly understandable by the Deaf community. Previous deep learning-based SLP works have produced only a concatenation of isolated signs, focusing primarily on the manual features and leading to robotic and non-expressive production. In this work, we propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D multi-channel sign pose sequences in an end-to-end manner. Our transformer network architecture introduces counter decoding, which enables variable-length continuous sequence generation by tracking the production progress over time and predicting the end of the sequence. We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a mixture density network (MDN) formulation to produce realistic and expressive sign pose sequences. We propose a back-translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging PHOENIX14T dataset and setting baselines for future research. We further provide a user evaluation of our SLP model to understand the Deaf reception of our sign pose productions.
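
The counter-decoding idea can be illustrated with a short sketch: alongside each predicted pose the decoder emits a counter in [0, 1] that tracks production progress, and generation stops once the counter signals the end of the sequence. The `decode_step` callable below is a hypothetical stand-in for the Progressive Transformer decoder, not the authors' interface.

```python
import numpy as np

def counter_decode(decode_step, spoken_embedding, max_len=300, counter_end=0.95):
    """Generate a variable-length continuous sign pose sequence.

    decode_step: hypothetical callable (spoken_embedding, poses_so_far) ->
                 (next_pose, counter), where counter in [0, 1] tracks
                 production progress (0 = start, 1 = end of sequence).
    Returns the generated (T, pose_dim) sequence.
    """
    poses = []
    for _ in range(max_len):
        next_pose, counter = decode_step(spoken_embedding, poses)
        poses.append(next_pose)
        if counter >= counter_end:   # predicted end of sequence
            break
    return np.stack(poses)
```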


2021 ◽  
Vol 2094 (3) ◽  
pp. 032037
Author(s):  
M G Dorrer ◽  
S E Golovenkin ◽  
S Yu Nikulina ◽  
Yu V Orlova ◽  
E Yu Pelipeckaya ◽  
...  

Abstract The article addresses the problem of creating models for predicting the course and complications of cardiovascular diseases. Artificial neural networks based on the Keras library are used. The original dataset includes 1700 case histories; in addition, a dataset augmentation procedure was applied. As a result, the overall accuracy exceeded 84%. Furthermore, optimizing the network architecture and the dataset increased the overall accuracy by 17% and the precision by 7%.
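
Since the study builds its models on the Keras library, a minimal sketch of such a binary classifier for encoded case-history features is given below; the feature count, layer sizes, and training settings are illustrative assumptions rather than the configuration reported in the article.

```python
import tensorflow as tf

n_features = 60  # assumed number of encoded case-history attributes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # complication: yes / no
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.Precision(name="precision")],
)
# model.fit(x_train, y_train, validation_split=0.2, epochs=50, batch_size=32)
```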


Author(s):  
Chao Gao ◽  
Martin Müller ◽  
Ryan Hayward

AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output provides prior action probabilities and the state-value estimate is used for leaf node evaluation. We propose a three-head neural network architecture with policy, state-value, and action-value outputs, which can lead to more efficient MCTS, since a neural leaf estimate can still be backed up through the tree while node expansion and evaluation are delayed. To effectively train the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent and child nodes for data augmentation and regularization. In our experiments on the game of Hex, the action-value head achieves an error similar to that of the state-value prediction of a two-head architecture. The resulting neural network models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural network evaluations, PV-MCTS with three-head neural networks consistently performs better than with two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.
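
A three-head network of this kind can be sketched with the Keras functional API: a shared convolutional trunk feeding a policy head, a scalar state-value head, and a per-move action-value head. The board encoding, layer widths, and losses below are assumptions for illustration, not the authors' training setup.

```python
import tensorflow as tf
from tensorflow.keras import layers

board_size = 13                          # assumed Hex board size
n_actions = board_size * board_size

board_in = layers.Input(shape=(board_size, board_size, 3))  # assumed input planes
x = layers.Conv2D(64, 3, padding="same", activation="relu")(board_in)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
flat = layers.Flatten()(x)

policy = layers.Dense(n_actions, activation="softmax", name="policy")(flat)
state_value = layers.Dense(1, activation="tanh", name="state_value")(flat)
action_value = layers.Dense(n_actions, activation="tanh", name="action_value")(flat)

three_head_net = tf.keras.Model(board_in, [policy, state_value, action_value])
three_head_net.compile(
    optimizer="adam",
    loss={"policy": "categorical_crossentropy",
          "state_value": "mse",
          "action_value": "mse"},
)
```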


2020 ◽  
Author(s):  
Rocío Mercado ◽  
Tobias Rastemo ◽  
Edvard Lindelöf ◽  
Günter Klambauer ◽  
Ola Engkvist ◽  
...  

Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES distribution-based metrics, showing how GraphINVENT models compare well with state-of-the-art generative models. This work is one of the first thorough graph-based molecular design studies, and illustrates how GNN-based models are promising tools for molecular discovery.
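
The bond-at-a-time generation loop can be sketched as follows: repeatedly sample an action ("add atom", "add bond", or "terminate") from the GNN's predicted distribution and apply it to the growing graph. Both `action_probabilities` and `apply_action` are hypothetical stand-ins, not GraphINVENT's actual API.

```python
import numpy as np

def sample_molecule(action_probabilities, apply_action, max_steps=100, rng=None):
    """Probabilistically build a molecular graph one action at a time.

    action_probabilities: hypothetical callable graph -> (actions, probs),
        the GNN's distribution over graph-building actions.
    apply_action: hypothetical callable (graph, action) -> new graph.
    """
    rng = np.random.default_rng() if rng is None else rng
    graph = {"atoms": [], "bonds": []}   # minimal graph representation
    for _ in range(max_steps):
        actions, probs = action_probabilities(graph)
        action = actions[rng.choice(len(actions), p=probs)]
        if action == "terminate":        # the model decides when to stop
            break
        graph = apply_action(graph, action)
    return graph
```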


2021 ◽  
Author(s):  
Hannes Staerk ◽  
Christian Dallago ◽  
Michael Heinzinger ◽  
Burkhard Rost

Although knowing where a protein functions in a cell is important for characterizing biological processes, this information remains unavailable for most known proteins. Machine learning narrows this gap with predictions from expertly chosen input features that leverage evolutionary information, which is resource-expensive to generate. We showcase the use of embeddings from protein language models for competitive localization prediction without relying on evolutionary information. Our lightweight deep neural network architecture uses a softmax-weighted aggregation mechanism with linear complexity in sequence length, referred to as light attention (LA). The method significantly outperformed the state of the art for ten localization classes by about eight percentage points (Q10). The novel models are available as a web service and as a stand-alone application at embed.protein.properties.
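
The softmax-weighted aggregation can be sketched in a few lines of numpy: project the per-residue embeddings into attention scores and values, softmax the scores over the sequence, and pool a weighted sum together with a channel-wise maximum into a fixed-size vector. The paper uses 1D convolutions for the projections; the plain matrix products, random placeholder weights, and the embedding dimension below are simplifying assumptions.

```python
import numpy as np

def light_attention_pool(residue_emb, w_att, w_val):
    """Aggregate per-residue embeddings into a fixed-size protein vector.

    residue_emb: (L, d) protein language model embeddings (L = sequence length)
    w_att, w_val: (d, d) learned projections (random placeholders below)
    Complexity is linear in L: one projection and one softmax per channel.
    """
    scores = residue_emb @ w_att                    # (L, d) attention logits
    values = residue_emb @ w_val                    # (L, d) values
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)   # softmax over the sequence
    attended = (weights * values).sum(axis=0)       # (d,) weighted sum
    pooled_max = values.max(axis=0)                 # (d,) channel-wise max pool
    return np.concatenate([attended, pooled_max])   # (2d,) -> small classifier MLP

rng = np.random.default_rng(0)
emb = rng.standard_normal((250, 1024))              # assumed embedding size
w_a, w_v = rng.standard_normal((1024, 1024)), rng.standard_normal((1024, 1024))
protein_vec = light_attention_pool(emb, w_a, w_v)   # feeds a 10-class localization head
```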


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e16093-e16093
Author(s):  
Mingjun Ding ◽  
Hui Cui ◽  
Butuo Li ◽  
Bing Zou ◽  
Yiyue Xu ◽  
...  

Background: Lymph node (LN) metastasis is the most important factor for decision making in esophageal squamous cell carcinoma (ESCC). A more accurate prediction model for LN metastatic status in ESCC patients is needed. Methods: In this retrospective study, 397 ESCC patients who underwent contrast-enhanced CT (CECT) within 15 days before surgery between October 2013 and November 2018 were collected. A total of 924 LNs (798 negative and 126 positive) with pathologically confirmed status after surgery were included. All LNs were randomly divided into a training set (n = 663) and a validation set (n = 185). Data augmentation including shifting and rotation was performed on the training set, resulting in 1326 negative and 1140 positive LN samples. The GACNN model was trained on CT volumetric patches centred at manually segmented LN samples. GACNN is composed of a 3D UNet encoder to extract deep features and a graph attention layer to integrate morphological features extracted from the segmented LNs. The model was validated on the validation set (135 negative and 50 positive) and evaluated by area under the ROC curve (AUC), sensitivity (Sen), and specificity (Spe). Results: GACNN achieved better AUC, sensitivity, and specificity (0.802, 0.765, and 0.826, respectively) than three other models: a CT radiomics model (AUC 0.733, Sen 0.689, Spe 0.765), the 3D UNet encoder alone (AUC 0.778, Sen 0.722, Spe 0.767), and our model without morphological features (AUC 0.796, Sen 0.754, Spe 0.803). The improvement was statistically significant (p < 0.001). Conclusions: Our model improved the prediction of LN metastasis and has the potential to assist LN metastasis risk evaluation and personalized treatment planning in ESCC patients undergoing surgery or radiotherapy.
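
The graph attention step can be illustrated with a generic GAT-style layer in numpy: each lymph-node sample is a graph node whose features (deep 3D-UNet features concatenated with morphological features, as an assumed layout) are re-weighted by learned attention over its neighbours. This is a textbook sketch, not the authors' GACNN implementation.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def graph_attention_layer(h, adj, W, a):
    """Single GAT-style attention layer over lymph-node feature vectors.

    h:   (N, F) node features (e.g. deep + morphological features per LN)
    adj: (N, N) binary adjacency; assumed to include self-loops
    W:   (F, F') learned projection, a: (2F',) learned attention vector
    """
    z = h @ W                                       # (N, F') projected features
    f_out = z.shape[1]
    # attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj > 0, e, -1e9)                  # keep only graph neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)       # softmax over neighbours
    return alpha @ z                                # (N, F') attended features
```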


2018 ◽  
Vol 7 (4.11) ◽  
pp. 90 ◽  
Author(s):  
Mohamad Aqib Haqmi Abas ◽  
Nurlaila Ismail ◽  
Ahmad Ihsan Mohd Yassin ◽  
Mohd Nasir Taib

This paper discusses the potential of applying the VGG16 model architecture to plant classification. Flower images are used instead of leaf images, which are common in other plant recognition models, because leaves tend to be similar in shape and color across species. This can be a disadvantage when leaf images alone are used as the sole feature for classifying species. Previous work has demonstrated the effectiveness of transfer learning, dropout, and data augmentation in reducing the overfitting of convolutional neural network models trained on limited image data. We successfully built and trained the VGG16 model on 2800 flower images. The model achieves a classification accuracy of 96.25% on the training set, 93.93% on the validation set, and 89.96% on the testing set.
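
A minimal Keras sketch of the recipe described here (frozen VGG16 trunk, data augmentation, and dropout before a softmax classifier) is given below; the number of classes, image size, and augmentation settings are assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_classes = 4  # assumed number of flower species

# light data augmentation to reduce overfitting on the small image set
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: keep the pretrained trunk frozen

inputs = layers.Input(shape=(224, 224, 3))
# images are assumed to be preprocessed with
# tf.keras.applications.vgg16.preprocess_input in the input pipeline
x = augment(inputs)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(n_classes, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```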


2019 ◽  
Vol 79 (11) ◽  
Author(s):  
Stefano Carrazza ◽  
Frédéric A. Dreyer

Abstract We introduce a generative model to simulate radiation patterns within a jet using the Lund jet plane. We show that, using an appropriate neural network architecture with stochastic generation of images, it is possible to construct a generative model that retrieves the underlying two-dimensional distribution to within a few percent. We compare our model with several alternative state-of-the-art generative techniques. Finally, we show how a mapping can be created between different categories of jets, and use this method to retroactively change simulation settings or the underlying process on an existing sample. These results provide a framework for significantly reducing simulation times through fast inference of the neural network, as well as for data augmentation of physical measurements.
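
As a hedged illustration only, the stochastic image generation could resemble a small convolutional generator that maps latent noise to a binned Lund-plane image; the latent size, binning, and layer shapes below are assumptions and do not reproduce the paper's models.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 100
# assumed Lund-plane binning: 24 x 24 emission-density image
generator = tf.keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(6 * 6 * 64, activation="relu"),
    layers.Reshape((6, 6, 64)),
    layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="relu"),
])

noise = tf.random.normal((16, latent_dim))
fake_lund_images = generator(noise)   # (16, 24, 24, 1) sampled radiation patterns
```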

