Automatic onset detection using convolutional neural networks

2019 ◽  
Author(s):  
Willy Cornelissen ◽  
Maurício Loureiro

A significant task in music research is estimating the instants when meaningful events begin (onsets) and end (offsets). Onset detection is widely applied in many fields: electrocardiograms, seismographic data, stock market results, and many Music Information Research (MIR) tasks, such as Automatic Music Transcription, Rhythm Detection, and Speech Recognition. Automatic Onset Detection (AOD) has recently benefited greatly from Artificial Intelligence (AI) methods, mainly Machine Learning and Deep Learning. In this work, we explore the use of Convolutional Neural Networks (CNNs), adapting the original architecture to automatic onset detection in musical audio signals. We used a CNN for onset detection on a very general dataset, well acknowledged by the MIR community, and examined the accuracy of the method by comparison with the ground truth data published with the dataset. The results are promising and outperform other methods of musical onset detection.
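Comparing detected onsets with ground truth is commonly done by matching each detection to an annotated onset within a small time window and reporting precision, recall, and F-measure. The abstract does not state which metric or window was used, so the sketch below is only an illustration: the function names and the 50 ms default tolerance are assumptions, not the authors' evaluation code.

```python
def match_onsets(detected, reference, tolerance=0.05):
    """Count detections that fall within `tolerance` seconds of a reference onset.

    Both lists hold onset times in seconds. Each reference onset is matched
    at most once. The 0.05 s default is an assumed value for illustration.
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = hits = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection precedes any remaining reference: false positive
        else:
            j += 1  # reference onset has no detection nearby: false negative
    return hits


def onset_scores(detected, reference, tolerance=0.05):
    """Return (precision, recall, F-measure) for one recording."""
    hits = match_onsets(detected, reference, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up onset times, purely for demonstration.
    print(onset_scores([0.51, 1.02, 1.80, 2.50], [0.50, 1.00, 1.50, 2.52]))
```

A wider tolerance makes the score more forgiving of timing errors, so results are only comparable when the same window is used.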

Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 810
Author(s):  
Carlos Hernandez-Olivan ◽  
Ignacio Zay Pinilla ◽  
Carlos Hernandez-Lopez ◽  
Jose R. Beltran

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments can be an issue that has not yet been studied in depth. The goal of this work is to address AMT by analyzing how timbre affects monophonic transcription in a first approach based on the CREPE neural network, and then to improve the results by performing polyphonic music transcription with different timbres in a second approach based on the Deep Salience model, which performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model for non-piano instruments outperforms the state-of-the-art model; for bass instruments, for example, it reaches an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can improve on the results given in this work.


2020 ◽  
Vol 27 (4) ◽  
pp. 20-33
Author(s):  
Paulo César Pereira Júnior ◽  
Alexandre Monteiro ◽  
Rafael Da Luz Ribeiro ◽  
Antonio Carlos Sobieranski ◽  
Aldo Von Wangenheim

In this paper, we present a comparison between convolutional neural networks and classical computer vision approaches for the specific precision agriculture problem of weed mapping in aerial images of sugarcane fields. A systematic literature review was conducted to find which computer vision methods are being used on this specific problem. The most cited methods were implemented, as well as four models of convolutional neural networks. All implemented approaches were tested using the same dataset, and their results were quantitatively and qualitatively analyzed. For validation, the obtained results were compared to a ground truth produced by a human expert. The results indicate that the convolutional neural networks achieve better precision and generalize better than the classical models.


Water ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3412
Author(s):  
Joakim Bruslund Haurum ◽  
Chris H. Bahnsen ◽  
Malte Pedersen ◽  
Thomas B. Moeslund

Sewer pipe inspections are currently conducted by professionals who remotely control a robot from above ground. This expensive and slow approach is prone to human mistakes. Therefore, there is both an economic and scientific interest in automating the inspection process by creating systems able to recognize sewer defects. However, the extent of research put into automatic water level estimation in sewers has been limited, despite it being a prerequisite for further analysis of the pipe, as only sections above the water level can be visually inspected. In this work, we utilize a dataset of still images obtained from over 5000 inspections carried out for three different Danish water utility companies. This dataset is used for training and testing decision tree methods and convolutional neural networks (CNNs) for automatic water level estimation. We pose the estimation problem as both a classification and a regression problem, and compare the results of both approaches. Furthermore, we compare the effect of using different inspection standards for labeling the ground truth water level. By treating the problem as a classification task and using the 2015 Danish sewer inspection standard, where water levels are clustered based on visual appearance, we achieve an averaged F1 score of 79.29% using a fine-tuned ResNet-50 CNN. This shows the potential of using CNNs for water level estimation. We believe including temporal and contextual information will improve the results further.
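An "averaged F1 score" over several water level classes can be computed in more than one way. The sketch below assumes an unweighted (macro) average of per-class F1 scores, which the abstract does not confirm; the function name and the toy labels are illustrative only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy water level classes 0, 1, 2 (hypothetical labels, not the paper's data).
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    print(round(macro_f1(truth, preds), 4))
```

Macro averaging gives every class equal weight regardless of how many images it contains, which matters when some water levels are rare.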


2020 ◽  
Vol 15 (10) ◽  
pp. 1445-1454 ◽  
Author(s):  
Giulia Ligabue ◽  
Federico Pollastri ◽  
Francesco Fontana ◽  
Marco Leonelli ◽  
Luciana Furci ◽  
...  

Background and objectives: Immunohistopathology is an essential technique in the diagnostic workflow of a kidney biopsy. Deep learning is an effective tool in the elaboration of medical imaging. We wanted to evaluate the role of a convolutional neural network as a support tool for kidney immunofluorescence reporting. Design, setting, participants, & measurements: High-magnification (×400) immunofluorescence images of kidney biopsies performed from the year 2001 to 2018 were collected. The report, adopted at the Division of Nephrology of the AOU Policlinico di Modena, describes the specimen in terms of “appearance,” “distribution,” “location,” and “intensity” of the glomerular deposits identified with fluorescent antibodies against IgG, IgA, IgM, C1q and C3 complement fractions, fibrinogen, and κ- and λ-light chains. The report was used as ground truth for the training of the convolutional neural networks. Results: In total, 12,259 immunofluorescence images of 2542 subjects undergoing kidney biopsy were collected. The test set analysis showed accuracy values between 0.79 (“irregular capillary wall” feature) and 0.94 (“fine granular” feature). The agreement test of the results obtained by the convolutional neural networks with respect to the ground truth showed similar values to three pathologists of our center. Convolutional neural networks were 117 times faster than human evaluators in analyzing 180 test images. A web platform, where it is possible to upload digitized images of immunofluorescence specimens, is available to evaluate the potential of our approach. Conclusions: The data showed that the accuracy of convolutional neural networks is comparable with that of pathologists experienced in the field.


Geophysics ◽  
2020 ◽  
Vol 85 (4) ◽  
pp. WA27-WA39 ◽  
Author(s):  
Xinming Wu ◽  
Zhicheng Geng ◽  
Yunzhi Shi ◽  
Nam Pham ◽  
Sergey Fomel ◽  
...  

Seismic structural interpretation involves highlighting and extracting faults and horizons that are apparent as geometric features in a seismic image. Although seismic image processing methods have been proposed to automate fault and horizon interpretation, each of them still requires significant human effort. We improve automatic structural interpretation in seismic images by using convolutional neural networks (CNNs), which have recently shown excellent performance in detecting and extracting useful image features and objects. The main limitation of applying CNNs in seismic interpretation is the preparation of many training data sets and especially the corresponding geologic labels. Manually labeling geologic features in a seismic image is highly time-consuming and subjective, which often results in incompletely or inaccurately labeled training images. To solve this problem, we have developed a workflow to automatically build diverse structure models with realistic folding and faulting features. In this workflow, with some assumptions about typical folding and faulting patterns, we simulate structural features in a 3D model by using a set of parameters. By randomly choosing the parameters from some predefined ranges, we are able to automatically generate numerous structure models with realistic and diverse structural features. Based on these structure models with known structural information, we further automatically create numerous synthetic seismic images and the corresponding ground truth of structural labels to train CNNs for structural interpretation in field seismic images. Accurate results of structural interpretation in multiple field seismic images indicate that our workflow simulates realistic and generalized structure models from which the CNNs effectively learn to recognize real structures in field images.
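The idea of drawing parameters at random from predefined ranges and turning them into a labeled structure model can be sketched in a much reduced form. Everything below is a hypothetical simplification: the model is 2D rather than 3D, folding is a single sine shift, the fault is vertical, and the parameter names and ranges are invented for illustration. It produces only a layered model and a fault label mask, not a synthetic seismic image.

```python
import math
import random

# Hypothetical parameter ranges; the abstract does not list the actual ones.
PARAM_RANGES = {
    "fold_amplitude": (2.0, 6.0),     # vertical shift, in samples
    "fold_wavelength": (20.0, 60.0),  # horizontal period, in traces
    "fault_position": (10, 50),       # trace index where the fault cuts
    "fault_throw": (2, 8),            # vertical displacement, in samples
}


def sample_parameters(rng):
    """Draw one value per parameter, integers for integer ranges."""
    params = {}
    for name, (low, high) in PARAM_RANGES.items():
        if isinstance(low, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


def build_model(params, n_traces=64, n_samples=64, layer_thickness=8):
    """Return (layers, fault_labels), both indexed as [trace][sample]."""
    layers, labels = [], []
    for x in range(n_traces):
        # Folding: shift the layering up and down along a sine curve.
        shift = params["fold_amplitude"] * math.sin(
            2 * math.pi * x / params["fold_wavelength"]
        )
        # Faulting: everything beyond the fault is displaced by the throw.
        if x >= params["fault_position"]:
            shift += params["fault_throw"]
        layers.append([int((z + shift) // layer_thickness) for z in range(n_samples)])
        labels.append([1 if x == params["fault_position"] else 0] * n_samples)
    return layers, labels


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    params = sample_parameters(rng)
    layers, labels = build_model(params)
    print(params["fault_position"], sum(map(sum, labels)))
```

Because the labels come from the same parameters that built the model, they are exact by construction, which is the property that makes simulated training data attractive when manual labeling is subjective.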


2019 ◽  
Vol 9 (23) ◽  
pp. 5121 ◽  
Author(s):  
Olivier Lartillot ◽  
Didier Grandjean

We present a method for tempo estimation from audio recordings based on signal processing and peak tracking, and not depending on training on ground-truth data. First, an accentuation curve, emphasizing the temporal location and accentuation of notes, is based on a detection of bursts of energy localized in time and frequency. This enables the detection of notes in dense polyphonic texture, while ignoring spectral fluctuation produced by vibrato and tremolo. Periodicities in the accentuation curve are detected using an improved version of the autocorrelation function. Hierarchical metrical structures, composed of a large set of periodicities in pairwise harmonic relationships, are tracked over time. In this way, the metrical structure can be tracked even if the rhythmical emphasis switches from one metrical level to another. Compared with all the other participants in the Music Information Retrieval Evaluation eXchange (MIREX) Audio Tempo Extraction competition from 2006 to 2018, this approach is the third best among those that can track tempo variations. While the two best methods are based on machine learning, our method suggests a way to track tempo founded on signal processing and heuristics-based peak tracking. Moreover, the approach offers for the first time a detailed representation of the dynamic evolution of the metrical structure. The method is integrated into MIRtoolbox, a freely available Matlab toolbox.
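Detecting periodicities in an accentuation curve with autocorrelation can be illustrated with a minimal sketch. It uses the plain autocorrelation function, not the improved version the abstract refers to, and it returns a single global tempo with no metrical tracking. The function names, the BPM search range, and the synthetic impulse curve are assumptions made for the example.

```python
def autocorrelation(curve, max_lag):
    """Plain autocorrelation of the mean-removed curve for lags 0..max_lag."""
    n = len(curve)
    mean = sum(curve) / n
    centered = [x - mean for x in curve]
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_tempo(curve, frame_rate, min_bpm=40.0, max_bpm=200.0):
    """Return the tempo (BPM) whose beat period has the strongest autocorrelation.

    `frame_rate` is the number of curve values per second. The BPM limits
    are assumed defaults that bound the lags being searched.
    """
    min_lag = int(round(frame_rate * 60.0 / max_bpm))
    max_lag = int(round(frame_rate * 60.0 / min_bpm))
    acf = autocorrelation(curve, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return 60.0 * frame_rate / best_lag


if __name__ == "__main__":
    # Synthetic curve: one accent every 0.5 s at 100 frames per second.
    frame_rate = 100.0
    curve = [1.0 if t % 50 == 0 else 0.0 for t in range(1000)]
    print(estimate_tempo(curve, frame_rate))
```

On this synthetic curve the autocorrelation also peaks at multiples of the beat period (lags 100 and 150), which hints at why a single strongest peak is ambiguous about metrical level and why tracking several related periodicities, as the abstract describes, can be more robust.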


Author(s):  
Rodrigo Trevisan ◽  
Osvaldo Pérez ◽  
Nathan Schmitz ◽  
Brian Diers ◽  
Nicolas Martin

Soybean maturity is a trait of critical importance for the development of new soybean cultivars; nevertheless, its characterization based on visual ratings has many challenges. Unmanned aerial vehicle (UAV) imagery-based high-throughput phenotyping methodologies have been proposed as an alternative to the traditional visual ratings of pod senescence. However, the lack of scalable and accurate methods to extract the desired information from the images remains a significant bottleneck in breeding programs. The objective of this study was to develop an image-based high-throughput phenotyping system for evaluating soybean maturity in breeding programs. Images were acquired twice a week, starting when the earlier lines began maturation until the latest ones were mature. Two complementary convolutional neural networks (CNNs) were developed to predict the maturity date: the first uses a single date, and the second uses the five best image dates identified by the first model. The proposed CNN architecture was validated using more than 15,000 ground truth observations from five trials, including data from three growing seasons and two countries. The trained model showed good generalization capability, with a root mean squared error lower than two days in four out of five trials. Four methods of estimating prediction uncertainty showed potential for identifying different sources of errors in the maturity date predictions. The architecture used solves limitations of previous research and can be used at scale in commercial breeding programs.
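The root mean squared error quoted above is expressed in days, so it can be read directly as a typical size of the date error. A minimal sketch of the calculation follows, assuming maturity dates are encoded as day-of-year numbers (an assumption; the abstract does not describe the encoding) and using made-up values.

```python
import math


def rmse_days(predicted, observed):
    """Root mean squared error between predicted and observed dates, in days."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical day-of-year maturity dates for four plots.
    predicted = [100, 104, 98, 110]
    observed = [101, 102, 98, 113]
    print(round(rmse_days(predicted, observed), 3))
```

Because errors are squared before averaging, a few plots with large misses raise the RMSE more than many plots with small ones.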

