Musical note onset detection based on a spectral sparsity measure

If music is the language of the universe, musical note onsets may be the syllables of this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteristics also contain rich information about the identity of the musical instrument producing the notes. Note onset detection (NOD) is a basic component of many music information retrieval tasks and has attracted significant interest in audio signal processing research. In this paper, we propose an NOD method based on a novel feature termed Normalized Identification of Note Onset based on Spectral Sparsity (NINOS2). The NINOS2 feature can be thought of as a spectral sparsity measure that aims to exploit the difference in spectral sparsity between the different parts of a musical note. This spectral structure is revealed when focusing on low-magnitude spectral components that are traditionally filtered out when computing note onset features. We present an extensive set of NOD simulation results covering a wide range of instruments, playing styles, and mixing options. The proposed algorithm consistently outperforms the baseline Logarithmic Spectral Flux (LSF) feature for the most difficult group of instruments, the sustained-string instruments. It also shows better performance in challenging scenarios, including polyphonic music and vibrato performances.
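
To make the idea of a "spectral sparsity measure" concrete, the sketch below scores one frame of spectral magnitudes with the Hoyer sparsity index. This is a generic, well-known measure chosen purely for illustration; it is not the NINOS2 feature, whose definition is not given in the abstract, and the function name `hoyer_sparsity` is hypothetical.

```python
import math

def hoyer_sparsity(magnitudes):
    """Hoyer sparsity of one frame of spectral magnitudes, in [0, 1].

    Illustrative stand-in only (NOT the NINOS2 feature):
    1.0 means all energy sits in a single bin, 0.0 means a flat spectrum.
    """
    n = len(magnitudes)
    l1 = sum(abs(m) for m in magnitudes)
    l2 = math.sqrt(sum(m * m for m in magnitudes))
    if n < 2 or l2 == 0.0:
        return 0.0  # sparsity is undefined for empty or silent frames
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)

# A frame dominated by one spectral peak versus a flat, noise-like frame.
peaky_frame = [0.0, 0.0, 5.0, 0.0]
flat_frame = [1.0, 1.0, 1.0, 1.0]
print(hoyer_sparsity(peaky_frame))  # 1.0
print(hoyer_sparsity(flat_frame))   # 0.0
```

Tracking such a score frame by frame would give a curve that drops when a frame becomes broadband and rises when a few partials dominate. How the actual feature weights low-magnitude components is left to the paper.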

Music Onset Detection

Machine Audition ◽

10.4018/978-1-61520-919-4.ch012 ◽

2010 ◽

pp. 297-316

Author(s):

Ruohua Zhou ◽

Josh D Reiss

Keyword(s):

Performance Measures ◽

Essential Role ◽

General Scheme ◽

Future Research ◽

Onset Detection ◽

Time Frequency ◽

Detection Algorithms ◽

Music Signal ◽

Wide Range ◽

Future Research Directions

Music onset detection plays an essential role in music signal processing and has a wide range of applications. This chapter provides a step-by-step introduction to the design of music onset detection algorithms. The general scheme and commonly used time-frequency analysis for onset detection are introduced. Many methods are reviewed, and some typical energy-based, phase-based, pitch-based and supervised learning methods are described in detail. The commonly used performance measures, onset annotation software, public databases and evaluation methods are introduced. The performance difference between energy-based and pitch-based methods is discussed. Future research directions for music onset detection are also described.

An Effective Framework for Speech and Music Segregation

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/4/9 ◽

2020 ◽

Vol 17 (4) ◽

pp. 507-514

Author(s):

Sidra Sajid ◽

Ali Javed ◽

Aun Irtaza

Keyword(s):

Information Retrieval ◽

Single Channel ◽

State Of The Art ◽

Binary Mask ◽

Multimedia Information Retrieval ◽

Time Frequency ◽

Wide Range ◽

Ideal Binary Mask ◽

Music Information ◽

Singer Identification

Speech and music segregation from a single channel is a challenging task due to background interference and intermingled signals of voice and music channels. It is of immense importance due to its utility in a wide range of applications such as music information retrieval, singer identification, lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We apply an ideal binary mask to enhance the speech and music intelligibility. We evaluated the proposed method on mixtures set at -5 dB, 0 dB and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method for speech and music segregation outperforms the existing state-of-the-art methods in terms of Global-Normalized-Signal-to-Distortion Ratio (GNSDR) values.

Enhancement of Conventional Beat Tracking System Using Teager–Kaiser Energy Operator

Applied Sciences ◽

10.3390/app10010379 ◽

2020 ◽

Vol 10 (1) ◽

pp. 379 ◽

Cited By ~ 1

Author(s):

Matej Istvanek ◽

Zdenek Smekal ◽

Lubomir Spurny ◽

Jiri Mekyska

Keyword(s):

Tracking System ◽

Audio Signal ◽

Energy Operator ◽

Research Field ◽

Reference Database ◽

Detection Accuracy ◽

Onset Detection ◽

Average Deviation ◽

Beat Tracking ◽

Music Information

Beat detection systems are widely used in the music information retrieval (MIR) research field for the computation of tempo and beat time positions in audio signals. One of the most important parts of these systems is usually onset detection. There is an understandable tendency to employ the most accurate onset detector. However, there are options to increase the global tempo (GT) accuracy and also the detection accuracy of beat positions at the expense of less accurate onset detection. The aim of this study is to introduce an enhancement of a conventional beat detector. The enhancement is based on the Teager–Kaiser energy operator (TKEO), which pre-processes the input audio signal before the spectral flux calculation. The proposed approach is first evaluated in terms of its ability to estimate the GT and beat positions of given audio tracks, compared to the same conventional system without the proposed enhancement. The accuracy of the GT and average beat differences (ABD) estimation is tested on a manually labelled reference database. Finally, this system is used for analysis of a string quartet music database. Results suggest that the presence of the TKEO lowers onset detection accuracy but improves the GT and ABD estimation. The average deviation from the reference GT in the reference database is 9.99 BPM (11.28%), which improves on the conventional methodology, where the average deviation is 18.19 BPM (17.74%). This study has a pilot character and provides some suggestions for improving the beat tracking system for music analysis.
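
For readers unfamiliar with the operator, the standard discrete Teager–Kaiser energy of a signal x is ψ[n] = x[n]² − x[n−1]·x[n+1]. The sketch below applies that textbook definition to a plain list of samples. The function name and the test tone are illustrative assumptions, and how the study integrates the operator with its spectral flux stage is not shown here.

```python
import math

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples have
    no two-sided neighbourhood (an empty list for fewer than 3 samples).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n + phi) the operator is constant: A^2 * sin(w)^2.
amplitude, omega = 2.0, 0.3  # hypothetical test-tone parameters
tone = [amplitude * math.cos(omega * n + 0.1) for n in range(50)]
psi = teager_kaiser_energy(tone)
print(psi[0], amplitude ** 2 * math.sin(omega) ** 2)
```

Because the output depends on both amplitude and frequency, the operator emphasizes abrupt changes in either, which is one plausible reason to use it as a pre-processing step.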

Graph-based feature extraction: A new proposal to study the classification of music signals outside the time-frequency domain

PLoS ONE ◽

10.1371/journal.pone.0240915 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0240915

Author(s):

Dirceu de Freitas Piedade Melo ◽

Inacio de Sousa Fadigas ◽

Hernane Borges de Barros Pereira

Keyword(s):

Feature Extraction ◽

Fourier Transforms ◽

Audio Signal ◽

Rhythmic Activity ◽

Attribute Selection ◽

Time Frequency ◽

Music Signal ◽

Network Properties ◽

Music Information ◽

Music Signals

Most feature extraction algorithms for music audio signals use Fourier transforms to obtain coefficients that describe specific aspects of music information within the sound spectrum, such as the timbral texture, tonal texture and rhythmic activity. In this paper, we introduce a new method for extracting features related to the rhythmic activity of music signals using the topological properties of a graph constructed from an audio signal. We map the local standard deviation of a music signal to a visibility graph and calculate the modularity (Q), the number of communities (Nc), the average degree (〈k〉), and the density (Δ) of this graph. By applying this procedure to each signal in a database of various musical genres, we detected the existence of a hierarchy of rhythmic self-similarities between musical styles given by these four network properties. Using Q, Nc, 〈k〉 and Δ as input attributes in a classification experiment based on supervised artificial neural networks, we obtained an accuracy higher than or equal to the beat histogram in 70% of the musical genre pairs, using only four features from the networks. Finally, when performing the attribute selection test with Q, Nc, 〈k〉 and Δ, along with the main signal processing field descriptors, we found that the four network properties were among the top-ranking positions given by this test.
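
The pipeline described above (local standard deviation, then a visibility graph, then network properties) can be sketched as follows. Several choices are assumptions rather than details from the abstract: non-overlapping windows for the local standard deviation, the natural visibility criterion for linking samples, and all function names. Only the average degree and the density are computed; the modularity and the number of communities need a community-detection step that is omitted here.

```python
import statistics

def local_std(signal, window):
    """Population standard deviation over consecutive non-overlapping windows."""
    return [statistics.pstdev(signal[s:s + window])
            for s in range(0, len(signal) - window + 1, window)]

def visibility_edges(series):
    """Natural visibility graph: samples i and j are linked when every sample
    between them lies strictly below the straight line joining them."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            yi, yj = series[i], series[j]
            if all(series[k] < yi + (yj - yi) * (k - i) / (j - i)
                   for k in range(i + 1, j)):
                edges.append((i, j))
    return edges

def average_degree(n_nodes, edges):
    """Mean number of links per node, 2E / N."""
    return 2.0 * len(edges) / n_nodes

def density(n_nodes, edges):
    """Fraction of possible links that exist, 2E / (N (N - 1))."""
    return 2.0 * len(edges) / (n_nodes * (n_nodes - 1))

series = [3.0, 1.0, 2.0]          # toy series standing in for local std values
edges = visibility_edges(series)  # [(0, 1), (0, 2), (1, 2)]
print(average_degree(len(series), edges), density(len(series), edges))
```

The double loop with an inner scan is cubic in the worst case, which is acceptable for a short feature series but would need a faster construction for long signals.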

Transient spectral events in resting state MEG predict individual time-frequency task responses

10.1101/419374 ◽

2018 ◽

Cited By ~ 1

Author(s):

R Becker ◽

D Vidaurre ◽

AJ Quinn ◽

R Abeysuriya ◽

O Parker Jones ◽

...

Keyword(s):

Individual Differences ◽

Resting State ◽

Spectral Structure ◽

Fast Time ◽

Markov Modelling ◽

Experimental Conditions ◽

Time Frequency ◽

Wide Range ◽

Subject Specific ◽

Task Conditions

Even in response to apparently simple tasks such as hand moving, human brain activity shows remarkable inter-subject variability. Presumably, this variability reflects genuine behavioural or functional variability. Recently, spatial variability of resting-state features in fMRI - specifically connectivity - has been shown to explain (spatial) task-response variability. Such a link, however, is still missing for M/EEG data and its spectrally rich structure. At the same time, it has recently been shown that task responses in M/EEG can be well represented using transient spectral events bursting at fast time scales. Here, we show that individual differences in the spatio-spectral structure of M/EEG task responses can, to a reasonable degree, be predicted from individual differences in transient spectral events identified at rest. In a MEG dataset of diverse task conditions (including motor responses, working memory and language comprehension tasks) and resting-state sessions for each subject (n = 89), we used Hidden-Markov-Modelling to identify transient spectral events as a feature set to learn the mapping of space-time-frequency content from rest to task. Resulting trial-averaged, subject-specific task-response predictions were then compared with the actual task responses in left-out subjects. All task conditions were predicted significantly above chance. Furthermore, we observed a systematic relationship between genetic similarity (e.g. unrelated subjects vs. twins) and predictability. These findings support the idea that subject-specific transient spectral events in resting-state neural activity are linked to, and predictive of, subject-specific trial-averaged task responses in a wide range of experimental conditions.

Comparative Effects of High-Tech Visual Scene Displays and Low-Tech Isolated Picture Symbols on Engagement From Students With Multiple Disabilities

Language Speech and Hearing Services in Schools ◽

10.1044/2019_lshss-19-0007 ◽

2019 ◽

Vol 50 (4) ◽

pp. 693-702 ◽

Cited By ~ 1

Author(s):

Christine Holyfield ◽

Sydney Brooks ◽

Allison Schluterman

Keyword(s):

Language Learning ◽

Augmentative And Alternative Communication ◽

Visual Analysis ◽

Multiple Disabilities ◽

Visual Scene ◽

Future Research ◽

High Tech ◽

Single Subject ◽

Wide Range ◽

The Difference

Purpose: Augmentative and alternative communication (AAC) is an intervention approach that can promote communication and language in children with multiple disabilities who are beginning communicators. While a wide range of AAC technologies are available, little is known about the comparative effects of specific technology options. Given that engagement can be low for beginning communicators with multiple disabilities, the current study provides initial information about the comparative effects on engagement of 2 AAC technology options: high-tech visual scene displays (VSDs) and low-tech isolated picture symbols. Method: Three elementary-age beginning communicators with multiple disabilities participated. The study used a single-subject, alternating treatment design with each technology serving as a condition. Participants interacted with their school speech-language pathologists using each of the 2 technologies across 5 sessions in a block randomized order. Results: According to visual analysis and nonoverlap of all pairs calculations, all 3 participants demonstrated more engagement with the high-tech VSDs than the low-tech isolated picture symbols as measured by their seconds of gaze toward each technology option. Despite the difference in engagement observed, there was no clear difference across the 2 conditions in engagement toward the communication partner or use of the AAC. Conclusions: Clinicians can consider measuring engagement when evaluating AAC technology options for children with multiple disabilities and should consider evaluating high-tech VSDs as 1 technology option for them. Future research must explore the extent to which differences in engagement to particular AAC technologies result in differences in communication and language learning over time as might be expected.

Combinatorial Polynomially Computable Characteristics of Substitutions and Their Properties

Computational nanotechnology ◽

10.33693/2313-223x-2020-7-2-34-41 ◽

2020 ◽

Vol 7 (2) ◽

pp. 34-41

Author(s):

Vladimir Nikonov ◽

Anton Zobov

Keyword(s):

Mathematical Expectation ◽

Point Of View ◽

Small Range ◽

Computational Point ◽

Wide Range ◽

Bijective Function ◽

The Difference ◽

Element Base ◽

Selection Of

The construction and selection of a suitable bijective function, that is, a substitution, is now becoming an important applied task, particularly for building block encryption systems. Many articles have suggested different approaches to determining the quality of a substitution, but most of them are highly computationally complex. Solving this problem will significantly expand the range of methods for constructing and analyzing schemes in information protection systems. The purpose of the research is to find easily measurable characteristics of substitutions that allow their quality to be evaluated, as well as measures of the proximity of a particular substitution to a random one, or its distance from it. For this purpose, several characteristics were proposed in this work: difference and polynomial, and their mathematical expectation was found, as well as the variance for the difference characteristic. This allows us to draw a conclusion about a substitution's quality by comparing the result of calculating the characteristic for that substitution with the calculated mathematical expectation. From a computational point of view, the theses of the article are of exceptional interest due to the simplicity of the algorithm for quantifying the quality of bijective function substitutions. By its nature, the operation of calculating the difference characteristic carries out a simple summation of integer terms in a fixed and small range. Such an operation, both in the modern and in the prospective element base, is embedded in the logic of a wide range of functional elements, especially when implementing computational actions in the optical range, or on other carriers related to the field of nanotechnology.

Generalized Heterodyne Configurations for Photo-induced Force Microscopy

10.26434/chemrxiv.9633407 ◽

2019 ◽

Author(s):

Le Wang ◽

Devon Jakob ◽

Haomin Wang ◽

Alexis Apostolos ◽

Marcos M. Pires ◽

...

Keyword(s):

Spatial Resolution ◽

Repetition Rate ◽

Modulation Frequency ◽

High Harmonic ◽

Chemical Imaging ◽

Heterodyne Detection ◽

Force Microscopy ◽

Wide Range ◽

The Difference ◽

Afm Cantilever

Infrared chemical microscopy through mechanical probing of light-matter interactions by atomic force microscopy (AFM) bypasses the diffraction limit. One increasingly popular technique is photo-induced force microscopy (PiFM), which utilizes the mechanical heterodyne signal detection between cantilever mechanical resonant oscillations and the photo-induced force from light-matter interaction. So far, photo-induced force microscopy has been operated in only one heterodyne configuration. In this article, we generalize heterodyne configurations of photo-induced force microscopy by introducing two new schemes: harmonic heterodyne detection and sequential heterodyne detection. In harmonic heterodyne detection, the laser repetition rate matches integer fractions of the difference between the two mechanical resonant modes of the AFM cantilever. The high harmonic of the beating from the photothermal expansion mixes with the AFM cantilever oscillation to provide the PiFM signal. In sequential heterodyne detection, the combination of the repetition rate of laser pulses and polarization modulation frequency matches the difference between two AFM mechanical modes, leading to detectable PiFM signals. These two generalized heterodyne configurations for photo-induced force microscopy deliver new avenues for chemical imaging and broadband spectroscopy at ~10 nm spatial resolution. They are suitable for a wide range of heterogeneous materials across various disciplines: from structured polymer film and polaritonic boron nitride materials to isolated bacterial peptidoglycan cell walls. The generalized heterodyne configurations introduce flexibility for the implementation of PiFM and related tapping mode AFM-IR, and provide possibilities for an additional modulation channel in PiFM for targeted signal extraction with nanoscale spatial resolution.

A Novel Ultrasonic TOF Ranging System Using AlN Based PMUTs

Micromachines ◽

10.3390/mi12030284 ◽

2021 ◽

Vol 12 (3) ◽

pp. 284

Author(s):

Yihsiang Chiu ◽

Chen Wang ◽

Dan Gong ◽

Nan Li ◽

Shenglin Ma ◽

...

Keyword(s):

Clock Cycle ◽

High Accuracy ◽

Ultrasonic Waves ◽

Oxide Semiconductor ◽

Average Error ◽

Cmos Process ◽

Clock Frequency ◽

Time Frequency ◽

Range Finding ◽

Wide Range

This paper presents a high-accuracy complementary metal oxide semiconductor (CMOS) driven ultrasonic ranging system based on air-coupled aluminum nitride (AlN) based piezoelectric micromachined ultrasonic transducers (PMUTs) using time of flight (TOF). The mode shape and the time-frequency characteristics of the PMUTs are simulated and analyzed. Two PMUTs with frequencies of 97 kHz and 96 kHz are used: one transmits and the other receives ultrasonic waves. The Time to Digital Converter (TDC) circuit, correlating the clock frequency with sound velocity, is utilized for range finding via TOF calculated from the system clock cycle. An application specific integrated circuit (ASIC) chip is designed and fabricated on a 0.18 μm CMOS process to acquire data from the PMUT. Compared to the state of the art, the developed ranging system features a wide range and high accuracy, measuring a range of 50 cm with an average error of 0.63 mm. The AlN-based PMUT is a promising candidate for an integrated portable ranging system.
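
The relation between counted clock cycles and range can be sketched with basic TOF arithmetic: the time of flight is the cycle count divided by the clock frequency, and the path length is that time multiplied by the speed of sound. Everything specific below is an assumption for illustration, not a detail of the described system: the function name, the 1 MHz clock, the 343 m/s sound speed (air at roughly 20 °C), and the `round_trip` switch for echo-based setups.

```python
def tof_distance_m(clock_cycles, clock_hz, sound_speed_m_s=343.0, round_trip=False):
    """Convert a counted number of clock cycles into a distance in metres.

    All default values are illustrative assumptions, not figures from the paper.
    round_trip=True halves the path for setups where the wave travels out and back.
    """
    tof_s = clock_cycles / clock_hz     # time of flight in seconds
    path_m = sound_speed_m_s * tof_s    # total acoustic path length
    return path_m / 2.0 if round_trip else path_m

# With a hypothetical 1 MHz clock, one cycle corresponds to about 0.343 mm of path.
print(tof_distance_m(1, 1_000_000) * 1000.0)  # range per clock cycle, in mm
print(tof_distance_m(1458, 1_000_000))        # roughly 0.5 m of one-way path
```

The per-cycle figure shows why the clock frequency bounds the achievable resolution: a faster clock shortens the distance represented by one count.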

Natural Science and Supernatural Thought Experiments

Religions ◽

10.3390/rel10060389 ◽

2019 ◽

Vol 10 (6) ◽

pp. 389

Author(s):

James Robert Brown

Keyword(s):

Natural Science ◽

Thought Experiment ◽

Thought Experiments ◽

The Other ◽

Space And Time ◽

Wide Range ◽

The Difference ◽

Theological Thought

Religious notions have long played a role in epistemology. Theological thought experiments, in particular, have been effective in a wide range of situations in the sciences. Some of these are merely picturesque, others have been heuristically important, and still others, as I will argue, have played a role that could be called essential. I will illustrate the difference between heuristic and essential with two examples. One of these stems from the Newton–Leibniz debate over the nature of space and time; the other is a thought experiment of my own constructed with the aim of making a case for a more liberal view of evidence in mathematics.
