An Effective Framework for Speech and Music Segregation

2020 ◽  
Vol 17 (4) ◽  
pp. 507-514
Author(s):  
Sidra Sajid ◽  
Ali Javed ◽  
Aun Irtaza

Speech and music segregation from a single channel is a challenging task due to background interference and the intermingled signals of the voice and music channels. It is of great importance because of its utility in a wide range of applications such as music information retrieval, singer identification, and lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We apply an ideal binary mask to enhance speech and music intelligibility. We evaluated the proposed method on mixtures set at -5 dB, 0 dB, and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods in terms of Global Normalized Signal-to-Distortion Ratio (GNSDR) values.
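The two kinds of mask named in the abstract can be illustrated with a minimal sketch. It assumes magnitude spectrograms of the target and the interferer are already available as nested lists (rows are frequency bins, columns are frames); the function names are hypothetical, and the ratio form of the soft mask is a common textbook choice, not necessarily the one derived from the repeating structure in the paper.

```python
def soft_mask(target_mag, interferer_mag, eps=1e-12):
    """Soft (ratio) mask in [0, 1]: share of each time-frequency bin
    attributed to the target. Assumed form, not taken from the paper."""
    return [[t / (t + i + eps) for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target_mag, interferer_mag)]


def ideal_binary_mask(target_mag, interferer_mag):
    """Binary mask: 1 where the target is strictly stronger than the
    interferer in a bin, 0 elsewhere."""
    return [[1.0 if t > i else 0.0 for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target_mag, interferer_mag)]


def apply_mask(mixture_mag, mask):
    """Element-wise product of a mixture spectrogram and a mask."""
    return [[x * w for x, w in zip(x_row, w_row)]
            for x_row, w_row in zip(mixture_mag, mask)]


# Toy 2x2 spectrograms (illustrative values only).
speech = [[3.0, 1.0], [0.0, 2.0]]
music = [[1.0, 1.0], [4.0, 2.0]]
mixture = [[s + m for s, m in zip(s_row, m_row)]
           for s_row, m_row in zip(speech, music)]

binary = ideal_binary_mask(speech, music)
soft = soft_mask(speech, music)
speech_estimate = apply_mask(mixture, binary)
```

The binary mask keeps or discards whole bins, while the soft mask splits each bin's energy proportionally; in practice the masked magnitudes would be combined with the mixture phase and inverted back to a waveform.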

2022 ◽  
Vol 54 (7) ◽  
pp. 1-38
Author(s):  
Lynda Tamine ◽  
Lorraine Goeuriot

The explosive growth and widespread accessibility of medical information on the Internet have led to a surge of research activity in a wide range of scientific communities including health informatics and information retrieval (IR). One of the common concerns of this research, across these disciplines, is how to design either clinical decision support systems or medical search engines capable of providing adequate support for both novices (e.g., patients and their next-of-kin) and experts (e.g., physicians, clinicians) tackling complex tasks (e.g., search for diagnosis, search for a treatment). However, despite the significant multi-disciplinary research advances, current medical search systems exhibit low levels of performance. This survey provides an overview of the state of the art in the disciplines of IR and health informatics and, by bridging these disciplines, shows how semantic search techniques can facilitate medical IR. First, we give a broad picture of semantic search and medical IR and then highlight the major scientific challenges. Second, focusing on the semantic gap challenge, we discuss representative state-of-the-art work related to feature-based as well as semantic-based representation and matching models that support medical search systems. In addition to seminal works, we present recent works that rely on research advancements in deep learning. Third, we make a thorough cross-model analysis and provide some findings and lessons learned. Finally, we discuss some open issues and promising directions for future research.


2018 ◽  
Vol 8 (8) ◽  
pp. 1383 ◽  
Author(s):  
Mingyu Li ◽  
Ning Chen

Similarity measurement plays an important role in various information retrieval tasks. In this paper, a music information retrieval scheme based on two-level similarity fusion and post-processing is proposed. At the similarity fusion level, to take full advantage of the common and complementary properties among different descriptors and different similarity functions, first, the track-by-track similarity graphs generated from the same descriptor but different similarity functions are fused with the similarity network fusion (SNF) technique. Then, the obtained first-level fused similarities based on different descriptors are further fused with the mixture Markov model (MMM) technique. At the post-processing level, diffusion is first performed on the two-level fused similarity graph to utilize the underlying track manifold contained within it. Then, a mutual proximity (MP) algorithm is adopted to refine the diffused similarity scores, which helps to reduce the bad influence caused by the “hubness” phenomenon contained in the scores. The performance of the proposed scheme is tested in the cover song identification (CSI) task on three cover song datasets (Covers80, Covers40, and Second Hand Songs (SHS)). The experimental results demonstrate that the proposed scheme outperforms state-of-the-art CSI schemes based on single similarity or similarity fusion.
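To make the mutual proximity (MP) refinement step more concrete, here is a minimal sketch of the empirical MP variant applied to a symmetric distance matrix. It is an illustration under assumptions: the abstract works with similarity scores and does not say which MP variant is used, the function name is hypothetical, and normalising by the number of other items (n - 2) is a choice made here.

```python
def mutual_proximity(dist):
    """Empirical mutual proximity for a symmetric distance matrix.

    mp[x][y] is the fraction of other items j that lie farther from x
    than y does AND farther from y than x does. Values near 1 mean x and
    y are close neighbours from both points of view, which dampens
    "hub" items that appear close to everything.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three items")
    mp = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                mp[x][y] = 1.0  # self-proximity set to 1 by convention here
                continue
            count = sum(
                1
                for j in range(n)
                if j != x and j != y
                and dist[x][j] > dist[x][y]
                and dist[y][j] > dist[y][x]
            )
            mp[x][y] = count / (n - 2)
    return mp


# Four hypothetical tracks placed on a line at positions 0, 1, 5, 20.
positions = [0.0, 1.0, 5.0, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
mp = mutual_proximity(dist)
```

In this toy example the pair at positions 0 and 1 receives the highest score because every other track is farther from both of them, whereas the outlier at 20 receives a score of 0 with every other track.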


2016 ◽  
Vol 40 (2) ◽  
pp. 70-83 ◽  
Author(s):  
Valerio Velardo ◽  
Mauro Vallati ◽  
Steven Jan

Fostered by the introduction of the Music Information Retrieval Evaluation Exchange (MIREX) competition, the number of systems that calculate symbolic melodic similarity has recently increased considerably. To understand the state of the art, we provide a comparative analysis of existing algorithms. The analysis is based on eight criteria that help to characterize the systems, highlighting strengths and weaknesses. We also propose a taxonomy that classifies algorithms based on their approach. Both taxonomy and criteria are fruitfully exploited to provide input for new, forthcoming research in the area.


2021 ◽  
Author(s):  
Reza Dokht Dolatabadi Esfahani ◽  
Frank Scherbaum ◽  
Fabrice Cotton ◽  
Matthias Ohrnberger

In the last decade, the increasing number and spatial density of seismological stations have provided unprecedented opportunities for recording various natural and human-related events in continuous records. Diverse methods have been proposed for event detection, classification, and characterization, but few of them are based on the physical properties of the events. In this study, inspired by music information retrieval methods such as audio fingerprinting, we present a time-efficient event detection method based on capturing the physical properties of seismic signatures such as corner frequency, high-frequency fall-off, and complexity of the signature. The zero-crossing rate of the recorded signal is used to estimate the corner frequency, which is the dominant frequency in the velocity domain of the record. The high-frequency fall-off can be estimated in the time-frequency spectrogram by finding the frequency below which 75% of the energy of the spectrum is produced. The complexity of the spectrum of the recorded signal is finally represented by a second-order polynomial coefficient fitting the spectrum and capturing the slope of the source spectra. We also use the spectral flatness to quantify the noise properties. We validate the proposed procedure on synthetic data generated by the stochastic simulation method. Finally, we apply the method to real data sets to detect the seismic precursors of the Nuugaatsiaq landslide. We separate the earthquake event and precursory signals on the basis of their different corner frequencies and show that the precursory signals started hours before the main landslide.
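Three of the features mentioned above (zero-crossing rate, the 75% energy roll-off frequency, and spectral flatness) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the mapping from zero crossings to a dominant frequency (crossings divided by twice the duration) is a standard approximation assumed here, not taken from the study, and the power spectrum is assumed to be precomputed.

```python
import math


def zero_crossing_frequency(signal, sample_rate):
    """Dominant-frequency estimate from sign changes: a sinusoid at f Hz
    crosses zero about 2*f times per second (assumed approximation)."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    duration = (len(signal) - 1) / sample_rate
    return crossings / (2.0 * duration)


def spectral_rolloff(power, freqs, fraction=0.75):
    """Lowest frequency at which the cumulative spectral energy reaches
    the given fraction of the total energy."""
    threshold = fraction * sum(power)
    cumulative = 0.0
    for p, f in zip(power, freqs):
        cumulative += p
        if cumulative >= threshold:
            return f
    return freqs[-1]


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum:
    close to 1 for noise-like spectra, close to 0 for peaky spectra."""
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    return math.exp(log_mean) / (sum(power) / n + eps)


# Synthetic 5 Hz sinusoid, 1000 samples at 1 kHz (phase offset avoids exact zeros).
rate = 1000
wave = [math.sin(2 * math.pi * 5 * n / rate + 0.1) for n in range(rate)]
f_est = zero_crossing_frequency(wave, rate)

rolloff = spectral_rolloff([1.0, 1.0, 1.0, 1.0], [0.0, 10.0, 20.0, 30.0])
flat_noise = spectral_flatness([2.0, 2.0, 2.0, 2.0])
flat_tone = spectral_flatness([100.0, 0.01, 0.01, 0.01])
```

The zero-crossing estimate needs no Fourier transform at all, which is consistent with the time-efficiency goal stated in the abstract; the other two features operate on a spectrum computed once per window.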


Author(s):  
Antonello D’Aguanno

State-of-the-art MIR issues are presented and discussed from both the symbolic and audio points of view. As for the symbolic aspects, different approaches are presented in order to provide an overview of the different available solutions for particular MIR tasks. This section ends with an overview of MX, the IEEE standard XML language specifically designed to support interchange between musical notation, performance, analysis, and retrieval applications. As for the audio level, we first focus on blind tasks like beat and tempo tracking, pitch tracking, and automatic recognition of musical instruments. Then we present algorithms that work on both compressed and uncompressed data. We analyze the relationships between MIR and feature extraction, presenting examples of possible applications. Finally, we focus on automatic music synchronization and introduce a new audio player that supports the MX logic layer and allows score and audio to be played coherently.


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4368 ◽  
Author(s):  
Phetcharat Parathai ◽  
Naruephorn Tengtrairat ◽  
Wai Lok Woo ◽  
Mohammed A. M. Abdullah ◽  
Gholamreza Rafiee ◽  
...  

This paper proposes a solution for event classification from a single noisy mixture that consists of two major steps: sound-event separation and sound-event classification. The traditional complex nonnegative matrix factorization (CMF) is extended by cooperation with the optimal adaptive L1 sparsity to decompose a noisy single-channel mixture. The proposed adaptive L1 sparsity CMF algorithm encodes the spectral pattern and estimates the phase of the original signals in the time-frequency representation. Their features enhance the temporal decomposition process efficiently. The support vector machine (SVM) based one versus one (OvsO) strategy was applied with a mean supervector to categorize the demixed sound into the matching sound-event class. The first step of the multi-class MSVM method is to segment the separated signal into blocks by sliding over the demixed signals and then to encode the three features of each block. Mel frequency cepstral coefficients, short-time energy, and short-time zero-crossing rate are learned with multiple sound-event classes by the SVM based OvsO method. The mean supervector is encoded from the obtained features. The proposed method has been evaluated in both separation and classification scenarios using real-world single recorded signals and compared with the state-of-the-art separation method. Experimental results confirmed that the proposed method outperformed the state-of-the-art methods.

