MULTI-VERSION MUSIC SEARCH USING ACOUSTIC FEATURE UNION AND EXACT SOFT MAPPING

2009 ◽  
Vol 03 (02) ◽  
pp. 209-234 ◽  
Author(s):  
YI YU ◽  
KAZUKI JOE ◽  
VINCENT ORIA ◽  
FABIAN MOERCHEN ◽  
J. STEPHEN DOWNIE ◽  
...  

Research on audio-based music retrieval has primarily concentrated on refining audio features to improve search quality, while much less work has addressed the time efficiency of music audio searches. Representing music audio documents in an indexable format provides a mechanism for achieving efficiency. To address this issue, this work proposes Exact Locality Sensitive Mapping (ELSM) to join concatenated feature sets and soft hash values. On this basis we propose two audio-based music indexing techniques, ELSM and Soft Locality Sensitive Hash (SoftLSH), using an optimized Feature Union (FU) set of extracted audio features. Two contributions are made here. First, the principle of similarity-invariance is applied in summarizing audio feature sequences and utilized in training semantic audio representations based on regression. Second, soft hash values are pre-calculated to locate the search range more accurately and to improve the collision probability among similar features. Our algorithms are implemented in a demonstration system that shows how to retrieve and evaluate multi-version audio documents. Experimental evaluation over a real "multi-version" audio dataset confirms the practicality of ELSM and SoftLSH with FU and shows that our algorithms are effective for both multi-version detection (online query, one query vs. multiple objects) and same-content detection (batch queries, multiple queries vs. one object).
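The soft-hashing idea above can be pictured with sign-based locality-sensitive hashing: a projection that falls close to a separating hyperplane is treated as an ambiguous bit, so both bucket values are probed, widening the search range for near-duplicate feature vectors. This is an illustrative sketch of the general technique, not the paper's exact ELSM/SoftLSH algorithms; all names, dimensions, and the margin parameter below are hypothetical.

```python
import random

random.seed(0)
DIM, NUM_BITS = 8, 4
# Random Gaussian hyperplanes for sign-based LSH.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_BITS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hard_hash(vec):
    # Classic LSH bucket: one bit per hyperplane; similar vectors tend to collide.
    return tuple(1 if dot(p, vec) >= 0 else 0 for p in planes)

def soft_hash(vec, margin=0.1):
    # "Soft" bucket set: a projection whose magnitude falls inside the margin is
    # ambiguous, so both bit values are kept, which widens the search range and
    # raises the collision probability among similar feature vectors.
    buckets = [()]
    for p in planes:
        d = dot(p, vec)
        bits = [1 if d >= 0 else 0]
        if abs(d) < margin:
            bits.append(1 - bits[0])
        buckets = [b + (bit,) for b in buckets for bit in bits]
    return buckets

query = [0.5, -0.2, 0.1, 0.9, -0.4, 0.3, 0.0, 0.2]  # a toy 8-D audio feature vector
print(hard_hash(query) in soft_hash(query))  # → True: the hard bucket is always probed
```

Probing the extra buckets trades a slightly larger candidate set for a better chance that two versions of the same song land in a shared bucket.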

2006 ◽  
Vol 8 (6) ◽  
pp. 1179-1189 ◽  
Author(s):  
Jialie Shen ◽  
John Shepherd ◽  
Anne H. H. Ngu

Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2286
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Identifying speech emotions in spontaneous databases has recently become a complex and demanding study area. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions using multiple-feature fusion and deep neural networks (DNNs). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) is used to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed speaker-independent identification accuracies of 76% and 59% with SVM on the two databases, respectively. Furthermore, experiments on eNTERFACE05 and BAUM-1s indicate that the suggested framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
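One way to picture the discriminative-feature step described above: fuse two per-utterance acoustic feature vectors by concatenation, then rank the fused dimensions by a Fisher-style separability score. This is a deliberately simplified stand-in for the SVM-based selection the abstract describes; the feature values and emotion labels below are made up for illustration.

```python
def fuse(a, b):
    # Feature fusion by concatenation of two per-utterance feature vectors.
    return a + b

def fisher_scores(pos, neg):
    # Per-dimension (mean gap)^2 / pooled variance; higher = more discriminative.
    scores = []
    for d in range(len(pos[0])):
        p = [x[d] for x in pos]
        n = [x[d] for x in neg]
        mp, mn = sum(p) / len(p), sum(n) / len(n)
        vp = sum((x - mp) ** 2 for x in p) / len(p)
        vn = sum((x - mn) ** 2 for x in n) / len(n)
        scores.append((mp - mn) ** 2 / (vp + vn + 1e-9))
    return scores

# Hypothetical fused features for two toy emotion classes.
happy = [fuse([1.0, 0.1], [0.9, 0.2]), fuse([1.1, 0.2], [1.0, 0.1])]
sad = [fuse([0.1, 0.1], [0.1, 0.2]), fuse([0.2, 0.2], [0.2, 0.1])]
scores = fisher_scores(happy, sad)
top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
print(top)  # → [0, 2]: the two most discriminative fused dimensions
```

Dropping low-scoring dimensions before classification is the same pruning motive the abstract gives for its SVM step: duplicate and irrelevant dimensions only dilute the emotion signal.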


2007 ◽  
Vol 49 (6) ◽  
pp. 514-525 ◽  
Author(s):  
András Zolnay ◽  
Daniil Kocharov ◽  
Ralf Schlüter ◽  
Hermann Ney

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zhencong Li ◽  
Qin Yao ◽  
Wanzhi Ma

This paper first introduces basic musical knowledge, presents the detailed design of a music retrieval system based on that knowledge, and analyzes the feature extraction and matching algorithms it uses. Feature extraction from audio data is the core of this work: main-melody, MFCC, GFCC, and rhythm features are extracted, and a feature fusion algorithm is proposed that combines the GFCC and rhythm features into a new feature vector under principal component analysis (PCA) dimensionality reduction, exploiting the property that PCA can effectively reduce noise and improve retrieval efficiency. Matching retrieval of audio features is the other central task: the DTW algorithm is chosen as the main retrieval algorithm, and music classification retrieval is achieved with the K-nearest-neighbor algorithm. After studying and improving these algorithms, they are integrated into a system covering audio preprocessing, feature extraction, feature postprocessing, and matching retrieval. The system is tested on a library of 100 MP3 recordings of different kinds, with 4 pieces randomly selected per trial, under varying system parameters, recording durations, and environmental noise. This work improves music retrieval efficiency and provides theoretical support for the design of an integrated music retrieval software system.
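The DTW matching step can be sketched as follows: a dynamic program aligns the query's feature sequence to each library sequence, tolerating tempo differences, and the song with the smallest alignment cost wins. The melodies and names below are hypothetical toy data, not the paper's features.

```python
def dtw(a, b):
    # Classic dynamic-time-warping distance between two 1-D feature sequences.
    # cost[i][j] = distance of the best alignment of a[:i] with b[:j].
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the query
                                 cost[i][j - 1],      # stretch the library song
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical melody contours standing in for extracted audio features.
library = {"song_a": [1, 2, 3, 4, 3], "song_b": [5, 5, 1, 0, 2]}
query = [1, 2, 2, 3, 4, 3]  # the same tune as song_a, sung slightly slower
best = min(library, key=lambda name: dtw(query, library[name]))
print(best)  # → song_a
```

Because DTW may repeat a frame of either sequence at no extra cost beyond the local distance, the slower query still aligns perfectly with `song_a`.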


2021 ◽  
pp. 216770262110178
Author(s):  
Alex S. Cohen ◽  
Christopher R. Cox ◽  
Tovah Cowan ◽  
Michael D. Masucci ◽  
Thanh P. Le ◽  
...  

Negative schizotypal traits potentially can be digitally phenotyped using objective vocal analysis. Prior attempts have shown mixed success in this regard, potentially because acoustic analysis has relied on small, constrained feature sets. We employed machine learning to (a) optimize and cross-validate predictive models of self-reported negative schizotypy using a large acoustic feature set, (b) evaluate model performance as a function of sex and speaking task, (c) understand potential mechanisms underlying negative schizotypal traits by evaluating the key acoustic features within these models, and (d) examine model performance in its convergence with clinical symptoms and cognitive functioning. Accuracy was good (> 80%) and was improved by considering speaking task and sex. However, the features identified as most predictive of negative schizotypal traits were generally not considered critical to their conceptual definitions. Implications for validating and implementing digital phenotyping to understand and quantify negative schizotypy are discussed.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Alex S. Cohen ◽  
Christopher R. Cox ◽  
Thanh P. Le ◽  
Tovah Cowan ◽  
Michael D. Masucci ◽  
...  

Abstract Negative symptoms are a transdiagnostic feature of serious mental illness (SMI) that can be potentially “digitally phenotyped” using objective vocal analysis. In prior studies, vocal measures show low convergence with clinical ratings, potentially because analysis has used small, constrained acoustic feature sets. We sought to evaluate (1) whether clinically rated blunted vocal affect (BvA)/alogia could be accurately modelled using machine learning (ML) with a large feature set from two separate tasks (i.e., a 20-s “picture” and a 60-s “free-recall” task), (2) whether “Predicted” BvA/alogia (computed from the ML model) are associated with demographics, diagnosis, psychiatric symptoms, and cognitive/social functioning, and (3) which key vocal features are central to BvA/Alogia ratings. Accuracy was high (>90%) and was improved when computed separately by speaking task. ML scores were associated with poor cognitive performance and social functioning and were higher in patients with schizophrenia versus depression or mania diagnoses. However, the features identified as most predictive of BvA/Alogia were generally not considered critical to their operational definitions. Implications for validating and implementing digital phenotyping to reduce SMI burden are discussed.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8407
Author(s):  
Allan G. de Oliveira ◽  
Thiago M. Ventura ◽  
Todor D. Ganchev ◽  
Lucas N.S. Silva ◽  
Marinêz I. Marques ◽  
...  

Automated acoustic recognition of birds is considered an important technology in support of biodiversity monitoring and biodiversity conservation activities. These activities require processing large amounts of soundscape recordings. Typically, recordings are transformed to a number of acoustic features, and a machine learning method is used to build models and recognize the sound events of interest. The main problem is the scalability of data processing, either for developing models or for processing recordings made over long time periods. In those cases, the processing time and resources required might become prohibitive for the average user. To address this problem, we evaluated the applicability of three data reduction methods. These methods were applied to a series of acoustic feature vectors as an additional postprocessing step, which aims to reduce the computational demand during training. The experimental results obtained using Mel-frequency cepstral coefficients (MFCCs) and hidden Markov models (HMMs) support the finding that a reduction in training data by a factor of 10 does not significantly affect the recognition performance.
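A minimal sketch of the kind of post-processing data reduction evaluated above, assuming a simple chunk-averaging scheme (the study compares three reduction methods; this is only one illustrative possibility, and the frame counts are invented): consecutive acoustic feature vectors are averaged in groups of ten before model training, shrinking the training data roughly tenfold.

```python
def reduce_by_averaging(frames, factor=10):
    # Replace each run of `factor` consecutive feature vectors with their
    # element-wise mean, reducing the training data by about that factor.
    out = []
    for start in range(0, len(frames), factor):
        chunk = frames[start:start + factor]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out

# 100 hypothetical 13-dimensional MFCC frames.
mfcc = [[float(i + d) for d in range(13)] for i in range(100)]
reduced = reduce_by_averaging(mfcc)
print(len(mfcc), len(reduced))  # → 100 10
```

The finding that a tenfold reduction barely affects recognition suggests that, for HMM training, much of the frame-level detail in long soundscape recordings is redundant.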

