MULTI-VERSION MUSIC SEARCH USING ACOUSTIC FEATURE UNION AND EXACT SOFT MAPPING

2009 ◽  
Vol 03 (02) ◽  
pp. 209-234 ◽  
Author(s):  
YI YU ◽  
KAZUKI JOE ◽  
VINCENT ORIA ◽  
FABIAN MOERCHEN ◽  
J. STEPHEN DOWNIE ◽  
...  

Research on audio-based music retrieval has primarily concentrated on refining audio features to improve search quality, while much less work has addressed the time efficiency of music audio searches. Representing music audio documents in an indexable format provides a mechanism for achieving efficiency. To address this issue, this work proposes Exact Locality Sensitive Mapping (ELSM) to join concatenated feature sets and soft hash values. On this basis we propose two audio-based music indexing techniques, ELSM and Soft Locality Sensitive Hash (SoftLSH), using an optimized Feature Union (FU) set of extracted audio features. Two contributions are made here. First, the principle of similarity-invariance is applied in summarizing audio feature sequences and utilized in training semantic audio representations based on regression. Second, soft hash values are pre-calculated to locate the search range more accurately and to improve the collision probability among similar features. Our algorithms are implemented in a demonstration system that shows how to retrieve and evaluate multi-version audio documents. Experimental evaluation over a real "multi-version" audio dataset confirms the practicality of ELSM and SoftLSH with FU and shows that our algorithms are effective for both multi-version detection (online query, one query vs. multiple objects) and same-content detection (batch queries, multiple queries vs. one object).
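The soft-hashing idea above can be pictured with sign-based locality-sensitive hashing: a projection that falls close to a separating hyperplane is treated as an ambiguous bit, so both bucket values are probed, widening the search range for near-duplicate feature vectors. This is an illustrative sketch of the general technique, not the paper's exact ELSM/SoftLSH algorithms; all names, dimensions, and the margin parameter below are hypothetical.

```python
import random

random.seed(0)
DIM, NUM_BITS = 8, 4
# Random Gaussian hyperplanes for sign-based LSH.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_BITS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hard_hash(vec):
    # Classic LSH bucket: one bit per hyperplane; similar vectors tend to collide.
    return tuple(1 if dot(p, vec) >= 0 else 0 for p in planes)

def soft_hash(vec, margin=0.1):
    # "Soft" bucket set: a projection whose magnitude falls inside the margin is
    # ambiguous, so both bit values are kept, which widens the search range and
    # raises the collision probability among similar feature vectors.
    buckets = [()]
    for p in planes:
        d = dot(p, vec)
        bits = [1 if d >= 0 else 0]
        if abs(d) < margin:
            bits.append(1 - bits[0])
        buckets = [b + (bit,) for b in buckets for bit in bits]
    return buckets

query = [0.5, -0.2, 0.1, 0.9, -0.4, 0.3, 0.0, 0.2]  # a toy 8-D audio feature vector
print(hard_hash(query) in soft_hash(query))  # → True: the hard bucket is always probed
```

Probing the extra buckets trades a slightly larger candidate set for a better chance that two versions of the same song land in a shared bucket.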

2006 ◽  
Vol 8 (6) ◽  
pp. 1179-1189 ◽  
Author(s):  
Jialie Shen ◽  
John Shepherd ◽  
Anne H. H. Ngu

Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2286
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Identifying speech emotions in spontaneous databases has recently become a complex and demanding study area. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions using multiple-feature fusion and deep neural networks (DNNs). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) is used to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed speaker-independent identification accuracies of 76% and 59% with SVM on the two databases, respectively. Furthermore, experiments on eNTERFACE05 and BAUM-1s indicate that the suggested framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
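One way to picture the discriminative-feature step described above: fuse two per-utterance acoustic feature vectors by concatenation, then rank the fused dimensions by a Fisher-style separability score. This is a deliberately simplified stand-in for the SVM-based selection the abstract describes; the feature values and emotion labels below are made up for illustration.

```python
def fuse(a, b):
    # Feature fusion by concatenation of two per-utterance feature vectors.
    return a + b

def fisher_scores(pos, neg):
    # Per-dimension (mean gap)^2 / pooled variance; higher = more discriminative.
    scores = []
    for d in range(len(pos[0])):
        p = [x[d] for x in pos]
        n = [x[d] for x in neg]
        mp, mn = sum(p) / len(p), sum(n) / len(n)
        vp = sum((x - mp) ** 2 for x in p) / len(p)
        vn = sum((x - mn) ** 2 for x in n) / len(n)
        scores.append((mp - mn) ** 2 / (vp + vn + 1e-9))
    return scores

# Hypothetical fused features for two toy emotion classes.
happy = [fuse([1.0, 0.1], [0.9, 0.2]), fuse([1.1, 0.2], [1.0, 0.1])]
sad = [fuse([0.1, 0.1], [0.1, 0.2]), fuse([0.2, 0.2], [0.2, 0.1])]
scores = fisher_scores(happy, sad)
top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
print(top)  # → [0, 2]: the two most discriminative fused dimensions
```

Dropping low-scoring dimensions before classification is the same pruning motive the abstract gives for its SVM step: duplicate and irrelevant dimensions only dilute the emotion signal.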


2007 ◽  
Vol 49 (6) ◽  
pp. 514-525 ◽  
Author(s):  
András Zolnay ◽  
Daniil Kocharov ◽  
Ralf Schlüter ◽  
Hermann Ney

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zhencong Li ◽  
Qin Yao ◽  
Wanzhi Ma

This paper first introduces basic musical knowledge, presents the detailed design of a music retrieval system based on that knowledge, and analyzes the feature extraction and matching algorithms it uses. Feature extraction from audio data is the core of this work: main-melody, MFCC, GFCC, and rhythm features are extracted, and a feature fusion algorithm is proposed that combines the GFCC and rhythm features into a new feature vector under principal component analysis (PCA) dimensionality reduction, exploiting the property that PCA can effectively reduce noise and improve retrieval efficiency. Matching retrieval of audio features is the other central task: the DTW algorithm is chosen as the main retrieval algorithm, and music classification retrieval is achieved with the K-nearest-neighbor algorithm. After studying and improving these algorithms, they are integrated into a system covering audio preprocessing, feature extraction, feature postprocessing, and matching retrieval. The system is tested on a library of 100 MP3 recordings of different kinds, with 4 pieces randomly selected per trial, under varying system parameters, recording durations, and environmental noise. This work improves music retrieval efficiency and provides theoretical support for the design of an integrated music retrieval software system.
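The DTW matching step can be sketched as follows: a dynamic program aligns the query's feature sequence to each library sequence, tolerating tempo differences, and the song with the smallest alignment cost wins. The melodies and names below are hypothetical toy data, not the paper's features.

```python
def dtw(a, b):
    # Classic dynamic-time-warping distance between two 1-D feature sequences.
    # cost[i][j] = distance of the best alignment of a[:i] with b[:j].
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the query
                                 cost[i][j - 1],      # stretch the library song
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical melody contours standing in for extracted audio features.
library = {"song_a": [1, 2, 3, 4, 3], "song_b": [5, 5, 1, 0, 2]}
query = [1, 2, 2, 3, 4, 3]  # the same tune as song_a, sung slightly slower
best = min(library, key=lambda name: dtw(query, library[name]))
print(best)  # → song_a
```

Because DTW may repeat a frame of either sequence at no extra cost beyond the local distance, the slower query still aligns perfectly with `song_a`.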


2021 ◽  
pp. 216770262110178
Author(s):  
Alex S. Cohen ◽  
Christopher R. Cox ◽  
Tovah Cowan ◽  
Michael D. Masucci ◽  
Thanh P. Le ◽  
...  

Negative schizotypal traits potentially can be digitally phenotyped using objective vocal analysis. Prior attempts have shown mixed success in this regard, potentially because acoustic analysis has relied on small, constrained feature sets. We employed machine learning to (a) optimize and cross-validate predictive models of self-reported negative schizotypy using a large acoustic feature set, (b) evaluate model performance as a function of sex and speaking task, (c) understand potential mechanisms underlying negative schizotypal traits by evaluating the key acoustic features within these models, and (d) examine model performance in its convergence with clinical symptoms and cognitive functioning. Accuracy was good (> 80%) and was improved by considering speaking task and sex. However, the features identified as most predictive of negative schizotypal traits were generally not considered critical to their conceptual definitions. Implications for validating and implementing digital phenotyping to understand and quantify negative schizotypy are discussed.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Alex S. Cohen ◽  
Christopher R. Cox ◽  
Thanh P. Le ◽  
Tovah Cowan ◽  
Michael D. Masucci ◽  
...  

Abstract Negative symptoms are a transdiagnostic feature of serious mental illness (SMI) that can be potentially “digitally phenotyped” using objective vocal analysis. In prior studies, vocal measures show low convergence with clinical ratings, potentially because analysis has used small, constrained acoustic feature sets. We sought to evaluate (1) whether clinically rated blunted vocal affect (BvA)/alogia could be accurately modelled using machine learning (ML) with a large feature set from two separate tasks (i.e., a 20-s “picture” and a 60-s “free-recall” task), (2) whether “Predicted” BvA/alogia (computed from the ML model) are associated with demographics, diagnosis, psychiatric symptoms, and cognitive/social functioning, and (3) which key vocal features are central to BvA/Alogia ratings. Accuracy was high (>90%) and was improved when computed separately by speaking task. ML scores were associated with poor cognitive performance and social functioning and were higher in patients with schizophrenia versus depression or mania diagnoses. However, the features identified as most predictive of BvA/Alogia were generally not considered critical to their operational definitions. Implications for validating and implementing digital phenotyping to reduce SMI burden are discussed.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8407
Author(s):  
Allan G. de Oliveira ◽  
Thiago M. Ventura ◽  
Todor D. Ganchev ◽  
Lucas N.S. Silva ◽  
Marinêz I. Marques ◽  
...  

Automated acoustic recognition of birds is considered an important technology in support of biodiversity monitoring and biodiversity conservation activities. These activities require processing large amounts of soundscape recordings. Typically, recordings are transformed to a number of acoustic features, and a machine learning method is used to build models and recognize the sound events of interest. The main problem is the scalability of data processing, either for developing models or for processing recordings made over long time periods. In those cases, the processing time and resources required might become prohibitive for the average user. To address this problem, we evaluated the applicability of three data reduction methods. These methods were applied to a series of acoustic feature vectors as an additional postprocessing step, which aims to reduce the computational demand during training. The experimental results obtained using Mel-frequency cepstral coefficients (MFCCs) and hidden Markov models (HMMs) support the finding that a reduction in training data by a factor of 10 does not significantly affect the recognition performance.
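A minimal sketch of the kind of post-processing data reduction evaluated above, assuming a simple chunk-averaging scheme (the study compares three reduction methods; this is only one illustrative possibility, and the frame counts are invented): consecutive acoustic feature vectors are averaged in groups of ten before model training, shrinking the training data roughly tenfold.

```python
def reduce_by_averaging(frames, factor=10):
    # Replace each run of `factor` consecutive feature vectors with their
    # element-wise mean, reducing the training data by about that factor.
    out = []
    for start in range(0, len(frames), factor):
        chunk = frames[start:start + factor]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out

# 100 hypothetical 13-dimensional MFCC frames.
mfcc = [[float(i + d) for d in range(13)] for i in range(100)]
reduced = reduce_by_averaging(mfcc)
print(len(mfcc), len(reduced))  # → 100 10
```

The finding that a tenfold reduction barely affects recognition suggests that, for HMM training, much of the frame-level detail in long soundscape recordings is redundant.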

