Speaker Identification Algorithm Based on the Deep Learning for Phonetics Forensic Purposes using a Cochlear Simulation Spectrum

2021 ◽  
Vol 15 (4) ◽  
pp. 307-311
Author(s):  
Joo Young Kim ◽  
Bo Rum Nam ◽  
Myeong Su Kim ◽  
Jinkyoung Choi ◽  
Baek Hwan Cho ◽  
...  
2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiaohua Shi ◽  
Kaicheng Tang ◽  
Hongtao Lu

Purpose: A book sorting system is one specific application in smart library scenarios, and it is now widely used in libraries based on RFID (radio-frequency identification) technology. Book identification is one of the core parts of a book sorting system, and its efficiency and accuracy are critical to all libraries. In this paper, the authors propose a new image recognition method that identifies library books by combining barcode decoding with deep learning optical character recognition (OCR), and describe its application in library book identification.

Design/methodology/approach: The identification process relies on recognizing images or videos of the book cover as it moves on a conveyor belt. A barcode is printed on or attached to the surface of each book. A deep learning OCR program is applied to improve recognition accuracy, especially when the barcode is blurred or faded. The proposed approach is robust, with high accuracy and good performance, even when the input pictures are not high resolution and the book covers are not always vertical.

Findings: The proposed method with deep learning OCR achieves the best accuracy under vertical, skewed and blurred image conditions.

Research limitations/implications: The proposed methods still need to be integrated with and tested on different book sorting machines.

Social implications: The authors collected more than 500 photos of books from a library. These photos show the covers of more than 100 randomly picked books against backgrounds of different colors, with about five pictures of each book captured from a variety of angles. The proposed method combines a traditional barcode identification algorithm with the authors' modification to locate and deskew the image. Deep learning OCR is used to improve accuracy when the barcode is blurred or partly faded. A book sorting system design based on this method is also introduced.

Originality/value: Experiments demonstrate that the proposed method is highly accurate in real-time tests and remains accurate even when the barcode is blurred. Deep learning is very effective at analyzing image content, and a corresponding series of methods has been developed for video content understanding, which can be a further advantage in intelligent library applications.
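The barcode-first, OCR-as-fallback flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `identify_book`, `decode_barcode` and `run_ocr` are hypothetical names, and the two decoders are passed in as plain callables because the abstract does not name the libraries used or say exactly what text the OCR step reads.

```python
def identify_book(image, decode_barcode, run_ocr):
    """Return (book_id, method), trying the barcode first and OCR second.

    decode_barcode and run_ocr are hypothetical callables that take an
    image and return an identifier string, or None when nothing is read.
    """
    book_id = decode_barcode(image)
    if book_id:
        return book_id, "barcode"
    # Assumed fallback: OCR is only consulted when the barcode is unreadable,
    # for example because it is blurred or faded.
    book_id = run_ocr(image)
    if book_id:
        return book_id, "ocr"
    return None, "unreadable"


# Stub decoders standing in for real barcode and OCR components.
clean = identify_book("cover.png", lambda img: "B0001", lambda img: "B0001")
faded = identify_book("cover.png", lambda img: None, lambda img: "B0002")
blank = identify_book("cover.png", lambda img: None, lambda img: None)
```

Passing the decoders in as arguments keeps the control flow testable without committing to a particular barcode or OCR library.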


Author(s):  
Pafan Doungpaisan ◽  
Anirach Mingkhwan

Search engine is the popular term for an information retrieval (IR) system. Typically, a search engine is based on full-text indexing. Moving from text data to multimedia data types, such as retrieving images or sounds from large databases, makes the information retrieval process more complex. This paper introduces the use of language- and text-independent speech as input queries to a large sound database by means of a speaker identification algorithm. The method consists of two main processing steps: first, vocal and non-vocal segments are separated; then the vocal segments are used for speaker identification, which enables audio queries by speaker voice. For speaker identification and audio query, we estimate the similarity between the example signal and the samples in the queried database by calculating the Euclidean distance between their acoustic features, namely the Mel-frequency cepstral coefficients (MFCC) and the energy spectrum. Simulations show good performance at a sustainable computational cost, with an average accuracy of more than 90%.
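The distance-based matching step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each recording has already been reduced to a fixed-length feature vector (for example time-averaged MFCCs plus an energy term), and the names `euclidean`, `rank_by_distance` and the sample data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, database):
    """Return (speaker, distance) pairs sorted from most to least similar."""
    scored = [(name, euclidean(query, feats)) for name, feats in database.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical per-speaker feature vectors; real ones would come from
# an MFCC and energy extraction stage that is not shown here.
database = {
    "speaker_a": [1.0, 2.0, 3.0],
    "speaker_b": [4.0, 6.0, 3.0],
}
query = [1.0, 2.0, 4.0]
ranking = rank_by_distance(query, database)
```

The first entry of `ranking` is the closest stored speaker; a smaller distance means greater similarity.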


2021 ◽  
Vol 256 ◽  
pp. 02034
Author(s):  
Tong Jiang ◽  
Ruyu Bai

To address the limitations of using a single feature for load identification, a non-intrusive load identification algorithm based on deep learning and compound features is proposed. Pixelated V-I trajectory features and current harmonic features are extracted by analyzing load data sampled at high frequency. The feature extraction capabilities of neural networks are then used to combine the pixelated V-I trajectory features with the current harmonic features. Finally, the composite feature is used as the new load feature to train the neural network for non-intrusive load identification. The experimental results show that the two-layer neural network constructed by the algorithm can exploit the complementarity between the two features, thereby improving load identification ability.
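One common way to build a pixelated V-I trajectory is to rescale the voltage and current samples of a cycle onto a small grid and mark every cell the trajectory passes through. The sketch below illustrates that idea only. The abstract does not give the grid size or normalization used, so `pixelate_vi`, the 16 x 16 default and the min-max scaling are assumptions.

```python
import math


def pixelate_vi(voltage, current, size=16):
    """Map (V, I) samples onto a size x size binary image (rows: I, cols: V)."""
    if len(voltage) != len(current) or not voltage:
        raise ValueError("voltage and current must be non-empty and equal length")

    def to_cells(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a flat signal
        # Min-max scale to [0, 1], then clamp the maximum into the last cell.
        return [min(int((v - lo) / span * size), size - 1) for v in values]

    cols = to_cells(voltage)
    rows = to_cells(current)
    image = [[0] * size for _ in range(size)]
    for r, c in zip(rows, cols):
        image[r][c] = 1
    return image


# Synthetic resistive load: current is proportional to voltage,
# so the trajectory collapses onto the image diagonal.
samples = 200
voltage = [math.sin(2 * math.pi * k / samples) for k in range(samples)]
current = [0.5 * v for v in voltage]
image = pixelate_vi(voltage, current, size=16)
```

Loads with a phase shift or harmonic distortion would instead trace loops or bent curves, which is what makes the image a useful input feature alongside the harmonic content.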


Despite the recent success of deep learning in many speech processing tasks, single-microphone, speaker-independent speech separation remains difficult for two main reasons. The first is the arbitrary order of the target and masker speakers in the mixture (the permutation problem), and the second is the unknown number of speakers in the mixture (the output dimension problem). We propose a novel deep learning framework for speech separation that handles both issues. We use a neural network to project the time-frequency representation of the mixed signal into a high-dimensional embedding space. The time-frequency embeddings of each speaker are then made to cluster around a corresponding attractor point, which is used to determine the time-frequency assignment of that speaker, so that each speaker can be identified within a mixture of speakers. The objective function for the network is the standard signal reconstruction error, which allows end-to-end operation during both the training and test phases. We evaluated our system on two- and three-speaker mixtures and report similar or better performance compared with other state-of-the-art deep learning approaches to speech separation.
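The attractor idea in this abstract can be illustrated with a small sketch. It assumes, beyond what the abstract states, that each attractor is the centroid of the embeddings assigned to a speaker and that the time-frequency assignment is a softmax over embedding-attractor dot products. The function names and the toy two-dimensional embeddings are hypothetical, and a real system would learn the embeddings with a neural network.

```python
import math


def attractors(embeddings, assignments, n_speakers):
    """Assumed definition: centroid of the embeddings assigned to each speaker."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_speakers)]
    counts = [0] * n_speakers
    for emb, spk in zip(embeddings, assignments):
        counts[spk] += 1
        for d in range(dim):
            sums[spk][d] += emb[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def soft_masks(embeddings, attractor_points):
    """Softmax over embedding-attractor dot products, one row per T-F bin."""
    masks = []
    for emb in embeddings:
        scores = [sum(e * a for e, a in zip(emb, att)) for att in attractor_points]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        masks.append([x / total for x in exps])
    return masks


# Toy embeddings for four time-frequency bins and two speakers.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assignments = [0, 0, 1, 1]
points = attractors(embeddings, assignments, n_speakers=2)
masks = soft_masks(embeddings, points)
```

Each row of `masks` gives the share of one time-frequency bin attributed to each speaker; multiplying these shares by the mixture spectrogram would yield the per-speaker estimates that the reconstruction error compares against the clean sources.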

