audio processing
Recently Published Documents

TOTAL DOCUMENTS: 292 (five years: 66)
H-INDEX: 12 (five years: 4)

2022 · Vol 11 (1) · pp. 44
Author(s): Hai-Yan Yao, Wang-Gen Wan, Xiang Li

Analysis of pedestrian motion is important to real-world applications in public scenes, yet trajectory prediction remains challenging because of complex temporal and spatial factors. With the recent development of attention mechanisms, transformer networks have been applied successfully in natural language processing, computer vision, and audio processing. We propose an end-to-end transformer network embedded with random deviation queries for pedestrian trajectory forecasting; this self-correcting scheme enhances the robustness of the network. Moreover, we present a co-training strategy in which the whole model is trained collaboratively with the original loss and a classification loss, yielding more accurate predictions. Experimental results on several datasets demonstrate the validity and robustness of the network: we achieve the best performance in individual forecasting and comparable results in social forecasting. Encouragingly, our approach sets a new state of the art on the Hotel and Zara2 datasets compared with both social-based and individual-based approaches.


2021 · Vol 3 (2) · pp. 72-86
Author(s): Junseo Cha, Seong Hee Choi, Chul-Hee Choi

Introduction. A good singing voice has traditionally been cultivated through rigorous voice training. Today, however, some aspects of the singing voice can be enhanced through digital processing. Whereas manipulations of frequency or intensity once had to be achieved through the singer's technique, technology now allows the singing voice to be enhanced with the instruments of a recording studio. In essence, traditional voice pedagogy and the evolution of digital audio processing both strive for a better-quality singing voice, but by different methods. Nevertheless, the major aspects of how the singing voice can be manipulated are not communicated between the professionals of the two fields. Objective. This paper offers insights into how the quality of the singing voice can be changed physiologically, through traditional voice training, and digitally, through the various instruments now available in recording studios. Reflection. The ways in which singers train their voices must be reconciled with the audio technology available today. Although digital technology can aid the singer's voice in some respects, there remain areas in which singers must train their vocal system at a physiological level to produce a better singing voice.


Author(s): Hesheng Li

This paper integrates wavelet sound-wave analysis with a fuzzy control method to develop a stage-phobia analysis system for vocal performers, with the aim of improving their psychological efficiency and reducing the effect of stage phobia on vocal performance. For howling suppression in audio processing, the frequency sub-band containing the howling is phase-inverted and then superimposed on the original signal. The paper also incorporates the practical requirements of processing the vocal audio spectrum and builds the corresponding modules. Guided by the research needs of studying stage phobia in vocal performers, it creates the system's function modules, investigates performers' psychological activity using the fuzzy control system, identifies the factors that influence stage performance, and thereby improves performers' psychological outcomes. Finally, experiments are proposed to test and evaluate the system. The findings indicate that the system described here has a significant impact.
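
The howling-suppression step described above (isolate the offending frequency sub-band, invert it, and superimpose it on the original signal) can be sketched in a few lines of NumPy; the 1 kHz howl, the 950-1050 Hz band edges, and the test signal are illustrative assumptions, not values from the paper.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)          # wanted signal
howl = 0.8 * np.sin(2 * np.pi * 1000 * t)    # simulated 1 kHz howl
x = clean + howl

# isolate the howling sub-band with an FFT mask (950-1050 Hz here)
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), 1 / fs)
band = (freqs > 950) & (freqs < 1050)
sub = np.zeros_like(X)
sub[band] = X[band]
sub_time = np.fft.irfft(sub, n=len(x))

# phase-invert the sub-band and superimpose it on the original signal
y = x - sub_time

def band_energy(sig):
    """Spectral magnitude summed over the howling band."""
    return np.abs(np.fft.rfft(sig))[band].sum()
```

Because adding the inverted sub-band cancels it, the operation behaves like a notch filter over the masked band while leaving the rest of the spectrum untouched.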


2021 · Vol 2113 (1) · pp. 012059
Author(s): Bin Liu, Yan Ren

Abstract. This paper introduces a design scheme for a laser array harp based on multi-dimensional wavelet transform and audio signal reconstruction. Green beams from multiple high-power lasers simulate harp strings; photoresistors serve as the signal receiving end, and a signal conditioning system composed of analog circuits and LM393 comparators collects and conditions the resistance signal of the laser sensor [1], finally adjusting it to a level signal the CPU can recognize. After receiving the signal, the CPU core board analyzes the string signal and sends control commands to the audio processing system over the industrial bus according to the decoded digital signal. On receiving a control command, the audio processing system uses audio signal reconstruction technology built from multi-dimensional wavelet packets, deep learning, and other algorithms to simulate the audio signals of various string instruments, so that the lasers act as virtual strings and the system can imitate musical instruments in performance [2].
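
The abstract names multi-dimensional wavelet packets only at a high level. As a toy stand-in, a single-level Haar analysis/synthesis pair in Python illustrates the perfect-reconstruction property that any wavelet-based audio reconstruction relies on; the Haar filter choice and the test signal are my assumptions, not the paper's design.

```python
import numpy as np

def haar_analysis(x):
    """One level of Haar wavelet analysis: split a signal of even
    length into approximation and detail coefficients."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert haar_analysis, reconstructing the original signal."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1200 * t)
a, d = haar_analysis(x)
x_rec = haar_synthesis(a, d)
```

A wavelet-packet scheme would recurse this split on both the approximation and the detail branches; synthesis for musical timbres would then shape the coefficients before inverting.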


2021 · Vol 4 (2) · pp. 119-126
Author(s): Made Agung Raharja, I Dewa Made Bayu Atmaja Darmawan

The cultural traditions of Balinese life have given birth to many forms, ranging from dances and traditional clothing to music and traditional musical instruments. One of the gamelan instruments in Bali is the gerantang. Not everyone has the ability to tune the gerantang blades, so the instrument cannot be made by just anyone. In the field of sound/audio processing there is a method called speech synthesis; one method that can be used to implement music or tone synthesis is Double Frequency Modulation (DFM). Tests of the DFM-based synthesis of the gamelan gerantang sound were carried out on 55 test-tone data derived from 11 basic tones and the frequencies of several synthetic sound experiments; 10 of the output tones fell within the frequency tolerance limit and 1 (one) tone fell outside it. The 10 synthesized tones within the tolerance limit correspond to an accuracy of 90.9%.
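
The abstract names Double Frequency Modulation without giving its form. One common DFM formulation drives a sinusoid's phase with two modulating sinusoids; the sketch below uses that formulation with made-up carrier frequencies and modulation indices, not the paper's gerantang tunings.

```python
import numpy as np

def dfm_tone(f1, f2, i1, i2, dur=1.0, fs=44100, amp=1.0):
    """Double Frequency Modulation (one common formulation):
    a sine whose phase is the sum of two modulating sinusoids,
    y(t) = amp * sin(i1*sin(2*pi*f1*t) + i2*sin(2*pi*f2*t))."""
    t = np.arange(int(dur * fs)) / fs
    return amp * np.sin(i1 * np.sin(2 * np.pi * f1 * t)
                        + i2 * np.sin(2 * np.pi * f2 * t))

# illustrative parameters only; a harmonic ratio f2 = 2*f1 keeps
# the sidebands aligned on multiples of f1
tone = dfm_tone(220.0, 440.0, i1=2.0, i2=1.5, dur=0.5)
```

Matching a target instrument then reduces to searching the (f1, f2, i1, i2) space until the synthesized partials land within the frequency tolerance the paper measures.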


Author(s): Krishna Subramani, Paris Smaragdis

Author(s): L. Merah, P. Lorenz, A. Ali-Pacha, N. Hadj-Said, ...

The enormous progress in communication technology has created a tremendous need for an ideal environment for transmitting, storing, and processing digital multimedia content, of which the audio signal takes the lion's share. Audio processing covers many diverse fields; its main aim is presenting sound to human listeners. Digital audio processing has recently become an active research area, covering everything from theory to practice in the transmission, compression, and filtering of audio signals and the addition of special effects to them. The aim of this work is to present the real-time implementation steps of two audio effects, echo and flanger, on a Field Programmable Gate Array (FPGA). FPGAs are well suited to such data processing because they provide flexibility, performance, and large processing capability with good power efficiency. The designs are built with the Xilinx System Generator (XSG) tool, which makes complex designs easier to realize without prior knowledge of hardware description languages. The paper is presented as a guide with detailed technical coverage of the design and real-time implementation steps, intended to transfer experience to designers who want to rapidly prototype their ideas with tools such as XSG. All designs were simulated and verified in the Simulink/Matlab environment, then exported to the Xilinx ISE (Integrated Synthesis Environment) tool for the remaining implementation steps. The paper also shows how to interface the FPGA with the LM4550 AC'97 codec using VHDL coding. The ATLYS development board, based on a Xilinx Spartan-6 LX45 FPGA, is used for the real-time implementation.
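
As a software reference model for the two hardware effects, both have simple difference-equation forms: an echo adds a fixed delayed copy of the input, while a flanger adds a copy whose short delay is swept by a low-frequency oscillator. The Python sketch below mirrors what such a datapath computes; the delay times, LFO rate, and gains are illustrative, not values from the paper's XSG design.

```python
import numpy as np

def echo(x, fs, delay_s=0.25, gain=0.5):
    """Single-tap echo: y[n] = x[n] + gain * x[n - D]."""
    d = int(delay_s * fs)
    y = np.copy(x)
    y[d:] += gain * x[:-d]
    return y

def flanger(x, fs, max_delay_s=0.003, rate_hz=0.5, gain=0.7):
    """Flanger: a short delay tap swept by a low-frequency oscillator."""
    n = np.arange(len(x))
    max_d = int(max_delay_s * fs)
    # LFO sweeps the delay between 0 and max_d samples
    d = (0.5 * max_d * (1 + np.sin(2 * np.pi * rate_hz * n / fs))).astype(int)
    idx = np.clip(n - d, 0, None)
    return x + gain * x[idx]

fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
ye = echo(x, fs)
yf = flanger(x, fs)
```

On an FPGA the same structure maps naturally to a block-RAM delay line plus a multiply-accumulate, which is why these effects are popular first targets for XSG designs.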


Author(s): Ria Sinha

Abstract: This paper describes a digital assistant designed to help hearing-impaired people sense ambient sounds. The assistant obtains audio signals from the ambient environment of a hearing-impaired person. The signals are analysed by a machine learning model that uses spectral signatures as features to classify them into audio categories (e.g., emergency, animal sounds) and specific audio types within those categories (e.g., ambulance siren, dog barking), and the user is notified via a mobile or wearable device. The user can configure active notification preferences and view historical logs. The machine learning classifier is periodically trained externally on labeled audio samples. Additional system features include an audio amplification option and a speech-to-text option for transcribing human speech to text output. Keywords: assistive technology, sound classification, machine learning, audio processing, spectral fingerprinting
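
The pipeline above (spectral signatures as features, a classifier over sound types) can be illustrated with a toy nearest-centroid model. The band-energy feature, the synthetic "siren" and "bark" clips, and the centroid classifier are my assumptions for the sketch, not the paper's actual model or training data.

```python
import numpy as np

def spectral_signature(x, fs, n_bands=8):
    """Summarize a clip as log power in a few frequency bands."""
    S = np.abs(np.fft.rfft(x)) ** 2
    bands = np.array_split(S, n_bands)
    return np.log1p(np.array([b.sum() for b in bands]))

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs

def siren():
    """Narrowband tone with a slow pitch wobble (toy 'siren')."""
    return np.sin(2 * np.pi * (600 + 300 * np.sin(2 * np.pi * 0.5 * t)) * t)

def bark():
    """Broadband decaying noise burst (toy 'dog bark')."""
    return rng.standard_normal(fs) * np.exp(-5 * t)

# one labeled example per class stands in for the external training step
train = {"siren": spectral_signature(siren(), fs),
         "bark": spectral_signature(bark(), fs)}

def classify(x):
    """Assign the class whose signature is nearest in feature space."""
    sig = spectral_signature(x, fs)
    return min(train, key=lambda k: np.linalg.norm(train[k] - sig))

print(classify(siren()))  # → siren
```

A deployed system would replace the two centroids with a classifier trained on many labeled clips per category, but the feature-then-match structure is the same.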

