Design of Audio Embedding and De-embedding for 3G-SDI Based on FPGA

2018 ◽  
Vol 173 ◽  
pp. 03021
Author(s):  
Yaqing Liu ◽  
Lunhui Deng

This design introduces the theoretical basis of digital audio embedding and de-embedding and proposes a solution in which the Verilog language is used to implement 3G-SDI audio embedding and de-embedding. SDI video and audio data are input to the FPGA, and after processing, the audio signals are embedded in the SDI horizontal blanking interval. In addition, some auxiliary information is embedded in the SDI data; when this auxiliary information is needed, the audio de-embedding process is used to recover it. Audio de-embedding is the inverse of the embedding process. Practice has shown that this scheme can effectively embed digital audio in an SDI data stream, synchronize audio and video data, and de-embed the audio signal. The design is highly versatile and improves design efficiency, thereby effectively reducing product cost.
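
The paper's Verilog implementation is not reproduced in the abstract. Purely as an illustration of the framing idea, the Python sketch below builds a simplified SMPTE 291M-style ancillary data packet of the kind used to carry embedded audio in the horizontal blanking interval; the DID/SDID values are placeholders, and the parity bits (b8/b9) of real 10-bit SDI words are omitted for clarity.

```python
def anc_packet(did, sdid, payload_words):
    """Build a simplified SMPTE 291M-style ancillary data packet.

    Word layout: ADF (0x000, 0x3FF, 0x3FF), DID, SDID, DC (data count),
    user data words, checksum. Parity bits of real 10-bit SDI words are
    omitted here for clarity.
    """
    words = [0x000, 0x3FF, 0x3FF, did, sdid, len(payload_words)]
    words += [w & 0x3FF for w in payload_words]
    # Checksum: 9-bit sum over DID..last user data word; bit 9 is the
    # inverse of bit 8 of that sum.
    s = sum(words[3:]) & 0x1FF
    words.append(s | ((~s & 0x100) << 1))
    return words

# Example: split 20-bit audio samples into two 10-bit words each
samples = [0x12345, 0x0ABCD, 0x7FFFF, 0x00001]
payload = []
for smp in samples:
    payload += [smp & 0x3FF, (smp >> 10) & 0x3FF]   # low word, high word
pkt = anc_packet(did=0x02, sdid=0x01, payload_words=payload)  # placeholder IDs
```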

Author(s):  
Andreas M. Kist ◽  
Pablo Gómez ◽  
Denis Dubrovskiy ◽  
Patrick Schlegel ◽  
Melda Kunduk ◽  
...  

Purpose High-speed videoendoscopy (HSV) is an emerging but rarely used endoscopy technique for assessing and diagnosing voice disorders in the clinic, largely because of the lack of dedicated software to analyze the data. HSV makes it possible to quantify vocal fold oscillations by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among them the glottal area waveform, that is, the change in glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533
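
GAT itself is written in C# and its pipeline is not shown in the abstract. As a minimal Python sketch of the glottal area waveform concept only, the code below turns a sequence of binary glottis segmentation masks into a per-frame area signal; the frame rate and mask shapes are hypothetical.

```python
import numpy as np

def glottal_area_waveform(masks, fps):
    """Compute a glottal area waveform (GAW) from binary glottis masks.

    masks: array of shape (n_frames, height, width), nonzero where the
    glottis is segmented. Returns (time_s, area_px): per-frame pixel
    count of the segmented glottal area over time.
    """
    area_px = (np.asarray(masks) > 0).sum(axis=(1, 2))
    time_s = np.arange(len(area_px)) / fps
    return time_s, area_px

# Hypothetical usage with a 4000 fps HSV recording; random masks stand in
# for the output of a segmentation network.
masks = np.random.rand(100, 256, 128) > 0.9
t, gaw = glottal_area_waveform(masks, fps=4000)
```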


Author(s):  
Michael Odzer ◽  
Kristina Francke

Abstract The sound of waves breaking on shore, or against an obstruction or jetty, is an immediately recognizable sound pattern that could potentially be employed by a sensor system to identify obstructions. If the frequency patterns produced by breaking waves can be reproduced and mapped in a laboratory setting, a foundational understanding of the physics behind this process could be established and then employed in sensor development for navigation. This study explores whether wave-breaking frequencies correlate with the physics behind the collapse of the wave, and whether the frequencies of breaking waves recorded in a laboratory tank follow the same pattern as the frequencies produced by ocean waves breaking on a beach. An artificial “beach” was engineered to replicate breaking waves inside a laboratory wave tank. Video and audio recordings of waves breaking in the tank were obtained, and audio of ocean waves breaking on the shoreline was recorded. The audio data was analysed in frequency charts, and the video data was evaluated to correlate bubble sizes with the frequencies produced by the waves. The results supported the hypothesis that the frequencies produced by breaking waves in the wave tank followed the same pattern as those produced by ocean waves. Analysis utilizing a solution to the Rayleigh-Plesset equation showed that the bubble sizes produced by breaking waves were inversely related to the pattern of frequencies. This pattern can be reproduced in a controlled laboratory environment and extrapolated for use in developing navigational sensors for marine applications such as autonomous ocean vehicles.
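
The inverse relation between bubble size and emitted frequency follows from the small-oscillation (Minnaert) limit of the Rayleigh-Plesset equation. A minimal sketch, assuming standard values for air bubbles in seawater:

```python
import numpy as np

# Minnaert resonance: the small-amplitude limit of the Rayleigh-Plesset
# equation gives f = (1 / (2*pi*R)) * sqrt(3*gamma*p0 / rho), so frequency
# is inversely proportional to bubble radius R.
gamma = 1.4        # adiabatic index of air
p0 = 101325.0      # ambient pressure, Pa
rho = 1025.0       # seawater density, kg/m^3

def minnaert_frequency(radius_m):
    return np.sqrt(3 * gamma * p0 / rho) / (2 * np.pi * radius_m)

for r_mm in (0.5, 1.0, 2.0, 5.0):
    f = minnaert_frequency(r_mm * 1e-3)
    print(f"R = {r_mm} mm -> f ~ {f:.0f} Hz")   # ~3.2 kHz at R = 1 mm
```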


Author(s):  
Paul McIlvenny

Consumer versions of the passive 360° and stereoscopic omni-directional camera have recently come to market, generating new possibilities for qualitative video data collection. This paper discusses some of the methodological issues raised by collecting, manipulating and analysing complex video data recorded with 360° cameras and ambisonic microphones. It also reports on the development of a simple, yet powerful prototype to support focused engagement with such 360° recordings of a scene. The paper proposes that we ‘inhabit’ video through a tangible interface in virtual reality (VR) in order to explore complex spatial video and audio recordings of a single scene in which social interaction took place. The prototype is a software package called AVA360VR (‘Annotate, Visualise, Analyse 360° video in VR’). The paper is illustrated through a number of video clips, including a composite video of raw and semi-processed multi-cam recordings, a 360° video with spatial audio, a video comprising a sequence of static 360° screenshots of the AVA360VR interface, and a video comprising several screen capture clips of actual use of the tool. The paper discusses the prototype’s development and its analytical possibilities when inhabiting spatial video and audio footage as a complementary mode of re-presenting, engaging with, sharing and collaborating on interactional video data.


2015 ◽  
Vol 743 ◽  
pp. 355-358
Author(s):  
Mang Zhou

This paper describes an audio system for a high-definition (HD) digital audio decoding solution. The system decodes the Dolby TrueHD and DTS-HD Master Audio data streams and supports the advanced audio processing modes Pro Logic® IIx and DTS Neo:6. The discussion focuses on system construction and the management of audio processing by the host MCU.


2008 ◽  
Vol 18 (06) ◽  
pp. 481-489 ◽  
Author(s):  
COLIN FYFE ◽  
WESAM BARBAKH ◽  
WEI CHUAN OOI ◽  
HANSEOK KO

We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM) [1]. But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts [2]. We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels. Finally, we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of the mean squared error between the prototypes and the data. This leads us to suggest a new algorithm which incorporates local and global information in the clustering. Both of the new algorithms achieve better results than the standard Self-Organizing Map.
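
The minimisation view can be sketched concretely: a batch self-organizing-map step lowers the neighbourhood-weighted mean squared error between prototypes and data. The numpy sketch below illustrates this generic formulation; it is not the authors' specific algorithm.

```python
import numpy as np

def som_mse_step(X, W, latent, sigma, lr):
    """One batch update treating the SOM as minimisation of the
    neighbourhood-weighted MSE between prototypes W and data X.

    X: (n, d) data; W: (k, d) prototypes; latent: (k, q) latent grid
    positions; sigma: neighbourhood width; lr: learning rate.
    """
    # Squared distance from every sample to every prototype
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)            # (n, k)
    winners = d2.argmin(axis=1)                                     # BMU per sample
    # Gaussian neighbourhood on the latent grid
    g2 = ((latent[:, None, :] - latent[None, :, :]) ** 2).sum(-1)   # (k, k)
    h = np.exp(-g2 / (2 * sigma ** 2))
    # Pull each prototype towards samples whose winner lies in its
    # latent neighbourhood (weighted mean-squared-error descent)
    for j in range(W.shape[0]):
        w = h[winners, j][:, None]                                  # (n, 1)
        W[j] += lr * (w * (X - W[j])).sum(0) / max(w.sum(), 1e-12)
    return W

# Hypothetical usage: a 5x5 latent grid mapped into 3-D data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
latent = np.stack(np.meshgrid(np.arange(5), np.arange(5)), -1).reshape(25, 2).astype(float)
W = rng.standard_normal((25, 3))
for step in range(50):
    W = som_mse_step(X, W, latent, sigma=1.0, lr=0.5)
```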


2021 ◽  
Vol 11 (15) ◽  
pp. 7092
Author(s):  
Federica Bressan ◽  
Valentina Burini ◽  
Edoardo Micheloni ◽  
Antonio Rodà ◽  
Richard L. Hess ◽  
...  

Audio carriers are subject to a fast and irreversible decay. In order to save valuable historical recordings, the audio signal and other relevant information can be extracted from the source audio document and stored on another medium, normally a redundant digital storage system. This procedure is called ‘content transfer’. It is a costly and time-consuming procedure. There are several solutions with which the cost can be reduced. One consists of picking up all tracks from a two-sided tape in one pass. This means that some tracks will be digitized forward and some backwards, to be subsequently corrected in the digital workstation. This article is concerned with the question of whether reading tracks backwards introduces unwanted effects into the signal. In particular, it investigates whether a difference can be observed between audio signals read forward or backwards and, if so, whether the difference is measurable. The results show that a difference can be observed, yet this is not enough to conclude that this “backwards” approach should not be used. The complexity of the situation is presented in the discussion. Future work includes reproducing this experiment with different audio equipment, as well as a perception test with human subjects.
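
Correcting a track that was digitized tail-first is, at the signal level, a time reversal of its sample array. A minimal sketch using the soundfile library; the file names are hypothetical.

```python
import soundfile as sf

# A track picked up backwards in a single-pass transfer is corrected in
# the digital domain by reversing the sample order.
data, sr = sf.read("side_b_backwards.wav")   # hypothetical input file
corrected = data[::-1].copy()                # time-reverse all channels
sf.write("side_b_forward.wav", corrected, sr)
```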


Author(s):  
Marcel Nikmon ◽  
Roman Budjač ◽  
Daniel Kuchár ◽  
Peter Schreiber ◽  
Dagmar Janáčová

Abstract Deep learning is a kind of machine learning, and machine learning is a kind of artificial intelligence: machine learning covers a group of various technologies, and deep learning is one of them. Deep learning is now an integral part of data classification practice. This paper introduces the possibilities of classification using convolutional networks. Experiments focused on audio and video data show different approaches to data classification. Most experiments use the well-known pre-trained AlexNet network with various types of input data pre-processing, but other neural network architectures are also compared, and we show the results of training on small and larger datasets. The paper comprises descriptions of eight different kinds of experiments. Several training sessions were conducted in each experiment, with different aspects being monitored. The focus was put on the effect of batch size on the accuracy of deep learning, along with many other parameters that affect deep learning [1].
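
The batch-size experiment can be illustrated generically. The PyTorch sketch below trains the same small classifier at several batch sizes on synthetic data and reports accuracy; it is a stand-in for the paper's AlexNet experiments, whose datasets and settings are not given in the abstract.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data; the paper's audio/video datasets are not public here.
X = torch.randn(2048, 64)
y = torch.randint(0, 4, (2048,))

def train_once(batch_size, epochs=5):
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 4))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = nn.functional.cross_entropy(model(xb), yb)
            loss.backward()
            opt.step()
    with torch.no_grad():
        return (model(X).argmax(1) == y).float().mean().item()

# Vary only the batch size, holding everything else fixed
for bs in (16, 64, 256):
    print(f"batch_size={bs:4d}  train accuracy={train_once(bs):.3f}")
```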


Author(s):  
Rohit Thanki ◽  
Komal Borisagar

In this article, a watermarking scheme using the curvelet transform in combination with compressive sensing (CS) theory is proposed for the protection of digital audio signals. The curvelet coefficients of the host audio signal are modified according to CS measurements of the watermark data. The CS measurements of the watermark data are generated using CS theory and sparse coefficients (wavelet coefficients of DCT coefficients). The proposed scheme can be employed for both audio and speech watermarking. When the scheme is used for audio watermarking, a grayscale watermark image is inserted into the host digital audio signal; when it is employed for speech watermarking, a speech signal is inserted into the host digital audio signal. The experimental results show that the proposed scheme performs better than existing watermarking schemes in terms of perceptual transparency.
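
As a rough illustration of the CS side of the scheme only: the sketch below generates CS measurements of sparse watermark coefficients with a random Gaussian matrix and adds them to host coefficients. It substitutes a DCT for the curvelet transform (numpy/scipy ship no curvelet) and a simple additive rule for the paper's embedding, so it is an assumption-laden sketch, not the proposed method.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)

# Sparse representation of the (flattened) watermark. A DCT is used here;
# the paper uses wavelet coefficients of DCT coefficients.
watermark = rng.random(256)               # stand-in for a grayscale watermark
coeffs = dct(watermark, norm="ortho")

# CS measurements: y = Phi @ x with a random Gaussian measurement matrix
m = 64                                     # number of measurements (m < n)
Phi = rng.standard_normal((m, coeffs.size)) / np.sqrt(m)
y = Phi @ coeffs

# Illustrative additive embedding into host coefficients: c_w = c + alpha * y
alpha = 0.01
host_coeffs = rng.standard_normal(m)       # stand-in for curvelet coefficients
watermarked_coeffs = host_coeffs + alpha * y
```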


1988 ◽  
Vol 32 (4) ◽  
pp. 229-231
Author(s):  
Lawrence M. Paul

This talk explores the limits and costs of the visual-auditory asynchrony that occurs in video teleconferencing systems using separate transmission paths for the video and audio signals. After a brief review of video teleconferencing, the special problem of asynchrony in two-path systems is developed, and the small quantity of directly applicable research is reviewed. The two human factors questions that needed to be answered were: 1) What are the “just tolerable” limits of asynchrony?, and 2) What is the cost, in terms of misperceptions, of living with asynchrony? In the experiment, nine participants determined their “just tolerable” asynchrony limits with video first and with audio first, as well as their “perfect synchrony” point. The average “just tolerable” limit with video preceding audio was 104 msec, with small variability. Very surprisingly, the “just tolerable” limit with audio first was at least 160 msec. Thus, common wisdom notwithstanding, it is apparently easier to live with the audio preceding the video. Research is currently underway to measure the cost of living with asynchrony. MacDonald and McGurk (1978) found that particular combinations of spoken and seen syllables led to the perception of completely different syllables. The present research extends MacDonald and McGurk's work to word pairs whose first syllables come from the corresponding special syllable pairs, to determine whether living with asynchrony necessarily means living with misperceptions in addition to simple “annoyance”.


2010 ◽  
Vol 19 (07) ◽  
pp. 1399-1421
Author(s):  
MOJTABA MAHDAVI ◽  
SHADROKH SAMAVI ◽  
SORINA DUMITRESCU ◽  
FERESHTEH AALAMIFAR ◽  
PARISA ABEDIKHOOZANI

Data hiding in the LSB of audio signals is an appealing steganographic method. Because of the large volume of real-time production and transmission of audio data, it is difficult to store and analyze these signals; hence, steganalysis of audio signals requires online operation, whereas most existing steganalysis methods work on stored media files. In this paper, we present a steganalysis technique that can detect the existence of embedded data in the least significant bits of natural audio samples. The algorithm is designed to be simple, accurate, and hardware implementable, and a hardware implementation is presented for the proposed algorithm. The proposed hardware analyzes the histogram of an incoming stream of audio signals using a sliding-window strategy, without needing to store the signals. The algorithm is mathematically modeled to show its capability to accurately predict the amount of embedding in an incoming stream of audio signals. Audio files with different amounts of embedded data were used to test the algorithm and its hardware implementation. The experimental results prove the functionality and high accuracy of the proposed method.
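
The sliding-window histogram idea can be sketched in software. The numpy example below computes a crude even/odd histogram-pair imbalance per window as an indicator of LSB replacement; it is a simplified stand-in, not the paper's mathematically modeled estimator or its hardware design.

```python
import numpy as np

def lsb_embedding_indicator(samples, window=4096):
    """Crude sliding-window indicator of LSB embedding in an audio stream.

    Random LSB replacement tends to equalize the counts of value pairs
    (2k, 2k+1); the closer even/odd histogram bins are to equal, the more
    embedding is suspected.
    """
    scores = []
    for start in range(0, len(samples) - window + 1, window):
        w = samples[start:start + window].astype(np.int64)
        base = int(w.min()) & ~1                    # even offset keeps parity
        hist = np.bincount(w - base, minlength=int(w.max()) - base + 2)
        even, odd = hist[0::2], hist[1::2]
        n = min(len(even), len(odd))
        total = even[:n] + odd[:n]
        nz = total > 0
        imbalance = np.abs(even[:n] - odd[:n])[nz] / total[nz]
        scores.append(1.0 - imbalance.mean())       # near 1 => suspicious
    return np.array(scores)

# Hypothetical usage on 16-bit PCM samples
pcm = np.random.default_rng(1).integers(-2000, 2000, 1 << 16).astype(np.int16)
print(lsb_embedding_indicator(pcm)[:4])
```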

