Speaker detection using multi-speaker audio files for both enrollment and test

Author(s):  
J.-F. Bonastre ◽  
S. Meignier ◽  
T. Merlin
Author(s):  
IRMA SAFITRI ◽  
NUR IBRAHIM ◽  
HERLAMBANG YOGASWARA

Abstract: This research developed a Compressive Sensing (CS) technique for audio watermarking using the Lifting Wavelet Transform (LWT) and Quantization Index Modulation (QIM) methods. LWT is a technique that decomposes a signal into two sub-bands, namely the low and high sub-bands. QIM is a computationally efficient watermarking method that makes use of additional information. Audio watermarking was carried out on 10-second audio files in *.wav format covering four music genres: pop, classic, rock, and metal. The embedded watermarks were black-and-white images in *.bmp format, measuring 32x32 and 64x64 pixels. Performance was evaluated by measuring SNR, ODG, BER, and PSNR. The robustness of the watermarked audio was tested against seven kinds of attacks, including LPF, BPF, HPF, MP3 compression, noise, and echo. The optimal result obtained was an SNR of 85.32 dB, an ODG of -8.34 × 10^-11, a BER of 0, and a PSNR of ∞. Keywords: Audio watermarking, QIM, LWT, Compressive Sensing.
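The QIM idea and the BER and SNR measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it applies scalar QIM directly to a few made-up sample values rather than to LWT coefficients, omits Compressive Sensing entirely, and the step size `step` and all function names are hypothetical choices for illustration.

```python
import math

def qim_embed(sample: float, bit: int, step: float) -> float:
    """Move `sample` to the nearest point of the quantizer lattice for `bit`.

    Bit 0 uses multiples of `step`; bit 1 uses the same lattice shifted by step/2.
    """
    offset = bit * step / 2.0
    return step * round((sample - offset) / step) + offset

def qim_extract(sample: float, step: float) -> int:
    """Decode the bit whose lattice lies closest to `sample`."""
    d0 = abs(sample - qim_embed(sample, 0, step))
    d1 = abs(sample - qim_embed(sample, 1, step))
    return 0 if d0 <= d1 else 1

def bit_error_rate(sent, received) -> float:
    """Fraction of watermark bits that were decoded incorrectly."""
    return sum(a != b for a, b in zip(sent, received)) / len(sent)

def snr_db(original, modified) -> float:
    """Signal-to-noise ratio in dB between the host and the watermarked signal."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, modified))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

# Toy host values and watermark bits (illustrative only, not data from the paper).
host = [0.12, -0.40, 0.33, 0.05, -0.27, 0.61, -0.08, 0.19]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
step = 0.02  # hypothetical quantization step

marked = [qim_embed(x, b, step) for x, b in zip(host, bits)]
recovered = [qim_extract(y, step) for y in marked]

print("BER:", bit_error_rate(bits, recovered))
print("SNR (dB):", snr_db(host, marked))
```

A larger `step` makes the embedded bits harder to disturb but moves each sample further from its original value, which lowers the SNR; a smaller step does the opposite.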


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 27-27
Author(s):  
Ricardo V Ventura ◽  
Rafael Z Lopes ◽  
Lucas T Andrietta ◽  
Fernando Bussiman ◽  
Julio Balieiro ◽  
...  

Abstract: The Brazilian gaited horse industry is growing steadily, even after a recession that affected several economic sectors across the country. Recent figures suggest an increase in exports, which underlines the relevance of this segment of the horse market. Horses are classified by gait into two groups according to the animal's movements: lateral (Marcha Picada) or diagonal (Marcha Batida). These two gait groups usually show marked differences in speed and in the number of steps per fixed unit of time, among other factors. Audio retrieval refers to the extraction of information from audio signals. Compared with traditional ways of evaluating and classifying gait types (for example, subjective human evaluation and video monitoring), this new area of data analysis offers a potential method for collecting phenotypes at reduced cost. Audio files (n = 80) were obtained from freely available YouTube videos and used for audio feature extraction. Videos were manually labeled according to the two gait groups (Marcha Picada or Marcha Batida), and thirty animals remained after a quality-control filtering step. This study aimed to investigate different audio signal processing metrics, first to cluster animals by gait type and subsequently to include additional traits that could improve accuracy in identifying genetically superior animals. Twenty-eight metrics, based on frequency or physical aspects of the audio, were used individually or in groups of relative importance in a Principal Component Analysis (PCA) and to describe the two gait types. The PCA results indicated that over 87% of the animals were correctly clustered. Challenges regarding environmental interference and noise must be investigated further. These first findings suggest that audio information retrieval could potentially be implemented in animal breeding programs aimed at improving horse gait.
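As a rough illustration of how PCA can separate two groups from a handful of audio-derived metrics, the sketch below standardizes a small feature table, finds the first principal component by power iteration, and splits the animals by the sign of their score. Everything in it is an assumption made for illustration: the three feature columns, the six rows of toy values, and the sign-based split are invented and are not the study's twenty-eight metrics, data, or clustering procedure.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy feature table (invented values): one row per animal, columns are
# hypothetical metrics such as step rate, a spectral measure, and an energy ratio.
features = [
    [2.1, 310.0, 0.42], [2.3, 325.0, 0.45], [2.2, 318.0, 0.40],  # toy group A
    [3.4, 460.0, 0.71], [3.6, 475.0, 0.69], [3.5, 452.0, 0.74],  # toy group B
]

z = standardize(features)
pc1 = first_component(covariance(z))
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]

# Simplistic grouping: animals on the same side of zero on PC1 share a group.
groups = ["B" if s > 0 else "A" for s in scores]
print(groups)
```

With real recordings the groups would overlap, so a proper clustering step and more than one component would normally be needed; the sign split is used here only to keep the example short.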


2021 ◽  
Vol 11 (8) ◽  
pp. 3397
Author(s):  
Gustavo Assunção ◽  
Nuno Gonçalves ◽  
Paulo Menezes

Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.


2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Gerry Mshana ◽  
Zaina Mchome ◽  
Diana Aloyce ◽  
Esther Peter ◽  
Saidi Kapiga ◽  
...  

Abstract. Background: COVID-19 has caused worldwide fear and uncertainty. Historically, the biomedical disease paradigm established its dominance in tackling emerging infectious illnesses mainly due to innovation in medication and advances in technology. Traditional and religious remedies have emerged as plausible options for prevention and treatment of COVID-19, especially in Africa and Asia. The appeal of religious and traditional therapies against COVID-19 in the African setting must be understood within the historical, social, and political context. This study explored how women and community members dealt with suspected symptoms of COVID-19 in Mwanza, Tanzania. Methods: This study was conducted in Nyamagana and Ilemela districts of Mwanza, Tanzania, between July and August 2020. We conducted 18 mobile phone in-depth interviews with a purposively selected sample of women aged 27–57 years participating in an existing longitudinal study. For safety reasons, smart mobile phones were used to collect the data. Each interview was audio recorded after obtaining verbal consent from the participants. The audio files were transferred to computers for analysis. Four researchers conducted a multistage, inductive analysis of the data. Results: Participants reported wide use and perceived high efficacy of traditional remedies and prayer to prevent and treat suspected symptoms of COVID-19. Use was either alone or combined with public health recommendations such as hand washing and crowd avoidance. Despite acknowledging that a pathogen causes COVID-19, participants attested to the relevance and power of traditional herbal medication and prayer to curb COVID-19. Four main factors underline the symbolic efficacy of the traditional and religious treatment paradigms: personal, communal, and official reinforcement of their efficacy; connection to local knowledge and belief systems; the failure of biomedicine to offer a quick and effective solution; and availability. Conclusions: In the context of emerging contagious illnesses, communities turn to resilient and trusted treatment paradigms to quell fear and embrace hope. To tackle emerging infections effectively, it is essential to engage the broader sociopolitical landscape, including communal considerations of therapeutic efficacy.


Author(s):  
Adrienne de Ruiter

Abstract: Deepfake technology presents significant ethical challenges. The ability to produce realistic-looking and realistic-sounding video or audio files of people doing or saying things they did not do or say brings with it unprecedented opportunities for deception. The literature that addresses the ethical implications of deepfakes raises concerns about their potential use for blackmail, intimidation, and sabotage, ideological influencing, and incitement to violence as well as broader implications for trust and accountability. While this literature importantly identifies and signals the potentially far-reaching consequences, less attention is paid to the moral dimensions of deepfake technology and deepfakes themselves. This article will help fill this gap by analysing whether deepfake technology and deepfakes are intrinsically morally wrong, and if so, why. The main argument is that deepfake technology and deepfakes are morally suspect, but not inherently morally wrong. Three factors are central to determining whether a deepfake is morally problematic: (i) whether the deepfaked person(s) would object to the way in which they are represented; (ii) whether the deepfake deceives viewers; and (iii) the intent with which the deepfake was created. The most distinctive aspect that renders deepfakes morally wrong is when they use digital data representing the image and/or voice of persons to portray them in ways in which they would be unwilling to be portrayed. Since our image and voice are closely linked to our identity, protection against the manipulation of hyper-realistic digital representations of our image and voice should be considered a fundamental moral right in the age of deepfakes.


2017 ◽  
Vol 10 (2) ◽  
pp. 156
Author(s):  
Sahar Jalilian ◽  
Rouhollah Rahmatian ◽  
Parivash Safa ◽  
Roya Letafati

In simultaneous bilingual education, many factors can affect success, primarily the age of the child and socio-cognitive elements. This phenomenon can first be studied in a child’s earliest lexical productions in either language. The present study focuses on the early lexical development of a child who lives in the monolingual society of Iran, where there is no linguistic milieu for French, and who has been exposed to a bilingual education since birth. Applying Ronjat’s principle of “one parent-one language” (1913), the parents have shaped the child’s basic linguistic interactions: the father uses Farsi, his mother tongue, in his interactions with the child, while the mother uses French, a foreign language for her. The data were collected from audio recordings, made when the child was between 18 and 36 months old, of her everyday interactions with her parents. By analysing how the presence of words from the minority language, i.e. French, in the child’s sentences changes at different ages, the study raises questions about the conditions for a persistent presence of both languages and the reasons why one language comes to serve as a minor means of communication, taking into account parental attitudes and environmental issues that can influence language acquisition.


Author(s):  
Jingyang Wang ◽  
Min Huang ◽  
Huiyong Wang ◽  
Xiaohong Wang ◽  
Pingmian Kou

2013 ◽  
Vol 380-384 ◽  
pp. 3129-3132
Author(s):  
Ying Zhang

To help transportation enterprises strengthen their competitiveness, this study proposes a digital photo frame based on the ARM9-family S3C2440 processor chip, the Linux operating system and Qt/Embedded, designed to differ from traditional frames in function, capacity and operability. The paper focuses on the hardware circuit and the key parts of the software, which display photos in many formats as a slideshow controlled by the touch screen and play audio files. The experimental results show that the digital photo frame designed in the thesis works steadily, is easy to operate and is highly extensible. By adding applications, the digital photo frame can take on many other functions.


2021 ◽  
Author(s):  
Rodney H. Jones ◽  
Christiana Themistocleous

This accessible and entertaining textbook introduces students to both traditional and more contemporary approaches to sociolinguistics in a real-world context, addressing current social problems that students are likely to care about, such as racism, inequality, political conflict, belonging, and issues around gender and sexuality. Each chapter includes exercises, case studies and ideas for small-scale research projects, encouraging students to think critically about the different theories and approaches to language and society, and to interrogate their own beliefs about language and communication. The book gives students a grounding in the traditional concepts and techniques upon which sociolinguistics is built, while also introducing new developments from the last decade, such as translanguaging, multimodality, superdiversity, linguistic landscapes and language and digital media. Students will also have online access to more detailed examples, links to video and audio files, and more challenging exercises to strengthen their skills and confidence as sociolinguists.

