Speaker detection using multi-speaker audio files for both enrollment and test

Abstract: This research developed a Compressive Sensing (CS) technique for audio watermarking using the Lifting Wavelet Transform (LWT) and Quantization Index Modulation (QIM). LWT decomposes a signal into two sub-bands, low and high. QIM is a computationally efficient watermarking method that uses additional information. Watermarking was performed on 10-second audio files in *.wav format from four music genres: pop, classic, rock, and metal. The watermarks were black-and-white images in *.bmp format measuring 32x32 and 64x64 pixels. Performance was evaluated by measuring SNR, ODG, BER, and PSNR. The robustness of the watermarked audio was tested against 7 kinds of attacks, including LPF, BPF, HPF, MP3 compression, noise, and echo. The optimal result was an SNR of 85.32 dB, an ODG of -8.34 x 10^-11, a BER of 0, and a PSNR of ∞.
Keywords: Audio watermarking, QIM, LWT, Compressive Sensing.
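The abstract does not give implementation details, so the following is only a minimal sketch of how LWT and QIM can fit together. It assumes a single Haar lifting step, embedding in the low sub-band, and a quantization step `delta` of 0.05; the sample values, the bit pattern, and all function names are hypothetical and not taken from the paper.

```python
import math

def haar_lift(signal):
    """One Haar lifting step: split into low (approx) and high (detail) sub-bands."""
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]           # predict step
    approx = [e + d / 2.0 for e, d in zip(even, detail)]  # update step
    return approx, detail

def haar_unlift(approx, detail):
    """Invert haar_lift by undoing the update and predict steps in reverse order."""
    even = [a - d / 2.0 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    return [s for pair in zip(even, odd) for s in pair]

def qim_embed(x, bit, delta):
    """Quantize x onto the lattice for `bit`; the two lattices are offset by delta/2."""
    dither = bit * delta / 2.0
    return delta * math.floor((x - dither) / delta + 0.5) + dither

def qim_extract(y, delta):
    """Return the bit whose lattice lies closest to y."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1

def bit_error_rate(sent, received):
    """Fraction of watermark bits that differ after extraction."""
    return sum(s != r for s, r in zip(sent, received)) / len(sent)

# Hypothetical data: 16 audio samples carry 8 watermark bits.
audio = [0.12, 0.10, -0.31, -0.28, 0.45, 0.52, 0.03, -0.02,
         -0.64, -0.59, 0.27, 0.33, 0.81, 0.77, -0.15, -0.09]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
delta = 0.05  # assumed step size: larger is more robust but more audible

low, high = haar_lift(audio)
marked_low = [qim_embed(c, b, delta) for c, b in zip(low, bits)]
watermarked = haar_unlift(marked_low, high)

# Extraction repeats the transform and reads the nearest lattice per coefficient.
recovered = [qim_extract(c, delta) for c in haar_lift(watermarked)[0]]
ber = bit_error_rate(bits, recovered)
```

In this sketch each sample moves by at most `delta / 2`, and extraction succeeds as long as an attack perturbs a low-band coefficient by less than `delta / 4`. When BER is 0 the extracted watermark equals the original, so its mean squared error is 0 and PSNR is infinite under the usual definition, which is consistent with the values reported above.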


Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset

10.21437/interspeech.2020-2807 ◽

2020 ◽

Author(s):

Jack Deadman ◽

Jon Barker

Keyword(s):

Speaker Detection


402 Audio information retrieval for describing gait patterns in Brazilian horses

Journal of Animal Science ◽

10.1093/jas/skaa278.048 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 27-27

Author(s):

Ricardo V Ventura ◽

Rafael Z Lopes ◽

Lucas T Andrietta ◽

Fernando Bussiman ◽

Julio Balieiro ◽

...

Keyword(s):

Information Retrieval ◽

Subjective Evaluation ◽

Audio Signal ◽

Principal Component ◽

Potential Method ◽

Economic Sectors ◽

Audio Features ◽

Horse Industry ◽

Audio Files ◽

Audio Information

Abstract: The Brazilian gaited horse industry is growing steadily, even after a recession that affected different economic sectors across the country. Recent numbers suggest an increase in exports, which reveals the relevance of this horse market segment. Horses are classified according to gait criteria, which divide them into two groups associated with the animal's movements: lateral (Marcha Picada) or diagonal (Marcha Batida). These two gait groups usually show remarkable differences in speed and number of steps per fixed unit of time, among other factors. Audio retrieval refers to the process of extracting information from audio signals. Compared with traditional methods of evaluating and classifying gait types (for example, human subjective evaluation and video monitoring), this new data analysis area provides a potential method to collect phenotypes at reduced cost. Audio files (n = 80) were obtained after extracting audio features from freely available YouTube videos. Videos were manually labeled according to the two gait groups (Marcha Picada or Marcha Batida), and thirty animals were used after a quality control filter step. This study aimed to investigate different metrics associated with audio signal processing, in order to first cluster animals according to gait type and subsequently include additional traits that could be useful to improve accuracy during the identification of genetically superior animals. Twenty-eight metrics, based on frequency or physical audio aspects, were used individually or in groups of relative importance to perform Principal Component Analysis (PCA), as well as to describe the two gait types. The PCA results indicated that over 87% of the animals were correctly clustered. Challenges regarding environmental interference and noise must be further investigated. These first findings suggest that audio information retrieval could potentially be implemented in animal breeding programs, aiming to improve horse gait.
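The abstract names PCA on audio metrics but not the metrics or the clustering rule, so the sketch below is only an illustration of the general idea. It assumes two made-up features per recording (steps per second and a spectral measure), synthetic values for eight recordings, and a simple split on the sign of the first principal component score; none of these come from the study.

```python
import math

def standardize(rows):
    """Z-score each feature column so that scale differences do not dominate."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(z):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(cov))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Synthetic, hypothetical features: (steps per second, spectral measure in Hz).
features = [
    (2.0, 310.0), (2.2, 298.0), (1.9, 325.0), (2.1, 305.0),  # first four: one gait group
    (3.1, 420.0), (3.3, 410.0), (2.9, 433.0), (3.2, 415.0),  # last four: the other group
]

z = standardize(features)
pc1 = first_component(covariance(z))
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]

# Assumed clustering rule: split recordings by the sign of their PC1 score.
clusters = [score > 0 for score in scores]
```

On this synthetic data the sign of the PC1 score separates the two groups. Real recordings would be noisier, and which sign corresponds to which gait is arbitrary, so labels would still be needed to name the clusters.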


Bio-Inspired Modality Fusion for Active Speaker Detection

Applied Sciences ◽

10.3390/app11083397 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3397

Author(s):

Gustavo Assunção ◽

Nuno Gonçalves ◽

Paulo Menezes

Keyword(s):

Superior Colliculus ◽

Visual Information ◽

Human Beings ◽

Validation Process ◽

Detection Approach ◽

Wide Range ◽

Speaker Detection ◽

The One ◽

The Brain ◽

Fusion Ability

Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.


Contested or complementary healing paradigms? Women’s narratives of COVID-19 remedies in Mwanza, Tanzania

Journal of Ethnobiology and Ethnomedicine ◽

10.1186/s13002-021-00457-w ◽

2021 ◽

Vol 17 (1) ◽

Author(s):

Gerry Mshana ◽

Zaina Mchome ◽

Diana Aloyce ◽

Esther Peter ◽

Saidi Kapiga ◽

...

Keyword(s):

Belief Systems ◽

Emerging Infections ◽

Community Members ◽

Women's Narratives ◽

Traditional Remedies ◽

Traditional Therapies ◽

Audio Files ◽

Smart Mobile ◽

Depth Interviews ◽

African Setting

Abstract
Background: COVID-19 has caused worldwide fear and uncertainty. Historically, the biomedical disease paradigm established its dominance in tackling emerging infectious illnesses mainly due to innovation in medication and advances in technology. Traditional and religious remedies have emerged as plausible options for prevention and treatment of COVID-19, especially in Africa and Asia. The appeal of religious and traditional therapies against COVID-19 in the African setting must be understood within the historical, social, and political context. This study explored how women and community members dealt with suspected symptoms of COVID-19 in Mwanza, Tanzania.
Methods: This study was conducted in Nyamagana and Ilemela districts of Mwanza, Tanzania, between July and August 2020. We conducted 18 mobile phone in-depth interviews with a purposively selected sample of women aged 27–57 years participating in an existing longitudinal study. For safety reasons, smart mobile phones were used to collect the data. Each interview was audio recorded after obtaining verbal consent from the participants. The audio files were transferred to computers for analysis. Four researchers conducted a multistage, inductive analysis of the data.
Results: Participants reported wide use and perceived high efficacy of traditional remedies and prayer to prevent and treat suspected symptoms of COVID-19. Use was either alone or combined with public health recommendations such as hand washing and crowd avoidance. Despite acknowledging that a pathogen causes COVID-19, participants attested to the relevance and power of traditional herbal medication and prayer to curb COVID-19. Four main factors underline the symbolic efficacy of the traditional and religious treatment paradigms: personal, communal, and official reinforcement of their efficacy; connection to local knowledge and belief systems; the failure of biomedicine to offer a quick and effective solution; and availability.
Conclusions: In the context of emerging contagious illnesses, communities turn to resilient and trusted treatment paradigms to quell fear and embrace hope. To tackle emerging infections effectively, it is essential to engage the broader sociopolitical landscape, including communal considerations of therapeutic efficacy.


The Distinct Wrong of Deepfakes

Philosophy & Technology ◽

10.1007/s13347-021-00459-2 ◽

2021 ◽

Author(s):

Adrienne de Ruiter

Keyword(s):

Digital Data ◽

Main Argument ◽

Ethical Challenges ◽

Ethical Implications ◽

Digital Representations ◽

Distinctive Aspect ◽

Moral Dimensions ◽

Audio Files ◽

Potential Use ◽

Identity Protection

Abstract: Deepfake technology presents significant ethical challenges. The ability to produce realistic looking and sounding video or audio files of people doing or saying things they did not do or say brings with it unprecedented opportunities for deception. The literature that addresses the ethical implications of deepfakes raises concerns about their potential use for blackmail, intimidation, and sabotage, ideological influencing, and incitement to violence as well as broader implications for trust and accountability. While this literature importantly identifies and signals the potentially far-reaching consequences, less attention is paid to the moral dimensions of deepfake technology and deepfakes themselves. This article will help fill this gap by analysing whether deepfake technology and deepfakes are intrinsically morally wrong, and if so, why. The main argument is that deepfake technology and deepfakes are morally suspect, but not inherently morally wrong. Three factors are central to determining whether a deepfake is morally problematic: (i) whether the deepfaked person(s) would object to the way in which they are represented; (ii) whether the deepfake deceives viewers; and (iii) the intent with which the deepfake was created. The most distinctive aspect that renders deepfakes morally wrong is when they use digital data representing the image and/or voice of persons to portray them in ways in which they would be unwilling to be portrayed. Since our image and voice are closely linked to our identity, protection against the manipulation of hyper-realistic digital representations of our image and voice should be considered a fundamental moral right in the age of deepfakes.


The French-Farsi Simultaneous Early Bilingualism in an Iranian Child—Study on the Regularity of the Presence of the Minority Language in the First Lexical Productions of a Bilingual Child

International Education Studies ◽

10.5539/ies.v10n2p156 ◽

2017 ◽

Vol 10 (2) ◽

pp. 156

Author(s):

Sahar Jalilian ◽

Rouhollah Rahmatian ◽

Parivash Safa ◽

Roya Letafati

Keyword(s):

Foreign Language ◽

Bilingual Education ◽

Mother Tongue ◽

Parental Attitudes ◽

Environmental Issues ◽

Minority Language ◽

Child Study ◽

Iranian Child ◽

Audio Files ◽

A Minor

In simultaneous bilingual education, many factors can affect success, primarily the age of the child and socio-cognitive elements. This phenomenon can be studied initially in a child’s first lexical productions in either language. The present study focuses on the early lexical development of a child who lives in the monolingual society of Iran, where there is no linguistic milieu for French, and who has been exposed to bilingual education since birth. Applying Ronjat’s principle of “one parent-one language” (1913), the parents have shaped the child’s basic linguistic interactions: the father uses Farsi, his mother tongue, in his interactions with the child, while the mother uses French, a foreign language for her. The data were collected from audio files recorded when the child was between 18 and 36 months old, containing her everyday interactions with her parents. By analysing how the presence of minority-language words (i.e. French) in the child’s sentences changes with age, the study raises questions about the conditions for a persistent presence of both languages and the reasons why one language becomes a minor means of communication, taking into account parental attitudes and environmental issues that can influence language acquisition.


Research on A New Method for Mixing Play Audio Files Data

2010 International Conference on Internet Technology and Applications ◽

10.1109/itapp.2010.5566172 ◽

2010 ◽

Author(s):

Jingyang Wang ◽

Min Huang ◽

Huiyong Wang ◽

Xiaohong Wang ◽

Pingmian Kou

Keyword(s):

New Method ◽

Audio Files


A Design of Embedded Digital Photo Frame

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.3129 ◽

2013 ◽

Vol 380-384 ◽

pp. 3129-3132

Author(s):

Ying Zhang

Keyword(s):

Operating System ◽

Touch Screen ◽

Digital Photo ◽

Audio Files ◽

Hardware Circuit ◽

Magic Lantern

To help transportation enterprises strengthen their competitiveness, this study proposes a digital photo frame based on the ARM9-family processor chip S3C2440, the Linux operating system, and Qt/Embedded, aiming to distinguish it from traditional frames in function, capacity, and operability. It focuses on the hardware circuit and the key software components, which play photos in many formats like a magic lantern via the touch screen, as well as audio files. Experimental results show that the digital photo frame works steadily, is easy to operate, and is highly extensible: adding applications to it enables many other functions.


Introducing Language and Society

10.1017/9781108689922 ◽

2021 ◽

Author(s):

Rodney H. Jones ◽

Christiana Themistocleous

Keyword(s):

Digital Media ◽

Real World ◽

Political Conflict ◽

Gender And Sexuality ◽

Small Scale ◽

Research Projects ◽

New Developments ◽

Online Access ◽

Video And Audio ◽

Audio Files

This accessible and entertaining textbook introduces students to both traditional and more contemporary approaches to sociolinguistics in a real-world context, addressing current social problems that students are likely to care about, such as racism, inequality, political conflict, belonging, and issues around gender and sexuality. Each chapter includes exercises, case studies and ideas for small-scale research projects, encouraging students to think critically about the different theories and approaches to language and society, and to interrogate their own beliefs about language and communication. The book gives students a grounding in the traditional concepts and techniques upon which sociolinguistics is built, while also introducing new developments from the last decade, such as translanguaging, multimodality, superdiversity, linguistic landscapes and language and digital media. Students will also have online access to more detailed examples, links to video and audio files, and more challenging exercises to strengthen their skills and confidence as sociolinguists.
