auxiliary input
Recently Published Documents


TOTAL DOCUMENTS: 76 (FIVE YEARS 15)

H-INDEX: 15 (FIVE YEARS 2)

2021 ◽  
Vol 11 (18) ◽  
pp. 8321
Author(s):  
Zongming Liu ◽  
Zhihua Huang ◽  
Li Wang ◽  
Pengyuan Zhang

Vowel reduction is a common pronunciation phenomenon in stress-timed languages like English: native speakers tend to weaken unstressed vowels into a schwa-like sound. It is an essential factor in why the accent of language learners sounds unnatural. To improve vowel reduction detection in a phoneme recognition framework, we propose an end-to-end vowel reduction detection method that introduces pronunciation prior knowledge as auxiliary information. In particular, we have designed two methods for automatically generating pronunciation prior sequences from reference texts and have implemented a main and auxiliary encoder structure that uses hierarchical attention mechanisms to combine the pronunciation prior information and the acoustic information dynamically. We also propose a method for feature enhancement after encoding that uses the attention mechanism between different streams to obtain expanded multi-streams. Compared with the HMM-DNN hybrid method and the general end-to-end method, the average F1 score of our approach for the two types of vowel reduction detection increased by 8.8% and 6.9%, respectively. The overall phoneme recognition rate increased by 5.8% and 5.0%, respectively. The experiments further analyze why the pronunciation prior knowledge auxiliary input is effective and how different types of pronunciation prior knowledge affect performance.
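The idea of hierarchical attention over a main (acoustic) stream and an auxiliary (pronunciation prior) stream can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the toy two-dimensional vectors, and the names `attend` and `hierarchical_attention` are assumptions chosen only to show the two-level weighting.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def attend(query, frames):
    # Frame-level attention: weighted sum of frames within one stream.
    weights = softmax([dot(query, f) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

def hierarchical_attention(query, streams):
    # Level 1: one context vector per stream.
    contexts = [attend(query, frames) for frames in streams]
    # Level 2: stream-level attention decides how much each stream contributes.
    stream_weights = softmax([dot(query, c) for c in contexts])
    dim = len(contexts[0])
    fused = [sum(w * c[d] for w, c in zip(stream_weights, contexts)) for d in range(dim)]
    return fused, stream_weights

# Hypothetical toy encoder outputs (assumed values, not data from the paper).
acoustic = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]   # main encoder frames
prior = [[0.0, 1.0], [0.2, 0.8]]                  # auxiliary encoder frames
decoder_state = [1.0, 0.0]

fused, stream_weights = hierarchical_attention(decoder_state, [acoustic, prior])
print(stream_weights)
```

Because the stream weights are recomputed for every decoder state, the relative contribution of the acoustic and prior streams can change at each output step, which is what "dynamically" refers to here.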


Author(s):  
Jizhizi Li ◽  
Jing Zhang ◽  
Dacheng Tao

Automatic image matting (AIM) refers to estimating the soft foreground from an arbitrary natural image without any auxiliary input like trimap, which is useful for image editing. Prior methods try to learn semantic features to aid the matting process while being limited to images with salient opaque foregrounds such as humans and animals. In this paper, we investigate the difficulties when extending them to natural images with salient transparent/meticulous foregrounds or non-salient foregrounds. To address the problem, a novel end-to-end matting network is proposed, which can predict a generalized trimap for any image of the above types as a unified semantic representation. Simultaneously, the learned semantic features guide the matting network to focus on the transition areas via an attention mechanism. We also construct a test set AIM-500 that contains 500 diverse natural images covering all types along with manually labeled alpha mattes, making it feasible to benchmark the generalization ability of AIM models. Results of the experiments demonstrate that our network trained on available composite matting datasets outperforms existing methods both objectively and subjectively. The source code and dataset are available at https://github.com/JizhiziLi/AIM.


Author(s):  
Atsushi Ando ◽  
Takeshi Mori ◽  
Satoshi Kobashikawa ◽  
Tomoki Toda

This paper presents a novel speech emotion recognition scheme that leverages the individuality of emotion perception. Most conventional methods simply poll multiple listeners and directly model the majority decision as the perceived emotion. However, emotion perception varies with the listener, which forces the conventional methods with their single models to create complex mixtures of emotion perception criteria. In order to mitigate this problem, we propose a majority-voted emotion recognition framework that constructs listener-dependent (LD) emotion recognition models. The LD model can estimate not only listener-wise perceived emotion, but also majority decision by averaging the outputs of the multiple LD models. Three LD models, fine-tuning, auxiliary input, and sub-layer weighting, are introduced, all of which are inspired by successful domain-adaptation frameworks in various speech processing tasks. Experiments on two emotional speech datasets demonstrate that the proposed approach outperforms the conventional emotion recognition frameworks in not only majority-voted but also listener-wise perceived emotion recognition.
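The majority-decision step described above, averaging the outputs of several LD models, can be sketched as follows. The emotion labels, the posterior values, and the function name `majority_from_listeners` are hypothetical and serve only as an illustration; the paper's actual models and label set may differ.

```python
def majority_from_listeners(ld_posteriors):
    """Average per-listener class posteriors; return (averaged, argmax index)."""
    n_models = len(ld_posteriors)
    n_classes = len(ld_posteriors[0])
    averaged = [
        sum(p[c] for p in ld_posteriors) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, best

# Assumed label set and made-up posteriors from three listener-dependent models.
EMOTIONS = ["neutral", "happy", "sad", "angry"]
ld_outputs = [
    [0.10, 0.70, 0.10, 0.10],  # listener A
    [0.50, 0.30, 0.10, 0.10],  # listener B
    [0.20, 0.60, 0.10, 0.10],  # listener C
]

averaged, best = majority_from_listeners(ld_outputs)
print(EMOTIONS[best])
```

Each row remains available as a listener-wise prediction, while the averaged vector stands in for the majority decision.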


2020 ◽  
Vol 103 ◽  
pp. 104596
Author(s):  
Wenyu Xiong ◽  
Jie Ye ◽  
Qichangyi Gong ◽  
Han Feng ◽  
Jinbang Xu ◽  
...  

2020 ◽  
Vol 31 (05) ◽  
pp. 551-567
Author(s):  
Juyan Li ◽  
Chunguang Ma ◽  
Zhen Gu

Proxy Re-Encryption (PRE) is a cryptographic primitive that allows a proxy to transform a ciphertext under Alice's key into a ciphertext under Bob's key for the same plaintext. All existing PRE schemes are public key encryption schemes with semantic security. Deterministic Public Key Encryption (D-PKE) provides an alternative to randomized public key encryption in various scenarios where the latter exhibits inherent drawbacks. In this paper, we construct the first multi-use unidirectional D-PRE scheme from lattices in the auxiliary-input setting. We also prove that it is PRIV1-INDr secure in the standard model based on the LWR assumption. Finally, an identity-based D-PRE is obtained from the basic construction.
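LWR (Learning With Rounding) replaces the random error term of LWE with deterministic rounding from Z_q to Z_p, which is why it suits deterministic encryption. The sketch below shows only that rounding step; it is not the D-PRE scheme from the paper, and the parameters `q`, `p`, `n` are toy values assumed for illustration, far too small to be secure.

```python
import random

def round_to_p(x, q, p):
    # Scale x in Z_q by p/q, round to the nearest integer, reduce mod p.
    return ((x * p + q // 2) // q) % p

def lwr_sample(a, s, q, p):
    # LWR sample value: rounded inner product <a, s> mod q, with no added noise.
    inner = sum(ai * si for ai, si in zip(a, s)) % q
    return round_to_p(inner, q, p)

# Toy parameters (assumed, insecure).
q, p, n = 4096, 256, 8
rng = random.Random(0)
s = [rng.randrange(q) for _ in range(n)]  # secret vector
a = [rng.randrange(q) for _ in range(n)]  # public vector

b1 = lwr_sample(a, s, q, p)
b2 = lwr_sample(a, s, q, p)
print(b1 == b2)  # the same inputs always give the same output
```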


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1524 ◽  
Author(s):  
Jau-Woei Perng ◽  
Tung-Li Hsieh

The main purpose of this study was to create an acousto-optic control lock device that converts a specific sound command into electrical signals using an acousto-optic conversion module, thereby improving the reliability and safety of opening or closing remote-controlled door locks, such as car central locks or rolling doors. Using music played from a smartphone, we built a special laser pointer that connects to the smartphone’s auxiliary input. The laser pointer (wavelength of 630–650 nm and maximum output of 5 mW) lights up when the smartphone’s music starts playing, with the light frequency matching the music frequency. When the solar panel receives the light, it converts the frequency of the light signal into an electrical frequency signal. The current is amplified by the power amplifier and then flows to the sound recognition module. The sound recognition module compares the audio against the preset sound signal, and once the comparison matches, the output voltage activates the electromagnetic switch on the door to open or close it.

