A Proposed Speaker Recognition Method Based on Long-Term Voice Features and Fuzzy Logic

2021 ◽  
Vol 39 (1B) ◽  
pp. 1-10
Author(s):  
Iman H. Hadi ◽  
Alia K. Abdul-Hassan

Speaker recognition depends on specific predefined steps, the most important of which are feature extraction and feature matching. In addition, the category of the speaker's voice features has an impact on the recognition process. The proposed speaker recognition method makes use of biometric (voice) attributes to recognize the identity of the speaker. Long-term features were used, namely maximum frequency, pitch, and zero-crossing rate (ZCR). In the feature matching step, the fuzzy inner product between feature vectors was used to compute the matching value between a claimed speaker's voice utterance and test voice utterances. The experiments were implemented using the ELSDSR data set. These experiments showed that the recognition accuracy is 100% when using text-dependent speaker recognition.
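The matching step described above can be illustrated with a minimal sketch. It assumes the common max-min definition of a fuzzy inner product over feature values scaled to [0, 1]; the abstract does not give the exact formula, so the function names, the scaling, and the sample vectors are all hypothetical.

```python
def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def fuzzy_inner_product(u, v):
    """Max-min inner product of two membership vectors in [0, 1].

    Assumption: this is the standard max-min form, not necessarily
    the exact operator used in the paper.
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return max(min(a, b) for a, b in zip(u, v))


# Hypothetical feature vectors: (max frequency, pitch, ZCR), each scaled to [0, 1].
claimed = [0.8, 0.4, 0.3]
test = [0.7, 0.5, 0.2]
score = fuzzy_inner_product(claimed, test)  # element-wise minima, then the maximum
```

A higher score would indicate a closer match between the claimed speaker and the test utterance; any decision threshold would have to be chosen on validation data.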

Geofluids ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Dongsheng Wang ◽  
Jun Feng ◽  
Xinpeng Zhao ◽  
Yeping Bai ◽  
Yujie Wang ◽  
...  

It is difficult to form a method for recognizing the degree of infiltration of a tunnel lining. To solve this problem, we propose a recognition method by using a deep convolutional neural network. We carry out laboratory tests, prepare cement mortar specimens with different saturation levels, simulate different degrees of infiltration of tunnel concrete linings, and establish an infrared thermal image data set with different degrees of infiltration. Then, based on a deep learning method, the data set is trained using the Faster R-CNN+ResNet101 network, and a recognition model is established. The experiments show that the recognition model established by the deep learning method can be used to select cement mortar specimens with different degrees of infiltration by using an accurately minimized rectangular outer frame. This model shows that the classification recognition model for tunnel concrete lining infiltration established by the indoor experimental method has high recognition accuracy.


Author(s):  
Kai Zhao ◽  
Dan Wang

To address the low recognition rate of existing speech recognition methods, a speech recognition method for a multi-layer perceptual network environment is proposed. In this environment, the speech signal is first filtered using the transfer function of the filter. The signal is then divided into frames and windowed, and the silent segments are removed. The average energy and the zero-crossing rate of the speech signal are calculated to extract its features. By analyzing the principle of speech signal recognition, the speech recognition process is designed, and speech recognition in the multi-layer perceptual network environment is realized. The experimental results show that the speech recognition method designed in this paper has good speech recognition performance.
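The preprocessing chain described above (filtering, framing, windowing, silence removal, energy and zero-crossing rate) might look roughly like the sketch below. The pre-emphasis filter, the Hamming window, and the energy threshold are assumptions chosen for illustration; the abstract does not specify which filter, window, or threshold the authors used.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order filter H(z) = 1 - alpha * z^-1 (an assumed choice of filter)."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[i] - alpha * signal[i - 1] for i in range(1, len(signal))
    ]


def split_frames(signal, frame_len, hop):
    """Cut the signal into fixed-length frames; a trailing partial frame is dropped."""
    return [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming_window(n):
    """Hamming window coefficients of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def average_energy(frame):
    """Mean of the squared samples in a frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def extract_features(signal, frame_len, hop, energy_threshold):
    """Return (energy, ZCR) per frame, skipping frames treated as silence."""
    filtered = pre_emphasis(signal)
    window = hamming_window(frame_len)
    features = []
    for frame in split_frames(filtered, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        energy = average_energy(windowed)
        if energy < energy_threshold:
            continue  # low-energy frame: treated as silence and dropped
        features.append((energy, zero_crossing_rate(frame)))
    return features


# Hypothetical toy signal: eight silent samples followed by an alternating tone.
toy_signal = [0.0] * 8 + [0.5, -0.5] * 4
toy_features = extract_features(toy_signal, frame_len=4, hop=4, energy_threshold=1e-3)
```

In practice the frame length and hop would be set in samples from the sampling rate (for example, frames of a few tens of milliseconds), and the silence threshold would be tuned to the recording conditions.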


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4850
Author(s):  
Mingyu Gao ◽  
Chao Chen ◽  
Jie Shi ◽  
Chun Sing Lai ◽  
Yuxiang Yang ◽  
...  

Effective traffic sign recognition algorithms can assist drivers or automatic driving systems in detecting and recognizing traffic signs in real time. This paper proposes a multiscale recognition method for traffic signs based on the Gaussian Mixture Model (GMM) and Category Quality Focal Loss (CQFL) to enhance recognition speed and recognition accuracy. Specifically, GMM is utilized to cluster the prior anchors, which helps reduce the clustering error. Meanwhile, considering the most common issue in supervised learning (i.e., the imbalance of data set categories), a category proportion factor is introduced into Quality Focal Loss; the resulting loss is referred to as CQFL. Furthermore, a five-scale recognition network with a prior anchor allocation strategy is designed for small target objects, i.e., traffic signs. By combining five existing tricks, the best speed and accuracy tradeoff on our data set (40.1% mAP and 15 FPS on a single 1080Ti GPU) can be achieved. The experimental results demonstrate that the proposed method is superior to the existing mainstream algorithms in terms of recognition accuracy and recognition speed.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiaoxiao Song ◽  
Xiangyun Qiao ◽  
Dongmei Hao ◽  
Lin Yang ◽  
Xiya Zhou ◽  
...  

Uterine contraction (UC) is an essential clinical indicator in the progress of labour and delivery. Electrohysterogram (EHG) signals recorded on the abdomen of pregnant women reflect the uterine electrical activity. This study proposes a novel algorithm for automatic recognition of UCs with EHG signals to improve the accuracy of detecting UCs. EHG signals by electrodes, the tension of the abdominal wall by tocodynamometry (TOCO) and maternal perception were recorded simultaneously in 54 pregnant women. The zero-crossing rate (ZCR) of the EHG signal and its power were calculated to modulate the raw EHG signal and highlight the EHG bursts. Then the envelope was extracted from the modulated EHG for UC recognition. In addition, UCs were detected from the conventional TOCO signal. Taking maternal perception as a reference, the UCs recognized by EHG and TOCO were evaluated with the sensitivity, positive predictive value (PPV), and UC parameters. The results show that the sensitivity and PPV are 87.8% and 93.18% for EHG, and 84.04% and 90.89% for TOCO. EHG detected a larger number of UCs than TOCO, which is closer to maternal perception. The duration and frequency of UC obtained from EHG and TOCO were not significantly different (p > 0.05). In conclusion, the proposed UC recognition algorithm has high accuracy and is simple to compute, so it could be used for real-time analysis of EHG signals and long-term monitoring of UCs.
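The evaluation against a reference annotation can be sketched as follows. The greedy time-tolerance matching, the tolerance value, and the event times are hypothetical illustrations; the abstract only states that maternal perception served as the reference for sensitivity and PPV.

```python
def match_events(detected, reference, tolerance):
    """Greedy one-to-one matching of event times within +/- tolerance.

    Returns (true positives, false positives, false negatives).
    Assumption: events are represented by a single time stamp each.
    """
    unmatched = sorted(reference)
    tp = 0
    for t in sorted(detected):
        hit = next((r for r in unmatched if abs(r - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)  # each reference event is matched at most once
            tp += 1
    fp = len(detected) - tp
    fn = len(unmatched)
    return tp, fp, fn


def sensitivity(tp, fn):
    """Share of reference events that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def positive_predictive_value(tp, fp):
    """Share of detections that match a reference event: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical contraction times in seconds.
detected_times = [10, 62, 130, 200]
reference_times = [12, 60, 128, 250]
tp, fp, fn = match_events(detected_times, reference_times, tolerance=5)
```

With these made-up times, three detections fall within the tolerance of a reference event, one detection is spurious, and one reference event is missed.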


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xiaoying Shen ◽  
Chao Yuan

With the development of the live broadcast industry, security issues in the live broadcast process have become increasingly apparent. At present, the supervision of various live broadcast platforms is basically in a state of human supervision. Manpower supervision is mainly through user reporting and platform supervision measures. However, there are a large number of live broadcast rooms at the same time, and only relying on human supervision can no longer meet the monitoring needs of live broadcasts. Based on this situation, this study proposes a violation information recognition method of a live-broadcasting platform based on machine learning technology. By analyzing the similarities and differences between normal live broadcasts and violation live broadcasts, combined with the characteristics of violation image data, this study mainly detects human skin color and sensitive parts. A prominent feature of violation images is that they contain a large area of naked skin, and the ratio of the area of naked skin to the overall image area of the violation image will exceed the threshold. Skin color recognition plays a role in initial target positioning. The accuracy of skin color recognition is directly related to the recognition accuracy of the entire system, so skin color recognition is the most important part of violation information recognition. Although there are many effective skin color recognition technologies, the accuracy and stability of skin color recognition still need to be improved due to the influence of various external factors, such as light intensity, light source color, and physical equipment. When it is detected that the area of the skin color in the live screen exceeds the threshold, it is preliminarily determined to be a suspected violation video. In order to improve the recognition accuracy, it is necessary to detect sensitive parts of the suspected video. Naked female breasts are a very obvious feature in violation images. 
This study uses a chest feature extraction method to detect the chest in the image. When the recognition result is a violation image, it is determined that the live broadcast involves violation content. The machine learning algorithm is simple to implement, and the parameters are easy to adjust. The classifier training requires a short time and is suitable for live violation information recognition scenarios. The experimental results on the adopted data set show that the method used in this article can effectively detect videos with violation content. The recognition rate is as high as 85.98%, which is suitable for a real-life environment and has good practical significance.
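The first screening stage, flagging a frame when the skin-colored area exceeds a share of the image, can be sketched as below. The fixed RGB rule is one commonly cited heuristic and is only an assumption here, as is the threshold value; the abstract does not state which skin color model or threshold the authors used.

```python
def is_skin_pixel(r, g, b):
    """Fixed-threshold RGB skin heuristic (assumed; not the paper's model)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_area_ratio(pixels):
    """Ratio of skin-classified pixels to all pixels in a flat list of (r, g, b)."""
    if not pixels:
        return 0.0
    skin = sum(1 for r, g, b in pixels if is_skin_pixel(r, g, b))
    return skin / len(pixels)


def is_suspected_frame(pixels, ratio_threshold=0.3):
    """Flag the frame for second-stage checks when the ratio exceeds the threshold.

    The default threshold is a hypothetical placeholder.
    """
    return skin_area_ratio(pixels) > ratio_threshold


# Hypothetical four-pixel frame: two skin-like pixels, one dark, one blue.
frame = [(200, 120, 100), (200, 120, 100), (10, 10, 10), (0, 0, 255)]
ratio = skin_area_ratio(frame)
```

A fixed color rule like this is sensitive to lighting and camera differences, which is consistent with the abstract's remark that skin color recognition accuracy is affected by external factors.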


2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Guowei Wang ◽  
Haiye Yu ◽  
Yuanyuan Sui

To address the accuracy and speed of disease identification during real-time spraying operations in maize fields, an improved ResNet50 maize disease identification model is proposed. First, the model is optimized with the Adam algorithm, the learning strategy is adjusted through a slanted triangular learning rate, L2 regularization is added to reduce overfitting, and a dropout strategy and the ReLU activation function are adopted. Then, the first convolution kernel of the ResNet50 model is replaced with three small 3 x 3 convolution kernels. Finally, the ratio of the training set to the validation set is set to 3 : 1. Experimental comparison shows that the recognition accuracy of the proposed maize disease recognition model is higher than that of other models. The image recognition accuracy on the data set is 98.52%, the image recognition accuracy in the farmland is 97.826%, and the average recognition speed is 204 ms, which meets the accuracy and speed requirements of maize field spraying operations and provides technical support for research on maize field spraying equipment.
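The learning rate schedule mentioned above is assumed here to be the slanted triangular learning rate, which rises linearly for a short warm-up and then decays linearly. The sketch below follows the usual formulation of that schedule; the parameter values (peak rate, warm-up fraction, ratio) are hypothetical defaults, not values reported in the abstract.

```python
def slanted_triangular_lr(step, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at a given step under a slanted triangular schedule.

    cut_frac: fraction of steps spent increasing the rate.
    ratio: how much smaller the lowest rate is than lr_max.
    All defaults are illustrative assumptions.
    """
    cut = max(1, int(total_steps * cut_frac))  # step index where the peak is reached
    if step < cut:
        p = step / cut  # linear warm-up from 0 to 1
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))  # linear decay back to 0
    p = max(0.0, p)
    return lr_max * (1 + p * (ratio - 1)) / ratio


# Rates at the start, the peak, and the end of a hypothetical 100-step run.
lr_start = slanted_triangular_lr(0, 100)
lr_peak = slanted_triangular_lr(10, 100)
lr_end = slanted_triangular_lr(100, 100)
```

The rate starts and ends at `lr_max / ratio` and peaks at `lr_max` once the warm-up fraction has elapsed. The value returned at each step would be passed to the optimizer (Adam in the abstract) before that update.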


2020 ◽  
Vol 64 (4) ◽  
pp. 40404-1-40404-16
Author(s):  
I.-J. Ding ◽  
C.-M. Ruan

With rapid developments in techniques related to the internet of things, smart service applications such as voice-command-based speech recognition and smart care applications such as context-aware-based emotion recognition will gain much attention and potentially be a requirement in smart home or office environments. In such intelligence applications, identity recognition of the specific member in indoor spaces will be a crucial issue. In this study, a combined audio-visual identity recognition approach was developed. In this approach, visual information obtained from face detection was incorporated into acoustic Gaussian likelihood calculations for constructing speaker classification trees to significantly enhance the Gaussian mixture model (GMM)-based speaker recognition method. This study considered the privacy of the monitored person and reduced the degree of surveillance. Moreover, the popular Kinect sensor device containing a microphone array was adopted to obtain acoustic voice data from the person. The proposed audio-visual identity recognition approach deploys only two cameras in a specific indoor space for conveniently performing face detection and quickly determining the total number of people in the specific space. Such information pertaining to the number of people in the indoor space obtained using face detection was utilized to effectively regulate the accurate GMM speaker classification tree design. Two face-detection-regulated speaker classification tree schemes are presented for the GMM speaker recognition method in this study: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). The proposed GMM-BT and GMM-NBT methods achieve excellent identity recognition rates of 84.28% and 83%, respectively; both values are higher than the rate of the conventional GMM approach (80.5%).
Moreover, as the extremely complex calculations of face recognition in general audio-visual speaker recognition tasks are not required, the proposed approach is rapid and efficient with only a slight increment of 0.051 s in the average recognition time.

