A Study on the Robustness of Pitch-Range Estimation from Brief Speech Segments

2020 ◽  
Vol 30 (01) ◽  
pp. 2050003
Author(s):  
Wenjie Peng ◽  
Kaiqi Fu ◽  
Wei Zhang ◽  
Yanlu Xie ◽  
Jinsong Zhang

Pitch-range estimation from brief speech segments could benefit many tasks, such as automatic speech recognition and speaker recognition. To estimate pitch range, previous studies proposed deep-learning-based models that take spectrum information as input. They demonstrated that such a method works and can still achieve reliable estimates when the speech segment is as brief as 300 ms. In this study, we evaluated the robustness of this method. We took the following scenarios into account: (1) a large number of training speakers; (2) different language backgrounds; and (3) monosyllabic utterances with different tones. Experimental results showed that: (1) using a large number of training speakers improved the estimation accuracy; (2) the mean absolute percentage error (MAPE) evaluated on the L2 speakers is similar to that on the native speakers; and (3) different tonal information affects the LSTM-based model, but this influence is limited compared to the baseline method, which calculates pitch-range targets from the distribution of F0 values. These experimental results verified the efficiency of the LSTM-based pitch-range estimation method.
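The MAPE metric named in the abstract above can be sketched as follows. This is a minimal illustration under the standard definition of mean absolute percentage error; the function name and the sample values are hypothetical and do not come from the study.

```python
def mape(reference, estimate):
    """Mean absolute percentage error, in percent, under the standard definition.

    `reference` holds ground-truth pitch-range targets and `estimate` holds
    the model outputs; both names are illustrative assumptions.
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the per-item relative errors, then express the result in percent.
    total = sum(abs(e - r) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)


# Hypothetical pitch-range targets in Hz (not data from the paper).
reference_hz = [100.0, 200.0]
estimated_hz = [110.0, 180.0]
error_percent = mape(reference_hz, estimated_hz)
```

Because each error is divided by its own reference value, the metric is comparable across speakers with different pitch ranges, which is presumably why it suits a native versus L2 comparison.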

2020 ◽  
Vol 12 (3) ◽  
pp. 352 ◽  
Author(s):  
WenFang Ye ◽  
Chuang Qian ◽  
Jian Tang ◽  
Hui Liu ◽  
XiaoYun Fan ◽  
...  

Detailed structural information under the forest canopy is important for forestry surveying. As a high-precision environmental sensing and measurement method, terrestrial laser scanning (TLS) is widely used in forestry surveying. In TLS-based forestry surveys, stem mapping, which focuses on detecting and extracting trunks, is one of the core data processing tasks and the basis for the subsequent calculation of tree attributes; one of the most basic attributes is the diameter at breast height (DBH). This article explores and improves the methods for stem mapping and DBH estimation from TLS data. Firstly, an improved 3D stem mapping algorithm that considers the growth direction in random sample consensus (RANSAC) cylinder fitting is proposed to extract and fit the individual tree point cloud section. It constructs the hierarchical optimum cylinder of the trunk and introduces the growth direction into the establishment of the backbone buffer in the next layer. Experimental results show that it effectively removes most of the branches, reduces their interference with the discrimination of trunks, and improves the integrity of stem extraction by about 36%. Secondly, a robust least squares ellipse fitting method based on the elliptic hypothesis is proposed for DBH estimation. Experimental results show that the proposed method improves DBH estimation accuracy over the other methods. Its mean root mean squared error (RMSE) is 1.14 cm, compared with mean RMSEs of 1.70, 2.03, and 2.14 cm for the other methods. Its mean relative accuracy is 95.2%, compared with 92.9%, 91.9%, and 90.9% for the other methods.
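To make the idea of estimating DBH from a trunk cross-section concrete, here is a minimal sketch that fits a circle to 2D slice points by algebraic least squares. This is a deliberately simpler stand-in, not the robust ellipse fitting method the abstract proposes; the function names and the synthetic points are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (cx, cy, radius).

    Minimizes sum((x^2 + y^2 + D*x + E*y + F)^2) over D, E, F by solving
    the 3x3 normal equations with Cramer's rule.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    z = [-(x * x + y * y) for x, y in points]
    rhs = [
        sum(x * zi for (x, _), zi in zip(points, z)),
        sum(y * zi for (_, y), zi in zip(points, z)),
        sum(z),
    ]
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are too few or collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]  # replace one column with the right-hand side
        solution.append(det3(mc) / d)
    big_d, big_e, big_f = solution
    cx, cy = -big_d / 2.0, -big_e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - big_f)


# Synthetic breast-height slice: 12 points on a trunk of radius 0.15 m.
slice_points = [
    (2.0 + 0.15 * math.cos(2 * math.pi * k / 12),
     -1.0 + 0.15 * math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
_, _, radius_m = fit_circle(slice_points)
dbh_m = 2.0 * radius_m  # diameter at breast height under the circle assumption
```

A plain circle fit of this kind is sensitive to branch points and non-circular stems, which is the weakness that a robust ellipse fit is meant to address.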


2009 ◽  
Vol 4 (3) ◽  
pp. 303-335 ◽  
Author(s):  
Martin van Leerdam ◽  
Anna M.T. Bosman ◽  
Annette M.B. de Groot

Three experiments investigated whether perception of a spelling-to-sound inconsistent word such as MOOD involves coding of inappropriate phonology caused by knowledge of enemy neighbors (e.g., BLOOD) in non-native speakers. In a new bimodal matching task, Dutch-English bilinguals judged the correspondence between a printed English word and a speech segment that was or was not the printed word’s rime. Evidence for coding of inappropriate phonology was obtained with trials in which the speech segment was derived from an English enemy neighbor. In such trials, error rates increased significantly relative to control trials. This effect was also found when speech segments were derived from Dutch enemy neighbors, which suggests inappropriate coding of cross-language phonology. These findings are consistent with a strong phonological theory of word perception (Frost, 1998), in which phonological coding is essentially a language non-selective process.


2020 ◽  
Vol 16 (1) ◽  
pp. 155014771989956 ◽  
Author(s):  
Jie Wang ◽  
Chunfang Yang ◽  
Ping Wang ◽  
Xiaofeng Song ◽  
Jicang Lu

In digital steganography, because the JPEG cover image is difficult to estimate, it is still very hard to accurately locate the hidden message embedded in a JPEG image. Therefore, this study proposes a payload location method for a category of pseudo-random scrambled JPEG image steganography. To estimate the quantized discrete cosine transform coefficients in the cover JPEG image, a cover JPEG image estimation method based on co-frequency sub-image filtering is proposed. The proposed payload location method defines a general residual, uses the estimated cover JPEG image to compute the residuals, and then employs the mean residuals of multiple stego images embedded along the same path to distinguish the stego positions. The proposed cover JPEG image estimation method constructs 64 co-frequency sub-images and then filters the sub-images to estimate the cover JPEG image. Finally, using these methods, payload location algorithms are designed for two common JPEG image steganography algorithms: JSteg and F5. Experimental results show that the proposed location algorithms can effectively locate the stego positions in both JSteg and F5 steganography when the investigator possesses multiple stego images embedded along the same path. In addition, the location results can also be used to recover the steganography key to extract the embedded secret messages.
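The construction of 64 co-frequency sub-images can be sketched as below. The sketch assumes the quantized DCT coefficients are stored as a 2D array tiled in 8x8 blocks, so that each sub-image collects the coefficient at one frequency position from every block; the function name and data layout are assumptions, and the filtering step is not shown because the abstract does not specify it.

```python
def co_frequency_subimages(coeffs, block=8):
    """Split a block-tiled coefficient array into block*block sub-images.

    Sub-image (u, v) holds, for every block, the coefficient at in-block
    position (u, v), arranged in the same order as the blocks themselves.
    """
    height, width = len(coeffs), len(coeffs[0])
    if height % block or width % block:
        raise ValueError("dimensions must be multiples of the block size")
    return {
        (u, v): [
            [coeffs[i + u][j + v] for j in range(0, width, block)]
            for i in range(0, height, block)
        ]
        for u in range(block)
        for v in range(block)
    }


# Hypothetical 16x16 coefficient array (2x2 blocks); values encode position.
demo = [[row * 16 + col for col in range(16)] for row in range(16)]
subimages = co_frequency_subimages(demo)
dc_subimage = subimages[(0, 0)]  # one DC coefficient per block
```

Neighboring values within one sub-image share the same frequency and quantization step, which is what makes filtering across them a plausible way to estimate the cover coefficients.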


2020 ◽  
Vol 14 (4) ◽  
pp. 7396-7404
Author(s):  
Abdul Malek Abdul Wahab ◽  
Emiliano Rustighi ◽  
Zainudin A.

Various complex shapes of dielectric electro-active polymer (DEAP) actuators have been promoted for several types of applications. In this study, the actuation and mechanical dynamic characteristics of a new core-free flat DEAP soft actuator, developed by Danfoss PolyPower, were investigated. A DC voltage of up to 2000 V was supplied to identify the actuation characteristics of the actuator and compare them with the existing formula. The operational frequency of the actuator was determined by dynamic testing. The soft actuator was then modelled as a uniform bar rigidly fixed at one end and attached to a mass at the other end, and results from the theoretical model were compared with the experimental results. The deformation of the actuator was found to be proportional to the square of the voltage supplied. Experimental results and theory were not in good agreement for low and high voltage, with average percentage errors of 104% and 20.7%, respectively. The resonance frequency of the actuator was near 14 Hz. The added load mass, inhomogeneity, and initial tension significantly affected the resonance frequency of the soft actuator. The experimental results were consistent with the theoretical model at zero load. However, due to inhomogeneity, the frequency response function plot shows a poor prediction: the theoretical calculation departed further from the experimental results as the load increased, with an average percentage error of 15.7%. Hence, the proposed analytical procedure is not suitable for providing an accurate natural frequency for the DEAP soft actuator.
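The bar model mentioned above can be illustrated with the textbook result for longitudinal vibration of a uniform bar fixed at one end with a tip mass at the other, where the first natural frequency follows from x tan(x) = (bar mass) / (tip mass). This is a sketch under that assumption; the abstract does not state the exact equations used, and all material and geometry values below are hypothetical.

```python
import math


def first_root(mass_ratio, tol=1e-12):
    """Smallest positive x with x * tan(x) = mass_ratio, found by bisection.

    x * tan(x) increases from 0 to +inf on (0, pi/2), so the root is unique.
    """
    lo, hi = 0.0, math.pi / 2 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid * math.tan(mid) < mass_ratio:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def natural_frequency_hz(youngs_modulus, density, length, area, tip_mass):
    """First longitudinal natural frequency of a fixed bar with a tip mass."""
    wave_speed = math.sqrt(youngs_modulus / density)
    if tip_mass == 0:
        x = math.pi / 2  # fixed-free bar: f = c / (4 L)
    else:
        bar_mass = density * area * length
        x = first_root(bar_mass / tip_mass)
    return x * wave_speed / (2.0 * math.pi * length)


# Hypothetical soft-material values (not taken from the study).
f_unloaded = natural_frequency_hz(1e6, 1000.0, 0.1, 1e-4, 0.0)
f_loaded = natural_frequency_hz(1e6, 1000.0, 0.1, 1e-4, 0.01)
```

In this idealized model, adding load mass always lowers the natural frequency, and effects such as inhomogeneity or initial tension are not represented at all.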


2020 ◽  
Vol 39 (3) ◽  
pp. 407-437
Author(s):  
Markus Bader

In German, a verb selected by another verb normally precedes the selecting verb. Modal verbs in the perfect tense provide an exception to this generalization because they require the perfective auxiliary to occur in cluster-initial position according to prescriptive grammars. Bader and Schmid (2009b) have shown, however, that native speakers accept the auxiliary in all positions except the cluster-final one. Experimental results as well as corpus data indicate that verb cluster serialization is a case of free variation. I discuss how this variation can be accounted for, focusing on two mismatches between acceptability and frequency: First, slight acceptability advantages can turn into strong frequency advantages. Second, syntactic variants with basically zero frequency can still vary substantially in acceptability. These mismatches remain unaccounted for if acceptability is related to frequency on the level of whole sentence structures, as in Stochastic OT (Boersma and Hayes 2001). However, when the acceptability-frequency relationship is modeled on the level of individual weighted constraints, using harmony as link (see Pater 2009, for different harmony-based frameworks), the two mismatches follow given appropriate linking assumptions.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4839
Author(s):  
Aritz Bilbao-Jayo ◽  
Aitor Almeida ◽  
Ilaria Sergi ◽  
Teodoro Montanaro ◽  
Luca Fasano ◽  
...  

In this work, we compared two different approaches to tracking a person in indoor environments using a locating system based on BLE technology, with a smartphone and a smartwatch as monitoring devices. To do so, we present the system architecture we designed and describe how the different elements of the proposed system interact with each other. Moreover, we evaluated the system's performance by computing the mean percentage error in the detection of the indoor position. Finally, we present a novel location prediction system based on neural embeddings and a soft-attention mechanism, which is able to predict the user's next location with 67% accuracy.

