A Study on the Robustness of Pitch-Range Estimation from Brief Speech Segments

Pitch-range estimation from brief speech segments could benefit many tasks, such as automatic speech recognition and speaker recognition. To estimate pitch range, previous studies proposed deep-learning-based models that take spectrum information as input. They demonstrated that such a method works and can still achieve reliable estimates when the speech segment is as brief as 300 ms. In this study, we evaluate the robustness of this method under the following scenarios: (1) a large number of training speakers; (2) different language backgrounds; and (3) monosyllabic utterances with different tones. Experimental results showed that: (1) using a large number of training speakers improved estimation accuracy; (2) the mean absolute percentage error (MAPE) on the L2 speakers is similar to that on the native speakers; and (3) tonal information affects the LSTM-based model, but this influence is limited compared with the baseline method, which calculates pitch-range targets from the distribution of F0 values. These experimental results verified the efficiency of the LSTM-based pitch-range estimation method.
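As a minimal sketch of the evaluation metric named in the abstract, MAPE can be computed as shown here. The function name `mape` and the sample values are illustrative assumptions, not taken from the study; the targets are assumed to be non-zero quantities such as pitch-range bounds in Hz.

```python
def mape(targets, estimates):
    """Return the mean absolute percentage error, in percent.

    `targets` are reference values and `estimates` are model outputs.
    Both names are hypothetical; targets must be non-zero.
    """
    if len(targets) != len(estimates) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for target, estimate in zip(targets, estimates):
        # Relative error of one estimate with respect to its target.
        total += abs((estimate - target) / target)
    return 100.0 * total / len(targets)


# Made-up example values (Hz): each estimate is off by 10% of its target.
example_error = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by its own target, speakers with high and low pitch ranges contribute on the same relative scale.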

Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method

EURASIP Journal on Advances in Signal Processing ◽

10.1186/1687-6180-2012-67 ◽

2012 ◽

Vol 2012 (1) ◽

Cited By ~ 6

Author(s):

Azar Mahmoodzadeh ◽

Hamid Reza Abutalebi ◽

Hamid Soltanian-Zadeh ◽

Hamid Sheikhzadeh

Keyword(s):

Frequency Domain ◽

Single Channel ◽

Modulation Frequency ◽

Estimation Method ◽

Speech Separation ◽

Range Estimation ◽

Pitch Range

Improved 3D Stem Mapping Method and Elliptic Hypothesis-Based DBH Estimation from Terrestrial Laser Scanning Data

Remote Sensing ◽

10.3390/rs12030352 ◽

2020 ◽

Vol 12 (3) ◽

pp. 352 ◽

Cited By ~ 1

Author(s):

WenFang Ye ◽

Chuang Qian ◽

Jian Tang ◽

Hui Liu ◽

XiaoYun Fan ◽

...

Keyword(s):

High Precision ◽

Laser Scanning ◽

Forest Canopy ◽

Estimation Method ◽

Terrestrial Laser Scanning ◽

Growth Direction ◽

Relative Accuracy ◽

Experimental Results ◽

Individual Tree ◽

The Mean

Detailed structural information under the forest canopy is important for forestry surveying. As a high-precision environmental sensing and measurement method, terrestrial laser scanning (TLS) is widely used in high-precision forestry surveying. In TLS-based forestry surveys, stem mapping, which focuses on detecting and extracting trunks, is one of the core data processing tasks and the basis for the subsequent calculation of tree attributes; one of the most basic attributes is the diameter at breast height (DBH). This article explores and improves methods for stem mapping and DBH estimation from TLS data. First, an improved 3D stem mapping algorithm that considers the growth direction in random sample consensus (RANSAC) cylinder fitting is proposed to extract and fit the individual tree point cloud section. It constructs the hierarchical optimum cylinder of the trunk and introduces the growth direction into the establishment of the backbone buffer in the next layer. Experimental results show that it effectively removes most of the branches, reduces their interference with the discrimination of trunks, and improves the integrity of stem extraction by about 36%. Second, a robust least squares ellipse fitting method based on the elliptic hypothesis is proposed for DBH estimation. Experimental results show that the proposed method estimates DBH more accurately than the other methods compared: its mean root mean squared error (RMSE) is 1.14 cm, versus 1.70, 2.03, and 2.14 cm for the other methods, and its mean relative accuracy is 95.2%, versus 92.9%, 91.9%, and 90.9%.
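The DBH step described in this abstract fits an ellipse to a stem cross-section. The sketch below is a plain algebraic least-squares conic fit in NumPy, included only to illustrate the general idea of ellipse fitting; it is not the robust method proposed in the article. The function name `fit_ellipse_algebraic` and the synthetic cross-section points are assumptions.

```python
import numpy as np


def fit_ellipse_algebraic(x, y):
    """Fit a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 to 2D points.

    Plain (non-robust) algebraic least squares: the conic coefficients
    are the right singular vector with the smallest singular value.
    Returns the centre and the semi-axes (largest first).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, vt = np.linalg.svd(design)
    a, b, c, d, e, f = vt[-1]

    quad = np.array([[a, b / 2.0], [b / 2.0, c]])  # quadratic part
    lin = np.array([d, e])                         # linear part
    center = -0.5 * np.linalg.solve(quad, lin)
    const = f + 0.5 * lin @ center                 # conic value at the centre

    # In centred coordinates the conic reads v^T quad v = -const.
    ratios = -const / np.linalg.eigvalsh(quad)
    if np.any(ratios <= 0):
        raise ValueError("fitted conic is not an ellipse")
    semi_axes = np.sort(np.sqrt(ratios))[::-1]
    return center, semi_axes


# Synthetic, noise-free cross-section: centre (1, 2), semi-axes 3 and 2,
# rotated by 0.5 rad. These values are made up for illustration.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
angle = 0.5
px = 1.0 + 3.0 * np.cos(t) * np.cos(angle) - 2.0 * np.sin(t) * np.sin(angle)
py = 2.0 + 3.0 * np.cos(t) * np.sin(angle) + 2.0 * np.sin(t) * np.cos(angle)
center, semi_axes = fit_ellipse_algebraic(px, py)
```

On real TLS slices, branch points and noise act as outliers, which is presumably why the article proposes a robust variant; the plain fit here has no such protection.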

A Study on the Robustness of Pitch Range Estimation from Brief Speech Segments

2019 International Conference on Asian Language Processing (IALP) ◽

10.1109/ialp48816.2019.9037713 ◽

2019 ◽

Author(s):

Wenjie Peng ◽

Kaiqi Fu ◽

Wei Zhang ◽

Yanlu Xie ◽

Jinsong Zhang

Keyword(s):

Range Estimation ◽

Pitch Range ◽

Speech Segments

When MOOD rhymes with ROAD

The Mental Lexicon ◽

10.1075/ml.4.3.01van ◽

2009 ◽

Vol 4 (3) ◽

pp. 303-335 ◽

Cited By ~ 2

Author(s):

Martin van Leerdam ◽

Anna M.T. Bosman ◽

Annette M.B. de Groot

Keyword(s):

Native Speakers ◽

English Word ◽

Error Rates ◽

Matching Task ◽

Phonological Coding ◽

Speech Segment ◽

Inconsistent Word ◽

Speech Segments ◽

Cross Language ◽

Selective Process

Three experiments investigated whether perception of a spelling-to-sound inconsistent word such as MOOD involves coding of inappropriate phonology caused by knowledge of enemy neighbors (e.g., BLOOD) in non-native speakers. In a new bimodal matching task, Dutch-English bilinguals judged the correspondence between a printed English word and a speech segment that was or was not the printed word’s rime. Evidence for coding of inappropriate phonology was obtained with trials in which the speech segment was derived from an English enemy neighbor. In such trials, error rates increased significantly relative to control trials. This effect was also found when speech segments were derived from Dutch enemy neighbors, which suggests inappropriate coding of cross-language phonology. These findings are consistent with a strong phonological theory of word perception (Frost, 1998), in which phonological coding is essentially a language non-selective process.

Payload location for JPEG image steganography based on co-frequency sub-image filtering

International Journal of Distributed Sensor Networks ◽

10.1177/1550147719899569 ◽

2020 ◽

Vol 16 (1) ◽

pp. 155014771989956 ◽

Cited By ~ 1

Author(s):

Jie Wang ◽

Chunfang Yang ◽

Ping Wang ◽

Xiaofeng Song ◽

Jicang Lu

Keyword(s):

Discrete Cosine Transform ◽

Estimation Method ◽

Image Filtering ◽

Image Steganography ◽

Experimental Results ◽

Jpeg Image ◽

Image Estimation ◽

Location Method ◽

Transform Coefficients ◽

The Mean

In digital steganography, because the JPEG cover image is difficult to estimate, it is still very hard to accurately locate the hidden message embedded in a JPEG image. Therefore, this study proposes a payload location method for a category of pseudo-random scrambled JPEG image steganography. To estimate the quantized discrete cosine transform coefficients in the cover JPEG image, a cover JPEG image estimation method is proposed based on co-frequency sub-image filtering. The proposed payload location method defines a general residual, uses the estimated cover JPEG image to compute the residuals, and then employs the mean residuals of multiple stego images embedded along the same path to distinguish the stego positions. The proposed cover JPEG image estimation method constructs 64 co-frequency sub-images and then filters the sub-images to estimate the cover JPEG image. Finally, using these methods, payload location algorithms are designed for two common JPEG image steganography algorithms: JSteg and F5. Experimental results show that the proposed location algorithms can effectively locate the stego positions in both JSteg and F5 steganography when the investigator possesses multiple stego images embedded along the same path. In addition, the location results can also be used to recover the steganography key to extract the embedded secret messages.
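One plausible reading of the "64 co-frequency sub-images" is that each sub-image collects the coefficient at one fixed position of every 8x8 DCT block. The sketch below rearranges a coefficient array that way under that assumption; the function name `co_frequency_subimages` and the toy array are hypothetical, and the filtering step is not shown because the abstract does not specify it.

```python
import numpy as np


def co_frequency_subimages(coeffs):
    """Group block-DCT coefficients by their position inside each 8x8 block.

    `coeffs` is an (H, W) array of quantized DCT coefficients laid out
    block by block, with H and W multiples of 8. The result has shape
    (8, 8, H // 8, W // 8), where result[u, v, i, j] == coeffs[8*i + u, 8*j + v],
    so result[u, v] is the sub-image for frequency position (u, v).
    """
    coeffs = np.asarray(coeffs)
    height, width = coeffs.shape
    if height % 8 or width % 8:
        raise ValueError("array dimensions must be multiples of 8")
    blocks = coeffs.reshape(height // 8, 8, width // 8, 8)
    return blocks.transpose(1, 3, 0, 2)


# Toy 16x24 array standing in for quantized coefficients (2x3 blocks).
toy = np.arange(16 * 24).reshape(16, 24)
subimages = co_frequency_subimages(toy)
```

Each of the 64 sub-images is then a small image in its own right, which is what makes per-frequency filtering possible.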

Single channel speech separation with a frame-based pitch range estimation method in modulation frequency

2010 5th International Symposium on Telecommunications ◽

10.1109/istel.2010.5734097 ◽

2010 ◽

Cited By ~ 4

Author(s):

A. Mahmoodzadeh ◽

H. R. Abutalebi ◽

H. Soltanian-Zadeh ◽

H. Sheikhzadeh

Keyword(s):

Single Channel ◽

Modulation Frequency ◽

Estimation Method ◽

Speech Separation ◽

Range Estimation ◽

Pitch Range

Actuation and dynamic mechanical characteristics of a core free flat dielectric electro-active polymer soft actuator

JOURNAL OF MECHANICAL ENGINEERING AND SCIENCES ◽

10.15282/jmes.14.4.2020.08.0582 ◽

2020 ◽

Vol 14 (4) ◽

pp. 7396-7404

Author(s):

Abdul Malek Abdul Wahab ◽

Emiliano Rustighi ◽

Zainudin A.

Keyword(s):

Theoretical Model ◽

Resonance Frequency ◽

Dynamic Testing ◽

Experimental Results ◽

Percentage Error ◽

Initial Tension ◽

Average Percentage ◽

Soft Actuator ◽

Average Percentage Error ◽

Mechanical Dynamics

Various complex shapes of dielectric electro-active polymer (DEAP) actuators have been promoted for several types of applications. In this study, the actuation and mechanical dynamics characteristics of a new core-free flat DEAP soft actuator, developed by Danfoss PolyPower, were investigated. A DC voltage of up to 2000 V was supplied to identify the actuation characteristics of the actuator and compare them with the existing formula. The operational frequency of the actuator was determined by dynamic testing. The soft actuator was then modelled as a uniform bar rigidly fixed at one end and attached to a mass at the other end, and results from the theoretical model were compared with the experimental results. The deformation of the actuator was found to be quadratically proportional to the voltage supplied. Experimental results and theory were not in good agreement at low and high voltage, with average percentage errors of 104% and 20.7%, respectively. The resonance frequency of the actuator was near 14 Hz. The added load mass, inhomogeneity, and initial tension significantly affected the resonance frequency of the soft actuator. The experimental results were consistent with the theoretical model at zero load. However, due to inhomogeneity, the frequency response function plot shows a poor prediction: the theoretical calculation diverged from the experimental results as the load increased, with an average percentage error of 15.7%. Hence, the proposed analytical procedure is not suitable for providing an accurate natural frequency for the DEAP soft actuator.

Physiological pitch range estimation from a brief speech input: A study on a bilingual parallel speech corpus

10.21437/speechprosody.2020-196 ◽

2020 ◽

Author(s):

Wei Zhang ◽

Yanlu Xie ◽

Jinsong Zhang

Keyword(s):

Range Estimation ◽

Speech Corpus ◽

Pitch Range ◽

Speech Input

Analyzing free variation with harmony – A case study of verb-cluster serialization

Zeitschrift für Sprachwissenschaft ◽

10.1515/zfs-2020-2020 ◽

2020 ◽

Vol 39 (3) ◽

pp. 407-437

Author(s):

Markus Bader

Keyword(s):

Native Speakers ◽

Initial Position ◽

Experimental Results ◽

Modal Verbs ◽

Weighted Constraints ◽

Corpus Data ◽

Frequency Relationship ◽

Zero Frequency ◽

Free Variation

In German, a verb selected by another verb normally precedes the selecting verb. Modal verbs in the perfect tense provide an exception to this generalization because they require the perfective auxiliary to occur in cluster-initial position according to prescriptive grammars. Bader and Schmid (2009b) have shown, however, that native speakers accept the auxiliary in all positions except the cluster-final one. Experimental results as well as corpus data indicate that verb cluster serialization is a case of free variation. I discuss how this variation can be accounted for, focusing on two mismatches between acceptability and frequency: First, slight acceptability advantages can turn into strong frequency advantages. Second, syntactic variants with basically zero frequency can still vary substantially in acceptability. These mismatches remain unaccounted for if acceptability is related to frequency on the level of whole sentence structures, as in Stochastic OT (Boersma and Hayes 2001). However, when the acceptability-frequency relationship is modeled on the level of individual weighted constraints, using harmony as the link (see Pater 2009 for different harmony-based frameworks), the two mismatches follow given appropriate linking assumptions.

Behavior Modeling for a Beacon-Based Indoor Location System

Sensors ◽

10.3390/s21144839 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4839

Author(s):

Aritz Bilbao-Jayo ◽

Aitor Almeida ◽

Ilaria Sergi ◽

Teodoro Montanaro ◽

Luca Fasano ◽

...

Keyword(s):

Behavior Modeling ◽

Percentage Error ◽

Location Prediction ◽

Indoor Environments ◽

Prediction System ◽

Indoor Location ◽

The Mean ◽

Locating System ◽

Monitoring Devices ◽

Do So

In this work, we compare two approaches to tracking a person in indoor environments using a locating system based on BLE technology, with a smartphone and a smartwatch as monitoring devices. To do so, we present the system architecture we designed and describe how the different elements of the proposed system interact with each other. Moreover, we evaluate the system's performance by computing the mean percentage error in the detection of the indoor position. Finally, we present a novel location prediction system based on neural embeddings and a soft-attention mechanism, which is able to predict the user's next location with 67% accuracy.
