Perceptual evaluation of broadband physics-based speech sound synthesis with a 1D versus a 3D acoustic model

https://opara.zih.tu-dresden.de/xmlui/handle/123456789/5914

Articulatory synthesis is a useful tool for exploring the relationship between speech production and perception. However, realistic simulation of the high frequencies (above about 5 kHz) requires a three-dimensional (3D) acoustic model. In this frequency range, one-dimensional (1D) acoustic models fail to predict additional resonances and anti-resonances related to the 3D properties of the acoustic field. While articulatory synthesis based on 3D acoustic models is now achievable for isolated phonemes, the impact of such models on perception by human listeners remains largely unknown. This study first examined whether the high-frequency part of stimuli generated with 1D and 3D acoustic models can be differentiated in a pair comparison task. The results show that listeners can discriminate such differences, which is in line with recent findings showing that purely spectral cues can contribute to the perception of speech at high frequencies. A second perceptual experiment, which consisted of rating the naturalness of the stimuli on a four-level Likert scale, did not show any significant effect of the acoustic model. However, it highlighted differences in naturalness between the synthesized phonemes.

2026-06-10T17:56:47Z