Experiments were carried out to investigate the correlation between the perceptual and physical space of 11 vowel sounds. The signals were single periods out of the constant vowel part of normally spoken words of the type h (vowel) t, generated continuously by computer. Pitch, loudness, onset, and duration were equalized. These signals were presented to 15 subjects in a triadic‐comparison procedure, resulting in a cumulative similarity matrix. Multidimensional scaling (Kruskal) of this matrix resulted in a three‐dimensional perceptual space with 1.6% stress. The signals were also analyzed physically with 1/3‐oct band filters. Principal‐components analysis of the decibel values per frequency band indicated that three dimensions accounted for 81.7% of the total variance. Matching the perceptual and physical configurations to maximal congruence yielded an excellent result, with correlation coefficients of 0.992, 0.971, and 0.742 along the three corresponding dimensions. The formant frequencies and levels were also correlated with both configurations.
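The principal‐components step, which gives the share of total variance carried by the leading dimensions, can be illustrated with a small sketch. It assumes the analysis is done on the covariance matrix of the band levels in dB (rows are signals, columns are 1/3‐oct bands), which the abstract does not specify, and it uses the cyclic Jacobi method to obtain the eigenvalues. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math
from typing import List

Matrix = List[List[float]]


def covariance(levels: Matrix) -> Matrix:
    """Sample covariance of band levels; rows are signals, columns are bands (dB)."""
    n, m = len(levels), len(levels[0])
    means = [sum(row[j] for row in levels) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in levels]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def jacobi_eigenvalues(sym: Matrix, max_sweeps: int = 100, tol: float = 1e-20) -> List[float]:
    """Eigenvalues of a symmetric matrix, sorted descending, via cyclic Jacobi rotations."""
    a = [row[:] for row in sym]  # work on a copy
    m = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) of the rotated matrix is zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == k) for k in range(m)] for i in range(m)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                a = matmul(transpose(rot), matmul(a, rot))
    return sorted((a[i][i] for i in range(m)), reverse=True)


def proportion_explained(levels: Matrix, k: int) -> float:
    """Fraction of the total variance carried by the first k principal components."""
    eig = jacobi_eigenvalues(covariance(levels))
    return sum(eig[:k]) / sum(eig)


if __name__ == "__main__":
    # Hypothetical dB levels: 4 signals x 3 bands (illustrative only, not study data).
    toy = [[60.0, 52.0, 40.0],
           [55.0, 58.0, 45.0],
           [50.0, 47.0, 50.0],
           [62.0, 50.0, 38.0]]
    print(f"variance in first 2 components: {proportion_explained(toy, 2):.3f}")
```

With the real data, `levels` would hold one row per vowel signal and one column per filter band, and `proportion_explained(levels, 3)` would correspond to the kind of figure quoted above for three dimensions. A library eigensolver would normally replace the hand‐written Jacobi loop; it is spelled out here only to keep the sketch dependency‐free.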

