This study investigates the fusion of multiple forensic-voice-comparison systems based on formant trajectories and fundamental-frequency (f0) trajectories. Each system was based on tokens of a single phoneme: Chinese /ei1/, /ai2/, or /iau1/ (numbers indicate tones). Human-supervised formant-trajectory and f0-trajectory measurements were made on tokens from a database of recordings of 60 female speakers of Chinese. Discrete cosine transforms (DCTs) were fitted to the trajectories, and the DCT coefficients were used to calculate likelihood ratios via the multivariate kernel density (MVKD) formula. The individual-phoneme systems were fused with each other and with a baseline mel-frequency cepstral-coefficient (MFCC) Gaussian-mixture-model universal-background-model (GMM-UBM) system, which made use of the entire speech-active portion of the recordings. Tests were conducted using high-quality recordings as nominal suspect samples and mobile-to-landline transmitted recordings as nominal offender samples. Fusion of the individual-phoneme systems with the baseline system via logistic regression did not lead to any substantial improvement in validity, and reliability deteriorated.
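
As an illustration of the trajectory-parameterisation step only, the following minimal sketch computes the first few DCT coefficients of an equally sampled trajectory and rebuilds the smoothed curve they describe. It is not the study's implementation: the function names `dct_coefficients` and `reconstruct`, the sample F2 values, the use of an orthonormal DCT-II, and the choice of three coefficients are all assumptions made for the example, since the abstract does not state the sampling scheme or the number of coefficients used.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_coefficients(trajectory, n_coeffs):
    """Return the first n_coeffs orthonormal DCT-II coefficients
    of an equally sampled trajectory (hypothetical helper)."""
    n = len(trajectory)
    coeffs = []
    for k in range(n_coeffs):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(trajectory)
        )
        coeffs.append(_scale(k, n) * total)
    return coeffs


def reconstruct(coeffs, n):
    """Rebuild an n-point trajectory from (possibly truncated)
    DCT coefficients; truncation yields a smoothed curve."""
    points = []
    for i in range(n):
        value = sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        points.append(value)
    return points


# Assumed example data: an invented F2 trajectory in Hz, not from the study.
f2_trajectory = [1200.0, 1350.0, 1500.0, 1680.0, 1820.0, 1930.0, 2000.0, 2040.0]

# Three coefficients per trajectory is an assumption for illustration.
features = dct_coefficients(f2_trajectory, 3)
smoothed = reconstruct(features, len(f2_trajectory))
```

In a setup like the one described, a short coefficient vector such as `features` would stand in for the full trajectory as the input to the likelihood-ratio calculation. Because the basis is orthogonal, keeping only the lowest-order coefficients gives the same values as a least-squares fit of those basis functions to the equally sampled points.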
