On the Use of Wavelets for Voice Morphing

Lucimar S. Vieira, Rodrigo C. Guido, Sylvio B. Junior*, Fabrício L. Sanchez, Kim I. C. Sergio, Márcio B. A. Guilherme, Leonardo M. Souza, Paulo C. Fantinato, Regiane D. S. Bassi, Bruno C. G. Amadio, Everthon S. Fonseca

United Group for Audio and Speech Processing, IFSC, USP
13560-970, São Carlos, SP
home-page: http://speechlab.ifsc.usp.br

* CAPES master's scholarship holder

ABSTRACT

In this work, we present a study on voice morphing [1] based on neural networks and wavelets. The tests show that the Daubechies wavelet [2] with support size 30 gave the best results under the perceptual criterion. This result indicates that a maximally flat frequency response, close to the ideal response, is important. Future work includes the construction of a matched wavelet filter bank to improve the results.

References

[1] C. Orphanidou et al., "Multiscale voice morphing". Pre-print submitted to Pattern Recognition Letters, Elsevier, in September 2006, to appear in 2007.

[2] P. S. Addison, "The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance". Edinburgh, UK: Institute of Physics Publishing, 2002.

DWT            support   M     S
Haar              2      5.2   0.75
Daubechies        4      5.8   1.0
Daubechies        6      6.0   1.0
Daubechies        8      6.1   1.1
Daubechies       10      6.5   1.0
Daubechies       20      6.9   0.97
Daubechies       30      7.0   0.90
Vaidyanathan     24      7.0   1.1
Coiflet           6      6.0   1.05
Coiflet          12      6.0   1.04
Coiflet          18      6.1   1.05
Coiflet          30      6.6   1.01
Symmlet           8      6.0   1.10
Symmlet          16      6.0   1.05
Burt-Adelson      6      5.9   1.01
Beylkin          18      5.9   1.02

Table 1: Results of the perceptual tests for converting the sentence sa1.wav of the directory /timit/test/dr1/mdab0 into sa1.wav of the directory /timit/test/dr1/mjsw0.
The table reports the mean (M) and standard deviation (S) of the ratings given by the volunteers who took part in the tests. A rating of 0 means the morphed speech is not at all similar to the corresponding original, while a rating of 10 means the two are indistinguishable.
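As a minimal sketch of how one M and S pair could be obtained from a set of listener ratings on the 0 to 10 scale, the Python snippet below may help. The function name `summarize_ratings` and the example ratings are hypothetical and invented for illustration; they are not data from the study. The use of the sample standard deviation (n - 1 denominator) is also an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return (M, S): the mean and sample standard deviation of 0-10 ratings.

    Assumption: S is the sample standard deviation (n - 1 denominator);
    the paper does not state which estimator was used.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a deviation")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must lie in the range 0 to 10")
    return mean(ratings), stdev(ratings)


# Hypothetical ratings from five listeners (illustrative only, not study data).
example_ratings = [6, 7, 8, 7, 7]
m, s = summarize_ratings(example_ratings)
print(f"M = {m:.1f}, S = {s:.2f}")
```

Applying the same function to the ratings collected for each wavelet would give one row of M and S values per filter, which is the layout used in Table 1.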