On the Use of Wavelets for Voice Morphing
Lucimar S. Vieira
Rodrigo C. Guido
Sylvio B. Junior∗
Fabrício L. Sanchez
Kim I. C. Sergio
Márcio B. A. Guilherme
Leonardo M. Souza
Paulo C. Fantinato
Regiane D. S. Bassi
Bruno C. G. Amadio
Everthon S. Fonseca
United Group for Audio and Speech Processing, IFSC, USP
13560-970, São Carlos, SP
home-page: http://speechlab.ifsc.usp.br
ABSTRACT
In this work, we present a study on voice morphing [1] based on neural
networks and wavelets. The tests show that the Daubechies wavelet [2] with
support size 30 gave the best results under the perceptual criterion (mean
rate 7.0, standard deviation 0.90; see Table 1). This result suggests that a
maximally flat frequency response, close to the ideal response, is
important. Future work includes the construction of a matched wavelet
filter bank to improve the results.
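The remark about a maximally flat frequency response can be illustrated with a minimal sketch. It compares only the two shortest filters in Table 1, Haar (support 2) and Daubechies (support 4), whose low-pass coefficients are known in closed form; it is not the support-30 filter used in the tests. The names `magnitude_response` and `ideal_half_band` are hypothetical helpers introduced for this illustration, and the ideal response is assumed to be a half-band low-pass filter with gain sqrt(2).

```python
import cmath
import math

# Low-pass (scaling) filter taps, normalized so that they sum to sqrt(2).
HAAR = [1 / math.sqrt(2), 1 / math.sqrt(2)]
_S3 = math.sqrt(3)
DB4 = [c / (4 * math.sqrt(2)) for c in (1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3)]


def magnitude_response(taps, omega):
    """Return |H(e^{j*omega})| for an FIR filter with the given taps."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(taps)))


def ideal_half_band(omega):
    """Assumed ideal response: gain sqrt(2) below pi/2, zero above."""
    return math.sqrt(2) if abs(omega) < math.pi / 2 else 0.0


# Print the responses at a few frequencies between 0 and pi.
for k in range(5):
    omega = k * math.pi / 4
    print(
        f"omega={omega:.3f}  ideal={ideal_half_band(omega):.3f}  "
        f"haar={magnitude_response(HAAR, omega):.3f}  "
        f"db4={magnitude_response(DB4, omega):.3f}"
    )
```

In this sketch the support-4 filter stays closer to the assumed ideal response than Haar both inside the passband (at pi/4) and inside the stopband (at 3*pi/4), which is the trend that longer Daubechies filters continue.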
References
[1] C. Orphanidou et al., "Multiscale voice morphing". Pre-print submitted
to Pattern Recognition Letters, Elsevier, in September 2006, to appear in 2007.
[2] P. S. Addison, "The Illustrated Wavelet Transform Handbook: Introductory
Theory and Applications in Science, Engineering, Medicine and Finance".
Edinburgh, UK: Institute of Physics Publishing, 2002.
∗ CAPES master's scholarship holder
DWT            support   M     S
Haar           2         5.2   0.75
Daubechies     4         5.8   1.0
Daubechies     6         6.0   1.0
Daubechies     8         6.1   1.1
Daubechies     10        6.5   1.0
Daubechies     20        6.9   0.97
Daubechies     30        7.0   0.90
Vaidyanathan   24        7.0   1.1
Coiflet        6         6.0   1.05
Coiflet        12        6.0   1.04
Coiflet        18        6.1   1.05
Coiflet        30        6.6   1.01
Symmlet        8         6.0   1.10
Symmlet        16        6.0   1.05
Burt Adelson   6         5.9   1.01
Beylkin        18        5.9   1.02
Table 1: Results of the perceptual tests for converting the sentence sa1.wav
in the directory /timit/test/dr1/mdab0 into sa1.wav in the directory
/timit/test/dr1/mjsw0, for each DWT filter and support size. The table
reports the mean (M) and standard deviation (S) of the rates given by the
volunteers who took part in the tests. A rate of 0 means the morphed speech
is not at all similar to the corresponding original, while a rate of 10 means
they are indistinguishable.
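As a minimal sketch of how the M and S columns could be obtained from individual ratings: the ratings below are hypothetical placeholders, not data from the tests, and the helper `summarize` is introduced only for illustration. The source does not say whether S is the sample or the population standard deviation; the sample version is assumed here.

```python
import statistics

# Hypothetical ratings on the 0-10 scale; NOT data from the tests.
ratings = [7, 8, 6, 7, 7]


def summarize(rates):
    """Return (mean, sample standard deviation) of 0-10 ratings."""
    if any(not 0 <= r <= 10 for r in rates):
        raise ValueError("each rate must lie between 0 and 10")
    # Assumption: S is the sample standard deviation (n - 1 denominator).
    return statistics.mean(rates), statistics.stdev(rates)


m, s = summarize(ratings)
print(f"M = {m:.1f}, S = {s:.2f}")
```

If the population standard deviation is the intended one, `statistics.pstdev` would replace `statistics.stdev`.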