Robust Shape Estimation and Tracking in
the Presence of Clutter
Jacinto Carlos Marques Peixoto do Nascimento
(Mestre)
Dissertação para obtenção do grau de Doutor em
Engenharia Electrotécnica e de Computadores
Orientador:
Doutor Jorge dos Santos Salvador Marques
Júri
Presidente:
Reitor da Universidade Técnica de Lisboa
Vogais:
Doutor
Doutor
Doutor
Doutor
Doutor
Doutor
João Manuel Lage de Miranda Lemos
Jorge dos Santos Salvador Marques
José Alberto Rosado dos Santos Victor
Arnaldo Joaquim de Castro Abrantes
José Manuel Bioucas Dias
Gilles Celeux
Abril 2003
iii
Abstract
This thesis proposes robust methods for the estimation and tracking of objects boundaries in images.
There are several well known methods for contour estimation and tracking using deformable models.
However, these methods have strong limitations. Their performance is severely hampered in the presence
of outliers, i.e., image features detected which do not belong to the object boundary and this happens
in most of the practical applications.
The goal of this thesis is to improve the performance of existing methods in the presence of outliers.
This thesis proposes robust versions for three contour estimation algorithms: Snakes, Kalman tracker,
and Multi model Kalman tracker. The first one is a pioneering contour estimation algorithm for static
objects. The last two are tracking methods for the estimation of motion objects in image sequences.
The Kalman tracker is the most popular method for tracking deformable objects using active contours.
The Multi model tracker is a recent method and it is based on the use of multiple dynamic models
switched according to a Markov process, being useful to represent complex motions.
In this work we propose robust versions of the three methods using statistical models to represent
valid and invalid observations. The proposed algorithms share the following properties. First, they are
based on the use of middle level features (contour strokes) instead of low level ones (edge points) which
are used in the original methods. Second, the detected features are not all considered as valid, since we
know that some of them are outliers.
In this thesis, confidence degrees are assigned to each feature or a set of features, and the contour
estimates are based on these confidence degrees. Features with high confidence degrees have a large
influence on the shape estimates, while features with low confidence degrees have a negligible influence.
A set of tests are presented to evaluate the performance of the proposed algorithms in shape estimation and tracking. It is concluded that the performance of the proposed methods is much better than
the one obtained with the original algorithms.
Keywords: Shape analysis, Tracking, Robust estimation, Active contours, Adaptive Snakes, MultiModel tracking.
v
Resumo
A presente tese propõe métodos robustos para a estimação e seguimento da fronteira de objectos em
imagens. Há vários métodos conhecidos para a estimação e seguimento de contornos usando modelos
deformáveis. Esses métodos têm contudo limitações. Em particular, são sensı́veis à detecção de caracterı́sticas visuais na imagem que não pertençam ao contorno do objecto, situação que ocorre na maioria
dos problemas práticos de interesse.
O objectivo desta tese é melhorar o desempenho dos métodos existentes na presença de observações
incorrectas da fronteira do objecto de interesse. Propõem-se nesta tese versões robustas para três
algoritmos de estimação de contornos: Snakes, seguidor de Kalman e seguidor de Kalman com modelos
comutados. O primeiro é um algoritmo pioneiro de estimação do contorno de objectos estáticos. Os dois
últimos são métodos de seguimento de objectos em movimento em sequências de imagens. O seguidor
de Kalman é o método mais popular para seguimento de objectos deformáveis através de contornos
activos e o seguidor com modelos comutados é um método recente, baseado na utilização de múltiplos
modelos dinâmicos comutados, que é útil na representação de movimentos complexos.
Neste trabalho propõem-se versões robustas para os três métodos escolhidos usando modelos estatı́sticos para representar as observações válidas e inválidas. Os algoritmos propostos partilham aspectos comuns. Em primeiro, lugar baseiam-se no uso de caracterı́sticas visuais de nı́vel médio (segmentos
de contorno) em vez das caracterı́sticas de baixo nı́vel (pontos de contorno) utilizadas nos algoritmos
originais. Em segundo lugar, as caracterı́sticas detectadas não são consideradas como sendo todas
válidas, ao contrário do que é feito nos métodos clássicos de contornos activos. Nesta tese, atribuemse de graus de confiança a cada caracterı́stica ou grupos de caracterı́sticas e o contorno do objecto é
estimado tendo em consideração os graus de confiança atribuı́dos. Assim, observações com graus de confiança elevados têm influência elevada na estimação do contorno do objecto enquanto que observações
com baixo grau de confiança têm uma influência reduzida.
Os métodos propostos são avaliados através de testes experimentais de estimação e seguimento de
objectos deformáveis em sequências de vı́deo. Mostra-se que o desempenho dos métodos propostos é
muito superior ao dos métodos originais.
Palavras-chave: Análise de forma, Seguimento, Estimação robusta, Contornos Activos, Snakes adaptativas, Seguimento com múltiplos modelos.
Agradecimentos
Tenacidade, persistência e isolamento foram fundamentais para a realização deste trabalho. Desde o
seu inı́cio, sempre desejei ver “uma luz ao fundo do túnel”. Verifico que não encontrei a luz, pois
não estou num túnel. Percebo agora, que disponho duma oportunidade para dar os primeiros passos,
ambicionando num futuro, ter a sorte de encontrar um túnel...
As minhas primeiras palavras de gratidão, são para o Prof. Jorge Marques. Conheci-o pessoalmente
em Outubro de 95, tendo sido sempre uma pessoa rigorosa, exigente e, acima de tudo, disponı́vel para
imensas discussões que mantivemos ao longo destes anos, ensinando-me que a capacidade de trabalho não
tem horizontes. Confesso-me atraı́do pela sua capacidade de pesquisa, procura e descoberta cientı́fica.
Gostaria de agradecer ao Prof. João Sentieiro, director do ISR, e ao Prof. Victor Barroso, responsável
pelo Lab. de Processamento de Sinal, e a todos os colegas do ISR, pelo excelente ambiente de trabalho
que me proporcionaram durante a realização da tese.
Agradeço ao Prof. Bioucas pela leitura meticulosa da tese, e pelas sugestões que contribuı́ram para
a melhoria da qualidade do documento final.
Quero também agradecer ao Prof. Gilles Celeux, a oportunidade de visitar por duas vezes o INRIA
e de aı́ desenvolver trabalho cientı́fico e realizar agradáveis corridas no verdor de Grenoble. Foram sem
dúvida experiências enriquecedoras e inesquecı́veis para mim.
Ao meu colega João Sanches, pelo apoio que sempre me emprestou durante estes anos e pelas imagens
ecográficas que amavelmente me cedeu e que foram úteis para avaliar o desempenho dos algoritmos desta
tese.
Ao Arnaldo Abrantes pelas conversas frutuosas sobre modelos deformáveis e fusão de informação,
para além da amizade que sempre mantive.
Agradeço à Fundação para a Ciência e Tecnologia que financiou este Doutoramento através da bolsa
PRAXIS XXI/BD/15827/98, assim como as deslocações necessárias à participação de conferências
internacionais.
Num mundo não académico, mas sem sombra de dúvida o mais importante, gostaria de agradecer
à mana Lia, pelas magnı́ficas sequências de vı́deo, ao mano Zé, por muitos momentos partilhados,
juntamente com a Guida, Jonas, Ian, sem esquecer claro, a pequena Iara, que invadiu as nossas vidas
há bem pouco tempo, matizando-as com os seus intermináveis e constantes momentos infantis e de
viii
inquietude. As minhas últimas palavras vão para os meus pais, por terem tido paciência comigo, e por
fim aceitarem ter um filho que não “há meio de se formar”.
Bibliography
[1] A. Abrantes, J. S. Marques, A Class of Constrained Clustering Algorithms for Object Boundary
Detection, IEEE Trans. Image Processing, vol. 5, no. 11, pp. 1507-1521, 1996.
[2] A. Abrantes, J. S. Marques, Tracking of Moving Objects using Deformable Models, Proc. 4th Int.
Symposium on Intelligent Robotic Systems, pp. 309-316, Lisbon, 1996.
[3] A. Abrantes, J. S. Marques, Pattern Recognition Methods for Object Boundary Detection, Proc.
British Machine Vision Conf., vol. 2, pp. 409-417, Southampton, September 1998.
[4] A. Abrantes, Extracção e Seguimento de Contornos de Objectos: uma Perspectiva Unificadora,
PhD thesis, Instituto Superior Técnico, Lisbon, Portugal, December 1998.
[5] A. Amini, R. Curwen, J. Gore, Snakes and splines for tracking nonrigid heart motion, Proc. Eur.
Conf. Computer Vision, pp. 251-261, Cambridge, U.K., 1996.
[6] Y. Bar-Shalom, T. Fortmann, Tracking and Data Association, Academic Press, 1988.
[7] S. Basu, C. Neti, N. Rajput, A. Senior, L. Subramaniam, A. Verma, Audio-visual large vocabulary
continuous speech recognition in the broadcast domain, IEEE Workshop on Multimedia Signal
Processing, pp. 475-481, Copenhagen, September 1999.
[8] A. Baumberg, D. Hogg, Learning deformable models for tracking the human body, Motion Based
Recognition, R. Jain, M. Sha, eds., pp. 39-60, Kluwer, 1997.
[9] C.M. Bishop, Neural Networks for pattern recognition, Oxford, 1995.
[10] A. Blake, M. Isard, Active Contours, Springer, 1998.
[11] A. Blake, R. Curwen, A. Zisserman, A framework for spatio-temporal control in the tracking of
visual contours, Int. Journal of Computer Vision, vol. 11, no. 2, pp. 127-145, 1993.
108
Bibliography
[12] A. Blake, M. Isard, D. Reynard, Learning to track the visual motion contours, Artificial Intelligence,
vol. 78, pp. 179-212, 1995.
[13] C. Chang, M. Athans, State Estimation for Discrete Systems with Switching Parameters, IEEE
Trans. Aerosp. Electron. Syst., vol. 14, pp. 418-425, 1978.
[14] J. Chen, G. Stockman, K. Rao, Recovering and tracking pose of curved 3D objects from 2D images,
Proc. CVPR, pp. 233-239, 1993.
[15] G. Chuang, C. Kuo, Wavelet description of planar curves: Theory and applications, IEEE Trans.
Image Processing, vol. 5, pp. 56-70, January 1996.
[16] L. Cohen, On active contour models and ballons, CVGIP: Image Understanding, vol. 53, no. 2, pp.
211-218, 1991.
[17] L. Cohen, Auxiliary variables and two-step iterative algorithms in computer vision problems, Int.
Journal Computer Vision, vol. 6, no. 1, pp. 59-83, 1996.
[18] L. Cohen, I. Cohen, Finite-element methods for active contour models and ballons for 2-D and
3-D images, IEEE Trans. Pattern Anal. Machine Intell., vol. 15, no. 11, pp. 1131-1147, November
1993.
[19] T. Cootes, C. Taylor, A. Hill, J.Haslam, The Use of Active Shape Models for Locating Structures,
Proc. 13th Int. Conf. on Information Processing in Medical Imaging, H.H. Barrett, A.F. Gmitro,
eds., Springer-Verlag, pp. 33-47, 1993.
[20] T. Cootes, C. Taylor, D. Cooper, J. Graham, Active shape models - their training and application,
Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.
[21] T. Cootes, C. Taylor, J. Haslam, The use of active shape models for locating structures in medical
images, Image Vis. Comput., pp. 355-366, 1994.
[22] R. Curwen, A. Blake, Dynamic contours: Real-time active splines, A. Blake, A. Yuille, eds., Active
Vision, cap. 3, MIT Press, 1992.
[23] T. Darrel, A. Pentland, Space-time gestures, Comput. Vis. Pattern Recognition, pp. 335-340, 1993.
Bibliography
[24] J. M. Dias, J. Leitão, Wall position and thickness estimation from sequences of echocardiograms
images, IEEE Trans. Med. Imag., vol. 15, no. 1, pp. 25-38, February 1996.
[25] J. M. Dias, Bayesian Contour Estimation: A Subspace Representation Approach, Energy Minimization Methods in Computer Vision and Pattern Recognition, E. Hancock, M. Pelillo, eds.,
Springer Verlag, pp. 157-172, July 1999.
[26] A. Dempster, M. Laird, D. Rubin, Maximum Likelihood from incomplete data via the EMAlgorithm, Journal of the Royal Statistical Society B, vol. 39, pp. 1-38, 1977.
[27] R. Duda, P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, 1973.
[28] J. Evans, R. Evans, Image-enhanced multiple model tracking, Automatica, vol. 35, pp. 1769-1786,
1999.
[29] T. Faruquie, A. Majumdar, N. Rajput, L. Subramaniam, Large vocabulary audio-visual speech
recognition using active shape models, Proc. IEEE Int. Conf. on Pattern Recognition, vol. 3, pp.
110-113, 2000.
[30] M. Figueiredo, J. Leitão, A. Jain, Unsupervised contour representation and estimation using Bsplines and a minimum description length criterion, IEEE Trans. Image Processing, vol. 9, no. 6,
pp. 1075-1087, June 2000.
[31] M. Figueiredo, J. Leitão, A. Jain, Adaptive parametrically deformable contours, Energy Minimization Methods in Computer Vision and Pattern Recognition, M. Pellilo, E. Hancock, eds. Berlin,
Germany: Springer-Verlag, pp. 35-50, 1997.
[32] M. Figueiredo, J. Leitão, Bayesian estimation of ventricular contours in angiographic images, IEEE
Trans. Med. Imag., vol. 11, pp. 416-429, March 1992.
[33] T. Gevers, S. Ghebreab, A. Smeulders, Color Invariant Snakes, Proc. British Machine Vision Conf.,
vol. 2, pp. 578-588, Southampton, September 1998.
[34] H. Gu, Y. Shirai, M. Asada, MDL-Based Segmentation and Motion Modeling in a Long Sequence
of Scene with Multiple Independently Moving Objects, IEEE Trans. Pattern Anal. Machine Intell.,
vol. 18, no. 1, pp. 58-64, 1996.
109
110
Bibliography
[35] X. D. Huang, Y. Arika, M. A. Jack, Hidden Markov models for speech recognition, Edinburg
University Press, 1990.
[36] D. Huttenlocher, S. Ullman, Recognizing Solid Objects by Alignment with an Image, Int. Journal
of Computer Vision, vol. 5, pp. 195-212, 1990.
[37] M. Isard, A. Blake, Contour tracking by stochastic propagation of conditional density, Proc. European Conf. on Computer Vision, vol. 1, pp. 343-356, 1996.
[38] M. Isard, A. Blake, A mixed-state condensation tracker with automatic model-switching, Int. Conf.
on Computer Vision, pp. 107-112, 1998.
[39] J. Jensen, Estimation of blood velocities using ultrasound, A signal processing approach, Cambridge
university press, 1996.
[40] S. Kalitzin, J. J. Staal, B. M. ter Haar Romeny, M. A. Viergever, Image Segmentation and Object
Recognition by Bayesian Grouping, Proc. IEEE Int. Conf. on Image Processing, vol. 3, pp. 580-583,
2000.
[41] R. E. Kalman, A new approach to linear filtering and Prediction Problems, ASME Trans. - journal
of Basic Engineering, no. Series D, vol. 82, pp. 35-45, March 1960.
[42] M. Kass, A. Witkin, D. Terzopoulos, Snakes: Active contour models, Int. Journal of Computer
Vision, vol. 1, no. 4, pp. 321-331, 1987.
[43] B. Leroy, I. Herlin, L. D. Cohen, Multi-resolution algorithms for active contour models, 12th Int.
Conf. Analysis and Optimization od Systems, pp. 58-65, 1996.
[44] F. Leymarie, M. Levine, Tracking deformable objects in the plane using an active contour model,
IEEE Trans. Pattern Anal. Machine Intell., vol. 15, no. 6, pp. 617-634, 1993.
[45] S. Lucey, S. Sridharan, V. Chandran, Initialised eigenlip estimator for fast lip tracking using linear
regression, Proc. IEEE Int. Conf. on Pattern Recognition, vol. 3, pp. 182-185, 2000.
[46] J. Maciel, J. Costeira, Holistic Synthesis of Human Face Images, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 6, pp. 3545-3548, Phoenix, March 1999.
Bibliography
[47] R. Malladi, J.A. Sethian, B.C. Vemuri, Shape Modeling with Front Propagation: A Level Set
Approach, IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 2, pp. 158-175, February 1995.
[48] J. S. Marques, J. M. Lemos, Shape tracking Based on Switched Dynamical Models, Proc. IEEE
Int. Conf. on Image Processing, pp. 954-958, Kobe, 1999.
[49] J. S. Marques, J. M. Lemos, Optimal and Suboptimal Shape Tracking Based on Switched Dynamic
Models, Image and Vision Computing, pp. 539-550, June 2001.
[50] J. S. Marques, A. J. Abrantes, A Constrained Clustering Algorithm for Shape Analysis with Multiple Features, Proc. Int. Conf. on Pattern Recognition, vol. 1, pp. 916-919, Barcelona, September
2000.
[51] T. McInerney, D. Terzopoulos, Topologically adaptable snakes, Proc. Int. Conf. Computer Vision,
pp. 840-845, Cambridge, 1995.
[52] G. J. McLachlan, T. Krishnan, The EM Algorithm and Extensions, New York: John Wiley and
Sons, 1997.
[53] S. Menet, P. Saint-Marc, G. Medioni, B-snakes: Implementations and applications to stereo, Proc.
DARPA Image Understanding Workshop, pp. 720-726, 1990.
[54] H. Murase, S. Nayar, Visual learning and recognition of 3-D objects from appearence, Int. Journal
of Computer Vision, vol. 14, pp. 5-24, 1995.
[55] J. Nascimento, A. Abrantes, J. S. Marques, An algorithm for centroid-based tracking of moving
objects, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 6, pp. 3305-3308,
Phoenix, March 1999.
[56] J. Nascimento, J. S. Marques, Robust shape tracking in the presence of cluttered background,
Proc. IEEE Int. Conf. on Image Processing, vol. 3, pp. 82-85, Vancouver, September 2000.
[57] J. Nascimento, J. S. Marques, Improving the Robustness of Parametric Shape Tracking with
Switched Multiple Models, Workshop on Pattern Recognition in Information Systems, A. Fred,
A. Jain, eds., pp. 50-58, Setúbal, July 2001.
111
112
Bibliography
[58] J. Nascimento, J. S. Marques, An Adaptive Potential for Robust Shape Estimation, Proc. British
Machine Vision Conf., vol. 1, pp. 343-352, Manchester, September 2001.
[59] J. Nascimento, J. S. Marques, Robust Multi-Model Filter for Shape Tracking in the Presence of
Outliers, Pattern Recognition, vol. 35, pp. 2711-2718, December 2002.
[60] J. Nascimento, A. J. Abrantes, J. S. Marques, The Role of Middle Level Features for Robust Shape
Tracking, Proc. of 12th Portuguese Conf. on Pattern Recognition, Aveiro, June 2002.
[61] J. Nascimento, A. J. Abrantes, J. S. Marques, Using Middle Level Features for Robust Shape
Tracking, Pattern Recognition Letters, vol. 24, pp. 295-307, 2003.
[62] J. Nascimento, J. S. Marques. Robust Shape Tracking in the Presence of Cluttered Background.
IEEE Trans. Multimedia, accepted, 2003.
[63] H. Ney, Stochastic Modelling: From pattern classification to speech recognition and translation,
Proc. IEEE Int. Conf. on Pattern Recognition, vol. 3, pp. 25-32, 2000.
[64] B. North, A. Blake, Learning dynamical models using Expectation-Maximisation, Int. Conf. on
Computer Vision, pp. 384-389, 1998.
[65] V. Petridis, A. Kehagias, A multi-nodel algorithm for parameter estimation of time varying nonlinear systems, Automatica, vol. 34, no. 4, pp. 469-475, 1998.
[66] G. Potamianos, H. P. Graf, Discriminative training of HMM stream exponents for audio-visual
speech recognition, Int. Conf. on Acoustic, Speech and Signal Processing, vol. 6, pp. 3733-3736,
1998.
[67] A. Rajagopalan, R. Chellappa, Vehicle detection and tracking in video, Proc. IEEE Int. Conf. on
Image Processing, vol. 1, pp. 351-355, 2000.
[68] L. Rabiner, A tutorial on hidden Markov models and selected applications, Speech Recognition,
Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[69] L. Rabiner, Bing-Hwang, Fundamentals of speech recognition, Prentice Hall, 1993.
[70] L. Staib, J. Duncan, Boundary Finding with Parametrically Deformable Models, IEEE Trans.
Pattern Anal. Machine Intell., vol. 11, no. 14, pp. 1061-1075, November 1992.
Bibliography
[71] H. Tagare, Deformable 2-D template matching using orthogonal curves, IEEE Trans. Med. Imag.,
vol. 16, no. 1, pp. 108-117, 1997.
[72] D. Terzopoulos, R. Szeliski, Tracking with Kalman snakes, A. Blake, A. Yuille, eds., Active Vision,
cap. 1, pp. 3-20, MIT Press, 1992.
[73] F. De la Torre, J. Vitria, P. Radeva, J. Melenchon, Eigenfiltering for flexible eigentracking (EFE),
Proc. IEEE Int. Conf. on Pattern Recognition, vol. 3, pp. 1118-1121, 2000.
[74] J. Tugnait, Detection and estimation for abruptly changing systems, Automatica, vol. 18, no. 5,
pp. 607-615, 1982.
[75] S. Ullman, R. Basri, Recognition by linear combination of models, IEEE Trans. Pattern Anal.
Machine Intell., vol. 13, no. 10, pp. 992-1006, 1991.
[76] A. Verma, T. Faruquie, C. Neti, S. Basu, A. Senior, Late integration in audio-visual continuous
speech recognition, Automatic Speech Recognition and Understanding, vol. 1, pp. 71-74, 1999.
[77] C. Xu, J. Prince, Snakes, shapes, and gradient vector flow, IEEE Trans. Image Processing, vol. 7,
no. 3, pp. 359-369, March 1998.
[78] A. Yuille, P. Hallinan, Deformable templates, A. Blake, A. Yuille, eds., Active Vision, cap. 2, pp.
21-38, MIT Press, 1992.
[79] X. Zhang, H. Burkhardt, Grouping Edge Points into Line Segments by Sequencial Hough Transformation, Int. Conf. on Pattern Recognition, vol. 3, pp. 676-679, 2000.
[80] S. Zhu, A. Yuille, Region competition: Unifying snakes, region growing, energy/Bayes/MDL for
multi-band image segmentation, IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 884-900,
September 1996.
113