Proceedings

19th International Symposium on Computer Architecture and High Performance Computing
SBAC-PAD

October 24th-27th, 2007
Gramado, RS - Brazil

Promoted by
Brazilian Computer Society (SBC)

Co-sponsored by
IEEE Computer Society
International Federation for Information Processing (IFIP)

Organization
Federal University of Rio Grande do Sul (UFRGS)
Fluminense Federal University (UFF)

Los Alamitos, California • Washington • Tokyo

Message from the General Chairs ..... ix
Message from the Program Committee Chairs ..... x
Conference Organizers ..... xi
Program Committee ..... xii
Reviewers ..... xiv
Brazilian Computer Society (SBC) ..... xv

Session 1 Applications I

Multi-level Parallelism in the Computational Modeling of the Heart ..... 3
Carolina Xavier, Rafael Sachetto, Vinicius Vieira, Rodrigo Weber dos Santos, and Wagner Meira Jr.

Computational Characteristics of Production Seismic Migration and its Performance on Novel Processor Architectures ..... 11
Jairo Panetta, Paulo R. P. de Souza Filho, Carlos A. da Cunha Filho, Fernando M. Roxo da Motta, Silvio S. Pinheiro, Ivan Pedrosa Junior, Andre L. R. Rosa, Luiz R. Monnerat, Leandro T. Carneiro, and Carlos H. B. de Albrecht

Voice Command Recognition with Dynamic Time Warping (DTW) using Graphics Processing Units (GPU) with Compute Unified Device Architecture (CUDA) ..... 19
Gustavo Poli, Alexandre L. M. Levada, João F. Mari, and José Hiroki Saito

Exploring Novel Parallelization Technologies for 3-D Imaging Applications ..... 26
Diego Rivera, Dana Schaa, Micha Moffie, and David Kaeli

Session 2 Microarchitecture

Low-cost Techniques for Reducing Branch Context Pollution in a Soft Realtime Embedded Multithreaded Processor ..... 37
Emre Özer, Alastair Reid, and Stuart Biles

Self-Imposed Temporal Redundancy: An Efficient Technique to Enhance the Reliability of Pipelined Functional Units ..... 45
Elias Mizan, Tileli Amimeur, and Margarida F. Jacome

Predicting Loop Termination to Boost Speculative Thread-Level Parallelism in Embedded Applications ..... 54
Mafijul Md. Islam

Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors ..... 62
Rafael Ubal, Julio Sahuquillo, Salvador Petit, and Pedro López

Session 3 Applications II

Performance Improvement of the Parallel Lattice Boltzmann Method through Blocked Data Distributions ..... 71
Claudio Schepke and Nicolas Maillard

A Scalable Parallel Deduplication Algorithm ..... 79
Walter Santos, Thiago Teixeira, Carla Machado, Wagner Meira Jr., Altigran S. Da Silva, Renato Ferreira, and Dorgival Guedes

A Multigrid-Schwarz Method for the Solution of Hydrodynamics and Heat Transfer Problems in Unstructured Meshes ..... 87
Guilherme Galante, Rogério L. Rizzi, and Tiarajú A. Diverio

Session 4 Benchmarking, Performance Measurements and Analysis

Performance Evaluation of the Dual-Core Based SGI Altix 4700 ..... 97
Rod Fatoohi

Impacts of Multiprocessor Configurations on Workloads in Bioinformatics ..... 105
Youfeng Wu, Mauricio Breternitz Jr., and Victor Ying

Session 5 Application-Specific Architectures

Efficient Hardware for Modular Exponentiation Using the Sliding-Window Method with Variable-Length Partitioning ..... 117
Nadia Nedjah and Luiza de Macedo Mourelle

Optimized Math Functions for a Fixed-Point DSP Architecture ..... 125
Karlo G. Lenzi and Osamu Saotome

Session 6 Grid Computing

A Component-Oriented Support for Hierarchical MPI Programming on Multi-cluster Grid Environments ..... 135
Elton Nicoletti Mathias, Vincent Cave, Francoise Baude, and Nicolas Maillard

A Selector of Grid Resources based on the Semantic Integration of Multiple Ontologies ..... 143
Alexandre P.C. Silva and Mario A.R. Dantas

A Novel Algorithm for Indirect Reputation-Based Grid Resource Management ..... 151
Javier Echaiz, Jorge R. Ardenghi, and Guillermo R. Simari

Session 7 Cache and Memory Architectures

Register File Energy Optimization for Snooping Based Clustered VLIW Architectures ..... 161
Rahul Nagpal and Y. N. Srikant

Queue Register File Optimization Algorithm for QueueCore Processor ..... 169
Arquimedes Canedo, Ben Abderazek, and Masahiro Sowa

An Intelligent Mechanism to Explore a Two-Level Cache Hierarchy Considering Energy Consumption and Time Performance ..... 177
Abel G. Silva-Filho, Carmelo J. A. Bastos-Filho, Ricardo M.F. Lima, Davi M.A. Falcão, Filipe R. Cordeiro, and Marília P. Lima

A Code Compression Method to Cope with Security Hardware Overheads ..... 185
Eduardo Wanderley Netto, Romain Vaslin, Guy Gogniat, and Jean-Philippe Diguet

Session 8 Interconnection Networks, Routing, and Communication

Architectural Breakdown of End-to-End Latency in a TCP/IP Network ..... 195
Steen Larsen, Parthasarathy Sarangam, and Ram Huggahalli

Performance Analysis and Linear Optimization Modeling of All-to-all Collective Communication Algorithms ..... 203
Hyacinthe N. Mamadou, Guilherme de Melo B. Domingues, Takeshi Nanri, and Kazuaki Murakami

Design of a Feasible On-Chip Interconnection Network for a Chip Multiprocessor (CMP) ..... 211
Seung Eun Lee, Jun Ho Bahn, and Nader Bagherzadeh

Session 9 Tools for Parallel and Distributed Programming

Node Level Primitives for Parallel Exact Inference ..... 221
Yinglong Xia and Viktor Prasanna

Fault-Tolerance in Filter-Labeled-Stream Applications ..... 229
Bruno Coutinho, Dorgival Guedes, Wagner Meira Jr., and Renato A. Ferreira

High-Level Service Connectors for Component-Based High Performance Computing ..... 237
Francisco H. de Carvalho-Junior, Ricardo C. Corrêa, Gisele A. Araújo, Jefferson C. Silva, and Rafael D. Lins

Session 10 Load Balancing and Scheduling

On-Line Scheduling of MPI-2 Programs with Hierarchical Work Stealing ..... 247
Guilherme P. Pezzi, Márcia C. Cera, Elton Mathias, Nicolas Maillard, and Philippe O. A. Navaux

Exigency-Based Real-Time Scheduling Policy to Provide Absolute QoS for Web Services ..... 255
Lucas S. Casagrande, Rodrigo F. de Mello, Ricardo Bertagna, José A. Andrade Filho, and Francisco J. Monaco

DTA-C: A Decoupled Multi-threaded Architecture for CMP Systems ..... 263
Roberto Giorgi, Zdravko Popovic, and Nikola Puzovic

Automatic Constraint Partitioning to Speed up CLP Execution ..... 271
Marluce R. Pereira, Patrícia K. Vargas, Maria Clícia S. de Castro, Felipe M. G. França, and Inês de Castro Dutra

Author Index ..... 279