Slides for Chapter 11: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 3, © Addison-Wesley 2001 Figure 11.1 A network partition Cras hed router Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.2 Server managing a mutual exclusion token for a set of processes Queue of reques ts Server 4 2 1. Request token p 1 p2 3. Grant token 2. Rel ease token p 3 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 p4 Figure 11.3 A ring of processes transferring a mutual exclusion token p 1 p 2 p n p 3 p 4 T oken Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Ricart e Agrawala [1981] Implementa exclusão mútua distribuída entre N processos. usa “multicast” e clocks lógicos Idéia básica: Processos que requerem entrar em uma seção crítica “multicast” uma mensagem de “request”, e pode entrar somente quando todos os outros processos têm respondido a esta mensagem de “request”. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala As condições sob as quais um processo responde a um “request” são projetadas para garantir que as condições ME1(segurança), ME2(vivacidade)-ME3(ordenação) são satisfeitas. Os processos p1, p2,..., pN arcam com identificadores numéricos distintos. É assumido existirem canais de comunicação entre os processos e cada processo guarda um clock lógico de Lamport, atualizado de acordo com as regras LC1-LC4 (Cap10 – Tempo em SDs). Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Mensagens de “request” são da forma <T,pi>, onde T é o rótulo de tempo do processo enviando o “request” e pi é o identificador do processo que envia o “request”. Cada processo registra seu estado em uma variável state: - de estar fora da seção crítica (RELEASED), - esperando entrar na seção crítica (WANTED), - ou estando na seção crítica (HELD) Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Se um processo requisita entrar e o estado de todos os processos é RELEASED, então todos os processo responderão imediatamente ao “request” e o processo requerente obterá a entrada na seção crítica. Se algum processo está em HELD, então aquele processo não responderá a “requests” até que ele tenha terminado sua seção crítica. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala E assim, o processo requerente não pode ganhar a entrada enquanto isso. Se dois ou mais processos requerem entrar ao mesmo tempo em suas seções críticas, então qualquer que seja o processo requerente, aquele que suporta o menor rótulo de tempo será o primeiro a coletar N-1 respostas (replies), concedendo a ele próxima entrada na seção crítica. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Se os “requests” têm rótulos de tempo iguais, os “requests” são ordenados pelos identificadores correspondendo aos processos. Note que, quando um processo “requests” entrada, ele adia o processamento de “requests” de outros processos até que seu próprio “request” tenha sido enviado e ele tenha registrado o rótulo de tempo T do “request”. É assim que processos tomam decisões consistentes quando processando “requests”. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala O algoritmo alcança a propriedade ME1: Se fosse possível para dois processos pi e pj (i diferente de j) entrarem nas suas seções críticas ao mesmo tempo, então ambos os processos teriam que ter respondido ao outro. Mas, visto que, os pares <Ti,pi> e <Tj,pj> são totalmente ordenados, isto se torna impossível. O algoritmo também satisfaz a ME2 e ME3. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.4 Ricart and Agrawala’s algorithm On initialization state := RELEASED; To enter the section state := WANTED; Multicast request to all processes; request processing deferred here T := request’s timestamp; Wait until (number of replies received = (N – 1)); state := HELD; On receipt of a request <Ti, pi> at pj (i ≠ j) if (state = HELD or (state = WANTED and (T, pj) < (Ti, pi))) then queue request from pi without replying; else reply immediately to pi; end if To exit the critical section state := RELEASED; reply to any queued requests; Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Para ilustar o algoritmo, considere a Figura 11.5, seguinte. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.5 Multicast synchronization 41 p 41 p 3 Reply 1 34 Reply 34 Reply 41 p 34 2 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Assuma que p3 não está interessado em entrar na sua seção crítica. P1 e p2 “requests” entrar concorrentemente. O rótulo de tempo de p1, T1, é 41. O rótulo de tempo de p2, T2, é 34. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala Quando p3 recebe seus “requests”, ele responde imediatamente, a p1 e a p2. Quando p2 recebe o “request” de p1, ele descobre que seu próprio “request” tem o rótilo de tempo menor e assim, não “reply”, retendo p1 a esperar. Contudo, p1 descobre que o “request” de p2 ^tem o rótulo de tempo menor do que o seu próprio “request” e assim responde imediatamente. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Algoritmo de Ricart e Agrawala No recebimento do segundo “reply”, p2 pode então entar na sua seção crítica. Quando p2 sai da sua seção crítica, ele responderá ao “request” de p1 e assim, concede a ele a entrada. Para obter a entrada, o algoritmo proporciona 2(N-1) mensagens, seguidas por (N-1) respostas. A vantagem do algoritmo é que seu atraso de sincronização é “round-trip syncronization”, ou seja, somente sobre o tempo de ida-e-volta para transmissão das mensagens “requests” e “replies”. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.6 Maekawa’s algorithm – part 1 On initialization state := RELEASED; voted := FA LSE; For pi to enter the critical section state := WANTED; Multicast request to all processes in Vi – {pi}; Wait until (number of replies received = (K – 1)); state := HELD; On receipt of a request from pi at pj (i ≠ j) if (state = HELD or voted = TRUE) then queue request from pi without replying; else send reply to pi; voted := TRUE; end if Continues on next slide Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.6 Maekawa’s algorithm – part 2 For pi to exit the critical section state := RELEASED; Multicast release to all processes in Vi – {pi}; On receipt of a release from pi at pj (i ≠ j) if (queue of requests is non-empty) then remove head of queue – from pk, say; send reply to pk; voted := TRUE; else voted := FALSE; end if Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.7 A ring-based election in progress 3 17 4 24 9 1 15 28 24 Note: The election was started by process 17. The highest process identifier encountered so far is 24. Participant processes are shown darkened Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.8 The bully algorithm elect ion The election of coordinator p2, after the failure of p4 and then p3 C elect ion Stage 1 p answer 1 p p 2 p 3 4 answer elect ion elect ion elect ion C Stage 2 p1 p 2 answer p 3 p 4 timeout Stage 3 p p 1 Eventually. ... . 2 p 3 p 4 coordinator C Stage 4 p 1 p 2 p Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 3 p 4 Figure 11.9 Open and closed groups Clos ed group Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Open group Figure 11.10 Reliable multicast algorithm Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.11 The hold-back queue for arriving multicast messages Mes sage proc es si ng deli ver Hold-bac k queue Incoming mes sages Deli very queue When delivery guarantees are met Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.12 Total, FIFO and causal ordering of multicast messages T1 Notice the consistent ordering of totally ordered messages T1 and T2, the FIFO-related messages F1 and F2 and the causally related messages C1 and C3 – and the otherwise arbitrary delivery ordering of messages. T2 F1 F3 F2 T ime C1 C2 P1 C3 P2 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 P3 Figure 11.13 Display from bulletin board program Bulletin board:os.interesting Item From Subject 23 A.Hanlon Mach 24 G.Joseph Microkernels 25 A.Hanlon Re: Microkernels 26 T.L’Heureux RPC performance 27 M.Walker Re: Mach end Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.14 Total ordering using a sequencer Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.15 The ISIS algorithm for total ordering P2 1 Message 3 22 P4 1 3 Agreed Seq 1 2 P1 3 P3 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.16 Causal ordering using vector timestamps Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.17 Consensus for three processes P1 d1 :=proc eed d2 :=proc eed P2 v2=proceed v1 =proceed 1 Consens us algori thm v3=abort P3 (crashes) Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.18 Consensus in a synchronous system Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.19 Three byzantine generals p1 (Commander) p1 (Commander) 1:v 1:v 1:w 2:1:v 2:1:w p3 p2 1:x p2 3:1:u p3 3:1:x Faulty processes are shown shaded Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000 Figure 11.20 Four byzantine generals p1 (Commander) p1 (Commander) 1:v 1:v 1:u 1:w 1:v 1:v 2:1:v p2 2:1:u p3 3:1:u 4:1:v p2 4:1:v 2:1:v 4:1:v 3:1:w p3 3:1:w 4:1:v 2:1:u p4 3:1:w p4 Faulty processes are shown shaded Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 3 © Addison-Wesley Publishers 2000