Chapter Five
The Processor: Datapath and Control (Part B: multicycle)
Mario Côrtes - MO401 - IC/Unicamp - 2004s2
1998 Morgan Kaufmann Publishers

Multicycle Approach
• Break up the instructions into steps, each step taking one cycle
  – balance the amount of work done in each step
  – restrict each cycle to use only one major functional unit (1 ALU, 1 memory, 1 register file)
• At the end of a cycle
  – store values for use in later cycles (the easiest thing to do)
  – introduce additional "internal" registers
[Figure: abstract multicycle datapath: PC, a single Memory (instruction or data) with Address and Data ports, Instruction register, Memory data register, Registers, A, B, ALU, ALUOut]

Multicycle Approach
• IR (Instruction Register) and MDR (Memory Data Register) hold the memory output
• Registers A and B hold the register-file outputs
• ALUOut holds the ALU output
• All of these except IR hold their data for only one clock cycle, so they need no write-control signal
• New MUXes: memory address, ALU operands
[Figure: multicycle datapath: a MUX selects the memory address (PC or ALUOut); Instruction [25–21] and [20–16] feed the register read ports; a MUX selects the write register (Instruction [20–16] or [15–11]) and another the write data (ALUOut or MDR); the first ALU operand is PC or A, the second is B, the constant 4, the sign-extended Instruction [15–0], or that value shifted left 2; the ALU result is stored in ALUOut]

Multicycle datapath and control signals
[Figure: the same datapath annotated with the control signals IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp, and an ALU control block driven by Instruction [5–0]]
• Still to implement: the three possible sources for loading the PC: PC + 4, beq, and j

Complete datapath
[Figure: complete multicycle datapath with its control unit. The control unit takes Op [5–0] (Instruction [31–26]) and drives PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. A 3-input MUX controlled by PCSource selects the new PC from the ALU result, ALUOut, or the jump address [31–0], formed from PC [31–28] and Instruction [25–0] shifted left 2 (26 to 28 bits)]

Control signals (1 bit)
Signal        Deasserted (0)                       Asserted (1)
RegDst        write register is rt                 write register is rd
RegWrite      none                                 write to the register file
ALUSrcA       first ALU operand is the PC          first ALU operand is register A
MemRead       none                                 read memory
MemWrite      none                                 write memory
MemtoReg      register write data from ALUOut      register write data from MDR
IorD          memory address from the PC           memory address from ALUOut
IRWrite       none                                 write the IR
PCWrite       none                                 write the PC
PCWriteCond   none                                 write the PC if the ALU result is 0

Control signals (2 bits)
Signal     Value   Effect
ALUOp      00      add
           01      subtract
           10      operation given by the funct field
ALUSrcB    00      second operand is register B
           01      second operand is the constant 4
           10      second operand is the sign-extended 16 bits of the IR
           11      same as above, shifted left 2 bits
PCSource   00      PC + 4 (ALU result)
           01      ALUOut (branch target address)
           10      jump address (4 upper bits of PC & 26-bit address field & 00)
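The PCSource encoding and the PCWrite/PCWriteCond signals together decide what the PC holds after each clock edge. The sketch below summarizes that logic as a function; it assumes 32-bit unsigned values and that `pc` already holds PC + 4 when a jump completes (as it does after instruction fetch). The names `next_pc` and `jump_address` are hypothetical, not part of the original material.

```python
MASK32 = 0xFFFFFFFF


def jump_address(pc: int, ir: int) -> int:
    """Jump target: PC[31:28] concatenated with IR[25:0] << 2."""
    return (pc & 0xF000_0000) | ((ir & 0x03FF_FFFF) << 2)


def next_pc(pc: int, ir: int, alu_result: int, alu_out: int, zero: bool,
            pc_write: bool, pc_write_cond: bool, pc_source: int) -> int:
    """Value held by the PC after the clock edge (hypothetical helper)."""
    candidates = {
        0b00: alu_result,            # PC + 4, computed by the ALU during fetch
        0b01: alu_out,               # branch target saved in ALUOut during decode
        0b10: jump_address(pc, ir),  # jump target
    }
    # The PC is loaded when PCWrite is asserted, or when PCWriteCond is
    # asserted and the ALU's Zero output is 1 (A - B == 0, i.e. A == B).
    if pc_write or (pc_write_cond and zero):
        return candidates[pc_source] & MASK32
    return pc


# Branch completion with A == B: the PC takes the target held in ALUOut.
print(hex(next_pc(0x100, 0, alu_result=0, alu_out=0x200, zero=True,
                  pc_write=False, pc_write_cond=True, pc_source=0b01)))  # 0x200
```

Gating the write with `pc_write or (pc_write_cond and zero)` mirrors the single OR gate that drives the PC's write enable, so the three candidate sources only need one MUX.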
Five Execution Steps
• Instruction fetch
• Instruction decode and register fetch
• Execution, memory address computation, or branch completion
• Memory access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!

Step 1: Instruction Fetch
• Use the PC to fetch the instruction and put it in the Instruction Register.
• Increment the PC by 4 and put the result back in the PC.
• This can be described succinctly using RTL ("Register-Transfer Language"):
    IR = Memory[PC];
    PC = PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?

Step 2: Instruction Decode and Register Fetch
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• RTL:
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
• We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)

Step 3 (instruction dependent)
• The ALU performs one of three functions, depending on the instruction type
• Memory reference:
    ALUOut = A + sign-extend(IR[15-0]);
• R-type:
    ALUOut = A op B;
• Branch:
    if (A == B) PC = ALUOut;

Step 4 (R-type or memory access)
• Loads and stores access memory
    MDR = Memory[ALUOut];
  or
    Memory[ALUOut] = B;
• R-type instructions finish
    Reg[IR[15-11]] = ALUOut;
  The write actually takes place at the end of the cycle, on the clock edge.

Write-back step
• Reg[IR[20-16]] = MDR;
What about all the other instructions?
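The RTL for the five steps can be turned into a small interpreter, which makes it easy to see why instructions take 3 to 5 cycles. This is a minimal sketch under stated assumptions: memory is a Python dict from byte address to 32-bit word, only lw, sw, beq, j, add and sub (funct 0x20 and 0x22) are supported, and overflow and exceptions are ignored. The class `Multicycle` and its methods are hypothetical; one call executes a whole instruction and adds its cycle count, so it is not cycle-accurate.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class Multicycle:
    def __init__(self, memory, pc=0):
        self.mem = memory                 # byte address -> 32-bit word
        self.reg = [0] * 32
        self.pc = pc
        self.ir = self.a = self.b = self.alu_out = self.mdr = 0
        self.cycles = 0

    def field(self, hi, lo):
        """IR[hi-lo] as an unsigned integer."""
        return (self.ir >> lo) & ((1 << (hi - lo + 1)) - 1)

    def write_reg(self, r, value):
        if r != 0:                        # $zero is hard-wired to 0
            self.reg[r] = value

    def run_instruction(self):
        # Step 1: IR = Memory[PC]; PC = PC + 4
        self.ir = self.mem[self.pc]
        self.pc = (self.pc + 4) & MASK32
        # Step 2: A = Reg[IR[25-21]]; B = Reg[IR[20-16]];
        #         ALUOut = PC + (sign-extend(IR[15-0]) << 2)
        self.a = self.reg[self.field(25, 21)]
        self.b = self.reg[self.field(20, 16)]
        self.alu_out = (self.pc + (sign_extend16(self.field(15, 0)) << 2)) & MASK32
        self.cycles += 2
        op = self.field(31, 26)
        if op == 0x04:                    # beq: completed in step 3
            if self.a == self.b:
                self.pc = self.alu_out
            self.cycles += 1
        elif op == 0x02:                  # j: completed in step 3
            self.pc = (self.pc & 0xF0000000) | (self.field(25, 0) << 2)
            self.cycles += 1
        elif op == 0x00:                  # R-type: steps 3 and 4
            funct = self.field(5, 0)
            if funct == 0x20:             # add
                self.alu_out = (self.a + self.b) & MASK32
            elif funct == 0x22:           # sub
                self.alu_out = (self.a - self.b) & MASK32
            else:
                raise NotImplementedError(f"funct {funct:#x}")
            self.write_reg(self.field(15, 11), self.alu_out)
            self.cycles += 2
        elif op in (0x23, 0x2B):          # lw / sw
            # Step 3: ALUOut = A + sign-extend(IR[15-0])
            self.alu_out = (self.a + sign_extend16(self.field(15, 0))) & MASK32
            if op == 0x2B:                # Step 4 (sw): Memory[ALUOut] = B
                self.mem[self.alu_out] = self.b
                self.cycles += 2
            else:                         # Step 4 (lw): MDR = Memory[ALUOut]
                self.mdr = self.mem[self.alu_out]
                # Step 5: Reg[IR[20-16]] = MDR
                self.write_reg(self.field(20, 16), self.mdr)
                self.cycles += 3
        else:
            raise NotImplementedError(f"opcode {op:#x}")
```

Steps 1 and 2 are shared by every instruction; only the tail after the opcode test differs, which is where the 3-, 4- and 5-cycle counts come from. A cycle-accurate model would instead advance one step per call and keep IR, A, B, ALUOut and MDR visible between calls.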
Summary:
• Instruction fetch (all instructions)
    IR = Memory[PC]
    PC = PC + 4
• Instruction decode/register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
• Execution, address computation, branch/jump completion
  – R-type: ALUOut = A op B
  – Memory reference: ALUOut = A + sign-extend(IR[15-0])
  – Branch: if (A == B) then PC = ALUOut
  – Jump: PC = PC[31-28] || (IR[25-0] << 2)
• Memory access or R-type completion
  – R-type: Reg[IR[15-11]] = ALUOut
  – Memory reference: Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B
• Memory read completion
  – Load: Reg[IR[20-16]] = MDR

Another view: flowchart (numbers identify each step)
• 0: IR = Mem[PC]; PC = PC + 4
• 1: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]; ALUOut = PC + (SignExt(IR[15:0]) << 2)
• Then, depending on the instruction:
  – R-type: 6: ALUOut = A op B; 7: Reg[IR[15:11]] = ALUOut
  – lw: 2: ALUOut = A + SignExt(IR[15:0]); 3: MDR = Mem[ALUOut]; 4: Reg[IR[20:16]] = MDR
  – sw: 2: ALUOut = A + SignExt(IR[15:0]); 5: Mem[ALUOut] = B
  – beq: 8: if A == B then PC = ALUOut
  – j: 9: PC = PC[31:28] || (IR[25:0] << 2)
• Every path then returns to 0

Simple Questions
• How many cycles will it take to execute this code?
        lw  $t2, 0($t3)
        lw  $t3, 4($t3)
        beq $t2, $t3, Label   # assume not taken
        add $t5, $t2, $t3
        sw  $t5, 8($t3)
Label:  ...
• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 take place?
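The flowchart can be encoded as a next-state function and stepped one cycle at a time to check the Simple Questions. This is a minimal sketch assuming the flowchart's step numbers and that beq takes 3 cycles whether or not it is taken (its path in the flowchart is the same either way); `next_state` and `trace` are hypothetical names.

```python
def next_state(state, op):
    """Next flowchart step; op is 'lw', 'sw', 'R', 'beq' or 'j'."""
    if state == 0:                     # fetch -> decode
        return 1
    if state == 1:                     # decode dispatches on the opcode
        return {"lw": 2, "sw": 2, "R": 6, "beq": 8, "j": 9}[op]
    if state == 2:                     # address computation
        return 3 if op == "lw" else 5
    if state == 3:                     # load memory access -> write-back
        return 4
    if state == 6:                     # R-type execution -> completion
        return 7
    return 0                           # steps 4, 5, 7, 8, 9 end the instruction


def trace(program):
    """Return one (cycle, instruction index, state) tuple per clock cycle."""
    timeline, cycle = [], 0
    for index, op in enumerate(program):
        state = 0
        while True:
            cycle += 1
            timeline.append((cycle, index, state))
            state = next_state(state, op)
            if state == 0:
                break
    return timeline


# lw, lw, beq (assumed not taken), add, sw
timeline = trace(["lw", "lw", "beq", "R", "sw"])
print(len(timeline))    # 21
print(timeline[8 - 1])  # (8, 1, 2): second lw, address computation
print(next(c for c, i, s in timeline if i == 3 and s == 6))  # 16
```

With this encoding the sequence takes 5 + 5 + 3 + 4 + 4 = 21 cycles, cycle 8 is the address computation of the second lw, and the addition of $t2 and $t3 (step 6 of the add) happens in cycle 16.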
Implementing the Control
• The values of the control signals depend on:
  – what instruction is being executed
  – which step is being performed
• Use the information we've accumulated to specify a finite state machine
  – specify the finite state machine graphically, or
  – use microprogramming
• The implementation can be derived from the specification

Control signal table
[Table: control signal values]

Graphical Specification of FSM
[Figure: FSM state diagram; execution starts in state 0. Each state asserts the listed signals:]
• State 0, instruction fetch: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00; next state 1
• State 1, instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00; next state 2 if Op = 'LW' or 'SW', 6 if R-type, 8 if Op = 'BEQ', 9 if Op = 'J'
• State 2, memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00; next state 3 if Op = 'LW', 5 if Op = 'SW'
• State 3, memory access (load): MemRead, IorD = 1; next state 4
• State 4, write-back step: RegDst = 0, RegWrite, MemtoReg = 1; next state 0
• State 5, memory access (store): MemWrite, IorD = 1; next state 0
• State 6, execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10; next state 7
• State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0; next state 0
• State 8, branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01; next state 0
• State 9, jump completion: PCWrite, PCSource = 10; next state 0
How many state bits will we need?

Finite State Machine for Control
[Figure: control logic with inputs Op5–Op0 (instruction register opcode field) and S3–S0 (state register); outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and next-state bits NS3–NS0 fed back to the state register]

Performance of this machine for gcc
• Instruction mix and CPI per instruction class:
    Class    lw    sw    R-type    beq    jump
    %        23    13    43        19     2
    CPI      5     4     4         3      3
• CPI = 0.23 × 5 + 0.13 × 4 + 0.43 × 4 + 0.19 × 3 + 0.02 × 3 = 4.02
• Better than if every instruction took 5 cycles
• MIPS rating (assuming a 100 MHz clock):
  – CPI = 4 -> 25 MIPS
  – CPI = 5 -> 20 MIPS
  – CPI = 1 (doing everything in one cycle) -> 100 MIPS

PLA Implementation
• If I picked a horizontal or vertical line, could you explain it?
[Figure: PLA implementing the control; inputs Op5–Op0 and S3–S0; outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0]

ROM Implementation
• ROM = "Read Only Memory"
  – values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table
  – if the address is m bits, we can address 2^m entries in the ROM
  – our outputs are the bits of data that the address points to
    Address (m = 3 bits)    Data (n = 4 bits)
    000                     0011
    001                     1100
    010                     1100
    011                     1000
    100                     0000
    101                     0001
    110                     0110
    111                     0111
m is the "height", and n is the "width"

ROM Implementation
• How many inputs are there?
  – 6 bits for the opcode and 4 bits for the state = 10 address lines (i.e., 2^10 = 1024 different addresses)
• How many outputs are there?
  – 16 datapath-control outputs and 4 state bits = 20 outputs
• The ROM is 2^10 × 20 = 1K × 20 bits (and a rather unusual size)
• Rather wasteful, since for many of the entries the outputs are the same (i.e., the opcode is often ignored)

ROM vs PLA
• Break up the table into two parts
  – 4 state bits determine the 16 outputs: 2^4 × 16 bits of ROM
  – 10 bits determine the 4 next-state bits: 2^10 × 4 bits of ROM
  – Total: 4.3K bits of ROM
• A PLA is much smaller
  – it can share product terms
  – it only needs entries that produce an active output
  – it can take don't-cares into account
• Size is (#inputs × #product-terms) + (#outputs × #product-terms)
  – For this example: (10 × 17) + (20 × 17) = 510 PLA cells
• PLA cells are usually about the size of a ROM cell (slightly bigger)
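The sizing arithmetic for the ROM and PLA implementations can be reproduced in a few lines. This sketch takes its figures from the slides (10 FSM states, 6 opcode bits, 16 control outputs, 17 product terms); the variable names, and the `EXAMPLE_ROM`/`rom_read` model of the 3-input, 4-output example truth table as a lookup list, are illustrative assumptions.

```python
OPCODE_BITS = 6
NUM_STATES = 10          # states 0-9 of the FSM
CONTROL_OUTPUTS = 16     # datapath control lines
PRODUCT_TERMS = 17       # product-term count quoted for this PLA

state_bits = (NUM_STATES - 1).bit_length()   # smallest b with 2**b >= 10
inputs = OPCODE_BITS + state_bits            # ROM address lines / PLA inputs
outputs = CONTROL_OUTPUTS + state_bits       # control lines + next-state bits

single_rom_bits = 2**inputs * outputs
split_rom_bits = 2**state_bits * CONTROL_OUTPUTS + 2**inputs * state_bits
pla_cells = inputs * PRODUCT_TERMS + outputs * PRODUCT_TERMS

print(state_bits, inputs, outputs)  # 4 10 20
print(single_rom_bits)              # 20480 (1K x 20)
print(split_rom_bits)               # 4352 (256 + 4096, about 4.3K)
print(pla_cells)                    # 510 (170 + 340)

# A ROM implementing a truth table is a lookup indexed by the address:
# the example with m = 3 address bits and n = 4 data bits as an 8-entry list.
EXAMPLE_ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address: int) -> int:
    return EXAMPLE_ROM[address]


print(format(rom_read(0b110), "04b"))  # 0110
```

The list view makes the waste visible: a ROM stores all 2^m rows even when many rows repeat, while a PLA only pays for the product terms it actually uses.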