Chapter Five
The Processor: Datapath and Control
(Part B: Multicycle)
Mario Côrtes - MO401 - IC/Unicamp- 2004s2
1998 Morgan Kaufmann Publishers
Multicycle Approach
• Break up the instructions into steps, each step takes a cycle
  – balance the amount of work to be done
  – restrict each cycle to use only one major functional unit
    • 1 ALU, 1 memory, 1 register file
• At the end of a cycle
  – store values for use in later cycles (easiest thing to do)
  – introduce additional "internal" registers
[Figure: high-level view of the multicycle datapath. PC, a single memory holding instructions or data, instruction register, memory data register, register file, registers A and B, ALU, ALUOut.]
Multicycle Approach
• IR (Instruction Register) and MDR (Memory Data Register) hold the memory output
• Registers A and B hold the register file outputs
• ALUOut holds the ALU output
• All of them except IR hold data for a single clock cycle, so they need no write-control signal
• New MUXes: memory address, ALU operands
[Figure: multicycle datapath with the added registers (instruction register, memory data register, A, B, ALUOut) and the new multiplexers on the memory address and on both ALU operands.]
Multicycle datapath and control signals
[Figure: multicycle datapath annotated with the control signals IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, and MemtoReg.]
Still to be implemented: the three possible sources for loading the PC (PC+4, beq, and j)
Complete datapath
[Figure: complete multicycle datapath with its control unit. Adds PCWrite, PCWriteCond, and PCSource, plus the jump address path: PC[31-28] concatenated with Instruction[25-0] shifted left 2.]
Control signals (1 bit)
Signal        Deasserted                          Asserted
RegDst        Write register is rt                Write register is rd
RegWrite      (none)                              Writes the register file
ALUSrcA       Operand is the PC                   Operand is register A
MemRead       (none)                              Reads memory
MemWrite      (none)                              Writes memory
MemtoReg      Register write data is ALUOut       Register write data is MDR
IorD          Address comes from the PC           Address comes from ALUOut
IRWrite       (none)                              Writes the IR
PCWrite       (none)                              Writes the PC
PCWriteCond   (none)                              Writes the PC if the ALU result is 0
Control signals (2 bits)
Signal     Value   Effect
ALUOp      00      Add
           01      Subtract
           10      Operation given by the function field
ALUSrcB    00      Second operand is register B
           01      Second operand is the constant 4
           10      Second operand is the sign-extended 16 bits of the IR
           11      Same as above, shifted left 2 bits
PCSource   00      PC + 4
           01      ALUOut (target address)
           10      Jump (4 bits of PC & 26 bits of address & 00)
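The two multiplexers driven by ALUSrcB and PCSource can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `sign_extend16`, `alu_operand_b`, and `next_pc` are hypothetical, values are modeled as 32-bit unsigned integers, and for PCSource = 00 the value PC + 4 is assumed to arrive on the ALU result.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operand_b(alu_src_b, reg_b, ir):
    """Select the second ALU operand according to the 2-bit ALUSrcB signal."""
    imm = sign_extend16(ir)
    choices = {
        0b00: reg_b,     # register B
        0b01: 4,         # constant 4
        0b10: imm,       # sign-extended IR[15-0]
        0b11: imm << 2,  # same, shifted left 2 bits
    }
    return choices[alu_src_b] & 0xFFFFFFFF


def next_pc(pc_source, alu_result, alu_out, pc, ir):
    """Select the value written to the PC according to the 2-bit PCSource signal."""
    jump = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    choices = {
        0b00: alu_result,  # PC + 4 (assumed to be computed by the ALU)
        0b01: alu_out,     # branch target held in ALUOut
        0b10: jump,        # PC[31-28] & IR[25-0] & 00
    }
    return choices[pc_source]
```

For example, `alu_operand_b(0b01, reg_b, ir)` ignores both `reg_b` and `ir` and returns the constant 4.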
Five Execution Steps
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!
Step 1: Instruction Fetch
• Use PC to get instruction and put it in the Instruction Register.
• Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL ("Register-Transfer Language"):
    IR = Memory[PC];
    PC = PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
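As a rough sketch of the fetch step, the RTL can be modeled on a small machine state in Python. The control values in `FETCH_CONTROL` are the ones the FSM slide lists for the instruction fetch state; the names `FETCH_CONTROL` and `fetch`, the dictionary-based state, and the word-addressed `memory` dictionary are assumptions made for this illustration.

```python
# Control signals asserted during instruction fetch (FSM state 0).
FETCH_CONTROL = {
    "MemRead": 1,
    "IorD": 0,         # memory address comes from the PC
    "IRWrite": 1,
    "ALUSrcA": 0,      # first ALU operand is the PC
    "ALUSrcB": 0b01,   # second ALU operand is the constant 4
    "ALUOp": 0b00,     # add
    "PCWrite": 1,
    "PCSource": 0b00,  # PC + 4
}


def fetch(state, memory):
    """IR = Memory[PC]; PC = PC + 4."""
    state["IR"] = memory[state["PC"]]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF
    return state
```

Calling `fetch({"PC": 0, "IR": 0}, {0: 0x8D6A0000})` leaves the word at address 0 in `IR` and advances `PC` to 4.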
Step 2: Instruction Decode and Register Fetch
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• RTL:
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
• We aren't setting any control lines based on the instruction type
  (we are busy "decoding" it in our control logic)
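The bit-field selections in this step can be illustrated with a short Python sketch. The field positions are the ones used in the RTL and the datapath figures; the function names `decode_fields` and `branch_target` are hypothetical, and `pc` is assumed to already hold PC + 4 from the fetch step.

```python
def decode_fields(ir):
    """Split a 32-bit instruction word into the fields used by the datapath."""
    return {
        "op": (ir >> 26) & 0x3F,  # IR[31-26]
        "rs": (ir >> 21) & 0x1F,  # IR[25-21]
        "rt": (ir >> 16) & 0x1F,  # IR[20-16]
        "rd": (ir >> 11) & 0x1F,  # IR[15-11]
        "funct": ir & 0x3F,       # IR[5-0]
        "imm": ir & 0xFFFF,       # IR[15-0]
    }


def branch_target(pc, ir):
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), as a 32-bit value."""
    imm = ir & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (pc + (imm << 2)) & 0xFFFFFFFF
```

The target is computed for every instruction, whether or not it turns out to be a branch, because the ALU is otherwise idle in this cycle.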
Step 3 (instruction dependent)
• ALU is performing one of three functions, based on instruction type
• Memory Reference:
    ALUOut = A + sign-extend(IR[15-0]);
• R-type:
    ALUOut = A op B;
• Branch:
    if (A==B) PC = ALUOut;
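The three cases of this step could be modeled in Python along these lines. It is a sketch only: the function name `execute`, the `kind` labels, and passing the R-type operation as a Python callable `alu_op` are assumptions for illustration, not part of the slides.

```python
def execute(kind, state, alu_op=None):
    """Apply step 3 to state for kind in {'mem', 'rtype', 'beq'}."""
    a, b, ir = state["A"], state["B"], state["IR"]
    if kind == "mem":
        # ALUOut = A + sign-extend(IR[15-0])
        imm = ir & 0xFFFF
        if imm & 0x8000:
            imm -= 0x10000
        state["ALUOut"] = (a + imm) & 0xFFFFFFFF
    elif kind == "rtype":
        # ALUOut = A op B
        state["ALUOut"] = alu_op(a, b) & 0xFFFFFFFF
    elif kind == "beq":
        # if (A == B) PC = ALUOut; ALUOut holds the target from step 2
        if a == b:
            state["PC"] = state["ALUOut"]
    else:
        raise ValueError("unknown instruction kind: " + kind)
    return state
```

Note that the branch case reads ALUOut rather than writing it, which is why the target had to be computed one cycle earlier.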
Step 4 (R-type or memory-access)
• Loads and stores access memory
    MDR = Memory[ALUOut];
      or
    Memory[ALUOut] = B;
• R-type instructions finish
    Reg[IR[15-11]] = ALUOut;
The write actually takes place at the end of the cycle, on the clock edge
Write-back step
• Reg[IR[20-16]]= MDR;
What about all the other instructions?
Summary:
Step 1, Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2, Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3, Execution, address computation, branch/jump completion:
    R-type:            ALUOut = A op B
    Memory-reference:  ALUOut = A + sign-extend(IR[15-0])
    Branches:          if (A == B) then PC = ALUOut
    Jumps:             PC = PC[31-28] || (IR[25-0] << 2)
Step 4, Memory access or R-type completion:
    R-type:            Reg[IR[15-11]] = ALUOut
    Load:              MDR = Memory[ALUOut]
    Store:             Memory[ALUOut] = B
Step 5, Memory read completion:
    Load:              Reg[IR[20-16]] = MDR
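Another view: flowchart
State 0: IR = Mem[PC]; PC = PC + 4
         next: state 1
State 1: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]; ALUOut = PC + (SignExt(IR[15:0]) << 2)
         next: state 2 (lw, sw), state 6 (R-type), state 8 (beq), state 9 (j)
State 2: ALUOut = A + SignExt(IR[15:0])
         next: state 3 (lw), state 5 (sw)
State 3: MDR = Mem[ALUOut]
         next: state 4
State 4: Reg[IR[20:16]] = MDR
         next: state 0
State 5: Mem[ALUOut] = B
         next: state 0
State 6: ALUOut = A op B
         next: state 7
State 7: Reg[IR[15:11]] = ALUOut
         next: state 0
State 8: if A == B then PC = ALUOut
         next: state 0
State 9: PC = PC[31:28] & (IR[25:0] << 2)
         next: state 0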
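The state numbering of the flowchart lends itself to a small next-state function. The sketch below is one way to write it in Python; the names `next_state` and `cycles` are hypothetical, and the opcode constants are the standard MIPS encodings, which the slides do not list.

```python
# Standard MIPS opcode values (not given on the slides).
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02


def next_state(state, opcode):
    """Return the state that follows state for the given opcode."""
    if state == 0:
        return 1
    if state == 1:
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return 3 if opcode == LW else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, and 9 complete the instruction


def cycles(opcode):
    """Count the clock cycles from fetch until control returns to state 0."""
    state, count = 0, 0
    while True:
        state = next_state(state, opcode)
        count += 1
        if state == 0:
            return count
```

Walking each opcode through `cycles` gives the per-instruction cycle counts implied by the flowchart, between 3 and 5.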
Simple Questions
• How many cycles will it take to execute this code?
            lw $t2, 0($t3)
            lw $t3, 4($t3)
            beq $t2, $t3, Label    #assume not taken
            add $t5, $t2, $t3
            sw $t5, 8($t3)
    Label:  ...
• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 take place?
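One way to approach these questions is to lay the instructions out on a cycle timeline. The Python sketch below does that, assuming the branch is not taken and using the per-instruction cycle counts of this multicycle design (lw 5, sw 4, R-type 4, beq 3). The names `CYCLES`, `schedule`, and `locate` are hypothetical.

```python
# Cycles per instruction in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}


def schedule(program):
    """Return (instruction, first_cycle, last_cycle) for each instruction."""
    spans, start = [], 1
    for instr in program:
        end = start + CYCLES[instr] - 1
        spans.append((instr, start, end))
        start = end + 1
    return spans


def locate(program, cycle):
    """Return (index, instruction, step) active during the given cycle."""
    for index, (instr, first, last) in enumerate(schedule(program)):
        if first <= cycle <= last:
            return index, instr, cycle - first + 1
    raise ValueError("cycle is past the end of the program")


program = ["lw", "lw", "beq", "add", "sw"]
total_cycles = schedule(program)[-1][2]
```

`locate(program, 8)` reports which instruction and which of its steps occupies the 8th cycle, and the span of the `add` entry shows where its execution step (step 3) falls.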
Implementing the Control
• Value of control signals is dependent upon:
  – what instruction is being executed
  – which step is being performed
• Use the information we've accumulated to specify a finite state machine
  – specify the finite state machine graphically, or
  – use microprogramming
• Implementation can be derived from specification
Control signal table
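Graphical Specification of FSM
State 0, Instruction fetch (Start):
    MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
    next: state 1
State 1, Instruction decode/register fetch:
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    next: state 2 (Op = 'LW' or Op = 'SW'), state 6 (Op = R-type), state 8 (Op = 'BEQ'), state 9 (Op = 'J')
State 2, Memory address computation:
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    next: state 3 (Op = 'LW'), state 5 (Op = 'SW')
State 3, Memory access (load):
    MemRead, IorD = 1
    next: state 4
State 4, Write-back step:
    RegDst = 0, RegWrite, MemtoReg = 1
    next: state 0
State 5, Memory access (store):
    MemWrite, IorD = 1
    next: state 0
State 6, Execution:
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
    next: state 7
State 7, R-type completion:
    RegDst = 1, RegWrite, MemtoReg = 0
    next: state 0
State 8, Branch completion:
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
    next: state 0
State 9, Jump completion:
    PCWrite, PCSource = 10
    next: state 0
• How many state bits will we need?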
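The per-state outputs of the FSM can be written down as a lookup table, which also makes the state-bit question easy to check. This Python sketch is an illustration: the names `STATE_OUTPUTS` and `state_bits` are hypothetical, and signals not listed for a state are taken to be deasserted or don't-care.

```python
# Control outputs asserted in each of the ten FSM states.
STATE_OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}


def state_bits(num_states):
    """Smallest number of bits that can encode num_states distinct states."""
    return (num_states - 1).bit_length()


bits_needed = state_bits(len(STATE_OUTPUTS))
```

Because the outputs depend only on the current state, this table has one row per state and never consults the opcode.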
Finite State Machine for Control
[Figure: control logic block. Inputs: the opcode field of the instruction register (Op5-Op0) and the state register (S3-S0). Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3-NS0.]
Performance of this machine for gcc
Instruction   %    CPI
lw            23   5
sw            13   4
R-type        43   4
beq           19   3
jump           2   3
• CPI = 0.23*5 + 0.13*4 + 0.43*4 + 0.19*3 + 0.02*3 = 4.02
• Better than if every instruction took 5 cycles
• Improvement in MIPS (assuming a 100 MHz clock)
  – CPI = 4 -> 25 MIPS
  – CPI = 5 -> 20 MIPS
  – CPI = 1 (do everything in one cycle) -> 100 MIPS
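The weighted-average CPI and the MIPS figures can be reproduced with a few lines of Python. The instruction mix and cycle counts are the ones from the gcc table; the names `MIX`, `average_cpi`, and `mips` are hypothetical, and percentages are kept as integers to avoid rounding noise.

```python
# (percentage of executed instructions, cycles per instruction) for gcc.
MIX = {
    "lw": (23, 5),
    "sw": (13, 4),
    "R-type": (43, 4),
    "beq": (19, 3),
    "jump": (2, 3),
}


def average_cpi(mix):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(pct * cpi for pct, cpi in mix.values()) / 100


def mips(clock_mhz, cpi):
    """Millions of instructions per second at the given clock rate and CPI."""
    return clock_mhz / cpi


cpi = average_cpi(MIX)
```

With a 100 MHz clock, `mips(100, 4)` and `mips(100, 5)` give the two figures compared on the slide.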
PLA Implementation
• If I picked a horizontal or vertical line could you explain it?
[Figure: PLA implementing the control logic. Inputs: Op5-Op0 and S3-S0. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, and NS3-NS0.]
ROM Implementation
• ROM = "Read Only Memory"
  – values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table
  – if the address is m bits, we can address 2^m entries in the ROM.
  – our outputs are the bits of data that the address points to.

    address (m = 3)    data (n = 4)
    0 0 0              0 0 1 1
    0 0 1              1 1 0 0
    0 1 0              1 1 0 0
    0 1 1              1 0 0 0
    1 0 0              0 0 0 0
    1 0 1              0 0 0 1
    1 1 0              0 1 1 0
    1 1 1              0 1 1 1

m is the "height", and n is the "width"
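The idea that a ROM is just a truth table indexed by its address can be shown with a Python list. The contents below are the eight 4-bit words of the example table on this slide; the names `ROM` and `rom_read` are hypothetical.

```python
# Eight entries (3 address bits), each 4 bits wide.
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]


def rom_read(address):
    """Return the data word stored at the given address."""
    return ROM[address]
```

Every input combination has its own entry, even when several entries hold the same data, which is where the waste comes from as the address grows.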
ROM Implementation
• How many inputs are there?
    6 bits for opcode, 4 bits for state = 10 address lines
    (i.e., 2^10 = 1024 different addresses)
• How many outputs are there?
    16 datapath-control outputs, 4 state bits = 20 outputs
• ROM is 2^10 x 20 = 1K x 20 bits = 20K bits (and a rather unusual size)
• Rather wasteful, since for lots of the entries, the outputs are the same
  – i.e., opcode is often ignored
ROM vs PLA
• Break up the table into two parts
  – 4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM
  – 10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM
  – Total: 4.3K bits of ROM
• PLA is much smaller
  – can share product terms
  – only need entries that produce an active output
  – can take into account don't cares
• Size is (#inputs x #product-terms) + (#outputs x #product-terms)
    For this example = (10x17)+(20x17) = 510 PLA cells
• PLA cells usually about the size of a ROM cell (slightly bigger)
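The size comparison is plain arithmetic and can be checked with a short Python sketch. The input, output, and product-term counts are the ones given on the slides; the function names `rom_bits` and `pla_cells` are hypothetical.

```python
def rom_bits(address_bits, data_bits):
    """Total bits in a ROM with 2**address_bits entries of data_bits each."""
    return (2 ** address_bits) * data_bits


def pla_cells(inputs, outputs, product_terms):
    """(#inputs x #product-terms) + (#outputs x #product-terms)."""
    return inputs * product_terms + outputs * product_terms


# Single ROM: 10 address lines (6 opcode + 4 state), 20 outputs.
single_rom = rom_bits(10, 20)

# Split ROM: outputs depend on the state only, next state on opcode and state.
split_rom = rom_bits(4, 16) + rom_bits(10, 4)

# PLA with 10 inputs, 20 outputs, and 17 product terms.
pla = pla_cells(10, 20, 17)
```

Dividing `split_rom` by 1024 gives the size in K bits, for comparison with `single_rom` and with the PLA cell count.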