### William Stallings Computer Organization and Architecture

Chapter 16 Control Unit Operations

### **Execution of the Instruction Cycle**

- It has many elementary phases, each executed in a single clock cycle (remember pipelining)
- In each phase only very simple operations (called micro-operations) are executed:
  - Move contents between registers (internals, interface with ALU, interface with memory)
  - Activate devices (ALU, memory)
- Micro-operations are the CPU atomic operations, hence define its low-level behaviour
- A micro-operation is the set of actions (data flows and controls) that can be completed in a single clock cycle

### **Constituent Elements of Program Execution**



## Sequence of micro-operations for instruction fetch

<DF1 >

- t<sub>1</sub>: MAR <- PC
- $t_2$ : MBR <- memory <DF<sub>2</sub> DF<sub>3</sub> DF<sub>4</sub> DF<sub>5</sub> > PC <- PC +1 <DF<sub>7</sub> >
- t<sub>3</sub>: IR <- MBR
   <ul>
   (each t<sub>i</sub> is a clock cycle,
   i.e. an atomic time unit)

  An alternative organization
- t<sub>1</sub>: MAR <- PC
- t<sub>2</sub>: MBR <- memory
- t<sub>3</sub>: PC <- PC +1</li>
  IR <- MBR</li>



# Rules for micro-operation sequencing

- Proper precedence must be observed
  - MAR <- PC must precede MBR <- memory</p>
- Conflicts must be avoided
  - Must not read & write same register at same time
  - MBR <- memory & IR <- MBR must not be in same cycle</li>
- Also: PC <- PC +1 involves addition
  - Depending on the kind of ALU may need additional micro-operations, hence it is better to have it in t<sub>2</sub>
- Minimization of the number of micro-operations is an algorithmic problem on graphs

## Sequence of micro-operations for direct addressing

- $t_1$ : MAR <- IR<sub>address</sub> <DF<sub>1</sub> >
- $t_2$ : MBR <- memory <DF<sub>2</sub> DF<sub>3</sub> DF<sub>4</sub> DF<sub>5</sub> >



## Sequence of micro-operations for register addressing



Rev. 3.3 (2009-10) by Enrico Nardelli

## Sequence of micro-operations for register indirect addressing

- t<sub>1</sub>: MAR <- (IR<sub>register-address</sub>) <DF<sub>1</sub> DF<sub>2</sub> >
- t<sub>2</sub>: MBR <- memory

<DF<sub>3</sub> DF<sub>4</sub> DF<sub>5</sub> DF<sub>6</sub> >



Rev. 3.2 (2009-10) by Enrico Nardelli

## Sequence of micro-operations for indirect addressing

•  $t_1$ : MAR <- IR<sub>address</sub>  $< DF_1 >$ t<sub>2</sub>: MBR <- memory  $\langle DF_2 DF_3 DF_4 DF_5 \rangle$  $t_3$ : MAR <- MBR <DF<sub>6</sub> > MBR <- memory  $\langle DF_7 DF_8 DF_9 DF_{10} \rangle$ t₄: **CPU** 3 - 8 2 - 7 4 - 9 MAR Memory 4 - 9 3 - 8 Control 1 6 Unit 5 - 10 IR **MBR** MAR = Memory Address Register Address Data Control MBR = Memory Buffer Register Bus Bus Bus **IR = Instruction Register** PC = Program Counter

Rev. 3.2 (2009-10) by Enrico Nardelli

### Sequence of micro-operations for relative addressing

- $t_1$ : MAR <-  $IR_{address}$  + PC <  $OF_1 DF_2 DF_3 DF_4$  >
- $t_2$ : MBR <- memory <br/> <br/>OF<sub>5</sub> DF<sub>6</sub> DF<sub>7</sub> DF<sub>8</sub> >



## Sequence of micro-operations for base and indexed addressing

- t<sub>1</sub>: MAR <- (IR<sub>register-address</sub>) + IR<sub>address</sub>
  <DF<sub>1</sub> DF<sub>2</sub> DF<sub>3</sub> DF<sub>4</sub> DF<sub>5</sub> >
- $t_2$ : MBR <- memory <DF<sub>6</sub> DF<sub>7</sub> DF<sub>8</sub> DF<sub>9</sub> >



Rev. 3.2 (2009-10) by Enrico Nardelli

#### Sequence of micro-operations for combination of displacement and indirect addressing

• Try them yourself !

# Sequence of micro-operations for interrupt handling



# Micro-operation sequencing for the execution phase (1)

- Different for each instruction
- SUM X sum the contents of memory cell X and Accumulator and store back the result in cell X

Assuming that after the operand fetch phase content of cell at address X is in MBR :

- $t_1$ : ALU <- AC + MBR
- t<sub>2</sub>: AC <- ALU
- t<sub>3</sub>: MBR <- AC; MAR <- IR<sub>address</sub>
- t<sub>4</sub>: memory <- MBR</p>

# Micro-operation sequencing for the execution phase (2)

- ISZ X increment memory cell X and if it's zero skip the next instruction
  - Assuming that content of cell at address X is in MBR after the operand fetch phase:
  - t<sub>1</sub>: ALU <- MBR + 1</p>
  - t<sub>2</sub>: MBR <- ALU
  - $t_3$ : memory <- MBR
    - IF MBR == 0 THEN PC <- PC + 1
- Note:
  - IF-THEN is a single micro-operation

# Micro-operation sequencing for the execution phase (3)

- CALL X Save in stack the return address and jump to address X
  - t<sub>1</sub>: MBR <- PC</li>
    MAR <- SP</li>
  - t<sub>2</sub>: memory <- MBR</p>
    - ALU <- SP + 1
    - PC <- IR<sub>address</sub>
  - t<sub>3</sub>: SP <- ALU

## A simplified flow diagram for the execution of instruction cycle



### **Functions of Control Unit**

#### Sequencing

- Causing the CPU to step through a series of microoperations
- Execution
  - Causing the execution of each micro-op
- ALL THESE ACTIONS are performed by means of **Control Signals**

## A simplified data flow diagram of a Control Unit



### **Control Unit's Input Signals**

#### Clock

- One micro-op (or set of parallel ops) per clock cycle
- Different signals are needed for different steps
- Instruction register
  - Op-code for current instruction
  - Determines which micro-instructions are performed
- Flags
  - State of CPU
  - Results of previous operations
- Control Bus
  - Interrupts
  - Acknowledgments

### **Control Unit's Output Signals**

#### • To other CPU components

- For data movement
- To activate specific functions
- To the Control Bus
  - To control memory
  - To control I/O modules
- Output signals from the control unit (i.e., Control Signals) make all micro-operations happen

## Simplified schema of a CPU: connections and control gates



### Example of Control Signal Sequence – Instruction Fetch (1)

- t<sub>1</sub>: MAR <- PC
  - Control unit (CU) activates signal C2 to open gate from PC to MAR
- t<sub>2-1</sub>: MBR <- memory
  - CU activates C0 to open gate from MAR to address bus
  - CU activates the *memory read* control signal (CR not shown) to the memory
  - CU activates C5 to open gate from data bus to MBR

### Example of Control Signal Sequence – Instruction Fetch (2)

#### • t<sub>2-2</sub>: PC <- PC +1

- In the considered CPU's internal schema, ALU's output is not directly connected to PC but only to AC. Therefore this micro-operation has to be split in two subsequent time units.
- What would happen if we had ALU's output directly connected to PC?
- t<sub>3</sub>: IR <- MBR
  - CU activates C4 to open gate from MBR to IR

### Example of Control Signal Sequence – Instruction Fetch (3)

- Splitting the increment of Program Counter
  - ALU is a fast combinational circuit whose inputs and output are **not** buffered
  - ALU has a specific control signal CA for unitary increment without a second input

t<sub>2-2-1</sub>: ALU <- PC CU activates C14 from PC to ALU increment ALU CU act. control signal CA (not</li>

shown) for ALU

AC <- ALU CU activates C9 from ALU to PC t<sub>2-2-2</sub>: PC <- AC CU activates C15 from AC to PC

Rev. 3.2 (2009-10) by Enrico Nardelli

### Example of Control Signal Sequence – Instruction Fetch (4)

- Optimization
  - t<sub>2-1</sub> and t<sub>2-2-1</sub> can be executed together
  - t<sub>2-2-2</sub> and t<sub>3</sub> can be executer together
- New organization

| t <sub>1</sub> :   | MAR <- PC     | C2  |    |    |
|--------------------|---------------|-----|----|----|
| ■ t <sub>2</sub> : | MBR <- memory | C0  | CR | C5 |
|                    | ALU <- PC     | C14 |    |    |
|                    | increment ALU | CA  |    |    |
|                    | AC <- ALU     | C9  |    |    |
| ■ t <sub>3</sub> : | PC <- AC      | C15 |    |    |
|                    | IR <- MBR     | C4  |    |    |

### Example of Control Signal Sequence - Direct Addressing

- Direct addressing is executed right after instruction fetch in our simplified flow diagram for the execution of an instruction cycle. Hence:
  - t<sub>4</sub>: MAR <- IR<sub>address</sub>
    - CU activates C16 to open gate from IR to MAR
  - t<sub>5</sub>: MBR <- memory
    - CU act. C0 to open gate from MAR to address bus
    - CU act. the *memory read* control signal CR
    - CU act. C5 to open gate from data bus to MBR

### **Example of Control Signal Sequence - Indirect Addressing**

- t<sub>1</sub>: MAR <- IR<sub>address</sub>
  - CU activates C16 to open gate from IR to MAR
- t<sub>2</sub>: MBR <- memory
  - CU act. C0 to open gate from MAR to address bus
  - CU act. the *memory read* control signal CR
  - CU act. C5 to open gate from data bus to MBR
- $t_3$ : MAR <- MBR
  - CU activates C8 to open gate from MBR to MAR
- t<sub>4</sub>: MBR <- memory
  - CU activates C0, CR and C5 as above

### Example of Control Signal Sequence – Execution: SUM

- $t_1$ : ALU <- AC + MBR
  - CU activates C6 to open gate from MBR to ALU
  - CU activates C7 to open gate from AC to ALU
  - CU activates for the ALU the *sum* control signal CS
- t<sub>2</sub>: AC <- ALU
  - CU activates C9 to open gate from ALU to AC
- t<sub>2</sub>: MBR <- AC; MAR <- IR<sub>address</sub>
  - CU activates C11 to open gate from AC to MBR
  - CU activates C16 to open gate from IR to MAR
- t<sub>2</sub>: memory <- MBR
  - CU activates C0 to open gate from MAR to address bus
  - CU activates C12 to open gate from MBR to data bus
  - CU activates the *memory write* control signal CW
- NOTE: Now ALU's output needs to be "buffered": this means that its output lines are not directly coming from the internal combinational circuits but from a registry (*buffer*) that receives combinational circuits outputs and store them for a subsequent reading

### Limitations

- The simplified internal schema of a CPU does not show registers, hence we cannot show
  - Register addressing
  - Register indirect addressing
  - Base addressing
  - Indexed addressing
  - Combination of displacement and indirect addressing
  - Try adding to the simplified schema one or more of the above addressing modalities and derive the required micro-operations!

### **Internal Organization of CPU**

- Usually a single internal bus
  - less complex then having direct data paths between registers and ALU
- Control gates control movement of data onto and off the internal bus
- Control signals control also data transfer to and from external systems bus
- Temporary registers (i.e., buffers) in input to ALU are now needed for proper operation of ALU
- After Appendix B try yourself deriving the control signal sequences for a single internal bus CPU!

## How to implement the instruction cycle in hardware?



### Hardwired Implementation (1)

- Consider the control unit as a combinational circuit
  - Outputs of the circuit are the control signals
  - Inputs of the circuit are:
    - ICC register bits (note that ICC is not visualized in the CPU's simplified internal schema)
    - Status flags, including interrupt-enable/disable (T)
    - Direct/indirect address bit (D)
    - Decoded opcodes (OC<sub>n</sub>)
    - Clocks (t<sub>n</sub>)
  - For each configuration of inputs produce a proper output
  - That is, the activation of a given control signal Cn has to happen when (condition A is true) OR (condition B is true) OR ...

#### Data flow diagram of the Control Unit for the hardwired implement. of the CPU's simplified schema



### **Hardwired Implementation (2)**

- Control signals P and Q code ICC: then fetch (instruction + direct addr.) is coded by P'Q', indirect by P'Q, execute by PQ', and interrupt by PQ
- Direct addressing is coded by D (D' = indirect)
- Decoded opcodes provide are further control signals
- Interrupt enable is coded by T (T' = interrupt disabled)
- Each clock t<sub>n</sub> is a control signal
- Boolean expression activating C5 in the simplified schema of a CPU (see slides 26, 27, 28 and 13):

 $C5 = P'Q'(t_2 + Dt_5) + P'Q(t_2 + t_4) + PQ'B + PQ t_4$ 

- where B is the boolean expression representing, for all opcodes activating C5, all micro-operations actually activating it
- example: if C5 is activated only by opcode 3 during  $t_2$  and  $t_4$  and by opcode 7 during  $t_3$  and  $t_4$  then *B* is: OC3( $t_2+t_4$ )+OC7( $t_3+t_4$ )

### **Hardwired Implementation (3)**

#### • Updating ICC:

- Assume no micro-procedure requires more than 5 time units
- Update P and Q using clock t<sub>6</sub>, current values of P and Q, interrupt enabled flag (T), direct address signal (D)
- At the end of direct addressing branch of fetch phase (00)

• 
$$P = P'Q' t_6 D$$
  $Q' = P'Q' t_6 D$ 

At the end of indirect addressing branch of fetch phase (00)

• 
$$P' = P'Q' t_6 D'$$
  $Q = P'Q' t_6 D'$ 

- At the end of indirect addressing phase (01)
  - $P = P'Q t_6$   $Q' = P'Q t_6$

#### • At the end of interrupt enabled branch of execution phase (10)

$$P = PQ' t_6 T \qquad \qquad Q = PQ' t_6 T$$

• At the end of interrupt disabled branch of execution phase (10)

• 
$$P' = PQ' t_6 T'$$
  
Q' = PQ' t<sub>6</sub> T'

- At the end of interrupt phase (11)
  - $P' = PQ t_6$   $Q' = PQ t_6$

### **Hardwired Implementation (4)**

- Do we forget anything?
  - Discarding t<sub>6</sub> (present in all terms) we have

$$P = P'Q' D + P'Q + PQ' T$$

$$Q = P'Q' D' + PQ' T$$
 and

and P' = P'Q'D' + PQ'T' + PQQ' = P'Q'D + P'Q + PQ'T' + PQ

| Ρ    |    | P'Q' | P′Q | PQ | PQ' |
|------|----|------|-----|----|-----|
|      |    | 00   | 01  | 11 | 10  |
| T′D′ | 00 | 0    | 1   | 0  | 0   |
| T′D  | 01 | 1    | 1   | 0  | 0   |
| TD   | 11 | 1    | 1   | 0  | 1   |
| TD'  | 10 | 0    | 1   | 0  | 1   |

| Q    |    | P'Q' | P′Q | PQ | PQ' |
|------|----|------|-----|----|-----|
|      |    | 00   | 01  | 11 | 10  |
| T′D′ | 00 | 1    | 0   | 0  | 0   |
| T′D  | 01 | 0    | 0   | 0  | 0   |
| TD   | 11 | 0    | 0   | 0  | 1   |
| TD'  | 10 | 1    | 0   | 0  | 1   |

### **Problems with the Hardwired Implementation**

- Complex sequencing & micro-operation logic
- Difficult to design and test
- Inflexible design
- Difficult to add new instruction