## PIPELINE DESIGN DATA PATH AND CONTROL SYNTHESIS

### © Giovanni De Micheli

Stanford University

Outline

- © GDM -
- Synthesis of pipelined circuits:
  - Scheduling.
  - Binding.
- Data-path synthesis.
- Control-unit synthesis.

# High-level synthesis of pipelined circuits

— © GDM —

- Pipeline circuits:
  - Concurrent execution of operations on different data sets.
  - Increase *throughput*:
    - \* I/O data rate.
  - Preserve latency.
- Applicable to:
  - General purpose processors.
  - Digital signal processors.



### Synthesis of pipelined circuits

\_\_\_\_\_ © GDM \_\_\_\_\_

- DSP applications:
  - Mainly data-path pipelining.
  - Few exceptions/interrupts.
  - Mature area.
- Microprocessors:
  - Advanced features:
    - \* Stalls, flush, bypass, hazard avoidance.
  - Synthesis tools not ready yet.

# Issues in synthesis of pipelined circuits © GDM Partitioning: Pipe-stage formation. Scheduling: Source vertex of the sequencing graph fired at constant rate. Sharing:

- More concurrency.
- Binding and scheduling are affected.

### Scheduling of pipelined circuits

\_\_\_\_\_ © GDM \_\_

- Scheduling of non-pipelined circuit using pipelined resources.
- Scheduling of pipelined circuit using non-pipelined resources.
  - Functional pipelining.
- Both problems can be modeled by ILP.

### Scheduling for functional pipelining

\_\_\_\_\_ © GDM \_\_\_\_\_

- Choose:
  - cycle-time.
  - data-introduction interval  $\delta_0$ .
- Determine (area, latency) spectrum.
- Key fact:
  - Simultaneous operations at steps:
    - \*  $l + p\delta_0$
  - Reduced sharing.

# Scheduling for functional pipelining ILP model

- © GDM -

 $\sum_{p=0}^{\lceil\overline{\lambda}/\delta_0\rceil-1}$  $\sum_{i:\mathcal{T}(v_i)=k} \sum_{m=l-d_i+1+p\delta_0}^{l+p\phi_0} x_{im} \leq a_k \;\; orall k, orall k$ 

- Used in conjunction with other constraints.
- Use regular ILP solvers.



• List scheduling:

- Compute resource usage at each step.
- Determine candidates.
- Force-directed scheduling.
  - Operation-type distribution:
    - \* Account for overlapping.



• Distribution graphs for multiplier and ALU.



- Pipeline operations inside a loop.
  - Overlap execution of operations.
  - Need a prologue and epilogue.
- Use pipeline scheduling for loop graph model.





CONTROL-UNIT

DATA-PATH

- Distributed FSM.



### **FSM**-based implementation

\_\_\_\_ © GDM \_\_

- Simple model:
  - next-state function: unconditional.
  - *output function*: activate operations.
- Extended model:
  - Branching and iteration:
    - \* Conditional next-state function.
  - Hierarchy:
    - \* Hierarchical FSM connection.



- © GDM -





### Microcode implementation

\_\_\_\_\_© GDM -

- Horizontal microcode:
  - One bit per *activation* signal.
  - One microcode word per schedule level.
  - Maximum performance.
  - Wide words.
- Vertical microcode:
  - Encode each resource *activation* signal.
  - Shorter words.
  - One (or more) words per schedule level.







### Microcode compaction problem



- Partition ROM word into fields.
- Encode signals in each field.
- Allow for a code for NOP.
- Activation signals in each field must not be concurrent.
- Problems:
  - Minimize number of fields.
  - Minimize total ROM width.

### Example of vertical microcode

Microwords

 $\begin{array}{c} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{array}$ 

0101

Decoder Activation signals



### Microcode optimization

\_\_\_\_\_ © GDM \_\_\_\_

- Conflict graph:
  - Concurrent operations.
  - Optimum vertex coloring yields minimum number of fields.
- Compatibility graph:
  - Non-concurrent operations.
  - Optimum clique partitioning yields minimum number of fields.
  - Minimum weighted clique partitioning yields minimum number of bits.

### Example

- © GDM -

| field       | ор | code |
|-------------|----|------|
| А           | 1  | 01   |
| А           | 3  | 10   |
| А           | 4  | 11   |
| В           | 2  | 1    |
| с<br>с<br>с | 6  | 01   |
| С           | 7  | 10   |
| С           | 5  | 11   |
| D           | 8  | 01   |
| D           | 9  | 10   |
| E           | 10 | 01   |
| E           | 11 | 10   |





- Handshake:
  - activate signals.
  - condition signals.
  - reset signals.

### Example

– © GDM –





### Control synthesis for unbounded-latency sequencing graphs

— © GDM —

- Data-dependent delay operations.
  - activate signals.
  - completion signals.
- Synchronization problem:
  - Wait on completion signals.
  - Wait on external synchronization.
- Several strategies.
  - Clustering.
  - Adaptive Control.
  - Relative Scheduling.

### Summary Control synthesis

— © GDM —

- Different approaches.
- Implementations:
  - FSM, connection of FSMs or ROM.
- Techniques:
  - Bounded delays only:
    - \* FSM microcode.
  - Unbounded delays:
    - \* Different methods to provide synchronizatio