**6.004 Worksheet Questions**

**L12 – Introduction to Pipelining**

**Latency**: the delay from when an input is established until the output associated with the input becomes valid.
- Combinational circuits: \( L = t_{PD} \)
- K-pipeline: \( L = K * t_{CLK} \)

**Throughput**: the rate at which inputs or outputs are processed.
- Combinational circuits: \( T = 1/L \)
- K-pipeline: \( T = 1/t_{CLK} \)

**Pipelining methodology**:
- Form 1-pipeline by adding registers to all outputs
- To add a pipeline stage, draw contour across all paths from inputs to outputs such that it doesn’t cross other contours and all input-output paths cross the contour in the same direction. This ensures the pipeline is well-formed (same # of registers on all input-output paths). A K-pipeline has K registers on all input-output paths.
- Contours must take into account pipelined components.

Unpipelined:
\( L = 45\text{ns}, \ T = 1/L = 1/(45\text{ns}) \)

2-stage pipeline \( [t_{CLK}=25\text{ns}] \):
\( L = 2*25 = 50\text{ns}, \ T = 1/(25\text{ns}) \)

<table>
<thead>
<tr>
<th>Clock cycle</th>
<th>i</th>
<th>i+1</th>
<th>i+2</th>
<th>i+3</th>
</tr>
</thead>
<tbody>
<tr>
<td>F &amp; G</td>
<td>[F(X)]</td>
<td>[F(X_{i+1})]</td>
<td>[F(X_{i+2})]</td>
<td>\ldots</td>
</tr>
<tr>
<td></td>
<td>[G(X)]</td>
<td>[G(X_{i+1})]</td>
<td>[G(X_{i+2})]</td>
<td></td>
</tr>
<tr>
<td>H</td>
<td>[H(X)]</td>
<td>[H(X_{i+1})]</td>
<td>[H(X_{i+2})]</td>
<td></td>
</tr>
</tbody>
</table>

**INPUTS**

\[ A \rightarrow B \rightarrow C \rightarrow D \rightarrow E \rightarrow F \]

\[ A = 4\text{ns}, \ B = 3\text{ns}, \ C = 8\text{ns}, \ D = 4\text{ns}, \ E = 2\text{ns}, \ F = 5\text{ns} \]

\[ T = 1/8\text{ns}, \ L = 24\text{ns} \]
Pipelining trades **improved clock period and throughput** for **worse latency**. Consider the example shown below:

* The non pipelined circuit does not have any registers, so it is a combinational circuit. **Combinational circuits do not have a clock period.** Note that the latency of the combinational circuit is the sum of the latencies of each of its components.
Problem 1

A simple combinational circuit is to be pipelined for maximum throughput using a minimal number of registers. For each of the questions below, please create a valid K-stage pipeline. Show your pipelining contours and place large black circles (●) on the signal arrows to indicate the placement of ideal pipeline registers ($t_{PD} = 0, t_{SETUP} = 0$). Give the latency and throughput for each design. Remember that our convention is to place a pipeline register on each output.

(A) Show the maximum-throughput 1-stage pipeline.

\[
\begin{align*}
\text{Latency (ns)} & : \quad 12 \\
\text{Throughput (ns}^{-1}) & : \quad 1/12 \\
\end{align*}
\]

As is our convention, there is a pipeline register on the output. The entire circuit is one stage so we don’t add any other registers. The longest path is indicated with a dashed line. It has latency 12ns, and the throughput is its reciprocal ($1/12$).

(B) Show the maximum-throughput 2-stage pipeline using a minimal number of registers.

\[
\begin{align*}
\text{Latency (ns)} & : \quad 2\times6 = 12 \\
\text{Throughput (ns}^{-1}) & : \quad 1/6 \\
\end{align*}
\]

Both stages have latency 6ns, which is the best possible, since the total latency is still at least 12ns. So the clock period is 6ns and the throughput is 1/6ns.

(C) Show the maximum-throughput 3-stage pipeline using a minimal number of registers.

\[
\begin{align*}
\text{Latency (ns)} & : \quad 3\times6 = 18 \\
\text{Throughput (ns}^{-1}) & : \quad 1/6 \\
\end{align*}
\]

There is no way to improve the throughput over the 2-stage pipeline. You can see this by considering the critical path: if you wanted a throughput of more than 1/6, each stage latency would have to be <6ns. But you can only fit the 1ns and 2ns components in the first stage, the 3ns component in the second stage, and then the 4ns component in the third stage, so it’s impossible. So our 3-stage pipeline still has throughput 1/6ns. To use a minimal number of registers, we just add one register to our 2-stage pipeline; other answers are possible.
(D) Show the maximum-throughput 4-stage pipeline using a minimal number of registers.

Latency (ns): \(4 \times 4 = 16\)

Throughput (ns\(^{-1}\)): \(1/4\)

The four stages have latency 3ns, 3ns, 4ns, and 4ns.

The latency is the number of stages times the clock period, which equals the maximum latency of one stage, or 4ns here.

The throughput is 1 over the clock period, or 1/4.
Problem 2

The following 1-stage pipelined circuit computes $Z$ from the four inputs $A$, $B$, $C$, and $D$. Each component is annotated with its propagation delay in ns.

(A) Please pipeline the circuit above for maximum throughput with the minimum possible latency using ideal pipeline registers ($t_{PD} = 0$, $t_{SETUP} = 0$). Show the location of pipeline registers in the diagram above using filled-in circles, like the one shown on the $Z$ output. Please give the latency and throughput of the resulting pipelined circuit.

Latency (ns) $5 \times 3 = 15$

Throughput (1/ns) $\frac{1}{3}$

Since there’s a component with latency 3, and each component must be wholly in one pipeline stage, there will always be a pipeline stage with latency at least 3ns. So the minimum clock period is 3ns and the maximum throughput is $\frac{1}{3}$ns. This is achieved as above. Every pair of connected components with one output connected to another’s input must be separated into different pipeline stages if the components’ total latency is more than 3, which is almost every pair, only excluding the 2+1 and 1+2 at the start and end. We end up with 5 contours and 5 stages.
(B) Now suppose the “3” component is replaced by a 2-stage pipelined component with a minimum $t_{CLK}$ of 1.5ns. Again, please pipeline the circuit below for maximum throughput with the minimum possible latency using ideal pipeline registers. Show the location of pipeline registers in the diagram below using filled-in circles, like the one shown on the Z output. Please give the latency and throughput of the resulting pipelined circuit.

Latency (ns) __8*2 = 16_____

Throughput (1/ns) ___1/2_____

Now the maximum throughput is 1/2ns, and every pair of connected components with one output connected to another’s input must be separated into different pipeline stages. Note that we don’t add registers to the new 2-stage pipelined component. We assume it comes with its own internal registers, 2 along every input-output path, for the purpose of ensuring every path goes through the same number of registers. We end up with 8 contours and 8 stages.
Problem 3

An unidentified government agency has a design for a combinational device depicted below:

Although you don’t know the function of each of the component modules, they are each combinational and marked with their respective propagation delays. You have been hired to analyze and improve the performance of this device.

(A) What are the throughput and latency for the unpipelined combinational device?

Latency: __180____ ns; Throughput: ___1/180_____ ns⁻¹

The longest path is A → (B or D) → E → G → H → J, for 20+20+20+60+30+30ns

(B) Show how to pipeline the above circuit for maximum throughput, by marking locations in the diagram where registers are to be inserted. Use a minimum number of registers, but be sure to include one on the output. Assume that the registers have 0 t_{PD} and t_{SETUP}.

(mark register locations in diagram above)

Since G has the maximum latency of any component, 60ns, the maximum throughput is 1/60ns. It makes sense to draw contours right before and after G, and then in the right places so every stage has latency at most 60ns, and it turns out it works with three stages.

(C) What are the latency and throughput of your pipelined circuit?

Latency: ___180____ns; Throughput: ___1/60_____ ns⁻¹

Each stage has latency 60ns, so the clock period is 60ns. The pipelined circuit’s latency is the number of stages times that, or 180 ns. The throughput is 1 over the clock period.
Problem 4 ✴

The following circuit uses six full adder modules (as you’ve seen in lecture and lab) arranged in a combinational circuit that computes a 3-bit value $F = A + B + 5$ for 3-bit inputs A and B:

![Circuit Diagram]

The full adders have a $t_{PD}$ of 6ns.

(A) Give the latency and throughput of the combinational circuit.

Latency: __24___ ns; Throughput: ___1/24_____

The longest paths go from $A_0$ or $B_0$ through 4 full adders to $F_2$, so the latency is $4 \times$ the full adders’ propagation delay.

(B) Indicate, on the above diagram, appropriate locations to place ideal (zero-delay) registers to pipeline the circuit for maximum throughput using a minimum number of registers. Be sure to include a register on each output.

(mark circuit above)

As shown, we ensure no two connected full adders are in the same stage, and also that every input-output path goes through every contour.

(C) Give the latency and throughput of your pipelined circuit.

Latency: __24___ ns; Throughput: ___1/6_____

Clock cycle is 6ns

6.004 Worksheet Questions - 8 of 11 - L12– Introduction to Pipelining
Problem 5 ★

(A) You are provided with the circuit shown below. Each box represents some combinational logic. The number in each box is the \( t_{PD} \) of that combinational logic. The circuit has two inputs, \( X \) and \( Y \), and one output \( \text{Out} \). Pay close attention to the direction of the arrows, especially the arrows shown in **bold**. What is the latency and throughput of this combinational circuit?

[Diagram of the circuit]

**Latency (ns):** \( 22 \) 

**Throughput (1/ns):** \( \frac{1}{22} \)

Longest path is \( 7 \rightarrow 1 \rightarrow 3 \rightarrow 3 \rightarrow 2 \rightarrow 6 \)

(B) Draw contours through the circuit above to produce a valid pipelined circuit whose \( t_{CLK} = 9 \text{ns with minimum latency} \). Extra copies of the diagram are included below. Please use a large dot to indicate the location of each pipeline register. Assume that you have ideal pipeline registers (\( t_{PD}=t_{CD}=t_{Setup}=t_{Hold}=0 \text{ ns} \)). Pay close attention to the direction of each arrow to ensure that you produce a valid pipeline. What is the latency and throughput of this pipelined circuit?

[Diagram of the pipelined circuit]

**Latency (ns):** \( 27 \) 

**Throughput (1/ns):** \( \frac{1}{9} \)

3 pipeline stages
(C) You are now asked to consider the performance of this circuit using different clock periods while achieving the minimum latency. For each suggested $t_{\text{CLK}}$, specify whether or not you can create a valid pipelined circuit using that clock period. If you can, then provide the latency and throughput of the resulting circuit and specify the number of registers at each input. If it results in an invalid pipeline, enter NA for the rest of the row.

Extra copies of the circuit diagram are provided below.

<table>
<thead>
<tr>
<th>$t_{\text{CLK}}$</th>
<th>Valid/Invalid</th>
<th>Latency (ns)</th>
<th>Throughput (1/ns)</th>
<th>Pipeline registers at input X</th>
<th>Pipeline registers at input Y</th>
</tr>
</thead>
<tbody>
<tr>
<td>6 ns</td>
<td>Invalid</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
</tr>
<tr>
<td>7 ns</td>
<td>Valid</td>
<td>28</td>
<td>1/7</td>
<td>0</td>
<td>1 or 2</td>
</tr>
</tbody>
</table>

With one component of latency 7ns, a clock period of 6ns is not possible. There are several possible solutions for a clock period of 7ns.

Problem 6

A complex combinational circuit is constructed entirely from 2-input NAND gates having a propagation delay of 1 ns. If this circuit is pipelined for maximal throughput by adding (non-ideal) registers whose setup time and propagation delay are each 1 ns, what is the throughput of the resulting pipeline? Enter a number, a formula, or “CAN’T TELL”.

Throughput (ns\(^{-1}\)): \(\frac{1}{3}\)
Problem 7

The following combinational circuit computes \( F(X,Y) \) and \( G(X,Y) \) from inputs \( X \) and \( Y \). The \( t_{PD} \) (in ns) of each individual component is shown inside its box.

(A) Using ideal zero-delay registers, mark the location of the minimal number of registers necessary to achieve maximum throughput. Give the latency and throughput of your pipelined circuit.

mark diagram above

Latency: 21 \( \text{(ns)} \); Throughput: 1/7 \( \text{(1/ns)} \)

The clock period again is at least 7ns, the maximum latency of one component. 4 stages works.

Rummaging through the stockroom you find a pipelined component with two pipeline stages that can replace the “7” module. The minimum \( t_{CLK} \) for the new component is 4ns. The updated circuit is shown below.

(B) Using ideal zero-delay registers, mark the location of the minimal number of registers necessary to achieve maximum throughput. Give the latency and throughput of your pipelined circuit.

mark diagram above

Latency: 25 \( \text{(ns)} \); Throughput: 1/5 \( \text{(1/ns)} \)

Now the maximum clock period is 5ns and 5 stages works.