Please enter your name, Athena login name, and recitation section above. Enter your answers in the spaces provided below. Show your work for partial credit. You can use the extra white space and the backs of the pages for scratch work.

Problem 1. Sequential Design Problem (12 points)

You are asked to design module F which uses modules G and H to calculate results. As requests come in to F, they are sent to either G or H (as described below) and the results are collected and returned when they are ready. Modules G and H can handle multiple inputs simultaneously, but they are in-order processors so the results will always come out in the order that the requests were made. Module F should also obey this property. Module F should encapsulate G, H and a FIFO queue q as shown below.

![Diagram of Sequential Design](image)

Modules F, G, and H have the same interface, shown below, though internally they are completely different.

```markdown
interface Generic;
    method Action start(Bit#(32) x);
    method ActionValue#(Bit#(32)) getResult;
endinterface
```

The FIFO interface is:

```markdown
interface Fifo#(numeric type size, type t);
    method Action enq(t x);
    method Action deq;
    method t first;
endinterface
```

Please check your answers with the provided solutions.
The `start` method of F calls the `start` method of Module G if `x` is even and the `start` method of Module H if `x` is odd. It also enqueues `x` into an internal queue `q`. Suppose the first element of `q` is `y`. If `y` is even, the `getResult` method of F calls the `getResult` method of G, otherwise it calls the `getResult` method of H.

(A) (8 points) Please complete the code for mkF, a module that implements module F.

```verilog
module mkF (Generic);
    Generic g <- mkG;
    Generic h <- mkH;
    FIFO q <- mkFIFO(...);

    method Action start (Bit#(32) x);
    endmethod

    method ActionValue#(Bit#(32)) getResult;
    endmethod
endmodule
```

(B) Assume that modules G and H take different amounts of time to process their input. Consider how much the performance is affected by the size of FIFO `q` to answer the following.

1. (2 points) True or false: `q` of size 2 is sufficient to get maximum performance

Circle one: True ... False

2. (2 points) Provide a reason for your answer
Problem 2. Guarded Interfaces (16 points)

(A) (8 points) Consider the 1-input finite state machine (FSM) with the state transition diagram shown below. The module definition for this FSM is incomplete. Complete the rule definitions for S0-S3 and add any needed guards. If no guards are needed then leave the if (___) blank.

```plaintext
typedef enum { S0, S1, S2, S3 } StateName deriving (Bits, Eq);

interface exampleFSM;
    method Action next(Bit#(1) input);
    method StateName getState();
endinterface

module mkFSM(exampleFSM);
    Reg#(StateName) state <- mkReg(S0);
    Reg#(Bool) update <- mkReg(False);
    Reg#(Bit#(1)) in <- mkReg(0);

    method Action next(Bit#(1) input) if (!update);
        update <= True;
        in <= input;
    endmethod

    method Bit#(StateName) getState();
        return state;
    endmethod

rule State0 if (______________________________________);
    update <= False;
    state <= (____==____)?________: ________;
endrule

rule State1 if (______________________________________);
    update <= False;
    state <= (____==____)?________: ________;
endrule
```
rule State2 if (___________________________________________);
    update <= False;
    state <= (_____==_____)?___________: __________;
endrule

rule State3 if (___________________________________________);
    update <= False;
    state <= (_____==_____)?___________: __________;
endrule
endmodule

(B) (8 Points) In Lab 2, we often interacted with memory as if it was a stack. We’ve done so from a software perspective, using a dedicated register as our stack pointer. We interacted with the stack through memory requests, i.e., with loads and stores.

Suppose that we instead wish to have a small dedicated hardware stack, i.e., a Last-In-First-Out (LIFO) structure made with a vector of registers. Complete the definition of the stack module below by implementing the relevant interfaces. The stack is defined to hold 8 elements, each with a bit width of 32-bits. Include any needed guards. If a guard is not needed then leave the if (____) blank.

```
interface Stack;
    method Action push(Bit#(32) data);
    method ActionValue#(Bit#(32)) pop;
endinterface

module mkStack(Stack);
    Vector#(8, Reg#(Bit#(32))) buffer <- replicateM(mkRegU);  // Vector of n Uninitialized registers
    Reg#(Bit#(3)) ptr <- mkReg(0);
    Reg#(Bit#(4)) count <- mkReg(0);

    method Action push(Bit#(32) data) if (______________________________);  // Your implementation here
endmethod
```
Problem 3. Sequential Logic in BSV (24 points)

The code on the following page implements a simple sequential circuit as a module that computes a function over a series of clock cycles. Read the code and answer the questions about it below. Note that the code contains blanks that you will fill in as part (A) of this problem. Each blank will contain one of method (and its type), rule, or function.

(A) (2 points) For each section of the code, specify if it is a function, method, or rule specification. For the methods, also specify the type of the method (Action, ActionValue, or Value).

<table>
<thead>
<tr>
<th>Function, method, or rule</th>
<th>Method type (Action, ActionValue, or Value) or NA if not a method</th>
</tr>
</thead>
<tbody>
<tr>
<td>computeShift</td>
<td></td>
</tr>
<tr>
<td>doComputeStep</td>
<td></td>
</tr>
<tr>
<td>start</td>
<td></td>
</tr>
<tr>
<td>getResult</td>
<td></td>
</tr>
</tbody>
</table>
interface Foo#(numeric type n);
    __________ start(Bit#(n) aIn, Bit#(TLog#(n)) bIn);
    __________ #(Bit#(1)) getResult();
endinterface

module mkFoo(Foo#(n));
    Reg#(Bit#(n)) a <- mkReg(0);
    Reg#(Bit#(TLog#(n))) b <- mkReg(0);
    Reg#(Bit#(n)) i <- mkReg(0);
    Reg#(Bool) busy <- mkReg(False);

    __________ Bit#(n) computeShift(Bit#(n) in);
        Bit#(n) out = 1;
        for (Bit#(n) i =0; i < in; i = i + 1) begin
            out = 2*out;
        end
        return out;
end________

    __________ doComputeStep if (busy && i != fromInteger(valueOf(TLog#(n))));
        let shift = computeShift(i);
        a <= (b[i] == 1)? a >> shift : a;
        i <= i + 1;
end________

    __________ start(Bit#(n) aIn, Bit#(TLog#(n)) bIn) if (!busy);
        a <= aIn;
        b <= bIn;
        i <= 0;
        busy <= True;
end________

    __________ #(Bit#(1)) getResult() if (busy && i ==
        fromInteger(valueOf(TLog#(n))));
        busy <= False;
        return a[0];
end________
endmodule
(B) (6 points) The module using `mkFoo` is invoking all the methods of `mkFoo` at every clock cycle. Assume the following lines of code were run in the module before the zeroth clock cycle.

```plaintext
...  
  Foo#(8) dut <- mkFoo();
  Bit#(8) a = 8'b01101001;
  Bit#(3) b = 3'b011;

  rule runFoo;
    dut.start(a, b);
  ...
```

In the table below, fill in the register values of the `mkFoo` module at the beginning of the following clock cycles. Assume t0 corresponds to the zeroth clock cycle, and so on.

<table>
<thead>
<tr>
<th>time value</th>
<th>t0</th>
<th>t1</th>
<th>t2</th>
<th>t3</th>
<th>t4</th>
<th>t5</th>
</tr>
</thead>
<tbody>
<tr>
<td>a</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>i</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>busy</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

(C) (6 points) Assume `"$display("Result is %d", dut.getResult)"` is run in rule `getResult` of the module that is using `mkFoo`. This statement prints out the value of Result in decimal. Please answer the following questions. Remember that an invoked method can only execute when it is ready.

1. Please fill out the blank in the message shown in the console once the module is run.

   Result is ____________

2. Please write down an expression for what `computeShift` returns (in terms of `i`) and for the return value of the `getResult` method of `mkFoo` (in terms of `a` and `b`).

   `computeShift` returns: ____________________ (in terms of `i`)

   `getResult` returns: ____________________ (in terms of `a` and `b`)
(D) (4 points) Check if the module is still functionally correct with the following changes (keeping everything else the same). You DO NOT need to fill in the blanks.

1. ______ doComputeStep if (busy && i != fromInteger(valueOf(TLog#(n))));
   
   let shift = computeShift(i);
   
   i <= i + 1;
   
   a <= (b[i] == 1)? a >> shift : a;

   end

   Circle one : True … False

2. ______ doComputeStep if (i != fromInteger(valueOf(TLog#(n))));
   
   let shift = computeShift(i);
   
   a <= (b[i] == 1)? a >> shift : a;
   
   i <= i + 1;

   end

   Circle one : True … False

3. ______ doComputeStep if (busy);
   
   let shift = computeShift(i);
   
   a <= (b[i] == 1)? a >> shift : a;
   
   i <= i + 1;

   end

   Circle one : True … False

4. ______ doComputeStep if (i != fromInteger(valueOf(TLog#(n))));
   
   ...
   
   ______ start(Bit#(n) aIn, Bit#(TLog#(n)) bIn) if (a == 0);
   
   ...
   
   ______ #(Bit#(1)) getResult() if (i == fromInteger(valueOf(TLog#(n))));

   Circle one : True … False
(E) (6 points) A partial circuit diagram for the implementation of mkFoo is given below.

1. (2 points) Complete the start.rdy and getResult.rdy signals by drawing wires from the tails of the green arrows to the proper register, function, or gate output. Use **TWO** basic logic gates.

2. (4 points) Write the Boolean expressions for each write enable signal of registers a, b, i, and busy. You may use **ONLY** wire names in the figure (e.g., w1, w2, ... , w6, start.en, getResult.en) and at most **a total of eight** AND(∙), OR(+), or NOT(¬) gates across all expressions.

\[
\begin{align*}
\text{a.en} & : \text{____________________________} \\
\text{b.en} & : \text{____________________________} \\
\text{i.en} & : \text{____________________________} \\
\text{busy.en} & : \text{____________________________}
\end{align*}
\]
Problem 4. Multi-Cycle Processors (13 points)

The multicycle RISC-V processor built in Lab 6, which we call Processor A, uses:
1) a shared instruction/data memory with split request (req) and response (resp) methods,
2) a register file with two read ports (rd1, rd2 methods) and a write port (wr method).

Processor A uses three states (Fetch, Execute, LoadWait) to process instructions. It starts at Fetch and follows the state-transition diagram given below:

![State-Transition Diagram of Processor A](image)

Ben Bitdiddle has developed a new design for the register file that’s smaller and can be clocked faster but has the odd property that writes take two cycles. On the first cycle, you give it the data you want to write and on the next cycle, you tell it which register to write to. Instead of a single wr method, it now has separate wrData and wrIndex methods which must be invoked in that order. Assume that the register file takes care of storing the data in some register until the index is transmitted. To handle the two cycle write, Ben adds an extra state to create Processor B with the following state diagram:

![State-Transition Diagram of Processor B](image)

In Processor B, the wrData method is invoked in the same states that write to the register file in Processor A. Then, the RFwrite state completes the process by invoking the wrIndex method.

(A) (6 points) Assume that the memory responds three cycles after a request is made (e.g., if a load request is made on cycle 1, the memory replies in cycle 4). How many cycles would each instruction below take in Processor A and Processor B?

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Processor A</th>
<th>Processor B</th>
</tr>
</thead>
<tbody>
<tr>
<td>xor a4, a3, a1</td>
<td>___________</td>
<td>___________</td>
</tr>
<tr>
<td>blt a2, a3, loop</td>
<td>___________</td>
<td>___________</td>
</tr>
<tr>
<td>lw a3, 0x8(a2)</td>
<td>___________</td>
<td>___________</td>
</tr>
</tbody>
</table>
After looking more closely, Ben sees that some types of instructions could be executed in fewer cycles because they don’t need to use the RFwrite state. These instructions have already completed their work in the Execute state. Ben creates Processor C by adding a state transition from Execute directly back to Fetch, shown in red in the diagram below. In the diagram, there are now three paths leaving the Execute stage, labelled X, Y and Z. For each instruction, Ben can determine which path to take just by looking at the instruction’s iType.

Figure 3: State-Transition Diagram of Processor C

(B) (3 points) Fill in the table below; indicating which path (X, Y or Z) should be used for each iType to properly execute it in the fewest number of cycles in Processor C.

<table>
<thead>
<tr>
<th>iTyp</th>
<th>OP</th>
<th>OPimm</th>
<th>Branch</th>
<th>LUI</th>
<th>JAL</th>
<th>JALR</th>
<th>LOAD</th>
<th>STORE</th>
</tr>
</thead>
<tbody>
<tr>
<td>Best Path</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

(C) (4 Points) Alyssa P. Hacker looks at Ben’s design and realizes that he doesn’t actually need a separate RFwrite state at all and he could have kept the original state transition diagram from Processor A. Using Alyssa’s plan, which state would invoke the wrIndex method? What two pieces of information would need to be saved in registers for this state to use (assume the registers can only hold simple Bit values not structs)?

State: ________________________________

Info #1: ________________________________

Info #2: ________________________________
Problem 5. Bluespec Concurrency and Conflicts (18 points)

Consider the following trio of snippets of Bluespec code, each containing different versions of a pair of rules. In each version there are three registers $x$, $y$, and $z$ that are being read from and written to.

(A) (6 points) For each version, give the value tuple $(x, y, z)$ that the registers will have following a single execution of both rules given starting values of $(3, 6, 1)$ with each rule ordering.

<table>
<thead>
<tr>
<th>Rule Ordering</th>
<th>Version 1</th>
<th>Version 2</th>
<th>Version 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>Concurrent</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>doA &lt; doB</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>doB &lt; doA</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

(B) (6 points) Fill in the Conflict Matrix for each version, with C representing conflicting, CF representing conflict free, and $<$ and $>$ representing that the corresponding rule of the row appears to execute before and after the corresponding rule of the column, respectively.

<table>
<thead>
<tr>
<th>Version 1</th>
<th>doA</th>
<th>doB</th>
</tr>
</thead>
<tbody>
<tr>
<td>doA</td>
<td></td>
<td></td>
</tr>
<tr>
<td>doB</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Version 2</th>
<th>doA</th>
<th>doB</th>
</tr>
</thead>
<tbody>
<tr>
<td>doA</td>
<td></td>
<td></td>
</tr>
<tr>
<td>doB</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
(C) (6 points) For each version, select the expression that when removed from rule \texttt{doA} causes \texttt{doA} and \texttt{doB} to become \textbf{CONFLICT FREE}. If removing neither of the options would do so, then select \textbf{Impossible}. If the rules are already conflict free, then select \textbf{Unnecessary}.

<table>
<thead>
<tr>
<th>Version 3</th>
<th>doA</th>
<th>doB</th>
</tr>
</thead>
<tbody>
<tr>
<td>doA</td>
<td></td>
<td></td>
</tr>
<tr>
<td>doB</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Version 1

\[ x \leq x + 1 \ \text{OR} \ \ z \leq z + x \ \text{OR} \ \ \text{Impossible} \ \text{OR} \ \ \text{Unnecessary} \]

Version 2

\[ x \leq y + 2 \ \text{OR} \ \ z \leq x - 1 \ \text{OR} \ \ \text{Impossible} \ \text{OR} \ \ \text{Unnecessary} \]

Version 3

\[ y \leq x + y \ \text{OR} \ \ z \leq z + x \ \text{OR} \ \ \text{Impossible} \ \text{OR} \ \ \text{Unnecessary} \]
Problem 6. Caches (17 points)

Consider a direct mapped cache shown above. It has 32 rows with a line size of 2 (2 words per line). Assume that data and addresses are 32 bits. Also assume that the cache has a valid bit and a dirty bit per row.

(A) (3 points) To ensure the best cache performance, which address bits should be used for the line offset (to select a word within the line)? Which bits should be used for the cache index? Which bits should be used for the tag field?

address bits used for line offset: A[_____:_____

address bits used for cache index: A[_____:_____

address bits used for tag field: A[_____:_____

(B) (2 points) Explain why your selection of bits in part (A) should result in the best cache performance.

(C) (4 points) Could the four 32-bit addresses 0x10, 0x18, 0x88, and 0x90 all be resident in the cache at the same time? **Explain why you chose your answer.**

All four addresses can be resident at the same time:   YES   NO
(D) (3 points) Assume that checking the cache on each read takes 2 cycles and that refilling the cache line on a miss takes an additional 12 cycles. If we wanted the average access time over many reads to be 2.4 cycles, what is the minimum hit ratio the cache must achieve during that period of time? You don’t need to simplify your answer.

minimum hit ratio for 2.4 cycle average access time:_______________

(E) (5 points) Now consider an 8 row direct mapped cache with a line size of 2 (2 words per line) as shown below. Suppose the cache is initially empty and the following sequence of addresses is repeatedly accessed: 0x0, 0x4, 0x10, 0x14, 0x18, 0x104, 0x210. Assuming the sequence has been accessed many times, what will be the cache hit ratio for accessing this sequence of addresses the next time around? You may use the cache below to help you.

<table>
<thead>
<tr>
<th>V</th>
<th>D</th>
<th>Tag</th>
<th>Word 0</th>
<th>Word 1</th>
</tr>
</thead>
<tbody>
<tr>
<td>row 0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 2</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 3</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 4</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 5</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 6</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>row 7</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Cache hit ratio:_______________

END OF QUIZ 2!