Merged
49 commits
38a8016
prototype affine controller design
ShangkunLi Feb 20, 2026
e8beb87
add three-level nested loop test
ShangkunLi Feb 24, 2026
30b997d
simplify dispatch logic
ShangkunLi Feb 24, 2026
03cbf99
add cross affine controller test
ShangkunLi Feb 24, 2026
33b732d
fix cmd id conflict
ShangkunLi Feb 24, 2026
d58bd57
modify workflow time
ShangkunLi Feb 25, 2026
4a7125c
update cmd bits 5->6
ShangkunLi Feb 25, 2026
ae5e5e2
Change the format
ShangkunLi Feb 26, 2026
979e359
feat: add direct path from register to routing crossbar, bypassing FU
guosran Mar 13, 2026
3f77bd9
init, halfway
Jackcuii Oct 14, 2025
cc139eb
at least seems ok, not tested
Jackcuii Oct 22, 2025
92e372b
fix all the bugs for the backbone
Jackcuii Nov 3, 2025
0f87a44
prologue
Jackcuii Nov 4, 2025
6c49b82
dummy version okay
Jackcuii Nov 9, 2025
9718a31
Pass the test with the real type.
Jackcuii Nov 9, 2025
8fe3e02
fix bugs and add yaml
Jackcuii Feb 18, 2026
d8c928c
tolerate const format
Jackcuii Feb 18, 2026
6c8d44a
delete un-used files
Jackcuii Feb 24, 2026
d87ad58
fix CI test
Jackcuii Feb 24, 2026
aa7ee79
delete unused files
Jackcuii Feb 25, 2026
efd4823
update preface and delete comments
Jackcuii Feb 25, 2026
b7e821c
clarify usage
Jackcuii Feb 25, 2026
255f8eb
edit comment
Jackcuii Feb 25, 2026
86ef1f6
[SV tests] Dynamically handshake rdy & val using an unbounded queue.
rp15 Mar 4, 2026
8cd6c9f
[SV tests] Comment describing the new data struct.
rp15 Mar 4, 2026
fc9e19d
Read reg has higher priority than routing.
yyan7223 Mar 12, 2026
9b814b0
Fixes bugs and comments.
yyan7223 Mar 12, 2026
b9f33fa
fix: update TileInType in test_readReg_routing_priority test
guosran Mar 13, 2026
7d8016c
Rename read_reg_from to read_reg_towards with 2-bit semantics
guosran Mar 13, 2026
8741e6c
Fix remaining b1 to b2 type conversions in read_reg_towards arrays
guosran Mar 13, 2026
2abfb55
refactor: move READ_TOWARDS_* constants to data_struct_attr, rename s…
guosran Mar 14, 2026
b440ed5
Move READ_TOWARDS enums to common
guosran Mar 14, 2026
9a6d2da
Updated TileRTL_test.py
guosran Mar 14, 2026
ea25657
Add tests
guosran Mar 14, 2026
9d2ffc6
Edited tests for syntax updates
guosran Mar 14, 2026
7c744fd
Corrected SV test files
guosran Mar 14, 2026
b4b2f83
Edited SV files and renamed constants
guosran Mar 14, 2026
d748294
Disabled SV test
guosran Mar 15, 2026
20a451d
Fix systolic SV tb: update packet hash and add explicit bit-widths to…
guosran Mar 15, 2026
57eb076
Re-enable SV tb CI step
guosran Mar 15, 2026
26ca13a
rename AC -> LC
ShangkunLi Mar 19, 2026
958a23a
Merge branch 'master' into affine-controller
ShangkunLi Mar 19, 2026
95c3d99
change bitwidth for sv_test
ShangkunLi Mar 19, 2026
5399515
update sv_test hash value
ShangkunLi Mar 19, 2026
db93ac4
add figure & doc for loop controller
ShangkunLi Mar 19, 2026
5022f77
update comment
ShangkunLi Mar 19, 2026
702eb76
add id explicitly in test
ShangkunLi Mar 19, 2026
3c2a8ce
add comments for ctrl_add in DCU and LC
ShangkunLi Mar 20, 2026
279925c
extend the sythesis time
ShangkunLi Mar 20, 2026
4 changes: 2 additions & 2 deletions .github/workflows/python-package.yml
@@ -135,8 +135,8 @@ jobs:
 end_time=$(date +%s)
 duration=$((end_time - start_time))
 echo "Synthesis step duration: $duration seconds"
-if [ $duration -gt 900 ]; then
-echo "ERROR: Synthesis step took longer than 15 minutes!"
+if [ $duration -gt 1800 ]; then
+echo "ERROR: Synthesis step took longer than 30 minutes!"
 exit 1
 fi

1 change: 0 additions & 1 deletion cgra/test/CgraRTL_fir_test.py
@@ -1017,7 +1017,6 @@ def sim_fir_return(cmdline_opts, mem_access_is_combinational):
FuOutType(0), FuOutType(0), FuOutType(0), FuOutType(0),
FuOutType(0), FuOutType(0), FuOutType(0), FuOutType(0)]),
data = DataType(1, 1))),

# Launch the tile.
IntraCgraPktType(0, 0, payload = CgraPayloadType(CMD_LAUNCH))
],
405 changes: 405 additions & 0 deletions controller/LoopControllerRTL.py

Large diffs are not rendered by default.

394 changes: 394 additions & 0 deletions controller/test/LoopControllerRTL_test.py

Large diffs are not rendered by default.

131 changes: 131 additions & 0 deletions doc/figures/loop_controller/README.md
@@ -0,0 +1,131 @@
# Loop Controller (LC) Architecture

## Overview

The Loop Controller (LC) is a hardware module inside the CGRA Controller that manages **outer loop counters**. It works alongside the existing LoopCounter FU (DCU) on the tile array, which handles the innermost loop counting.

![LC in CGRA Overview](loop_controller_hierarchy.png)


## Where LC Sits in the Architecture

```
           CPU
            │  CMD_LC_CONFIG_* (via NoC → Controller)
┌──────────────────────────────────────────────────┐
│                       CGRA                       │
│                                                  │
│  ┌──────────────────┐    ┌────────────────────┐  │
│  │    Controller    │    │     Tile Array     │  │
│  │  ┌────────────┐  │    │  ┌──────┐ ┌──────┐ │  │
│  │  │  Crossbar  │  │    │  │Tile 0│ │Tile 1│ │  │
│  │  └────────────┘  │    │  │(DCU) │ │(DCU) │ │  │
│  │  ┌────────────┐  │    │  │Count │ │Deliv.│ │  │
│  │  │GlobalReduce│  │    │  └──────┘ └──────┘ │  │
│  │  └────────────┘  │    │  ┌──────┐ ┌──────┐ │  │
│  │  ┌────────────┐  │    │  │Tile 2│ │Tile 3│ │  │
│  │  │   Loop     │◄─┼────┼──│(FU)  │ │(FU)  │ │  │
│  │  │ Controller │──┼────┼─►│      │ │      │ │  │
│  │  │   (LC)     │  │    │  └──────┘ └──────┘ │  │
│  │  └────────────┘  │    └────────────────────┘  │
│  └──────────────────┘                            │
│         ▲  │                                     │
│         │  │  send_to_remote / recv_from_remote  │
└─────────┼──┼─────────────────────────────────────┘
          │  │
          │  ▼
    ┌─────────────┐
    │ Other CGRA  │
    │  (with LC)  │
    └─────────────┘
```

## Command Flow

### 1. Configuration (CPU → Controller → LC)

The CPU sends config commands via the NoC. The Controller routes them to the LC.

| Command | Description | Sender → Receiver |
|---------|-------------|-------------------|
| `CMD_LC_CONFIG_LOWER` | Set loop lower bound | CPU → Controller → LC |
| `CMD_LC_CONFIG_UPPER` | Set loop upper bound | CPU → Controller → LC |
| `CMD_LC_CONFIG_STEP` | Set loop step | CPU → Controller → LC |
| `CMD_LC_CONFIG_CHILD_COUNT` | Set required child completions | CPU → Controller → LC |
| `CMD_LC_CONFIG_TARGET` | Register a target DCU | CPU → Controller → LC |
| `CMD_LC_CONFIG_PARENT` | Set parent CCU relationship | CPU → Controller → LC |
| `CMD_LC_LAUNCH` | Start all configured CCUs | CPU → Controller → LC |
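
A minimal sketch of how a host might assemble the configuration sequence from the table above. The command IDs are the ones this PR adds to `lib/cmd_type.py`; the `LcCmd` record, its `ccu_id` field, the helper names, and the choice to configure every CCU before sending a single launch are assumptions made for illustration, not the packet format used by the RTL.

```python
from dataclasses import dataclass

# Command IDs as defined in lib/cmd_type.py in this PR.
CMD_LC_CONFIG_LOWER = 32
CMD_LC_CONFIG_UPPER = 33
CMD_LC_CONFIG_STEP = 34
CMD_LC_CONFIG_CHILD_COUNT = 35
CMD_LC_LAUNCH = 38


@dataclass(frozen=True)
class LcCmd:
    """Hypothetical flat command record (the real design uses packet types)."""
    cmd: int
    ccu_id: int
    data: int


def config_loop(ccu_id, lower, upper, step, child_count):
    """Returns the bound/step/child-count commands for one CCU."""
    return [
        LcCmd(CMD_LC_CONFIG_LOWER, ccu_id, lower),
        LcCmd(CMD_LC_CONFIG_UPPER, ccu_id, upper),
        LcCmd(CMD_LC_CONFIG_STEP, ccu_id, step),
        LcCmd(CMD_LC_CONFIG_CHILD_COUNT, ccu_id, child_count),
    ]


def config_nest(loops):
    """Configures every CCU first, then appends one launch command."""
    cmds = []
    for ccu_id, (lower, upper, step, child_count) in enumerate(loops):
        cmds.extend(config_loop(ccu_id, lower, upper, step, child_count))
    cmds.append(LcCmd(CMD_LC_LAUNCH, 0, 0))
    return cmds


# Two outer loops: i in [0, 4) and j in [0, 8), each waiting on one child.
sequence = config_nest([(0, 4, 1, 1), (0, 8, 1, 1)])
```

Target and parent registration (`CMD_LC_CONFIG_TARGET`, `CMD_LC_CONFIG_PARENT`) would slot into the same list before the launch command; they are left out here because their payload encoding is not shown in this diff.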

### 2. Dispatch (LC → Tile Array DCUs)

When a CCU (the unit inside the LC that tracks one outer loop) advances its loop variable, it dispatches commands to its target DCUs (one command per target per cycle):

| Command | Target Type | Description | Sender → Receiver |
|---------|------------|-------------|-------------------|
| `CMD_RESET_LEAF_COUNTER` | Leaf DCU (`OPT_LOOP_COUNT`) | Reset inner loop counter to start next iteration | LC → DCU (tile) |
| `CMD_UPDATE_COUNTER_SHADOW_VALUE` | Delivery DCU (`OPT_LOOP_DELIVERY`) | Update shadow register with outer loop variable | LC → DCU (tile) |

### 3. Completion (DCU → LC)

When a leaf DCU finishes its inner loop, it sends a completion signal back to the LC:

| Command | Description | Sender → Receiver |
|---------|-------------|-------------------|
| `CMD_LEAF_COUNTER_COMPLETE` | Inner loop finished | DCU (tile) → LC |
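
The dispatch and completion commands can be read together as a small protocol on the DCU side. The sketch below is a software stand-in for one loop context (one `ctrl_addr`) of a DCU, using the command IDs from `lib/cmd_type.py`. The class name, the `tick` method, and the exact point at which completion is reported are assumptions; the RTL in `fu/single/LoopCounterRTL.py` is the reference.

```python
# Command IDs as defined in lib/cmd_type.py in this PR.
CMD_UPDATE_COUNTER_SHADOW_VALUE = 26
CMD_RESET_LEAF_COUNTER = 27
CMD_LEAF_COUNTER_COMPLETE = 31


class LeafCounterModel:
    """Hypothetical behavioral model of one DCU loop context."""

    def __init__(self, lower, upper, step):
        self.lower, self.upper, self.step = lower, upper, step
        self.current = lower
        self.shadow = None   # Outer loop variable delivered by the LC.
        self.done = False

    def handle(self, cmd, data=0):
        """Applies a runtime command sent by the LC."""
        if cmd == CMD_RESET_LEAF_COUNTER:
            self.current = self.lower
            self.done = False
        elif cmd == CMD_UPDATE_COUNTER_SHADOW_VALUE:
            self.shadow = data

    def tick(self):
        """Advances one iteration; returns the completion command once."""
        if self.done:
            return None
        self.current += self.step
        if self.current >= self.upper:
            self.done = True
            return CMD_LEAF_COUNTER_COMPLETE
        return None


dcu = LeafCounterModel(lower=0, upper=3, step=1)
events = [dcu.tick() for _ in range(4)]   # Completion fires on the third tick.
dcu.handle(CMD_UPDATE_COUNTER_SHADOW_VALUE, 5)
dcu.handle(CMD_RESET_LEAF_COUNTER)        # Ready for the next outer iteration.
```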

### 4. Cross-CGRA Communication (LC ↔ LC)

For multi-CGRA loop nesting, LCs communicate via the inter-CGRA NoC:

| Command | Description | Sender → Receiver |
|---------|-------------|-------------------|
| `CMD_RESET_LEAF_COUNTER` | Reset remote DCU | LC (CGRA-0) → NoC → DCU (CGRA-1) |
| `CMD_UPDATE_COUNTER_SHADOW_VALUE` | Update remote shadow | LC (CGRA-0) → NoC → DCU (CGRA-1) |
| `CMD_LC_CHILD_COMPLETE` | Remote loop finished | LC (CGRA-1) → NoC → LC (CGRA-0) |
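
As a rough illustration of the local versus remote split, the sketch below picks an exit path for a dispatched command. The `Target` fields follow the `CMD_LC_CONFIG_TARGET` comment in `lib/cmd_type.py` (`tile_id`, `ctrl_addr`, `is_remote`, `cgra_id`) and `send_to_remote` is the port name from the architecture figure; the `route` function and its return format are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Fields named after the CMD_LC_CONFIG_TARGET comment."""
    tile_id: int
    ctrl_addr: int
    is_remote: bool
    cgra_id: int


def route(target, local_cgra_id):
    """Hypothetical routing decision for one dispatched command."""
    if target.is_remote:
        # Leaves through the inter-CGRA NoC towards the remote DCU.
        return ("send_to_remote", target.cgra_id, target.tile_id)
    # Stays inside this CGRA and goes to a tile on the local array.
    return ("local_tile", local_cgra_id, target.tile_id)


local_hop = route(Target(tile_id=2, ctrl_addr=0, is_remote=False, cgra_id=0), 0)
remote_hop = route(Target(tile_id=1, ctrl_addr=0, is_remote=True, cgra_id=1), 0)
```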

## Internal Structure: CCU DAG

```
Example: for(i) for(j) for(k) body(i,j,k)

  ┌──────────────┐
  │   CCU[0]     │  i = 0..N (root)
  │ child_cnt=1  │
  └──────┬───────┘
         │ internal completion
  ┌──────▼───────┐
  │   CCU[1]     │  j = 0..M (parent=CCU[0])
  │ child_cnt=1  │
  └──────┬───────┘
         │ targets
    ┌────┴────┐
    ▼         ▼
 ┌─────┐   ┌─────┐
 │DCU-A│   │DCU-B│
 │Count│   │Deliv│  (on tile array)
 │k=0.K│   │j val│
 └─────┘   └─────┘
```

- **CCU[0]** (root): Manages `i`. Target = delivery DCU for `i` value (shadow_only).
- **CCU[1]**: Manages `j`. Parent = CCU[0]. Targets = leaf DCU for `k` loop (reset) + delivery DCU for `j` value (shadow_only).
- When CCU[1] completes (`j` reaches its upper bound), it **internally notifies** CCU[0] in the same cycle.
- When CCU[0] finishes dispatch, it **auto-resets** CCU[1] back to `j = lower_bound`.
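
The four rules above can be checked against a plain nested loop with a small event-driven model. This is a sketch under simplifying assumptions: all lower bounds are 0, all steps are 1, every trip count is at least 1, and there is a single leaf DCU. The `Ccu` class and `run_nest` function are illustrative names, not part of `LoopControllerRTL.py`.

```python
class Ccu:
    """Hypothetical outer-loop counter with lower bound 0 and step 1."""

    def __init__(self, upper):
        self.upper = upper
        self.value = 0

    def advance(self):
        """Steps the loop variable; returns True when the bound is reached."""
        self.value += 1
        return self.value >= self.upper

    def reset(self):
        self.value = 0


def run_nest(n, m, k):
    """Returns the (i, j, k) tuples in the order the body would see them."""
    ccu0, ccu1 = Ccu(n), Ccu(m)
    trace = []
    while True:
        # Leaf DCU runs the k loop; delivery DCUs hold the shadow values.
        for kk in range(k):
            trace.append((ccu0.value, ccu1.value, kk))
        # Leaf completion reaches CCU[1], which advances j.
        if ccu1.advance():
            # CCU[1] completed, so it notifies CCU[0], which advances i.
            if ccu0.advance():
                return trace
            # CCU[0] auto-resets CCU[1] back to its lower bound.
            ccu1.reset()


visited = run_nest(2, 3, 2)
```

The resulting order matches `for i in range(2): for j in range(3): for k in range(2)`, which is the behavior the DAG is meant to reproduce.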

## CCU State Machine

```
        CMD_LC_LAUNCH
  IDLE ──────────────→ RUNNING ◄──────────────────┐
                          │                       │
                 child completions                │
                 count >= threshold               │
                          │                       │
                          ▼                       │
                    ┌───────────┐  all targets    │
  value >= upper    │DISPATCHING│──dispatched───→─┘
  ──────────→       └───────────┘
  COMPLETE      (1 cmd/cycle/target)
  (no dispatch)
```
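
One possible reading of the state machine above, written as a behavioral sketch. Several details are assumptions rather than facts from the RTL: the loop variable is stepped before it is compared with the upper bound, exactly one command is sent per call to `cycle`, and the class and method names are invented for this example.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    DISPATCHING = "DISPATCHING"
    COMPLETE = "COMPLETE"


class CcuFsm:
    """Hypothetical model of one CCU; not the RTL implementation."""

    def __init__(self, lower, upper, step, child_count, num_targets):
        self.value = lower
        self.upper = upper
        self.step = step
        self.child_count = child_count
        self.num_targets = num_targets
        self.children_done = 0
        self.pending = 0
        self.state = State.IDLE

    def launch(self):
        """CMD_LC_LAUNCH: IDLE -> RUNNING."""
        if self.state is State.IDLE:
            self.state = State.RUNNING

    def child_complete(self):
        """Counts child completions and advances once the threshold is met."""
        if self.state is not State.RUNNING:
            return
        self.children_done += 1
        if self.children_done < self.child_count:
            return
        self.children_done = 0
        self.value += self.step
        if self.value >= self.upper:
            self.state = State.COMPLETE      # No dispatch on the final step.
        else:
            self.pending = self.num_targets
            self.state = State.DISPATCHING

    def cycle(self):
        """Sends one pending command; returns to RUNNING when none remain."""
        if self.state is not State.DISPATCHING:
            return
        if self.pending > 0:
            self.pending -= 1
        if self.pending == 0:
            self.state = State.RUNNING


fsm = CcuFsm(lower=0, upper=2, step=1, child_count=1, num_targets=2)
trace = [fsm.state.value]
for action in (fsm.launch, fsm.child_complete, fsm.cycle, fsm.cycle,
               fsm.child_complete):
    action()
    trace.append(fsm.state.value)
```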
26 changes: 17 additions & 9 deletions fu/single/LoopCounterRTL.py
@@ -5,17 +5,25 @@
 Loop Counter (DCU) for CGRA tile.

 This FU manages loop counters indexed by ctrl_addr from control memory.
-Each ctrl_addr has:
+
+NOTE: `ctrl_addr` serves as a "Context ID" or "Loop ID" multiplexer.
+If ctrl_mem_size > 1, a single LoopCounter (DCU) can support MULTIPLE independent loops
+simultaneously by mapping each loop's state to a different `ctrl_addr`.
+In a spatial-only CGRA (where ctrl_mem_size == 1), this DCU only has 1 ctrl_addr
+and thus can only support ONE loop. To support nested loops in spatial-only CGRAs,
+different loops must be mapped to different tiles.
+
+Each ctrl_addr (loop context) has:
 - Leaf counter state (lower_bound, upper_bound, step, current_value)
-- Shadow register (for relay/root counter values from AC)
+- Shadow register (for relay/root counter values from LC)

 Supports two operation modes per ctrl_addr:
 1. OPT_LOOP_COUNT: Loop-Driven Mode (Leaf counter execution)
 2. OPT_LOOP_DELIVERY: Loop-Delivery Mode (Shadow register output)

 Configuration methods:
-1. Initial config (via ConstQueue): Configure leaf counter parameters
-2. Runtime updates (via AC): Reset leaf counter or update shadow register
+1. Initial config (via Controller): Configure leaf counter parameters
+2. Runtime updates (via LC): Reset leaf counter or update shadow register

 Author : Shangkun Li
 Date : January 27, 2026
@@ -55,7 +63,7 @@ def construct(s, CtrlPktType, num_inports, num_outports, vector_factor_power = 0
 s.current_ctrl_addr = Wire(s.CtrlAddrType)
 s.loop_terminated = Wire(1)

-# Update triggers from AC (CMD).
+# Update triggers from LC (CMD).
 s.cmd_reset_counter = Wire(1)
 s.cmd_update_shadow = Wire(1)
 s.cmd_config_lower = Wire(1)
@@ -138,7 +146,7 @@ def comb_logic():
 s.send_out[0].val @= b1(0)
 s.recv_opt.rdy @= b1(0)

-# ===== Handle messages from AC (CMD updates) =====
+# ===== Handle messages from LC (CMD updates) =====
 if s.recv_from_ctrl_mem.val:
 s.recv_from_ctrl_mem.rdy @= b1(1)
 s.target_ctrl_addr @= s.recv_from_ctrl_mem.msg.ctrl_addr
@@ -190,7 +198,7 @@ def update_leaf_counters():
 s.leaf_current_value[addr].payload + s.leaf_step[addr].payload, b1(1)
 )

-# Runtime reset from AC.
+# Runtime reset from LC.
 if s.cmd_reset_counter:
 addr = s.target_ctrl_addr
 s.leaf_current_value[addr] <<= s.leaf_lower_bound[addr]
@@ -202,7 +210,7 @@ def update_shadow_registers():
 s.shadow_regs[i] <<= s.DataType(0, 0)
 s.shadow_valid[i] <<= b1(0)
 else:
-# Runtime update from AC.
+# Runtime update from LC.
 if s.cmd_update_shadow:
 addr = s.target_ctrl_addr
 s.shadow_regs[addr] <<= s.target_ctrl_data
@@ -224,7 +232,7 @@ def update_already_done():
 addr = s.current_ctrl_addr
 s.already_done[addr] <<= b1(1)

-# Resets done flag when counter is reset from AC.
+# Resets done flag when counter is reset from LC.
 if s.cmd_reset_counter:
 addr = s.target_ctrl_addr
 s.already_done[addr] <<= b1(0)
36 changes: 32 additions & 4 deletions lib/cmd_type.py
@@ -14,7 +14,7 @@

 # Total number of commands that are supported/recognized by controller.
 # Needs to be updated once more commands are added/supported.
-NUM_CMDS = 32
+NUM_CMDS = 43

 CMD_LAUNCH = 0
 CMD_PAUSE = 1
@@ -42,12 +42,29 @@
 CMD_CONFIG_STREAMING_LD_START_ADDR = 23
 CMD_CONFIG_STREAMING_LD_STRIDE = 24
 CMD_CONFIG_STREAMING_LD_END_ADDR = 25
-CMD_UPDATE_COUNTER_SHADOW_VALUE = 26 # Updates shadow registers value from AC.
-CMD_RESET_LEAF_COUNTER = 27 # Resets leaf counter to lower_bound.
+CMD_UPDATE_COUNTER_SHADOW_VALUE = 26 # LC -> Target Tile (DCU): update shadow register
+CMD_RESET_LEAF_COUNTER = 27 # LC -> Target Tile (DCU): reset counter to lower_bound
 CMD_CONFIG_LOOP_LOWER = 28
 CMD_CONFIG_LOOP_UPPER = 29
 CMD_CONFIG_LOOP_STEP = 30
-CMD_LEAF_COUNTER_COMPLETE = 31
+CMD_LEAF_COUNTER_COMPLETE = 31 # Target Tile (DCU) -> LC: innermost loop finished
+
+# Loop Controller (LC) Configuration Commands (from Controller).
+CMD_LC_CONFIG_LOWER = 32 # Controller -> LC: Configures CCU lower_bound
+CMD_LC_CONFIG_UPPER = 33 # Controller -> LC: Configures CCU upper_bound
+CMD_LC_CONFIG_STEP = 34 # Controller -> LC: Configures CCU step
+CMD_LC_CONFIG_CHILD_COUNT = 35 # Controller -> LC: Configures child_complete_count
+CMD_LC_CONFIG_TARGET = 36 # Controller -> LC: Configures target (tile_id, ctrl_addr, is_remote, cgra_id)
+CMD_LC_CONFIG_PARENT = 37 # Controller -> LC: Configures parent_ccu_id, is_root, is_relay
+CMD_LC_LAUNCH = 38 # Controller -> LC: Launches LC (all CCUs enter RUNNING)
+
+# Loop Controller Inter-CGRA Sync Commands.
+CMD_LC_SYNC_VALUE = 39 # Parent LC (this CGRA) -> Child LC (another CGRA): sync current value
+CMD_LC_CHILD_COMPLETE = 40 # Child LC (another CGRA) -> Parent LC (this CGRA): child complete
+CMD_LC_CHILD_RESET = 41 # Parent LC (this CGRA) -> Child LC (another CGRA): reset child
+
+# Loop Controller Status.
+CMD_LC_ALL_COMPLETE = 42 # LC -> Controller: all outer loops complete

 CMD_SYMBOL_DICT = {
 CMD_LAUNCH: "(LAUNCH_KERNEL)",
@@ -82,5 +99,16 @@
 CMD_CONFIG_LOOP_UPPER: "(CONFIG_LOOP_UPPER)",
 CMD_CONFIG_LOOP_STEP: "(CONFIG_LOOP_STEP)",
 CMD_LEAF_COUNTER_COMPLETE: "(LEAF_COUNTER_COMPLETE)",
+CMD_LC_CONFIG_LOWER: "(LC_CONFIG_CCU_LOWER)",
+CMD_LC_CONFIG_UPPER: "(LC_CONFIG_CCU_UPPER)",
+CMD_LC_CONFIG_STEP: "(LC_CONFIG_CCU_STEP)",
+CMD_LC_CONFIG_CHILD_COUNT: "(LC_CONFIG_CCU_CHILD_COUNT)",
+CMD_LC_CONFIG_TARGET: "(LC_CONFIG_CCU_TARGET)",
+CMD_LC_CONFIG_PARENT: "(LC_CONFIG_CCU_PARENT)",
+CMD_LC_LAUNCH: "(LC_LAUNCH)",
+CMD_LC_SYNC_VALUE: "(LC_SYNC_VALUE)",
+CMD_LC_CHILD_COMPLETE: "(LC_CHILD_COMPLETE)",
+CMD_LC_CHILD_RESET: "(LC_CHILD_RESET)",
+CMD_LC_ALL_COMPLETE: "(LC_ALL_COMPLETE)",
}
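
Reviewer note: raising `NUM_CMDS` from 32 to 43 is consistent with the commit "update cmd bits 5->6" in this PR, since any command count above 32 no longer fits in 5 bits. The arithmetic can be checked with the sketch below; `cmd_bits` is a hypothetical helper, and how the repository actually derives the command field width is not shown in this diff.

```python
def cmd_bits(num_cmds):
    """Minimum number of bits needed to encode IDs 0 .. num_cmds - 1."""
    return max(1, (num_cmds - 1).bit_length())


old_width = cmd_bits(32)   # IDs 0..31
new_width = cmd_bits(43)   # IDs 0..42
```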
