Gossmap: compaction support #8869
rustyrussell wants to merge 25 commits into ElementsProject:master
Conversation
force-pushed from e967d40 to 2639a79
force-pushed from 5ad16e1 to 9024e31
Not a complete decode, just the highlights (what channel was announced or updated, what node was announced). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It didn't do anything, since the dev_compact_gossip_store command was removed. When we make it do something, it crashes since old_len is 0:

```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our poor scid generation clashes badly with simplified hashing (the next patch), pushing l1's startup time with a generated map from 4 seconds to 14 seconds; under CI it actually timed out several tests. Making our fake scids more "random" reduces it to 1.5 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
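The effect can be illustrated with a toy hash table: sequential fake scids whose low bits barely vary all collapse into one bucket under a simplified (non-siphash) hash, while mixing the counter through the whole value spreads them out. A minimal Python sketch with invented scid values and bucketing, not the real generator or gossmap's table:

```python
# Toy demonstration of why sequential fake scids degrade a simplified hash.
# The scid layouts and bucketing below are illustrative only.

NBUCKETS = 64

def simple_hash(scid: int) -> int:
    # Simplified hash: just the low bits (no siphash-style mixing).
    return scid % NBUCKETS

def fake_scid_sequential(i: int) -> int:
    # Sequential block heights, fixed tx index: low bits never change.
    return ((100 + i) << 40) | (1 << 16)

def fake_scid_mixed(i: int) -> int:
    # Multiply by a large odd constant so the low bits vary too.
    return ((100 + i) * 2654435761) & 0xFFFFFFFFFFFF

def max_bucket_load(scids):
    loads = [0] * NBUCKETS
    for s in scids:
        loads[simple_hash(s)] += 1
    return max(loads)

seq = [fake_scid_sequential(i) for i in range(1000)]
mixed = [fake_scid_mixed(i) for i in range(1000)]
print(max_bucket_load(seq), max_bucket_load(mixed))  # all-in-one-bucket vs spread out
```

With sequential scids every entry lands in the same bucket, so lookups degrade to a linear scan, which matches the observed startup blowout.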
It's actually quite quick to load a cache-hot 308,874,377-byte gossip_store (normal -Og build), but perf does show time spent in siphash(), which is a bit overkill here, so drop that:

Before: Time to load: 66718983-78037766(7.00553e+07+/-2.8e+06)nsec
After: Time to load: 54510433-57991725(5.61457e+07+/-1e+06)nsec

We could save maybe 10% more by disabling checksums, but having that assurance is nice.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
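siphash is a keyed hash designed to resist collision attacks on untrusted input; for a local, trusted in-memory index a plain non-cryptographic hash suffices. To illustrate the class of cheaper hash involved (not necessarily the one gossmap actually switched to), here is 64-bit FNV-1a in Python:

```python
# FNV-1a, 64-bit: a simple, unkeyed, non-cryptographic hash.
# Shown only to illustrate the kind of cheap hash that is adequate for a
# trusted local index; the actual replacement in gossmap may differ.

FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
    return h

# Hash a couple of scid-like 8-byte keys.
print(hex(fnv1a_64((123456).to_bytes(8, "big"))))
print(fnv1a_64((123456).to_bytes(8, "big")) != fnv1a_64((123457).to_bytes(8, "big")))
```

FNV-1a is a handful of integer operations per byte with no key setup, which is why dropping siphash shows up directly in the load-time numbers above.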
…changed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also put this in the store_ended message, so you can tell whether the equivalent_offset there really refers to this new entry (or whether two or more rewrites have happened). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the first record, and ignored by everything else. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's used by common/gossip_store.c, which is used by many things other than gossipd. This file belongs in common. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossmap doesn't care, so gossipd currently has to iterate through the store to find them at startup. Create a callback for gossipd to use instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
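The shape of that change can be sketched as: the store loader takes an optional per-record callback, so a consumer like gossipd can pick out record types the loader itself ignores, without a second pass over the file. A minimal Python sketch with an invented record framing (2-byte type, 2-byte length, payload); the names and type numbers are hypothetical:

```python
import struct

def load_store(data: bytes, unknown_cb=None):
    """Parse records; hand types we don't care about to unknown_cb."""
    known = []
    off = 0
    while off < len(data):
        rtype, rlen = struct.unpack_from(">HH", data, off)
        payload = data[off + 4: off + 4 + rlen]
        if rtype in (256, 258):          # e.g. channel/node announcements
            known.append((rtype, payload))
        elif unknown_cb is not None:
            unknown_cb(rtype, payload)   # gossipd-style consumer sees it here
        off += 4 + rlen
    return known

seen = []
store = struct.pack(">HH", 256, 3) + b"abc" + struct.pack(">HH", 999, 2) + b"xy"
known = load_store(store, unknown_cb=lambda t, p: seen.append((t, p)))
print(known, seen)
```

The point is that both consumers share the single linear scan, instead of gossipd re-iterating the store at startup just to find its own records.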
This way gossmap_manage can decide when to compact. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now only need to walk it if we're doing an upgrade.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
This saves gossipd from converting it:

```
lightningd-1 2026-02-02T00:50:49.505Z DEBUG gossipd: Time to convert version 14 store: 890 msec
```

Reducing node startup time from 1.4 seconds to 0.5 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the file responsible for all the writing, so it should be responsible for the rewriting if necessary (rather than gossmap_manage). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip_store.c uses this to avoid two reads, and we want to use it elsewhere too. Also fix old comment on gossip_store_readhdr(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
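The "avoid two reads" point can be sketched as: read the record header and payload in one pass and return both, so callers that need header fields (flags, checksum) don't have to seek back and re-read. A toy Python sketch with a hypothetical header layout (2-byte flags, 2-byte length, 4-byte CRC32), not the real gossip_store format:

```python
import struct
import zlib

# Hypothetical on-disk record header: 2-byte flags, 2-byte length, 4-byte CRC32.
HDR = struct.Struct(">HHI")

def read_record(buf: bytes, off: int):
    """Read header and payload in one pass, returning both so callers
    that need the header fields don't re-read them."""
    flags, length, crc = HDR.unpack_from(buf, off)
    payload = buf[off + HDR.size: off + HDR.size + length]
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return (flags, length, crc), payload, off + HDR.size + length

payload = b"hello gossip"
buf = HDR.pack(0, len(payload), zlib.crc32(payload)) + payload
hdr, body, end = read_record(buf, 0)
print(hdr[0], body, end)
```

Returning the parsed header alongside the payload is what lets other call sites reuse it instead of performing a second read.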
force-pushed from 9024e31 to cc2fade
A new subprocess run by gossipd to create a compacted gossip store. It's pretty simple: a linear compaction of the file. Once it has compacted the amount it was told to, gossipd waits for it to finish the last bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
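Conceptually the compaction is a single linear copy that drops dead records. A toy Python sketch with an invented framing (1-byte deleted flag, 2-byte length, payload), not the real store format:

```python
import struct

def rec(payload: bytes, deleted: bool = False) -> bytes:
    # Hypothetical framing: 1-byte deleted flag, 2-byte length, payload.
    return struct.pack(">BH", int(deleted), len(payload)) + payload

def compact(store: bytes) -> bytes:
    """Linear compaction: copy live records verbatim, skip deleted ones."""
    out = bytearray()
    off = 0
    while off < len(store):
        deleted, length = struct.unpack_from(">BH", store, off)
        record = store[off: off + 3 + length]
        if not deleted:
            out += record
        off += 3 + length
    return bytes(out)

store = rec(b"chan1") + rec(b"old", deleted=True) + rec(b"node1")
print(compact(store))
```

Because the pass is linear and append-only on the output side, it suits a helper process that works through the file while gossipd keeps running, with gossipd only waiting for the final tail.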
This isn't called anywhere yet. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Changelog-Added: `gossipd` now uses a `lightning_gossip_compactd` helper to compact the gossip_store on demand, keeping it under about 210MB.
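The "on demand" trigger is size-based: once the store grows past the limit (about 210MB per the changelog entry), gossipd can spawn the helper. A hypothetical sketch of that decision; the function name and the exact check are assumptions, not gossipd's actual code:

```python
# Hypothetical trigger logic; the real checks live in gossipd.
COMPACT_LIMIT = 210 * 1024 * 1024  # ~210MB, per the changelog entry

def should_compact(store_len: int, compactor_running: bool) -> bool:
    """Start the compaction helper once the store exceeds the limit,
    unless a helper is already running."""
    return store_len > COMPACT_LIMIT and not compactor_running

print(should_compact(250 * 1024 * 1024, False))
print(should_compact(250 * 1024 * 1024, True))
print(should_compact(100 * 1024 * 1024, False))
```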
I have tested cc2fade and I get a crash with this test:

```python
def test_gossip(node_factory, bitcoind):
    l1, l2, l3, l4 = node_factory.get_nodes(4, opts=[{}, {}, {}, {}])
    nodes = [l1, l2, l3, l4]
    l1.fundwallet(10_000_000)
    l2.fundwallet(10_000_000)
    l3.fundwallet(10_000_000)
    l4.fundwallet(10_000_000)
    l2.rpc.fundchannel(l4.info["id"] + "@localhost:" + str(l4.port), 1_000_000)
    l3.rpc.fundchannel(l4.info["id"] + "@localhost:" + str(l4.port), 1_000_000)
    l4.rpc.fundchannel(l1.info["id"] + "@localhost:" + str(l1.port), 1_000_000)
    bitcoind.generate_block(1)
    sync_blockheight(bitcoind, nodes)
    l4.rpc.fundchannel(l1.info["id"] + "@localhost:" + str(l1.port), 1_000_000)
    bitcoind.generate_block(6)
    sync_blockheight(bitcoind, nodes)
    wait_for(lambda: len(l1.rpc.listpeerchannels()["channels"]) == 2)
    l4_chans = l1.rpc.listpeerchannels(l4.info["id"])["channels"]
    scid_l1_l4_1 = l4_chans[0]["short_channel_id"]
    scid_l2_l4 = l2.rpc.listpeerchannels(l4.info["id"])["channels"][0][
        "short_channel_id"
    ]
    wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 8)
    l1.rpc.close(scid_l1_l4_1)
    bitcoind.generate_block(6, wait_for_mempool=1)
    sync_blockheight(bitcoind, nodes)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
    l1.restart()
    l1.rpc.connect(l2.info["id"] + "@localhost:" + str(l2.port))
    l1.rpc.connect(l3.info["id"] + "@localhost:" + str(l3.port))
    l1.rpc.connect(l4.info["id"] + "@localhost:" + str(l4.port))
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
    l2.rpc.close(scid_l2_l4)
    bitcoind.generate_block(1, wait_for_mempool=1)
    sync_blockheight(bitcoind, nodes)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 4)
    l1.rpc.call("dev-compact-gossip-store", [])
    # wait_for(lambda: len(l1.rpc.listpeerchannels()["channels"]) == 2)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 4)
    l4.rpc.fundchannel(l1.info["id"] + "@localhost:" + str(l1.port), 1_000_000)
    bitcoind.generate_block(6)
    sync_blockheight(bitcoind, nodes)
    wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
```

It's a test from one of my plugins where I removed all my plugin-specific lines. Also I would really like it if the commented `wait_for` lines worked.
Reported-by: @daywalker90
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And tests! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…ssip_store.
Before I fixed the handling of dying channels:
```
lightning_gossipd: gossip_store: can't read hdr offset 2362/2110: Success (version v25.12-279-gb38abe6-modded)
0x6537c19ecf3a send_backtrace
common/daemon.c:38
0x6537c19f1a1d status_failed
common/status.c:207
0x6537c19e557a gossip_store_get_with_hdr
gossipd/gossip_store.c:527
0x6537c19e5613 check_msg_type
gossipd/gossip_store.c:559
0x6537c19e5a36 gossip_store_set_flag
gossipd/gossip_store.c:577
0x6537c19e5c82 gossip_store_del
gossipd/gossip_store.c:629
0x6537c19e8ddd gossmap_manage_new_block
gossipd/gossmap_manage.c:1362
0x6537c19e390e new_blockheight
gossipd/gossipd.c:430
0x6537c19e3c37 recv_req
gossipd/gossipd.c:532
0x6537c19ed22a handle_read
common/daemon_conn.c:35
0x6537c19fbe71 next_plan
ccan/ccan/io/io.c:60
0x6537c19fc174 do_plan
ccan/ccan/io/io.c:422
0x6537c19fc231 io_ready
ccan/ccan/io/io.c:439
0x6537c19fd647 io_loop
ccan/ccan/io/poll.c:470
0x6537c19e463d main
gossipd/gossipd.c:609
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Nice catch! It's because I didn't correctly reset the "dying_chans" array. Fixed now, and added a reduced version of this test to be sure.

The reason your wait_for lines didn't work is that we don't immediately consider a channel closed: we give it 12 blocks before stopping its use. That allows us to keep using the channel if it turns out to be a splice. We make an exception for our own channels (in that case, we will know about the splice).
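The 12-block grace period described above can be sketched as: when a close is seen at height H, the channel is marked dying rather than deleted, and only removed once the chain reaches H + 12; our own channels skip the grace period. A minimal Python model with invented names, not gossipd's actual data structures:

```python
GRACE_BLOCKS = 12  # closed channels stay usable this long, in case of a splice

class DyingTracker:
    """Toy model of the dying-channel handling (all names hypothetical)."""
    def __init__(self):
        self.channels = set()
        self.dying = {}  # scid -> block height where the close was seen

    def channel_added(self, scid):
        self.channels.add(scid)

    def close_seen(self, scid, height, ours=False):
        if ours:
            # Our own channel: we'd know about a splice, so delete at once.
            self.channels.discard(scid)
        else:
            self.dying[scid] = height

    def block_added(self, height):
        # Delete channels whose grace period has expired.
        for scid, seen in list(self.dying.items()):
            if height >= seen + GRACE_BLOCKS:
                self.channels.discard(scid)
                del self.dying[scid]

t = DyingTracker()
t.channel_added("111x1x0")
t.channel_added("222x1x0")
t.close_seen("111x1x0", height=100)              # peer's close: keep for now
t.close_seen("222x1x0", height=100, ours=True)   # ours: gone immediately
t.block_added(105)
print(sorted(t.channels))  # still contains 111x1x0
t.block_added(112)
print(sorted(t.channels))  # grace expired, now empty
```

This is why the commented `wait_for` checks on listchannels counts don't settle right after the closing block: the closed channel lingers until the grace period expires.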
```
> assert len(layers['layers']) == 1
E AssertionError: assert 2 == 1
E + where 2 = len([{'layer': 'xpay', 'persistent': True, 'disabled_nodes': [], 'created_channels': [], 'channel_updates': [], 'constraints': [{'short_channel_id_dir': '45210x2134x44171/0', 'timestamp': 1770341134, 'minimum_msat': 289153519}, {'short_channel_id_dir': '1895x7x1895/1', 'timestamp': 1770341134, 'minimum_msat': 289007015}, {'short_channel_id_dir': '1906x1039x1906/1', 'timestamp': 1770341134, 'minimum_msat': 289008304}, {'short_channel_id_dir': '10070x60x10063/1', 'timestamp': 1770341134, 'minimum_msat': 289005726}, {'short_channel_id_dir': '18772x60x18743/0', 'timestamp': 1770341134, 'minimum_msat': 289005726}, {'short_channel_id_dir': '18623x208x18594/0', 'timestamp': 1770341134, 'minimum_msat': 289004859}, {'short_channel_id_dir': '33935x826x33727/1', 'timestamp': 1770341134, 'maximum_msat': 491501488}], 'biases': [], 'node_biases': []}, {'layer': 'xpay-94', 'persistent': False, 'disabled_nodes': [], 'created_channels': [], 'channel_updates': [], 'constraints': [], 'biases': [], 'node_biases': []}])
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
force-pushed from cc2fade to 934ead4
…eout. Avoids guessing what the timeout should be: use a file trigger instead. This is more reliable, and should reduce a flake in test_sql under valgrind. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
force-pushed from dd4c02f to 4b347dc
Based on #8878 (merged).

We used to compact the gossip store, but it was buggy, so now we rewrite it on every startup (which is slow). Better is to do it when required, using a separate process.