Gossmap: compaction support #8869

Open
rustyrussell wants to merge 25 commits into ElementsProject:master from rustyrussell:gossmap-compact

Conversation

@rustyrussell
Contributor

@rustyrussell rustyrussell commented Jan 28, 2026

Based on #8878 (merged)

We used to do compaction of the gossip store, but it was buggy, so now we do it on every startup (which is slow). It is better to do it only when required, using a separate process.
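For illustration only, here is a minimal sketch of the "compact when required" idea, assuming a hypothetical size threshold and hypothetical helper arguments (only the helper's name, lightning_gossip_compactd, comes from this PR's changelog):

```
/* Sketch only: compact when the store grows past a threshold, by handing
 * the work to a separate helper process.  The threshold constant and the
 * helper's arguments here are assumptions, not the real interface. */
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

#define COMPACT_THRESHOLD (210 * 1024 * 1024)	/* ~210MB, per the changelog */

static bool store_needs_compaction(const char *store_path)
{
	struct stat st;

	if (stat(store_path, &st) != 0)
		return false;
	return st.st_size > COMPACT_THRESHOLD;
}

static void spawn_compactor(const char *store_path)
{
	/* gossipd keeps serving gossip while the helper rewrites the
	 * store in the background. */
	if (fork() == 0) {
		execlp("lightning_gossip_compactd",
		       "lightning_gossip_compactd", store_path, (char *)NULL);
		_exit(1);
	}
}
```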

@rustyrussell rustyrussell added this to the v26.04 milestone Jan 28, 2026
@rustyrussell rustyrussell requested a review from cdecker as a code owner January 28, 2026 03:43
@rustyrussell rustyrussell force-pushed the gossmap-compact branch 7 times, most recently from e967d40 to 2639a79 on February 3, 2026 02:59
@rustyrussell rustyrussell force-pushed the gossmap-compact branch 5 times, most recently from 5ad16e1 to 9024e31 on February 5, 2026 23:14
Not a complete decode, just the highlights (what channel was announced
or updated, what node was announced).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It didn't do anything, since the dev_compact_gossip_store command was
removed.  When we make it do something, it crashes since old_len is 0:

```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our poor scid generation clashes badly with the simplified hashing in the
next patch: l1's startup time when using a generated map went from
4 seconds to 14 seconds.  Under CI it actually timed out several tests.

Fixing our fake scids to be more "random" reduces it to 1.5 seconds.
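As an illustration of what "more random" fake scids could look like, here is a sketch using a splitmix64-style mixer; the constants and packing are assumptions, not the actual generator used in the tree:

```
/* Sketch only: spread generated scids across the keyspace instead of using
 * near-consecutive values, so a simple hash doesn't see long clash chains. */
#include <stdint.h>

static uint64_t fake_scid(uint64_t counter)
{
	/* splitmix64-style finalizer over a counter. */
	uint64_t x = counter + 0x9E3779B97F4A7C15ULL;
	x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
	x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
	x ^= x >> 31;

	/* Pack as blockheight(24 bits) / txindex(24 bits) / outnum(16 bits),
	 * matching the short_channel_id layout. */
	uint64_t block = (x >> 40) & 0xFFFFFF;
	uint64_t txidx = (x >> 16) & 0xFFFFFF;
	uint64_t outnum = x & 0xFFFF;
	return (block << 40) | (txidx << 16) | outnum;
}
```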

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's actually quite quick to load a cache-hot 308,874,377-byte
gossip_store (normal -Og build), but perf does show time spent
in siphash(), which is a bit overkill here, so drop that:

Before:
	Time to load: 66718983-78037766(7.00553e+07+/-2.8e+06)nsec

After:
	Time to load: 54510433-57991725(5.61457e+07+/-1e+06)nsec

We could save maybe 10% more by disabling checksums, but having
that assurance is nice.
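For context, a sketch of the kind of cheaper hash that suffices for an in-process table keyed by trusted data; this is an assumption about the shape of the replacement, not the exact function the patch uses:

```
/* Sketch only: siphash is keyed and collision-attack resistant, which is
 * overkill for an in-memory table built from our own gossip_store.  A
 * multiplicative (Fibonacci) hash costs one multiply and one shift. */
#include <stddef.h>
#include <stdint.h>

static size_t scid_hash(uint64_t scid, unsigned table_bits)
{
	/* 2^64 divided by the golden ratio: the classic Fibonacci constant. */
	return (size_t)((scid * 0x9E3779B97F4A7C15ULL) >> (64 - table_bits));
}
```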

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…changed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also put this in the store_ended message, so you can tell whether
the equivalent_offset there really refers to this new entry (or whether
two or more rewrites have happened).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the first record, and ignored by everything else.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's used by common/gossip_store.c, which is used by many things other than
gossipd.  This file belongs in common.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossmap doesn't care, so gossipd currently has to iterate through the
store to find them at startup.  Create a callback for gossipd to use
instead.
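A sketch of what such a callback interface could look like, using hypothetical names (the real API lives in common/gossmap.h and may differ):

```
/* Hypothetical sketch: during the initial load, hand records that gossmap
 * itself ignores directly to the caller, instead of forcing gossipd to
 * iterate through the whole store again afterwards. */
#include <stddef.h>
#include <stdint.h>

struct gossmap;

/* Called once per record gossmap skips. */
typedef void (*skipped_record_cb)(void *cb_arg,
				  uint16_t type,	/* wire message type */
				  size_t offset,	/* offset in gossip_store */
				  const uint8_t *msg,
				  size_t msglen);

/* Hypothetical loader entry point taking the callback. */
struct gossmap *gossmap_load_with_cb(const char *filename,
				     skipped_record_cb cb, void *cb_arg);
```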

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This way gossmap_manage can decide when to compact.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now only need to walk it if we're doing an upgrade.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
This saves gossipd from converting it:

```
lightningd-1 2026-02-02T00:50:49.505Z DEBUG   gossipd: Time to convert version 14 store: 890 msec
```

This reduces node startup time from 1.4 seconds to 0.5 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the file responsible for all the writing, so it should be
responsible for the rewriting if necessary (rather than
gossmap_manage).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip_store.c uses this to avoid two reads, and we want to use it
elsewhere too.

Also fix old comment on gossip_store_readhdr().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A new subprocess run by gossipd to create a compacted gossip store.

It's pretty simple: a linear compaction of the file.  Once it has done the
amount it was told to, gossipd waits for it to complete the last bit.
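A minimal sketch of a linear compaction pass, using a deliberately simplified record header (the real gossip_store header also carries flags, a checksum and a timestamp, and the real helper coordinates offsets with gossipd):

```
/* Sketch only: walk the store start-to-finish, drop records whose deleted
 * flag is set, and append the rest to the new file.  The header layout here
 * is simplified to a flags/length pair. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct rec_hdr {
	uint16_t flags;		/* assume bit 0 means "deleted" */
	uint16_t len;		/* length of the record body */
};

static bool compact_store(FILE *in, FILE *out)
{
	struct rec_hdr hdr;

	while (fread(&hdr, sizeof(hdr), 1, in) == 1) {
		unsigned char *body = malloc(hdr.len ? hdr.len : 1);

		if (!body || fread(body, 1, hdr.len, in) != hdr.len) {
			free(body);
			return false;
		}
		if (!(hdr.flags & 1)) {		/* keep live records only */
			if (fwrite(&hdr, sizeof(hdr), 1, out) != 1
			    || fwrite(body, 1, hdr.len, out) != hdr.len) {
				free(body);
				return false;
			}
		}
		free(body);
	}
	return true;
}
```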

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This isn't called anywhere yet.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: `gossipd` now uses a `lightning_gossip_compactd` helper to compact the gossip_store on demand, keeping it under about 210MB.
@rustyrussell rustyrussell enabled auto-merge (rebase) February 6, 2026 23:03
@daywalker90
Collaborator

I have tested cc2fade and I get a crash with this test:

from pyln.testing.utils import sync_blockheight, wait_for


def test_gossip(node_factory, bitcoind):
    l1, l2, l3, l4 = node_factory.get_nodes(
        4,
        opts=[
            {},
            {},
            {},
            {},
        ],
    )
    nodes = [l1, l2, l3, l4]

    l1.fundwallet(10_000_000)
    l2.fundwallet(10_000_000)
    l3.fundwallet(10_000_000)
    l4.fundwallet(10_000_000)
    l2.rpc.fundchannel(
        l4.info["id"] + "@localhost:" + str(l4.port),
        1_000_000,
    )
    l3.rpc.fundchannel(
        l4.info["id"] + "@localhost:" + str(l4.port),
        1_000_000,
    )
    l4.rpc.fundchannel(
        l1.info["id"] + "@localhost:" + str(l1.port),
        1_000_000,
    )
    bitcoind.generate_block(1)
    sync_blockheight(bitcoind, nodes)
    l4.rpc.fundchannel(
        l1.info["id"] + "@localhost:" + str(l1.port),
        1_000_000,
    )

    bitcoind.generate_block(6)
    sync_blockheight(bitcoind, nodes)

    wait_for(lambda: len(l1.rpc.listpeerchannels()["channels"]) == 2)
    l4_chans = l1.rpc.listpeerchannels(l4.info["id"])["channels"]
    scid_l1_l4_1 = l4_chans[0]["short_channel_id"]
    scid_l2_l4 = l2.rpc.listpeerchannels(l4.info["id"])["channels"][0][
        "short_channel_id"
    ]

    wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 8)

    l1.rpc.close(scid_l1_l4_1)
    bitcoind.generate_block(6, wait_for_mempool=1)
    sync_blockheight(bitcoind, nodes)

    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)

    l1.restart()
    l1.rpc.connect(l2.info["id"] + "@localhost:" + str(l2.port))
    l1.rpc.connect(l3.info["id"] + "@localhost:" + str(l3.port))
    l1.rpc.connect(l4.info["id"] + "@localhost:" + str(l4.port))

    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)

    l2.rpc.close(scid_l2_l4)
    bitcoind.generate_block(1, wait_for_mempool=1)
    sync_blockheight(bitcoind, nodes)

    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 4)

    l1.rpc.call("dev-compact-gossip-store", [])

    # wait_for(lambda: len(l1.rpc.listpeerchannels()["channels"]) == 2)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 4)

    l4.rpc.fundchannel(
        l1.info["id"] + "@localhost:" + str(l1.port),
        1_000_000,
    )

    bitcoind.generate_block(6)
    sync_blockheight(bitcoind, nodes)

    wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: gossip_store: get delete entry offset 4055/4151 (version v25.12-278-gcc2fade)
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: common/daemon.c:46 (send_backtrace) 0x562ed90bdbfd
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: common/status.c:207 (status_failed) 0x562ed90c2515
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:534 (gossip_store_get_with_hdr) 0x562ed90b637c
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:559 (check_msg_type) 0x562ed90b63f7
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:577 (gossip_store_set_flag) 0x562ed90b67fc
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:629 (gossip_store_del) 0x562ed90b6a03
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:1362 (gossmap_manage_new_block) 0x562ed90b9b31
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:430 (new_blockheight) 0x562ed90b4730
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:532 (recv_req) 0x562ed90b4a79
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: common/daemon_conn.c:35 (handle_read) 0x562ed90bde7c
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x562ed90cc80c
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x562ed90ccae5
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x562ed90ccb9e
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: ccan/ccan/io/poll.c:470 (io_loop) 0x562ed90cdfab
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:609 (main) 0x562ed90b5434
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main) 0x7fc07be32ca7
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: ../csu/libc-start.c:360 (__libc_start_main_impl) 0x7fc07be32d64
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x562ed90b38a0
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0xffffffffffffffff
lightningd-1 2026-02-07T14:29:18.185Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: gossip_store: get delete entry offset 4055/4151

It's a test from one of my plugins where I removed all my plugin-specific lines. Also I would really like it if the commented wait_for lines worked (they should, right?).

Reported-by: @daywalker90
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And tests!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…ssip_store.

Before I fixed the handling of dying channels:

```
lightning_gossipd: gossip_store: can't read hdr offset 2362/2110: Success (version v25.12-279-gb38abe6-modded)
0x6537c19ecf3a send_backtrace
        common/daemon.c:38
0x6537c19f1a1d status_failed
        common/status.c:207
0x6537c19e557a gossip_store_get_with_hdr
        gossipd/gossip_store.c:527
0x6537c19e5613 check_msg_type
        gossipd/gossip_store.c:559
0x6537c19e5a36 gossip_store_set_flag
        gossipd/gossip_store.c:577
0x6537c19e5c82 gossip_store_del
        gossipd/gossip_store.c:629
0x6537c19e8ddd gossmap_manage_new_block
        gossipd/gossmap_manage.c:1362
0x6537c19e390e new_blockheight
        gossipd/gossipd.c:430
0x6537c19e3c37 recv_req
        gossipd/gossipd.c:532
0x6537c19ed22a handle_read
        common/daemon_conn.c:35
0x6537c19fbe71 next_plan
        ccan/ccan/io/io.c:60
0x6537c19fc174 do_plan
        ccan/ccan/io/io.c:422
0x6537c19fc231 io_ready
        ccan/ccan/io/io.c:439
0x6537c19fd647 io_loop
        ccan/ccan/io/poll.c:470
0x6537c19e463d main
        gossipd/gossipd.c:609
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Contributor Author

It's a test from one of my plugins where I removed all my plugin-specific lines. Also I would really like it if the commented wait_for lines worked (they should, right?).

Nice catch! It's because I didn't correctly reset the "dying_chans" array. Fixed now, and added a reduced version of this test to be sure.

The reason your wait_for lines didn't work is that we don't immediately consider a channel closed: we give it 12 blocks before stopping its use. That allows us to keep using the channel if it turns out to be a splice. We make an exception for our own channels (in that case, we will know about the splice).
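For illustration, a sketch of that delay with hypothetical structures (the actual tracking is gossipd's dying_chans handling; only the 12-block grace period comes from the explanation above):

```
/* Hypothetical sketch: when a channel's funding output is spent, record a
 * deadline 12 blocks ahead and only drop the channel once that height is
 * reached, so a splice can keep it alive.  Our own channels are dropped
 * immediately, since we would know about any splice. */
#include <stdbool.h>
#include <stdint.h>

#define CLOSE_GRACE_BLOCKS 12

struct dying_chan {
	uint64_t scid;
	uint32_t deadline_height;	/* spend height + CLOSE_GRACE_BLOCKS */
};

static bool should_remove(const struct dying_chan *d,
			  uint32_t current_height, bool is_our_channel)
{
	if (is_our_channel)
		return true;
	return current_height >= d->deadline_height;
}
```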

```
>       assert len(layers['layers']) == 1
E       AssertionError: assert 2 == 1
E        +  where 2 = len([{'layer': 'xpay', 'persistent': True, 'disabled_nodes': [], 'created_channels': [], 'channel_updates': [], 'constraints': [{'short_channel_id_dir': '45210x2134x44171/0', 'timestamp': 1770341134, 'minimum_msat': 289153519}, {'short_channel_id_dir': '1895x7x1895/1', 'timestamp': 1770341134, 'minimum_msat': 289007015}, {'short_channel_id_dir': '1906x1039x1906/1', 'timestamp': 1770341134, 'minimum_msat': 289008304}, {'short_channel_id_dir': '10070x60x10063/1', 'timestamp': 1770341134, 'minimum_msat': 289005726}, {'short_channel_id_dir': '18772x60x18743/0', 'timestamp': 1770341134, 'minimum_msat': 289005726}, {'short_channel_id_dir': '18623x208x18594/0', 'timestamp': 1770341134, 'minimum_msat': 289004859}, {'short_channel_id_dir': '33935x826x33727/1', 'timestamp': 1770341134, 'maximum_msat': 491501488}], 'biases': [], 'node_biases': []}, {'layer': 'xpay-94', 'persistent': False, 'disabled_nodes': [], 'created_channels': [], 'channel_updates': [], 'constraints': [], 'biases': [], 'node_biases': []}])
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…eout.

Avoids guessing what the timeout should be by using a file trigger instead.  This
is more reliable, and should reduce a flake in test_sql under valgrind.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>