Gossmap: compaction support #8869
rustyrussell wants to merge 25 commits into ElementsProject:master
Conversation
force-pushed from e967d40 to 2639a79
force-pushed from 5ad16e1 to 9024e31
Not a complete decode, just the highlights (what channel was announced or updated, what node was announced). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It didn't do anything, since the dev_compact_gossip_store command was removed. When we make it do something, it crashes since old_len is 0:

```
gossipd: gossip_store_compact: bad version
gossipd: FATAL SIGNAL 6 (version v25.12rc3-1-g9e6c715-modded)
...
gossipd: backtrace: ./stdlib/abort.c:79 (__GI_abort) 0x7119bd8288fe
gossipd: backtrace: ./assert/assert.c:96 (__assert_fail_base) 0x7119bd82881a
gossipd: backtrace: ./assert/assert.c:105 (__assert_fail) 0x7119bd83b516
gossipd: backtrace: gossipd/gossip_store.c:52 (append_msg) 0x56294de240eb
gossipd: backtrace: gossipd/gossip_store.c:358 (gossip_store_compact) 0x56294
gossipd: backtrace: gossipd/gossip_store.c:395 (gossip_store_new) 0x56294de24
gossipd: backtrace: gossipd/gossmap_manage.c:455 (setup_gossmap) 0x56294de255
gossipd: backtrace: gossipd/gossmap_manage.c:488 (gossmap_manage_new) 0x56294
gossipd: backtrace: gossipd/gossipd.c:400 (gossip_init) 0x56294de22de9
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our poor scid generation clashes badly with simplified hashing (the next patch), pushing l1's startup time with a generated map from 4 seconds to 14 seconds; under CI it actually timed out several tests. Making our fake scids more "random" reduces it to 1.5 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
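The effect can be illustrated with a toy hash table: sequential fake scids whose low bits barely vary all collapse into one bucket under a simplified (non-siphash) hash, while mixing the counter through the whole value spreads them out. A minimal Python sketch with invented scid values and bucketing, not the real generator or gossmap's table:

```python
# Toy demonstration of why sequential fake scids degrade a simplified hash.
# The scid layouts and bucketing below are illustrative only.

NBUCKETS = 64

def simple_hash(scid: int) -> int:
    # Simplified hash: just the low bits (no siphash-style mixing).
    return scid % NBUCKETS

def fake_scid_sequential(i: int) -> int:
    # Sequential block heights, fixed tx index: low bits never change.
    return ((100 + i) << 40) | (1 << 16)

def fake_scid_mixed(i: int) -> int:
    # Multiply by a large odd constant so the low bits vary too.
    return ((100 + i) * 2654435761) & 0xFFFFFFFFFFFF

def max_bucket_load(scids):
    loads = [0] * NBUCKETS
    for s in scids:
        loads[simple_hash(s)] += 1
    return max(loads)

seq = [fake_scid_sequential(i) for i in range(1000)]
mixed = [fake_scid_mixed(i) for i in range(1000)]
print(max_bucket_load(seq), max_bucket_load(mixed))  # all-in-one-bucket vs spread out
```

With sequential scids every entry lands in the same bucket, so lookups degrade to a linear scan, which matches the observed startup blowout.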
It's actually quite quick to load a cache-hot 308,874,377-byte gossip_store (normal -Og build), but perf does show time spent in siphash(), which is a bit overkill here, so drop that:

Before: Time to load: 66718983-78037766(7.00553e+07+/-2.8e+06)nsec
After: Time to load: 54510433-57991725(5.61457e+07+/-1e+06)nsec

We could save maybe 10% more by disabling checksums, but having that assurance is nice.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
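siphash is a keyed hash designed to resist collision attacks on untrusted input; for a local, trusted in-memory index a plain non-cryptographic hash suffices. To illustrate the class of cheaper hash involved (not necessarily the one gossmap actually switched to), here is 64-bit FNV-1a in Python:

```python
# FNV-1a, 64-bit: a simple, unkeyed, non-cryptographic hash.
# Shown only to illustrate the kind of cheap hash that is adequate for a
# trusted local index; the actual replacement in gossmap may differ.

FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
    return h

# Hash a couple of scid-like 8-byte keys.
print(hex(fnv1a_64((123456).to_bytes(8, "big"))))
print(fnv1a_64((123456).to_bytes(8, "big")) != fnv1a_64((123457).to_bytes(8, "big")))
```

FNV-1a is a handful of integer operations per byte with no key setup, which is why dropping siphash shows up directly in the load-time numbers above.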
…changed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We also put this in the store_ended message, so you can tell whether the equivalent_offset there really refers to this new entry (or whether two or more rewrites have happened). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the first record, and ignored by everything else. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's used by common/gossip_store.c, which is used by many things other than gossipd. This file belongs in common. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossmap doesn't care, so gossipd currently has to iterate through the store to find them at startup. Create a callback for gossipd to use instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
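The shape of that change can be sketched as: the store loader takes an optional per-record callback, so a consumer like gossipd can pick out record types the loader itself ignores, without a second pass over the file. A minimal Python sketch with an invented record framing (2-byte type, 2-byte length, payload); the names and type numbers are hypothetical:

```python
import struct

def load_store(data: bytes, unknown_cb=None):
    """Parse records; hand types we don't care about to unknown_cb."""
    known = []
    off = 0
    while off < len(data):
        rtype, rlen = struct.unpack_from(">HH", data, off)
        payload = data[off + 4: off + 4 + rlen]
        if rtype in (256, 258):          # e.g. channel/node announcements
            known.append((rtype, payload))
        elif unknown_cb is not None:
            unknown_cb(rtype, payload)   # gossipd-style consumer sees it here
        off += 4 + rlen
    return known

seen = []
store = struct.pack(">HH", 256, 3) + b"abc" + struct.pack(">HH", 999, 2) + b"xy"
known = load_store(store, unknown_cb=lambda t, p: seen.append((t, p)))
print(known, seen)
```

The point is that both consumers share the single linear scan, instead of gossipd re-iterating the store at startup just to find its own records.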
This way gossmap_manage can decide when to compact. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We now only need to walk it if we're doing an upgrade.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Changelog-Changed: `gossipd` no longer compacts gossip_store on startup (improving start times significantly).
This saves gossipd from converting it:

```
lightningd-1 2026-02-02T00:50:49.505Z DEBUG gossipd: Time to convert version 14 store: 890 msec
```

Reducing node startup time from 1.4 seconds to 0.5 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the file responsible for all the writing, so it should be responsible for the rewriting if necessary (rather than gossmap_manage). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip_store.c uses this to avoid two reads, and we want to use it elsewhere too. Also fix old comment on gossip_store_readhdr(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
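The "avoid two reads" point can be sketched as: read the record header and payload in one pass and return both, so callers that need header fields (flags, checksum) don't have to seek back and re-read. A toy Python sketch with a hypothetical header layout (2-byte flags, 2-byte length, 4-byte CRC32), not the real gossip_store format:

```python
import struct
import zlib

# Hypothetical on-disk record header: 2-byte flags, 2-byte length, 4-byte CRC32.
HDR = struct.Struct(">HHI")

def read_record(buf: bytes, off: int):
    """Read header and payload in one pass, returning both so callers
    that need the header fields don't re-read them."""
    flags, length, crc = HDR.unpack_from(buf, off)
    payload = buf[off + HDR.size: off + HDR.size + length]
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return (flags, length, crc), payload, off + HDR.size + length

payload = b"hello gossip"
buf = HDR.pack(0, len(payload), zlib.crc32(payload)) + payload
hdr, body, end = read_record(buf, 0)
print(hdr[0], body, end)
```

Returning the parsed header alongside the payload is what lets other call sites reuse it instead of performing a second read.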
force-pushed from 9024e31 to cc2fade
A new subprocess run by gossipd to create a compacted gossip store. It's pretty simple: a linear compaction of the file. Once it has compacted the amount it was told to, gossipd waits for it to finish the last bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
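Conceptually the compaction is a single linear copy that drops dead records. A toy Python sketch with an invented framing (1-byte deleted flag, 2-byte length, payload), not the real store format:

```python
import struct

def rec(payload: bytes, deleted: bool = False) -> bytes:
    # Hypothetical framing: 1-byte deleted flag, 2-byte length, payload.
    return struct.pack(">BH", int(deleted), len(payload)) + payload

def compact(store: bytes) -> bytes:
    """Linear compaction: copy live records verbatim, skip deleted ones."""
    out = bytearray()
    off = 0
    while off < len(store):
        deleted, length = struct.unpack_from(">BH", store, off)
        record = store[off: off + 3 + length]
        if not deleted:
            out += record
        off += 3 + length
    return bytes(out)

store = rec(b"chan1") + rec(b"old", deleted=True) + rec(b"node1")
print(compact(store))
```

Because the pass is linear and append-only on the output side, it suits a helper process that works through the file while gossipd keeps running, with gossipd only waiting for the final tail.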
This isn't called anywhere yet. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Changelog-Added: `gossipd` now uses a `lightning_gossip_compactd` helper to compact the gossip_store on demand, keeping it under about 210MB.
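The "on demand" trigger is size-based: once the store grows past the limit (about 210MB per the changelog entry), gossipd can spawn the helper. A hypothetical sketch of that decision; the function name and the exact check are assumptions, not gossipd's actual code:

```python
# Hypothetical trigger logic; the real checks live in gossipd.
COMPACT_LIMIT = 210 * 1024 * 1024  # ~210MB, per the changelog entry

def should_compact(store_len: int, compactor_running: bool) -> bool:
    """Start the compaction helper once the store exceeds the limit,
    unless a helper is already running."""
    return store_len > COMPACT_LIMIT and not compactor_running

print(should_compact(250 * 1024 * 1024, False))
print(should_compact(250 * 1024 * 1024, True))
print(should_compact(100 * 1024 * 1024, False))
```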
I have tested cc2fade and I get a crash with this test:

```python
def test_gossip(node_factory, bitcoind):
    l1, l2, l3, l4 = node_factory.get_nodes(4, opts=[{}, {}, {}, {}])
    nodes = [l1, l2, l3, l4]
    l1.fundwallet(10_000_000)
    l2.fundwallet(10_000_000)
    l3.fundwallet(10_000_000)
    l4.fundwallet(10_000_000)
    l2.rpc.fundchannel(l4.info["id"] + "@localhost:" + str(l4.port), 1_000_000)
    l3.rpc.fundchannel(l4.info["id"] + "@localhost:" + str(l4.port), 1_000_000)
    l4.rpc.fundchannel(l1.info["id"] + "@localhost:" + str(l1.port), 1_000_000)
    bitcoind.generate_block(1)
    sync_blockheight(bitcoind, nodes)
    l4.rpc.fundchannel(l1.info["id"] + "@localhost:" + str(l1.port), 1_000_000)
    bitcoind.generate_block(6)
    sync_blockheight(bitcoind, nodes)
    wait_for(lambda: len(l1.rpc.listpeerchannels()["channels"]) == 2)
    l4_chans = l1.rpc.listpeerchannels(l4.info["id"])["channels"]
    scid_l1_l4_1 = l4_chans[0]["short_channel_id"]
    scid_l2_l4 = l2.rpc.listpeerchannels(l4.info["id"])["channels"][0][
        "short_channel_id"
    ]
    wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 8)
    l1.rpc.close(scid_l1_l4_1)
    bitcoind.generate_block(6, wait_for_mempool=1)
    sync_blockheight(bitcoind, nodes)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
    l1.restart()
    l1.rpc.connect(l2.info["id"] + "@localhost:" + str(l2.port))
    l1.rpc.connect(l3.info["id"] + "@localhost:" + str(l3.port))
    l1.rpc.connect(l4.info["id"] + "@localhost:" + str(l4.port))
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
    l2.rpc.close(scid_l2_l4)
    bitcoind.generate_block(1, wait_for_mempool=1)
    sync_blockheight(bitcoind, nodes)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 4)
    l1.rpc.call("dev-compact-gossip-store", [])
    # wait_for(lambda: len(l1.rpc.listpeerchannels()["channels"]) == 2)
    # wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 4)
    l4.rpc.fundchannel(l1.info["id"] + "@localhost:" + str(l1.port), 1_000_000)
    bitcoind.generate_block(6)
    sync_blockheight(bitcoind, nodes)
    wait_for(lambda: len(l1.rpc.listchannels()["channels"]) == 6)
```

It's a test from one of my plugins where I removed all my plugin-specific lines. Also I would really like it if the commented `wait_for` lines worked.
Reported-by: @daywalker90
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And tests! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…ssip_store.
Before I fixed the handling of dying channels:
```
lightning_gossipd: gossip_store: can't read hdr offset 2362/2110: Success (version v25.12-279-gb38abe6-modded)
0x6537c19ecf3a send_backtrace
common/daemon.c:38
0x6537c19f1a1d status_failed
common/status.c:207
0x6537c19e557a gossip_store_get_with_hdr
gossipd/gossip_store.c:527
0x6537c19e5613 check_msg_type
gossipd/gossip_store.c:559
0x6537c19e5a36 gossip_store_set_flag
gossipd/gossip_store.c:577
0x6537c19e5c82 gossip_store_del
gossipd/gossip_store.c:629
0x6537c19e8ddd gossmap_manage_new_block
gossipd/gossmap_manage.c:1362
0x6537c19e390e new_blockheight
gossipd/gossipd.c:430
0x6537c19e3c37 recv_req
gossipd/gossipd.c:532
0x6537c19ed22a handle_read
common/daemon_conn.c:35
0x6537c19fbe71 next_plan
ccan/ccan/io/io.c:60
0x6537c19fc174 do_plan
ccan/ccan/io/io.c:422
0x6537c19fc231 io_ready
ccan/ccan/io/io.c:439
0x6537c19fd647 io_loop
ccan/ccan/io/poll.c:470
0x6537c19e463d main
gossipd/gossipd.c:609
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Nice catch! It's because I didn't correctly reset the "dying_chans" array. Fixed now, and added a reduced version of this test to be sure.

The reason your wait_for lines didn't work is that we don't immediately consider a channel closed: we give it 12 blocks before stopping its use. That allows us to keep using the channel if it turns out to be a splice. We make an exception for our own channels (in that case, we will know about the splice).
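The 12-block grace period described above can be sketched as: when a close is seen at height H, the channel is marked dying rather than deleted, and only removed once the chain reaches H + 12; our own channels skip the grace period. A minimal Python model with invented names, not gossipd's actual data structures:

```python
GRACE_BLOCKS = 12  # closed channels stay usable this long, in case of a splice

class DyingTracker:
    """Toy model of the dying-channel handling (all names hypothetical)."""
    def __init__(self):
        self.channels = set()
        self.dying = {}  # scid -> block height where the close was seen

    def channel_added(self, scid):
        self.channels.add(scid)

    def close_seen(self, scid, height, ours=False):
        if ours:
            # Our own channel: we'd know about a splice, so delete at once.
            self.channels.discard(scid)
        else:
            self.dying[scid] = height

    def block_added(self, height):
        # Delete channels whose grace period has expired.
        for scid, seen in list(self.dying.items()):
            if height >= seen + GRACE_BLOCKS:
                self.channels.discard(scid)
                del self.dying[scid]

t = DyingTracker()
t.channel_added("111x1x0")
t.channel_added("222x1x0")
t.close_seen("111x1x0", height=100)              # peer's close: keep for now
t.close_seen("222x1x0", height=100, ours=True)   # ours: gone immediately
t.block_added(105)
print(sorted(t.channels))  # still contains 111x1x0
t.block_added(112)
print(sorted(t.channels))  # grace expired, now empty
```

This is why the commented `wait_for` checks on listchannels counts don't settle right after the closing block: the closed channel lingers until the grace period expires.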
```
> assert len(layers['layers']) == 1
E AssertionError: assert 2 == 1
E + where 2 = len([{'layer': 'xpay', 'persistent': True, 'disabled_nodes': [], 'created_channels': [], 'channel_updates': [], 'constraints': [{'short_channel_id_dir': '45210x2134x44171/0', 'timestamp': 1770341134, 'minimum_msat': 289153519}, {'short_channel_id_dir': '1895x7x1895/1', 'timestamp': 1770341134, 'minimum_msat': 289007015}, {'short_channel_id_dir': '1906x1039x1906/1', 'timestamp': 1770341134, 'minimum_msat': 289008304}, {'short_channel_id_dir': '10070x60x10063/1', 'timestamp': 1770341134, 'minimum_msat': 289005726}, {'short_channel_id_dir': '18772x60x18743/0', 'timestamp': 1770341134, 'minimum_msat': 289005726}, {'short_channel_id_dir': '18623x208x18594/0', 'timestamp': 1770341134, 'minimum_msat': 289004859}, {'short_channel_id_dir': '33935x826x33727/1', 'timestamp': 1770341134, 'maximum_msat': 491501488}], 'biases': [], 'node_biases': []}, {'layer': 'xpay-94', 'persistent': False, 'disabled_nodes': [], 'created_channels': [], 'channel_updates': [], 'constraints': [], 'biases': [], 'node_biases': []}])
```
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
force-pushed from cc2fade to 934ead4
…eout. Avoids guessing what the timeout should be: use a file trigger instead. This is more reliable, and should reduce a flake in test_sql under valgrind. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
force-pushed from dd4c02f to 4b347dc
Based on #8878 (merged).

We used to compact the gossip store, but it was buggy, so now we rewrite it on every startup (which is slow). Better is to do it when required, using a separate process.