We have yet to encounter the limit on the number of parachains that a Relay Chain can handle. Further, not every parachain exhausts its PoV limits, meaning that the erasure-coded chunks sent over the network are not at their maximum size. So we don’t know how the Relay Chain would handle heavy usage across all the existing parachains.
On a call today, an idea came up of creating a parachain runtime that has no transactional API, but has an on_initialize that scrambles some trie values and runs some computations such that the PoV size and time used are near or at the limits.
These would simulate parachains under maximal load. We could then register multiple ParaIds running the same runtime until Kusama starts to have problems. This would give us insight into a reasonable maximum number of parachains today and indicate where the next wave of optimizations should be focused.
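To make that concrete, here is a minimal sketch of what such a runtime’s only pallet could look like: no calls, just an on_initialize hook that churns storage and claims the whole block. The TrashData map, the batch size, and claiming max_block outright are all made up for illustration; a real version would benchmark what it actually consumes and steer towards the configured limits.

```rust
// Hypothetical "load" pallet: no transactional API, only an `on_initialize`
// hook that rewrites storage and burns compute every block.
#[frame_support::pallet]
pub mod pallet {
    use frame_support::pallet_prelude::*;
    use frame_system::pallet_prelude::*;
    use sp_io::hashing::twox_256;

    #[pallet::pallet]
    pub struct Pallet<T>(_);

    #[pallet::config]
    pub trait Config: frame_system::Config {}

    /// Junk values that get read and rewritten each block so that they (and
    /// their trie proofs) end up in the PoV.
    #[pallet::storage]
    pub type TrashData<T: Config> = StorageMap<_, Twox64Concat, u32, [u8; 1024], OptionQuery>;

    #[pallet::hooks]
    impl<T: Config> Hooks<BlockNumberFor<T>> for Pallet<T> {
        fn on_initialize(_n: BlockNumberFor<T>) -> Weight {
            // Touch a batch of keys: every read drags a storage proof into the
            // PoV, every write plus the hashing consumes execution time.
            for i in 0..256u32 {
                let old = TrashData::<T>::get(i).unwrap_or([0u8; 1024]);
                let mut new = [0u8; 1024];
                new[..32].copy_from_slice(&twox_256(&old));
                TrashData::<T>::insert(i, new);
            }
            // Claim (nearly) the whole block so nothing else fits.
            T::BlockWeights::get().max_block
        }
    }
}
```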
Some other ways we can experiment:
Increase the number of validators assigned to each parachain (e.g. from 5 to 10)
Have these limit parachains send messages over XCMP(-lite) to each other
I opened this thread to see what other people think of the idea and whether it’s worth pursuing. I think it’s quite aligned with Kusama’s role as a canary network.
Yeah, I like this idea as well. Right now we all assume (a) that one parachain’s activity has no effect on other parachains, and (b) that Kusama/Polkadot can support at least 45 parachains. But without parachains fully using their blockspace, we really don’t know that either of these is true.
The thing that will be tough with transaction bots is that it will probably be easy to fill one dimension of weight (time or PoV size), but not both at the same time.
But in reality, using a variety of testing strategies will probably yield the best results. So I think a mix of both our ideas (plus others if they’re out there) will be a good approach.
Why does this need to happen in on_initialize?
Could we not have a permissioned origin like LoadTester that can submit PoV-heavy transactions? Using on_initialize for that seems dangerous to me.
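As a rough sketch of that direction, reusing the hypothetical TrashData map from the sketch above and the two-dimensional Weight type: bloat_storage and its hand-waved weight are made up, and frame_system::ensure_root stands in for a dedicated LoadTester origin.

```rust
// Additions to the hypothetical pallet sketched earlier: a privileged,
// PoV-heavy call instead of (or alongside) the hook-based fill.
#[pallet::call]
impl<T: Config> Pallet<T> {
    /// Write `count` junk values into storage, inflating the PoV of the
    /// block that includes this call.
    #[pallet::weight(Weight::from_parts(1_000_000, 1_100).saturating_mul(u64::from(*count)))]
    pub fn bloat_storage(origin: OriginFor<T>, count: u32) -> DispatchResult {
        frame_system::ensure_root(origin)?;
        for i in 0..count {
            TrashData::<T>::insert(i, [i as u8; 1024]);
        }
        Ok(())
    }
}
```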
I think we need to test constant load, because we would want to add several of these parachains until Kusama breaks to find what the next limiting factor is. If you have 10 of these, then it’d be impossible to ensure that they’re all getting hit at the same time if you rely on a LoadTester to submit calls.
Not saying an implementation must use on_initialize, but it was my first thought. We could order the pallets such that the DMP queue gets processed first and could have a LoadTester origin from the Relay Chain that says “fill blocks to (s%, t%)” (space, time). So if the network got congested then Kusama could send a message to back off.
Or we limit the number of blocks that it goes on for. So like your “fill blocks to (space%, time%)” but for at most x blocks and then stop. Otherwise we nuke the network with no way of stopping it.
But yeah, having a way to stop it before x blocks could also be good.
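Putting the last two posts together, a sketch of how the knobs and the privileged call could look, with frame_system::ensure_root standing in for whatever origin the parachain uses for messages arriving from the Relay Chain over DMP (the names Storage, Compute, BlocksLeft, and set_limits are all hypothetical):

```rust
use sp_runtime::Perbill;

// Target fill ratios and an optional block budget for the hypothetical pallet.
#[pallet::storage]
pub type Storage<T: Config> = StorageValue<_, Perbill, ValueQuery>;

#[pallet::storage]
pub type Compute<T: Config> = StorageValue<_, Perbill, ValueQuery>;

/// How many more blocks to keep generating load for; `None` means "until told otherwise".
#[pallet::storage]
pub type BlocksLeft<T: Config> = StorageValue<_, u32, OptionQuery>;

#[pallet::call]
impl<T: Config> Pallet<T> {
    /// "Fill blocks to (space%, time%)", optionally for at most `for_blocks` blocks.
    #[pallet::weight(Weight::from_parts(10_000_000, 0))]
    pub fn set_limits(
        origin: OriginFor<T>,
        space: Perbill,
        time: Perbill,
        for_blocks: Option<u32>,
    ) -> DispatchResult {
        frame_system::ensure_root(origin)?;
        Storage::<T>::put(space);
        Compute::<T>::put(time);
        BlocksLeft::<T>::set(for_blocks);
        Ok(())
    }
}
```

The fill hook would then read these values instead of hard-coding a batch size, decrement BlocksLeft each block, and do nothing once it reaches zero, which covers the “at most x blocks” safety valve.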
Just thinking out loud: would the on_idle hook not be the perfect fit?
Yeah, that is a good idea, probably better indeed.
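A sketch of what that swap could look like: on_idle is handed whatever weight is still unused after on_initialize and the block’s extrinsics, so the filler can only ever eat leftovers. The fractions come from the hypothetical Storage/Compute values above, and the actual burning/bloating is elided.

```rust
// Replacing the `on_initialize` fill with `on_idle`: the hook receives the
// weight still unused in this block and returns how much it consumed.
#[pallet::hooks]
impl<T: Config> Hooks<BlockNumberFor<T>> for Pallet<T> {
    fn on_idle(_n: BlockNumberFor<T>, remaining_weight: Weight) -> Weight {
        // Only take the configured fractions of what is actually left over.
        let ref_time = Compute::<T>::get().mul_floor(remaining_weight.ref_time());
        let proof_size = Storage::<T>::get().mul_floor(remaining_weight.proof_size());

        // ... burn roughly `ref_time` worth of computation and write roughly
        // `proof_size` bytes of junk here, as in the earlier sketch ...

        Weight::from_parts(ref_time, proof_size)
    }
}
```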
By setting (space%, time%) and configuring the number of paras, we should be able to do things like:
Set all to (50, 50) and see how many paras we can add until things break
Set some to (100, 0) and others to (0, 100) and see whether either causes problems (I’d expect the network to be the bottleneck for large PoVs but not for compute, though maybe running secondary checks becomes the bottleneck on the compute side).
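In terms of the hypothetical set_limits parameters above, these experiments are just different Perbill pairs pushed out to each load para:

```rust
use sp_runtime::Perbill;

fn main() {
    // Experiment 1: every load para at half PoV and half compute; keep
    // registering more ParaIds until something gives.
    let baseline = (Perbill::from_percent(50), Perbill::from_percent(50));

    // Experiment 2: split the dimensions to see which one bites first, e.g.
    // availability/networking for big PoVs vs. execution and secondary
    // checks for compute-heavy blocks.
    let pov_heavy = (Perbill::from_percent(100), Perbill::from_percent(0));
    let compute_heavy = (Perbill::from_percent(0), Perbill::from_percent(100));

    println!("{:?} {:?} {:?}", baseline, pov_heavy, compute_heavy);
}
```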