Eliminating pre-dispatch weight

Ever since the decision was made that gas metering is too much of an overhead for the runtime, we have accepted that we always need to know the pre-dispatch weight of every extrinsic. This forces developers to write benchmarks for every extrinsic and introduces room for mistakes while writing those benchmarks.

That said, @pepyakin came up with a way to do gas metering with less than 5% overhead. This opens up the possibility of removing the pre-dispatch weight from extrinsics and replacing it with a computation_limit parameter chosen by the user. We will help users choose a value by providing dry-run functionality.

Imagine how much easier writing a runtime becomes when you don’t need to care about writing benchmarks, or about accidentally doing something non-linear and DoSing your chain in the process.

However, there is one obstacle left that I want to address in this post: we are currently adding another dimension to weight: bandwidth. Essentially, this is the contribution of an extrinsic to the PoV. The plan is to determine a pre-dispatch bandwidth using benchmarks, just as we do for computation (which we want to eliminate with gas metering). That would defeat our goal of freeing developers from writing benchmarks. However, there is no need for a pre-dispatch bandwidth, because we can just add a bandwidth_limit parameter to an extrinsic and do the metering completely dynamically.
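
Purely as an illustration, such user-chosen limits could be carried alongside the call. The struct and field names below are made up for this sketch and are not an existing FRAME API:

/// Hypothetical per-extrinsic limits chosen by the sender (illustrative only).
/// A dry-run would suggest sensible values before submission.
pub struct ExecutionLimits {
    /// Upper bound on the metered computation ("gas") the extrinsic may consume.
    pub computation_limit: u64,
    /// Upper bound on the bytes the extrinsic may add to the PoV/witness.
    pub bandwidth_limit: u32,
}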

While we need to go to great lengths to make computation/gas metering low overhead (see the linked post by @pepyakin), it is much easier for bandwidth, and we should not force developers to come up with a pre-dispatch worst case for it. What we can do is the following:

  1. Have every storage item bounded by MaxEncodedLen (we are working on this anyway).
  2. The runtime itself dynamically meters storage accesses whenever an item is read: before reading, MaxEncodedLen is checked against the remaining limit to make sure we can actually afford the read. Immediately after reading the value, the charge is corrected down to the real size. An even better solution would be if our trie allowed reading the size without putting the whole value into the witness, but that is not the case right now. (A sketch of this metering follows after this list.)
  3. After the extrinsic is applied, we correct the bandwidth to the actual value with feedback from the block builder: pov_len_1 - pov_len_0, i.e. the proof size after minus before applying the extrinsic. This will always be lower, because it accounts for values that are already in the witness.
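
A minimal sketch of step 2, assuming a per-extrinsic meter and parity-scale-codec (aliased as codec, as in FRAME). The types and names are illustrative, not existing FRAME APIs:

use codec::{Decode, MaxEncodedLen};

/// Remaining bandwidth (PoV bytes) the current extrinsic may still consume.
pub struct BandwidthMeter {
    remaining: u32,
}

pub struct BandwidthExhausted;

impl BandwidthMeter {
    /// Charge the worst case before the read, then correct to the real size.
    /// `read_raw` stands in for the actual storage read.
    pub fn metered_read<V: Decode + MaxEncodedLen>(
        &mut self,
        read_raw: impl FnOnce() -> Option<Vec<u8>>,
    ) -> Result<Option<V>, BandwidthExhausted> {
        let worst_case = V::max_encoded_len() as u32;
        // 1. Make sure the worst case still fits within the remaining limit.
        self.remaining = self
            .remaining
            .checked_sub(worst_case)
            .ok_or(BandwidthExhausted)?;
        match read_raw() {
            None => {
                // Nothing was read: refund the whole worst-case charge.
                self.remaining += worst_case;
                Ok(None)
            }
            Some(raw) => {
                // 2. Correct the charge down to the actual encoded size.
                self.remaining += worst_case.saturating_sub(raw.len() as u32);
                Ok(V::decode(&mut &raw[..]).ok())
            }
        }
    }
}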

In contrast to computation metering, we can do the bandwidth metering completely within the parachain’s runtime: the size of a PoV is objective and will simply be rejected by all relay chain validators if it is too big (i.e. if the parachain’s runtime fails to properly regulate it). Computation, on the other hand, is subjective when measured by time, and therefore gas metering needs to be applied by the (relay chain) client.

tl;dr: We have a chance to get rid of per-extrinsic benchmarks and we should seriously think about taking it. We will still need benchmarks for host functions and wasm instructions but that doesn’t concern runtime developers.

I would love this. Benchmarking is not only time-consuming but can be super hard to get right for any non-trivial amount of code with branches and loops.

Where is the linked post by @pepyakin?

One problem with a PoV limit is that we don’t know the PoV usage of a storage read until we read it. But if it is too big, we have already read it and can’t unread it… So we have to check against the max size, but that could be significantly larger than the actual size, resulting in wasted free bandwidth.

I can’t think of a good way to solve it other than also storing the size of the storage item, but that comes with some overhead as well.

Maybe it is already possible to prove the size of a storage item, and therefore to prove that a call should fail due to a PoV weight issue, without including the whole data in the PoV?

Where is the linked post by @pepyakin?

This conversation was extracted from an external Discourse thread titled "Deterministic Metering of Runtimes and PVFs".

@pepyakin @Alex @burdges @gilescope

Yeah, but this will only happen once the block nears completion. We might not be able to pack every block fully. I think this is acceptable. Additionally, this will be the case anyway, metering or no metering. So we only improve here by correcting downwards with feedback from the block builder. The only downside is subjecting the proof size to consensus.

And it also requires a trie migration. This is kinda nasty.

This would be nice, but nothing comes to mind. Maybe someone else has an idea.

Not every storage item has a variable size, so we could make this opt-in to reduce the migration impact.

A simple solution is to create a new StorageMap / StorageValue variant that stores the size of the value under a different key; reading the value will first read the size and charge a global PoV meter. It comes with some overhead but definitely works.
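
A rough sketch of that variant, written directly against sp_io storage calls. The keys, the charge_pov callback, and the error handling are illustrative only; a real implementation would wrap this into its own StorageValue-like type:

use codec::{Decode, Encode};

/// Illustrative storage keys; a real implementation would derive them the
/// same way StorageValue/StorageMap do.
const VALUE_KEY: &[u8] = b":example:value";
const SIZE_KEY: &[u8] = b":example:value_size";

/// Write the value and record its encoded size under a separate key.
pub fn put<V: Encode>(value: &V) {
    let encoded = value.encode();
    sp_io::storage::set(SIZE_KEY, &(encoded.len() as u32).encode());
    sp_io::storage::set(VALUE_KEY, &encoded);
}

/// Read the value, first charging a global PoV meter (the `charge_pov`
/// stand-in) with the recorded size, so an over-limit read can be rejected
/// before the value itself lands in the proof.
pub fn get<V: Decode>(
    charge_pov: impl FnOnce(u32) -> Result<(), ()>,
) -> Result<Option<V>, ()> {
    let size = sp_io::storage::get(SIZE_KEY)
        .and_then(|raw| u32::decode(&mut &raw[..]).ok())
        .unwrap_or(0);
    charge_pov(size)?;
    Ok(sp_io::storage::get(VALUE_KEY).and_then(|raw| V::decode(&mut &raw[..]).ok()))
}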

One idea which was introduced into the preimage pallet was to include the size, in bytes, of a storage item as part of its storage key.

So if you want to store some image on chain, to read that image you need to go to:

images > (hash, size) > raw bytes

In the code it looks like:

#[pallet::storage]
pub(super) type PreimageFor<T: Config> =
	StorageMap<_, Identity, (T::Hash, u32), BoundedVec<u8, ConstU32<MAX_SIZE>>>;

So before you read any of the preimages, you must provide the expected size, and if you provide the wrong size, you will find nothing in the storage key. This would mean that we always know how much the PoV could grow before we attempt to read anything.
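
As a hedged sketch built on the PreimageFor map above (assuming the preimage pallet context), a read helper could charge the caller-supplied size against a PoV budget before touching storage at all. The charge_pov callback is a stand-in, not an existing API:

/// Charge the expected size against the remaining PoV budget before reading.
/// If the caller supplies the wrong size, the lookup simply finds nothing.
fn read_bounded_preimage<T: Config>(
    hash: T::Hash,
    size: u32,
    charge_pov: impl FnOnce(u32) -> Result<(), ()>,
) -> Option<BoundedVec<u8, ConstU32<MAX_SIZE>>> {
    charge_pov(size).ok()?;
    PreimageFor::<T>::get((hash, size))
}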

I don’t think this is applicable in general, but for the specific usecase of storing individual items, it works.

For PoV weights, I think MEL + the Preimages pallet will likely cover 99% of use-cases. MEL should be used in general for keys and values; for types whose sizes are necessarily impossible to bound tightly, the Preimages pallet (and the associated Bounded<T> type and QueryPreimage/StorePreimage traits) works fine.

As far as I’m aware, high-performance deterministic metering has not yet been delivered into the mainline codebase, nor is there any expectation that it is on the horizon. All we have is some initial prototyping and results suggesting it may be possible. Even if/when it becomes possible, unless panics can be handled in a high-performance and graceful manner by the top-level runtime, we have no way to keep the weight used within a limit presented by the user. Furthermore, forcing the user to prespecify their limit comes with its own centralisation issues (which I already went over in the previous incarnation of this thread, but which was somehow not pasted above).

I think basic deterministic metering would open up some possibilities in this space (like precise weight-refunds allowing for systematic over-estimation). But it is no silver bullet.

Finally, I’d point out that the more Frame becomes a heavy-weight, lowish-performance, panic-friendly, rapid-development-centric, prototyping-friendly environment, the more it resembles a duplicate of smart contract environments such as pallet-contracts and Gear. That’s not what the runtime environment is meant to be, and by trying to make it one, we compromise our existing stack in pursuit of duplicating a solution. If a task is better done with a heavy-weight, lowish-performance, panic-friendly environment, then I’d suggest it be done as a smart contract rather than trying to turn Frame/the runtime into something it’s not.

The current prototype also only covers execution weight and does not yet take PoV size tracking into account. PoV usage will also need to be tracked much more strictly, as increasing the PoV size with a single storage::get call is much easier than burning CPU time between the “checkpoints” where we check execution weight usage.

This would require one extra storage lookup per item, which seems very costly. I personally would be much more in favor of storing the length of the item in the trie as well. For stuff like contracts it would otherwise get very complicated (impossible) to predict the length of a storage item that is being accessed by a contract. A malicious contract could probably abuse this: you call it with a given PoV limit and it then accesses some huge item. Pallet-contracts would only be able to abort the execution after this item is read, but by then it would already be in the proof and cannot be removed from there (as the execution needs to be paid for). By having the size directly in the trie, we would only need to pull the trie nodes into the proof and could abort the execution before pulling the actual item. As we haven’t yet done the migration to state version V1, we could still include it there or bring this as V2.

I think pallet-contracts’ need for strict gas metering should not impact Frame/the runtime. If pallet-contracts doesn’t enforce a reasonable upper bound on the PoV impact of a storage read, then it would need to store the size somewhere prior to the read, a bit like the Preimages pallet. Perhaps this could be optimised by introducing an optional feature into the child trie, but to be honest I don’t think it’s that important - pallet-contracts can probably just suck up the extra trie read for the present until/unless it becomes clear it is not economically viable for serious use-cases.

We refined our thinking somewhat at the retreat. My quoted comments above partly concern my proposal to charge parachains/backers for slow parablocks, which nicely separates CPU utilization/performance and security concerns. Robert K noticed my billing proposal works fine with CPU time alone, without the mix of deterministic metering I originally proposed. It’s likely deterministic metering, or some third mixture, makes the worst cases less bad, but it’s no longer a central concern. I doubt this impacts the weights discussion much, but I wanted to say our thinking has evolved.

As for doing weights by deterministic metering, maybe you could employ some vaguely fuzzing-ish scheme too, by which I mean check all the branches too, and somehow count them, like maybe assume branch prediction works poorly.

To expand on this, I think the best way forward to optimise for developer experience while not compromising on decentralisation or runtime performance is to embrace systemic overestimation.

Basically, once we have execution metering, determining our weight usage after the fact becomes trivial. This doesn’t much help with avoiding the possibility of resource over-usage (since that requires us to know in advance the maximum possible resource usage of an operation). However, it does mean that we don’t need to be very precise with our estimates, as long as we are certain that it’s an over-estimation. This is because we can always correct downwards once the operation is completed.

If we know that an operation will always be within our acceptable resource usage (e.g. a well-constrained and mandatory on_initialize hook) then it means we can dispense entirely with weights. We just execute the code and then update our weight counter with the actual weights used according to the host.

We’re then left with the problem of over-estimation. For the PoV footprint weight component, we can just require MaxEncodedLen for all storage items, with the Bounded<T> trait (and Preimages pallet) for any types which are necessarily of indefinite size.

For the execution time weight component, it’s a little trickier, but, combined with an audit, I think there’s a path: it’s not entirely unreasonable to set the weight to some large default (e.g. 1/10th of the block weight). This asserts that the dispatchable cannot possibly use up more than 1/10th of the block’s weight. If it could, there would be a resource-exhaustion attack, so a high-quality audit should absolutely check this. However, for any dispatchable of constant or well-constrained complexity, it’ll mostly be fine. It comes with the drawback that once the block becomes more than 90% full, no such operations can use the remaining blockspace. However, I don’t think that’s very important, and if it ever is, then the developer can always tune it down with a benchmark.
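
A hedged illustration of that coarse default, sitting inside a pallet’s #[pallet::call] block. The dispatchable itself is made up; the point is the weight expression asserting the 1/10th bound instead of a benchmark:

// Assert "never more than a tenth of the block" instead of benchmarking.
#[pallet::weight(T::BlockWeights::get().max_block / 10)]
pub fn do_simple_thing(origin: OriginFor<T>) -> DispatchResultWithPostInfo {
    ensure_signed(origin)?;
    // ... constant or well-constrained logic ...
    // With execution metering, the actual weight could be reported back here
    // to correct the estimate downwards, e.g. Ok(Some(actual_weight).into()).
    Ok(().into())
}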

For highly complex operations, the WeightCounter pattern provides a fairly easy means of ensuring that weight limits are not exceeded by dramatic amounts, and it does not necessitate the introduction of runtime panics. I’ve now used this pattern twice (Scheduler and MessageQueue pallets) and it works well. I suspect there may be value in making it more ubiquitous throughout Frame’s APIs.
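
A minimal sketch of the pattern, not the exact in-tree API: keep a meter of consumed weight against a limit, and stop pulling work items as soon as the next one no longer fits.

use sp_weights::Weight;

/// Tracks how much of a weight limit has been consumed.
pub struct Meter {
    consumed: Weight,
    limit: Weight,
}

impl Meter {
    /// Consume `w` if it still fits both components of the limit.
    pub fn try_consume(&mut self, w: Weight) -> bool {
        let next = self.consumed.saturating_add(w);
        if next.any_gt(self.limit) {
            return false;
        }
        self.consumed = next;
        true
    }
}

/// Service queued items until the (illustrative) per-item cost no longer fits.
pub fn service_queue(meter: &mut Meter, cost_per_item: Weight) {
    while meter.try_consume(cost_per_item) {
        // ... process one item ...
    }
}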

A nice element to systemic overestimation is that despite substantially alleviating the need for benchmarks, it requires no big changes to pallets already in production.

This sounds like an elegant solution that doesn’t require us to allow panicking (which seems to be a requirement if we actually want to abort on weight overrun). One thing worth pointing out is that the overestimation for proof_size will be higher than for ref_time: ref_time doesn’t change much whether you measure it individually per extrinsic or during actual block building. However, proof_size does: an individual extrinsic must assume that a value it reads has to be included in the proof, even though the value might already be there because another extrinsic in the block accessed it.

So in order to make this systemic overestimation work for proof size, we might need to keep correcting down based on the actual proof size while building the block. This information is held by the client. This assumes that extrinsics share a substantial amount of reads.

Indeed. Whether it’s extrinsics coming in from the block building process, pre-scheduled work items on-chain or XCM being executed from on-chain queues, we will need to be correcting down after every operation, for both compute time and PoV footprint.

I thought about systemic overestimation in the context of contracts. For fixed-logic pallets this solution is fine, because we can assume that they always terminate, as we make sure not to write unbounded logic or do any dumb stuff. So we can just use the max weight and let it run.

However, for contracts we need to stop the execution on weight overrun, since we cannot assume that they will ever terminate. I think a try…catch for the runtime can solve the problem if we can make it very low overhead. It would not affect the already written code of existing pallets; it would only require changes to the FRAME entry points.

Doing only metering without this mechanism will not be trivial either: importing a block happens in a single wasm instance, but we need to meter weight per transaction in order to derive precise fees from it. So we need a checkpointing mechanism for this anyway. Having a sub-instance per extrinsic seems conceptually easier (and can be done with wasmtime). There are a lot of open questions, and it all depends on whether this can be done without compromising performance.

Of course we don’t want FRAME to become slow and heavy. The idea is that we might be able to pull that off without paying a big price.