Pallet idea: Safe scheduler

xlc · November 8, 2022, 2:42am

There are a few ways to brick a Substrate chain, and a panic or overweight execution in a mandatory hook is one of them. Therefore, we should avoid executing code in mandatory hooks (on_initialize / on_finalize) as much as possible.

While we also want to avoid panics and overweight execution in extrinsics, they are less harmful there, because collators can blacklist a bad extrinsic (after a failed attempt to bundle it). It could still be an active DoS vector against the chain, but it can be mitigated.

There are three main sources of execution that is not triggered by an extrinsic:

1. Periodic business logic execution
2. Incoming XCM execution
3. Delayed execution (e.g. referenda enactment) via pallet-scheduler

Note that #1 and #2 can be refactored to use pallet-scheduler for execution, so we really just need to make #3 safe.

Pallet scheduler executes the scheduled calls in the on_initialize hook, which means any panic will brick the chain. There will be no way to construct a valid block without triggering the panic path, and no way to inject other code before the panic path to potentially rescue the bad execution.

This property can be useful for some critical logic, but it is not strictly required by most use cases. For example, the enactment of a referendum can usually be delayed for a few blocks without causing issues. Dispatch of an incoming XCM is expected to be queued and delayed anyway.

For executions that don't strictly need to run at a particular block, we are better off offloading them to a different pallet, i.e. the safe-scheduler.

I initially came up with the idea of a safe scheduler pallet at orml#481. The core idea is that instead of executing all the non-extrinsic-triggered logic in on_initialize, we simply put it into a queue and use an offchain worker + unsigned tx to trigger it. This means any panic/overweight will only make that unsigned tx unbundlable. It will not impact block production, which reduces the damage.
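To make the queue idea concrete, here is a minimal sketch in plain Rust with no Substrate dependencies. All names (`Task`, `Outcome`, `build_block`, `ok_call`, `bad_call`) are hypothetical, and `std::panic::catch_unwind` only stands in for a block builder discarding an unsigned tx that fails; a real pallet would rely on the transaction pool and block builder instead of catching panics itself.

```rust
use std::collections::VecDeque;
use std::panic::catch_unwind;

/// Hypothetical queued task: an id, a declared weight, and the call to run.
pub struct Task {
    pub id: u32,
    pub weight: u64,
    pub call: fn() -> u64,
}

/// What happened to a task when the builder tried to bundle it.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Included(u32),
    /// Panicked or can never fit: dropped, but the block stays valid.
    Rejected(u32),
}

/// A well-behaved call (illustration only).
pub fn ok_call() -> u64 {
    1
}

/// A call that always panics (illustration only).
pub fn bad_call() -> u64 {
    panic!("simulated panic in a scheduled call")
}

/// Simulates a block builder pulling queued tasks into one block.
/// Tasks that do not fit in this block stay queued for the next one.
pub fn build_block(queue: &mut VecDeque<Task>, max_weight: u64) -> Vec<Outcome> {
    let mut used = 0u64;
    let mut outcomes = Vec::new();
    while let Some(task) = queue.pop_front() {
        // A task heavier than a whole block can never be bundled.
        if task.weight > max_weight {
            outcomes.push(Outcome::Rejected(task.id));
            continue;
        }
        // Block is full: put the task back and stop.
        if used.saturating_add(task.weight) > max_weight {
            queue.push_front(task);
            break;
        }
        // Assumption: a panic only invalidates this one "unsigned tx".
        match catch_unwind(task.call) {
            Ok(_) => {
                used += task.weight;
                outcomes.push(Outcome::Included(task.id));
            }
            Err(_) => outcomes.push(Outcome::Rejected(task.id)),
        }
    }
    outcomes
}
```

In this sketch a panicking task is rejected while the tasks after it are still included, which is the property that on_initialize execution cannot offer.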

This wasn’t such a big concern before, as it is relatively easy to prove that a runtime cannot panic and that the compute time of hooks is bounded. However, for parachains it is now possible to go overweight due to storage access, which can be hard to detect and rescue (it is hard to tell the size of an item without reading it, but after reading it, it could already be too late).

Related issues:

github.com/paritytech/substrate

Custom DispatchClass

opened 11:44PM - 27 Apr 22 UTC

closed 10:12PM - 07 Aug 22 UTC

xlc

It would be good if we could define a custom DispatchClass on top of the existing ones. This would give us more control over block space usage, e.g. we could define that certain transactions can only use up to x% of the block space. Use case: I am designing a pallet to replace pallet-scheduler that uses an offchain worker + unsigned tx to trigger scheduled calls instead of the `on_initialize` hook, to eliminate potential overweight / panic issues. In order to limit the amount of block space that a scheduled task can use, I would like to define a custom DispatchClass. This would allow me to define limits for the custom DispatchClass with `BlockWeights` to ensure scheduled tasks can never use more than x% of the block space.
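As a rough illustration of the request, the sketch below models per-class block space limits in plain Rust. It is not the `frame_system` API: `Class`, `BlockBudget`, `with_limit` and `try_consume` are hypothetical names, `Scheduled` is the imagined custom class, and weight is reduced to a single `u64`.

```rust
use std::collections::HashMap;

/// Simplified dispatch classes; `Scheduled` is the hypothetical custom one.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Class {
    Normal,
    Operational,
    Mandatory,
    Scheduled,
}

/// Tracks weight used in one block, with optional per-class caps
/// expressed as a percentage of the whole block.
pub struct BlockBudget {
    max_block: u64,
    limit_percent: HashMap<Class, u64>,
    used: HashMap<Class, u64>,
}

impl BlockBudget {
    pub fn new(max_block: u64) -> Self {
        BlockBudget {
            max_block,
            limit_percent: HashMap::new(),
            used: HashMap::new(),
        }
    }

    /// Caps `class` at `percent` of the block (builder style).
    pub fn with_limit(mut self, class: Class, percent: u64) -> Self {
        self.limit_percent.insert(class, percent);
        self
    }

    /// Records `weight` for `class` if both the class cap and the
    /// whole-block limit still hold; returns false otherwise.
    pub fn try_consume(&mut self, class: Class, weight: u64) -> bool {
        let total: u64 = self.used.values().sum();
        if total.saturating_add(weight) > self.max_block {
            return false;
        }
        let class_used = self.used.get(&class).copied().unwrap_or(0);
        if let Some(percent) = self.limit_percent.get(&class) {
            let cap = self.max_block / 100 * percent;
            if class_used.saturating_add(weight) > cap {
                return false;
            }
        }
        self.used.insert(class, class_used + weight);
        true
    }
}
```

For example, `BlockBudget::new(1000).with_limit(Class::Scheduled, 10)` would let scheduled tasks use at most 100 units of weight per block, regardless of how empty the rest of the block is.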

bkchr · November 8, 2022, 8:59am

In general I like the idea of making the scheduler use unsigned extrinsics to schedule its work. It protects against panics and calls that use too much weight. The only problem would be that when some call always panics, we would keep trying to push the same work and it would keep panicking. As long as the scheduler requires some privileged origin, I think we can assume that there would not be any kind of DoS attack using the scheduler.

You said that you are using an offchain worker to schedule the calls. Maybe we could add a function similar to inherent_extrinsics that the block builder could call to get this kind of transaction from the runtime.

xlc · November 8, 2022, 9:19am

One good thing about unsigned txs is that if the block is full, they can just stay in the tx pool waiting to be included in the next block. The usual tx priority & longevity API can be used to manage them as well.

kianenigma · November 8, 2022, 10:36am

I really like this idea. We have also recently discussed the scheduler with @gpestana and a few others and wished that it had more capabilities.

Some thoughts:

I have used the OCW+Unsigned code path in staking and it works well, but it does require a lot of boilerplate. We should think of wrappers around it to make it more programmable.
Alternatively, we have not really used the Task API in Substrate, which spawns a new wasm instance, so panics are less of a deal. Imagine that the scheduler pallet's on_initialize starts executing its scheduled tasks each in a new wasm instance. If any of them panics, the main runtime will survive and can remove that task. This essentially gives us the property that you want to achieve via unsigned transactions, but entirely in the runtime.
Another feature that I’d like to see is something akin to scheduled_on_idle. Imagine you want to schedule a task to happen after block N, if there is space for it. This will be useful for things like automatic staking rewards. It doesn’t need to happen at a certain block, just after a certain block.
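The scheduled_on_idle behaviour described above could look roughly like the following plain Rust sketch. It is loosely modelled on the shape of FRAME's `on_idle` hook, which receives the remaining block weight, but `IdleTask`, `IdleScheduler` and `schedule_after` are hypothetical names and weight is a single `u64` here.

```rust
/// Hypothetical task that may run in any block after `after_block`.
pub struct IdleTask {
    pub id: u32,
    pub after_block: u32,
    pub weight: u64,
}

#[derive(Default)]
pub struct IdleScheduler {
    pending: Vec<IdleTask>,
}

impl IdleScheduler {
    pub fn new() -> Self {
        IdleScheduler { pending: Vec::new() }
    }

    /// Queues a task to run in some block strictly after `after_block`.
    pub fn schedule_after(&mut self, id: u32, after_block: u32, weight: u64) {
        self.pending.push(IdleTask { id, after_block, weight });
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }

    /// Runs every due task that still fits into `remaining_weight`.
    /// Returns the ids that ran and the weight they consumed; tasks
    /// that are not due or do not fit stay queued for a later block.
    pub fn on_idle(&mut self, now: u32, remaining_weight: u64) -> (Vec<u32>, u64) {
        let mut used = 0u64;
        let mut executed = Vec::new();
        self.pending.retain(|task| {
            let due = now > task.after_block;
            let fits = used.saturating_add(task.weight) <= remaining_weight;
            if due && fits {
                used += task.weight;
                executed.push(task.id);
                false // ran, so remove it from the queue
            } else {
                true // keep for a later block
            }
        });
        (executed, used)
    }
}
```

A task here never blocks the chain: if the block has no spare weight, it is simply retried in the next block.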

xlc · November 8, 2022, 11:37am

We have already implemented an idle scheduler and we are currently using it to run cleanup tasks for removed EVM contracts.

crystalin · November 10, 2022, 7:20pm

Those are good points.

If I can give a bit of additional information, in Moonbeam we only allow scheduling from democracy. When we schedule heavy work, our script always splits it to ensure the extrinsic PoV and the execution time/PoV are always < 25% of a block's allowed limit (it creates many batches).
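A splitting script of that kind might work along the lines of the sketch below. This is not Moonbeam's actual script: `Cost` and `split_into_batches` are hypothetical names, the two fields stand for execution time and PoV size, and the greedy strategy is an assumption.

```rust
/// Hypothetical two-dimensional cost of a call: execution time and PoV size.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cost {
    pub ref_time: u64,
    pub proof_size: u64,
}

/// Greedily groups calls (by index) into batches whose summed cost stays
/// strictly below 25% of `block_limit` in both dimensions.
/// Returns `None` if a single call already breaks that budget.
pub fn split_into_batches(calls: &[Cost], block_limit: Cost) -> Option<Vec<Vec<usize>>> {
    let budget = Cost {
        ref_time: block_limit.ref_time / 4,
        proof_size: block_limit.proof_size / 4,
    };
    let over = |c: &Cost| c.ref_time >= budget.ref_time || c.proof_size >= budget.proof_size;

    let mut batches = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut acc = Cost { ref_time: 0, proof_size: 0 };

    for (i, call) in calls.iter().enumerate() {
        if over(call) {
            return None; // cannot be batched under the budget at all
        }
        let next = Cost {
            ref_time: acc.ref_time.saturating_add(call.ref_time),
            proof_size: acc.proof_size.saturating_add(call.proof_size),
        };
        if over(&next) {
            // Close the current batch and start a new one with this call.
            batches.push(std::mem::take(&mut current));
            acc = *call;
        } else {
            acc = next;
        }
        current.push(i);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    Some(batches)
}
```

Each returned batch would then be scheduled as its own call, so no single scheduled execution comes close to filling a block.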

However, we are worried about XCM execution, yes, especially because of EVM access, which can’t easily be controlled and has a weight/PoV conversion that is not very precise.

However, moving to an idle scheduler can introduce some issues: it could allow a block producer to control/delay the execution of an XCM (front-running, censoring, …).

We thought about having XCM (especially XCM->EVM) queued and executed through an extrinsic, but that brings the same issues.

kianenigma · June 19, 2023, 12:27pm

This will be delivered as part of "FRAME: General system for recognising and executing service work" (Issue #13530, paritytech/substrate).
