This is to open the discussion about all the different tools and approaches used in the ecosystem for testing pallets.
Over the last 1-2 years I have been pushing forward a number of side efforts to improve testing quality, both for Polkadot and across the entire ecosystem. Here, I want to share what I have learned and see what others are doing.
Here is a list of approaches that come to mind, more or less in the order in which they have been used:
1. Try-runtime Classics
The first scenario that everyone absolutely must test is a runtime upgrade. Most often, an upgrade changes more than the :CODE: and actually wants to transform some data as well, which we call a runtime migration. All of this is coded in the on_runtime_upgrade hook of different pallets. This puts us in a situation where we absolutely want to make sure the on_runtime_upgrade of an entire runtime executes successfully and yields the correct result. Successful execution could mean:
- Weight and PoV limits are respected.
- Migrations happen correctly.
The O.G. try-runtime subcommand allows for testing a runtime upgrade, among a few other utility commands, such as executing past blocks or executing past offchain workers. These commands can be useful in diagnosing an old block that you are suspicious of.
To test item 2, you can additionally write some code that is executed before and after your migration, in separate functions called pre_upgrade and post_upgrade.
If you panic, or somehow fail in on_runtime_upgrade, it is a very hard position to recover from.
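The pre/post pattern can be sketched in plain Rust, outside of FRAME. This is only a model under stated assumptions: storage is a BTreeMap, the Account type and the (count, total) snapshot are made up for illustration, and the function signatures are simplified rather than the real hook signatures.

```rust
use std::collections::BTreeMap;

/// Old storage layout (assumed for illustration): account id -> free balance.
type OldStorage = BTreeMap<u32, u64>;

/// New storage layout: the migration adds a `reserved` field.
#[derive(Debug, PartialEq)]
struct Account {
    free: u64,
    reserved: u64,
}
type NewStorage = BTreeMap<u32, Account>;

/// Runs before the migration: records what we expect to survive it.
fn pre_upgrade(old: &OldStorage) -> (usize, u64) {
    (old.len(), old.values().sum())
}

/// The migration itself: it must never panic, whatever the state is.
fn on_runtime_upgrade(old: OldStorage) -> NewStorage {
    old.into_iter()
        .map(|(who, free)| (who, Account { free, reserved: 0 }))
        .collect()
}

/// Runs after the migration: compares the new state with the snapshot.
fn post_upgrade(new: &NewStorage, snapshot: (usize, u64)) -> Result<(), &'static str> {
    let (count, total) = snapshot;
    if new.len() != count {
        return Err("number of accounts changed");
    }
    if new.values().map(|a| a.free + a.reserved).sum::<u64>() != total {
        return Err("total balance changed");
    }
    Ok(())
}
```

The point of the snapshot is that post_upgrade has something concrete from the old state to compare against, instead of only checking that the new state decodes.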
More info:
- https://docs.substrate.io/reference/command-line-tools/try-runtime/
- Substrate Seminar - Crowdcast
- Command in try_runtime_cli - Rust
Storage Invariants
This point relies strongly on the concept of thinking of blockchains as state machines. A state machine has valid states and invalid states. In other words, not every transition is necessarily valid.
You should see the entire set of your pallet's storage as the state, and any block (including extrinsics and hooks) as the transition.
Ideally, we should check the state to be valid (i.e. the invariants to be met) after EACH transition of state.
The way to check whether a state is valid is to check as many invariants as you possibly can. An invariant is a statement that must always hold about a storage item. The best time to begin with this is right when you start designing your pallet's storage. As you are writing the definitions, think about what relations must always hold. The outcome is the set of invariants you are looking for. Here are a few simple examples:
/// Invariant: value should always be equal to the count of keys in [`Map`].
#[pallet::storage]
pub type Counter = StorageValue<u32>;
#[pallet::storage]
pub type Map = StorageMap<u32, u32>;
Or
#[pallet::storage]
pub type BondedPools = StorageMap<u32, Pools>;
/// Invariant: the keys of this map must always be equal to that of [`BondedPools`].
#[pallet::storage]
pub type RewardPools = StorageMap<u32, Vec<u8>>;
In the old days, we tried to formulate some of these assumptions and make sure, for example, that they always hold at the end of each unit test.
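As a rough illustration of the first example, here is the Counter/Map invariant written as a check function in plain Rust. The Storage struct and the buggy_remove method are hypothetical stand-ins, not pallet code; they exist only to show a check that a test can call at its end.

```rust
use std::collections::BTreeMap;

/// Plain-Rust stand-in for the pallet's storage (`Counter` and `Map`).
#[derive(Default)]
struct Storage {
    counter: u32,
    map: BTreeMap<u32, u32>,
}

impl Storage {
    /// A correct transition: keeps `counter` in sync with `map`.
    fn insert(&mut self, key: u32, value: u32) {
        if self.map.insert(key, value).is_none() {
            self.counter += 1;
        }
    }

    /// A deliberately buggy transition: removes the key but forgets the counter.
    fn buggy_remove(&mut self, key: u32) {
        self.map.remove(&key);
    }

    /// Invariant: `counter` equals the number of keys in `map`.
    fn check_invariants(&self) -> Result<(), &'static str> {
        if self.counter as usize != self.map.len() {
            return Err("Counter does not match the number of keys in Map");
        }
        Ok(())
    }
}
```

A unit test would finish with a call to check_invariants, which passes after any number of insert calls and fails once buggy_remove has removed an existing key.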
Or:
Composite, Semi-Private Storage Types.
A common scenario that can appear here is: a number of storage items exist, which have a lot of invariants linking them to one another. If so, then we can:
- make them all as private as they can be (no getter function, only pub(super)).
- create a wrapper struct that handles any read/write operation on this storage item group. We call this wrapper a composite storage item.
- make sure any function that mutates any of the underlying storage items does so exclusively as such:
struct Wrapper<T>(PhantomData<T>);
impl<T> Wrapper<T> {
fn mutate_checked<R>(mutate: impl FnOnce() -> R) -> R {
let r = mutate();
#[cfg(debug_assertions)]
assert!(Self::all_invariants_should_be_checked_here().is_ok());
r
}
}
Here is a full-fledged example of this.
In general, the goal here is to make sure these invariants are checked more frequently, ideally after each transition of state, not just after a test is executed.
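To make the wrapper pattern concrete, here is a self-contained sketch in plain Rust. It assumes two linked key sets named after the BondedPools/RewardPools example; the module, the Pools type and its methods are hypothetical, and real pallet storage would replace the in-memory sets.

```rust
mod composite {
    use std::collections::BTreeSet;

    /// Two linked storage items, kept private to this module.
    #[derive(Default)]
    struct Inner {
        bonded_pools: BTreeSet<u32>,
        reward_pools: BTreeSet<u32>,
    }

    /// The composite storage item: the only way to touch the group from outside.
    #[derive(Default)]
    pub struct Pools(Inner);

    impl Pools {
        /// Every mutation goes through here, so every mutation is checked
        /// (in builds with debug assertions enabled).
        fn mutate_checked<R>(&mut self, mutate: impl FnOnce(&mut Inner) -> R) -> R {
            let r = mutate(&mut self.0);
            #[cfg(debug_assertions)]
            assert!(self.try_state().is_ok(), "invariant broken by mutation");
            r
        }

        /// Invariant: both items always have the same key set.
        pub fn try_state(&self) -> Result<(), &'static str> {
            if self.0.bonded_pools != self.0.reward_pools {
                return Err("key sets of BondedPools and RewardPools differ");
            }
            Ok(())
        }

        /// Adds a pool to both sets; returns whether it was new.
        pub fn create(&mut self, id: u32) -> bool {
            self.mutate_checked(|s| {
                let fresh = s.bonded_pools.insert(id);
                s.reward_pools.insert(id);
                fresh
            })
        }

        /// Removes a pool from both sets; returns whether it existed.
        pub fn destroy(&mut self, id: u32) -> bool {
            self.mutate_checked(|s| {
                s.reward_pools.remove(&id);
                s.bonded_pools.remove(&id)
            })
        }

        pub fn len(&self) -> usize {
            self.0.bonded_pools.len()
        }
    }
}
```

Because Inner is private to the module, code elsewhere cannot update one set without the other, and a bug inside the module trips the assertion at the exact mutation that introduced it.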
Try-Runtime: Follow-Chain, TryState
With all of this in place, we recently added a new hook to each pallet to (somewhat) standardize this notion of "invariant". This was eventually called fn try_state (open to new names if you have any), and it lives inside the impl Hooks part of your pallet. It is meant to be per-pallet. Since it does not distinguish between different storage items, it makes more sense for it to be called per-block.
This is not a perfect abstraction, and I would appreciate further feedback, if any. For example, when I explained this to @shawntabrizi he preferred having an abstraction to define checks individually, not as a single function per-pallet. I think having a try_state per storage item could also be a better abstraction.
Moreover, a new command was added to try-runtime, called FollowChain. This command runs the transactions of a real chain, on top of its real state, with a new runtime (and the state root check is disabled, because it will no longer match). This is a very powerful testing primitive, and I hope over time more teams start using it.
In essence, this is a dry-run for an unpublished WASM runtime.
With follow-chain, you can also specify which pallets' try_state should be executed.
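The idea of a per-pallet check that runs after each block, for a chosen subset of pallets, can be modelled in a few lines of plain Rust. Everything here is a simplified assumption: the TryState trait, the Select enum and the run_try_state function are hypothetical names for illustration, not the FRAME hook or the try-runtime options.

```rust
/// Simplified model of a pallet exposing a `try_state` hook.
trait TryState {
    fn name(&self) -> &'static str;
    /// Returns `Err` if any invariant of the pallet is violated at block `n`.
    fn try_state(&self, n: u32) -> Result<(), String>;
}

/// Which pallets to check (hypothetical, for illustration only).
enum Select {
    All,
    Only(Vec<&'static str>),
}

/// Runs the selected checks after block `n` and collects every failure,
/// instead of stopping at the first one.
fn run_try_state(pallets: &[Box<dyn TryState>], select: &Select, n: u32) -> Vec<String> {
    pallets
        .iter()
        .filter(|p| match select {
            Select::All => true,
            Select::Only(names) => names.contains(&p.name()),
        })
        .filter_map(|p| p.try_state(n).err().map(|e| format!("{}: {}", p.name(), e)))
        .collect()
}

/// A pallet whose state is always valid.
struct Healthy;
impl TryState for Healthy {
    fn name(&self) -> &'static str {
        "Healthy"
    }
    fn try_state(&self, _n: u32) -> Result<(), String> {
        Ok(())
    }
}

/// A pallet whose counter drifted away from its number of keys.
struct Broken {
    counter: u32,
    keys: u32,
}
impl TryState for Broken {
    fn name(&self) -> &'static str {
        "Broken"
    }
    fn try_state(&self, n: u32) -> Result<(), String> {
        if self.counter == self.keys {
            Ok(())
        } else {
            Err(format!("block {}: counter {} != keys {}", n, self.counter, self.keys))
        }
    }
}
```

Calling run_try_state after every replayed block is the essence of the approach: a violation is reported together with the block at which it first appeared.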
Defensive Traits
An invariant, as the name suggests, should ALWAYS hold. So it makes sense to try and check this condition as frequently as you can.
One issue that we encountered along the way was having to sprinkle the code with #[cfg(debug_assertions)] too much. To combat this, we borrowed a keyword from Defensive Programming and added a few traits that do the following:
- Defensively handle an Option<_> or Result<_, _>, meaning that the fallibility is still handled, but:
  - If in tests, panic.
  - If in production, raise an error log and move on.
This allows the developer to clearly express where they think some invariant always holds, without needing to write expect("proof"). For example:
Because these two maps have an invariant that their key-set must always be the same, which is in fact checked in the try_state hook of the pallet.
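The behaviour described in this section can be modelled with a small trait in plain Rust. This is a simplified stand-in written for illustration, not the definition used in FRAME; in particular, using debug_assertions as the switch between "panic" and "log and move on", and eprintln! as the logger, are assumptions of this sketch.

```rust
use std::fmt::Debug;

/// Simplified stand-in for a defensive trait (not the FRAME definition).
trait Defensive<T> {
    /// Returns the inner value, or `default` if there is none. A missing
    /// value is treated as a bug: panic in debug builds, log otherwise.
    fn defensive_unwrap_or(self, default: T) -> T;
}

/// Shared failure path for both implementations.
fn defensive_failure(what: &str) {
    if cfg!(debug_assertions) {
        panic!("defensive failure: {}", what);
    } else {
        // Stand-in for an error-level log in a real node.
        eprintln!("defensive failure: {}", what);
    }
}

impl<T> Defensive<T> for Option<T> {
    fn defensive_unwrap_or(self, default: T) -> T {
        match self {
            Some(value) => value,
            None => {
                defensive_failure("unexpected None");
                default
            }
        }
    }
}

impl<T, E: Debug> Defensive<T> for Result<T, E> {
    fn defensive_unwrap_or(self, default: T) -> T {
        match self {
            Ok(value) => value,
            Err(e) => {
                defensive_failure(&format!("unexpected Err({:?})", e));
                default
            }
        }
    }
}
```

At a call site, a lookup that should never fail becomes something like map.get(&id).copied().defensive_unwrap_or(0): the fallback is still handled, but a violated assumption is loud in debug builds instead of silently swallowed.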
Fuzzing
An interesting discussion came up recently in a Substrate issue. The reason I bring it up here is that, in order to have meaningful fuzzing, we lacked exactly something like try_state. Once we have a function that can represent the correctness of a pallet, we can actually think about running a fuzzer and continuously calling the correctness function. Read more about this in:
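As a toy illustration of fuzzing against a correctness function, the sketch below applies random insert/remove calls to an in-memory model and calls try_state after every step. It is not a real fuzzer setup: the Pallet struct, the xorshift generator and the fuzz function are all hypothetical, chosen so that the example needs no external crates.

```rust
use std::collections::BTreeMap;

/// Tiny deterministic PRNG (xorshift64) so the sketch needs no crates.
/// A zero seed would only ever produce zeros, so use a non-zero one.
struct Rng(u64);
impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// In-memory model of a pallet with a counter and a map.
#[derive(Default)]
struct Pallet {
    counter: u32,
    map: BTreeMap<u32, u32>,
}

impl Pallet {
    fn insert(&mut self, k: u32, v: u32) {
        if self.map.insert(k, v).is_none() {
            self.counter += 1;
        }
    }

    fn remove(&mut self, k: u32) {
        if self.map.remove(&k).is_some() {
            self.counter -= 1;
        }
    }

    /// The correctness function the fuzzer keeps calling.
    fn try_state(&self) -> Result<(), &'static str> {
        if self.counter as usize == self.map.len() {
            Ok(())
        } else {
            Err("counter != number of keys")
        }
    }
}

/// Applies `steps` random calls, checking the invariants after each one.
/// Returns the index of the first step that broke them, if any.
fn fuzz(seed: u64, steps: usize) -> Option<usize> {
    let mut rng = Rng(seed);
    let mut pallet = Pallet::default();
    for step in 0..steps {
        let key = (rng.next_u64() % 8) as u32; // small key space forces collisions
        if rng.next_u64() % 2 == 0 {
            pallet.insert(key, step as u32);
        } else {
            pallet.remove(key);
        }
        if pallet.try_state().is_err() {
            return Some(step);
        }
    }
    None
}
```

Without try_state, the loop could only detect panics; with it, any sequence of calls that leaves the state inconsistent is caught at the step where it happens.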
Hooked Storage Items
Lastly, if and once we have storage types that can have on_insert, on_update, on_remove hooks etc., we can integrate all of these checks at a much lower level.
For example, we can check in on_update that certain invariants are met.
This is merely at the idea level and we haven't done any work on it either, other than a little bit of prototyping. Just raising it here as an idea.
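Purely to illustrate what such hooks could look like, here is a speculative sketch in plain Rust. None of these types exist in FRAME: MapHooks, HookedMap and NeverDecrease are hypothetical names, and the choice to reject a write when a hook fails is just one possible design.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

/// Hypothetical hook set for a storage map; the defaults accept everything.
trait MapHooks<K, V> {
    fn on_insert(_key: &K, _value: &V) -> Result<(), &'static str> {
        Ok(())
    }
    fn on_update(_key: &K, _old: &V, _new: &V) -> Result<(), &'static str> {
        Ok(())
    }
    fn on_remove(_key: &K, _old: &V) -> Result<(), &'static str> {
        Ok(())
    }
}

/// A map that consults its hooks before committing any write.
struct HookedMap<K, V, H> {
    inner: BTreeMap<K, V>,
    _hooks: PhantomData<H>,
}

impl<K: Ord, V, H: MapHooks<K, V>> HookedMap<K, V, H> {
    fn new() -> Self {
        HookedMap { inner: BTreeMap::new(), _hooks: PhantomData }
    }

    /// Inserts or updates `key`; the write is dropped if the hook rejects it.
    fn set(&mut self, key: K, value: V) -> Result<(), &'static str> {
        match self.inner.get(&key) {
            Some(old) => H::on_update(&key, old, &value)?,
            None => H::on_insert(&key, &value)?,
        }
        self.inner.insert(key, value);
        Ok(())
    }

    /// Removes `key` if present and allowed by the hook.
    fn remove(&mut self, key: &K) -> Result<Option<V>, &'static str> {
        if let Some(old) = self.inner.get(key) {
            H::on_remove(key, old)?;
        }
        Ok(self.inner.remove(key))
    }
}

/// Example invariant: a stored value may never decrease.
struct NeverDecrease;
impl MapHooks<u32, u64> for NeverDecrease {
    fn on_update(_key: &u32, old: &u64, new: &u64) -> Result<(), &'static str> {
        if new < old {
            Err("value decreased")
        } else {
            Ok(())
        }
    }
}
```

With this shape, the invariant lives next to the storage definition, and every write path is covered without the pallet code having to remember to call a check.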
Future Plans
The major next step that I have in mind is to integrate try_state into a real client, so you can run a normal Polkadot/Substrate/Cumulus node, add an extra feature flag, and then run the try_state of all blocks as you import them. This can help us reach much higher coverage across the ecosystem and make sure the try_state of many chains is called on many blocks.
Ideally, we can ask some infrastructure providers to also run one node with try-state checks enabled.
Other than that, I am temporarily tracking the issues related to testing in this project board:
What do you think?
If you have used any of the above tools/approaches and have any feedback on it, do share them here! Or are there any other tools in the ecosystem that do similar things? Such as:
- GitHub - maxsam4/fork-off-substrate: This script allows bootstrapping a new substrate chain with the current state of a live chain
- GitHub - shaunxw/xcm-simulator: Test kit to simulate cross-chain message passing and XCM execution
- GitHub - centrifuge/fudge: FUlly Decoupled Generic Environment for Substrate-chains