Testing Complex FRAME Pallets: Discussion + Tools

This is to open the discussion about all the different tools and approaches used in the ecosystem for testing pallets.

I have been pushing forward a number of efforts in the last 1-2 years on the side to improve the testing quality both for Polkadot, and in the entire ecosystem. Here, I want to both share what I have learned, and see what others are doing.

Here is a list of approaches that come to my head, more or less in the order in which they have been used:

1. Try-runtime Classics

The first scenario that everyone absolutely must test is a runtime upgrade. Most often, an upgrade changes more than the :CODE: and actually wants to transform some data as well, which we call runtime migration. All of this is coded in on_runtime_upgrade hook of different pallets. This puts us in a situation where we absolutely want to make sure the on_runtime_upgrade of an entire runtime executes successfully, and yields the correct result. Successful execution could mean:

  1. Weight and POV limits are respected
  2. migrations happens correctly.

The O.G. try-runtime subcommand allows for testing runtime upgrade, among a few other utility commands, such as executing past blocks or executing pas offchain workers. These commands can be useful in diagnosing an old block that you are suspicious of.

To test item 2, you can additionally write some code that is executed before and after your migration, in separate functions called pre_upgrade and post_upgrade.

If you panic, or somehow fail in on_runtime_upgrade, it is a very hard position to recover from.

More info:

Storage Invariants

This point is strongly relying on the concept of thinking of blockchains as state machines. A state machine has valid states and invalid states. In other words, every transition is not necessarily valid.

You should see the entire set of your palletā€™s storage as the state, and any block (including extrinsics, and hooks) as the transition.

Ideally, we should check the state to be valid (i.e. the invariants to be met) after EACH transition of state.

The way to check if a state is valid or not is to check for as many invariants as you possibly can. An invariant is a statement that must be always held about a storage item. The best way to begin with this would be right when you start designing your palletā€™s storage. As you are writing the definitions, think about what relations must always be held. The outcome are the invariants you are looking for. Here are a few simple examples:

/// Invariant: value should always be equal to the count of keys in [`Map`].
#[pallet::storage]
pub type Counter = StorageValue<u32>;
#[pallet::storage] 
pub type Map = StorageMap<u32, u32>;

Or

#[pallet::storage] 
pub type BondedPools = StorageMap<u32, Pools>;
/// Invariant: the keys of this map must always be equal to that of [`BondedPools`].
#[pallet::storage] 
pub type RewardPools = StorageMap<u32, Vec<u8>>;

In the old days, we tried to to formulate some of these assumptions, and make sure, for example, they are always held at the end of each unit test.

Or:

Composite, Semi-Private Storage Types.

A common scenario that can appear here is: a number of storage items exist, which have a lot of invariants linking them to one another. If so, then we can:

  1. make them all as private as it can be (no getter function, only pub(super)).
  2. crate a wrapper struct that handles any read/write operation on this storage item group. We call this wrapper a composite storage item.
  3. make sure any functions that mutates any of the underlying storage items is exclusively done as such:
struct Wrapper<T>(PhantomData<T>);
impl<T> Wrapper<T> {
  fn mutate_checked<R>(mutate: impl FnOnce() -> R) -> R {
	let r = mutate();
	#[cfg(debug_assertions)]
	assert!(Self::all_invariants_should_be_checked_here().is_ok());
	r
  }
}

Here is a full-fledged example of this.

In general, the goal here is to make sure these invariants are checked more frequently, ideally after each transition of state, not just after a test is executed.

Try-Runtime: Follow-Chain, TryState

With all of this in place, as of recently, we added a new hook to each pallet to (somewhat) standardize this notion of ā€œinvariantā€. This was eventually called fn try_state (open to new names of you have any), and it lives inside the impl Hooks part of your pallet. It is meant to be per-pallet. Since it does not distinguish between different storage items, it is more sensible to be called per-block.

This is not a prefect abstraction, and I would appreciate further feedback, if any. For example, when I explained this to @shawntabrizi he preferred having an abstraction to define checks individually, not as a single function per-pallet. I think having a try_state per storage could also be a better abstraction.

Moreover, a new command was added to try-runtime, called FollowChain. This command runs the transactions of a real chain, on top of its real state, with a new runtime (and the state root check is disabled, because it will no longer match). This is a very powerful testing primitive, and I hope over time more teams start using it.

In essence, this is a dry-run for an unpublished WASM runtime.

With follow-chain, you can also specify the try_state of which pallets should be executed.

Defensive Traits

An invariants, as the name suggest, should be ALWAYS held. So it makes sense to try and check this condition as frequent as you can.

One issue that we encountered along the way was having to sprinkle the code with #[cfg(debug_assertions)] too much. To combat this, we borrowed a keyword from Defensive Programming and added a few traits that do the following:

  1. Defensively handle an Option<_> or Error<_>, meaning that the fallibility is still handled, but:
  2. If in tests, panic.
  3. If in production, raise an error log and move on.

This allows the developer clearly express where they think some invariant is always held, without needing to write expect("proof"). For example:

Because these two maps have an invariant that their key-set must always be the same, which is in fact checked in the try_state hook of the pallet.

Fuzzing

An interesting discussion that came up recently in a substrate issues. The reason I bring it up here is that in order to have meaningful fuzzing, we exactly lacked something like try_state. Once we have a function that can represent the correctness of a pallet, we can actually think about running a fuzzer and continuously call the correctness function. Read more about this in:

Hooked Storage Items

Lastly, if and once we have storage types that can have on_insert, on_update, on_remove hooks etc., we can integrate all of these checks at a much lower level.

For example, we can check on_update that certain invariant are met.

This is merely at the idea level and we havenā€™t done any work on it either, other than a little bit of prototyping. Just raising it here as an idea.

Future Plans

The major next step that I have in mind is to integrate try_state into a real client, so you can run a normal Polkadot/Substrate/Cumulus node, add an extra feature flag, and then you should be able to run try_state of all blocks as you import them. This can help us reach a much higher coverage across the ecosystem and make sure try_state of many chains are called on many blocks.

Ideally, we can ask some infrastructure providers to also run one node with try-state checks enabled.

Other than that, I am temporarily tracking the issues related to testing in this project board:

What do you think?

If you have used any of the above tools/approaches and have any feedback on it, do share them here! Or are there any other tools in the ecosystem that do similar things? Such as:

14 Likes

Oh, thatā€™s a great post!

One meta thing I must add is that it might get hard to discuss since it packs many different things together. I think maybe separate topics would have been better, but it is what it is and prolly fine.

Explicitly identifying and checking the invariants is as powerful a technique as it is humbling. Here is one example of how I was using it:

  1. Storage items, the state, get invariants identified, like in this example.
  2. There is a module-wide function that checks all the storage invariants exhaustively. Exhaustively means that that function just iterates over all the storage and ensures it is consistent.

This identified many bugs, and I bet would be very useful for the people just onboarding to that code. Of course, this approach has a cost: sometimes it flags false positives. Ultimately, though, I think itā€™s a very good trade, even before accounting for the potential consequences of defects.

Now, about the wrapper types. I think many FRAME developers neglect that technique. Take this example. There, the types abstract the manipulation of the storage. Those types enable in-memory caching and give a clue how the storage types are used. IMO this is way better than manipulating those storage entries directly. It also indirectly improves the reliability of the code, and checks could be added. A wrapper approach with mutate is also a good idea, although a bit verbose.

I must say that FRAME is a bit hostile to that approach. I wish I could declare storage in one of the pallet crateā€™s modules the storage and only expose high-level wrappers to interact with it. Instead, most of the times, I have to express all the logic close to the storage and mostly all in one file. Other maintainers might not be aware of that they have to use only the wrapper and might use the storage entry directly potentially invalidating the invariants.

(cc @shawntabrizi)

He preferred having an abstraction to define checks individually, not as a single function per-pallet. I think having a try_state per storage could also be a better abstraction.

Well, maybe, but if he had in mind per-storage checks then that probably also not the best idea. Check out this example invariant: the list should have all the items that the set has and the set should have all the items that are in the list. There is no natural home for this invariant.

One may argue that as a convenience feature, you may still introduce something like that, but I am not sure if thatā€™s a good idea.

Now, regarding the Defensive trait.

  1. Defensively handle an Option<_> or Error<_>, meaning that the fallibility is still handled, but:
  2. If in tests, panic.
  3. If in production, raise an error log and move on.

This has to be approached with care. Sometimes, it might be needed to test the fallback code, i.e., what happens in None or Err case. That becomes impossible since in tests it panics and that requires some sort of hack to disarm those checks.

Now, regarding fuzzing. This one should be obvious for anyone who used it for testing their code. I think FRAME code is in a very good position to be fuzzed. We can also go further:

  1. Use the real chain tips to initialize the state and then apply arbitrary transactions suggested by the fuzzer,
  2. Do the same, but for migrations.

And finally, it would be great if this discussion resulted in to a set of written guidelines.

4 Likes

It is indeed a bit too much to react to all of it. Maybe we just collect ideas in here and then create dedicated discussions for each part.

I like this idea. Imagine writing the invariants actually as code and not as comment:

#[pallet::storage] 
pub type Map = StorageMap<u32, u32>;

#[pallet::storage]
#[pallet::invariant(
    event: changed(_previous, current) {
        assert!(current == Map::iter().count());
    },
    check_on: StorageLayerCommit,
    // Could think of when to check these conditions; per Extrinsic, per Block, Alwaysā€¦
)]
pub type Counter = StorageValue<u32>;
3 Likes

Fuzzing

In my experience, when done right, fuzz testing can find bugs quicker than property-based tests. My personal preference is cargo-fuzz because itā€™s based on llvmā€™s libFuzzer which mutates the inputs based on branch coverage (although there should be ways to make the interface universal). Due to this, adding some debug assertions in your code will create more branches and increase the probability of hitting a bug. The only downside is that it requires a nightly compiler. Honggfuzz doesnā€™t.

Failpoints

fail is a crate that dynamically injects errors. I assume this should also play nicely with fuzzing. Itā€™s being used in some other blockchains.

Type-safety

Use of nested generic types that make it impossible to create invalid instances of a type (one example is Penumbraā€™s tiered commitment tree). Another example is NonZeroUsize or types that do validation in fn new().

Formal verification

There are plenty of tools available for doing formal verification in Rust: Rust verification tools (2020) ā€“ Alastair Reid ā€“ Researcher at Intel.

Iā€™m personally excited about GitHub - model-checking/kani: Kani Rust Verifier because of the simplicity of writing tests (looks like a property-test).

6 Likes

I didnā€™t quite follow this, can you give some examples of what you mean? FWIW I think the Defensive trait is a great idea, Iā€™ve used something similar before.

1 Like

+1 for Type Safety. You can cut down on tests and potential errors by simply making some states not representable. Itā€™s also a low-hanging fruit in many codebases (not sure about Parity yet).

Coverage Marks

An interesting technique to be aware of is coverage marks, which can link code paths with the tests that test them. Maybe too verbose to use extensively, but could be useful in some circumstances.

1 Like

Any defensive_unwrap_or cannot be tested, since the test immediately panics.
Had the same problem yesterday in Add `DefensiveTruncateFrom` by ggwpez Ā· Pull Request #12515 Ā· paritytech/substrate Ā· GitHub where I add a defensive_truncate_from function.
This function first tries to use a non-truncating conversion and falls back to truncating if that fails.
Problem is that the truncating case cannot be tested for stated reason that it panics in test.

#[test]
#[should_panic(
	expected = "Defensive failure has been triggered!: \"DefensiveTruncateFrom truncating\""
)]
fn defensive_truncate_from_vec_panics() {
	// NOTE: We cannot test the truncating case since defensive failures panic in tests.
	let unbound = vec![1u32, 2];
	let _ = BoundedVec::<u32, ConstU32<1>>::defensive_truncate_from(unbound);
}
2 Likes

Defenisve extends the appropriate data types with functions analogous to unwrap. That is, the match arm previously panicking would now return the control to the caller. Should you want to test that path, you would have to get creative.

First example:

fn squizle_foobers() {
  // .. 
  process_reward_pool(reward_pool.defensive_unwrap_or_default())
  // .. 
}

Employing Defensive means that you are not 100% sure that this wonā€™t ever be None. It is reasonable to you would like to test how squizle_foobers() behaves in case reward_pool was None. But how would you do that? Would you re-run the test without cfg(debug_assertions)? How would that work in CI (I donā€™t think we would want to build the tests twice)? Maybe there should be some runtime switch that you would flick?

That might seem like a local issue, where you could come up with a module-local solution specifically for that test. But that does not scale. If the Defensive is planted in a module that you use, there might be no way to defuse it, without touching the source code of the dependency module. Hence I claim that Defensive as is suffers from composability issues.

Thatā€™s why I said it has to be approached carefully. But now, I actually clicked through the links and it seems that it already made its way into the library code which makes me a bit sad.

UPD: Did not see @OliverTY 's message but yeah thatā€™s a good demonstration of one of my concerns.,

3 Likes

This seems to use thread local storage to remember which marks got hit.
Maybe we can use something like that as well to decide whether or not a defensive! should panic or log.
Then we could test both cases. These markers should not be compiled into the runtime, since AFAIK thread local storage does not make sense there. But maybe it works for testing.

2 Likes

I just found loom today and thought it was interesting.

Loom is a tool for testing concurrent programs.

At a high level, it runs tests many times, permuting the possible concurrent executions of each test according to what constitutes valid executions under the C11 memory model. It then uses state reduction techniques to avoid combinatorial explosion of the number of possible executions.

1 Like