Testing Complex FRAME Pallets: Discussion + Tools

kianenigma · September 13, 2022, 10:36am

This is to open the discussion about all the different tools and approaches used in the ecosystem for testing pallets.

I have been pushing forward a number of efforts in the last 1-2 years on the side to improve the testing quality both for Polkadot, and in the entire ecosystem. Here, I want to both share what I have learned, and see what others are doing.

Here is a list of approaches that come to my head, more or less in the order in which they have been used:

1. Try-runtime Classics

The first scenario that everyone absolutely must test is a runtime upgrade. Most often, an upgrade changes more than the :CODE: and actually wants to transform some data as well, which we call runtime migration. All of this is coded in on_runtime_upgrade hook of different pallets. This puts us in a situation where we absolutely want to make sure the on_runtime_upgrade of an entire runtime executes successfully, and yields the correct result. Successful execution could mean:

Weight and POV limits are respected
migrations happens correctly.

The O.G. try-runtime subcommand allows for testing runtime upgrade, among a few other utility commands, such as executing past blocks or executing pas offchain workers. These commands can be useful in diagnosing an old block that you are suspicious of.

To test item 2, you can additionally write some code that is executed before and after your migration, in separate functions called pre_upgrade and post_upgrade.

If you panic, or somehow fail in on_runtime_upgrade, it is a very hard position to recover from.

More info:

Storage Invariants

This point is strongly relying on the concept of thinking of blockchains as state machines. A state machine has valid states and invalid states. In other words, every transition is not necessarily valid.

You should see the entire set of your pallet’s storage as the state, and any block (including extrinsics, and hooks) as the transition.

Ideally, we should check the state to be valid (i.e. the invariants to be met) after EACH transition of state.

The way to check if a state is valid or not is to check for as many invariants as you possibly can. An invariant is a statement that must be always held about a storage item. The best way to begin with this would be right when you start designing your pallet’s storage. As you are writing the definitions, think about what relations must always be held. The outcome are the invariants you are looking for. Here are a few simple examples:

/// Invariant: value should always be equal to the count of keys in [`Map`].
#[pallet::storage]
pub type Counter = StorageValue<u32>;
#[pallet::storage] 
pub type Map = StorageMap<u32, u32>;

Or

#[pallet::storage] 
pub type BondedPools = StorageMap<u32, Pools>;
/// Invariant: the keys of this map must always be equal to that of [`BondedPools`].
#[pallet::storage] 
pub type RewardPools = StorageMap<u32, Vec<u8>>;

In the old days, we tried to to formulate some of these assumptions, and make sure, for example, they are always held at the end of each unit test.

github.com

paritytech/substrate/blob/c1103fd803a974ee4d4098eb631a8839f6cb8618/frame/staking/src/mock.rs#L541-L546


      
          	pub fn build_and_execute(self, test: impl FnOnce() -> ()) {
          		sp_tracing::try_init_simple();
          		let mut ext = self.build();
          		ext.execute_with(test);
          		ext.execute_with(post_conditions);
          	}

Or:

github.com

paritytech/substrate/blob/ded44948e2d5a398abcb4e342b0513cb690961bb/frame/bags-list/src/mock.rs#L144-L149


      
          	pub fn build_and_execute(self, test: impl FnOnce() -> ()) {
          		self.build().execute_with(|| {
          			test();
          			List::<Runtime>::sanity_check().expect("Sanity check post condition failed")
          		})
          	}

Composite, Semi-Private Storage Types.

A common scenario that can appear here is: a number of storage items exist, which have a lot of invariants linking them to one another. If so, then we can:

make them all as private as it can be (no getter function, only pub(super)).
crate a wrapper struct that handles any read/write operation on this storage item group. We call this wrapper a composite storage item.
make sure any functions that mutates any of the underlying storage items is exclusively done as such:

struct Wrapper<T>(PhantomData<T>);
impl<T> Wrapper<T> {
  fn mutate_checked<R>(mutate: impl FnOnce() -> R) -> R {
	let r = mutate();
	#[cfg(debug_assertions)]
	assert!(Self::all_invariants_should_be_checked_here().is_ok());
	r
  }
}

Here is a full-fledged example of this.

In general, the goal here is to make sure these invariants are checked more frequently, ideally after each transition of state, not just after a test is executed.

Try-Runtime: Follow-Chain, `TryState`

github.com/paritytech/substrate

Runtime State Test + Integration with `try-runtime`

paritytech:master ← paritytech:kiz-sanity-check-trait

opened 11:28AM - 04 Nov 21 UTC

kianenigma

+621 -179

This PR adds a new hook to the `try-runtime follow-chain` subcommand to execute …a custom hook called `sanity_check` per block. This allows us to avoid dumping too much testing code into `debug_assertion`, and instead test them in real life. To see more about what `follow-chain` itself is doing, see #9788. Polkadot companion: https://github.com/paritytech/polkadot/pull/5907 ## (Updated) Context: I have pulled this PR out of the grave again, since I need it for my nomination pools. Reviews can be easy going, since this is all offchain code, but, please be aware: 1. I genuinely dislike the name `sanity_check` now. I think we should rename it to `on_block_check`, and rename `integrity_test`to `on_construct_runtime_check`. If all agree, I will do it in this PR. 2. I had to add a new type `AllPalletWithSystemFlat`, which is basically the same thing as before but tuples are not nested. This is needed because `PalletInfoAccess` is implemented on each individual pallet, but not on tuples. In reviews, please double check for me that this has no side-effect, other than us needing to increase some of the limits on `impl_trait_for_tuples` from 30 to 100. Currently, the `follow-chain` command is an okay-ish way to continuously run these sanity checks, but honestly it is kinda slow and sluggish. As a follow-up, I will work on integrating this into the main client, so you can import blocks but use the try-runtime api instead of Core for block import. #### TODO: 1. [x] think of a wrapper trait to hide the second parameter. 2. [x] feature gate functions again. 4. [x] fix all tests. 5. [x] double check that `AllPalletsWithSystemFlat` has not side-effect. 6. [ ] Clean up the return type of sanity checks 7. [ ] (optional) Rename `integrity_check` -> `on_construct_runtime_check` 8. [x] (optional) Rename `sanity_check` -> `state_consistency_check`

With all of this in place, as of recently, we added a new hook to each pallet to (somewhat) standardize this notion of “invariant”. This was eventually called fn try_state (open to new names of you have any), and it lives inside the impl Hooks part of your pallet. It is meant to be per-pallet. Since it does not distinguish between different storage items, it is more sensible to be called per-block.

This is not a prefect abstraction, and I would appreciate further feedback, if any. For example, when I explained this to @shawntabrizi he preferred having an abstraction to define checks individually, not as a single function per-pallet. I think having a try_state per storage could also be a better abstraction.

Moreover, a new command was added to try-runtime, called FollowChain. This command runs the transactions of a real chain, on top of its real state, with a new runtime (and the state root check is disabled, because it will no longer match). This is a very powerful testing primitive, and I hope over time more teams start using it.

In essence, this is a dry-run for an unpublished WASM runtime.

With follow-chain, you can also specify the try_state of which pallets should be executed.

Defensive Traits

An invariants, as the name suggest, should be ALWAYS held. So it makes sense to try and check this condition as frequent as you can.

One issue that we encountered along the way was having to sprinkle the code with #[cfg(debug_assertions)] too much. To combat this, we borrowed a keyword from Defensive Programming and added a few traits that do the following:

Defensively handle an Option<_> or Error<_>, meaning that the fallibility is still handled, but:
If in tests, panic.
If in production, raise an error log and move on.

This allows the developer clearly express where they think some invariant is always held, without needing to write expect("proof"). For example:

github.com

paritytech/substrate/blob/c1103fd803a974ee4d4098eb631a8839f6cb8618/frame/nomination-pools/src/lib.rs#L1505-L1506


      
          			let mut reward_pool = RewardPools::<T>::get(pool_id)
          				.defensive_ok_or::<Error<T>>(DefensiveError::RewardPoolNotFound.into())?;

Because these two maps have an invariant that their key-set must always be the same, which is in fact checked in the try_state hook of the pallet.

Fuzzing

An interesting discussion that came up recently in a substrate issues. The reason I bring it up here is that in order to have meaningful fuzzing, we exactly lacked something like try_state. Once we have a function that can represent the correctness of a pallet, we can actually think about running a fuzzer and continuously call the correctness function. Read more about this in:

github.com/paritytech/polkadot-sdk

Vision: Random call generator for fuzz-testing

opened 01:32PM - 12 Sep 22 UTC

kianenigma

I5-enhancement T1-FRAME

> Some notes about the fuzzing outcome: As you see in the code, I randomly gener…ate calls that may or may not be valid and bombard the pallet with these calls, and every so often I execute the entire set of sanity checks that are defined in the pallet. We could create a tool which generates a set of calls from the metadata of a pallet. Could even be done in PolkadotJS, as general fuzzy testing framework. _Originally posted by @ggwpez in https://github.com/paritytech/substrate/issues/12002#issuecomment-1243619988_

Hooked Storage Items

Lastly, if and once we have storage types that can have on_insert, on_update, on_remove hooks etc., we can integrate all of these checks at a much lower level.

For example, we can check on_update that certain invariant are met.

This is merely at the idea level and we haven’t done any work on it either, other than a little bit of prototyping. Just raising it here as an idea.

Future Plans

The major next step that I have in mind is to integrate try_state into a real client, so you can run a normal Polkadot/Substrate/Cumulus node, add an extra feature flag, and then you should be able to run try_state of all blocks as you import them. This can help us reach a much higher coverage across the ecosystem and make sure try_state of many chains are called on many blocks.

Ideally, we can ask some infrastructure providers to also run one node with try-state checks enabled.

Other than that, I am temporarily tracking the issues related to testing in this project board:

What do you think?

If you have used any of the above tools/approaches and have any feedback on it, do share them here! Or are there any other tools in the ecosystem that do similar things? Such as:

pepyakin · September 13, 2022, 3:05pm

Oh, that’s a great post!

One meta thing I must add is that it might get hard to discuss since it packs many different things together. I think maybe separate topics would have been better, but it is what it is and prolly fine.

Explicitly identifying and checking the invariants is as powerful a technique as it is humbling. Here is one example of how I was using it:

Storage items, the state, get invariants identified, like in this example.
There is a module-wide function that checks all the storage invariants exhaustively. Exhaustively means that that function just iterates over all the storage and ensures it is consistent.

This identified many bugs, and I bet would be very useful for the people just onboarding to that code. Of course, this approach has a cost: sometimes it flags false positives. Ultimately, though, I think it’s a very good trade, even before accounting for the potential consequences of defects.

Now, about the wrapper types. I think many FRAME developers neglect that technique. Take this example. There, the types abstract the manipulation of the storage. Those types enable in-memory caching and give a clue how the storage types are used. IMO this is way better than manipulating those storage entries directly. It also indirectly improves the reliability of the code, and checks could be added. A wrapper approach with mutate is also a good idea, although a bit verbose.

I must say that FRAME is a bit hostile to that approach. I wish I could declare storage in one of the pallet crate’s modules the storage and only expose high-level wrappers to interact with it. Instead, most of the times, I have to express all the logic close to the storage and mostly all in one file. Other maintainers might not be aware of that they have to use only the wrapper and might use the storage entry directly potentially invalidating the invariants.

(cc @shawntabrizi)

He preferred having an abstraction to define checks individually, not as a single function per-pallet. I think having a try_state per storage could also be a better abstraction.

Well, maybe, but if he had in mind per-storage checks then that probably also not the best idea. Check out this example invariant: the list should have all the items that the set has and the set should have all the items that are in the list. There is no natural home for this invariant.

One may argue that as a convenience feature, you may still introduce something like that, but I am not sure if that’s a good idea.

Now, regarding the Defensive trait.

Defensively handle an Option<_> or Error<_>, meaning that the fallibility is still handled, but:

If in tests, panic.

If in production, raise an error log and move on.

This has to be approached with care. Sometimes, it might be needed to test the fallback code, i.e., what happens in None or Err case. That becomes impossible since in tests it panics and that requires some sort of hack to disarm those checks.

Now, regarding fuzzing. This one should be obvious for anyone who used it for testing their code. I think FRAME code is in a very good position to be fuzzed. We can also go further:

Use the real chain tips to initialize the state and then apply arbitrary transactions suggested by the fuzzer,
Do the same, but for migrations.

And finally, it would be great if this discussion resulted in to a set of written guidelines.

OliverTY · September 13, 2022, 4:24pm

It is indeed a bit too much to react to all of it. Maybe we just collect ideas in here and then create dedicated discussions for each part.

I like this idea. Imagine writing the invariants actually as code and not as comment:

#[pallet::storage] 
pub type Map = StorageMap<u32, u32>;

#[pallet::storage]
#[pallet::invariant(
    event: changed(_previous, current) {
        assert!(current == Map::iter().count());
    },
    check_on: StorageLayerCommit,
    // Could think of when to check these conditions; per Extrinsic, per Block, Always…
)]
pub type Counter = StorageValue<u32>;

andronik · September 15, 2022, 6:27pm

Fuzzing

In my experience, when done right, fuzz testing can find bugs quicker than property-based tests. My personal preference is cargo-fuzz because it’s based on llvm’s libFuzzer which mutates the inputs based on branch coverage (although there should be ways to make the interface universal). Due to this, adding some debug assertions in your code will create more branches and increase the probability of hitting a bug. The only downside is that it requires a nightly compiler. Honggfuzz doesn’t.

Failpoints

fail is a crate that dynamically injects errors. I assume this should also play nicely with fuzzing. It’s being used in some other blockchains.

Type-safety

Use of nested generic types that make it impossible to create invalid instances of a type (one example is Penumbra’s tiered commitment tree). Another example is NonZeroUsize or types that do validation in fn new().

Formal verification

There are plenty of tools available for doing formal verification in Rust: Rust verification tools (2020) – Alastair Reid – Researcher at Intel.

I’m personally excited about GitHub - model-checking/kani: Kani Rust Verifier because of the simplicity of writing tests (looks like a property-test).

m-cat · October 17, 2022, 8:37pm

I didn’t quite follow this, can you give some examples of what you mean? FWIW I think the Defensive trait is a great idea, I’ve used something similar before.

m-cat · October 17, 2022, 8:43pm

+1 for Type Safety. You can cut down on tests and potential errors by simply making some states not representable. It’s also a low-hanging fruit in many codebases (not sure about Parity yet).

Coverage Marks

An interesting technique to be aware of is coverage marks, which can link code paths with the tests that test them. Maybe too verbose to use extensively, but could be useful in some circumstances.

OliverTY · October 18, 2022, 11:19am

Any defensive_unwrap_or cannot be tested, since the test immediately panics.
Had the same problem yesterday in Add `DefensiveTruncateFrom` by ggwpez · Pull Request #12515 · paritytech/substrate · GitHub where I add a defensive_truncate_from function.
This function first tries to use a non-truncating conversion and falls back to truncating if that fails.
Problem is that the truncating case cannot be tested for stated reason that it panics in test.

#[test]
#[should_panic(
	expected = "Defensive failure has been triggered!: \"DefensiveTruncateFrom truncating\""
)]
fn defensive_truncate_from_vec_panics() {
	// NOTE: We cannot test the truncating case since defensive failures panic in tests.
	let unbound = vec![1u32, 2];
	let _ = BoundedVec::<u32, ConstU32<1>>::defensive_truncate_from(unbound);
}

pepyakin · October 18, 2022, 12:01pm

Defenisve extends the appropriate data types with functions analogous to unwrap. That is, the match arm previously panicking would now return the control to the caller. Should you want to test that path, you would have to get creative.

First example:

fn squizle_foobers() {
  // .. 
  process_reward_pool(reward_pool.defensive_unwrap_or_default())
  // .. 
}

Employing Defensive means that you are not 100% sure that this won’t ever be None. It is reasonable to you would like to test how squizle_foobers() behaves in case reward_pool was None. But how would you do that? Would you re-run the test without cfg(debug_assertions)? How would that work in CI (I don’t think we would want to build the tests twice)? Maybe there should be some runtime switch that you would flick?

That might seem like a local issue, where you could come up with a module-local solution specifically for that test. But that does not scale. If the Defensive is planted in a module that you use, there might be no way to defuse it, without touching the source code of the dependency module. Hence I claim that Defensive as is suffers from composability issues.

That’s why I said it has to be approached carefully. But now, I actually clicked through the links and it seems that it already made its way into the library code which makes me a bit sad.

UPD: Did not see @OliverTY 's message but yeah that’s a good demonstration of one of my concerns.,

OliverTY · October 19, 2022, 3:10pm

This seems to use thread local storage to remember which marks got hit.
Maybe we can use something like that as well to decide whether or not a defensive! should panic or log.
Then we could test both cases. These markers should not be compiled into the runtime, since AFAIK thread local storage does not make sense there. But maybe it works for testing.

m-cat · November 25, 2022, 4:12pm

I just found loom today and thought it was interesting.

Loom is a tool for testing concurrent programs.

At a high level, it runs tests many times, permuting the possible concurrent executions of each test according to what constitutes valid executions under the C11 memory model. It then uses state reduction techniques to avoid combinatorial explosion of the number of possible executions.

Topic		Replies	Views
Test frameworks and utilities for Polkadot and friends Tech Talk	3	780	April 17, 2023
Polkadot Release Analysis v0.9.43 Ecosystem release-analysis	3	1517	July 25, 2023
Parachain Technical Summit - Next Steps Tech Talk roadmap , frame , polkadot-summit	26	4962	July 17, 2024
New Tool for Runtime Developers: Try Runtime Github Action Tech Talk	0	233	March 18, 2024
Best Practices for Runtime Upgrade Pre-check Tech Talk	3	551	April 3, 2023

Testing Complex FRAME Pallets: Discussion + Tools

1. Try-runtime Classics

Storage Invariants

Composite, Semi-Private Storage Types.

Try-Runtime: Follow-Chain, TryState

Defensive Traits

Fuzzing

Hooked Storage Items

Future Plans

What do you think?

Fuzzing

Failpoints

Type-safety

Formal verification

Coverage Marks

Related topics

Try-Runtime: Follow-Chain, `TryState`