State Trie Migration

The trie format of Substrate has changed recently in order to become more PoV-friendly. As part of this, a state_version field has been added to each chain's runtime version, where 0 is the old format and 1 is the new format.
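For reference, here is roughly where that flag lives, as a minimal sketch of a runtime's version declaration (every value other than state_version is a placeholder, and RUNTIME_API_VERSIONS is generated by impl_runtime_apis!):

```rust
use sp_runtime::create_runtime_str;
use sp_version::RuntimeVersion;

pub const VERSION: RuntimeVersion = RuntimeVersion {
    spec_name: create_runtime_str!("my-chain"),
    impl_name: create_runtime_str!("my-chain"),
    authoring_version: 1,
    spec_version: 100,
    impl_version: 1,
    apis: RUNTIME_API_VERSIONS,
    transaction_version: 1,
    // 0 = old trie format, 1 = new (PoV-friendly) trie format.
    state_version: 1,
};
```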

Migrating to the new trie can happen lazily: setting the version to 1 puts your trie in a hybrid mode where keys are lazily migrated as they are written. This is okay, but it adds a small overhead, and it might take a very long time until the state is fully migrated.

This is why we wrote a special pallet that can be configured to read and write all storage keys in the background, in a safe way, to facilitate the migration.
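For context, a rough sketch of what wiring pallet-state-trie-migration into a runtime can look like; the associated types differ between Substrate versions, and Balance, CENTS, DOLLARS, AccountId, Balances and Runtime are assumed to be the usual runtime aliases, so treat the guide linked below as the source of truth:

```rust
use frame_support::parameter_types;
use frame_system::{EnsureRoot, EnsureSigned};

parameter_types! {
    // Placeholder deposits charged for permissionless (signed) migration submissions.
    pub const MigrationSignedDepositPerItem: Balance = 1 * CENTS;
    pub const MigrationSignedDepositBase: Balance = 20 * DOLLARS;
    pub const MigrationMaxKeyLen: u32 = 512;
}

impl pallet_state_trie_migration::Config for Runtime {
    type RuntimeEvent = RuntimeEvent;
    type Currency = Balances;
    // Who may start/stop/tune the automatic (background) migration.
    type ControlOrigin = EnsureRoot<AccountId>;
    // Who may submit signed migration transactions.
    type SignedFilter = EnsureSigned<AccountId>;
    type MaxKeyLen = MigrationMaxKeyLen;
    type SignedDepositPerItem = MigrationSignedDepositPerItem;
    type SignedDepositBase = MigrationSignedDepositBase;
    // Use benchmarked weights in production.
    type WeightInfo = ();
}
```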

I wrote a guide about this a while ago, which some of you might have already seen, but I want to re-share it here: Substrate State Trie Migration Guide - HackMD.

I am aware that Parity is in the process of migrating our testnets using the mentioned tool and method, and I will ask them to share their outcome here once done. Until then, I suggest being extra cautious when it comes to performing the migration.

And we can use this thread to share our experience performing the migration and help with troubleshooting.

Good luck!


Yes, that is the current plan. I want us to try this on Rococo and on all of the parachains we control there, then do the same for Westend. While doing this we will probably also improve the guide even more to make it easier and more approachable.

After all of that is done, we will be able to move on to doing this for Kusama/Polkadot and the parachains.


Basti,

Did you migrate all Polkadot networks to v1?

No, it isn’t moving that fast. But I think we have migrated Rococo and Westend. The next step is Kusama and then Polkadot.

Cc @Emeric

Also, I looked at the tutorial, and given the numbers and the fact that we have 100M+ keys at the moment, I’m afraid it will take a long time to migrate (like months) and could be problematic if we run some runtime upgrades in the middle :confused:

Runtime upgrades should not be an issue; you just need to remove the on_runtime_upgrade hook (or write it in a way that it will not restart the migration).
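For illustration, a minimal sketch of an idempotent kick-off, assuming the migration is started by setting the pallet's AutoLimits storage from a custom OnRuntimeUpgrade hook; the guard condition and limit values are assumptions rather than the guide's exact recipe, and the hook should still be removed once the migration is finished:

```rust
use frame_support::{traits::OnRuntimeUpgrade, weights::Weight};
use pallet_state_trie_migration::{AutoLimits, MigrationLimits};

pub struct StartStateTrieMigration;
impl OnRuntimeUpgrade for StartStateTrieMigration {
    fn on_runtime_upgrade() -> Weight {
        // Only configure the automatic migration if it is not configured yet,
        // so re-running this hook on a later runtime upgrade is a no-op and
        // does not restart a migration that is already in flight.
        if AutoLimits::<Runtime>::get().is_none() {
            AutoLimits::<Runtime>::put(Some(MigrationLimits { item: 160, size: 200 * 1024 }));
        }
        <Runtime as frame_system::Config>::DbWeight::get().reads_writes(1, 1)
    }
}
```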

But the migration duration looks rather bad. There may be some way to adjust the limits depending on the data, or even to skip some pallets if we know there is no value > 32 bytes under their prefix (hardcoded or in a configuration).

There are two limits: a number of items and a total length of values.
This looks redundant, but the number of items is there to account for the trie branches in the proof (from the runtime we can only count the size of values).
But since the migration iterates over the Merkle trie state, the branches are shared a lot, and the overhead may be a bit overestimated (note that the overhead is slightly smaller if there are a lot of items in the chain).
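Concretely, this is the pair of limits consumed per block; a sketch using the MigrationLimits type from the pallet (field names assumed, values taken from the experiments further down this thread):

```rust
use pallet_state_trie_migration::MigrationLimits;

// A migration step stops as soon as either bound is hit:
// `item` caps the number of keys migrated per block (a proxy for trie-branch
// overhead in the proof), `size` caps the total byte length of migrated values.
let limits = MigrationLimits { item: 160, size: 200 * 1024 };
```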

Another thing to consider is maybe changing warp sync to allow syncing on a chain that is migrating. We would need to attach the state version to each value, but it is doable (the version is actually already in the proof, but we currently inject data from key-value form with a single state version at a time).

Do you mean the migration duration of Moonbeam? It should take at most one week to migrate the entire state of Moonbeam, no? And that already assumes quite a lot of overhead.

We just added support for warp sync on parachains, and most probably still don’t have it. So I think it is fine if it isn’t working during the migration period. We just need to finally get the guide ready for people to run the migration.

Do you mean the migration duration of Moonbeam? It should take at most one week to migrate the entire state of Moonbeam, no? And that already assumes quite a lot of overhead.

Yes.
I am not too sure; I don’t have the numbers in my head anymore. I remember Westend taking longer than expected, but I think it may be related to the low item limit I set (160). When I set this I was using a worst-case estimation of branch size, and I am pretty sure it did not take into account the fact that we have consecutive items, so branches are compacted.
I will try to play with try-runtime on Kusama to check a bit.

Any update?

I got a few issues running with try-runtime and had to customize a branch, but I got some tests running. I will still need to extract data from the logs, but it looks like, for a size limit of 200k, a 160 item limit was really bad (5400 blocks, I guess at 6-second blocks: 22 days). 1600 (952 blocks, and from random sampling all below 300k) or 3200 (800 blocks, but I did see a few at 600k, though it may be the same with 1600) look more correct.
Could maybe also try with a size limit set at 400k.
But I feel like try-runtime did not get me the child trie content, which is a big part of Kusama (only 850k key-values exported). So I will need to look a bit more closely at the different sizes and do some more tests (Will told me there is a limit for new code next Wednesday, I think).

OK, there was definitely no child trie content due to the try-runtime issue, but there is not much child trie content, so the data is not that different.

I put the data I got from the logs in an open spreadsheet:
https://raw.githubusercontent.com/cheme/substrate/try-runtime-mig/ksm.ods

The tab name, e.g. 1600_200, means max 1600 items and max 200 KB of value data.
The columns are: number of top trie items recorded, number of child trie items recorded, proof size before compaction, and compacted proof size. I could also have calculated the zstd-compressed proof size, but skipped it (it only needs some code to be uncommented).

The numbers seem to make sense (the big 8 MB at the end of the child tries is the :code item).

The number of items can be set a lot higher (I will change the Kusama PR to target a 400 KB proof per block); the overhead per item (for normal-sized values) is around 60 to 80 bytes.

So I will switch the Kusama PR to target 600 KB (4800 items, 400 KB of values).

Is there any update on this migration, out of curiosity?

I know that there exists a GitHub tracking issue for this feature, but it’s in a private repository, so I (and everyone else outside) have no idea what is happening.

I was wondering the same: https://github.com/paritytech/polkadot/pull/7015#issuecomment-1620403929

The PR is now merged, and thus we should hopefully migrate Kusama when this hits the runtime. After that we can hopefully migrate Polkadot.

Fully two and a half months have passed since the PR was merged.
Is there another blocker now?

I don’t see this pallet and its migration being part of the 9430 runtime, so whatever that PR was, it never made it into a runtime release. We need to wait for the next fellowship-based runtime release to include the change, as far as I can tell.

So what’s the progress here now?


Updated these two issues to track progress for all relay and system chains: