Decoding Historic Blocks/Storage in Rust: August 2024 Update

jsdw · August 21, 2024, 4:37pm

Back in February, I wrote about my plans to bring historic block and storage entry decoding to Rust. Lots of progress has been made, and there’s a bunch more still to come, so I thought an update was long overdue!

Recap

As a quick recap, the problem being addressed is that prior to V14 metadata (which landed on block 7229126), we had no information about the shape of the various extrinsics and storage entries that existed at older blocks. Metadata versions 13 and below contain only the names of the various extrinsic arguments and storage entries, and nothing about how to decode them.
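
To make the problem concrete, here is a minimal sketch (not taken from any of the libraries mentioned in this post) of why SCALE bytes cannot be decoded without knowing the shape of the type. The helper names are made up for illustration, and the compact-integer decoder only handles the single-byte mode:

```rust
/// Decode a single-byte-mode SCALE compact integer (values 0..=63 only).
/// Larger modes are deliberately left out of this sketch.
fn decode_small_compact(bytes: &[u8]) -> Option<(u32, &[u8])> {
    let (&first, rest) = bytes.split_first()?;
    // The two low bits select the mode; 0b00 means "single byte".
    if first & 0b11 != 0 {
        return None;
    }
    Some(((first >> 2) as u32, rest))
}

/// Interpret the bytes as a `Vec<u8>`: a compact length prefix, then the items.
fn as_byte_vec(bytes: &[u8]) -> Option<Vec<u8>> {
    let (len, rest) = decode_small_compact(bytes)?;
    rest.get(..len as usize).map(|items| items.to_vec())
}

/// Interpret the same bytes as a `(u8, u16)` tuple: one byte, then a little-endian u16.
fn as_u8_u16(bytes: &[u8]) -> Option<(u8, u16)> {
    let a = *bytes.first()?;
    let b = u16::from_le_bytes([*bytes.get(1)?, *bytes.get(2)?]);
    Some((a, b))
}

fn main() {
    let bytes = [0x08, 0x01, 0x02];
    println!("{:?}", as_byte_vec(&bytes)); // Some([1, 2])
    println!("{:?}", as_u8_u16(&bytes)); // Some((8, 513))
}
```

Both interpretations succeed on the same three bytes, which is why a name alone (all that older metadata gives us) is not enough to decode anything.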

The JavaScript library PJS (@polkadot/api) is already capable of decoding these historic types, but by bringing this ability to Rust, we:

- Open the door for arbitrary other languages like Python/Golang/Java to bind to this Rust code in order to gain access to the same ability running at native speeds.
- Open the door for platforms and devices which are more CPU/memory constrained to decode historic information.
- Make it possible to do everything in Rust (or languages that can bind to it) that was once only possible in JavaScript; we can already work with the tip of a chain via libraries like Subxt, but have always struggled to work with historic data.
- Consequently, also make it possible to eventually deprecate PJS (@polkadot/api) in the future, which brings with it a large ongoing burden.

Progress so far

To date, we have made the following progress towards this goal:

`scale-info-legacy` has been created to handle describing types by their names.

Just as scale-info defines the structures for describing V14+ type information via a PortableRegistry type, scale-info-legacy defines the structures necessary for describing historic types.

scale-info-legacy is the foundation of any work in Rust to decode historic types. The basic idea is that you provide a description of the shape of each type, keyed by its name. These descriptions can be provided as JSON, YAML or any other serde-compatible format. The format handles the fact that names can change in meaning across different runtime versions (eg as new struct fields or enum variants are added over time), and the fact that the same type name is sometimes seen in more than one pallet.

See this documentation for more information on the format for providing historic type information.
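
The idea of resolving a type name differently depending on spec version and pallet can be sketched roughly as follows. This is a simplified illustration only and does not reflect the actual scale-info-legacy API or file format; `NameRegistry`, `Shape`, `Entry` and the `ExampleInfo`/`ExamplePallet` names are all made up:

```rust
/// Hypothetical, heavily simplified description of a type's shape.
#[derive(Debug, Clone, PartialEq)]
enum Shape {
    /// Named fields, each referring to another type by name.
    Struct(Vec<(String, String)>),
    /// Another name for an existing type.
    Alias(String),
}

struct Entry {
    pallet: Option<String>, // None = applies in every pallet
    name: String,
    spec_range: (u32, u32), // inclusive range of spec versions
    shape: Shape,
}

impl Entry {
    fn matches(&self, scope: Option<&str>, name: &str, spec: u32) -> bool {
        self.pallet.as_deref() == scope
            && self.name == name
            && (self.spec_range.0..=self.spec_range.1).contains(&spec)
    }
}

#[derive(Default)]
struct NameRegistry {
    entries: Vec<Entry>,
}

impl NameRegistry {
    fn insert(&mut self, pallet: Option<&str>, name: &str, spec_range: (u32, u32), shape: Shape) {
        self.entries.push(Entry {
            pallet: pallet.map(str::to_owned),
            name: name.to_owned(),
            spec_range,
            shape,
        });
    }

    /// Prefer a definition scoped to `pallet`, then fall back to a global one.
    fn resolve(&self, pallet: &str, name: &str, spec: u32) -> Option<&Shape> {
        self.entries
            .iter()
            .find(|e| e.matches(Some(pallet), name, spec))
            .or_else(|| self.entries.iter().find(|e| e.matches(None, name, spec)))
            .map(|e| &e.shape)
    }
}

/// Made-up definitions: one name whose meaning changes at spec version 10,
/// plus a pallet-specific override of the same name.
fn example_registry() -> NameRegistry {
    let field = |n: &str, t: &str| (n.to_owned(), t.to_owned());
    let mut reg = NameRegistry::default();
    reg.insert(None, "ExampleInfo", (0, 9), Shape::Struct(vec![field("nonce", "u32")]));
    reg.insert(
        None,
        "ExampleInfo",
        (10, u32::MAX),
        Shape::Struct(vec![field("nonce", "u32"), field("flags", "u8")]),
    );
    reg.insert(Some("ExamplePallet"), "ExampleInfo", (0, u32::MAX), Shape::Alias("u64".to_owned()));
    reg
}

fn main() {
    let reg = example_registry();
    println!("{:?}", reg.resolve("Other", "ExampleInfo", 5));
    println!("{:?}", reg.resolve("Other", "ExampleInfo", 10));
    println!("{:?}", reg.resolve("ExamplePallet", "ExampleInfo", 10));
}
```

The linear scan keeps the sketch short; a real registry would want an indexed lookup.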

Existing libraries have been made generic over how type information is provided.

Libraries like scale-encode (for SCALE encoding Rust types), scale-decode (for decoding SCALE bytes into Rust types) and scale-value (a representation like serde_json::Value which any SCALE bytes can decode into) were previously all tied to using scale-info type information to decide how to decode/encode things.

Nowadays, all of these libraries are generic over how type information is provided to them, so that we can provide modern type information (via scale_info::PortableRegistry) or historic type information (via scale_info_legacy::TypeRegistry or scale_info_legacy::TypeRegistrySet).

`scale-type-resolver` has been created to enable the above.

In order for libraries to be generic over how type information is provided, we have created scale-type-resolver. This library exposes an interface, TypeResolver, which provides a generic means to obtain type information given some arbitrary TypeId in an efficient manner.

scale_type_resolver::TypeResolver has been implemented for scale_info::PortableRegistry, scale_info_legacy::TypeRegistry and scale_info_legacy::TypeRegistrySet, and is accepted by the various interfaces across scale-decode, scale-encode and scale-value so that they can all be generic over the sort of type information that they are given.

A side effect of this work is that these libraries no longer depend explicitly on scale-info.
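
As a rough illustration of what "generic over how type information is provided" means, here is a toy sketch. The `Resolve` trait below is a simplified stand-in and is not the real `TypeResolver` signature; `Kind`, `ById` and `ByName` are likewise invented for this example:

```rust
use std::collections::HashMap;

/// Hypothetical shape description, generic over how types refer to each other.
#[derive(Clone)]
enum Kind<Id> {
    U8,
    U32,
    Tuple(Vec<Id>),
}

/// Simplified stand-in for a resolver interface (NOT the real `TypeResolver`).
trait Resolve {
    type TypeId: Clone;
    fn resolve(&self, id: &Self::TypeId) -> Option<Kind<Self::TypeId>>;
}

/// "Modern" style: types are looked up by numeric ID.
struct ById(Vec<Kind<u32>>);

impl Resolve for ById {
    type TypeId = u32;
    fn resolve(&self, id: &u32) -> Option<Kind<u32>> {
        self.0.get(*id as usize).cloned()
    }
}

/// "Historic" style: types are looked up by name.
struct ByName(HashMap<String, Kind<String>>);

impl Resolve for ByName {
    type TypeId = String;
    fn resolve(&self, id: &String) -> Option<Kind<String>> {
        self.0.get(id).cloned()
    }
}

/// One decoder that works with either source of type information.
/// It consumes bytes from the front of `bytes` and renders the value as text.
fn decode<R: Resolve>(types: &R, id: &R::TypeId, bytes: &mut &[u8]) -> Option<String> {
    match types.resolve(id)? {
        Kind::U8 => {
            let (b, rest) = bytes.split_first()?;
            *bytes = rest;
            Some(b.to_string())
        }
        Kind::U32 => {
            if bytes.len() < 4 {
                return None;
            }
            let (head, rest) = bytes.split_at(4);
            *bytes = rest;
            Some(u32::from_le_bytes([head[0], head[1], head[2], head[3]]).to_string())
        }
        Kind::Tuple(ids) => {
            let mut parts = Vec::new();
            for inner in &ids {
                parts.push(decode(types, inner, bytes)?);
            }
            Some(format!("({})", parts.join(", ")))
        }
    }
}

fn main() {
    let modern = ById(vec![Kind::U8, Kind::U32, Kind::Tuple(vec![0, 1])]);

    let mut by_name = HashMap::new();
    by_name.insert("u8".to_string(), Kind::U8);
    by_name.insert("u32".to_string(), Kind::U32);
    by_name.insert("Pair".to_string(), Kind::Tuple(vec!["u8".to_string(), "u32".to_string()]));
    let historic = ByName(by_name);

    let bytes = [7u8, 1, 0, 0, 0];
    // Both calls print Some("(7, 1)").
    println!("{:?}", decode(&modern, &2, &mut &bytes[..]));
    println!("{:?}", decode(&historic, &"Pair".to_string(), &mut &bytes[..]));
}
```

The decoding function never needs to know whether the type information is keyed by ID or by name, which is the property the real libraries gain from the shared interface.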

We’ve used these libraries to decode historic Polkadot blocks.

After building the above libraries, the next step was to use them to actually decode historic Polkadot relay chain blocks. This was done to validate the approach, but also to begin building up the historic type information that we need in order to work with historic Polkadot blocks.

As of commit 1baba102e5c29c6f3ec878a7fe461a26dce980f1, my decoding example is now capable of decoding all of the extrinsics in all historic Polkadot relay chain blocks. In doing so, this historic type information has been constructed. Those familiar with @polkadot/api might find the format for this information familiar; I’ve tried to keep it close to how such historic types are defined in @polkadot/api (which also helps when porting type information across from there to Rust).

What next?

We’ve come quite far, and already I think that the foundational libraries here should allow people to begin experimenting with decoding and working with historic data. Here’s what I aim to do next:

- Get historic storage entries decoding in Rust, too. This is trickier than decoding blocks, since there is a lot more storage to decode in order to actually validate that things are working properly, but I have a plan for this (which essentially is to decode all of the storage entries found in semi-randomly chosen blocks across different spec versions to build confidence that we can decode everything).
- Use what I’ve learned to build a higher level decoding interface in Rust. I need to think about what this will look like, but essentially I’d like there to exist a library which exposes a very simple interface that takes some type information and some bytes to decode, and hands back a type that the user wants to decode the bytes into (eg, but not limited to, a scale_value::Value).
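
For the storage validation plan in the first item, picking semi-random blocks from every spec version could look something like the sketch below. This is only one possible approach under made-up assumptions: `SpecRange`, `sample_blocks` and the block ranges are hypothetical, and a tiny xorshift generator stands in for a proper random number crate:

```rust
/// Hypothetical record: a spec version and the inclusive block range it was live for.
struct SpecRange {
    spec_version: u32,
    first_block: u64,
    last_block: u64,
}

/// Tiny xorshift64 generator so that the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Pick `per_spec` block numbers from every spec version's range.
/// Assumes `first_block <= last_block` for each range.
fn sample_blocks(ranges: &[SpecRange], per_spec: usize, seed: u64) -> Vec<(u32, u64)> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut picks = Vec::new();
    for r in ranges {
        let span = r.last_block - r.first_block + 1;
        for _ in 0..per_spec {
            picks.push((r.spec_version, r.first_block + rng.next() % span));
        }
    }
    picks
}

fn main() {
    // Made-up ranges purely for illustration.
    let ranges = [
        SpecRange { spec_version: 1, first_block: 0, last_block: 99 },
        SpecRange { spec_version: 2, first_block: 100, last_block: 149 },
    ];
    for (spec, block) in sample_blocks(&ranges, 3, 42) {
        println!("spec {} -> block {}", spec, block);
    }
}
```

A fixed seed keeps the chosen blocks reproducible, so a decoding failure can be revisited after the type information has been updated.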

The aim by the end of the year is that we’ll be able to decode all historic Polkadot relay blocks (done), have reasonable confidence that we can decode all historic Polkadot relay storage entries (in progress), have a demo of this at work (in progress), and have some higher level library to make this simpler (not started yet).

How can you help?

I’m only focusing on building up the type information to decode Polkadot relay chain blocks/storage at the moment. I might look to Kusama next, but I won’t have time to also look at historic blocks across the various parachains.

As such, one amazing way to help out would be to help to build up historic type information for other chains and make it available to people.

The approach that I’ve been using to do this is essentially:

- Check out and run the jsdw/polkadot-historic-decoding-example example on GitHub (an example for decoding historic Polkadot blocks and printing out some information about them). You can ask it to decode blocks (and soon storage), point it at some URLs of RPC nodes for a given parachain, and point it at some type information (the included polkadot-types.yaml is perhaps a good place to start).
- Whenever this fails to decode something, it returns a (hopefully) informative error message describing what went wrong. Often this is that some type is missing from the given types. To find out what to add, you can then:
  - Search for the missing type name in the polkadot-js/api repo on GitHub (easiest to clone it locally to search it). Often you’ll then find a definition which can be adapted into the format needed here. Or…
  - Run the JS command in the js folder of jsdw/polkadot-historic-decoding-example (on the main branch) to decode the same block using PJS, and then use the output from that to help to work out what is wrong with the output from the Rust command.
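
The "some type is missing" case in the steps above boils down to finding names that are referenced but never defined. Here is a small sketch of that check; it is not how the decoding example reports errors, and `Definitions`, `missing_types` and the type names used are assumptions for illustration:

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical: each known type name maps to the type names it refers to.
type Definitions = HashMap<String, Vec<String>>;

/// Walk from `root` and collect every referenced name that has no definition.
fn missing_types(defs: &Definitions, root: &str, builtins: &[&str]) -> BTreeSet<String> {
    let mut missing = BTreeSet::new();
    let mut seen = BTreeSet::new();
    let mut stack = vec![root.to_string()];
    while let Some(name) = stack.pop() {
        // Skip built-in types and anything already visited (guards against cycles).
        if builtins.contains(&name.as_str()) || !seen.insert(name.clone()) {
            continue;
        }
        match defs.get(&name) {
            Some(refs) => stack.extend(refs.iter().cloned()),
            None => {
                missing.insert(name);
            }
        }
    }
    missing
}

fn main() {
    let mut defs = Definitions::new();
    defs.insert("Call".to_string(), vec!["AccountId".to_string(), "Balance".to_string()]);
    defs.insert("Balance".to_string(), vec!["u128".to_string()]);

    // Prints {"AccountId"}: it is referenced by `Call` but never defined.
    println!("{:?}", missing_types(&defs, "Call", &["u128"]));
}
```

Each name this turns up is something to search for in PJS and port into the types file.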

It takes a little time to get used to the sorts of quirks that you can encounter, and it’s definitely easier if you’re familiar with some of the historic types of the chain you’re working to decode, but it gets easier!

If you’d like to build up types for some parachain in this way, I’m happy to help and can be reached on Matrix/Element at @james.wilson:parity.io.

Thanks a bunch for reading, and if you have any questions I’ll endeavour to answer them!
