At present, the PolkadotJS API is the only means (that I’m aware of) of decoding historic blocks on Polkadot chains (aside from desub, which I’ll get to later).
We’d like to add the ability to decode historic blocks in Rust for a couple of reasons:
- To open the door for Rust tooling to be built around this ability, instead of restricting developers to using TypeScript.
- A Rust implementation also opens the door for other languages to leverage the same code (ie TypeScript via WASM compilation, or many other languages via C bindings).
- To provide an alternative to PolkadotJS, which is no longer being actively developed by its primary author (although we are aiming to continue maintaining it until suitable alternatives exist).
So, I’d like to share with you all my plan, and the progress so far, towards being able to decode historic blocks using Rust.
(Side note: this post is an elaboration of, and update on, the issue originally posted here.)
Introduction
I’ll start by summarizing the overall problem that we face.
Today, if you ask for the metadata from a chain, you’ll probably get back version 14 or version 15 metadata. These versions both contain a `scale_info::PortableRegistry`, which itself contains all of the type information needed to construct and encode (or decode) valid extrinsics, storage keys and values, and more. Types in this `PortableRegistry` each have a `u32` identifier, which is used to point to them elsewhere in the metadata when describing which calls or storage entries are available.
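To make this concrete, here’s a minimal sketch (using the frame-metadata, parity-scale-codec and scale-info crates, all described properly below, with the relevant crate features enabled) of extracting that type registry from raw metadata bytes, which you might obtain via the `state_getMetadata` RPC call:

```rust
use frame_metadata::{RuntimeMetadata, RuntimeMetadataPrefixed};
use parity_scale_codec::Decode;
use scale_info::PortableRegistry;

/// Extract the type registry from SCALE-encoded metadata bytes
/// (as returned by e.g. the `state_getMetadata` RPC call).
fn type_registry(bytes: &[u8]) -> Option<PortableRegistry> {
    let RuntimeMetadataPrefixed(_magic, metadata) =
        RuntimeMetadataPrefixed::decode(&mut &*bytes).ok()?;
    match metadata {
        // Only V14 and V15 metadata carry a PortableRegistry.
        RuntimeMetadata::V14(m) => Some(m.types),
        RuntimeMetadata::V15(m) => Some(m.types),
        _ => None,
    }
}
```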
Libraries like Subxt (Rust) and Polkadot-API (TypeScript) work by downloading the metadata from a chain and using this type information to understand how to SCALE encode and decode values, so that they can build and submit valid extrinsics and the like.
If you go back a few years (ie to when Polkadot ran runtimes containing V13 metadata or below), this type information (ie the `scale_info::PortableRegistry`) did not exist at all. Instead, all we had in the metadata were the names of the various types being used. There was no information about what those names meant, or how to encode/decode the types behind those names into the right shape. So how did we know how to encode/decode anything?
PolkadotJS was created in 2017 as a client which was capable of interacting with Polkadot (and later, its parachains). It required type information to know how to encode/decode things, but none was available, so it had to construct its own. It built a mapping from the name of a type to some description of how to encode/decode it (nowadays this is mostly here). Since the shapes of many types evolved over time, PolkadotJS would add overrides to its type information that would take effect in certain spec versions and on certain chains in order to continue to be able to understand their shapes. Thus, if PolkadotJS knew which chain and spec version you were targeting, it would be able to look up how to decode information for it.
Newer libraries like Subxt and Polkadot-API were able to leverage the type information in modern metadata and so have never evolved this ability, meaning that PolkadotJS remains the only way to decode historic information today. This is now changing, as we have recently started work on building the relevant features to be able to decode historic data in Rust.
Decoding historic data in Rust
First, I’ll start with what we had in place until recently in Rust. Then I’ll summarize our overall plan for adding the ability to decode old data in Rust. Finally I’ll explain each step in more detail, as well as where we’re at today.
What we had until recently in Rust
This diagram gives a rough idea of the main relevant Rust libraries that we had until recently. Arrows mean “depends on” and show the rough hierarchy (various dependencies are not represented here).
Let’s summarize each of these, starting from the bottom (follow the links to read more about each one):
- parity-scale-codec provides the basic SCALE encoding and decoding implementation. This library does not care about any type information, and simply encodes and decodes Rust types according to their static shape. Its main exports are the traits `Encode` and `Decode`. Simply put: `Encode` has a function `fn(&self) -> Vec<u8>` to SCALE encode self to bytes, and `Decode` has a function `fn(bytes: Vec<u8>) -> Self` to SCALE decode bytes into `Self`.
- scale-info provides a structure (`PortableRegistry`) which contains the type information needed to know how to SCALE encode and decode types. Types can be obtained from this structure if you know their type ID (a `u32`). This is present in V14 and V15 metadata.
- frame-metadata defines the format that metadata takes. One can SCALE encode or decode metadata into this format. The format has changed over time, so metadata is wrapped in an enum to which a new variant is added each time we produce a new metadata version. Newer versions of the metadata (V14 and V15) contain a `PortableRegistry` and point to types in it when describing things like the available extrinsics.
- scale-encode and scale-decode primarily export the `EncodeAsType` and `DecodeAsType` traits, and implement them for common Rust types. Both build on `parity-scale-codec`. Simply put: `EncodeAsType` has a function `fn(&self, type_id: u32, types: PortableRegistry) -> Vec<u8>`, which encodes values based on the type information provided (and not based only on the shape of `&self`); `DecodeAsType` has a function `fn(bytes: Vec<u8>, type_id: u32, types: PortableRegistry) -> Self`, which decodes SCALE bytes into some type based on the type information provided (and not based only on the shape of `Self`).
- scale-value primarily exports a `Value` type. This type is analogous to `serde_json::Value` and represents any valid SCALE encodable/decodable type. The `Value` type has a string and `serde` representation, and also implements `EncodeAsType` and `DecodeAsType`. Any SCALE bytes can be decoded into a `Value` (a sketch of this follows the list).
- subxt is a client library for interacting with chains in Rust, doing things like submitting transactions or looking up storage values. It relies on all of the above to intelligently encode and decode values in order to construct transactions and such.
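To tie these together, here’s a minimal sketch of the flow these crates enable, going from a statically encoded value to a dynamically decoded `Value`. `Transfer` is a made-up type for illustration, the derive features of the respective crates are assumed to be enabled, and small details (such as whether the registered type ID is exposed as a field or a method) vary between crate versions:

```rust
use parity_scale_codec::Encode;
use scale_info::{meta_type, PortableRegistry, Registry, TypeInfo};

// A made-up example type; Encode gives us static SCALE encoding, and
// TypeInfo lets us register its shape in a type registry.
#[derive(Encode, TypeInfo)]
struct Transfer {
    to: [u8; 32],
    amount: u128,
}

fn main() {
    // Statically SCALE encode a value via parity-scale-codec.
    let bytes = Transfer { to: [0; 32], amount: 100 }.encode();

    // Build a PortableRegistry describing Transfer, much like the one
    // found in V14/V15 metadata (`.id` may be `.id()` in older versions).
    let mut registry = Registry::new();
    let type_id = registry.register_type(&meta_type::<Transfer>()).id;
    let types: PortableRegistry = registry.into();

    // Dynamically decode the bytes back into a scale_value::Value using
    // only the type information; no static knowledge of Transfer needed.
    let value = scale_value::scale::decode_as_type(&mut &*bytes, type_id, &types)
        .expect("bytes match the given type");
    println!("{value:?}");
}
```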
We can see that `scale-info` is pretty integral here: it provides the type information that all of the higher-level libraries use to SCALE encode and decode bytes. This is a problem if we want to re-use these libraries to also encode and decode historic data, which has no associated `scale-info` type information.
What we’re now working towards
In summary, we’d like to work towards modifying our relevant libraries to look more like this:
Green boxes represent new crates, and yellow boxes represent areas of significant change.
Let’s dig into this more:
Step 1: scale-type-resolver
In order to be able to re-use `scale-encode`, `scale-decode` and `scale-value` to decode historic data, the approach that we are taking is to remove `scale-info` from their dependency trees and replace concrete uses of it with a generic trait that can be implemented for anything that is capable of providing the required type information (including `scale_info::PortableRegistry`).
So, the first step is to create such a trait, which we’ve called `TypeResolver` and have recently implemented in the new scale-type-resolver crate. This crate is `no_std` compatible, and the trait exposes an interface that can be implemented for `scale_info::PortableRegistry` with zero additional overhead (in theory, at least). In order to be zero cost, the trait works by being given a visitor which implements `ResolvedTypeVisitor`; the relevant method on this visitor is called depending on the shape of the type being resolved.
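To give a flavour of the design, here’s a simplified, paraphrased sketch of the two traits. The real definitions in scale-type-resolver carry more bounds and have a visitor method for every type shape, so treat this as illustrative rather than exact:

```rust
use core::fmt::{Debug, Display};

// Paraphrased: anything that can resolve a type ID into type information.
pub trait TypeResolver {
    // Modern metadata uses u32 type IDs; legacy type information will be
    // able to use something name-like here instead.
    type TypeId;
    type Error: Debug + Display;

    // Resolve the given type ID, calling the visitor method corresponding
    // to the shape of the resolved type and returning its output.
    fn resolve_type<'this, V>(
        &'this self,
        type_id: Self::TypeId,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
    where
        V: ResolvedTypeVisitor<'this, TypeId = Self::TypeId>;
}

// Paraphrased: a visitor with one method per type shape. Only a couple
// of shapes are shown; the real trait covers composites, variants,
// sequences, arrays, tuples, primitives, compacts and bit sequences.
pub trait ResolvedTypeVisitor<'resolver>: Sized {
    type TypeId;
    type Value;

    // Called when the type shape isn't handled by this visitor.
    fn visit_unhandled(self) -> Self::Value;
    fn visit_sequence(self, inner_type_id: Self::TypeId) -> Self::Value {
        self.visit_unhandled()
    }
    // ...and so on for the other shapes.
}
```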
Step 2: Make use of scale-type-resolver throughout the stack
The next step is to make use of this new trait in `scale-encode`, `scale-decode` and `scale-value` instead of having any explicit dependency on `scale-info`. This has the effect of generalizing all of these crates so that they can be used to work with historic types as well as modern ones.
`scale-encode` version 0.6 and `scale-decode` version 0.11.1 have already been updated to depend on `scale-type-resolver` instead of `scale-info`. We’re now working on porting `scale-value` and `subxt` to the latest versions of these libraries.
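Paraphrasing rather than quoting the crates’ exact definitions, the generalization looks roughly like this:

```rust
use scale_type_resolver::TypeResolver;

// A stand-in for the crates' real error types, just so this compiles.
pub struct Error;

// Before: decoding was tied to scale-info's PortableRegistry.
pub trait OldDecodeAsType: Sized {
    fn decode_as_type(
        input: &mut &[u8],
        type_id: u32,
        types: &scale_info::PortableRegistry,
    ) -> Result<Self, Error>;
}

// After: decoding is generic over anything implementing TypeResolver,
// so the same code path will work with historic type information too.
pub trait NewDecodeAsType: Sized {
    fn decode_as_type<R: TypeResolver>(
        input: &mut &[u8],
        type_id: R::TypeId,
        types: &R,
    ) -> Result<Self, Error>;
}
```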
Step 3: scale-info-legacy
By this point, our main Rust libraries can all, in theory, decode historic types. But we only have a way to describe modern types, via `scale-info`! So, in the same way that `scale-info` describes modern types, `scale-info-legacy` will provide the means to describe historic types. Some notes about this:
- Historic types are referenced by something like a name rather than a numeric ID: in older metadata versions, we only have type names to go by. So we’ll want to be able to build type mappings that can be handed a type name and resolve it into a description of the type (one that’s compatible with `TypeResolver`).
- Historic type information doesn’t exist in the metadata itself, so we should also strive to provide a set of default type information that is aware of changes across spec versions. This can provide a starting point for chains to then extend with type information for any custom types that they use. We can obtain much of this from PolkadotJS to get us started (see the hypothetical sketch after this list).
- It should be really easy for users to provide their own type information on top of (or instead of) the defaults.
- We need great error messages in the event that type information couldn’t be found, to make it as easy as possible for users to add missing types as they are encountered, until they have provided all of the necessary type information. It’s expected that this will happen a lot to begin with.
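Purely as a hypothetical illustration (this crate doesn’t exist yet, and none of these names are real or final), the type information described above might end up shaped something like a name-to-description mapping plus spec-version-scoped overrides, similar to PolkadotJS’s type bundles:

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;

// Hypothetical sketch only: legacy type information maps type names to
// descriptions of their shapes, rather than u32 IDs to types.
struct LegacyTypes {
    // Base definitions, e.g. "Balance" => "u128", "Address" => "AccountId".
    base: HashMap<String, String>,
    // Overrides that only apply within given spec-version ranges, to cope
    // with types whose shapes changed over time.
    overrides: Vec<(RangeInclusive<u32>, HashMap<String, String>)>,
}

impl LegacyTypes {
    // Resolve a type name at a given spec version, preferring a
    // spec-scoped override to the base definition. Something like this
    // lookup would sit behind the TypeResolver trait so that the scale-*
    // crates can drive it.
    fn resolve(&self, name: &str, spec_version: u32) -> Option<&str> {
        self.overrides
            .iter()
            .filter(|(range, _)| range.contains(&spec_version))
            .find_map(|(_, types)| types.get(name))
            .or_else(|| self.base.get(name))
            .map(|s| s.as_str())
    }
}
```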
My plan is to start work on this crate in the next week or two. I am aiming for it to be ready some time during Q2 2024, although there may be a long tail of work involving building up a test suite for decoding historic types and adding missing types to the defaults that we’ll provide for Polkadot/Substrate.
Step 4: desub
A `desub` crate (well, a set of crates) already exists, and was used as part of `substrate-archive` to decode historic blocks into JSON for storing in a database. It’s marked as green on the diagram because the plan is to effectively replace it with something that can leverage the `scale-*` crates we’ve developed, in order to provide a more generally applicable and better integrated decoding experience (although we’ll adapt and make use of various bits and pieces that it offers).
The goals of this crate will be:
- To provide generic interfaces for decoding extrinsics and storage values given some bytes and type information in a way that builds on top of the lower level “decoding types” functionality now available to us. Subxt may eventually be able to re-use this logic rather than having its own storage/extrinsic decoding logic, so that’s something we’ll keep in mind here.
- To put on top of this a simple, high level interface for decoding arbitrary block/storage info from bytes given the relevant metadata and type information.
- Possibly, to provide a simple RPC layer that connects directly to archive nodes and pulls the relevant information, rather than requiring the user to obtain the bytes themselves first (I have a suspicion we’ll want this).
- To contain any CLI tooling that might be useful in helping users to construct the correct type information (for example, perhaps we’ll add a scanner to find out which blocks contain a spec version change; something that PolkadotJS historically kept track of internally).
There’s still some uncertainty around exactly what the interface will look like here; we’ll probably need to try some things to see what works.
By the end of Q2 2024 I expect we’ll have made some decent progress on this, with an initial release expected in Q3-Q4. As with `scale-info-legacy`, I expect there to be a long tail of testing to discover decoding issues in historic blocks and storage data.
Step 5: scale-info -> scale-type-resolver
`scale-type-resolver` currently exposes the `TypeResolver` trait, and also contains an implementation of `TypeResolver` for `scale_info::PortableRegistry` (behind a feature flag). Thus, exactly one `scale-info` version will implement `TypeResolver` at any one time (the version that `scale-type-resolver` is pulling in). If `scale-info` has a major update, then we need to update `scale-type-resolver` to point to it, which in turn forces the entire hierarchy of crates that depend on `scale-type-resolver` to be updated too.
So, a small thing I’d like to do once the dust has settled is to instead have `scale-info` depend on `scale-type-resolver` and implement the `TypeResolver` trait itself. This means that multiple versions of `scale-info` can implement the `TypeResolver` trait, and our core libraries (primarily `scale-encode`, `scale-decode` and `scale-value`) are no longer impacted at all by `scale-info` updates.
This should be left until everything is working well and we’ve found no obvious reason to update `scale-type-resolver`.
Future
With all of this in place, there may be some desire to update `subxt` to be more generic over how it handles historic types too, so that it can take on the task of fetching historic data as well as modern data, and is able to decode everything nicely. An advantage of `subxt` doing it all is that we avoid duplicating some of the logic around making RPC calls to nodes and decoding extrinsics/storage bits.
For now though, I think that it’s better to focus `subxt` on working at the head of some chain, and to keep functions for accessing historic data separate. Let’s see how things shape up in the next year or two!
Alternatives
I considered a couple of alternative approaches prior to this:
- Reusing `scale_info::PortableRegistry` as a means to store legacy type information, rather than being generic over it. I did not pursue this because being generic over type information gives us more overall flexibility, making it more likely that we can create legacy type information that is efficient to query, and generally less likely that we run into any major roadblocks (ie `PortableRegistry` not being able to handle generic “type names” like `Vec<u8>` in the way we’d like).
- Being more generic! We’ve taken the approach of being generic over the structure that resolves type IDs into the corresponding type information, but we could have gone further and been generic over the entire process of decoding types (ie having a `TypeDecoder` trait that takes in any ID and returns some decoded thing). This was the original plan, as it allows complete flexibility over how we handle historic type decoding, but I abandoned it when I realized that it would lead to us duplicating a bunch of type encoding/decoding logic, and prevent us from using libraries like `scale-decode` in the way that I’d like.
Summary
- A `scale-type-resolver` crate has been added so that we can be generic over how we obtain type information. `scale-encode` and `scale-decode` now use this instead of directly depending on `scale-info`. `scale-value` and `subxt` are heading this way too (well, `subxt` will still depend on `scale-info`, but it’ll use up-to-date versions of things). Expected in a couple of weeks.
- We’ll build a `scale-info-legacy` crate for providing historic type information. Expected some time in Q2 2024.
- We’ll build a new `desub` crate to contain all of the high level interfaces we’ll want for fetching and decoding historic data. Expected by Q3-Q4 2024.
If you’ve read this far, then well done! I’m open to any questions or thoughts on this.