Getting started using Rust and subxt for Polkadot data extraction

As part of the Parity Infrastructure & Data initiatives, we wanted to explore how we could use Rust, and more specifically the subxt library maintained by Parity, to work with Polkadot SDK/Substrate-based chains, both for reading data and for submitting extrinsics.

For instance, the current version of our data ingest takes a batch approach to storing blocks in our DotLake, mainly leveraging Python and the excellent py-substrate-interface library. This means that instead of getting the blocks in real time, we fetch them on a 10-minute schedule and run a daily check to QA the data. The technical decision was guided by two things:

  • Less maintenance if something goes wrong
  • Backfilling (i.e. fetching missed blocks) uses the same pipeline as the ingest

This helped us keep things simple, with a team of about 2 data engineers covering most Polkadot ecosystem chains. A more in-depth, although slightly outdated, write-up is available here.
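To make the backfilling idea a bit more concrete, here is a minimal sketch using only the Rust standard library. It assumes the numbers of the blocks already stored are available as a slice; `missing_blocks` is a hypothetical helper name made up for this illustration, not part of our pipeline or of any library.

```rust
use std::collections::BTreeSet;

/// Returns every block number in `first..=last` that is absent from `stored`.
/// `missing_blocks` is a hypothetical helper, written for illustration only.
fn missing_blocks(stored: &[u32], first: u32, last: u32) -> Vec<u32> {
    // A set gives cheap membership checks, even if `stored` is unsorted.
    let have: BTreeSet<u32> = stored.iter().copied().collect();
    (first..=last).filter(|n| !have.contains(n)).collect()
}

fn main() {
    // Pretend these are the block numbers a scheduled run managed to store.
    let stored = [100, 101, 103, 106];
    let gaps = missing_blocks(&stored, 100, 106);
    // The gaps are what a backfill run would need to fetch.
    println!("Blocks to backfill: {:?}", gaps);
}
```

Because the result is just a list of block numbers, the same fetch-and-store step can serve both the regular ingest and the backfill.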

This guide is the result of one attempt to keep maintenance low and reliability high while introducing ecosystem-wide real-time capabilities. This shows quite a lot of promise in the context of our new initiative around “Stabilizing Polkadot” (inspired by this forum post: Stabilizing Polkadot; more on that later). The need for real-time data processing is becoming more relevant as more features are rolled out.

The benefits include, but aren’t limited to, knowing immediately when something goes wrong and alerting on it, at the block level and at the scale of the complete ecosystem.

The following guide assumes little knowledge of Rust or subxt or even data engineering for that matter. It’s supposed to help anyone get started and hit the ground running by starting from simple basics to complement the already existing examples.

My personal belief is that the rate of adoption of something doesn’t only scale with its quality but also with the number of working examples available. The Python programming language is an example of this. But I digress, let’s dive in.


Subxt

Subxt is mainly focused on submitting transactions to any Substrate-based chain. Its secondary use is reading block data from the chains.

We’ll gradually introduce the concepts and at the end settle on a solution that doesn’t force us to re-architect our infrastructure. There are a few examples in the repository to get you started, but you’ll need to know where to look to find them. Hopefully this’ll help.

Note: We assume Rust and cargo are installed and functioning properly on your machine.

First steps

In an empty directory, run:

cargo new rust-blocks
cd rust-blocks
cargo run

If you’re seeing “Hello, world!” after running the last command, you’re good to proceed. Otherwise, something might have gone wrong while setting things up.

Now let’s install a few packages that we will need:

cargo add tokio --features tokio/rt --features tokio/rt-multi-thread --features tokio/macros
cargo add parity-scale-codec --features parity-scale-codec/derive --features parity-scale-codec/bit-vec
cargo add futures
cargo add subxt

This should install all the dependencies we will need for now. If you check the Cargo.toml file that is in the same directory, you’ll see your dependencies have been added in there.

Running cargo run again should give the same result as before.

Now, let’s write code that writes some output to the command line every time a new block is finalized on the Polkadot relay chain. Replace all the contents of the src/main.rs file with the following:

use subxt::{OnlineClient, PolkadotConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // here we connect to one Polkadot RPC endpoint
    let api = OnlineClient::<PolkadotConfig>::from_url("wss://rpc.polkadot.io:443").await?;

    // here we subscribe to newly finalized blocks
    let mut blocks_sub = api.blocks().subscribe_finalized().await?;

    // We keep the program running and every time we get a block
    // we print something out
    while let Some(_) = blocks_sub.next().await {
        println!("New block created! ✨");
    }

    Ok(())
}

And run cargo run again. You should see this at regular intervals:

New block created! ✨
New block created! ✨
New block created! ✨
New block created! ✨

What you’ll see isn’t very exciting, right? We are only informed that a new block was created, but we’re missing information about the blocks themselves. We can edit the above code to get the data for a block. Replace the contents of src/main.rs with this:

use subxt::{OnlineClient, PolkadotConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = OnlineClient::<PolkadotConfig>::from_url("wss://rpc.polkadot.io:443").await?;
    let mut blocks_sub = api.blocks().subscribe_finalized().await?;
    
    while let Some(block) = blocks_sub.next().await {
        // instead of dropping the block (by using _) we use it!
        let new_block = block?;
        let block_number = new_block.header().number;
        
        println!("New block #{block_number} created! ✨");
    }

    Ok(())
}

The output should be the following, roughly:

New block #20172722 created! ✨
New block #20172723 created! ✨
New block #20172724 created! ✨

At this point you might have noticed that even though the RPC (wss://rpc.polkadot.io:443) is essentially “something” that returns some data, we didn’t have to write any specific indexing code to get that information. The heavy lifting is already being done for us, at least for the parameters on the block level (like the number of the block). There are a few more parameters that you can already read:

let new_block = block?; 
let block_number = new_block.header().number;
// new_block.header() has more parameters, try a few out :) 
// hint: .parent_hash
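As a small illustration of why fields like the block number and parent_hash are useful, here is a sketch of a continuity check over a sequence of headers. It uses only the standard library and does not touch subxt: `MiniHeader` and `first_broken_link` are hypothetical names invented for this example, and u64 values stand in for the real 32-byte hashes to keep it short.

```rust
/// Simplified stand-in for a block header. Real headers carry 32-byte hashes
/// and more fields; u64 values are an assumption made to keep the sketch short.
#[derive(Debug, Clone, PartialEq)]
struct MiniHeader {
    number: u32,
    hash: u64,
    parent_hash: u64,
}

/// Returns the number of the first block that does not link to its
/// predecessor, or None if every block follows the previous one.
fn first_broken_link(headers: &[MiniHeader]) -> Option<u32> {
    headers
        .windows(2)
        .find(|pair| {
            // A block must directly follow the previous one and point at its hash.
            pair[1].number != pair[0].number + 1 || pair[1].parent_hash != pair[0].hash
        })
        .map(|pair| pair[1].number)
}

fn main() {
    let headers = vec![
        MiniHeader { number: 1, hash: 0xA1, parent_hash: 0xA0 },
        MiniHeader { number: 2, hash: 0xA2, parent_hash: 0xA1 },
        // This one claims a parent we have never seen.
        MiniHeader { number: 3, hash: 0xA3, parent_hash: 0xFF },
    ];
    println!("First broken link: {:?}", first_broken_link(&headers));
}
```

A check like this is one cheap way to notice a skipped or mismatched block while consuming a stream of headers.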

More data

Alright, now we have an example of code that gives us high-level block data (hash, number, etc.). Let’s go one level deeper and try to get events and extrinsics.

For more information on the data model and architecture of a block, I refer you to the Polkadot Wiki, which has a great guide here. For the sake of simplicity, we can say that a block is made of Extrinsics and Events. Extrinsics are stored as an array of function calls and their parameters. Events represent the information/results/artifacts produced by executing these Extrinsics. An Extrinsic can produce many Events.

With that said, how does that look in the code?

For extrinsics and events:

// Get all the extrinsics for a block
let extrinsics = new_block.extrinsics().await?;

// Get all the events for a block
let events = new_block.events().await?;

The full code:

use subxt::{OnlineClient, PolkadotConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = OnlineClient::<PolkadotConfig>::from_url("wss://rpc.polkadot.io:443").await?;
    let mut blocks_sub = api.blocks().subscribe_finalized().await?;
    
    while let Some(block) = blocks_sub.next().await {
        let new_block = block?; 
        let block_number = new_block.header().number;
        
        println!("New block #{block_number} created! ✨");

        let extrinsics = new_block.extrinsics().await?;

        let events = new_block.events().await?;

        for extrinsic in extrinsics.iter() {
            match extrinsic {
                Ok(extrinsic_details) => {
                    let extrinsic_index = extrinsic_details.index();
                    let extrinsic_name = extrinsic_details.pallet_name()?;
                    println!("Extrinsics: {extrinsic_name} #{extrinsic_index}");
                },
                Err(e) => {
                    println!("Encountered an error: {}", e);
                },
            }
        }

        for event in events.iter() {
            match event {
                Ok(event) => {
                    // note: this is the index of the pallet, not the position of the event in the block
                    let event_index = event.pallet_index();
                    let event_name = event.pallet_name();
                    println!("Event: {event_name} #{event_index}");
                },
                Err(e) => {
                    println!("Encountered an error: {}", e);
                },
            }
        }
    }

    Ok(())
}

Which will result in something like this:

New block #20584547 created! ✨
Extrinsics: Timestamp #0
Extrinsics: ParaInherent #1
Extrinsics: NominationPools #2
Event: Treasury #19
Event: System #0
Event: ParaInclusion #53
Event: ParaInclusion #53
... truncated ... 
Event: ParaInclusion #53
Event: System #0
Event: Balances #5
Event: Balances #5
Event: NominationPools #39
Event: Balances #5
Event: Balances #5
Event: VoterList #37
Event: Staking #7
Event: NominationPools #39
Event: Balances #5
Event: Treasury #19
Event: Balances #5
Event: TransactionPayment #32
Event: System #0

We’re getting somewhere. Now we can see Events, Extrinsics, pallet names and more. I’ve said previously that there is a certain mapping between Extrinsics and Events, which makes it possible to rewrite the code a little bit, as we want to know which Extrinsic fired which event.

Subxt conveniently makes the Events also accessible within an Extrinsic, meaning you can do this:

use subxt::{OnlineClient, PolkadotConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = OnlineClient::<PolkadotConfig>::from_url("wss://rpc.polkadot.io:443").await?;
    let mut blocks_sub = api.blocks().subscribe_finalized().await?;
    
    while let Some(block) = blocks_sub.next().await {
        let new_block = block?; 
        let block_number = new_block.header().number;
        
        println!("New block #{block_number} created! ✨");

        let extrinsics = new_block.extrinsics().await?;

        for extrinsic in extrinsics.iter() {
            match extrinsic {
                Ok(extrinsic_details) => {
                    let idx = extrinsic_details.index();
                    // here we get all the events for the current extrinsic we're looping over :D 
                    let events = extrinsic_details.events().await?;

                    println!("    Extrinsic #{idx}:");
                    println!("      Events:");
                
                    for evt in events.iter() {
                        let evt = evt?;
                
                        let pallet_name = evt.pallet_name();
                        // I've used variant_name here, can you figure out what it is? 
                        let event_name = evt.variant_name();
                
                        println!("        {pallet_name}_{event_name}");
                    }
                },
                Err(e) => {
                    println!("Encountered an error: {}", e);
                },
            }
        }
    }

    Ok(())
}

And the result looks like this, much easier to parse visually:

New block #20584603 created! ✨
    Extrinsic #0:
      Events:
        System_ExtrinsicSuccess
    Extrinsic #1:
      Events:
        ParaInclusion_CandidateIncluded
        ... truncated ...
        ParaInclusion_CandidateBacked
        ParaInclusion_CandidateBacked
        System_ExtrinsicSuccess
    Extrinsic #2:
      Events:
        Balances_Withdraw
        System_NewAccount
        Balances_Endowed
        Balances_Transfer
        Balances_Deposit
        Treasury_Deposit
        Balances_Deposit
        TransactionPayment_TransactionFeePaid
        System_ExtrinsicSuccess

Combining both methods above, you can start understanding which Extrinsic caused which Event and understand how actions translate into data and how this data ends up stored.
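To illustrate how this mapping can work conceptually: in Substrate-based chains, each event is recorded together with the phase in which it was emitted, and events emitted while applying an extrinsic carry that extrinsic’s index. The sketch below models this with plain Rust types. `Phase`, `Event` and `events_for_extrinsic` are simplified stand-ins written for this example and are not subxt’s actual types.

```rust
/// Simplified stand-in for the phase in which an event was emitted.
/// Events emitted outside any extrinsic use the first or last variant.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Initialization,
    ApplyExtrinsic(u32),
    Finalization,
}

/// Simplified event: real events also carry their encoded field values.
struct Event {
    phase: Phase,
    pallet: &'static str,
    variant: &'static str,
}

/// Returns "Pallet_Variant" labels for the events emitted by one extrinsic.
fn events_for_extrinsic(events: &[Event], extrinsic_index: u32) -> Vec<String> {
    events
        .iter()
        .filter(|e| e.phase == Phase::ApplyExtrinsic(extrinsic_index))
        .map(|e| format!("{}_{}", e.pallet, e.variant))
        .collect()
}

fn main() {
    // A hand-written subset of the events from the output above.
    let events = vec![
        Event { phase: Phase::ApplyExtrinsic(0), pallet: "System", variant: "ExtrinsicSuccess" },
        Event { phase: Phase::ApplyExtrinsic(2), pallet: "Balances", variant: "Withdraw" },
        Event { phase: Phase::ApplyExtrinsic(2), pallet: "Balances", variant: "Transfer" },
        Event { phase: Phase::ApplyExtrinsic(2), pallet: "System", variant: "ExtrinsicSuccess" },
    ];
    for label in events_for_extrinsic(&events, 2) {
        println!("{}", label);
    }
}
```

Filtering the flat event list by phase is enough to recover the per-extrinsic grouping shown earlier.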

This is all fine and dandy, but how about getting the actual parameters and values for these Extrinsics / Events? It’s one thing to know a transfer happened, but it’s much more interesting to know who sent what to whom. For this, we need to get the field values. Change the code as follows:

for evt in events.iter() {
    let evt = evt?;
  
    let pallet_name = evt.pallet_name();
    let event_name = evt.variant_name();
    // Get the field values for the event 
    let values = evt.field_values();
  
    println!("        {pallet_name}_{event_name}");
    println!("{:?}", values);
}

Unfortunately when you run this, you’ll get the following (truncated because it’s a lot):

     ParaInclusion_CandidateBacked
Ok(Unnamed([Value { value: Composite(Named([("descriptor", Value { value: 
Composite(Named([("para_id", Value { value: Composite(Unnamed([Value { value: 
Primitive(U128(3344)), context: 4 }])), context: 174 }), ("relay_parent", Value { value: 
Composite(Unnamed([Value { value: Composite(Unnamed([Value { value: Primitive(U128(51)), context: 2 
}, Value { value: Primitive(U128(96)), context: 2 }, Value { value: Primitive(U128(73)), context: 2 }, Value { 
value: Primitive(U128(166)), context: 2 }, Value { value: Primitive(U128(72)), context: 2 }, Value { valu
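The long runs of Primitive(U128(..)) inside relay_parent appear to be the individual bytes of a hash, each printed as its own value. As a small aid for reading such output, here is a standard-library sketch that turns a list of byte values back into the usual hex notation. `to_hex` is a hypothetical helper written for this example.

```rust
/// Renders bytes as a 0x-prefixed, lowercase hex string.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(2 + bytes.len() * 2);
    out.push_str("0x");
    for b in bytes {
        // Two hex digits per byte, zero-padded.
        out.push_str(&format!("{:02x}", b));
    }
    out
}

fn main() {
    // The first five byte values of relay_parent from the truncated output above.
    let prefix = [51u8, 96, 73, 166, 72];
    println!("{}", to_hex(&prefix)); // prints 0x336049a648
}
```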

This is as far as we can go here without talking about a key concept in Polkadot-based chains, called “Metadata”. As Polkadot-based chains can be upgraded seamlessly, the data types and underlying functions can change over time. To keep track of how to decode data across these changes, you need to download the “Metadata”, which contains information about the functions, their expected inputs/types and expected outputs/types.

Subxt needs this information and I’ll show you how to get it.
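One way to see why this type information matters: the SCALE encoding used by these chains does not describe itself, so the same bytes mean different things depending on the type you assume. The sketch below decodes SCALE’s compact integer format by hand, using only the standard library, and compares it with reading the same bytes as a fixed-width integer. `decode_compact` is a hypothetical helper written for this example; in practice the parity-scale-codec crate handles this for you.

```rust
/// Decodes a SCALE compact-encoded unsigned integer from the start of `input`.
/// Returns the value and the number of bytes consumed, or None if the input
/// is truncated or the value would not fit in a u128.
fn decode_compact(input: &[u8]) -> Option<(u128, usize)> {
    let first = *input.first()?;
    // The two lowest bits of the first byte select the encoding mode.
    match first & 0b11 {
        // Single-byte mode: the upper six bits hold the value.
        0b00 => Some(((first >> 2) as u128, 1)),
        // Two-byte mode: little-endian u16, shifted right by two bits.
        0b01 => {
            let b = input.get(..2)?;
            Some(((u16::from_le_bytes([b[0], b[1]]) >> 2) as u128, 2))
        }
        // Four-byte mode: little-endian u32, shifted right by two bits.
        0b10 => {
            let b = input.get(..4)?;
            Some(((u32::from_le_bytes([b[0], b[1], b[2], b[3]]) >> 2) as u128, 4))
        }
        // Big-integer mode: the upper six bits plus 4 give the payload length.
        _ => {
            let len = (first >> 2) as usize + 4;
            if len > 16 {
                return None;
            }
            let payload = input.get(1..1 + len)?;
            let mut buf = [0u8; 16];
            buf[..len].copy_from_slice(payload);
            Some((u128::from_le_bytes(buf), 1 + len))
        }
    }
}

fn main() {
    let bytes = [0x15u8, 0x01];
    // Read as a compact integer, these two bytes mean 69...
    println!("compact: {:?}", decode_compact(&bytes));
    // ...but read as a plain little-endian u16 they mean 277.
    println!("u16:     {}", u16::from_le_bytes(bytes));
}
```

Nothing in the bytes says which reading is the right one, which is the gap the metadata fills.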

Decoding chain data

In order to properly decode the events into what Rust calls “structs” directly from the chain data, you need to get the metadata of the chain. There are a few ways of doing this; the Subxt maintainers recommend the following:

# in the root of your current project, i.e. where you run "cargo run"
cargo install subxt-cli
subxt metadata --url="wss://rpc.polkadot.io:443" -f bytes > polkadot_metadata.scale

and add the following at the top of your src/main.rs file:

use subxt::{OnlineClient, PolkadotConfig};

#[subxt::subxt(runtime_metadata_path = "polkadot_metadata.scale")]
pub mod polkadot {}

// ... rest of the code

This gives you access to the Polkadot metadata and all the associated types, which you can refer to in your code. The polkadot module contains all the necessary information to parse the event data, generated from the chain’s metadata. As long as the metadata file is refreshed after a runtime upgrade, we know the events will be decoded properly.

Concretely speaking, this is how you’ll adapt the code:

use subxt::{OnlineClient, PolkadotConfig};

#[subxt::subxt(runtime_metadata_path = "polkadot_metadata.scale")]
pub mod polkadot {}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = OnlineClient::<PolkadotConfig>::from_url("wss://rpc.polkadot.io:443").await?;
    let mut blocks_sub = api.blocks().subscribe_finalized().await?;
    
    while let Some(block) = blocks_sub.next().await {
        let new_block = block?; 
        let block_number = new_block.header().number;
        
        println!("New block #{block_number} created! ✨");

        let extrinsics = new_block.extrinsics().await?;

        for extrinsic in extrinsics.iter() {
            match extrinsic {
                Ok(extrinsic_details) => {
                    let events = extrinsic_details.events().await?;
                
                    for evt in events.iter() {
                        // try to parse the current event into a Transfer Event
                        let parsed_transfer = evt?.as_event::<polkadot::balances::events::Transfer>()?;

                        // check if we have some valid transfer
                        match parsed_transfer {
                            Some(transfer) => println!(
                                "{:?} transferred {:?} to {:?}",
                                transfer.from.to_string(),
                                transfer.amount,
                                transfer.to.to_string()
                            ),
                            // note: this arm runs once for every event that is not a Transfer
                            _ => println!("No transfer events in this block")
                        }
                    }
                },
                Err(e) => {
                    println!("Encountered an error: {}", e);
                },
            }
        }
    }

    Ok(())
}

Letting the code run for a while, you’ll see this (the repeated “No transfer events in this block” lines are omitted here):

New block #20584829 created! ✨
"5HZJSop1HGenRkzi8Z2aTDN78ZUP5ucRHTDv77JjjHuTyy5T" transferred 19006618000 to "5CC9SLbSU6UDJR6hyg2BgNznysq3ougTv5XE9cUkqjB23h9R"
New block #20584830 created! ✨
"5GxjDgZmf4odT7QimmQNwCDjtPSjiRgfPfkcZxgFr7mynidY" transferred 1671018508000 to "5HMo5swE5ULUpG8yqEmLNTFPnhDuF5qq4r55KDc1GQFR7iJP"
"5DWMgEDbQv1YD2QewneV9nvsYh51Rmars25GzyMScFRhAxXt" transferred 124916764712 to "5F32MxdjmQ6vPxRVew8edmaS5mtbzWpzmpCzNohiXeXbYQck"
"5CwnR3cHszkDg48uXU4waYFQcB1fxWMmTvXYMZHtw6opshv1" transferred 10041892221 to "5CEsaZ9fEBPNURDTMw81kgZtMPkQEKtLGYxA7Nc5fhtbwUiw"

This decoding can be done independently, using for example the SCALE codec library directly, but subxt does most of the heavy lifting there and we can confidently use it to decode all the events and extrinsics data from the chain. This is absolutely essential for us data engineers to ensure precision and correctness of the data, as Rust helps enforce the types and there won’t be any surprises ending up in BigQuery, for example :wink:
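On the topic of precision: the amounts printed in the transfer output above are raw integers in the chain’s smallest unit. The sketch below converts them into a human-readable DOT value using integer arithmetic only, so no precision is lost to floating point. It assumes DOT’s 10 decimal places (1 DOT = 10^10 plancks); `format_dot` and `PLANCKS_PER_DOT` are hypothetical names written for this example.

```rust
/// Number of plancks in one DOT. Assumes DOT's 10 decimal places.
const PLANCKS_PER_DOT: u128 = 10_000_000_000;

/// Formats a raw planck amount as DOT, trimming trailing zeros.
/// Uses integer arithmetic only, so the result is exact.
fn format_dot(plancks: u128) -> String {
    let whole = plancks / PLANCKS_PER_DOT;
    let frac = plancks % PLANCKS_PER_DOT;
    if frac == 0 {
        return format!("{} DOT", whole);
    }
    // Pad the fractional part to 10 digits, then drop trailing zeros.
    let digits = format!("{:010}", frac);
    format!("{}.{} DOT", whole, digits.trim_end_matches('0'))
}

fn main() {
    // Amounts taken from the transfer output above.
    for amount in [19_006_618_000u128, 1_671_018_508_000, 10_041_892_221].iter() {
        println!("{} plancks = {}", amount, format_dot(*amount));
    }
}
```

Keeping the raw integer in storage and formatting only at display time avoids rounding surprises downstream.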

So now we can also leverage subxt to get data from the chains (through blocks or storage functions), and do so in a very performant way: by leveraging Rust, we don’t pay a performance tax for processing the data returned by the RPC (looping over all the accounts? Easy peasy).

There is a lot more to this (and more data you can extract), but this example should get you started fetching Polkadot data in real time, using Subxt! :smiley:

Let me know in case any of this doesn’t work. I made this because I couldn’t find an existing guide. Additionally, I hope it will rank #1 on Google for “subxt polkadot data”… Will it work?

Note: In case the formatting is a bit bad or confusing, I’m happy to port it to another place like GitHub (or a very long X thread :crazy_face: )

What’s next

This was just a first exploration of what is possible today using subxt in Rust. Of course, subxt can do much more than this, since it is intended for submitting transactions to the chains, not only reading data. But for our use case it’s perfect.

Next we can explore how to use this to develop maintainable, scalable (and extremely cost-effective) services for real-time data ingest, adding logging, monitoring and recovery when things go wrong (connection drops, timeouts, etc.).
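As a taste of what recovery could look like, here is a minimal sketch of retrying an operation with exponential backoff, using only the standard library. It is not tied to subxt: `backoff_delay` and `retry` are hypothetical helpers written for this example, and the operation is a closure that simulates a connection failing twice before succeeding.

```rust
use std::time::Duration;

/// Delay before the next try after failed attempt `attempt` (0-based):
/// base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `op` until it succeeds or `max_attempts` tries have been made,
/// sleeping between tries. `op` is always called at least once.
fn retry<T, E, F>(max_attempts: u32, base: Duration, max: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulated connection: fails on the first two attempts, then succeeds.
    let result: Result<&str, &str> = retry(
        5,
        Duration::from_millis(10),
        Duration::from_millis(100),
        |attempt| {
            calls += 1;
            if attempt < 2 { Err("connection dropped") } else { Ok("connected") }
        },
    );
    println!("{:?} after {} calls", result, calls);
}
```

In a real service the closure would wrap the reconnect or re-subscribe step, and the delays would be much longer than the ones used here.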


super valuable!! thanks so much for putting this together!


Thank you @bader, I am very glad you like it. I’ve also gotten confirmation that the guide actually works, so I’m relieved :stuck_out_tongue:


A small update, as I didn’t think things would pick up this fast: my hypothesis turned out to be true, and this article now ranks first on Google for “polkadot data subxt”:

And more interestingly, it ranks first for a better intent, “polkadot data rust”.

Takeaway & Next steps

Using the forum to rank on key educational topics should be encouraged (not abused), as it has a pretty strong domain authority.

Now, what will it take for things like “Polkadot data” to point developers to the right resources? Right now, the first hit for “Polkadot data” is a business unrelated to the Polkadot ecosystem and this bugs me a bit.

I think people more experienced in SEO could chime in with ideas for what we can do to improve developer experience and educational material, but from my point of view the solutions might be simple.

A potential goal could be to identify a list of keywords/intents for anything related to Polkadot, or for the top 25 things new developers would want to build on Polkadot, and have high-quality guides/tutorials/resources rank extremely well for them. Ideally, people could be compensated to make sure these resources stay up to date and always work, without any unnecessary indirection or implied knowledge.

If you want to help out and get something going here, even if minimal, please ping me and I’m sure we’ll do something nice. Meanwhile, have a lovely rest of the week :hugs:


Not a big thing, but your search results don’t match mine:


Ah, that’s a bummer, but great feedback, thank you for double checking. Perhaps it will pick up in a little bit depending on region (or other factors)?

Also, can you please try the query with “subxt” (without the t between b and x)? My hunch would be that the Google search algorithm might try to match “subtxt” with “subtext”.


Well, this is embarrassing: you’re completely right, and my typo wasted your time. Sorry Karim!


No worries at all, this didn’t waste my time. Happens to the best of us :hugs:


Karim,

Thanks so much for putting this together! I found it super useful as it happens to do exactly what I am trying to learn to do at the moment.

Just as a minor detail, when I ran your last example, it would repeat the “No transfer events in this block” line, seemingly for each non-transfer event in the block, which quickly became really noisy. I replaced that line with just this: _ => () which got rid of the extra lines and made the output easier to read.

Thanks again!


Thank you for the very useful tutorial, Karim. It would be great to see a similar one focused on creating dynamic transactions or managing data storage. I’ve found only a few examples in the subxt repo on how to do it: subxt/subxt/examples/storage_fetch_dynamic.rs and subxt/subxt/examples/tx_boxed.rs in paritytech/subxt on GitHub (at commit ddb5d4c9d7de07f9d02a832c4264b1a32e39eaf4).
