Cross-Consensus Query Language (XCQ)

XCM provides a way to interact with a chain (write operations), but we still need another subsystem to query information (read operations). A previous discussion can be found here: XCM as a Standard for Reading And Interacting with Parachains - #19 by gavofyork

I would like to kick off the design and development of XCQ, hence this post.

Design Draft

Key highlights

  • Use PolkaVM for execution
    • So that we can reuse the tooling for PolkaVM
  • Extension based design
    • To make it extensible and flexible
    • New functionality can be added without modification of the XCQ core protocol
    • Ensure the minimal scope of the core protocol
  • Hash based extension ID
    • No need to predefine extension IDs, which would require a global registry and could be a blocker
    • More decentralized as anyone can just use whatever extension they want
    • We can still have a global registry gathering all known extensions for discussion and discovery purposes, but it won’t be a blocker for new extensions.
  • Generic
    • With a built-in meta type system, we don’t limit how chains define data structures. e.g. The AssetId can be numeric, an XCM Location, or an enum.
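
To make the hash-based extension ID idea concrete, here is a rough sketch of how an ID could be derived purely from an extension’s metadata, with no registry involved. All names here are hypothetical, and `DefaultHasher` merely stands in for a real cryptographic hash (e.g. blake2) over a canonical encoding:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical extension descriptor. The ID is derived purely from the
// extension's canonical metadata, so no global registry is required.
struct ExtensionMeta {
    name: &'static str,
    methods: &'static [&'static str],
}

fn extension_id(meta: &ExtensionMeta) -> u64 {
    // Illustration only: a real design would use a cryptographic hash
    // over a canonical encoding; DefaultHasher stands in here.
    let mut hasher = DefaultHasher::new();
    meta.name.hash(&mut hasher);
    for method in meta.methods {
        method.hash(&mut hasher);
    }
    hasher.finish()
}
```

Any two parties hashing the same canonical metadata derive the same ID, which is what makes a global registry optional rather than required.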

Request for comments

I would like to reach out to the ecosystem to validate my design draft and ensure it covers all the use cases and can be adopted by all the teams.

Here are some specific questions:

  • Are there any additional key use cases that should be included, so that we can ensure they are supported?
  • PolkaVM vs custom DSL:
    • PolkaVM is currently chosen, but I would like additional confirmation that it is the best option and that it can indeed fulfil all the requirements
    • Program Size: A custom DSL can be very size-efficient, and that’s something we want for onchain message passing as we have limited bandwidth for HRMP / XCMP. Can we ensure that PolkaVM programs are reasonably small, so that sending them over HRMP / XCMP will not be too costly?
    • Program generation: It can be relatively easy to generate a program with a custom DSL. This is not the case for PolkaVM. While it is possible to generate the assembly code manually or via some helper utility, it is always going to be more involved compared to a DSL. We obviously cannot include a compiler in the runtime, and therefore some additional work may be required. On the other hand, maybe we can do something to avoid such a requirement, e.g. never compile programs onchain, but instead use pre-built template programs.
  • Extension based design and Hash based Extension ID:
    • While I think it is the best option, I would still like to ensure all the alternatives are sufficiently explored before making a final decision
  • Meta type system
    • It adds some complexity, but I think it is necessary to deal with such diverse use cases across all the chains
    • Do we want a custom one, or to reuse type-info? To reuse type-info, we will need to ensure it meets all the requirements, such as encoding efficiency
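
To illustrate the last question, here is a rough sketch of what a custom meta type descriptor might look like (all names are hypothetical; type-info would play a similar role if reused). A chain would publish one of these per type, e.g. for its AssetId, so a query program knows how to decode the chain-specific encoding:

```rust
// Sketch of a minimal meta type descriptor (names are hypothetical):
// enough for a consumer to know the shape of a chain-defined value.
enum MetaType {
    U32,
    U128,
    Tuple(Vec<MetaType>),
}

// Byte size of a fixed-width value described by `ty`.
fn encoded_size(ty: &MetaType) -> usize {
    match ty {
        MetaType::U32 => 4,
        MetaType::U128 => 16,
        MetaType::Tuple(fields) => fields.iter().map(encoded_size).sum(),
    }
}
```

A real descriptor would also need variable-length and enum support, which is where the encoding-efficiency question for type-info comes in.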

Repo

The research, PoC, and development will happen here:

8 Likes

This looks great - also, in terms of simplicity I’m very much in favor of having basic entry points like execute_program that can then call into different “extensions” of the runtime, rather than trying to bake them all into the language itself. This effectively makes XCQ a Turing-complete-with-gas smart contracting system, which is something I think all chains can benefit from.
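
To illustrate, a minimal host-side sketch of such an entry point might look like this (names are hypothetical, and the “program” is reduced to a single extension call rather than a real PolkaVM program):

```rust
use std::collections::HashMap;

// Hypothetical host-side shape of a single entry point: the runtime
// keeps a table of extensions keyed by their hash-based ID and routes
// calls to them.
type ExtensionFn = fn(&[u8]) -> Vec<u8>;

struct XcqHost {
    extensions: HashMap<u64, ExtensionFn>,
}

impl XcqHost {
    fn new() -> Self {
        Self { extensions: HashMap::new() }
    }

    fn register(&mut self, id: u64, f: ExtensionFn) {
        self.extensions.insert(id, f);
    }

    // Stand-in for `execute_program`: in the real design a PolkaVM
    // program would make these calls through host functions.
    fn execute(&self, id: u64, input: &[u8]) -> Option<Vec<u8>> {
        self.extensions.get(&id).map(|f| f(input))
    }
}
```

Unknown extension IDs simply return nothing, which is one way feature discovery could surface what a chain supports.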

2 Likes

If the plan is to use RISC-V then the program sizes will be small. The question is whether or not endianness is going to matter much (the EVM is big-endian).

It will depend on some implementation details. For example, ink! Wasm contracts were initially too big to be used by parachains, and a lot of work was done to reduce their size. I want to ensure we take that lesson and build this with program size in mind from the start.

But yeah, I see no reason why it can’t be small. It’s just a matter of how much extra work we may need to do to optimize it.

1 Like

A very valuable use case to support: DOT => USDT + USDC conversion

Get the DOT/USDT + DOT/USDC rates from AssetHub so that this ad hoc “10” here is not hardcoded: Set Asset Rates for Treasury Assets

I think looking through the benchmarks for PolkaVM would be a good place to compare binary sizes.

1 Like

I guess we could build some onchain asset metadata mechanism and have XCQ be able to query token info from there.

1 Like

Great initiative! One that I’ve been taking a look into myself, but bigger priorities came up :sweat_smile:.

These extensions remind me of dialects.

XCM does not have dialects (though it allows arbitrary execution via the Transact escape hatch) and the XCVM is not Turing-complete. That is by design.
We want XCM as a way to standardize things we do in Polkadot, or in general in consensus systems.
I’d imagine XCQ to have the same goals in mind.

That said, I’d use a DSL for XCQ, with the runtime API entrypoint you mentioned. I don’t think we need the PolkaVM. Then, new queries can be added via an RFC process like the one for XCM.

Then, both XCQ and XCM should be managed by either an ecosystem collective or an XC collective, to make sure we all agree on the standards.

The main idea of how new features are added is that a subset of users in the ecosystem will experiment on a new subset (so as to differentiate it from dialect) of features using Transact and then create an RFC to make sure it benefits the whole ecosystem.
We can create something similar for XCQ, have an escape hatch (QueryRaw?) that they can use to experiment on a new query type. Then, when that’s widely used or they think it will benefit the whole ecosystem, they put up an RFC to include it in the standard.

I think also relevant to note is that not every chain has to support every possible XCM instruction/XCQ query, but senders can send anything to anyone.
I’d want to see a way of exposing which instructions/queries your chain supports.

There’s power in having a standard, wallets can know that if they support XCM/XCQ then they’ll be able to do the same operations/queries on each chain that implements them.
A common pain point in the ecosystem is that everyone does things differently, we should focus on standardizing more.
We are already flexible enough, if you want to expose something custom (an extension) you can always create a runtime API.

Then, new queries can be added via an RFC process like the one for XCM

This is exactly something I want to avoid. The XCM RFC process takes forever and blocks innovation.

Then, both XCQ and XCM should be managed by either an ecosystem collective or an XC collective, to make sure we all agree on the standards.

From the lessons I’ve learned in the past few years, I figured it is very hard to agree on a standard without some (maybe non-standard-conforming) implementation being used in the wild first. It is often impossible to make the right decision without the backing of real-life usage data.

The main idea of how new features are added is that a subset of users in the ecosystem will experiment on a new subset (so as to differentiate it from dialect) of features using Transact and then create an RFC to make sure it benefits the whole ecosystem.

I don’t see that happening, so I want to try a different approach.

I think also relevant to note is that not every chain has to support every possible XCM instruction/XCQ query, but senders can send anything to anyone.
I’d want to see a way of exposing which instructions/queries your chain supports.

That’s exactly what the extension system solves. And I have had feature discovery detailed since the very first draft.

2 Likes

I have the very first PoC running

I have a simple PolkaVM program calling into the host and doing some simple calculation.

The hex of this PolkaVM program is:

50564d0001010400009000040e0100000000686f73745f63616c6c05070100046d61696e061500001002110703104e02775401100211081300694a00

I have a runtime that implements the XCQ Runtime API.

It implements the host function, executes the provided XCQ program, and returns the result.
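
For readers following along, the runtime API described above could be sketched roughly like this. Names and signatures are assumptions, not the PoC’s actual API, and the mock merely stands in for real PolkaVM execution:

```rust
// Hypothetical shape of the XCQ Runtime API: take an XCQ program plus
// input, execute it with the host functions wired in, return the result.
#[derive(Debug, PartialEq)]
enum XcqError {
    InvalidProgram,
}

trait XcqApi {
    fn execute_query(program: &[u8], input: &[u8]) -> Result<Vec<u8>, XcqError>;
}

// Toy implementation for illustration: the first byte of the "program"
// selects an operation (0 = sum the input bytes). A real runtime would
// instantiate and run a PolkaVM module here instead.
struct MockRuntime;

impl XcqApi for MockRuntime {
    fn execute_query(program: &[u8], input: &[u8]) -> Result<Vec<u8>, XcqError> {
        match program.first() {
            Some(0) => Ok(vec![input.iter().copied().fold(0u8, u8::wrapping_add)]),
            _ => Err(XcqError::InvalidProgram),
        }
    }
}
```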

9 Likes

If the XCM RFC process takes forever then we should make it go faster :grin:.
The polkadot-sdk release process also takes forever, we just need to make things faster.

I agree the maybe-not-standard needs to live in the wild before we can actually standardize things.

I think trying out a different approach is good. I just don’t think extensions are the whole reason we should do XCQ. If there’s a standard set of queries users can do, and there’s a process for extensions to become a part of them, then I think that’d be the way forward.

My argument is basically the same for dialects.

How small exactly are we talking about?

In general PolkaVM is optimized first and foremost for execution performance and compilation speed, and keeping the programs small is only a secondary goal. This isn’t to say that we don’t care about being compact - we certainly do (and I’m already using a bunch of tricks to keep the programs as small as I can), but it’s not the top priority.

For reference, last time I checked these were the numbers for a full blown Substrate runtime when compiled to WASM and to PolkaVM (don’t remember which exact runtime it was, possibly the Rococo runtime):

  • WASM: 625505 bytes
  • PolkaVM: 550476 bytes
  • WASM after wasm-opt -Oz: 536852 bytes

So as you can see, out of the box we should be mostly competitive with WASM.

For extremely tiny programs some further optimizations could be done by e.g. repacking the VM bytecode in a custom container and maybe compressing it with something like zstd while having a hardcoded dictionary (so it wouldn’t have to be transmitted), etc.

If we want Turing-completeness and solid gas metering and flexibility then I think PolkaVM could be a good choice.

If we don’t want/need Turing-completeness then there are probably better choices.

Well, for what it’s worth, the PolkaVM assembly is very simple, and could even be written manually relatively easily. But it is, like every assembly language, very low level, and maps directly to what the CPU executes. So if you want to map high-level concepts to it you’d definitely need some sort of a compiler, be it either rustc with a helper library or a custom DSL-like description that’d be compiled directly to PolkaVM assembly.

I and many others have tried to propose a token standard for Polkadot, but without much success. No one knows what a Polkadot standard is going to look like. With XCQ, we can define such a standard as an XCQ extension.
We can define a standard based on runtime APIs, but that doesn’t work for onchain consumption, so it only solves half of the problem.
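
As a sketch of what such a token standard extension might look like (purely illustrative names, not an actual proposal):

```rust
// Hypothetical shape of a token standard defined as an XCQ extension:
// every chain implements the same trait over its own storage, and both
// offchain and onchain consumers query it through XCQ.
trait FungiblesExtension {
    // AssetId stays chain-defined (numeric, XCM Location, enum, ...),
    // described by the meta type system, so it is passed as raw bytes.
    fn balance_of(&self, asset: &[u8], account: &[u8]) -> u128;
    fn total_supply(&self, asset: &[u8]) -> u128;
}

// Toy implementation standing in for a real chain's storage.
struct MockChain;

impl FungiblesExtension for MockChain {
    fn balance_of(&self, _asset: &[u8], account: &[u8]) -> u128 {
        // Toy rule for illustration: balance equals the account id length.
        account.len() as u128
    }
    fn total_supply(&self, _asset: &[u8]) -> u128 {
        1_000
    }
}
```

The point is that the trait, not any particular AssetId type, is what gets standardized under a hash-based extension ID.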

XCM dialects could be an alternative solution, but I don’t think anyone is working on them, so they are not really an option. Also, again, the XCM RFC process takes forever, and there is a good chance a version of XCQ can be deployed before we complete the XCM RFC process for XCM dialects…

1 Like

The reason I am concerned about program size is that ink! contracts were too big to be usable in a parachain context. I want to make sure we don’t hit a similar blocker in the future. But from what we have so far, it seems like we can keep a simple query program under 200 bytes. Not tiny, but acceptable. Also good to know that we can do extra optimizations in the future to reduce it further if needed.

For the program generation, I still need to do some case studies to see if it is an absolute requirement, and then some PoCs to see how feasible it is. I can already think of a few potential solutions, so hopefully it won’t become a blocker.

Yeah, so at these sizes a custom container would definitely make sense, since out of the box the default container has something like ~60 bytes of constant overhead. And by a “custom container” I mean something like this:

struct Blob {
    version: u8,
    // Raw code section stored inline; `Vec<u8>` here rather than the
    // schematic `[u8; N]`, since the length varies per program.
    code_section: Vec<u8>,
}

The polkavm crate doesn’t currently support creating a ProgramBlob from a custom container like this, but it will very soon since PolkaJam also needs this functionality.
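
A possible encoding for such a minimal container, assuming just a version byte followed directly by the raw code section (illustrative only, not the actual format):

```rust
// One version byte, then the code section with no other framing,
// so the constant overhead is a single byte.
fn encode_blob(version: u8, code_section: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + code_section.len());
    out.push(version);
    out.extend_from_slice(code_section);
    out
}

fn decode_blob(bytes: &[u8]) -> Option<(u8, &[u8])> {
    let (&version, code_section) = bytes.split_first()?;
    Some((version, code_section))
}
```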

3 Likes

With the newest PolkaVM master it should now be possible to have a custom container for programs (although it might still be a little bit janky, and this isn’t necessarily the complete final API).

There’s a new struct called ProgramParts which contains a partially deserialized PolkaVM program split into parts. So what you can do is to link a PolkaVM program with polkavm-linker as you’d normally do, and then call ProgramParts::from_bytes to split it into parts, and then only save those parts which are relevant to you. Then you can use ProgramBlob::from_parts to load it back up for execution, potentially leaving some of the fields empty and/or just hardcode them.

But as I said, this is not final and I will still be making improvements here. For example, ideally you’d most likely want to hardcode imports and statically assign each possible host function a static number, but currently the linker doesn’t yet give you a way to force-assign these. You’d also want to merge ro_data into rw_data as at these sizes there’s not much point in splitting them up (this also requires linker’s cooperation). I’ll let you know once I’ll make further improvements here.

2 Likes