Deferred execution of XCMP messages

I would like to discuss this Proof-of-Concept MR by @jak-pan et al. from HydraDX.

The current behaviour is to process any incoming XCMP message as soon as possible, subject to weight limits and QoS.
That MR implements a way to defer the execution of incoming XCMP messages on the para-chains’ side. This makes it possible to delay and/or rate-limit the execution of possibly harmful messages in order to protect the para-chain.

Quote:

> The main reasoning behind this is that we would like to protect Polkadot and Kusama users from toxic flow / XCM in the ecosystem without hindering the UX too much. By implementing a queue on XCM messages we can postpone processing of messages that come to our chain or are leaving the chain.
>
> This filter is optional and completely customizable. If you don’t implement it it will just pass the message through.

Please check out the aforementioned link for proposed example uses, such as volume limits and an unsupported-message queue.

Questions

Personally, I don’t know much about how these messages are normally used, or for what. So, in order to find a suitable solution, I would appreciate input from other builders on this issue.

  1. Do other para-chain teams have similar issues, or do they already have a solution in place?
  2. Only XCMP? The MR specifically addresses incoming XCMP messages. Should this be extended to DMP as well?

Requirements

(to be extended)

Properties that a possible solution has to fulfill.

  1. A way to filter incoming messages and decide whether to (see the sketch after this list):
  • Defer them for at least n blocks.
  • Stash them for manual inspection and processing.
  • Process them immediately (as is currently the case).
  2. A priority queue for deferred messages. Messages in this queue should not execute before their delay has expired, and should be FIFO for messages whose delay results in the same target block number.
  3. A storage for stashed messages. Messages in this queue should only be executable from a privileged origin, in random order.
  4. Usable by all para-chains: this depends on whether that is needed… please discuss.
  5. PoV-economical, since we are on a parachain :smile:
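
A minimal sketch of what such a filter hook could look like, assuming a hypothetical `XcmDeferFilter` trait and `FilterOutcome` enum. The names and the opaque byte-blob representation of the message are illustrative, not taken from the PoC:

```rust
/// Outcome of screening a single incoming XCMP message (hypothetical).
pub enum FilterOutcome {
    /// Execute right away, as happens today.
    Process,
    /// Put the message into a deferral queue for at least `blocks` blocks.
    Defer { blocks: u32 },
    /// Stash the message; only a privileged origin may execute it later.
    Stash,
}

/// Hypothetical filter hook a runtime could implement; `sender` is the
/// sending para-chain’s ID and `xcm` the still-undecoded message.
pub trait XcmDeferFilter {
    fn filter(sender: u32, xcm: &[u8]) -> FilterOutcome;
}

/// Default implementation: pass every message straight through.
impl XcmDeferFilter for () {
    fn filter(_sender: u32, _xcm: &[u8]) -> FilterOutcome {
        FilterOutcome::Process
    }
}
```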

Possible Implementations

  1. Extending the MessageQueue pallet. This would be nice, since we could re-use that logic in the DMP/UMP pallet if needed. But the MQ pallet is primarily designed to function as round-robin FIFOs… It already does something like the stashing requirement for permanently overweight messages: they are stashed for future manual inspection. We could probably extend that for other use-cases as well, but it should only happen rarely, since otherwise it uses too much PoV.
  2. Scheduler pallet: maybe this could be used for property 2, to execute deferred messages in a FIFO manner. I have not looked into whether that is really viable. There is also an edge-case which prevents the scheduler from guaranteeing the eventual execution of a scheduled task; I am not sure if this is an issue here.
  3. Some custom stashing and deferring logic similar to the PoC.

We have a WIP rate-limiting implementation for orml-xtokens that aims to rate-limit outgoing messages: impl rate limit for xtokens by wangjj9219 · Pull Request #854 · open-web3-stack/open-runtime-module-library · GitHub


I have been talking about it since last year ( [Transaction pause pallet · Issue #11626 · paritytech/substrate · GitHub](https://Github link)). We have implemented a similar idea in Polkadex, where there is a time delay for all transactions moving out of the Polkadex ecosystem.


OK, so we are finishing up on our side. We still need some polishing to allow manual execution and to tweak the params. We want to add more to the deferring logic (like a global limit and a reset button), but you can see how we generally want to use this.

TL;DR: look at incoming XCM messages and rate-limit incoming token transfers if the value of incoming transfers is “over” the pre-set limit. The limit is global per asset, is set as a rate per time period, and it is linear.

This means there is a FIFO queue of deferred messages.

Example
Rate limit: 10000 DOT tokens per 10 blocks.
Alice sends 10000 DOT from Acala to Basilisk; the tokens arrive straight away.
5 blocks pass.
Bob sends 10000 DOT from the relay chain to Basilisk. Since the limit is 10000 per 10 blocks and 5 blocks have passed, Alice’s volume has decayed to 5000, so Bob’s transfer puts us 5000 over the limit and we defer it by half the period: 5 blocks.
Charlie immediately sends another 10000 DOT; his message is deferred by the full period plus the time it takes to work off the backlog ahead of it, i.e. 10 + 5 blocks.
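
For concreteness, here is a rough sketch of the arithmetic behind such a linear rate limit, assuming the accumulated volume decays at `limit / period` units per block. The struct and names are illustrative, not the PoC’s actual types:

```rust
/// Hypothetical per-asset rate-limit state: `limit` units are allowed per
/// `period` blocks, and the accumulated volume decays linearly at
/// `limit / period` units per block.
struct RateLimit {
    limit: u128,       // e.g. 10_000 (whole DOT, for simplicity)
    period: u32,       // e.g. 10 blocks
    accumulated: u128, // volume counted so far
    last_updated: u32, // block at which `accumulated` was last decayed
}

impl RateLimit {
    /// Register an incoming transfer at block `now` and return by how many
    /// blocks its execution should be deferred (0 = execute immediately).
    fn defer_by(&mut self, amount: u128, now: u32) -> u32 {
        // Linear decay since the last update.
        let decay_per_block = self.limit / self.period as u128;
        let decayed = decay_per_block * (now - self.last_updated) as u128;
        self.accumulated = self.accumulated.saturating_sub(decayed);
        self.last_updated = now;

        self.accumulated += amount;
        if self.accumulated <= self.limit {
            0
        } else {
            // Defer until enough volume has decayed to be back at the limit.
            ((self.accumulated - self.limit) / decay_per_block) as u32
        }
    }
}

#[test]
fn example_from_the_post() {
    let mut rl = RateLimit { limit: 10_000, period: 10, accumulated: 0, last_updated: 0 };
    assert_eq!(rl.defer_by(10_000, 0), 0);  // Alice: executes straight away
    assert_eq!(rl.defer_by(10_000, 5), 5);  // Bob: 5_000 over the limit -> 5 blocks
    assert_eq!(rl.defer_by(10_000, 5), 15); // Charlie: 15_000 over -> 10 + 5 blocks
}
```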

The limits should be high enough not to prevent normal usage, but still discourage hackers from dumping tokens into our ecosystem.

This is the defer logic impl

This is the updated cumulus change (to not spam the original PR until ready)

And here are some integrations tests

We would like to know WDYT?


There are a few features and challenges that we need to work through.

  1. Determine the monetary value of an XCM. I would imagine we want different treatment for a message transferring 1 DOT vs 1M DOT. So we need some oracle to determine the monetary value of each asset. This doesn’t need to be accurate and doesn’t need to be constantly updated.
  2. Self-service fast track. For some use cases, such as LPing, people will transfer DOT in, put it into a DEX pool, and that’s it. It would be great if we could make such a use case completely bypass the rate limit somehow. So basically, a user can lock $1000 worth of tokens to fast-track a deferred message that XCMs in $1000 worth of tokens. Combined with some clever design, we could achieve the scenario where a user transfers 1M DOT, deposits it into a DEX pool, and has the LP token act as the deposit for the fast-tracked message. If for some reason governance decides this message is from a malicious actor, it can still confiscate the LP token to compensate the victims.
  3. Visibility. We need to be able to provide good visibility for users before they XCM an asset. If they know the message will be deferred for 24 hours, they may choose not to send it. Or we could implement a return feature to allow people to XCM an asset back while it is in the defer queue, but I would imagine the implementation could be relatively complicated. So the easier way is to build nice UI/UX to make sure all users understand exactly what is about to happen when they XCM assets.

  1. This is up to the implementor as it is configurable, but in the first version we have this limit set per asset in the asset registry, as a number of units of each individual asset. It can be updated via governance / the technical track, or later by some oracle provider as you said. The problem with an oracle is that when something happens it’s not hard to manipulate it, so it should be an oracle with a pretty long window.

  2. This specific scenario would defeat the purpose: if you want to LP with a hacked token, then you have already infiltrated the pool with toxic liquidity and can easily withdraw and circumvent the limit. But I agree there should be a fast track, or a track that can circumvent this. However, this is again up to the implementor in the version we propose. For example: Acala wants to provide ACA liquidity to the Omnipool; we know this will trigger the limit, so we can temporarily allow the Acala parachain root origin to circumvent the deferring logic, again in our own implementation. By default the deferring logic does nothing.

  3. Sure, this is really up to the frontend UX. We plan to add a chart showing the fullness of channels to our cross-chain UI and also to provide this as an SDK. The cancel-TX feature would be a nice upgrade; we can think about it.

XCM is a scripting language. Attempting to distill a script (read: XCM message) in terms of a value is, if not strictly impossible, a fool’s mission. If you restricted the kind of XCM messages which your chain could process to something totally trivial then it’s solvable, but I would presume that is out of the question long term since you’d lose so much functionality.

We already have the overweight queue which I think can adequately cover eventuality number 3. We now have the concept of Yield for queues (usable through the barrier subsystem) which pretty much covers number 2. And number 1 is the standard behaviour of the new queuing system. I therefore expect we can achieve the stated vision of this thread with fairly minimal changes to the APIs.

Cool, it looks like the new queue system will cover us in the future, so we can discard this temporary implementation.

Can we do something to make sure it fits our requirements? If it doesn’t, it might be enough to just pass the message to a custom queue with custom logic for releasing the messages.

For certain not everything is done; in particular the overweight queue does not execute a message in it at random, and there is no barrier return value to get a message placed in such a queue (this happens as part of the execution logic). However I’m not sure the random execution is strictly needed - as long as someone is paying for a message to be executed, I don’t see any problem with them being able to select which one it is. And it should be easy enough to add an API to allow a barrier to shunt a message into the overweight (read: “manual execution”) queue.

As for yielding, this is also not exactly what is asked for in this thread. As currently implemented, Yield functions on a transport origin, not on an individual message and basically says “go try and execute messages from other places before this one”. Actually yielding on a specific message would require new logic. I’d question how valuable it is to be able to yield on individual messages though to be honest. It’s also not clear what the exact semantics of a per-message yield is. It could be “postpone for 24 hours”, but then what if the chain is completely idle in those 24 hours - should it not be allowed to execute? What if there’s a lot of messages in 24 hours time - should it always execute?

One possibility would be to have multiple concepts of readiness. Right now we have basically two concepts of readiness in the new message queue: ready to try to execute autonomously; and overweight (cannot be executed autonomously but can be executed manually). The former is always in-order, the latter is generally out-of-order. There could perhaps be a third category in between them: messages temporarily suspended due to the exhaustion of some high-level resource (e.g. messages sent to some other chain, maximum value in Holding). We must be careful here not to make it dependent on analysing the effect of execution since we might otherwise end up doing unpaid-for computation which would open up a DoS vector, so the message would need to predeclare these needed resources ready for a barrier to read and the executor to police. We must also remember that operations expressible by XCM can also generally be expressible as a raw callable in Transact. Thus attempting to impose limits on high-level activity is fraught with loopholes and workarounds.
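
To illustrate, the three categories of readiness described above could be modelled roughly like this. The names are made up for this sketch and are not taken from pallet-message-queue:

```rust
// Illustrative aliases so the sketch is self-contained.
type ResourceId = u32;
type BlockNumber = u32;

/// Hypothetical readiness classification for a queued message.
enum Readiness {
    /// May be executed autonomously, strictly in order.
    Ready,
    /// Temporarily suspended because some pre-declared, high-level resource
    /// (e.g. per-asset volume, space in an outbound channel) is exhausted;
    /// becomes `Ready` again once the resource is replenished.
    Suspended { resource: ResourceId, retry_at: BlockNumber },
    /// Cannot be executed autonomously (overweight); may be executed
    /// manually, generally out of order, by whoever pays for it.
    Overweight,
}
```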

If we persevere with this plan, then barriers could then Yield not just the whole queue but also an individual message by skipping over it with the “immediate execution” pointer, but not with the “deferred execution” pointer. The question would then become when to begin executing the older, yielded messages from the deferred-execution pointer instead of messages in the immediate-execution part of the queue. Age (i.e. number of blocks passed since insertion or prior execution attempt) could be one way.
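
A very rough sketch of the two-cursor idea with age-based promotion, purely illustrative and not how pallet-message-queue is actually laid out:

```rust
/// One queue with two cursors: the immediate cursor skips over messages a
/// barrier has yielded, while the deferred cursor trails behind and picks
/// them up once they are old enough (hypothetical sketch).
struct DeferringQueue {
    messages: Vec<Enqueued>,
    immediate: usize, // next message for normal, in-order execution
    deferred: usize,  // oldest yielded message not yet serviced
}

struct Enqueued {
    payload: Vec<u8>,
    inserted_at: u32, // block of insertion or last execution attempt
    yielded: bool,    // set when a barrier yields this specific message
}

impl DeferringQueue {
    /// Pick the index of the next message to service. Yielded messages are
    /// only picked up again (via the deferred cursor) once they have aged at
    /// least `max_age` blocks.
    fn next_index(&mut self, now: u32, max_age: u32) -> Option<usize> {
        // Let the deferred cursor skip entries the immediate cursor already
        // executed (anything behind it that was never yielded).
        while self.deferred < self.immediate && !self.messages[self.deferred].yielded {
            self.deferred += 1;
        }
        // Prefer an old, yielded message once it has aged enough.
        if let Some(m) = self.messages.get(self.deferred) {
            if m.yielded && now.saturating_sub(m.inserted_at) >= max_age {
                self.deferred += 1;
                return Some(self.deferred - 1);
            }
        }
        // Otherwise continue in order, skipping anything that was yielded.
        while let Some(m) = self.messages.get(self.immediate) {
            self.immediate += 1;
            if !m.yielded {
                return Some(self.immediate - 1);
            }
        }
        None
    }
}
```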

I don’t think the random ordering is necessary when we introduce all of the below. It is important to prevent frontrunning when you first execute XCMs, but after we sort them out and decide that some messages should not be executed, they are already in a randomized order. We can place them into the other queue and use this order.

Everything else then seems reasonable, including the age part. We could then introduce some weight multipliers to these, i.e. the older the messages in the other queue are, the higher the priority to execute them.

The requirement of having a FIFO queue with volume limits then seems solvable in a slightly different way than we did it here.

Just to be clear, we only care about the first deposit on the chain, not Transact or anything else that can be chained afterwards. I think this was the rule (at least, in our barrier it needs to be the first instruction). The whole XCM should then be enqueued.
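
For illustration, a much-simplified version of the “only the first deposit matters” rule could look like this. The enum below is a stand-in for the real XCM instruction type, and all names are made up:

```rust
/// Stand-in for the real XCM instruction type; only the variants needed for
/// this sketch are modelled.
enum Instruction {
    ReserveAssetDeposited { asset: u32, amount: u128 },
    Transact,
    Other,
}

/// Inspect only the first instruction of an incoming message and report the
/// deposited (asset, amount), if any, to the rate limiter. Whatever is
/// chained after the first deposit is ignored for rate-limiting purposes;
/// the whole XCM is enqueued or deferred as a unit.
fn first_deposit(xcm: &[Instruction]) -> Option<(u32, u128)> {
    match xcm.first() {
        Some(Instruction::ReserveAssetDeposited { asset, amount }) => Some((*asset, *amount)),
        _ => None,
    }
}
```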