Unifying bulk and mid-term blockspace with strided regions

Here’s some more in-depth design about blockspace regions.

Solving these problems:

  • Parachains sharing access to cores without depending on each other to continue making blocks
  • Parachains making up for “missed” blocks (via asynchronous backing)
  • Parachains being assigned to multiple cores at the same time (elastic scaling)

On-chain Buffering of Blocks Pending Inclusion

The main insight and modification that allows this design is that there is no detriment to allowing more parachain candidates to be backed in a single block than there are cores. This has two caveats. First, there should be limits on how many blocks individual parachains can have backed. Second, there should be limits on how many parachain blocks can be backed in a single relay chain block. When blocks from a parachain are missed, the chain should have the opportunity to make up for the missed blocks later. But this has to be limited - if we didn’t get any parachain blocks until the last possible moment, we wouldn’t handle a single relay chain block with hundreds of thousands of candidates very well. But going to some multiple like 1.5x the number of cores is fine. Over the course of a session, the missed blocks will balance out.

Approval-checking requires that at most one block is included for every execution core. This implies that when handling “missed” blocks and accepting multiple blocks per core, we need additional cores to handle the overflow.

Regions

Each core will have some region creation mechanism. This mechanism is responsible for not creating more regions than the core can handle, i.e. never allocating regions such that the core would need to handle more than one parachain block per 6 seconds. Parachain slot auctions or the on-demand parachain orderbook are examples of region allocation mechanisms.

Every region has a frequency and a TTL, which can be used to calculate the capacity of the region in blocks. For example, a region with a frequency of 2 is expected to be able to accept a block every 2 relay chain blocks. If that region had a TTL of 10 blocks, its capacity would be 5. Regions also carry an explicit capacity, and the actual capacity of the region is the minimum of the possible capacity and the explicit capacity.
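As a quick sketch (assuming floor division for lifetimes that aren’t a multiple of the frequency), the capacity calculation is:

```rust
type BlockNumber = u32;

// Sketch: a region's usable capacity is the minimum of what its TTL and
// frequency allow and its explicit capacity.
fn region_capacity(ttl: BlockNumber, frequency: BlockNumber, explicit_capacity: u32) -> u32 {
    // One block is allowed every `frequency` relay-chain blocks, so a
    // lifetime of `ttl` blocks permits at most `ttl / frequency` blocks.
    let possible = ttl / frequency;
    possible.min(explicit_capacity)
}
```

With the example above, `region_capacity(10, 2, u32::MAX)` gives 5, and an explicit capacity of 3 would further limit the region to 3.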

Each region is assigned to a specific core and cannot move across cores. Regions can be split in two ways: contiguous splits and frequency splits. A contiguous split produces regions with the same frequency of blocks but nonoverlapping lifetimes. A frequency split produces regions with the same lifespan, but different frequencies, such that the total capacity of the split is less than or equal to the capacity of the region initially.
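To make the two split operations concrete, here is a hedged sketch. The exact split rules, rounding behavior, and capacity bookkeeping are my assumptions for illustration, not settled design:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Region {
    start: u32,     // first relay-chain block of the region
    ttl: u32,       // lifetime in relay-chain blocks, from `start`
    frequency: u32, // one block per `frequency` relay-chain blocks
    capacity: u32,  // explicit capacity in blocks
}

// Contiguous split: same frequency, non-overlapping lifetimes.
// `at` is an offset (in blocks) from the region's start.
fn contiguous_split(r: &Region, at: u32) -> (Region, Region) {
    let first = Region {
        ttl: at,
        capacity: r.capacity.min(at / r.frequency),
        ..r.clone()
    };
    let second = Region {
        start: r.start + at,
        ttl: r.ttl - at,
        capacity: r.capacity - first.capacity,
        ..r.clone()
    };
    (first, second)
}

// Frequency split (a "strided" split): same lifespan, halved block rate
// each, so the combined capacity cannot exceed the original's.
fn frequency_split_in_half(r: &Region) -> (Region, Region) {
    let half = r.capacity / 2;
    let a = Region { frequency: r.frequency * 2, capacity: r.capacity - half, ..r.clone() };
    let b = Region { frequency: r.frequency * 2, capacity: half, ..r.clone() };
    (a, b)
}
```

Note how a frequency split doubles `frequency` in both halves: each half produces at half the original rate, so the total rate is conserved.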

Frequency splits are what I referred to as “strided” splits previously.

Regions would be something like this:

struct Region<T: Metadata> {
    owner: ParaId,
    core: CoreIndex,
    start: BlockNumber,
    ttl: BlockNumber, // includes the start block

    produced: u32,
    capacity: u32,
    frequency: BlockNumber,

    metadata: Option<T>, // generic metadata for attaching more constraints onto regions
}

The expected number of blocks per parachain is (now - start) / frequency. A region is in surplus if expected > produced. A block can only be backed when a region is in surplus, and then only according to the region ordering rules defined below. A region expires at start + ttl, regardless of how many blocks were produced. When a region expires, no more blocks can be backed. The threshold of a region is start + (produced * frequency).
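These definitions translate directly into code. A minimal sketch, with field names following the struct above and saturating arithmetic assumed:

```rust
type BlockNumber = u32;

struct Region {
    start: BlockNumber,
    ttl: BlockNumber,
    produced: u32,
    capacity: u32,
    frequency: BlockNumber,
}

impl Region {
    // expected = (now - start) / frequency
    fn expected(&self, now: BlockNumber) -> u32 {
        now.saturating_sub(self.start) / self.frequency
    }

    // A region expires at `start + ttl`, regardless of blocks produced.
    fn expired(&self, now: BlockNumber) -> bool {
        now >= self.start + self.ttl
    }

    // In surplus: more blocks are expected than produced, capacity
    // remains, and the region has not expired.
    fn in_surplus(&self, now: BlockNumber) -> bool {
        !self.expired(now)
            && self.produced < self.capacity
            && self.expected(now) > self.produced
    }

    // threshold = start + (produced * frequency)
    fn threshold(&self) -> BlockNumber {
        self.start + self.produced * self.frequency
    }
}
```

For instance, a region with start 0, frequency 2, TTL 10, capacity 5, and 1 block already produced is in surplus at relay-chain block 5 (expected = 2 > produced = 1), and its threshold is 2.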

Every parachain can have up to some number of regions assigned to it, and these regions may exist on different cores.

Regions and Scheduling

There is an ordering function for regions we can use to determine which region the next parachain block goes to. This ordering function sorts, filters, and repeats all the regions assigned to a parachain. In practice, it should be calculated lazily.

The ordering changes only when regions expire, blocks are backed, or the relay-chain block number advances. A block for a parachain can be backed only when the first region in the ordering is in surplus. It is legal to back i blocks when there are i sequential regions in the ordering (the same region may appear multiple times, since the ordering can repeat regions).

The ordering function works like this: it repeatedly sorts the regions assigned to the parachain by threshold, with each region’s produced count incrementing by 1 each time it appears in the ordering, and regions are included only while threshold <= now and produced < capacity.
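A sketch of that ordering, using a min-heap keyed on threshold and a virtual `produced` count per region. Tie-breaking by region index and the omission of expiry checks are simplifications for illustration:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Clone)]
struct Region {
    start: u32,
    produced: u32,
    capacity: u32,
    frequency: u32,
}

impl Region {
    fn threshold(&self) -> u32 {
        self.start + self.produced * self.frequency
    }
}

// Returns the indices of the parachain's regions in scheduling order at
// relay-chain block `now`. A region repeats while `threshold <= now` and
// `produced < capacity`; `produced` is bumped virtually per appearance.
// Expiry (TTL) checks are omitted for brevity.
fn region_ordering(regions: &[Region], now: u32) -> Vec<usize> {
    let mut virt: Vec<Region> = regions.to_vec();
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = virt
        .iter()
        .enumerate()
        .filter(|(_, r)| r.threshold() <= now && r.produced < r.capacity)
        .map(|(i, r)| Reverse((r.threshold(), i)))
        .collect();

    let mut order = Vec::new();
    while let Some(Reverse((_, i))) = heap.pop() {
        order.push(i);
        virt[i].produced += 1;
        // Thresholds only grow as `produced` increments, so this terminates.
        if virt[i].threshold() <= now && virt[i].produced < virt[i].capacity {
            heap.push(Reverse((virt[i].threshold(), i)));
        }
    }
    order
}
```

For example, a frequency-2 region starting at block 0 (region 0) and a frequency-3 region starting at block 1 (region 1) interleave as [0, 1, 0, 0, 1, 0] at now = 6.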

(note: in order to prevent parachains from not producing any blocks until the end and then saturating all relay-chain blocks with extra candidates, it may be desirable to set a reasonable maximum on how far behind capacity a region is effectively allowed to be. this could be done, for instance, by reducing the explicit capacity or increasing produced more rapidly when behind)

This ordering gives us a highly-desirable stability property: parachain blocks at a particular height in the parachain stay stably assigned to the same region and therefore the same core, even as the relay-chain block advances. New relay chain blocks put more regions into surplus, but those are still ordered after the regions which were already in surplus.

We don’t need to worry about ordering of regions per core, at least in the runtime, because:

  • Regions cannot move across cores
  • The initial regions assigned to a core never exceed the capacity of the core.
  • Regions never enter into “debt”, i.e. they can never produce more blocks than they are allowed to

This means that no parachain can crowd out another on the same core indefinitely, and can only crowd out other parachains to the extent that they are in surplus.

That said, regions only give stable parachain block frequencies in the long term, not in the short term. For that reason, we might also maintain a region mapping per core or create economic incentives for relay-chain block authors to prove that they have provided useful backed candidates.

Nodes will need a mapping from cores to upcoming regions, at least to get a sense of which parachains they should be working on as validators. This implies that we need runtime APIs for nodes to gather information about regions assigned to cores as well as regions assigned to parachains.
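The API surface might look roughly like this. The trait and method names are hypothetical, shown against an in-memory mock rather than actual runtime storage:

```rust
type CoreIndex = u32;
type ParaId = u32;

#[derive(Clone, Debug, PartialEq)]
struct Region {
    owner: ParaId,
    core: CoreIndex,
}

// Hypothetical runtime API shape: validators query regions by core,
// collators query regions by parachain.
trait RegionsApi {
    fn regions_for_core(&self, core: CoreIndex) -> Vec<Region>;
    fn regions_for_para(&self, para: ParaId) -> Vec<Region>;
}

// In-memory stand-in for runtime storage, for illustration only.
struct MockRuntime {
    regions: Vec<Region>,
}

impl RegionsApi for MockRuntime {
    fn regions_for_core(&self, core: CoreIndex) -> Vec<Region> {
        self.regions.iter().filter(|r| r.core == core).cloned().collect()
    }
    fn regions_for_para(&self, para: ParaId) -> Vec<Region> {
        self.regions.iter().filter(|r| r.owner == para).cloned().collect()
    }
}
```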

We should limit the number of regions that are assignable to any particular parachain, since more regions increase the ordering overhead. Ordering is used in these places:

  • The ProvideInherent implementation to provide backed candidates to the chain such that they will all be accepted.
  • The node-side code of validators for determining which cores are assigned to which blocks (i.e. at which depth)
  • The node-side code of collators for the same
  • The block import logic (this is the only place calculating ordering is expensive)

The region metadata is left generic to support things like required collators.

Regions and on-demand parachains

While writing this, I realized that on-demand parachains can be implemented with this same regions approach. The on-demand orderbook is just another region allocation mechanism: it creates and assigns regions with a single-block capacity, a frequency (stride) of 1, metadata that restricts the required collator, and an expiry time of a few blocks.
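Put together, an on-demand order might materialize as a region like this. The constant values and the metadata type are illustrative assumptions:

```rust
type BlockNumber = u32;
type ParaId = u32;
type CoreIndex = u32;

// Illustrative metadata restricting which collator may author the block.
#[derive(Debug, PartialEq)]
struct RequiredCollator(String);

#[derive(Debug, PartialEq)]
struct Region {
    owner: ParaId,
    core: CoreIndex,
    start: BlockNumber,
    ttl: BlockNumber,
    produced: u32,
    capacity: u32,
    frequency: BlockNumber,
    metadata: Option<RequiredCollator>,
}

// An on-demand order as a region: single-block capacity, stride 1,
// expiring a few blocks after creation.
fn on_demand_region(para: ParaId, core: CoreIndex, now: BlockNumber) -> Region {
    Region {
        owner: para,
        core,
        start: now,
        ttl: 5,       // "a few blocks" -- illustrative value
        produced: 0,
        capacity: 1,  // a single block
        frequency: 1, // stride of 1
        metadata: None,
    }
}
```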

This also cleanly supports the “retries” problem for on-demand - because regions don’t require parachain blocks to be backed in a specific relay-chain block, the size of the region is the effective number of “retries” the parachain gets. Asynchronous backing makes this clean.

It also allows for stacking on-demand orders, even across different cores and collators, just by giving them different start blocks.
