This thread is a discussion / pre-RFC topic focusing on the Core primitive and where it causes friction when looking into the future of Coretime allocation.
Cores are an artifact of Polkadot’s architecture. Loosely, these are the goals that lead us to use cores:
- Rate-limiting the workload that new candidates impose on data availability and approval checking
- Decorrelating stake behind candidates
- Amortizing costs of eager validity proving
- Limiting each fork of the relay chain to a single fork of a state machine
- Providing predictable access to resources for parachains scheduled on Polkadot
However, the current design of cores at the scheduling layer had only two use-cases in mind:
- One parachain per core
- Cores reserved for sporadic on-demand workloads
The new paradigm covers the entire coretime demand distribution, from tiny consumers to enormous ones. Attempts at improving this, such as Coretime Scheduling Regions by rphmeier (Pull Request #3 · polkadot-fellows/RFCs · GitHub) and Idea: Core Groups (Issue #7441 · paritytech/polkadot · GitHub), have exposed enormous hidden complexity within previous assumptions. Here I break down what cores do for us in order to propose a necessary set of changes for the future.
What is the current definition of a Core?
A core is a numbered container within the relay chain’s state. This container is either empty or full. When empty, a new candidate may be placed inside it by the relay chain block author. It stays full until that candidate’s data is available or a timeout is reached. The process repeats. Behind the scenes, validators use the core index to determine which candidates to validate in approval checking.
We currently expect that validators are partitioned into backing groups, where each backing group is responsible for a single core, and the candidates backed by that group may only go into that core. This decision was made under the assumption of one parachain per core, not many parachains per core, many cores per parachain, or regular updates to this scheduling.
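As a rough illustration of the container model just described (a simplified Rust sketch with made-up names, not the actual relay-chain runtime types):

```rust
// Simplified sketch of the per-core occupancy state machine described above.
// All names are illustrative; they are not the actual relay-chain runtime types.

struct CandidateHash([u8; 32]);
type BlockNumber = u32;

/// Each numbered core is either empty or holds exactly one pending candidate.
enum CoreState {
    /// Empty: the relay-chain block author may place a newly backed candidate here.
    Free,
    /// Full: stays occupied until the candidate's data is available
    /// or the availability timeout is reached.
    Occupied {
        candidate: CandidateHash,
        /// Relay-chain block number after which the candidate is timed out and dropped.
        timeout_at: BlockNumber,
    },
}

impl CoreState {
    /// The block author's action when filling a free core with a backed candidate.
    fn occupy(&mut self, candidate: CandidateHash, timeout_at: BlockNumber) -> bool {
        match self {
            CoreState::Free => {
                *self = CoreState::Occupied { candidate, timeout_at };
                true
            }
            CoreState::Occupied { .. } => false,
        }
    }
}
```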
Elaborating on desired properties
(1) is related to availability and approval-checking and the fact that candidates are eagerly checked before finality. Overburdening validators with work would lead to potentially unbounded finality delays, which is undesirable. Therefore, the total rate of candidates must be bounded.
Formulation: the total data and execution burden on the network must stay below a set bound, and the individual validators’ share of these burdens should be equal in expectation
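One way to write this down more explicitly (notation is mine, not from any spec): with $B$ the set of candidates pending finality at any point, $d_i$ and $x_i$ the data and execution burden of candidate $i$, and $V$ the validator set,

$$\sum_{i \in B} d_i \le D_{\max}, \qquad \sum_{i \in B} x_i \le X_{\max}, \qquad \mathbb{E}[d_v] = \frac{1}{|V|}\sum_{i \in B} d_i \quad\text{and}\quad \mathbb{E}[x_v] = \frac{1}{|V|}\sum_{i \in B} x_i \quad \forall\, v \in V,$$

where $d_v$ and $x_v$ denote the data and execution work that ends up assigned to validator $v$.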
(2) is about Polkadot holding nodes accountable for attesting to invalid candidates. Once a node already expects to be slashed, the marginal cost of submitting additional invalid candidates is zero. That is why it shouldn’t be the same 1 or 2 nodes introducing all candidates to the system (gambler’s ruin: we don’t want nodes making e.g. 500 attempts at getting bad candidates approved for the price of 1 slash).
Formulation: the tokens backing each individual state transition pending finality should be sufficiently decorrelated (we can actually remove this, assuming disablement, according to @burdges)
(3) Each state transition carries some constant overheads, for example in erasure encoding, fetching, decoding, availability attestations, and in particular during approval checking. Amortizing these costs by bundling data packets and validity checking responsibilities for many small state transitions is quite valuable.
Formulation: the constant costs of data availability and state transition verification should be possible to amortize in order to accommodate small state transitions
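To make the amortization concrete (again with made-up notation): if each included candidate carries a roughly constant overhead $c_0$ (erasure coding, attestations, approval bookkeeping) on top of its size-dependent cost $c_i$, then including $k$ small state transitions separately versus as one bundle costs roughly

$$\underbrace{\sum_{i=1}^{k}(c_0 + c_i)}_{\text{separate}} = k\,c_0 + \sum_{i=1}^{k} c_i \qquad \text{vs.} \qquad \underbrace{c_0 + \sum_{i=1}^{k} c_i}_{\text{bundled}},$$

i.e. bundling saves on the order of $(k-1)\,c_0$ of fixed overhead per bundle.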
(4) By limiting each fork of the relay chain to a single fork of each parachain, and utilizing the fork-choice rule of the network to reorganize away from bad candidates, we can process and execute candidates (sending messages & updating head) optimistically prior to finality. This can be done as soon as data availability is concluded, as at this point approval-checking and disputes are possible.
Formulation: any fork of the relay chain must execute at most one fork of each parachain, and should do so immediately after data availability
(5) Scaling resources must be complemented by effective allocation of those resources. Granular, independent markets for data and execution are the holy grail here.
Formulation: parachains must be able to reserve data and execution resources with a bounded time horizon and reliably consume them at an expected rate
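As a loose illustration of what such a reservation could look like if data and execution were reserved independently (a hypothetical shape, not an existing Polkadot interface; compare the scheduling-regions RFC linked above):

```rust
/// Hypothetical shape of a resource reservation with data availability and
/// execution reserved independently; illustrative only, not an existing interface.
struct ResourceReservation {
    /// Parachain the reservation belongs to.
    para_id: u32,
    /// Relay-chain block range (the bounded time horizon) over which it may be consumed.
    start_block: u32,
    end_block: u32,
    /// Independently reserved budgets for the two underlying resources.
    data_bytes: u64,
    execution_time_ms: u64,
    /// Expected consumption rate per relay-chain block (bytes, milliseconds),
    /// so the budget can be drawn down reliably at a predictable pace.
    expected_rate_per_block: (u64, u64),
}
```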
Grading cores against each of these properties on a scale from 1 to 5:
(1): 5/5: Cores are excellent for rate-limiting data and execution burdens
(2): 5/5: Cores are excellent for decorrelating stake behind state transitions, by limiting the candidates that specific validators may back.
(3): 3/5: Cores are reasonably good at amortizing data availability and verification costs through bundling. This isn’t a 5/5, as arbitrary bundles aren’t supported, only bundles of state machines pre-scheduled on the core.
(4): 5/5: Cores fulfill the goal of fast execution of candidates immediately after availability
(5): 2/5: Cores function poorly for scheduling data and compute resources. Parachains on cores experience friction with other parachains on the same core while candidates are pending availability, and parachains seem to need some affinity to specific cores, at least while maintaining goal (2). These problems get even worse when a parachain attempts to consume resources from multiple cores simultaneously. Furthermore, cores assume a specific ratio of data & execution time rather than being informed by market supply & demand.
Based on this rubric, it’d seem that cores are, on balance, a good solution. However, goals (3) and (5) are highly significant to Polkadot’s value proposition of providing coretime to both large and small consumers. In reality, the current architecture makes sure that the system doesn’t use too many resources, but it doesn’t make good use of the system’s resources. It’s natural that our architecture has run into issues here, as it was built with the assumption that 1 parachain = 1 core (i.e. addressing only a thin slice of the blockspace demand distribution).
Solution Sketch
A potential solution: change the algorithm for determining “backing validators” and remove all core affinity for particular validators’ backed candidates. Push the responsibility of putting backed candidates into cores onto the relay chain block author itself.
Note that the key issues with solving (3) and (5) stem from addressing the entire coretime distribution:
- The long tail of low-coretime parachains. This maps roughly onto the bootstrapping or tinkering audiences which need low volumes of coretime. They may require occasional large candidates or very regular small ones.
- The fat tail of high-coretime parachains. This maps onto an audience of coretime consumers which are capable of utilizing, in practice, as much coretime as they can acquire. They require regular large candidates at a rate higher than the relay chain’s own block rate, e.g. consuming 4 or 8 entire cores.
- Coretime currently comprises two resources (data availability and execution time) expressed in an arbitrary ratio. The actual demand distributions and supply capabilities of the network may not match this ratio, which makes coretime in its current instantiation an imperfect good.
The frictions we experience in (3) and (5) map onto scheduling tools imagined for those user groups:
- Bundling: When many small state transitions are backed by the same validators, they can be bundled and amortized by placing them all into the same core simultaneously. Data and execution burdens are combined. (See the packing sketch after this list.)
  - Bundling achieves better results when bundles can use “spillover” resources from other free cores and are not tethered to any specific core.
  - The bundling problem: Backing many small state transitions is only viable when there is significant overlap in their backers, for the purpose of bundling and amortizing their costs through availability and approvals.
- Multi-core parachains: Parachains utilizing multiple cores benefit from minimal ambiguity over backers and core affinity for particular state transitions.
  - The current definition of backing groups and core affinity requirements lead to huge scheduling inefficiencies.
  - The elastic scaling problem: When attempting to back a sequence of sequential candidates for a single parachain in parallel, core affinity for the parachain or an excessive number of potential backers causes significant friction.
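A minimal sketch of the bundling/spillover idea under these assumptions (types and the packing rule are illustrative; a real selection would also have to respect backing-group membership and goal (2)): the block author greedily packs small backed candidates into whichever cores still have room, rather than being limited to one pre-assigned core per parachain.

```rust
/// Illustrative types; not actual relay-chain primitives.
#[derive(Clone)]
struct SmallCandidate {
    para_id: u32,
    data_bytes: u64,
    execution_ms: u64,
}

/// Remaining capacity of one core within the current relay-chain block.
struct CoreCapacity {
    data_bytes: u64,
    execution_ms: u64,
}

/// Greedy first-fit packing: each small candidate goes into the first core that
/// still has room for it, allowing "spillover" into otherwise-free cores.
fn pack_candidates(
    candidates: &[SmallCandidate],
    cores: &mut [CoreCapacity],
) -> Vec<(usize, SmallCandidate)> {
    let mut placements = Vec::new();
    for candidate in candidates {
        for (core_index, core) in cores.iter_mut().enumerate() {
            if core.data_bytes >= candidate.data_bytes
                && core.execution_ms >= candidate.execution_ms
            {
                core.data_bytes -= candidate.data_bytes;
                core.execution_ms -= candidate.execution_ms;
                placements.push((core_index, candidate.clone()));
                break;
            }
        }
    }
    placements
}
```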
Setting this down concretely implies that the definition of backing group and the implicit core affinity stemming from it are the root of the problem. We are bringing in too many backers for multi-core parachains, introducing coordination overhead. Core affinity for particular backers reduces resource utilization in general. Small state transitions benefit from sharing backers.
Sketch of the solution:
- Backers should be assigned to parachains, not cores. Backers may be assigned to multiple parachains simultaneously.
- Backers’ assignments are shuffled regularly and deterministically to avoid liveness issues from offline or censoring backers (just as groups rotate across cores today).
- Any candidate (or bundle) can go into any core as long as it is backed by enough backers from the parachain’s group. The relay-chain block author makes a selection based on what they have received from gossip and is rewarded for packing cores optimally.
- Large parachains should have dedicated backers, and small parachains should share their backers with other small parachains. Ideally this property emerges as a continuous one, with some nice algorithm for assigning duties (a rough sketch follows this list).
- Some backing groups will be responsible for many small parachains. This is for the purpose of bundling (handling long-tail low-coretime parachains gracefully).
- Some backing groups may be responsible for a single parachain, and the number of backers in that group would be the minimum required to handle the number of cores needed for that parachain (handling fat-tail high-coretime parachains gracefully).
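A very rough sketch of how such an assignment could work, assuming a per-session deterministic shuffle and group sizes proportional to each parachain’s coretime. All names, constants, and the shuffle are stand-ins rather than a concrete proposal:

```rust
/// Minimum backers per core's worth of work (an illustrative constant).
const BACKERS_PER_CORE: usize = 3;

struct ParaDemand {
    para_id: u32,
    /// Whole cores' worth of coretime this parachain expects to use per relay-chain
    /// block; 0 stands for a parachain consuming only a fraction of a core.
    cores_needed: u32,
}

/// Assign backers to parachains (not cores): large parachains get dedicated groups
/// sized to their demand, tiny parachains share one group so their candidates can
/// be bundled, and the whole assignment is reshuffled deterministically each session.
fn assign_backers(
    mut validators: Vec<u32>, // validator indices active this session
    paras: &[ParaDemand],
    session_seed: u64,        // stand-in for the relay chain's randomness beacon
) -> Vec<(u32, Vec<u32>)> {
    assert!(!validators.is_empty(), "needs a non-empty validator set");
    let n = validators.len();

    // Deterministic per-session shuffle so offline or censoring backers rotate away.
    for i in (1..n).rev() {
        let j = (session_seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(i as u64)
            % (i as u64 + 1)) as usize;
        validators.swap(i, j);
    }

    let mut assignments = Vec::new();
    let mut shared_group: Vec<u32> = Vec::new();
    let mut cursor = 0usize;

    for para in paras {
        if para.cores_needed >= 1 {
            // Fat tail: dedicated group, sized to the parachain's core count.
            let size = (para.cores_needed as usize * BACKERS_PER_CORE).min(n);
            let group: Vec<u32> = (0..size).map(|k| validators[(cursor + k) % n]).collect();
            cursor = (cursor + size) % n;
            assignments.push((para.para_id, group));
        } else {
            // Long tail: all fractional-core parachains share one backing group,
            // so their candidates can be bundled by overlapping backers.
            if shared_group.is_empty() {
                shared_group = (0..BACKERS_PER_CORE)
                    .map(|k| validators[(cursor + k) % n])
                    .collect();
                cursor = (cursor + BACKERS_PER_CORE) % n;
            }
            assignments.push((para.para_id, shared_group.clone()));
        }
    }
    assignments
}
```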
With an algorithm that assigns backers to parachains more intelligently and adaptively, we get much closer to fully solving (3) and (5). It’s not perfect for (5), as Polkadot still sells fixed ratios of data and execution as coretime. However, the decoupling of backing groups from core indices is a useful step in the direction of granular markets here: we no longer have a hard requirement that a “core” even corresponds to a particular amount of resources.