This is a Pre-RFC discussion on changing the CandidateReceipt collection of data structures used within Polkadot’s core technology: CandidateDescriptor, CandidateReceipt, CommittedCandidateReceipt, and PersistedValidationData.
Changing these structs is very painful, as they are deeply integrated into runtime storage, inherents, network protocols, and databases. It is therefore better to make many changes to this format at once rather than one at a time. The outcome of this discussion will be an RFC that updates the format and justifies each change.
I will edit this opening post over time to add changes as they are discussed in the thread below.
Context: Current Data Types
/// A unique descriptor of the candidate receipt.
pub struct CandidateDescriptor {
/// The ID of the para this is a candidate for.
pub para_id: Id,
/// The hash of the relay-chain block this is executed in the context of.
pub relay_parent: Hash,
/// The collator's sr25519 public key.
pub collator: CollatorId,
/// The blake2-256 hash of the persisted validation data. This is extra data derived from
/// relay-chain state which may vary based on bitfields included before the candidate.
/// Thus it cannot be derived entirely from the relay-parent.
pub persisted_validation_data_hash: Hash,
/// The blake2-256 hash of the PoV.
pub pov_hash: Hash,
/// The root of a block's erasure encoding Merkle tree.
pub erasure_root: Hash,
/// Signature on blake2-256 of components of this receipt:
/// The parachain index, the relay parent, the validation data hash, and the `pov_hash`.
pub signature: CollatorSignature,
/// Hash of the para header that is being generated by this candidate.
pub para_head: Hash,
/// The blake2-256 hash of the validation code bytes.
pub validation_code_hash: ValidationCodeHash,
}
/// Commitments made in a `CandidateReceipt`. Many of these are outputs of validation.
pub struct CandidateCommitments {
/// Messages destined to be interpreted by the Relay chain itself.
pub upward_messages: UpwardMessages,
/// Horizontal messages sent by the parachain.
pub horizontal_messages: HorizontalMessages,
/// New validation code.
pub new_validation_code: Option<ValidationCode>,
/// The head-data produced as a result of execution.
pub head_data: HeadData,
/// The number of messages processed from the DMQ.
pub processed_downward_messages: u32,
/// The mark which specifies the block number up to which all inbound HRMP messages are
/// processed.
pub hrmp_watermark: BlockNumber,
}
/// A candidate-receipt.
pub struct CandidateReceipt {
/// The descriptor of the candidate.
pub descriptor: CandidateDescriptor,
/// The hash of the encoded commitments made as a result of candidate execution.
pub commitments_hash: Hash,
}
/// A candidate-receipt with commitments directly included.
pub struct CommittedCandidateReceipt {
/// The descriptor of the candidate.
pub descriptor: CandidateDescriptor,
/// The commitments of the candidate receipt.
pub commitments: CandidateCommitments,
}
/// The validation data provides information about how to create the inputs for validation of a
/// candidate. This information is derived from the chain state and will vary from para to para,
/// although some fields may be the same for every para.
///
/// Since this data is used to form inputs to the validation function, it needs to be persisted by
/// the availability system to avoid dependence on availability of the relay-chain state.
pub struct PersistedValidationData {
/// The parent head-data.
pub parent_head: HeadData,
/// The relay-chain block number this is in the context of.
pub relay_parent_number: BlockNumber,
/// The relay-chain block storage root this is in the context of.
pub relay_parent_storage_root: Hash,
/// The maximum legal size of a POV block, in bytes.
pub max_pov_size: u32,
}
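To illustrate how the two receipt forms relate, here is a minimal sketch in which a CommittedCandidateReceipt is reduced to a CandidateReceipt by hashing its commitments, and a receipt is later checked against a full set of commitments. It assumes heavily trimmed stand-in types and uses std's DefaultHasher in place of blake2-256 over the encoded commitments. The helpers hash_stub, to_receipt, and matches are hypothetical names, not the polkadot-primitives API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 32-byte blake2-256 hash (assumption: a 64-bit std hash for brevity).
pub type HashStub = u64;

/// Trimmed stand-in for CandidateCommitments; the real struct has more fields.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CandidateCommitments {
    pub upward_messages: Vec<Vec<u8>>,
    pub head_data: Vec<u8>,
    pub processed_downward_messages: u32,
    pub hrmp_watermark: u32,
}

impl CandidateCommitments {
    /// Stand-in for "hash of the encoded commitments".
    pub fn hash_stub(&self) -> HashStub {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Trimmed stand-in for CandidateDescriptor.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CandidateDescriptor {
    pub para_id: u32,
}

#[derive(Clone, Debug)]
pub struct CandidateReceipt {
    pub descriptor: CandidateDescriptor,
    pub commitments_hash: HashStub,
}

#[derive(Clone, Debug)]
pub struct CommittedCandidateReceipt {
    pub descriptor: CandidateDescriptor,
    pub commitments: CandidateCommitments,
}

impl CommittedCandidateReceipt {
    /// Drop the full commitments, keeping only their hash (hypothetical helper).
    pub fn to_receipt(&self) -> CandidateReceipt {
        CandidateReceipt {
            descriptor: self.descriptor.clone(),
            commitments_hash: self.commitments.hash_stub(),
        }
    }
}

impl CandidateReceipt {
    /// Check that a full set of commitments is the one this receipt commits to.
    pub fn matches(&self, commitments: &CandidateCommitments) -> bool {
        self.commitments_hash == commitments.hash_stub()
    }
}
```

The compact receipt only needs to carry a hash, while anyone holding the full commitments can check them against it. Any field change (here hrmp_watermark) breaks the match.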
Commit to PoV Size
Motivation: provide the relay chain and validators with more information regarding the amount of data bandwidth candidates use. This allows for more optimal scheduling and networking.
Candidate descriptors currently only commit to the PoV hash, not the PoV’s size itself.
The descriptor should replace the pov_hash field with a pov_commitment field containing both the PoV hash and the PoV size in bytes:
pov_commitment: (Hash, u32)
Validators in backing, approval-checking, and disputes should reject candidates when the PoV’s size differs from the commitment. The runtime should reject backed candidates where the committed PoV size is greater than the maximum allowed.
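Here is a minimal sketch of the two checks described above, assuming pov_commitment is a (hash, size) pair. The PovCommitment struct, PovError, and both function names are hypothetical, and DefaultHasher again stands in for blake2-256. Validators hold the PoV and can check both fields. The runtime never sees the PoV, so it can only compare the committed size against max_pov_size.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for a blake2-256 hash (assumption: a 64-bit std hash for brevity).
pub type HashStub = u64;

fn hash_pov(pov: &[u8]) -> HashStub {
    let mut hasher = DefaultHasher::new();
    hasher.write(pov);
    hasher.finish()
}

/// Hypothetical form of the proposed pov_commitment: (PoV hash, PoV size in bytes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PovCommitment {
    pub hash: HashStub,
    pub size: u32,
}

impl PovCommitment {
    /// Build the commitment a collator would place in the descriptor.
    /// Assumes PoVs are far smaller than u32::MAX bytes.
    pub fn new(pov: &[u8]) -> Self {
        PovCommitment { hash: hash_pov(pov), size: pov.len() as u32 }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum PovError {
    SizeMismatch { committed: u32, actual: usize },
    HashMismatch,
}

/// Validator-side check (backing, approval-checking, disputes): the PoV must match
/// both the committed size and the committed hash. Size is checked first since it
/// is cheaper than hashing.
pub fn validate_pov(commitment: &PovCommitment, pov: &[u8]) -> Result<(), PovError> {
    if pov.len() != commitment.size as usize {
        return Err(PovError::SizeMismatch { committed: commitment.size, actual: pov.len() });
    }
    if hash_pov(pov) != commitment.hash {
        return Err(PovError::HashMismatch);
    }
    Ok(())
}

/// Runtime-side check: only the commitment is on-chain, so a backed candidate is
/// acceptable only if its committed size does not exceed the maximum.
pub fn check_backed_pov_size(commitment: &PovCommitment, max_pov_size: u32) -> bool {
    commitment.size <= max_pov_size
}
```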
Commit to execution time taken (needs discussion)
Motivation: provide the relay chain with information about how much time candidates take to execute. This allows for more fine-grained resource utilization and bundling of candidates into approval-checking workloads.
Measuring execution time exactly is difficult. I suspect this should be expressed either as an instruction count (once metering lands) or as a proportion of the maximum execution time allotted to a single candidate. Since the semantic meaning of this field may change over time, it’d be best to use a future-proof format.
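One way to keep this field future-proof is to make it an enum, so that new measurement kinds can be added later as new variants. The sketch below is hypothetical throughout. ExecutionCost, its two variants (which mirror the two options above), the parts-per-billion scale, and the greedy bundle function are illustrative choices, not part of any existing API.

```rust
/// Parts-per-billion scale for a share of the per-candidate execution budget.
pub const PARTS_PER_BILLION: u32 = 1_000_000_000;

/// Hypothetical versioned execution-cost commitment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExecutionCost {
    /// Share of the maximum execution time allotted to one candidate.
    ProportionOfMax { parts_per_billion: u32 },
    /// Metered instruction count (assumes PVF metering exists).
    InstructionCount(u64),
}

impl ExecutionCost {
    /// Normalise either variant to parts-per-billion of the per-candidate budget.
    /// `max_instructions` is the (assumed) instruction budget of one candidate.
    pub fn as_parts_per_billion(&self, max_instructions: u64) -> u32 {
        match *self {
            ExecutionCost::ProportionOfMax { parts_per_billion } => {
                parts_per_billion.min(PARTS_PER_BILLION)
            }
            ExecutionCost::InstructionCount(count) => {
                if max_instructions == 0 {
                    return PARTS_PER_BILLION;
                }
                // u128 avoids overflow in count * 1e9.
                let ppb = (count as u128) * (PARTS_PER_BILLION as u128) / (max_instructions as u128);
                ppb.min(PARTS_PER_BILLION as u128) as u32
            }
        }
    }
}

/// Greedy, in-order bundling of candidates (by index) into approval-checking
/// workloads whose summed cost stays within `budget_ppb`. A candidate that exceeds
/// the budget on its own still gets a bundle of its own.
pub fn bundle(costs_ppb: &[u32], budget_ppb: u64) -> Vec<Vec<usize>> {
    let mut bundles: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut used: u64 = 0;
    for (index, &cost) in costs_ppb.iter().enumerate() {
        let cost = cost as u64;
        if !current.is_empty() && used + cost > budget_ppb {
            bundles.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(index);
        used += cost;
    }
    if !current.is_empty() {
        bundles.push(current);
    }
    bundles
}
```

Normalising to a common scale is what would let the relay chain reason about candidates from different measurement regimes side by side. For example, three candidates costing 40%, 50%, and 30% of the budget would be grouped as [0, 1] and [2] under a one-candidate budget.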
Synchronous Composability Commitments
Motivation: allow parachains to be synchronously interoperable with each other by building parachain candidates which together perform an atomic operation.
This is done by allowing candidates to specify additional constraints on their acceptance by the relay chain, of the form “This candidate may only be included when a candidate from another parachain produces this commitment”.
I suggest adding fields to CandidateCommitments of the following form:
enum SyncCommitmentSource {
Pvf, // The commitment comes from the bare PVF of the parachain.
Accord(AccordId), // The commitment comes from an accord which the parachain participates in.
}
// 128 bit unique value - a "fingerprint" of the synchronous operation performed.
type SyncCommitment = (SyncCommitmentSource, [u8; 16]);
struct CandidateCommitments {
sync_commitments: Vec<SyncCommitment>,
required_sync_commitments: Vec<(ParaId, Vec<SyncCommitment>)>,
// ...
}
The sync_commitments are essentially statements of the following:
- This candidate has produced an extra commitment from the given source and with the given data.
The required_sync_commitments are statements of the following:
- This candidate may only be included when a candidate from each of the listed parachains is included, and each of those candidates produces the listed commitments.
- If any of the required candidates is reverted, this candidate must also be reverted.
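The two statements above can be read as a fixpoint over a set of candidates considered for inclusion together: repeatedly drop any candidate whose requirements are not met by the remaining candidates. Because dropping one candidate can invalidate candidates that depend on it, the same loop also models the rule that reverting a required candidate reverts its dependents. This is a minimal sketch that assumes at most one candidate per parachain in the set and uses u32 stand-ins for ParaId and AccordId. The includable function is a hypothetical name.

```rust
use std::collections::{HashMap, HashSet};

// u32 stand-ins for the real identifier types (assumption).
pub type ParaId = u32;
pub type AccordId = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum SyncCommitmentSource {
    Pvf,
    Accord(AccordId),
}

pub type SyncCommitment = (SyncCommitmentSource, [u8; 16]);

/// Only the two proposed fields of CandidateCommitments are modelled.
pub struct Candidate {
    pub sync_commitments: Vec<SyncCommitment>,
    pub required_sync_commitments: Vec<(ParaId, Vec<SyncCommitment>)>,
}

/// Returns the paras whose candidates can be included together. Candidates with
/// unmet requirements are dropped; since that can invalidate their dependents,
/// iterate until nothing changes.
pub fn includable(set: &HashMap<ParaId, Candidate>) -> HashSet<ParaId> {
    let mut included: HashSet<ParaId> = set.keys().copied().collect();
    loop {
        let dropped: Vec<ParaId> = included
            .iter()
            .copied()
            .filter(|para| {
                let satisfied = set[para].required_sync_commitments.iter().all(|(dep, needed)| {
                    // The dependency must still be included and produce every listed commitment.
                    included.contains(dep)
                        && needed.iter().all(|c| set[dep].sync_commitments.contains(c))
                });
                !satisfied
            })
            .collect();
        if dropped.is_empty() {
            return included;
        }
        for para in dropped {
            included.remove(&para);
        }
    }
}
```

Two candidates that require each other's commitments are included together, which is the atomic case. Removing either one causes the other to be dropped as well.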
Note that this format only ensures safety in synchronous composability. It is the responsibility of parachains participating in synchronous composability to maintain liveness. Liveness is a real concern here, as one parachain may gate itself on a commitment from another parachain which is never issued. Avoiding this likely requires the participating parachains to share collator infrastructure. Best practices for using synchronous composability need to be studied higher up the stack.
The relay chain runtime must limit the total number of sync commitments a single parachain candidate may issue, as well as the number of dependencies, either per commitment or in total, in order to keep candidate receipts bounded in size.
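A sketch of such a runtime check follows. The limit names and values are hypothetical, and reading "per commitment" as "per required parachain entry" is an assumption. SyncCommitment is simplified here to a (source tag, fingerprint) pair.

```rust
pub type ParaId = u32;
/// Simplified SyncCommitment: (source tag, 128-bit fingerprint).
pub type SyncCommitment = (u8, [u8; 16]);

/// Hypothetical runtime limits.
pub struct SyncLimits {
    pub max_commitments: usize,
    pub max_required_per_para: usize,
    pub max_required_total: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    TooManyCommitments(usize),
    TooManyRequiredForPara(ParaId, usize),
    TooManyRequiredTotal(usize),
}

/// Reject a candidate whose sync fields exceed any of the limits.
pub fn check_sync_limits(
    limits: &SyncLimits,
    sync_commitments: &[SyncCommitment],
    required: &[(ParaId, Vec<SyncCommitment>)],
) -> Result<(), LimitError> {
    if sync_commitments.len() > limits.max_commitments {
        return Err(LimitError::TooManyCommitments(sync_commitments.len()));
    }
    let mut total = 0;
    for (para, needed) in required {
        // Per-entry bound on commitments required from one parachain.
        if needed.len() > limits.max_required_per_para {
            return Err(LimitError::TooManyRequiredForPara(*para, needed.len()));
        }
        total += needed.len();
    }
    // Overall bound across all dependencies.
    if total > limits.max_required_total {
        return Err(LimitError::TooManyRequiredTotal(total));
    }
    Ok(())
}
```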
Sequence Numbers
Motivation: more efficient garbage collection and spam protection rules.
Parachains aren’t necessarily required to be chains, so parachain state transitions may be cyclical, e.g. A → B → C → A → … .
Spam protection must take this into account, which is a source of high complexity in the asynchronous backing implementation.
Adding a sequence number to candidate descriptors and having this be an input to the PVF would simplify spam protection enormously, especially for upcoming features such as bundling and elastic scaling.
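To show why a sequence number helps, here is a hypothetical per-parachain acceptance rule. A candidate is accepted only if its sequence number lies in a bounded window above the last included one, with a cap per sequence number. Because sequence numbers strictly increase, a repeated head such as A in A → B → C → A appears with a different sequence number each time, and garbage collection reduces to dropping everything at or below the last included number. SeqTracker, max_ahead, and max_per_seq are illustrative names, and the window rule is an assumption rather than part of the proposal.

```rust
use std::collections::BTreeMap;

/// Stand-in for a candidate hash (assumption).
pub type CandidateHash = [u8; 32];

/// Hypothetical per-parachain bookkeeping for unincluded candidates.
pub struct SeqTracker {
    last_included: u64,
    max_ahead: u64,
    max_per_seq: usize,
    pending: BTreeMap<u64, Vec<CandidateHash>>,
}

impl SeqTracker {
    pub fn new(last_included: u64, max_ahead: u64, max_per_seq: usize) -> Self {
        SeqTracker { last_included, max_ahead, max_per_seq, pending: BTreeMap::new() }
    }

    /// Accept a candidate only if its sequence number is in the window
    /// (last_included, last_included + max_ahead] and that slot has room.
    /// Pending candidates are thus bounded by max_ahead * max_per_seq.
    pub fn try_accept(&mut self, seq: u64, hash: CandidateHash) -> bool {
        if seq <= self.last_included || seq > self.last_included.saturating_add(self.max_ahead) {
            return false;
        }
        let slot = self.pending.entry(seq).or_insert_with(Vec::new);
        if slot.len() >= self.max_per_seq || slot.contains(&hash) {
            return false;
        }
        slot.push(hash);
        true
    }

    /// Record an inclusion and garbage-collect everything at or below it.
    pub fn note_included(&mut self, seq: u64) {
        if seq > self.last_included {
            self.last_included = seq;
            self.pending.retain(|&s, _| s > seq);
        }
    }

    pub fn pending_count(&self) -> usize {
        self.pending.values().map(Vec::len).sum()
    }
}
```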
This could go either into the candidate descriptor itself or into the PersistedValidationData. The former is simpler, but the latter is more efficient in terms of on-chain space used per candidate, since the descriptor only carries a hash of the PersistedValidationData.
Relay Chain State Read Commitments (needs discussion)
Motivation: Allow candidates to draw on data which is stored in the relay chain state.
This is useful, e.g., for smart contracts or other large blobs, which would then not need to be placed in the PoV. Candidates may commit to keys whose values should be read from the relay chain state and passed into the PVF as a parameter.
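A minimal sketch of the validator-side step, assuming the candidate commits to a list of raw storage keys and the relay-parent state is modelled as an in-memory map. A key that is simply absent yields None, which the PVF can handle. A missing state as a whole models the pruning problem discussed in this section: the validator then cannot assemble the PVF inputs at all. gather_state_reads and ReadError are hypothetical names.

```rust
use std::collections::HashMap;

pub type StateKey = Vec<u8>;
pub type StateValue = Vec<u8>;

#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    /// The relay-parent state is no longer available on this node.
    StatePruned,
}

/// Collect (key, value) pairs to pass into the PVF as an extra parameter.
/// `relay_parent_state` is `None` when the node has pruned that state.
pub fn gather_state_reads(
    keys: &[StateKey],
    relay_parent_state: Option<&HashMap<StateKey, StateValue>>,
) -> Result<Vec<(StateKey, Option<StateValue>)>, ReadError> {
    let state = relay_parent_state.ok_or(ReadError::StatePruned)?;
    Ok(keys.iter().map(|k| (k.clone(), state.get(k).cloned())).collect())
}
```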
The difficulty here is ensuring that the data is still available at every point where a candidate may still need to be executed, as relay chain full nodes prune state. While nodes don’t prune data prior to finality, the relay parent of a candidate may already be finalized (and its state therefore prunable) even though the relay chain block in which the candidate is included is not yet finalized during approvals or disputes. This needs further discussion. Packaging the values for these state keys into the AvailableData is little better than having them appear in the PoV.