Post Quantum Cryptography Roadmap for Polkadot and JAM

By research@web3.foundation

1. Introduction

This document lays the roadmap to making the Polkadot and JAM protocols Post Quantum-resistant (PQ-resistant).

While post-quantum attacks are not posing any practical threat to the security of Polkadot network currently, we want to stay ahead of any possible threat even if the risk lies in the future. We propose to secure Polkadot and JAM against potential post-quantum attacks by replacing cryptographic primitives used at the consensus layer and, in account, certification of these protocols with post-quantum ones. The central cryptographic primitives used at consensus layers are hashes, signatures, and VRFs. The hash functions currently used in Polkadot and JAM are deemed post-quantum secure and, therefore, could be used as they are. The signature and VRF primitives, however, need to be replaced or supplemented with a cryptographically secure one.

Organization: Section Signatures introduces the various signature schemes used by Post Quantum Polkadot and JAM. The post-quantum secure signature primitives we recommend are the lattice-based NIST standards ML-DSA, Falcon, and ML-KEM, with the former and latter formerly known as CRYSTALS Dilithium and Kyber.

Section VRF explains how to replace the current usage of Elliptic curve-based VRFs in Polkadot and JAM with post quantum secure solutions.

Additionally, to secure the transport layer, which makes communication possible between Polkadot nodes, the minimum primitives needed are signatures and the non-interactive key exchanges. As this problem is far from unique to Polkadot and JAM architecture, we could safely assume that post-quantum secure transport layer protocols will emerge and be standardized, and we can use them to secure the Polkadot transport layer. While this is out of the scope of this document, we will briefly examine this issue in the Section Transport Layer.

2. Signatures

The Falcon and Dilithium signatures are the two signature schemes approved by the National Institute of Standards and Technology as post-quantum secure signatures. We use both schemes in different parts of the Polkadot protocol to replace all signature schemes.

  • Falcon: Falcon offers a faster signature scheme with a smaller size. However, this scheme does not have a constant implementation time. As explained in the General Account Signatures section, Polkadot will use this scheme to handle accounts stored in the Polkadot state.
  • Dilithium: Dilithium offers a bigger and slower signature scheme. However, it has a constant-time implementation, which makes it suitable for a consensus protocol. The Validator Signatures section describes this in detail.

For the reference, we are listing various benchmarks comparing these schemes here:

2.1 Validator Signatures

We will use the Dilithium scheme as specified by NIST for all messages signed by a validator except for BEEFY messages. This includes GRANDPA signatures, for example. Note that this is different for the current situation in Polkadot, as some signatures are currently integrated with VRF schemes and so the signature scheme is chosen to facilitate that integration. As such, validator schemes are not harmonized. VRF primitives are handled differently in post-quantum secure Polkadot, making harmonizing all validator signature schemes possible.

We use the same scheme for block production, that is, we use this scheme to sign the block after generating it. However, the VRF aspect of Sassafras’s single leader election protocol, which dictates who is eligible to generate a block for a given slot, must be handled by a post-quantum secure protocol described in the Sassafras section.

2.2 Account Signatures

The Falcon signature scheme is going to be used to secure accounts in post-quantum Polkadot. The migration from the current account scheme to Falcon is explained in Account migration section.

2.3 Account Migration

A post-quantum secure FRI-based SNARK is used to prove that the account holder knows the seed that has been hashed to generate the elliptic curve private key without revealing the private key itself. The FRI proof of 100s of KB is a signature of knowledge of the seed used in HDKD, resulting in the private key corresponding to the account elliptic curve point. This exploits the fact that even though the secret key for elliptic curve signatures is not post-quantum secure, it is actually derived using hashes, which are post-quantum secure.

The user submits the relatively large FRI proof to demonstrate the secret seed’s knowledge and the association of the post-quantum secure public key to the same seed. The large proof size is not prohibitive, as this demonstration needs to be carried out once. Since most users already use standardized algorithms for deriving keys from the seed, many cold wallets can transition to post-quantum crypto without having each account set up PQ keys in advance.
This is based on a similar idea from this ETH Research post by Vitalik Buterin. However, Vitalik Buterin suggests using a proof of knowledge that also includes the correct derivation of the lattice public key from the seed. This leads to an unnecessarily larger circuit than one for a signature of knowledge.

Chiesa, Manohar, and Spooner show in their work that protocols with round-by-round soundness and knowledge are secure in the quantum random oracle model and Block et al. show in their work that this applies to FRI. A lot of post-quantum signature suggestions are based on NIZKs with far worse complexity in terms of circuit sizes than the hash based SNARKs that are now becoming very practical because they do not realize that it is known that this is secure.

3. Bridge Protocol: Post Quantum BEEFY

Validators will sign BEEFY finality messages using each post-quantum secure scheme supported by the networks of interest that we are aiming to bridge. This includes the post-quantum scheme supported natively by EVM.

Bridge implementors could then use the current random sampling BEEFY protocol to prove the Polkadot state on other networks using the corresponding native scheme. While a relayer could potentially use a post-quantum secure SNARK such as FRI to present proof of the Polkadot state to the bridged chain, we expect that random sampling protocol to be far more competitive against SNARK-based design due to the size of post-quantum SNARK proofs.

4. VRF

The use of VRF in the current Polkadot protocol is being replaced by various application of a post-quantum secure Randomness beacon, as introduced in the Randomness Beacon section.

4.1 Randomness Beacon

Polkadot offers an unbiasable hashed-based post-quantum secure randomness beacon, assuming the honesty and liveness of 1/3 of validators, using techniques similar to Shamir’s secret sharing.

The randomness beacon protocol is based on a multi-instance verifiable secret sharing (VSS) where every party will act as a dealer in one VSS instance and as a recipient for all VSSes initiated by other validators. A low-degree test makes it possible to recognize honest parties and to use their shares to reconstruct the randomness contribution using Reed Solomon error correction. Subsequently, the final random value will be a linear combination of secrets obtained from all VSSes. All stages of this protocol are post-quantum secure.

4.2 Elves

We plan to replace some usage of the VRF in Elves with a randomness beacon. To significantly speed up the current (pre-quantum) Elves protocol, we will use public randomness from the randomness beacon along side the VRF.

In the post-quantum case however, we no longer need VRFs for ELVES. We replace VRF usage entirely with a combination of the randomness beacon and commit/reveal using a Merkle tree. While we hide the randomness while commiting to it in the Merkle tree, while we know it is unbiasable based on the threshold of honest validators participating in generating the random beacon.

The following subsections explain each substage of the public randomness-based Elves protocol.

4.2.1 Pre-Elves signature

We assume that we might have multiple equivocations for an RC block.

All validators sign the first RC block they import but never sign a second one.

We choose that fork for Elves if two-thirds of validators sign on the same block. We abandon the fork entirely if we never reach two-thirds of the signatures. We also hash any validator who signs multiple equivocations here.

This already has a draft RFC.

4.2.2 Tranche zero reveals

After seeing a 2/3 pre-elves signatures on a block, each validator reveals their share for a random beacon for the block. When they say enough (over 1/3) shares, they construct the random beacon result $r$. This is used as a seed for a shuffle that assigns validators to cores in tranche zero so that each core has about the same number of auditors and each validator does about the same amount of audits.

4.2.3 Later tranche reveals

We have an incoming randomness $r$ from the random beacon we use for tranch 0.

All nodes commit a random type TrancheRand([u8;32]) for each core in each slot in the epoch, so a [[TrancheRand; CORES]; SLOTS]. These commitments consist of a Merkle root of a tree indexed by slots, whose leaves are Merkle roots of a tree over cores.

We hash the final TrancheRand with $r$ and some auth_id to determine the tranche for that code in that slot. This scheme requires the 2/3 pre-elves consensus signatures to be verified previously, because TrancheRand can only be used once.

A tranche reveal consists of (slot,auth_id,merkle_proof,tranche,TrancheRand). These messages self-authenticate using the on-chain commitments. Theoretically, the slot component of merkle_proof could be optimized out, but then the messages assume other messages arrive, requiring politeness care.

(In the pre-quantum setting, using VRFs as we do now vs the above scheme is a computation/bandwidth trade-off.)

4.3 Sassafras

To generate the Sassafras ticket, we concatenate the validator’s secret key with the VRF input from the Randomness Beacon. Then, we double-hash it using SNARK-friendly Poseiden hash. We prove the secret key’s validity, its belonging to the ring of validators, and the correctness of double hashing in a post-quantum secure FRI-based SNARK. The double-hashed value dictates the block producer of each slot. Other validators will verify the validity of the SNARK proof and assign slots to each ticket.
The validator includes the result of the first hash (the preimage of the second hash) in the block they will produce to prove the ticket’s ownership without the need for another expensive SNARK proof.

4.3.1 Ring VRF

The input to the VRF is the same as the current Sassafras: the random beacon output is concatenated with the ticket index.

The public key is the hash of the secret key. We would compute the VRF Output as

VRF Ouput = Hash(Hash(secret key || VRF Input) || public key)

We then prove in an FRI based SNARK with input the VRF Input and ring commitment that:

  • The public key is in the ring, probably with a Merkle proof,
  • The public key is the hash of the secret key and
  • The output is computed as above.

We claim the ticket by revealing the preimage of the VRF output, i.e., Hash(secret key || VRF Input) || public key, and putting this in the block’s header. Then, we use the validator signature scheme to sign it. To verify this, we need to look up the validator’s validator signature public key from the Sassafras public key.

When registering a public key, the validator must sign the Sassafras hash-based public key with their validator key.

5. Transport Layer

To turn the transport layer post-quantum secure, we require hybrid post-quantum key exchanges and certificates using on-chain certificates. Then, we use the exchanged keys to communicate using secure, audited external libraries that implement post-quantum secure TLS.

What happens next?

We are fleshing out more details about some of points above that will be rolled out in the next couple of weeks/months.

We propose to apply post-quantum changes first to Kusama same as the Zero-Knowledge tech turning Kusama into the cooler twin of Polkadot :smiling_face_with_sunglasses::bird:. We are preparing RFCs that will be soon posted to put the roadmap above into action. Stay tuned!

THE END

For a detailed explanation of the randomness beacon protocol, please see: https://hackmd.io/psN4gEpPRJmCGN0sFq36Mg.

9 Likes

This nicely outlines what we’ve all discussed for post-quantum polkadot.

A few nuances:

First, quantum computers would be extremely bad news, like discovering that “God is a fascist” and the global economy collapses bad. It’d rock if some super-determinism theory eventually triumps and disproves quantum computers. At present though, most physicists do think quantum computers shall happen eventually.

We thus plan for quantum computers now, even though we’ve no reason to think they’ll happen this decade. Encryption should adopt hybrid ECC + post-quantum key exchanges now, because otherwise messages sent today would stop being secure if a quantum computer gets invented in the future. Authentication ala blockchains must become post-quantum before quantum computers, but it’s mostly harmless if they remain vulnerable today. Post-quantum cryptography also helps us think somewhat further outside the box and better understand the full design space.

Second, those FRI based SNARKs that prove account ownership would be 100s of kb, so they become vastly more efficent in parachain (in-core) on Polakdot than they’d ever be on competing blockchains. There are some complexities here like current implementations of FRI based SNARKs were written for ETH scaling and most neglect zero-knowledge, but likely this improves over the next couple years.

Third, Sassafras tickets via FRI based SNARKs is a somewhat significant bandwidth cost, more than 100 kb per relay chain block, gossiped and verified by each validator. It’s also problematic that FRI based SNARKs often neglected zero-knowledge so far. We’d obviously pay those costs if quantum computers existed, but we’ve other earlier designs from several years ago when FRI based SNARKs looked even larger, including (1) running the Sassafras ticket verifier in-core, (2) a cheap slashing based design, and (3) do a non-PQ SNARK that’s still PQ anonymous, so then adversaries maybe leak their usage of a quantum computer. Anyways Sassafras tickets can be done in several rather different ways, but even minor changes make Sassafras a low priority.

Fourth, our ELVES plan pulls together many different threads. @AlistairStewart and I have always planned to handled RC equivocations better, even before ELVES was fully designed. We’d wilder ideas but simply adding some partial earlier concensus solves this easier. @AlistairStewart wanted earlier concensus for “messaging extractable vcalue” (MMEV) and other reasons for a couple years now. I’ve discussed commit & reveal ideas with @eskimor for ages, because our VRFs need considerable CPU time, but doing so requires partial earlier concensus. All this says, Polkadot/JAM should aim for the partial earlier concensus, ala Initial very rough draft for early pre-ELVES soft concensus by burdges · Pull Request #144 · polkadot-fellows/RFCs · GitHub irregardless of PQ.

Alright, so far this looks cut & dried, but there is one more thread: @AlistairStewart noticed that, if tranche zero assignments can be assigned by threshold randomness, then we could reduce approval checker assignments by 40%, aka increase core count by 2/3rds. In fact, we’d almost doubles core count by combining this with slightly better parameters elsewhere.

You need distributed key generation (DKG) for threshold randomness though. At a high level DKGs have this massive griefing/aborts problem, which people prevent by using expensive zk proofs. As parachains/services are transperent proofs, you can massively improve almost DKG by running in on polakdot. We’ve roughly three flavors:

  1. A regular shitty scalar DKG winds up much less shitty when run in-core. This would’ve exactly the same CPU time cost as our current tranche zero assignments. Also, we could employ that shitty scalar DKG for a bitcoin bridge.
  2. Alistair and Elizabeth Crites have welded a much nicer “aggregatable” DKG onto what’s really just a BLS signature. In gossip, those BLS signatures would waste considerable CPU time, so we’d wrap these inside another faster siganture. We think those annoyances sound smaller than the worse DKG, but if a bitcoin bridge was a higher priority, not so expensive, etc then maybe we’d feel differently.
  3. The PQ randomness beacon discussed here brings implementation concerns more like the shitty DKG story. We might choose it anyways even without the PQ issue, purely to save CPU time, but not necessarily.

In brief, we’ll probably choose 2 which seemingly delays both any bitcoin bridge or PQ, but this seems like a good compromise short-term. We still owe the parachain team some notes about why going first 1 and then 3 looks worse.

1 Like