Generalized storage proofs

What do we want from storage proofs?

  1. Any classical Merkle trees should be binary, not monstrosities like ETH’s radix-16 trie. We can likely optimize how blake2 gets used here further too, since SPHINCS+ hashes a binary node in one 512-bit pass.
  2. SNARK-friendly hash functions bring strange arities like 4 or 9. These bring strange proof types too: the SNARKs themselves of course, but likely also odd non-SNARK proofs that compensate for the bad arity outside SNARKs, although some schemes still need classical Merkle proofs sometimes, even with bad arities.
  3. We need Merkle proofs into our own chain’s past state, sometimes only shallow, but some chains need deeper proofs, implemented via a skip list or Merkle mountain range, or occasionally an MMB when you optimize for recent history but still expose deep history. These should also sometimes be SNARK friendly, à la Zcash Sapling.
  4. We need Merkle proofs into other chains’ recent state, presumably by passing through a recent relay parent hash to relay chain state, and then passing on to the other chain’s state. We’ll sometimes want another SNARK-friendly access path across a family of chains though.
  5. Almost all storage proofs demand aggregation of some form: Merkle proofs unify as you go up the tree, of course, while KZG proofs are only really helpful once you batch them. It follows that proofs should be aggregated across the whole block, and thus live outside the core block in some PoV-like construct, but anything strange is still part of the block from Polkadot’s perspective, not part of the PoV data of course.
  6. It’s clear Merkle proofs do not aggregate across chains, so the real PoV data does not merge with this in-block PoV-like data for Merkle proofs. There exist proof types that aggregate across chains, but they are likely only relevant for swarms of zk parachains, so the block vs PoV distinction survives, I think.
  7. Individual txs that employ strange storage proofs should not sign over the storage proof body, because these proof bodies wind up aggregated away: either into another part of the block for Merkle copaths, or disappearing almost completely in KZG.
  8. Individual txs do however need to authenticate the strange proofs they use, which means they should sign over enough of the proofs’ leaves and roots. If not, we risk strange replay attacks changing votes’ roots or whatever.
  9. I’d expect txs must supply these proofs too, which means whatever generates them should run RPC calls against multiple chains to acquire the proofs. Or smoldot?
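To make point 1 concrete, here is a minimal sketch of a binary Merkle tree with copath proofs. All names are illustrative, and std’s `DefaultHasher` stands in for blake2 purely to keep the sketch self-contained; a real implementation would hash a 512-bit node in one go as the SPHINCS+ remark suggests.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake2: any 2-to-1 compression function works for the sketch.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

fn hash_leaf(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Root of a binary Merkle tree (power-of-two leaf count assumed).
fn merkle_root(leaves: &[u64]) -> u64 {
    if leaves.len() == 1 {
        return leaves[0];
    }
    let next: Vec<u64> = leaves.chunks(2).map(|p| hash_pair(p[0], p[1])).collect();
    merkle_root(&next)
}

/// Copath for leaf `index`: one sibling per level, bottom to top.
fn copath(leaves: &[u64], mut index: usize) -> Vec<u64> {
    let mut level = leaves.to_vec();
    let mut path = Vec::new();
    while level.len() > 1 {
        path.push(level[index ^ 1]);
        level = level.chunks(2).map(|p| hash_pair(p[0], p[1])).collect();
        index /= 2;
    }
    path
}

/// Verify by folding the copath onto the leaf: log2(n) hashes total,
/// versus a radix-16 trie where each proof node drags in up to 15 siblings.
fn verify(root: u64, leaf: u64, mut index: usize, path: &[u64]) -> bool {
    let mut acc = leaf;
    for &sib in path {
        acc = if index % 2 == 0 { hash_pair(acc, sib) } else { hash_pair(sib, acc) };
        index /= 2;
    }
    acc == root
}
```

The verifier touches one sibling per level, which is exactly why binary arity keeps proofs small and why copaths unify as you aggregate up the tree (point 5).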

I’ve made this sound scary and complex, especially by bringing up KZG, etc., but we should develop a feel for the larger design space, even though we won’t do it all at once. I think the 5+7 vs 8 issue feels fundamental to doing storage proofs well, but the block vs PoV distinction still kinda survives, like 6 says; it’s just that the block itself needs some internal proofs data structure.

We’ve had several use cases for this stuff in Polkadot itself, async backing and XCMP, and also some places we wisely chose to skip it. It comes up in Multichain friendly account abstraction and Interchain Proof Oracle Network too, as well as other lazy-messaging discussions brought up by parachain teams. What else?


Last week, as we discussed the recovery of parachains, I came up with this crazy idea of supporting calls into the wasm blobs of other parachains while validating a parachain block. The idea was to provide a common interface for verifying proofs that are opaque to the caller, but interpretable by the callee, aka the wasm blob of the parachain from which we extracted the proof. Relay chain validators already have all or most of the parachain wasm blobs pre-compiled, so calling them should be fairly cheap. However, this probably brings a lot of other problems with it that we don’t want to start with: simple issues like how much memory this sub-instance would get, or what we do if the validator hasn’t yet pre-compiled the wasm. There is probably an endless stream of issues with this. On the parachain side we would then also need to support importing these blocks without the blob of the other parachain being available anymore. While building, we would require these blobs and it should be fairly simple to get them, but at import it gets more complicated.

So, we should forget this idea. However, the general idea of having a standardized interface for checking these proofs sounds like something we could think more about. In general this goes in the direction of SPREE, I would say. Instead of using the actual parachain wasm blob to verify a certain proof, we could introduce wasm blobs that provide only certain functionality. We could experiment with using wasmi inside the runtime to execute these blobs, similar to what pallet_contracts is doing, but we could trust these blobs “more”. One of the questions is where these extra blobs are stored. Are they stored in the chain that wants to verify a proof? Then we would need to put these extra blobs into the PoV when sending it to the relay chain. Could we maybe put them into the relay chain and make them accessible from parachain validation? We would only need to provide the blob, and the execution would happen inside the validation blob using wasmi. A lot of this boils down to how big these blobs are and how fast execution with wasmi is.

Another question is which interfaces we would need to support. Maybe for certain things we could get away with some generic verify_proof(key: Vec<u8>, known_head: Vec<u8>, proof: Vec<u8>) -> Option<Vec<u8>>, where you pass the key and the proof and it returns the value it found for this key in the proof. In the chain that wants to have a proof of a different chain verified, the call doing this verification would take the proof and the expected_value that should be found in the proof. While writing this I realized that we would also need to know the key, and we cannot trust the caller of our call to give us the correct key. So, this interface wouldn’t really work and we would need much more specialized interfaces like extract_balance_from_proof(user: Vec<u8>, known_head: Vec<u8>, proof: Vec<u8>) -> u128. I’m not sure how many of these interfaces we would need.
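To illustrate why the specialized shape helps, here is a toy sketch of an `extract_balance_from_proof`-style verifier. The proof format here is entirely made up (one length byte, then the key, then a little-endian u128), and `known_head` is ignored; a real verifier would instead walk a Merkle copath down from `known_head`. The point is only that the callee, not the caller, decides which storage key the `user` maps to.

```rust
// Hypothetical specialized interface with a toy proof encoding:
// proof = key_len (1 byte) || key || balance as little-endian u128.
// A real implementation would verify a copath against `known_head`.
fn extract_balance_from_proof(user: &[u8], _known_head: &[u8], proof: &[u8]) -> Option<u128> {
    let key_len = *proof.first()? as usize;
    let key = proof.get(1..1 + key_len)?;
    // The verifier derives/checks the key from `user` itself, so a malicious
    // caller cannot point a valid proof at someone else's storage entry.
    if key != user {
        return None;
    }
    let raw: [u8; 16] = proof.get(1 + key_len..1 + key_len + 16)?.try_into().ok()?;
    Some(u128::from_le_bytes(raw))
}
```

The generic `verify_proof` fails precisely because this key check would live in the untrusted caller; pushing it into the callee is what makes the specialized interface sound.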

See this post as some kind of brain dump. It doesn’t really give any great solution, just some ideas to think about. I hope this inspires someone else to come up with a great idea 🙂 Or we can just iterate on these ideas, try them, and see where they bring us. I think in general, with runtime upgrades etc., we are really well suited to play around with ideas and come up with solutions 🙂


Why does wasmi help there? Contracts need metering, but I’m not sure why this requires metering. Or did you mean the foreign wasm blob is untrusted, so not like SPREE?

I’m firstly interested in a parachain being able to do proofs about its own past, because that’s the form in which this appeared in Polkadot, and it’s what all anonymous payment schemes do in zk, so I expect it to appear in that form again.

You can also have chains talking in some “more efficient/lazy than messaging” way if they can both understand some portion of one another’s state, thanks to some standard.


Very interesting discussion, and I will definitely keep monitoring it to see how it develops. Say we want to implement a Merkle-based proving scheme, with one blockchain generating a Merkle proof and another blockchain verifying it against some root they have received from a trusted source: what are the tools currently available to parachain developers? I mentioned this already in Multichain friendly account abstraction - #16 by ntn_x2, but I feel this is probably a better space. We at KILT would be willing to take what is currently available, see how it fits what we have set out to do, and report back, potentially proposing new/better ways of achieving this.


We want some non-merklized ephemeral/transient storage, à la

We’ve little reason to impose a substrate storage interface upon this ephemeral/transient storage. In fact, arbitrary Rust types should make sense, so batch verification could look like:

```rust
use std::sync::Mutex;

/// Ephemeral storage: a signature batch accumulated across the block,
/// never written to merklized state.
static EPHEMERAL_STORAGE: Mutex<Option<schnorrkel::PreparedBatch>> = Mutex::new(None);
```
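A std-only sketch of the same pattern, with a plain `Vec<u64>` standing in for `schnorrkel::PreparedBatch` so it runs without the crate: each transaction pushes into the ephemeral store, and finalization drains and checks the whole batch at once. Names here are illustrative, not a Substrate API.

```rust
use std::sync::Mutex;

// Stand-in for the PreparedBatch above; the batch lives only for the block.
static EPHEMERAL_BATCH: Mutex<Vec<u64>> = Mutex::new(Vec::new());

/// Each transaction defers its expensive check into the batch.
fn push_to_batch(item: u64) {
    EPHEMERAL_BATCH.lock().unwrap().push(item);
}

/// on_finalize drains the batch; a real chain would run batch signature
/// verification here, then discard everything without touching state proofs.
fn finalize_batch() -> usize {
    let mut batch = EPHEMERAL_BATCH.lock().unwrap();
    let n = batch.len();
    batch.clear();
    n
}
```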

As for substrate non-zk storages, we could have one ephemeral FRAME storage root, into which on_initialize “mounts” the non-ephemeral storage roots from the block header. It resembles how Unix has a root filesystem, possibly in-memory, into which it mounts the filesystems one uses. It provides one common namespace for the FRAME DSL, but does not require each storage root to use the same storage proof system.
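A minimal sketch of that mount-table idea, with made-up names and `[u8; 32]` roots; nothing here is an actual FRAME API. Key resolution splits off a top-level prefix, just like a path lookup against a mount point.

```rust
use std::collections::HashMap;

/// One common namespace whose top-level prefixes map to independent storage
/// roots, each potentially backed by a different proof system.
struct MountTable {
    mounts: HashMap<String, [u8; 32]>,
}

impl MountTable {
    fn new() -> Self {
        MountTable { mounts: HashMap::new() }
    }

    /// on_initialize would call this once per root in the block header.
    fn mount(&mut self, prefix: &str, root: [u8; 32]) {
        self.mounts.insert(prefix.to_string(), root);
    }

    /// Resolve a full storage key to (root, remaining key under that root).
    fn resolve<'a>(&self, key: &'a str) -> Option<([u8; 32], &'a str)> {
        let (prefix, rest) = key.split_once('/')?;
        self.mounts.get(prefix).map(|root| (*root, rest))
    }
}
```

The FRAME DSL would only ever see the unified namespace; which proof system backs a given prefix is decided at mount time.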

Another example of where this would be useful is multi-block migrations.

You’d first add the new state root into the chain’s block header, which impacts nothing since the current chain never touches that state root. Next, your code upgrade changes its mount points, with both the old and new state roots going under temporary names, under which the migration code runs. After the migration concludes, the old state root is abandoned, and the new state root gets mounted where it’s accessible to user transactions.
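The remount dance could be sketched as follows, compressed into one function for brevity (the real migration would span many blocks). The mount map, the prefix names, and the hard-coded new root are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical migration-by-remount: park old and new roots under temporary
/// names while the migration runs, then expose the new root to users.
fn migrate(mounts: &mut HashMap<&'static str, [u8; 32]>) {
    let old = mounts.remove("state").expect("old root was mounted");
    let new = [9u8; 32]; // the freshly added root from the block header
    mounts.insert("migration/old", old);
    mounts.insert("migration/new", new);
    // ... multi-block migration code runs against the temporary names ...
    let migrated = mounts.remove("migration/new").unwrap();
    mounts.remove("migration/old"); // the old root is abandoned here
    mounts.insert("state", migrated);
}
```

User transactions only ever resolve through `"state"`, so they never observe the half-migrated roots parked under the temporary names.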