Generalized storage proofs

What do we want from storage proofs?

  1. Any classical Merkle trees should be binary, not monstrosities like ETH’s radix-16 trie. We can likely optimize how blake2 gets used here further too, since SPHINCS+ hashes a binary node in one 512-bit pass.
  2. SNARK-friendly hash functions bring strange arities like 4 or 9. These bring strange proof types too: the SNARKs themselves of course, but likely also odd non-SNARK proofs that compensate for the bad arity outside SNARKs, although some schemes still need classical Merkle proofs sometimes, even with bad arities.
  3. We need Merkle proofs into our own chain’s past state, sometimes only shallow, but some chains need deeper proofs, implemented via a skip list or Merkle mountain range, or occasionally an MMB when you optimize for recent history but still expose deep history. These should also sometimes be SNARK friendly, à la Zcash Sapling.
  4. We need Merkle proofs into other chains’ recent state, presumably by passing through a recent relay parent hash to relay chain state, and then passing on to the other chain’s state. We’ll sometimes want another SNARK-friendly access path across a family of chains though.
  5. Almost all storage proofs demand aggregation of some form: Merkle proofs unify as you go up the tree, of course, while KZG proofs are only really helpful once you batch them. It follows that proofs should be aggregated across the whole block, and thus live outside the core block in some PoV-like construct, but anything strange is still part of the block from Polkadot’s perspective, not part of the PoV data of course.
  6. It’s clear Merkle proofs do not aggregate across chains, so the real PoV data does not merge with this in-block PoV-like data for Merkle proofs. There exist proof types that aggregate across chains, but they are likely only relevant for swarms of zk parachains, so the block vs PoV distinction survives, I think.
  7. Individual txs that employ strange storage proofs should not sign over the storage proof body, because these proof bodies wind up aggregated away: either into another part of the block for Merkle copaths, or disappearing almost completely in KZG.
  8. Individual txs do however need to authenticate the strange proofs they use, which means they should sign over enough of the proofs’ leaves and roots. If not, we risk strange replay attacks changing votes’ roots or whatever.
  9. I’d expect txs must supply these proofs too, which means whatever generates them should run RPC calls against multiple chains to acquire the proofs. Or smoldot?
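To make point 1 concrete, here is a minimal sketch of a binary Merkle tree with copath proofs. All names are illustrative, and std’s `DefaultHasher` stands in for blake2 purely to keep the sketch self-contained; a real implementation would hash a 512-bit node in one go as the SPHINCS+ remark suggests.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for blake2: any 2-to-1 compression function works for the sketch.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

fn hash_leaf(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Root of a binary Merkle tree (power-of-two leaf count assumed).
fn merkle_root(leaves: &[u64]) -> u64 {
    if leaves.len() == 1 {
        return leaves[0];
    }
    let next: Vec<u64> = leaves.chunks(2).map(|p| hash_pair(p[0], p[1])).collect();
    merkle_root(&next)
}

/// Copath for leaf `index`: one sibling per level, bottom to top.
fn copath(leaves: &[u64], mut index: usize) -> Vec<u64> {
    let mut level = leaves.to_vec();
    let mut path = Vec::new();
    while level.len() > 1 {
        path.push(level[index ^ 1]);
        level = level.chunks(2).map(|p| hash_pair(p[0], p[1])).collect();
        index /= 2;
    }
    path
}

/// Verify by folding the copath onto the leaf: log2(n) hashes total,
/// versus a radix-16 trie where each proof node drags in up to 15 siblings.
fn verify(root: u64, leaf: u64, mut index: usize, path: &[u64]) -> bool {
    let mut acc = leaf;
    for &sib in path {
        acc = if index % 2 == 0 { hash_pair(acc, sib) } else { hash_pair(sib, acc) };
        index /= 2;
    }
    acc == root
}
```

The verifier touches one sibling per level, which is exactly why binary arity keeps proofs small and why copaths unify as you aggregate up the tree (point 5).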

I’ve made this sound scary and complex, especially by bringing up KZG, etc., but we should develop a feel for the larger design space, even though we won’t do it all at once. I think the 5+7 vs 8 issue feels fundamental to doing storage proofs well, but the block vs PoV distinction still kinda survives, like 6 says; it’s just that the block itself needs some internal proofs data structure.

We’ve had several use cases for this stuff in Polkadot itself, async backing and XCMP, and also some places we wisely chose to skip it. It comes up in Multichain friendly account abstraction and Interchain Proof Oracle Network too, as well as other lazy-messaging discussions brought up by parachain teams. What else?


Last week, as we discussed the recovery of parachains, I came up with this crazy idea of supporting calls into the wasm blobs of other parachains while validating a parachain block. The idea was to provide a common interface for verifying proofs that are opaque to the caller, but interpretable by the callee, aka the wasm blob of the parachain from which we extracted the proof. Relay chain validators already have all or most of the parachain wasm blobs pre-compiled, so calling them should be fairly cheap. However, this probably brings a lot of other problems with it that we don’t want to start with: simple issues like how much memory this sub-instance would get, or what we do if the validator hasn’t yet pre-compiled the wasm. There is probably an endless stream of issues with this. On the parachain side we would then also need to support importing these blocks without the blob of the other parachain being available anymore. While building, we would require these blobs and it should be fairly simple to get them, but at import it gets more complicated.

So, we should forget this idea. However, the general idea of having a standardized interface for checking these proofs sounds like something we could think more about. In general this goes in the direction of SPREE, I would say. Instead of using the actual parachain wasm blob to verify a certain proof, we could introduce wasm blobs that provide only certain functionality. We could experiment with using wasmi inside the runtime to execute these blobs, similar to what pallet_contracts is doing, but we could trust these blobs “more”. One of the questions is where these extra blobs are stored. Are they stored in the chain that wants to verify a proof? Then we would need to put these extra blobs into the PoV when sending it to the relay chain. Could we maybe put them into the relay chain and make them accessible from parachain validation? We would only need to provide the blob, and the execution would happen inside the validation blob using wasmi. A lot of this boils down to how big these blobs are and how fast execution with wasmi is.

Another question is which interfaces we would need to support. Maybe for certain things we could get away with some generic verify_proof(key: Vec<u8>, known_head: Vec<u8>, proof: Vec<u8>) -> Option<Vec<u8>>, where you pass the key and the proof and it returns the value it found for this key in the proof. In the chain that wants to have a proof of a different chain verified, the call doing this verification would take the proof and the expected_value that should be found in the proof. While writing this I realized that we would also need to know the key, and we cannot trust the caller of our call to give us the correct key. So, this interface wouldn’t really work and we would need much more specialized interfaces like extract_balance_from_proof(user: Vec<u8>, known_head: Vec<u8>, proof: Vec<u8>) -> u128. I’m not sure how many of these interfaces we would need.
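To illustrate why the specialized shape helps, here is a toy sketch of an `extract_balance_from_proof`-style verifier. The proof format here is entirely made up (one length byte, then the key, then a little-endian u128), and `known_head` is ignored; a real verifier would instead walk a Merkle copath down from `known_head`. The point is only that the callee, not the caller, decides which storage key the `user` maps to.

```rust
// Hypothetical specialized interface with a toy proof encoding:
// proof = key_len (1 byte) || key || balance as little-endian u128.
// A real implementation would verify a copath against `known_head`.
fn extract_balance_from_proof(user: &[u8], _known_head: &[u8], proof: &[u8]) -> Option<u128> {
    let key_len = *proof.first()? as usize;
    let key = proof.get(1..1 + key_len)?;
    // The verifier derives/checks the key from `user` itself, so a malicious
    // caller cannot point a valid proof at someone else's storage entry.
    if key != user {
        return None;
    }
    let raw: [u8; 16] = proof.get(1 + key_len..1 + key_len + 16)?.try_into().ok()?;
    Some(u128::from_le_bytes(raw))
}
```

The generic `verify_proof` fails precisely because this key check would live in the untrusted caller; pushing it into the callee is what makes the specialized interface sound.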

See this post as some kind of brain dump. It doesn’t really give any great solution, just some ideas to think about. I hope this inspires someone else to come up with a great idea 🙂 Or we can just iterate on these ideas, try them, and see where they bring us. I think in general, with runtime upgrades etc., we are really well suited to play around with ideas and come up with solutions 🙂


Why does wasmi help there? Contracts need metering, but I’m not sure why this requires metering. Or did you mean the foreign wasm blob is untrusted, so not like SPREE?

I’m firstly interested in a parachain being able to do proofs about its own past, because that’s the form in which this appeared in Polkadot, and it’s what all anonymous payment schemes do in zk, so I expect it to appear in that form again.

You can also have chains talking in some “more efficient/lazy than messaging” way if they can both understand some portion of one another’s state, thanks to some standard.


Very interesting discussion, and I will definitely keep monitoring it to see how it develops. Say we want to implement a Merkle-based proving scheme, with one blockchain generating a Merkle proof and another blockchain verifying it against some root they have received from a trusted source: what are the tools currently available to parachain developers? I mentioned this already in Multichain friendly account abstraction - #16 by ntn_x2, but I feel this is probably a better space. We at KILT would be willing to take what is currently available, see how it fits what we have set out to do, and report back, potentially proposing new/better ways of achieving this.


We want some non-merklized ephemeral/transient storage, à la

We’ve little reason to impose a substrate storage interface upon this ephemeral/transient storage. In fact, arbitrary Rust types should make sense, so batch verification could look like:

```rust
use std::sync::Mutex;

/// Ephemeral storage: a signature batch accumulated across the block,
/// never written to merklized state.
static EPHEMERAL_STORAGE: Mutex<Option<schnorrkel::PreparedBatch>> = Mutex::new(None);
```
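A std-only sketch of the same pattern, with a plain `Vec<u64>` standing in for `schnorrkel::PreparedBatch` so it runs without the crate: each transaction pushes into the ephemeral store, and finalization drains and checks the whole batch at once. Names here are illustrative, not a Substrate API.

```rust
use std::sync::Mutex;

// Stand-in for the PreparedBatch above; the batch lives only for the block.
static EPHEMERAL_BATCH: Mutex<Vec<u64>> = Mutex::new(Vec::new());

/// Each transaction defers its expensive check into the batch.
fn push_to_batch(item: u64) {
    EPHEMERAL_BATCH.lock().unwrap().push(item);
}

/// on_finalize drains the batch; a real chain would run batch signature
/// verification here, then discard everything without touching state proofs.
fn finalize_batch() -> usize {
    let mut batch = EPHEMERAL_BATCH.lock().unwrap();
    let n = batch.len();
    batch.clear();
    n
}
```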

As for substrate non-zk storages, we could have one ephemeral FRAME storage root, into which on_initialize “mounts” the non-ephemeral storage roots from the block header. It resembles how Unix has a root filesystem, possibly in-memory, into which it mounts the filesystems one uses. It provides one common namespace for the FRAME DSL, but does not require each storage root to use the same storage proof system.
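A minimal sketch of that mount-table idea, with made-up names and `[u8; 32]` roots; nothing here is an actual FRAME API. Key resolution splits off a top-level prefix, just like a path lookup against a mount point.

```rust
use std::collections::HashMap;

/// One common namespace whose top-level prefixes map to independent storage
/// roots, each potentially backed by a different proof system.
struct MountTable {
    mounts: HashMap<String, [u8; 32]>,
}

impl MountTable {
    fn new() -> Self {
        MountTable { mounts: HashMap::new() }
    }

    /// on_initialize would call this once per root in the block header.
    fn mount(&mut self, prefix: &str, root: [u8; 32]) {
        self.mounts.insert(prefix.to_string(), root);
    }

    /// Resolve a full storage key to (root, remaining key under that root).
    fn resolve<'a>(&self, key: &'a str) -> Option<([u8; 32], &'a str)> {
        let (prefix, rest) = key.split_once('/')?;
        self.mounts.get(prefix).map(|root| (*root, rest))
    }
}
```

The FRAME DSL would only ever see the unified namespace; which proof system backs a given prefix is decided at mount time.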

Another example of where this would be useful is multi-block migrations.

You’d first add the new state root into the chain’s block header, which impacts nothing since the current chain never touches that state root. Next, your code upgrade changes its mount points, with both the old and new state roots going under temporary names, under which the migration code runs. After the migration concludes, the old state root is abandoned, and the new state root gets mounted where it’s accessible to user transactions.
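The remount dance could be sketched as follows, compressed into one function for brevity (the real migration would span many blocks). The mount map, the prefix names, and the hard-coded new root are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical migration-by-remount: park old and new roots under temporary
/// names while the migration runs, then expose the new root to users.
fn migrate(mounts: &mut HashMap<&'static str, [u8; 32]>) {
    let old = mounts.remove("state").expect("old root was mounted");
    let new = [9u8; 32]; // the freshly added root from the block header
    mounts.insert("migration/old", old);
    mounts.insert("migration/new", new);
    // ... multi-block migration code runs against the temporary names ...
    let migrated = mounts.remove("migration/new").unwrap();
    mounts.remove("migration/old"); // the old root is abandoned here
    mounts.insert("state", migrated);
}
```

User transactions only ever resolve through `"state"`, so they never observe the half-migrated roots parked under the temporary names.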