Compilation of untrusted code is hard.
This is because optimizing compilers rely on algorithms that can consume execution time or memory far out of proportion to the size of the input program. An adversary may exploit this: with a small, carefully crafted program they can make the compiler take minutes of running time and gigabytes of memory. That makes it difficult to charge the user for the resources consumed.
There are two strategies we employ to sidestep this problem:
- Linear-time compilation. Simple: we use a compiler with a known, fixed runtime per instruction of input code. Typically, the code produced by such compilers is not of the highest quality. We are using that approach for executing untrusted code, i.e. building contracts (see e.g. Announcing PolkaVM - a new RISC-V based VM for smart contracts (and possibly more!) - #4 by koute for exploratory work in this direction).
- Pre-checking. This is where the validators try to compile the code in a safe way: in a separate process, with reasonable memory and time limits. If the compilation process doesn't finish, because it panics or exhausts its resources, the validator votes NAY on that code. The code can still be accepted if a supermajority votes AYE. In that case the failing validators have to swallow it and try their best to compile the code. This may negatively impact minority platforms.
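To make the linear-time idea concrete, here is a toy sketch (all instruction names and templates are made up): every input instruction expands into a fixed-size template, so compile time is O(n) in the number of instructions and an adversarial input cannot blow it up.

```rust
// A minimal sketch of a linear-time ("baseline") compiler: one fixed
// template per input instruction, no global analysis. The instruction
// set and output "assembly" are hypothetical, for illustration only.

#[derive(Clone, Copy)]
enum Bytecode {
    Add,
    Mul,
    Load(u32),
}

fn compile_baseline(program: &[Bytecode]) -> Vec<String> {
    let mut out = Vec::new();
    for insn in program {
        // A fixed amount of work per instruction, hence linear time overall.
        match insn {
            Bytecode::Add => out.push("add r0, r0, r1".to_string()),
            Bytecode::Mul => out.push("mul r0, r0, r1".to_string()),
            Bytecode::Load(k) => out.push(format!("li r1, {k}")),
        }
    }
    out
}

fn main() {
    let prog = [Bytecode::Load(7), Bytecode::Add];
    let code = compile_baseline(&prog);
    assert_eq!(code.len(), 2); // exactly one template per input instruction
}
```

The price of this predictability is exactly what the post says: no optimization passes, so the emitted code is mediocre.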
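The pre-checking vote can likewise be sketched. In a real deployment the compiler runs in a separate sandboxed OS process; here the limits are modeled as a simple step/memory budget, and the closure stands in for invoking wasmtime (all names are made up):

```rust
// Hypothetical sketch of pre-checking: run the compiler under explicit
// limits and derive a vote from the outcome.

#[derive(Debug, PartialEq)]
enum Vote {
    Aye,
    Nay,
}

struct Limits {
    max_steps: u64,
    max_mem_bytes: u64,
}

// `compile` reports how many abstract steps and bytes it needed, or None
// if it produced no result. A panic also results in a NAY vote.
fn precheck(limits: &Limits, compile: impl FnOnce() -> Option<(u64, u64)>) -> Vote {
    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(compile)) {
        Ok(Some((steps, mem))) if steps <= limits.max_steps && mem <= limits.max_mem_bytes => {
            Vote::Aye
        }
        _ => Vote::Nay, // panic, resource exhaustion, or no result
    }
}

fn main() {
    // Silence the expected panic message from the last example.
    std::panic::set_hook(Box::new(|_| {}));
    let limits = Limits { max_steps: 1_000, max_mem_bytes: 1 << 20 };
    assert_eq!(precheck(&limits, || Some((500, 4096))), Vote::Aye);
    assert_eq!(precheck(&limits, || Some((5_000, 4096))), Vote::Nay); // too slow
    assert_eq!(precheck(&limits, || panic!("compiler bug")), Vote::Nay);
}
```

The minority-platform problem falls out of this shape directly: the vote depends on the local platform's compile behavior, which may differ from the majority's.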
I had ideas of employing fault proofs to combat this problem. Specifically: shove the compiler we are using, wasmtime, into a fault-provable virtual machine (e.g. Wasm), and allow somebody to assert that the compilation finishes within a certain number of steps and a given amount of memory. If the assertion is false, another actor would be able to create a fault proof, showing that after N steps the compilation either failed, ran out of memory or reached the maximum number of steps. You could instantiate this process multiple times to cover different target platforms (avoiding the minority-platform problem), several versions of wasmtime, etc.
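The core of the scheme is that execution is deterministic and step-counted, so a false claim can be disproven by re-execution. A toy model (the "VM" here is a trivial counter machine standing in for the compiler running inside a fault-provable VM):

```rust
// Toy model of a fault proof: an asserter claims "this computation halts
// within N steps"; a challenger disproves it by deterministic re-execution.

#[derive(Debug, PartialEq)]
enum Outcome {
    Halted(u64),
    StillRunning,
}

// A stand-in deterministic machine: halts once its state reaches `target`.
fn run(target: u64, max_steps: u64) -> Outcome {
    let mut state = 0u64;
    for step in 1..=max_steps {
        state += 1; // one deterministic VM step
        if state == target {
            return Outcome::Halted(step);
        }
    }
    Outcome::StillRunning
}

// A fault proof exists iff the computation did NOT halt within the claim.
fn challenge(target: u64, claimed_steps: u64) -> bool {
    run(target, claimed_steps) == Outcome::StillRunning
}

fn main() {
    assert!(!challenge(10, 10)); // honest claim: halts in exactly 10 steps
    assert!(challenge(10, 5)); // false claim: a fault proof can be created
}
```

In the real protocol the challenge would of course be an interactive bisection over the execution trace rather than a full re-run, which is exactly where the complexity discussed below comes from.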
However, this solution never gained any significant mindshare, because it has some important flaws:
- Latency. Fault proofs imply the existence of a challenge period: a window during which actors can challenge the assertion(s). In folklore, and indeed in most deployments (rollups), we see this set to 7 days.
- Fault proofs have a 1-of-N security model. IOW, only a single actor is required to create a challenge. It's a common and inherent problem of fault proof protocols: "it's somebody else's job". Some protocols added mechanisms of deliberate fault injection, but IMO they suck and also add complexity. Speaking of which:
- Systemic complexity. The fault proof machinery is complex and it’s oozing in every direction. See Vitalik’s post.
I hadn't paid attention to anything related to SNARK integrity proofs ("ZK proofs"), but it seems like now is the time to change that mindset. SNARK proofs are graduating from science fiction to practical reality.
The first project to show this is risc0, which provides a convenient way to write statements that produce SNARK proofs. Specifically, it proves execution of a RISC-V VM, producing a cryptographic proof. Such a proof is very small and easy to verify, and the verifier code is not terribly complex.
Using SNARK proofs we could have an elegant (as far as compiling untrusted code on blockchain can be) solution to our problems. Here is a strawman proposal:
Wasmtime/cranelift is compiled to RISC-V as part of a proven program. We run it through the risc0 VM, compiling the untrusted input code into the target architecture while tracking how much memory and how many cycles it took. As a result, we get a cryptographic proof that the input does indeed compile within a certain number of bytes of RAM and cycles, using a specific version of wasmtime/cranelift, for a specific target architecture.
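The statement being proven can be thought of as a small claim committed by the guest program, which the chain then checks against its budgets. A sketch, with the proof verification itself abstracted behind a flag (all field names, the pinned version string, and the `accept` function are assumptions for illustration, not the risc0 API):

```rust
// Sketch of the claim a proven compilation would commit, and the cheap
// on-chain acceptance check. Everything here is hypothetical shape, not
// a real risc0 receipt.

#[allow(dead_code)]
struct CompilationClaim {
    code_hash: [u8; 32],            // hash of the untrusted input module
    compiler_version: &'static str, // exact wasmtime/cranelift version used
    target: &'static str,           // target architecture compiled for
    cycles: u64,                    // RISC-V cycles the compilation took
    peak_mem_bytes: u64,            // peak memory during compilation
}

fn accept(claim: &CompilationClaim, max_cycles: u64, max_mem: u64, proof_ok: bool) -> bool {
    // `proof_ok` stands in for cryptographic verification of the receipt.
    proof_ok
        && claim.compiler_version == "wasmtime-x.y.z" // hypothetical pinned version
        && claim.cycles <= max_cycles
        && claim.peak_mem_bytes <= max_mem
}

fn main() {
    let claim = CompilationClaim {
        code_hash: [0; 32],
        compiler_version: "wasmtime-x.y.z",
        target: "x86_64",
        cycles: 900_000,
        peak_mem_bytes: 256 << 20,
    };
    assert!(accept(&claim, 1_000_000, 512 << 20, true));
    assert!(!accept(&claim, 1_000_000, 512 << 20, false)); // bad proof
    assert!(!accept(&claim, 100_000, 512 << 20, true)); // cycle budget exceeded
}
```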
This proof is proxy evidence of how much it would take to compile the code on a platform different from RISC-V. The number of executed instructions or memory bytes used should differ only by a constant factor. What can lead to big differences is the choice of target architecture.
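If the constant-factor assumption holds, turning a proven RISC-V cycle count into a budget for another host platform is a one-liner; the factors below are entirely made up for illustration:

```rust
// Hypothetical conversion of a proven RISC-V cycle count into a native
// budget: a per-target constant factor plus a safety margin, in percent.

fn native_cycle_budget(riscv_cycles: u64, target_factor_pct: u64, safety_pct: u64) -> u64 {
    riscv_cycles * target_factor_pct / 100 * (100 + safety_pct) / 100
}

fn main() {
    // e.g. 1M proven RISC-V cycles, an assumed 60% factor for the host,
    // plus a 50% safety margin:
    assert_eq!(native_cycle_budget(1_000_000, 60, 50), 900_000);
}
```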
But pep, I hear you, aren't SNARK provers extremely expensive? Well, kind of. To check whether this is workable, I crafted a proof-of-concept: I took cranelift-codegen and cranelift-reader, generated a CLIF file from an ink! erc20.wasm, and proved it using the Bonsai API. It took several minutes. I don't think it would cost too much, and I think proving code locally is also on the table.
Now this gives us the following advantages:
- Potentially allows using an optimizing compiler for contracts!
- Solves the minority platform problem.
- If you feel particularly cocky, you may not even need a compiler on board the blockchain node. Instead, modify the proven program to output the binary code (e.g. Module::serialize) for the target platform. Now you have to make sure that all flags that can affect codegen match, and you will likely have to provide multiple variations of the code, but it's an interesting design space to explore.
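For the flag-matching part, one conceivable shape is to commit a digest of the canonicalized flag set alongside the artifact and compare it on chain. A sketch, using std's `DefaultHasher` purely as a stand-in for a cryptographic commitment (everything here is an assumption, not cranelift's flag mechanism):

```rust
// Sketch: a stable digest over codegen flags. BTreeMap gives a canonical
// iteration order, so the digest does not depend on insertion order.
// DefaultHasher (SipHash) is a stand-in; a real system would use a
// cryptographic hash.

use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn flags_digest(flags: &BTreeMap<&str, &str>) -> u64 {
    let mut h = std::collections::hash_map::DefaultHasher::new();
    for (k, v) in flags {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let a = BTreeMap::from([("opt_level", "speed"), ("enable_simd", "true")]);
    let b = BTreeMap::from([("enable_simd", "true"), ("opt_level", "speed")]);
    let c = BTreeMap::from([("opt_level", "none"), ("enable_simd", "true")]);
    assert_eq!(flags_digest(&a), flags_digest(&b)); // order-independent
    assert_ne!(flags_digest(&a), flags_digest(&c)); // any differing flag changes it
}
```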
Now, I am not proposing we adopt this approach right now. These are the problems I see at the moment:
- risc0 is not mature. We need to wait until it’s audited and battle tested.
- we need to adapt wasmtime. In my PoC I had to make a couple of changes to get it to work. Moreover, I used raw CLIF, but we need to ensure that the SAME CLIF is passed into cranelift as would've been generated by wasmtime, otherwise the equivalence is lost. We also need to make sure that matching the compilation flags is possible and foolproof.
- obviously, rigorously test the assumptions I made in this post.
As closing words, I think SNARKs open up an entirely new design space. We had many ideas that depended on checking some properties of the code but were rejected because they were too resource-intensive. Now we are about to have practically unlimited compute capacity: we can precompute all kinds of stuff and run various analyses. Maybe things like: we can know the maximum stack size of each function (or in some cases the entire program) and come up with a better stack-limiting mechanism. I haven't given it too much thought, but I feel there might be a treasure trove here, and therefore I encourage you to keep this tool in mind.
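The stack-size example is a good illustration of the kind of analysis that becomes viable once verification is cheap: given per-function frame sizes and an acyclic call graph, the worst-case stack depth is a simple recursive maximum. A sketch with made-up names (a real analysis would also have to handle recursion and indirect calls):

```rust
// Compute the maximum stack depth of a program from per-function frame
// sizes and its call graph. Assumes the graph is acyclic (no recursion).

use std::collections::HashMap;

fn max_stack(func: &str, frames: &HashMap<&str, u64>, calls: &HashMap<&str, Vec<&str>>) -> u64 {
    let own = frames[func];
    // Deepest stack among everything this function may call, if anything.
    let deepest = calls
        .get(func)
        .into_iter()
        .flatten()
        .map(|&callee| max_stack(callee, frames, calls))
        .max()
        .unwrap_or(0);
    own + deepest
}

fn main() {
    let frames = HashMap::from([("main", 16u64), ("transfer", 32), ("hash", 8)]);
    let calls = HashMap::from([("main", vec!["transfer", "hash"]), ("transfer", vec!["hash"])]);
    // main -> transfer -> hash is the deepest chain: 16 + 32 + 8.
    assert_eq!(max_stack("main", &frames, &calls), 56);
}
```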
Imagine most of the validators are x86-64. Some code compiles fine on this platform. But, say, on aarch64 there is a bug/performance quirk that makes it take very long to compile. Because most of the validators manage to compile in time, the minority will be ignored, and potentially this situation will lead to slashing of the minority-platform validators. ↩︎
Each target arch is implemented by its own backend. But even for same-ish architectures, e.g. RV32I and RV32E, which primarily differ in the number of available registers, the execution paths can differ drastically. ↩︎