Compilation of untrusted code is hard.
This is because optimizing compilers rely on algorithms that can consume execution time or memory far out of proportion to the size of the input program. An adversary may exploit this: with a small, carefully crafted program they can make the compiler take minutes of running time and gigabytes of memory. That makes it difficult to charge the user for the resources consumed.
There are two strategies we employ to sidestep this problem:
- Linear-time compilation. Simple: we use a compiler with a known, fixed runtime per instruction of input code. Typically, the code produced by such compilers is not of the highest quality. We are using that approach for executing untrusted code, i.e. building contracts (see e.g. Announcing PolkaVM - a new RISC-V based VM for smart contracts (and possibly more!) - #4 by koute for exploratory work in this direction).
- Pre-checking. This is where the validators try to compile the code in a safe way: in a separate process, with reasonable memory and time limits. If the compilation process doesn't finish, because it panics or exhausts its resources, the validator votes NAY on that code. The code can still be accepted if a supermajority votes AYE. In that case the failing validators have to swallow it and try their best to compile the code. This may negatively impact minority platforms.
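To make the linear-time idea concrete, here is a toy sketch (all instruction names and templates are made up): every input instruction expands into a fixed-size template, so compile time is O(n) in the number of instructions and an adversarial input cannot blow it up.

```rust
// A minimal sketch of a linear-time ("baseline") compiler: one fixed
// template per input instruction, no global analysis. The instruction
// set and output "assembly" are hypothetical, for illustration only.

#[derive(Clone, Copy)]
enum Bytecode {
    Add,
    Mul,
    Load(u32),
}

fn compile_baseline(program: &[Bytecode]) -> Vec<String> {
    let mut out = Vec::new();
    for insn in program {
        // A fixed amount of work per instruction, hence linear time overall.
        match insn {
            Bytecode::Add => out.push("add r0, r0, r1".to_string()),
            Bytecode::Mul => out.push("mul r0, r0, r1".to_string()),
            Bytecode::Load(k) => out.push(format!("li r1, {k}")),
        }
    }
    out
}

fn main() {
    let prog = [Bytecode::Load(7), Bytecode::Add];
    let code = compile_baseline(&prog);
    assert_eq!(code.len(), 2); // exactly one template per input instruction
}
```

The price of this predictability is exactly what the post says: no optimization passes, so the emitted code is mediocre.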
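The pre-checking vote can likewise be sketched. In a real deployment the compiler runs in a separate sandboxed OS process; here the limits are modeled as a simple step/memory budget, and the closure stands in for invoking wasmtime (all names are made up):

```rust
// Hypothetical sketch of pre-checking: run the compiler under explicit
// limits and derive a vote from the outcome.

#[derive(Debug, PartialEq)]
enum Vote {
    Aye,
    Nay,
}

struct Limits {
    max_steps: u64,
    max_mem_bytes: u64,
}

// `compile` reports how many abstract steps and bytes it needed, or None
// if it produced no result. A panic also results in a NAY vote.
fn precheck(limits: &Limits, compile: impl FnOnce() -> Option<(u64, u64)>) -> Vote {
    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(compile)) {
        Ok(Some((steps, mem))) if steps <= limits.max_steps && mem <= limits.max_mem_bytes => {
            Vote::Aye
        }
        _ => Vote::Nay, // panic, resource exhaustion, or no result
    }
}

fn main() {
    // Silence the expected panic message from the last example.
    std::panic::set_hook(Box::new(|_| {}));
    let limits = Limits { max_steps: 1_000, max_mem_bytes: 1 << 20 };
    assert_eq!(precheck(&limits, || Some((500, 4096))), Vote::Aye);
    assert_eq!(precheck(&limits, || Some((5_000, 4096))), Vote::Nay); // too slow
    assert_eq!(precheck(&limits, || panic!("compiler bug")), Vote::Nay);
}
```

The minority-platform problem falls out of this shape directly: the vote depends on the local platform's compile behavior, which may differ from the majority's.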
I had ideas of employing fault proofs to combat this problem. Specifically: shove the compiler we are using, wasmtime, into a fault-provable virtual machine (e.g. Wasm), and allow somebody to assert that the compilation finishes within a certain number of steps and a given amount of memory. If the assertion is false, another actor would be able to create a fault proof, showing that after N steps the compilation either failed, ran out of memory or reached the maximum number of steps. You could instantiate this process multiple times to cover different target platforms (avoiding the minority-platform problem), several versions of wasmtime, etc.
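The core of the scheme is that execution is deterministic and step-counted, so a false claim can be disproven by re-execution. A toy model (the "VM" here is a trivial counter machine standing in for the compiler running inside a fault-provable VM):

```rust
// Toy model of a fault proof: an asserter claims "this computation halts
// within N steps"; a challenger disproves it by deterministic re-execution.

#[derive(Debug, PartialEq)]
enum Outcome {
    Halted(u64),
    StillRunning,
}

// A stand-in deterministic machine: halts once its state reaches `target`.
fn run(target: u64, max_steps: u64) -> Outcome {
    let mut state = 0u64;
    for step in 1..=max_steps {
        state += 1; // one deterministic VM step
        if state == target {
            return Outcome::Halted(step);
        }
    }
    Outcome::StillRunning
}

// A fault proof exists iff the computation did NOT halt within the claim.
fn challenge(target: u64, claimed_steps: u64) -> bool {
    run(target, claimed_steps) == Outcome::StillRunning
}

fn main() {
    assert!(!challenge(10, 10)); // honest claim: halts in exactly 10 steps
    assert!(challenge(10, 5)); // false claim: a fault proof can be created
}
```

In the real protocol the challenge would of course be an interactive bisection over the execution trace rather than a full re-run, which is exactly where the complexity discussed below comes from.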
However, this solution never gained any significant mindshare, because it has some important flaws:
- Latency. Fault proofs imply the existence of a challenge period: a window during which actors can challenge the assertion(s). In folklore, and indeed in most deployments (rollups), we see this set to 7 days.
- Fault proofs have a 1-of-N security model. IOW, only a single actor is required to create a challenge. It's a common and inherent problem of fault proof protocols: "it's somebody else's job". Some protocols added mechanisms of deliberate fault injection, but IMO they suck and also add complexity. Speaking of which:
- Systemic complexity. The fault proof machinery is complex and it’s oozing in every direction. See Vitalik’s post.
I hadn't paid attention to anything related to SNARK integrity proofs ("ZK proofs"), but it seems like now is the time to change that mindset. SNARK proofs are graduating from science fiction to practical reality.
The first project to show this is risc0, which provides a convenient way to write statements that produce SNARK proofs. Specifically, it proves execution of a RISC-V VM, producing a cryptographic proof. Such a proof is very small and easy to verify, and the verifier code is not terribly complex.
Using SNARK proofs we could have an elegant (as far as compiling untrusted code on blockchain can be) solution to our problems. Here is a strawman proposal:
Wasmtime/cranelift is compiled to RISC-V as part of a proven program. We run it through the risc0 VM, compiling the untrusted input code into the target architecture while tracking how much memory and how many cycles it took. As a result, we get a cryptographic proof that the input does indeed compile within a certain number of bytes of RAM and cycles, using a specific version of wasmtime/cranelift, for a specific target architecture.
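The statement being proven can be thought of as a small claim committed by the guest program, which the chain then checks against its budgets. A sketch, with the proof verification itself abstracted behind a flag (all field names, the pinned version string, and the `accept` function are assumptions for illustration, not the risc0 API):

```rust
// Sketch of the claim a proven compilation would commit, and the cheap
// on-chain acceptance check. Everything here is hypothetical shape, not
// a real risc0 receipt.

#[allow(dead_code)]
struct CompilationClaim {
    code_hash: [u8; 32],            // hash of the untrusted input module
    compiler_version: &'static str, // exact wasmtime/cranelift version used
    target: &'static str,           // target architecture compiled for
    cycles: u64,                    // RISC-V cycles the compilation took
    peak_mem_bytes: u64,            // peak memory during compilation
}

fn accept(claim: &CompilationClaim, max_cycles: u64, max_mem: u64, proof_ok: bool) -> bool {
    // `proof_ok` stands in for cryptographic verification of the receipt.
    proof_ok
        && claim.compiler_version == "wasmtime-x.y.z" // hypothetical pinned version
        && claim.cycles <= max_cycles
        && claim.peak_mem_bytes <= max_mem
}

fn main() {
    let claim = CompilationClaim {
        code_hash: [0; 32],
        compiler_version: "wasmtime-x.y.z",
        target: "x86_64",
        cycles: 900_000,
        peak_mem_bytes: 256 << 20,
    };
    assert!(accept(&claim, 1_000_000, 512 << 20, true));
    assert!(!accept(&claim, 1_000_000, 512 << 20, false)); // bad proof
    assert!(!accept(&claim, 100_000, 512 << 20, true)); // cycle budget exceeded
}
```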
This proof is proxy evidence of how much it would take to compile the code on a platform different from RISC-V. The number of executed instructions or memory bytes used should differ only by a constant factor. What can lead to big differences is the choice of target architecture.
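If the constant-factor assumption holds, turning a proven RISC-V cycle count into a budget for another host platform is a one-liner; the factors below are entirely made up for illustration:

```rust
// Hypothetical conversion of a proven RISC-V cycle count into a native
// budget: a per-target constant factor plus a safety margin, in percent.

fn native_cycle_budget(riscv_cycles: u64, target_factor_pct: u64, safety_pct: u64) -> u64 {
    riscv_cycles * target_factor_pct / 100 * (100 + safety_pct) / 100
}

fn main() {
    // e.g. 1M proven RISC-V cycles, an assumed 60% factor for the host,
    // plus a 50% safety margin:
    assert_eq!(native_cycle_budget(1_000_000, 60, 50), 900_000);
}
```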
But pep, I hear you, aren't SNARK provers extremely expensive? Well, kind of. To check whether this is workable, I crafted a proof-of-concept: I took cranelift-codegen and cranelift-reader, generated a CLIF file from an ink! erc20.wasm, and proved it using the Bonsai API. It took several minutes. I don't think it would cost too much, and I think proving code locally is also on the table.
Now this gives us the following advantages:
- Potentially allows using an optimizing compiler for contracts!
- Solves the minority platform problem.
- If you feel particularly cocky, you may not even need a compiler on board the blockchain node. Instead, modify the proven program to output the binary code (e.g. Module::serialize) for the target platform. Now you have to make sure that all flags that can affect codegen match, and you will likely have to provide multiple variations of the code, but it's an interesting design space to explore.
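For the flag-matching part, one conceivable shape is to commit a digest of the canonicalized flag set alongside the artifact and compare it on chain. A sketch, using std's `DefaultHasher` purely as a stand-in for a cryptographic commitment (everything here is an assumption, not cranelift's flag mechanism):

```rust
// Sketch: a stable digest over codegen flags. BTreeMap gives a canonical
// iteration order, so the digest does not depend on insertion order.
// DefaultHasher (SipHash) is a stand-in; a real system would use a
// cryptographic hash.

use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn flags_digest(flags: &BTreeMap<&str, &str>) -> u64 {
    let mut h = std::collections::hash_map::DefaultHasher::new();
    for (k, v) in flags {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let a = BTreeMap::from([("opt_level", "speed"), ("enable_simd", "true")]);
    let b = BTreeMap::from([("enable_simd", "true"), ("opt_level", "speed")]);
    let c = BTreeMap::from([("opt_level", "none"), ("enable_simd", "true")]);
    assert_eq!(flags_digest(&a), flags_digest(&b)); // order-independent
    assert_ne!(flags_digest(&a), flags_digest(&c)); // any differing flag changes it
}
```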
Now, I am not proposing we adopt this approach right now. These are the problems I see at the moment:
- risc0 is not mature. We need to wait until it’s audited and battle tested.
- we need to adapt wasmtime. In my PoC I had to make a couple of changes to get it to work. Moreover, I used raw CLIF, but we need to ensure that the SAME CLIF is passed into cranelift as would've been generated by wasmtime, otherwise the equivalence is lost. We also need to make sure that matching the compilation flags is possible and foolproof.
- obviously, rigorously test the assumptions I made in this post.
As closing words, I think SNARKs open up an entirely new design space. We had many ideas that depended on checking some properties of the code but were rejected because they were too resource-intensive. Now we are about to have practically unlimited compute capacity: we can precompute all kinds of stuff and run various analyses. Maybe things like: we can know the maximum stack size of each function (or in some cases the entire program) and come up with a better stack-limiting mechanism. I haven't given it too much thought, but I feel there might be a treasure trove here, and therefore I encourage you to keep this tool in mind.
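The stack-size example is a good illustration of the kind of analysis that becomes viable once verification is cheap: given per-function frame sizes and an acyclic call graph, the worst-case stack depth is a simple recursive maximum. A sketch with made-up names (a real analysis would also have to handle recursion and indirect calls):

```rust
// Compute the maximum stack depth of a program from per-function frame
// sizes and its call graph. Assumes the graph is acyclic (no recursion).

use std::collections::HashMap;

fn max_stack(func: &str, frames: &HashMap<&str, u64>, calls: &HashMap<&str, Vec<&str>>) -> u64 {
    let own = frames[func];
    // Deepest stack among everything this function may call, if anything.
    let deepest = calls
        .get(func)
        .into_iter()
        .flatten()
        .map(|&callee| max_stack(callee, frames, calls))
        .max()
        .unwrap_or(0);
    own + deepest
}

fn main() {
    let frames = HashMap::from([("main", 16u64), ("transfer", 32), ("hash", 8)]);
    let calls = HashMap::from([("main", vec!["transfer", "hash"]), ("transfer", vec!["hash"])]);
    // main -> transfer -> hash is the deepest chain: 16 + 32 + 8.
    assert_eq!(max_stack("main", &frames, &calls), 56);
}
```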
Imagine most of the validators are x86-64. Some code compiles fine on this platform. But, say, on aarch64 there is a bug/performance quirk that makes it take very long to compile. Because most of the validators manage to compile in time, the minority will be ignored, and potentially this situation will lead to slashing of the minority-platform validators. ↩︎
Each target arch is implemented by its own backend. But even for same-ish architectures, e.g. RV32I and RV32E, which primarily differ in the number of available registers, the execution paths can differ drastically. ↩︎