A few months ago @Alex and @pepyakin entertained the idea of using eBPF for our smart contracts instead of WASM, the details of which are available in the following thread:
I won’t repeat the arguments already made in that thread (read it if you’re interested!). Suffice to say I do agree with the motivations presented within as to why we might consider something other than WASM for smart contracts.
So recently I got approached by @Alex regarding this and we’ve discussed it at length. Long story short, one of the ideas that was floated around was that maybe we could use RISC-V as the instruction set architecture of choice for our smart contract platform. I got intrigued by the idea, so I decided to run a little experiment to see how viable that would be in practice, and in this post I’d like to present my experience doing just that.
Alternatives to WASM, or what do we actually need?
First, the requirements. What do we actually want from our ISA? Here’s a non-exhaustive laundry list of what I think ideally we’d like to have, in no particular order:
- Easy to secure.
- Easy to write a singlepass JIT compiler for (which still generates fast code).
- Fast to execute.
- Fast to JIT-compile (with the assumption that the generated code is fast).
- Well defined, standardized and has an existing ecosystem.
- Already supported by rustc and LLVM.
- Guaranteed to be supported by rustc and LLVM into the future.
- Has enough features to compile existing programs without much trouble.
- Doesn’t have features (or has them but they’re optional) which we don’t need.
So considering the requirements the way I see it we have the following potential options available to us:
- WebAssembly (which is what we’re currently using)
- eBPF (used by Solana)
- RISC-V (what I’m proposing here)
- Our own custom ISA
Let’s quickly go over each of these and see which of the requirements it fails to satisfy, either fully or partially.
WebAssembly
- Not simple. While it is laughably simple compared to something like x86, it’s definitely not simple enough considering what we need. (Again, see pepyakin’s post for some of the details; the link’s at the beginning of this post.)
- Not easy to write a singlepass JIT compiler for (which still generates fast code). WASM is stack-based while hardware ISAs are register-based, so there’s a disconnect here: you need a register allocator, and writing a good register allocator is a really hard problem.
- Not fast to JIT-compile. See the previous point. wasmtime is notorious for JIT-compiling relatively slowly; something like Wasmer Singlepass does better, but it’s still not ideal.
- The MVP variant of WebAssembly (the one without a lot of extra features we don’t need) might arguably not be supported by LLVM forever, as web browsers move on and no one actually targets the MVP anymore. Even if it’s still technically supported, it might accumulate bugs if no one actually uses it.
- Has features we don’t need (e.g. floating point support).
eBPF
- Out of the box it can’t compile arbitrary programs, although Solana has a fork of rustc and LLVM where they’ve made it work.
- Upstream rustc and LLVM only support a limited form of eBPF; we’d need something like Solana’s version of it, which is not upstreamed.
- Not guaranteed to be supported by rustc and LLVM into the future, assuming the Solana-like variant of eBPF.
- Might not be fast-enough to execute even when JITed (see my experience with it later in this post).
- Can be too simple and actually lack the features we’d want; for example, internal function calls and host function calls use the same instruction which complicates things.
- Registers are all 64-bit, which we don’t really need, and which e.g. makes wasmtime-like sandboxing of memory impossible.
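To illustrate that last point: the wasmtime-style sandboxing trick relies on guest pointers being only 32 bits wide. Here’s a minimal sketch of the idea (the function name is mine, not from any real implementation):

```rust
// The trick: reserve a contiguous 4 GiB (plus guard pages) span of virtual
// address space for the guest, with only the valid parts actually mapped.
// A 32-bit guest address can then be turned into a host address with pure
// arithmetic and *no bounds check*: any out-of-bounds access lands in an
// unmapped guard page and triggers a hardware fault, which the host catches.
fn guest_to_host(base: u64, guest_addr: u32) -> u64 {
    // `guest_addr` is at most 0xFFFF_FFFF, so the result can never
    // escape the reserved [base, base + 4 GiB) region.
    base + guest_addr as u64
}

// With eBPF's 64-bit registers there is no such guarantee: a guest "pointer"
// can be any 64-bit value, so every memory access would need an explicit
// bounds check or masking, which is exactly what this trick exists to avoid.
```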
Our own custom ISA
This doesn’t currently exist, but if it did I think it’d be safe to assume that we could fulfill all of the functional requirements; so what would be left are things like:
- Not supported by rustc and LLVM. We’d have to write our own LLVM backend and get it upstreamed. I can’t overstate the amount of work this would require! Solana maintains their own fork of rustc and LLVM and they do want to upstream it, but several years later it still hasn’t happened.
- Not standardized and has no existing ecosystem. We’d be the only users of this ISA (at least initially).
- Not guaranteed to be supported by rustc and LLVM into the future. Us being the only users we’d have to maintain the LLVM backend ourselves indefinitely, even if we’d upstream it.
This option would be ideal from a functional perspective, but would require a truly massive amount of work and effort. I’d really rather avoid it if we can help it.
RISC-V
None…? From a cursory look it seems like RISC-V might actually tick all of the boxes we’d need! But does it really? That’s what we want to find out here!
So what is RISC-V?
(If you already know feel free to skip this section.)
RISC-V is an open and free instruction set architecture developed and maintained by the RISC-V Foundation. In a nutshell it has the following killer features which (in combination) make it unique as far as ISAs go:
- It’s free and open-source. Anyone can use it without any license fees.
- It’s very simple and orthogonal. At its bare minimum all of the required userspace RISC-V instructions fit on a single page.
- It’s modular and extensible. Parts of the ISA (e.g. multiplication, floating point support, atomics, SIMD, etc.) are entirely optional and can be enabled/disabled at will.
- It’s efficient and scalable. It is designed to scale from tiny low-power microcontrollers up to supercomputers with hundreds of CPUs.
- It is already well supported in rustc and LLVM and is gradually picking up steam in the industry.
The unique properties of RISC-V make it possible to tailor it somewhat to our very specific requirements while simultaneously we can still benefit from all of the work that’s being done around it in the ecosystem.
Take a look at Wikipedia if you’re interested in more details. (I’d put a link but Polkadot forum barfs an error if there are more than 2 links in a post.)
So what we’d like to know here is simple: would RISC-V actually be a good fit for smart contracts? And can it actually fulfill all of our requirements, not just on paper but also in practice? This is what I’ve decided to try and find out.
As the first step we need to find a benchmark on which we can evaluate the merits and performance of RISC-V (and the alternatives). Ideally something that is at least somewhat smart-contract-like in the type of work it does, but scaled up to the very extreme of what we’d reasonably run. And it just so happens I already had a good candidate that fit the bill!
You see, 7 years ago I wrote a cycle-accurate NES emulator in Rust called Pinky. So I thought, hey, let’s quickly make that bad boy no_std and use it as a benchmark! At first glance this might sound ridiculous - obviously no one’s going to be deranged enough to put an NES emulator on-chain in a smart contract and play Super Mario Brothers with it. But I still think it’s a reasonably good pick for the following reasons:
- It does actual useful (well, if you define “playing games” as useful) work instead of being a microbenchmark.
- It’s relatively big, most likely on the upper end of what anyone would even attempt to compile into a smart contract, so it illustrates a sort of a worst case scenario.
- The type of work it does is - in a way - similar to what a smart contract would do: it does almost no floating point math, it’s not memory bound and doesn’t shuffle a lot of data around, and it’s essentially mostly a bunch of logic and ifs and jumps.
- The performance is easily interpretable: how close to 60 FPS can we get?
Now that we have our benchmark - a “smart contract” which generates frames of Super Mario Brothers (or any other NES game) - it’s time to try to run it. So, to get myself more familiar with RISC-V on a practical level, I wrote a RISC-V interpreter. This took me less than one day.
Now, let me interject here for a bit and reiterate what just happened. In less than a single day I wrote an interpreter, completely from scratch, that can run real software compiled for a real ISA by a real compiler. This is a big deal and a testament to RISC-V’s simplicity! If you tried to do that for any other real ISA - say, x86 - you’d probably spend a whole week or month just trying to decode the instructions, never mind writing a fully functional interpreter in a day! Even the MOS 6502 interpreter that I wrote for Pinky (and the 6502 is a CPU from the 70s, almost 50 years old!) is a lot more complex than my RISC-V interpreter!
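To give a taste of just how simple this is, here’s a minimal sketch of what the core of such an interpreter can look like, handling only two instructions (this is illustrative, not the actual code I wrote; the real thing covers the remaining ~50 instructions in much the same way):

```rust
struct Cpu {
    regs: [u32; 32], // x0..x31; x0 is hardwired to zero
    pc: u32,
}

// Execute a single 32-bit RV32I instruction.
fn step(cpu: &mut Cpu, insn: u32) {
    let opcode = insn & 0x7f;
    let rd = ((insn >> 7) & 0x1f) as usize;
    let funct3 = (insn >> 12) & 0x7;
    let rs1 = ((insn >> 15) & 0x1f) as usize;
    let rs2 = ((insn >> 20) & 0x1f) as usize;
    match (opcode, funct3) {
        // ADDI: I-type; the immediate lives in bits 31:20, sign-extended.
        (0x13, 0) => {
            let imm = (insn as i32) >> 20;
            cpu.regs[rd] = cpu.regs[rs1].wrapping_add(imm as u32);
        }
        // ADD: R-type (funct7 == 0 distinguishes it from SUB).
        (0x33, 0) if (insn >> 25) == 0 => {
            cpu.regs[rd] = cpu.regs[rs1].wrapping_add(cpu.regs[rs2]);
        }
        _ => unimplemented!("the other ~50 instructions go here"),
    }
    cpu.regs[0] = 0; // writes to x0 are discarded
    cpu.pc = cpu.pc.wrapping_add(4);
}
```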
So I have an interpreter, what now? I could compare it to the other alternatives as-is, but that’s a little bit boring. The fact that I wrote it in less than a day was promising, so I decided to take it a step further: let’s write an actual JIT compiler for it!
So that’s what I did. It took me two days of work to write a RISC-V JIT compiler. Completely from scratch. And did I mention it can technically take any arbitrary program which rustc generates and run it? But enough rambling; let’s look at the numbers! (Lower times are better.)
- wasmi: 108ms/frame (~9.2 FPS)
- wasmer singlepass: 10.8ms/frame (~92 FPS)
- wasmer cranelift: 4.8ms/frame (~208 FPS)
- wasmtime: 5.3ms/frame (~188 FPS)
- solana_rbpf (interpreted): 6930ms/frame (~0.14 FPS)
- solana_rbpf (JIT): ~625ms/frame (~1.6 FPS)
- My RISC-V interpreter: ~800ms/frame (~1.25 FPS)
- My RISC-V JIT: ~25ms/frame (~40 FPS)
These results are… very interesting. My RISC-V JIT - a simple single-pass recompiler with very little in the way of optimizations, written in two days and probably small enough to fit in 1k lines of code - generates code that is only ~2.5x slower than Wasmer Singlepass, which weighs in at over 150k lines of code (not all of them relevant, but still) and most likely has man-years’ worth of effort invested into it. Saying that these results are promising would be a gross understatement!
Another interesting result here is Solana’s eBPF JIT, which really shocked me. It ended up being almost as slow as my RISC-V interpreter, and almost six times slower than wasmi (also an interpreter)! Something went really wrong here. (And before you ask, I did disable metering in Solana’s JIT to make things fair.) This could be because the JIT itself doesn’t generate good code, or because LLVM doesn’t generate good code for eBPF, or both. What we need to remember here is that eBPF (and LLVM’s eBPF backend) was originally never meant to compile something like this, and it’s only due to Solana’s LLVM fork that it can do it in the first place. So it is entirely possible that their eBPF backend simply generates bad code, which would translate into equally bad code after JIT compilation. Nevertheless, this result makes using eBPF as the ISA of choice for smart contracts even more unappealing.
I’ve also done a comparison of the code size. All of the builds here use lto = true, strip = true and codegen-units = 1 set in Cargo.toml:
- eBPF (-O3): 150k
- eBPF (-Os): 140k
- eBPF (-Oz): 117k
- WASM (-O3): 80k
- WASM (-Os): 73k
- WASM (-Oz): 59k
- WASM (-O3) + wasm-opt: 74k
- WASM (-Os) + wasm-opt: 67k
- WASM (-Oz) + wasm-opt: 54k
- RISC-V (-O3): 92k
- RISC-V (-Os): 83k
- RISC-V (-Oz): 71k
- RISC-V + C (-O3): 73k
- RISC-V + C (-Os): 66k
- RISC-V + C (-Oz): 57k
RISC-V + C is RISC-V with the compressed instructions extension which adds alternative 2-byte encodings (where normally they use 4 bytes) for the most commonly used instructions.
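Conveniently, mixing the two widths doesn’t complicate decoding much: the C extension reserves the two lowest bits of every instruction to signal its length. A sketch of the check (the encoding rule is real RISC-V; the function name is mine):

```rust
// In RISC-V a full 32-bit instruction always has its two lowest bits set
// to 0b11; compressed (16-bit) instructions use any other value there.
fn insn_length(first_halfword: u16) -> usize {
    if first_halfword & 0b11 == 0b11 { 4 } else { 2 }
}
```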
Using the C extension makes RISC-V competitive with WASM, but considering RISC-V’s simplicity I think we could do better! What I mean by this is, we don’t actually need to store raw RISC-V bytecode; we could have our own custom encoding of it and store that custom encoded version of it. You can think of this as a simple compression scheme for RISC-V bytecode. I haven’t explored this, but it would most likely allow us to cut down the size even more at essentially no cost.
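As a sketch of what such a custom encoding could look like (entirely hypothetical; the opcode numbers and layout are made up): store a one-byte opcode, pack the two register operands into a single byte, and emit the immediate as a variable-length integer, so that small immediates (by far the most common) take a single byte:

```rust
// LEB128-style variable-length encoding of a 32-bit value:
// 7 bits of payload per byte, high bit set on all but the last byte.
fn write_varint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

// Hypothetical re-encoding of ADDI. Packing both registers into a single
// byte assumes at most 16 registers, i.e. an RV32E-like target.
fn encode_addi(rd: u8, rs1: u8, imm: u32, out: &mut Vec<u8>) {
    out.push(0x01); // made-up opcode byte for ADDI
    out.push((rd & 0xf) | ((rs1 & 0xf) << 4));
    write_varint(imm, out);
}
```

With this scheme an addi x1, x2, 5 takes 3 bytes instead of 4, instructions with zero or small immediates compress even better, and decoding stays trivial.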
RISC-V: the good parts
So what were the good parts of RISC-V based on my experience writing my JIT recompiler?
- Really simple. There are not many instructions - only 55 if I counted them right (and if I didn’t - sorry, counting is hard!) - and even that number is a little misleading. The instructions can be grouped into roughly 11 categories, and the handling of all instructions within a category is essentially the same. For example, AND and XOR are encoded almost identically and have the same semantics, just performing a different bitwise operation.
- The most basic RISC-V target with the M extension (for multiplication/division) is the bare minimum of what an ISA should have, and is pretty much exactly what we want functionality-wise. Floating point support, atomics, SIMD, etc. are in their separate extensions which we can completely ignore.
- Has a dedicated instruction for making syscalls/hostcalls. (This is worth mentioning because eBPF doesn’t have one.)
- Is 32-bit (well, it has both 32-bit and 64-bit variants) so we could use the same trick wasmtime uses to sandbox its memory accesses through clever use of virtual memory.
- Is really easy to decode; the instruction encoding is mostly sane (some of the immediate encodings are a little crazy, but nothing too bad) and uncompressed instructions are always a constant 4 bytes long.
- Could most likely support a Harvard-style machine like WASM, which is nice for smart contracts as we wouldn’t have to copy the RISC-V code itself into memory accessible to the smart contract. (Mentioning this because, again, eBPF doesn’t have this from what I can see.)
- The support for RISC-V in rustc seems very good, and is only going to get better as RISC-V gains adoption.
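To show what I mean by some of the immediate encodings being “a little crazy”: a conditional branch’s offset is scattered across four separate bit fields, in an order optimized for hardware rather than for humans. Still, reassembling it is only a few lines (a sketch; the function name is mine):

```rust
// Decode the B-type (conditional branch) immediate. Its 13 bits are stored
// as imm[12] in bit 31, imm[10:5] in bits 30:25, imm[4:1] in bits 11:8 and
// imm[11] in bit 7; imm[0] is implicitly zero.
fn decode_branch_offset(insn: u32) -> i32 {
    let imm = ((insn >> 31) & 0x1) << 12
        | ((insn >> 7) & 0x1) << 11
        | ((insn >> 25) & 0x3f) << 5
        | ((insn >> 8) & 0xf) << 1;
    // Sign-extend from 13 bits.
    ((imm as i32) << 19) >> 19
}
```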
RISC-V: the bad parts
The biggest wart of RISC-V in the context of writing a JIT is that it has 32 general purpose registers (well, actually, only 31; one of those is the zero register which doesn’t really count), which does complicate things.
For reference, AMD64 (sometimes also called x86_64), which most of us are running, has only 16 registers. So how do you map 32 registers onto only 16? Well, you don’t. You have to spill things into memory. Empirically I’ve found that as long as you pin the most frequently used registers to physical registers and only spill those which are rarely used, performance doesn’t suffer too much. Initially I spilled every register into memory on every access; as I gradually pinned more and more RISC-V registers to actual AMD64 registers the performance improved, but only up to a point, with diminishing returns.
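The pinning strategy can be sketched like this (which registers to pin, and where, is my own guess based on the standard RISC-V calling convention; the real choice should come from profiling):

```rust
#[derive(Debug, PartialEq)]
enum Location {
    HostReg(u8),  // index of an AMD64 register the guest register is pinned to
    Spill(usize), // byte offset into an in-memory spill area
}

fn map_register(reg: u8) -> Location {
    match reg {
        // Pin the registers that show up most often in compiler output:
        // ra (x1), sp (x2) and the argument registers a0-a5 (x10-x15).
        1 => Location::HostReg(0),
        2 => Location::HostReg(1),
        10..=15 => Location::HostReg(2 + (reg - 10)),
        // Everything else gets spilled to memory on each access.
        r => Location::Spill(r as usize * 4),
    }
}
```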
There is a way around this though. RISC-V officially defines a subtarget called RV32E which only uses 16 registers, which would be almost perfect for us! Unfortunately this isn’t currently implemented in LLVM, but there’s a patch in progress to add it. In the worst case we could help out and get it over the finish line (either directly or by funding it). There’s also apparently support for it in GCC already, so using rustc’s GCC backend could also be an option. Writing a postprocessor which would convert full RISC-V code into RV32E and use that until LLVM supports it natively is also a possibility. We could solve this.
So what are the next steps? What still needs to be done? More experiments!
- Follow up on RV32E: take the in-progress LLVM patch, make it work with rustc, and see how RV32E affects the generated code and how easy it is to JIT. Does the code get larger? By how much? Does it get slower? How much simpler does the JIT get if we can guarantee that only 16 registers are used?
- See whether it’d be feasible to write a postprocessor that’d take full blown RISC-V code and transform it into RV32E. Is that easy to do? And how would that affect the performance of the resulting code?
- Experimentally integrate it into Substrate and ink!, and run some actual smart contracts on-chain.
- Investigate a more compact encoding of RISC-V instructions and see how small we can make it.
After looking into eBPF and RISC-V in more detail and experimenting with them my conclusions are as follows:
- I don’t think eBPF is a good fit. Yes, it’s simple, but it’s too simple, and it’s just too problematic in practice.
- RISC-V exceeded my expectations. We should seriously consider it and investigate further.
- I wouldn’t go as far as saying “we should switch to RISC-V” yet, but I’m close.
- Considering RISC-V’s simplicity I could probably write a secure, production-ready JIT for it in a few months, possibly weeks.