Exploring alternatives to WASM for smart contracts

FYI: ZK proofs over arbitrary Rust code; here’s an example of a RISC-V implementation. Might be great for Substrate.

4 Likes

Continuing with the experiment, I’ve looked into ways of cutting down the size of the serialized RISC-V bytecode, and I’ve come up with a relatively simple bit-efficient serialization scheme to see how far an uncompressed stream can shrink. Here are the results (sorted by size; best results are in bold):

(Note: to keep things fair those are the sizes for the code portion of the payloads, without headers, padding, metadata, non-code data, etc.)

| variant | size | size (zstd -3) | size (zstd -23) |
|---|---|---|---|
| O3 ebpf | 135200 | 34757 | 28992 |
| Os ebpf | 125128 | 33385 | 27442 |
| Oz ebpf | 101608 | 29735 | 25138 |
| O3 rv32e | 74588 | 15183 | 12348 |
| Os rv32e | 68644 | **14017** | **11278** |
| O3 wasm | 68114 | 21054 | 18562 |
| O3 wasm_opt | 65042 | 20865 | 18498 |
| O3 rv32e_c | 62092 | 32798 | 30138 |
| Os wasm | 61164 | 19865 | 17564 |
| Os wasm_opt | 58414 | 19700 | 17491 |
| Oz rv32e | 58380 | 30262 | 27440 |
| Os rv32e_c | 57220 | 31077 | 28648 |
| Oz rv32e_c | 49872 | 29173 | 27019 |
| O3 rv32e_compact | 48182 | 43062 | 40947 |
| Oz wasm | 46550 | 18418 | 16421 |
| Oz wasm_opt | 44906 | 18126 | 16240 |
| Os rv32e_compact | 43088 | 38999 | 37293 |
| Oz rv32e_compact | **34889** | 32539 | 31090 |

RISC-V wins on all counts. Uncompressed, my serialization scheme (rv32e_compact) wins at ~34 KB vs. the next-best WASM with wasm-opt at ~44 KB. When compressed with zstd, the vanilla RISC-V encoding wins at ~11 KB vs. the next-best WASM with wasm-opt at ~16 KB.

So we have two options to minimize the size of the binaries: either keep uncompressed blobs compiled with Oz and encoded with my custom scheme, or keep zstd-compressed vanilla-encoded blobs compiled with O3 or Os. Either way, it’s going to be an improvement over WASM.
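For illustration, here’s a hedged sketch (not the actual scheme used above, just an example of the general idea) of the kind of trick a bit-efficient bytecode serialization relies on: a LEB128-style varint, where small immediates, which dominate real code, take a single byte:

```rust
// Illustrative varint encoding: 7 payload bits per byte, high bit
// signals that more bytes follow. NOT the actual rv32e_compact scheme.
fn encode_varint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: high bit clear
            break;
        }
        out.push(byte | 0x80); // high bit set: continuation
    }
}

// Returns the decoded value and the number of bytes consumed.
fn decode_varint(bytes: &[u8]) -> (u32, usize) {
    let mut value = 0u32;
    let mut shift = 0;
    for (i, &b) in bytes.iter().enumerate() {
        value |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 == 0 {
            return (value, i + 1);
        }
        shift += 7;
    }
    panic!("truncated varint");
}

fn main() {
    let mut out = Vec::new();
    encode_varint(5, &mut out); // small immediate: 1 byte
    encode_varint(300, &mut out); // larger immediate: 2 bytes
    assert_eq!(out.len(), 3);
    let (v, n) = decode_varint(&out);
    assert_eq!((v, n), (5, 1));
    let (v2, _) = decode_varint(&out[n..]);
    assert_eq!(v2, 300);
}
```

The trade-off visible in the table above is that such a compact stream has less redundancy left for a general-purpose compressor like zstd to exploit, which is why the compact encodings compress comparatively poorly.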

4 Likes

One thing that doesn’t seem to be mentioned so far is deterministic execution metering - this is needed in contracts so presumably it’s available.

If we could make up some of that performance gap with an effective AOT trans-compiler which supported deterministic metering, then it could be a game-changer for Frame.

4 Likes

Yes, that’s the plan. The idea is that we want 100% deterministic execution, and this includes not only instruction metering but also things like native stack usage (which is not deterministic on e.g. wasmtime, so we need to hack around it to make sure PVFs don’t overflow it).

I haven’t yet done any experiments with execution metering, as adding that in an efficient manner is a little more involved, but the general idea I had was to most likely just use @pepyakin’s idea of bumping a static gas counter at the boundaries of instruction blocks and asynchronously checking whether it’s still under the expected limit.

This would still deterministically tell us how much work was done once the code finishes running, and it will deterministically always fail if it exceeds the amount of work it’s allowed to do, but it saves on the overhead of synchronously checking the gas meter at the cost of maybe executing a little more code than necessary (which in practice we don’t care about as long as the effects of the execution are transactional). Together with @pepyakin we’ve already done some experiments with this on wasmtime and validated that the idea should work in principle.
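As a rough illustration of the scheme described above (a sketch only, not the actual implementation; all names are made up), the recompiled code would bump a global counter at block boundaries, while the limit check happens lazily:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch of lazy gas metering: charging is a cheap add
// emitted at every instruction-block boundary; the limit is only
// checked asynchronously, so execution may overshoot a little before
// being aborted (which is fine if the effects are transactional).
static GAS_USED: AtomicU64 = AtomicU64::new(0);
const GAS_LIMIT: u64 = 1000;

// Emitted at the start of each instruction block.
fn charge_block(cost: u64) {
    GAS_USED.fetch_add(cost, Ordering::Relaxed);
}

// Checked lazily, not after every block.
fn out_of_gas() -> bool {
    GAS_USED.load(Ordering::Relaxed) > GAS_LIMIT
}

fn main() {
    for _ in 0..10 {
        charge_block(150); // pretend we executed a 150-gas block
    }
    // The total work done is still deterministic once execution ends...
    assert_eq!(GAS_USED.load(Ordering::Relaxed), 1500);
    // ...and exceeding the limit is always detected, just late.
    assert!(out_of_gas());
}
```

The point of the design is that the hot path is a single relaxed add with no branch, while the branchy limit check is hoisted out of the per-block fast path.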

In theory if we wanted then with some light modifications to the FRAME support libraries (to remove the assumptions that it’s always WASM, to support a new host function calling convention, etc.) and an extra executor backend in Substrate we could indeed use this as an alternative to WASM not only for smart contracts but also for runtimes/PVFs/SPREEs/etc., essentially requiring only a recompilation of a runtime blob for a different target.

I’m not yet sure if this would be a good idea. For smart contracts, at this point, I’m quite confident that this is most likely a good idea: it will solve a lot of our problems with WASM, speed contracts up, offer blazingly fast compile times and a low memory footprint, be relatively easy to secure, and be simple enough that it’s easy to maintain. Essentially giving us a state-of-the-art smart contract VM.

But for things which are not smart contracts the pendulum swings somewhat in the direction of maximizing execution speed at the cost of extra compilation time and complexity. How much of a performance hit would we be willing to accept? How much extra complexity would we be willing to entertain to increase the performance compared to the smart contracts sweet spot baseline?

For smart contracts I think this is attractive precisely because it’s in this sweet spot of the speed/complexity/effort trifecta where it just makes sense. For other uses however it’s still an open question. (Which we can explore. Now that I’m mostly done experimenting I just need to get it more production ready for smart contracts first, and then we can play around with it.)

1 Like

Yes - I see the tradeoff. Is there any existing work on building a high-performance optimising trans-compiler for RISC-V?

If performance is just not at a level where it can replace Wasm wholesale for all PVFs, maybe we can support PVFs in either format: Wasm (for raw speed) or RISC-V (for the avoidance of writing benchmark code). If this were the case, I can imagine that as optimisations slowly arrive in the PVF trans-compiler things would swing towards RISC-V, but we’d avoid forcing any performance drop onto anyone.

3 Likes

Well, there are quite a few RISC-V VMs out there, but AFAIK from what I’ve surveyed there’s nothing out there that would match our specific requirements. For example:

  1. Dynamic recompilation vs static recompilation. Most VMs which produce native code do it dynamically, because they’re designed to support running arbitrary software that can’t just be recompiled once, ahead of time. We don’t need nor want this, as it’s just extra unnecessary complexity. (And in fact, doing it in an AOT fashion - if the use case allows for it, and ours does - should be easier to optimize, because you have access to the whole program graph.)

  2. Support for the full RISC-V ISA. Most VMs support the whole kitchen sink offered by RISC-V. In our case we only really need a cut-down subset of the user-level instruction set.

  3. Doesn’t actually emit native code and/or is not high performance. There is, for example, another blockchain which uses a RISC-V based VM, but they run it interpreted, which is not going to be as fast as even a naive recompiler. (And, paradoxically, writing a high-performance interpreter is actually more complex and more effort than a naive recompiler.)

That said, there is plenty of prior art for making high performance recompilers - that’s what wasmtime is!

Technically we could maybe translate the RISC-V code back into Cranelift IR (where Cranelift is the optimizing compiler used by wasmtime) and generate native code using that, and maybe that could allow us to match wasmtime’s performance, essentially piggybacking on all of the performance work that’s done for wasmtime, just with a different frontend. So we could have two backends - one would be a simple O(n) one used for smart contracts, and the other would use Cranelift to generate faster code at the cost of extra compilation time and complexity. (Assuming it is feasible, which is still an open question which we can explore.)

For all PVFs - I don’t think so, but for some PVFs - indeed, it’s possible that the performance as-is would be enough. It’s not slow per se (especially for code that’s not compute-heavy); it’s just not as fast as wasmtime.

1 Like

Presumably Cranelift IR itself isn’t a consideration for expressing a PVF in because it is not a stable target?

How terrible would it be to fork an existing project and strip it down? Is there an AOT trans-compiler which maybe supports everything including the kitchen sink which we can strip?

Indeed, AFAIK it’s not stable, and it’s also not really designed (nor supported as such) to be a long-term standalone IR; it’s relatively high level (so the O(n) compilation which we need for smart contracts becomes hard), and in general it has a lot of superfluous features we don’t need.

There’s also the practical issue of not actually having a compiler that directly targets it. If we don’t want to maintain our own rustc/LLVM backend (and honestly, we don’t) then we’re limited by what rustc supports. (Yes, technically there is a work-in-progress Cranelift backend for rustc, but it isn’t really meant to emit Cranelift IR as a final artifact.)

As a long-term IR that we can support forever I think that RISC-V is essentially the best option out there, precisely because it is small, stable, well supported, and has an ecosystem that won’t suddenly go in a direction we don’t want, mostly because many other people also care about the subset of RISC-V that we’re targeting. (Unlike WASM, where new features are being rolled into the baseline spec so we will be forced to deal with them somehow whether we want it or not.)

If the only issue were that we’d just need to strip out some features, then that possibly wouldn’t be too bad. (Although at some point the effort to modify it could conceivably surpass the effort to rewrite it from scratch.) But AFAIK, fundamentally, even if we ignore everything besides performance, there might not actually exist a RISC-V VM that is higher performance than what I wrote. I surveyed some existing RISC-V VMs, and they tend not to do many higher-level optimizations on the code they’re generating (which is what you’d most likely need to close the performance gap).

All in all, once I get this to a reasonable state for smart contracts I will look into whether we can close the performance gap as a general purpose runtime VM. Currently it’s not entirely clear where the extra overhead comes from exactly so it’s hard to conclude anything, but if I e.g. compile all of our runtime benchmarks to RISC-V and compare their performance with how they run on WASM it might shine some light on where exactly the gap is and how to close it.

3 Likes

if I e.g. compile all of our runtime benchmarks to RISC-V and compare their performance with how they run on WASM it might shine some light on where exactly the gap is and how to close it.

Sounds sensible.

1 Like

Why would they push WASM for cloud environments, when the original purpose was to make the client-side web better? Does this no longer apply in the WASM world?

People are using Wasm for all kinds of applications where you want to sandbox code, simply because there is good production-quality software to do that (wasmtime, for example). However, we have much stricter requirements than just sandboxing; mainly determinism. Those cloud services can just spin up a VM and kill it if it takes too many resources for a certain program, be it during compilation or execution. We can’t do that.

But wouldn’t we lose our biggest incentive to use RISC-V by doing so? Meaning the dead-simple O(n) compilation to native code? IMHO we either prove that we can write an O(n) compiler that produces fast enough code, or we just stick with Wasm, where optimizing (but unbounded) compilers already exist. Granted, it would solve our stack overflow determinism problem. But would it be worth it just for that?

Is the assumption here that execution metering baked into the recompiler is necessarily faster for RISC-V than for Wasm? If so, I’m not sure that’s the case. If I remember correctly, the overhead of wasmtime fuel metering was around 5% on our FRAME benchmarks. It was worse for contracts (running wasmi under wasmtime), but this was countered by @pepyakin’s async checking method. If we only want metering, we can probably make it work with Wasm, too.

2 Likes

Yes. I was just explaining the possibility of it, but I really wouldn’t want to do it unless absolutely necessary. It is within the realm of possibility that we could match wasmtime’s performance without resorting to that, I still have some other ideas to try to speed it up. Ideas which I do intend to test out. (I just can’t test them out yet without refactoring the VM into a more production-ready state, which is currently in progress.)

Well, the O(n) compilation and preventing native stack overflow aren’t the only things we’d get; there would be other (I guess more minor?) advantages too. The baseline instruction set wouldn’t expand/shift under us with the compiler emitting new instructions that we’d have to handle, we’d get better security sandboxing (more on that in a later post once I’m actually finished implementing that part), it’d be slightly easier to meter (mostly because we wouldn’t have to add @pepyakin’s async metering to wasmtime), etc.

But, again, I want to reiterate that I don’t like the idea of using Cranelift online, because then, as you’ve said, we’d lose O(n) compilation, we’d lose consistent performance, sandboxing gets harder (because we’d have to sandbox the recompilation part, while with O(n) we don’t have to), complexity increases exponentially, and the VM would be harder for a third party to reproduce based on the specs, etc.

(Another thing I’d like to achieve with this RISC-V based VM is to make it simple enough so that we can write it into the Polkadot spec, and make it feasible for anyone to reproduce it from scratch completely with 3rd party code and get comparable performance. Using Cranelift would make the “get comparable performance” part difficult.)

No. But it’d be easier to actually implement it.

3 Likes

Agreed. And we might even face resistance upstreaming the change to wasmtime, as we are not exactly their target audience. A lot of the properties that are really desirable for us follow from just the simplicity of RISC-V.

1 Like

So you’re saying that if you want 64-bit arithmetic, then what you really want is SIMD, but since we don’t have that, you can’t have 64-bit arithmetic either. I don’t think that follows; this is an example of perfection being the enemy of the good.

I am concerned about 32-bit. Wasm supports 64-bit arithmetic/memory access natively, and rv32e does not.

  • Balance is u128 and used heavily in smart contracts. Balance arithmetic will take many more instructions on rv32e than on wasm.
  • A major issue in recompiling Solidity is supporting bigint arithmetic on e.g. uint256. If all we have is 32-bit arithmetic, then the bigint cost will be huge compared to wasm.
  • Smart contracts wanting to do crypto will have performance penalties (as burdges pointed out).

This is a step backwards compared to wasm. rv32e was designed for tiny devices like USB peripheral microcontrollers, not smart contracts.

Note that clang/LLVM does support rv64e even though it is not standardized. A 64-bit address space is not a bad thing either.
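To make the first bullet concrete, here’s a sketch of what a single u128 addition decomposes into once only 32-bit registers are available: four 32-bit adds plus carry propagation, before even considering multiplication (where the blow-up is much worse):

```rust
// Sketch of 128-bit addition on a 32-bit machine, using four
// little-endian 32-bit limbs with explicit carry propagation.
// On a 64-bit target the same operation is just two adds.
fn add_u128_as_limbs(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    let mut carry = 0u32;
    for i in 0..4 {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry);
        out[i] = s2;
        carry = (c1 as u32) + (c2 as u32); // at most one carry survives
    }
    out
}

fn main() {
    // (2^32 - 1) + 1 = 2^32, i.e. a carry into the second limb
    let a = [u32::MAX, 0, 0, 0];
    let b = [1, 0, 0, 0];
    assert_eq!(add_u128_as_limbs(a, b), [0, 1, 0, 0]);
}
```

This is roughly what a compiler lowers `u128` arithmetic to on an rv32 target, which is why heavy Balance arithmetic is the main cost concern here.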

No, I’m saying that if you care about the absolute best performance for your crypto/numeric/whatever code you actually want SIMD. In general the performance improvement from using SIMD tends to be a lot bigger than just switching from 32-bit to 64-bit arithmetic.

(Nevertheless, we’re currently running our smart contracts in an interpreter running under another virtual machine; the equivalent code running natively is going to be significantly faster anyway, even if it only uses 32-bit arithmetic.)

I disagree. WASM wasn’t designed for smart contracts either. (:

The nice thing about RISC-V is that it’s designed to be flexible and extensible, which means that if necessary we can just tweak it to be whatever we want it to be (within some reasonable limits, of course).

Theoretically we could just add custom dedicated instructions to handle 256-bit integer arithmetic, if we wanted to. Then the resulting code would be smaller than the equivalent WASM code (which would have to emulate those using 64-bit arithmetic) and possibly faster (since we could use pedal-to-the-metal native code to do the arithmetic ops, instead of depending on the WASM VM’s optimizer to generate good enough code; of course whether we can match the performance of something like wasmtime for general-purpose code with only an O(n) recompiler is still an open question).


Anyway, please remember that everything outlined up to this point is subject to change depending on the feedback/issues we encounter/practicalities of what we want to achieve. I’m not completely ruling out supporting a 64-bit target (except maybe a full 64-bit address space; I don’t see why we’d want that, and it’d just complicate sandboxing), but initially we will be 32-bit only, and then we’ll see how it goes.

I think Sean made a good point. Yes, we can extend the ISA with whatever we need (and then maintain our own ISA spec and compiler infrastructure…). Maybe this even allows our contracts to outperform Wasm and EVM execution performance, with way lower VM or JIT overhead, all while being on par with or even beating EVM bytecode size, well justifying substantial efforts. I’m all in the RISC-V camp, but we should anticipate such details early on.

We’ll have to maintain a spec anyway, since this will have to be reimplementable from scratch by alternative Polkadot implementations. As far as maintaining a compiler goes, having to fork rustc and LLVM is something I want to avoid at all costs**. Fortunately using custom instructions doesn’t require us to do that, because we could just use inline assembly to emit the appropriate instructions and make a nice high-level wrapper in Rust (with operator overloading, etc.) that people could use.

(** - initially we’ll most likely have to supply our own build of rustc with the rv32e patches applied, but that should be only temporary until the support is upstreamed)
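A minimal sketch of what such a wrapper could look like (all names hypothetical; the software fallback below stands in for the custom instruction so the example stays portable):

```rust
use std::ops::Add;

// Sketch of the "inline assembly behind a nice wrapper" idea: a U256
// type with operator overloading whose arithmetic would, on the real
// RISC-V target, be a custom instruction emitted via inline asm
// (e.g. a hypothetical `.insn` encoding in the custom opcode space).
// Here a portable software fallback stands in for that instruction.
#[derive(Clone, Copy, PartialEq, Debug)]
struct U256([u64; 4]); // little-endian 64-bit limbs

impl Add for U256 {
    type Output = U256;
    fn add(self, rhs: U256) -> U256 {
        // On the target this body could collapse to a single custom
        // instruction; here it's a plain limb-wise add with carry.
        let mut out = [0u64; 4];
        let mut carry = 0u64;
        for i in 0..4 {
            let (s1, c1) = self.0[i].overflowing_add(rhs.0[i]);
            let (s2, c2) = s1.overflowing_add(carry);
            out[i] = s2;
            carry = (c1 as u64) + (c2 as u64);
        }
        U256(out)
    }
}

fn main() {
    let a = U256([u64::MAX, 0, 0, 0]);
    let b = U256([1, 0, 0, 0]);
    assert_eq!(a + b, U256([0, 1, 0, 0]));
}
```

Contract code would just write `a + b` and never see the assembly, which is the point: the platform-specific part stays contained inside the library.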

Not everything can be anticipated early on; I don’t want us to end up with a bunch of cruft we’re not going to need, which is why I think it makes sense to start with a bare minimum base and expand from there as necessary, while architecting things in such a way that it’ll be easy to change/pivot as we go if needed.

Again, going back to the topic of 64-bit arithmetic - I’m not completely rejecting the idea, but I’m not going to make a final decision here without hard data on how much of an effect it’ll have. The new rewritten VM (more on that in a future post once I get it functional enough to share) is written in such a way that it’ll be relatively easy to have it accept rv64e and support 64-bit arithmetic, so we’ll be able to see exactly how much effect it has on code size and performance and not have to make this decision blindly.

1 Like

I’m not worried. I’m not worried about doing 128 bit balance operations with 32 bit registers either.

rv32e is being selected here because it exists. We could discuss other variations once we see how well it works in practice.

We’ve seen this before, and in my opinion this is bad for a number of reasons. This can’t be the solution. The same smart contract now needs different source code depending on the target runtime. It’ll likely prevent certain code optimizations. It requires limitations of the surrounding infrastructure to be solved at the language level. I don’t like the idea of making the usage of assembly a common appearance in smart contracts, even when hiding it behind shiny wrappers. We no longer have a runtime that is easily targetable by anyone. We end up in a position comparable to the whole RBPF situation, while they at least provide a working LLVM fork. This will invalidate 5 out of the 11 points made here.
A0 explicitly lists 64-bit integer support that “maps one-to-one with CPU instructions” and LLVM support as reasons for using Wasm.

I’d be somewhat surprised if efficient arithmetic for integers >32 bit turns out to be cruft we don’t need. The benchmark results in the OP already showed that the best case for RISC-V is worse than the best case for Wasm (code size). And Wasm’s size is way worse than EVM’s, a complaint I’ve heard often so far. EVM is criticized for 256-bit, which does not map to real CPUs; RV32E would be the same, just in the opposite direction.

Sure, we’re just trying to highlight some concerns about having only 32 bits, which will have side effects.

I don’t quite understand what you mean here; can you please expand on this?

I don’t see how it’d be possible to target multiple smart contract runtimes (e.g. say a solidity VM, a WASM VM, and our new RISC-V based VM) with exactly the same source code. Every target platform will require slightly different code, but the point here is to put all of that code that needs to be platform-specific in libraries and/or tools (e.g. in something like ink!) so that the code of the smart contract itself can be exactly the same.
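As a sketch of that idea (names are hypothetical, not ink!’s actual API): the contract is written against a trait, and each target VM supplies its own backend, so the contract source itself stays identical across platforms:

```rust
// Hypothetical sketch: the contract only talks to a trait, and each
// target (WASM VM, RISC-V VM, ...) provides an implementation in a
// library, keeping the contract source itself target-agnostic.
trait Env {
    fn caller(&self) -> [u8; 32];
    fn transfer(&mut self, to: [u8; 32], amount: u128);
}

// The contract logic: identical for every backend.
fn refund<E: Env>(env: &mut E, amount: u128) {
    let who = env.caller();
    env.transfer(who, amount);
}

// A mock backend standing in for a real host-function binding.
struct MockEnv {
    caller: [u8; 32],
    transfers: Vec<([u8; 32], u128)>,
}

impl Env for MockEnv {
    fn caller(&self) -> [u8; 32] {
        self.caller
    }
    fn transfer(&mut self, to: [u8; 32], amount: u128) {
        self.transfers.push((to, amount));
    }
}

fn main() {
    let mut env = MockEnv { caller: [7; 32], transfers: Vec::new() };
    refund(&mut env, 100);
    assert_eq!(env.transfers, vec![([7; 32], 100)]);
}
```

Swapping the backend (or the host-function calling convention underneath it) then never touches the contract code, only the library.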

Why?

Did you read my update post? The best case RISC-V is better than the best case for WASM for code size, assuming we either use compression or a more compact bytecode serialization format.

Which exact points? Let me go through all of them anyway.

  • High performance: Wasm is high performance — it’s built to be as close to native machine code as possible while still being platform independent.

Still true with the new VM. (Although possibly not as high performance, due to the guaranteed O(n) recompilation.)

  • Small size: It facilitates small binaries to ship over the internet to devices with potentially slow internet connection. This is a great fit for the space-constrained blockchain world.

Still true with the new VM.

  • General VM & bytecode: It was developed so that code can be deployed in any browser with the same result. Contrary to the EVM it was not developed towards a very specific use case, this has the benefit of a lot of tooling being available and large companies putting a lot of resources into furthering Wasm development.

Still true with the new VM.

Even though this is meant for smart contracts I’m targeting this to be a general-purpose VM, and there’s very little smart contract specific about it (except checking all of the feature boxes that smart contracts need/want). And potentially having extra instructions for accelerating higher-bit arithmetic doesn’t really change this.

  • Efficient JIT execution: 64 and 32-bit integer operation support that maps one-to-one with CPU instructions.

Kinda. This requires a longer explanation.

First, a note: the part about “maps one-to-one with CPU instructions” is actually not true for WASM! Fundamentally, a lot of WASM instructions map to multiple native instructions, due to the fact that 1) WASM is more high level than a hardware-level ISA, and 2) when translating between different ISAs there will always be some mismatch. For example, on amd64 the bitshift instructions require that the bitshift amount is in a specific register (in cl), so if, say, you want to bitshift by a value you have in another register, you need to shuffle things around. There’s also the issue that WASM is a stack-based machine, so it needs an expensive register allocator to actually recompile the bytecode into native code.

So, arguably, this point is more true with the new VM than it is with WASM, but it’s still not 100% true, as there will still be some impedance mismatch between e.g. RISC-V and amd64 (which is unavoidable).

  • Minimalistic: Formal spec that fits on a single page.

Okay, this is weird, because I don’t think this is true for WASM, unless the page has infinite length? (: And the baseline WASM spec keeps growing as time goes on.

Nevertheless, I’m aiming for our spec to be shorter and more minimal than WASM’s.

  • Deterministic execution: Wasm is easily made deterministic by removing floating point operations, which is necessary for consensus algorithms.

Still true with the new VM.

  • Open Standards > Custom Solutions: Wasm is a standard for web browsers developed by W3C workgroup that includes Google, Mozilla, and others. There’s been many years of work put into Wasm, both by compiler and standardization teams.

Still mostly true with the new VM. Supporting some custom instructions to accelerate certain workloads doesn’t change this; it’s something that’s encouraged and very common among RISC-V hardware vendors. If you don’t want to use those custom instructions in your program, then you just don’t, and you use only the baseline RISC-V ISA, at the cost of worse performance.

  • Many languages available: Wasm expands the family of languages available to smart contract developers to include Rust, C/C++, C#, Typescript, Haxe, and Kotlin. This means you can write smart contracts in whichever language you’re familiar with, though we’re partial to Rust due to its lack of runtime overhead and inherent security properties.

Still true with the new VM, at least for C and C++. For other languages I’m not entirely sure, so this might be a fair point. Do we have many people writing smart contracts in C#, Typescript, Haxe or Kotlin?

  • Memory-safe, sandboxed, and platform-independent.

Still true with the new VM. And our sandboxing will actually be better than what WASM VMs offer. (More on that in a future update.)

  • LLVM support: Supported by the LLVM compiler infrastructure project, meaning that Wasm benefits from over a decade of LLVM’s compiler optimization.

Still true with the new VM.

  • Large companies involved: Continually developed by major companies such as Google, Apple, Microsoft, Mozilla, and Facebook.

Still true with the new VM.


So, with perhaps a single exception, all of those arguments should still be true.