Exploring alternatives to WASM for smart contracts

FYI: ZK proofs over arbitrary Rust code; here’s an example of a RISC-V implementation. Might be great for Substrate.

4 Likes

Continuing with the experiment, I’ve looked into ways of cutting down the size of the serialized RISC-V bytecode, and I’ve come up with a relatively simple bit-efficient serialization scheme to see how far an uncompressed stream can shrink. Here are the results (sorted by size; best results are in bold):

(Note: to keep things fair those are the sizes for the code portion of the payloads, without headers, padding, metadata, non-code data, etc.)

| variant | size | size (zstd -3) | size (zstd -23) |
|---|---|---|---|
| O3 ebpf | 135200 | 34757 | 28992 |
| Os ebpf | 125128 | 33385 | 27442 |
| Oz ebpf | 101608 | 29735 | 25138 |
| O3 rv32e | 74588 | 15183 | 12348 |
| Os rv32e | 68644 | **14017** | **11278** |
| O3 wasm | 68114 | 21054 | 18562 |
| O3 wasm_opt | 65042 | 20865 | 18498 |
| O3 rv32e_c | 62092 | 32798 | 30138 |
| Os wasm | 61164 | 19865 | 17564 |
| Os wasm_opt | 58414 | 19700 | 17491 |
| Oz rv32e | 58380 | 30262 | 27440 |
| Os rv32e_c | 57220 | 31077 | 28648 |
| Oz rv32e_c | 49872 | 29173 | 27019 |
| O3 rv32e_compact | 48182 | 43062 | 40947 |
| Oz wasm | 46550 | 18418 | 16421 |
| Oz wasm_opt | 44906 | 18126 | 16240 |
| Os rv32e_compact | 43088 | 38999 | 37293 |
| Oz rv32e_compact | **34889** | 32539 | 31090 |

RISC-V wins on all counts. Uncompressed, my serialization scheme (rv32e_compact) wins at ~34 KB vs. the next-best WASM with wasm-opt at ~44 KB. When compressed with zstd, the vanilla RISC-V encoding wins at ~11 KB vs. the next-best WASM with wasm-opt at ~16 KB.

So we have two options to minimize the size of the binaries: either keep uncompressed blobs compiled with Oz and encoded with my custom scheme, or keep zstd-compressed vanilla-encoded blobs compiled with O3 or Os. Either way, it’s going to be an improvement over WASM.
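For illustration, here’s a hedged sketch (not the actual scheme used above, just an example of the general idea) of the kind of trick a bit-efficient bytecode serialization relies on: a LEB128-style varint, where small immediates, which dominate real code, take a single byte:

```rust
// Illustrative varint encoding: 7 payload bits per byte, high bit
// signals that more bytes follow. NOT the actual rv32e_compact scheme.
fn encode_varint(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: high bit clear
            break;
        }
        out.push(byte | 0x80); // high bit set: continuation
    }
}

// Returns the decoded value and the number of bytes consumed.
fn decode_varint(bytes: &[u8]) -> (u32, usize) {
    let mut value = 0u32;
    let mut shift = 0;
    for (i, &b) in bytes.iter().enumerate() {
        value |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 == 0 {
            return (value, i + 1);
        }
        shift += 7;
    }
    panic!("truncated varint");
}

fn main() {
    let mut out = Vec::new();
    encode_varint(5, &mut out); // small immediate: 1 byte
    encode_varint(300, &mut out); // larger immediate: 2 bytes
    assert_eq!(out.len(), 3);
    let (v, n) = decode_varint(&out);
    assert_eq!((v, n), (5, 1));
    let (v2, _) = decode_varint(&out[n..]);
    assert_eq!(v2, 300);
}
```

The trade-off visible in the table above is that such a compact stream has less redundancy left for a general-purpose compressor like zstd to exploit, which is why the compact encodings compress comparatively poorly.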

4 Likes

One thing that doesn’t seem to be mentioned so far is deterministic execution metering - this is needed in contracts so presumably it’s available.

If we could make up some of that performance gap with an effective AOT trans-compiler which supported deterministic metering, then it could be a game-changer for Frame.

4 Likes

Yes, that’s the plan. The idea is that we want 100% deterministic execution, and this includes not only instruction metering but also things like native stack usage (which is not deterministic on e.g. wasmtime, so we need to hack around it to make sure PVFs don’t overflow it).

I haven’t yet done any experiments with execution metering, as adding that in an efficient manner is a little more involved, but the general idea I had was to most likely just use @pepyakin’s idea of bumping a static gas counter at the boundaries of instruction blocks and asynchronously checking whether it’s still under the expected limit.

This would still deterministically tell us how much work was done once the code finishes running, and it will deterministically always fail if it exceeds the amount of work it’s allowed to do, but it saves on the overhead of synchronously checking the gas meter at the cost of maybe executing a little more code than necessary (which in practice we don’t care about as long as the effects of the execution are transactional). Together with @pepyakin we’ve already done some experiments with this on wasmtime and validated that the idea should work in principle.
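As a rough illustration of the scheme described above (a sketch only, not the actual implementation; all names are made up), the recompiled code would bump a global counter at block boundaries, while the limit check happens lazily:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch of lazy gas metering: charging is a cheap add
// emitted at every instruction-block boundary; the limit is only
// checked asynchronously, so execution may overshoot a little before
// being aborted (which is fine if the effects are transactional).
static GAS_USED: AtomicU64 = AtomicU64::new(0);
const GAS_LIMIT: u64 = 1000;

// Emitted at the start of each instruction block.
fn charge_block(cost: u64) {
    GAS_USED.fetch_add(cost, Ordering::Relaxed);
}

// Checked lazily, not after every block.
fn out_of_gas() -> bool {
    GAS_USED.load(Ordering::Relaxed) > GAS_LIMIT
}

fn main() {
    for _ in 0..10 {
        charge_block(150); // pretend we executed a 150-gas block
    }
    // The total work done is still deterministic once execution ends...
    assert_eq!(GAS_USED.load(Ordering::Relaxed), 1500);
    // ...and exceeding the limit is always detected, just late.
    assert!(out_of_gas());
}
```

The point of the design is that the hot path is a single relaxed add with no branch, while the branchy limit check is hoisted out of the per-block fast path.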

In theory if we wanted then with some light modifications to the FRAME support libraries (to remove the assumptions that it’s always WASM, to support a new host function calling convention, etc.) and an extra executor backend in Substrate we could indeed use this as an alternative to WASM not only for smart contracts but also for runtimes/PVFs/SPREEs/etc., essentially requiring only a recompilation of a runtime blob for a different target.

I’m not yet sure if this would be a good idea. For smart contracts, at this point, I’m quite confident that this is most likely a good idea: it will solve a lot of our problems with WASM, speed contracts up, offer blazingly fast compile times and a low memory footprint, be relatively easy to secure, and be simple enough that it’s easy to maintain. Essentially giving us a state-of-the-art smart contract VM.

But for things which are not smart contracts the pendulum swings somewhat in the direction of maximizing execution speed at the cost of extra compilation time and complexity. How much of a performance hit would we be willing to accept? How much extra complexity would we be willing to entertain to increase the performance compared to the smart contracts sweet spot baseline?

For smart contracts I think this is attractive precisely because it’s in this sweet spot of the speed/complexity/effort trifecta where it just makes sense. For other uses however it’s still an open question. (Which we can explore. Now that I’m mostly done experimenting I just need to get it more production ready for smart contracts first, and then we can play around with it.)

1 Like

Yes - I see the tradeoff. Is there any existing work on building a high-performance optimising trans-compiler for RISC-V?

If performance is just not at a level where it can replace Wasm wholesale for all PVFs, maybe we can support PVFs in either format: Wasm (for raw speed) or RISC-V (for the avoidance of writing benchmark code). If this were the case, I can imagine that as optimisations slowly arrive in the PVF trans-compiler things would swing towards RISC-V, but we’d avoid forcing any performance drop onto anyone.

3 Likes

Well, there are quite a few RISC-V VMs out there, but AFAIK from what I’ve surveyed there’s nothing out there that would match our specific requirements. For example:

  1. Dynamic recompilation vs static recompilation. Most VMs which produce native code do it dynamically, because they’re designed to support running arbitrary software that can’t just be recompiled once, ahead of time. We don’t need nor want this, as it’s just extra unnecessary complexity. (And in fact, doing it in an AOT fashion - if the use case allows for it, and ours does - should be easier to optimize, because you have access to the whole program graph.)

  2. Support for the full RISC-V ISA. Most VMs support the whole kitchen sink offered by RISC-V. In our case we only really need a cut-down subset of the user-level instruction set.

  3. Doesn’t actually emit native code and/or is not high performance. There is, for example, another blockchain which uses a RISC-V based VM, but they run it interpreted, which is not going to be as fast as even a naive recompiler. (And, paradoxically, writing a high-performance interpreter is actually more complex and more effort than a naive recompiler.)

That said, there is plenty of prior art for making high performance recompilers - that’s what wasmtime is!

Technically we could maybe translate the RISC-V code back into Cranelift IR (where Cranelift is the optimizing compiler used by wasmtime) and generate native code using that, and maybe that could allow us to match wasmtime’s performance, essentially piggybacking on all of the performance work that’s done for wasmtime, just with a different frontend. So we could have two backends - one would be a simple O(n) one used for smart contracts, and the other would use Cranelift to generate faster code at the cost of extra compilation time and complexity. (Assuming it is feasible, which is still an open question which we can explore.)

For all PVFs - I don’t think so, but for some PVFs - indeed, it’s possible that the performance as-is would be enough. It’s not slow per se (especially for code that’s not compute-heavy); it’s just not as fast as wasmtime.

1 Like

Presumably Cranelift IR itself isn’t a consideration for expressing a PVF in because it is not a stable target?

How terrible would it be to fork an existing project and strip it down? Is there an AOT trans-compiler which maybe supports everything including the kitchen sink which we can strip?

Indeed, AFAIK it’s not stable, and it’s also not really designed (nor supported as such) to be a long-term standalone IR; it’s relatively high level (so the O(n) compilation which we need for smart contracts becomes hard), and in general it has a lot of superfluous features we don’t need.

There’s also the practical issue of not actually having a compiler that directly targets it. If we don’t want to maintain our own rustc/LLVM backend (and honestly, we don’t) then we’re limited by what rustc supports. (Yes, technically there is a work-in-progress Cranelift backend for rustc, but it isn’t really meant to emit Cranelift IR as a final artifact.)

As a long-term IR that we can support forever I think that RISC-V is essentially the best option out there, precisely because it is small, stable, well supported, and has an ecosystem that won’t suddenly go in a direction we don’t want, mostly because many other people also care about the subset of RISC-V that we’re targeting. (Unlike WASM, where new features are being rolled into the baseline spec so we will be forced to deal with them somehow whether we want it or not.)

If the only issue were that we’d just need to strip out some features, then that possibly wouldn’t be too bad. (Although at some point the effort to modify it could conceivably surpass the effort to rewrite it from scratch.) But AFAIK, fundamentally, even if we ignore everything besides performance, there might not actually exist a RISC-V VM that is higher performance than what I wrote. I surveyed some existing RISC-V VMs, and they tend not to do many higher-level optimizations on the code they’re generating (which is what you’d most likely need to close the performance gap).

All in all, once I get this to a reasonable state for smart contracts I will look into whether we can close the performance gap as a general purpose runtime VM. Currently it’s not entirely clear where the extra overhead comes from exactly so it’s hard to conclude anything, but if I e.g. compile all of our runtime benchmarks to RISC-V and compare their performance with how they run on WASM it might shine some light on where exactly the gap is and how to close it.

3 Likes

if I e.g. compile all of our runtime benchmarks to RISC-V and compare their performance with how they run on WASM it might shine some light on where exactly the gap is and how to close it.

Sounds sensible.

1 Like

Why would they push WASM for cloud environments, when the original purpose was to make the client-side web better? Does this no longer apply in the WASM world?

People are using Wasm for all kinds of applications where you want to sandbox code, simply because there is good production-quality software to do that (wasmtime, for example). However, we have much stricter requirements than just sandboxing; mainly determinism. Those cloud services can just spin up a VM and kill it if it takes too many resources for a certain program, be it during compilation or execution. We can’t do that.

But wouldn’t we lose our biggest incentive to use RISC-V by doing so? Meaning the dead-simple O(n) compilation to native code? IMHO we either prove that we can write an O(n) compiler that produces fast enough code, or we just stick with Wasm, where optimizing (but unbounded) compilers already exist. Granted, it would solve our stack overflow determinism problem. But would it be worth it just for that?

Is the assumption here that execution metering baked into the recompiler is necessarily faster for RISC-V than for Wasm? If so, I’m not sure that’s the case. If I remember correctly, the overhead of wasmtime fuel metering was around 5% on our FRAME benchmarks. It was worse for contracts (running wasmi under wasmtime), but this was countered by @pepyakin’s async checking method. If we only want metering, we can probably make it work with Wasm, too.

2 Likes

Yes. I was just explaining the possibility of it, but I really wouldn’t want to do it unless absolutely necessary. It is within the realm of possibility that we could match wasmtime’s performance without resorting to that, I still have some other ideas to try to speed it up. Ideas which I do intend to test out. (I just can’t test them out yet without refactoring the VM into a more production-ready state, which is currently in progress.)

Well, the O(n) compilation and preventing native stack overflow aren’t the only things we’d get; there would be other (I guess more minor?) advantages too. The baseline instruction set wouldn’t expand/shift under us with the compiler emitting new instructions that we’d have to handle, we’d get better security sandboxing (more on that in a later post once I’m actually finished implementing that part), it’d be slightly easier to meter (mostly because we wouldn’t have to add @pepyakin’s async metering to wasmtime), etc.

But, again, I want to reiterate that I don’t like the idea of using Cranelift online, because then, as you’ve said, we’d lose O(n) compilation, we’d lose consistent performance, sandboxing gets harder (because we’d have to sandbox the recompilation part, while with O(n) we don’t have to), complexity increases exponentially, and the VM would be harder for a third party to reproduce based on the specs, etc.

(Another thing I’d like to achieve with this RISC-V based VM is to make it simple enough so that we can write it into the Polkadot spec, and make it feasible for anyone to reproduce it from scratch completely with 3rd party code and get comparable performance. Using Cranelift would make the “get comparable performance” part difficult.)

No. But it’d be easier to actually implement it.

3 Likes

Agreed. And we might even face resistance upstreaming the change to wasmtime, as we are not exactly their target audience. A lot of the properties that are really desirable for us follow from just the simplicity of RISC-V.

1 Like

So you’re saying that if you want 64-bit arithmetic, then what you really want is SIMD, but since we don’t have that, you can’t have 64-bit arithmetic either. I don’t think that follows; this is an example of perfection being the enemy of the good.

I am concerned about 32-bit. Wasm supports 64-bit arithmetic/memory access natively, and rv32e does not.

  • Balance is u128 and used heavily in smart contracts. Balance arithmetic will take many more instructions on rv32e than on wasm.
  • A major issue in recompiling Solidity is supporting bigint arithmetic on e.g. uint256. If all we have is 32-bit arithmetic, then the bigint cost will be huge compared to wasm.
  • Smart contracts wanting to do crypto will have performance penalties (as burdges pointed out).

This is a step backwards compared to wasm. rv32e was designed for tiny devices like USB peripheral microcontrollers, not smart contracts.

Note that clang/LLVM does support rv64e even though it is not standardized. A 64-bit address space is not a bad thing either.
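To make the first bullet concrete, here’s a sketch of what a single u128 addition decomposes into once only 32-bit registers are available: four 32-bit adds plus carry propagation, before even considering multiplication (where the blow-up is much worse):

```rust
// Sketch of 128-bit addition on a 32-bit machine, using four
// little-endian 32-bit limbs with explicit carry propagation.
// On a 64-bit target the same operation is just two adds.
fn add_u128_as_limbs(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    let mut carry = 0u32;
    for i in 0..4 {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry);
        out[i] = s2;
        carry = (c1 as u32) + (c2 as u32); // at most one carry survives
    }
    out
}

fn main() {
    // (2^32 - 1) + 1 = 2^32, i.e. a carry into the second limb
    let a = [u32::MAX, 0, 0, 0];
    let b = [1, 0, 0, 0];
    assert_eq!(add_u128_as_limbs(a, b), [0, 1, 0, 0]);
}
```

This is roughly what a compiler lowers `u128` arithmetic to on an rv32 target, which is why heavy Balance arithmetic is the main cost concern here.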

No, I’m saying that if you care about the absolute best performance for your crypto/numeric/whatever code you actually want SIMD. In general the performance improvement from using SIMD tends to be a lot bigger than just switching from 32-bit to 64-bit arithmetic.

(Nevertheless, we’re currently running our smart contracts in an interpreter running under another virtual machine; the equivalent code running natively is going to be significantly faster anyway, even if it only uses 32-bit arithmetic.)

I disagree. WASM wasn’t designed for smart contracts either. (:

The nice thing about RISC-V is that it’s designed to be flexible and extensible, which means that if necessary we can just tweak it to be whatever we want it to be (within some reasonable limits, of course).

Theoretically we could just add custom dedicated instructions to handle 256-bit integer arithmetic, if we wanted to. Then the resulting code would be smaller than the equivalent WASM code (which would have to emulate those using 64-bit arithmetic) and possibly faster (since we could use pedal-to-the-metal native code to do the arithmetic ops, instead of depending on the WASM VM’s optimizer to generate good enough code; of course whether we can match the performance of something like wasmtime for general-purpose code with only an O(n) recompiler is still an open question).


Anyway, please remember that everything outlined up to this point is subject to change depending on the feedback/issues we encounter/practicalities of what we want to achieve. I’m not completely ruling out supporting a 64-bit target (except maybe a full 64-bit address space; I don’t see why we’d want that, and it’d just complicate sandboxing), but initially we will be 32-bit only, and then we’ll see how it goes.

I think Sean made a good point. Yes, we can extend the ISA with whatever we need (and then maintain our own ISA spec and compiler infrastructure…). Maybe this even allows our contracts to outperform Wasm and EVM execution performance, with way lower VM or JIT overhead, all while being on par with or even beating EVM bytecode size, well justifying substantial efforts. I’m all in the RISC-V camp, but we should anticipate such details early on.

We’ll have to maintain a spec anyway, since this will have to be reimplementable from scratch by alternative Polkadot implementations. As far as maintaining a compiler goes, having to fork rustc and LLVM is something I want to avoid at all costs**. Fortunately using custom instructions doesn’t require us to do that, because we could just use inline assembly to emit the appropriate instructions and make a nice high-level wrapper in Rust (with operator overloading, etc.) that people could use.

(** - initially we’ll most likely have to supply our own build of rustc with the rv32e patches applied, but that should be only temporary until the support is upstreamed)
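A minimal sketch of what such a wrapper could look like (all names hypothetical; the software fallback below stands in for the custom instruction so the example stays portable):

```rust
use std::ops::Add;

// Sketch of the "inline assembly behind a nice wrapper" idea: a U256
// type with operator overloading whose arithmetic would, on the real
// RISC-V target, be a custom instruction emitted via inline asm
// (e.g. a hypothetical `.insn` encoding in the custom opcode space).
// Here a portable software fallback stands in for that instruction.
#[derive(Clone, Copy, PartialEq, Debug)]
struct U256([u64; 4]); // little-endian 64-bit limbs

impl Add for U256 {
    type Output = U256;
    fn add(self, rhs: U256) -> U256 {
        // On the target this body could collapse to a single custom
        // instruction; here it's a plain limb-wise add with carry.
        let mut out = [0u64; 4];
        let mut carry = 0u64;
        for i in 0..4 {
            let (s1, c1) = self.0[i].overflowing_add(rhs.0[i]);
            let (s2, c2) = s1.overflowing_add(carry);
            out[i] = s2;
            carry = (c1 as u64) + (c2 as u64);
        }
        U256(out)
    }
}

fn main() {
    let a = U256([u64::MAX, 0, 0, 0]);
    let b = U256([1, 0, 0, 0]);
    assert_eq!(a + b, U256([0, 1, 0, 0]));
}
```

Contract code would just write `a + b` and never see the assembly, which is the point: the platform-specific part stays contained inside the library.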

Not everything can be anticipated early on; I don’t want us to end up with a bunch of cruft we’re not going to need, which is why I think it makes sense to start with a bare minimum base and expand from there as necessary, while architecting things in such a way that it’ll be easy to change/pivot as we go if needed.

Again, going back to the topic of 64-bit arithmetic - I’m not completely rejecting the idea, but I’m not going to make a final decision here without hard data on how much of an effect it’ll have. The new rewritten VM (more on that in a future post once I get it functional enough to share) is written in such a way that it’ll be relatively easy to have it accept rv64e and support 64-bit arithmetic, so we’ll be able to see exactly how much effect it has on code size and performance and not have to make this decision blindly.

1 Like

I’m not worried. I’m not worried about doing 128 bit balance operations with 32 bit registers either.

rv32e is being selected here because it exists. We could discuss other variations once we see how well it works in practice.

We’ve seen this before, and in my opinion this is bad for a number of reasons. This can’t be the solution. The same smart contract now needs different source code depending on the target runtime. It’ll likely prevent certain code optimizations. It requires limitations of the surrounding infrastructure to be solved at the language level. I don’t like the idea of making the usage of assembly a common appearance in smart contracts, even when hiding it behind shiny wrappers. We no longer have a runtime that is easily targetable by anyone. We end up in a position comparable to the whole RBPF situation, while they at least provide a working LLVM fork. This will invalidate 5 out of the 11 points made here.
A0 explicitly lists 64-bit integer support that “maps one-to-one with CPU instructions” and LLVM support as reasons for using Wasm.

I’d be somewhat surprised if efficient arithmetic for integers >32 bit turns out to be cruft we don’t need. The benchmark results in the OP already showed that the best case for RISC-V is worse than the best case for Wasm (code size). And Wasm’s size is way worse than EVM’s, a complaint I’ve heard often so far. EVM is criticized for 256-bit, which does not map to real CPUs; RV32E would be the same, just in the opposite direction.

Sure, we’re just trying to highlight some concerns about having only 32 bits, which will have side effects.

I don’t quite understand what you mean here; can you please expand on this?

I don’t see how it’d be possible to target multiple smart contract runtimes (e.g. say a solidity VM, a WASM VM, and our new RISC-V based VM) with exactly the same source code. Every target platform will require slightly different code, but the point here is to put all of that code that needs to be platform-specific in libraries and/or tools (e.g. in something like ink!) so that the code of the smart contract itself can be exactly the same.
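As a sketch of that idea (names are hypothetical, not ink!’s actual API): the contract is written against a trait, and each target VM supplies its own backend, so the contract source itself stays identical across platforms:

```rust
// Hypothetical sketch: the contract only talks to a trait, and each
// target (WASM VM, RISC-V VM, ...) provides an implementation in a
// library, keeping the contract source itself target-agnostic.
trait Env {
    fn caller(&self) -> [u8; 32];
    fn transfer(&mut self, to: [u8; 32], amount: u128);
}

// The contract logic: identical for every backend.
fn refund<E: Env>(env: &mut E, amount: u128) {
    let who = env.caller();
    env.transfer(who, amount);
}

// A mock backend standing in for a real host-function binding.
struct MockEnv {
    caller: [u8; 32],
    transfers: Vec<([u8; 32], u128)>,
}

impl Env for MockEnv {
    fn caller(&self) -> [u8; 32] {
        self.caller
    }
    fn transfer(&mut self, to: [u8; 32], amount: u128) {
        self.transfers.push((to, amount));
    }
}

fn main() {
    let mut env = MockEnv { caller: [7; 32], transfers: Vec::new() };
    refund(&mut env, 100);
    assert_eq!(env.transfers, vec![([7; 32], 100)]);
}
```

Swapping the backend (or the host-function calling convention underneath it) then never touches the contract code, only the library.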

Why?

Did you read my update post? The best case RISC-V is better than the best case for WASM for code size, assuming we either use compression or a more compact bytecode serialization format.

Which exact points? Let me go through all of them anyway.

  • High performance: Wasm is high performance — it’s built to be as close to native machine code as possible while still being platform independent.

Still true with the new VM. (Although possibly not as high performance, due to the guaranteed O(n) recompilation.)

  • Small size: It facilitates small binaries to ship over the internet to devices with potentially slow internet connection. This is a great fit for the space-constrained blockchain world.

Still true with the new VM.

  • General VM & bytecode: It was developed so that code can be deployed in any browser with the same result. Contrary to the EVM it was not developed towards a very specific use case, this has the benefit of a lot of tooling being available and large companies putting a lot of resources into furthering Wasm development.

Still true with the new VM.

Even though this is meant for smart contracts I’m targeting this to be a general-purpose VM, and there’s very little smart contract specific about it (except checking all of the feature boxes that smart contracts need/want). And potentially having extra instructions for accelerating higher-bit arithmetic doesn’t really change this.

  • Efficient JIT execution: 64 and 32-bit integer operation support that maps one-to-one with CPU instructions.

Kinda. This requires a longer explanation.

First, a note: the part about “maps one-to-one with CPU instructions” is actually not true for WASM! Fundamentally, a lot of WASM instructions map to multiple native instructions, due to the fact that 1) WASM is more high level than a hardware-level ISA, and 2) when translating between different ISAs there will always be some mismatch. For example, on amd64 the bitshift instructions require that the bitshift amount is in a specific register (in cl), so if, say, you want to bitshift by a value you have in another register, you need to shuffle things around. There’s also the issue that WASM is a stack-based machine, so it needs an expensive register allocator to actually recompile the bytecode into native code.

So, arguably, this point is more true with the new VM than it is with WASM, but it’s still not 100% true, as there will still be some impedance mismatch between e.g. RISC-V and amd64 (which is unavoidable).

  • Minimalistic: Formal spec that fits on a single page.

Okay, this is weird, because I don’t think this is true for WASM, unless the page has infinite length? (: And the baseline WASM spec keeps growing as time goes on.

Nevertheless, I’m aiming for our spec to be shorter and more minimal than WASM’s.

  • Deterministic execution: Wasm is easily made deterministic by removing floating point operations, which is necessary for consensus algorithms.

Still true with the new VM.

  • Open Standards > Custom Solutions: Wasm is a standard for web browsers developed by W3C workgroup that includes Google, Mozilla, and others. There’s been many years of work put into Wasm, both by compiler and standardization teams.

Still mostly true with the new VM. Supporting some custom instructions to accelerate certain workloads doesn’t change this; it’s something that’s encouraged and very common among RISC-V hardware vendors. If you don’t want to use those custom instructions in your program, then you just don’t, and you use only the baseline RISC-V ISA, at the cost of worse performance.

  • Many languages available: Wasm expands the family of languages available to smart contract developers to include Rust, C/C++, C#, Typescript, Haxe, and Kotlin. This means you can write smart contracts in whichever language you’re familiar with, though we’re partial to Rust due to its lack of runtime overhead and inherent security properties.

Still true with the new VM, at least for C and C++. For other languages I’m not entirely sure, so this might be a fair point. Do we have many people writing smart contracts in C#, Typescript, Haxe or Kotlin?

  • Memory-safe, sandboxed, and platform-independent.

Still true with the new VM. And our sandboxing will actually be better than what WASM VMs offer. (More on that in a future update.)

  • LLVM support: Supported by the LLVM compiler infrastructure project, meaning that Wasm benefits from over a decade of LLVM’s compiler optimization.

Still true with the new VM.

  • Large companies involved: Continually developed by major companies such as Google, Apple, Microsoft, Mozilla, and Facebook.

Still true with the new VM.


So, with perhaps a single exception, all of those arguments should still be true.