Also, a side note, I just want to say: thank you for the feedback. It’s very useful to know what other people think about all of this and what concerns they have, especially from the smart contracts and the ecosystem side of things.
Well, as I understood your quote:
This means there is no usable off-the-shelf toolchain targeting our ISA, or am I missing something? And if this is true, I’m concerned about these points from the use.ink Wasm raison d’être:
- General VM & bytecode // instead, a very specific bytecode no one else is using
- Open Standards > Custom Solutions: // instead, custom solution no one else is using
- Many languages available // instead, only our own language with silly wrappers around assembly code blocks
- LLVM support // Obviously no
- Large companies involved // I’m happy to learn about them
This is why I wrote that. I mean, if we stick to some common RISC-V ISA already supported upstream, or if we upstream ours to LLVM (which contradicts the part about not maintaining a compiler toolchain), then none of these points are really invalidated. I guess either we find out RV32E (or whatever else is already upstream) is all we need, or we should anticipate putting in the effort needed to get this right; in my opinion, using inline assembly isn’t getting it right, for the reasons I explained. Or to expand on that: using inline assembly is something that can be done and can be fine, but if it’s for something as fundamental as integer arithmetic, I view it as problematic.
Well, needless to say, as soon as contract authors need to use assembly in the source code, that code will only work for that specific target. There will be RISC-V and Wasm on the contracts pallet, there’s also Phat Contracts, and there may be more in the future (maybe ZK VMs, off-chain environments, …). In Ethereum, where hand-written assembly or bytecode is very common, this already creates a lot of friction. Maybe this situation will become inevitable as we grow anyway, because of other reasons, but it’s still something to think about.
Thank you for pushing this awesome initiative forward and keeping the discussions public!
I want to make clear that picking a 32-bit ISA is going to be very bad for Solidity. Take this code, which you can compile with:

```
/usr/bin/clang -Xclang -fexperimental-max-bitint-width=512 -O3 -target riscv64 -c bigint.c
```

```c
typedef unsigned _BitInt(256) uint256_t;
typedef unsigned _BitInt(128) uint128_t;

uint128_t mul128(uint128_t a, uint128_t b)
{
    return a * b;
}

uint256_t mul256(uint256_t a, uint256_t b)
{
    return a * b;
}
```
Change the target and you can see the difference. On riscv64, `mul128` looks like so (`llvm-objdump -d bigint.o`):
```
0000000000000002 <mul128>:
 2: b3 05 b6 02  mul a1, a2, a1
 6: 33 37 a6 02  mulhu a4, a2, a0
 a: ba 95        add a1, a1, a4
 c: b3 86 a6 02  mul a3, a3, a0
10: b6 95        add a1, a1, a3
12: 33 05 a6 02  mul a0, a2, a0
16: 82 80        ret
```
On riscv32:
```
00000002 <mul128>:
 2: 01 11        addi sp, sp, -32
 4: 22 ce        sw s0, 28(sp)
 6: 26 cc        sw s1, 24(sp)
 8: 4a ca        sw s2, 20(sp)
 a: 4e c8        sw s3, 16(sp)
 c: 52 c6        sw s4, 12(sp)
 e: 03 a8 c5 00  lw a6, 12(a1)
12: 83 a8 85 00  lw a7, 8(a1)
16: dc 41        lw a5, 4(a1)
18: 8c 41        lw a1, 0(a1)
1a: 14 42        lw a3, 0(a2)
1c: 58 42        lw a4, 4(a2)
1e: 83 22 c6 00  lw t0, 12(a2)
22: 83 23 86 00  lw t2, 8(a2)
26: 33 b3 b6 02  mulhu t1, a3, a1
2a: 33 06 b7 02  mul a2, a4, a1
2e: 32 93        add t1, t1, a2
30: 33 3e c3 00  sltu t3, t1, a2
34: 33 36 b7 02  mulhu a2, a4, a1
38: 32 9e        add t3, t3, a2
3a: 33 86 f6 02  mul a2, a3, a5
3e: 32 93        add t1, t1, a2
40: b3 3e c3 00  sltu t4, t1, a2
44: 33 b6 f6 02  mulhu a2, a3, a5
48: 76 96        add a2, a2, t4
4a: b3 0e ce 00  add t4, t3, a2
4e: 33 0f f7 02  mul t5, a4, a5
52: 33 06 df 01  add a2, t5, t4
56: b3 8f 75 02  mul t6, a1, t2
5a: b3 89 d8 02  mul s3, a7, a3
5e: 33 8a f9 01  add s4, s3, t6
62: b3 0f 46 01  add t6, a2, s4
66: 33 b9 cf 00  sltu s2, t6, a2
6a: 33 36 e6 01  sltu a2, a2, t5
6e: b3 b4 ce 01  sltu s1, t4, t3
72: 33 34 f7 02  mulhu s0, a4, a5
76: a2 94        add s1, s1, s0
78: 26 96        add a2, a2, s1
7a: b3 84 55 02  mul s1, a1, t0
7e: 33 b4 75 02  mulhu s0, a1, t2
82: a2 94        add s1, s1, s0
84: b3 87 77 02  mul a5, a5, t2
88: a6 97        add a5, a5, s1
8a: 33 87 e8 02  mul a4, a7, a4
8e: b3 b4 d8 02  mulhu s1, a7, a3
92: 26 97        add a4, a4, s1
94: b3 04 d8 02  mul s1, a6, a3
98: 26 97        add a4, a4, s1
9a: 3e 97        add a4, a4, a5
9c: b3 37 3a 01  sltu a5, s4, s3
a0: 3e 97        add a4, a4, a5
a2: 3a 96        add a2, a2, a4
a4: 4a 96        add a2, a2, s2
a6: b3 85 b6 02  mul a1, a3, a1
aa: 0c c1        sw a1, 0(a0)
ac: 23 22 65 00  sw t1, 4(a0)
b0: 23 24 f5 01  sw t6, 8(a0)
b4: 50 c5        sw a2, 12(a0)
b6: 72 44        lw s0, 28(sp)
b8: e2 44        lw s1, 24(sp)
ba: 52 49        lw s2, 20(sp)
bc: c2 49        lw s3, 16(sp)
be: 32 4a        lw s4, 12(sp)
c0: 05 61        addi sp, sp, 32
c2: 82 80        ret
```
The `mul256` is 1024 bytes long on riscv32 and has loops! On riscv64 it’s a mere 190 bytes of straight-line code.
This will cause massive code bloat and gas cost for Solidity. This is a problem for Solang!
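To spell out what the riscv32 code above is doing: it’s schoolbook multiplication over 32-bit limbs. A rough, portable C sketch of the equivalent (the helper is my own illustration; `_BitInt` generates this kind of code for you):

```c
#include <stdint.h>

/* Illustrative only: 128x128 -> 128-bit multiply over 32-bit limbs,
   roughly what the riscv32 disassembly above spells out.
   a, b, out are little-endian arrays of four 32-bit limbs. */
static void mul128_limbs(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    uint64_t acc[4] = {0};
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4 - i; j++) {
            /* each partial product costs one mul + one mulhu on rv32 */
            uint64_t p = (uint64_t)a[i] * b[j];
            acc[i + j] += (uint32_t)p;       /* low half at weight 2^(32(i+j)) */
            if (i + j + 1 < 4)
                acc[i + j + 1] += p >> 32;   /* high half one limb up */
        }
    }
    /* propagate carries between limbs */
    uint64_t carry = 0;
    for (int k = 0; k < 4; k++) {
        uint64_t t = acc[k] + carry;
        out[k] = (uint32_t)t;
        carry = t >> 32;
    }
}
```

With 64-bit limbs there are only 4 partial products instead of 16, which is exactly why the riscv64 version stays short and branch-free.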
The idea is to use an off the shelf toolchain, yes! I definitely do not want to force the use of a custom toolchain. Although the artifacts generated by that toolchain will not be directly used on-chain.
Basically, the pipeline I’m envisioning is this:
[smart contract code] → [standard rustc/clang/gcc targeting RISC-V with a custom linker script] → [standard ELF file] → [our custom postprocessor] → [final blob] → [the VM]
Why do it this way? Well, for several reasons:
- Reduce the size of the artifacts. ELF files are not particularly compact.
- Reduce the downstream complexity of processing the smart contract blob. ELF files have a lot of features and flexibility that we just don’t need.
- Reduce the complexity of the VM itself. The more code we can move from the VM into the postprocessor the better. The VM will be easier to reimplement by 3rd parties, and easier to secure.
- Faster recompilation into native code by the VM. (Since the format of the blob will be designed to facilitate that.)
- Easier to spec. (I’ll most likely document it so that other general purpose compilers can also target it by being able to emit an appropriate ELF file that will be accepted by it, but technically whatever the postprocessor is doing doesn’t have to be in the spec, because the final VM is only going to consume its output.)
- Enable offline macro-op fusion. RISC-V being a fixed-width ISA splits up certain operations into multiple instructions (e.g. loading a big immediate requires at least two instructions). For hardware RISC-V implementations these pairs of instructions are automatically merged at runtime by the CPU as the code runs, but we can just do it offline and simplify things. (Again, this will be both more convenient to process for tools, and simpler to recompile by the VM. In theory this is, of course, reversible and we can get normal RISC-V code back if we want, so it’s essentially just an alternative way of writing the same thing, just simpler and easier to process.)
- Better security. E.g. eventually I want to support Intel’s CET to mitigate any potential ROP-based exploits, and that requires statically knowing every potential jump target, which is something we can extract from the ELF file as long as it has debug symbols included.
- Possibilities for higher performance. We absolutely want O(n) compilation in the VM, but since the postprocessor runs entirely offline this requirement doesn’t apply there. (Still an open question whether we can actually have any extra meaningful optimizations in the postprocessor that would help over what LLVM already provides. This is something I do plan to explore further in the future.)
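To make the macro-op fusion point concrete, here’s a small sketch (my own illustration, not the actual postprocessor) of how a `lui`+`addi` pair can be folded offline into a single load-immediate:

```c
#include <stdint.h>

/* On RISC-V, loading a 32-bit constant takes two instructions:
   `lui rd, hi20` sets rd = hi20 << 12, and `addi rd, rd, lo12` adds a
   sign-extended 12-bit immediate. A postprocessor that sees this pair
   can replace it with one "load immediate" pseudo-instruction. */

/* Split a constant into the hi20/lo12 fields a compiler would emit. */
static void split_imm(uint32_t imm, uint32_t *hi20, uint32_t *lo12)
{
    /* the +0x800 compensates for the sign extension of the addi immediate */
    *hi20 = ((imm + 0x800u) >> 12) & 0xFFFFFu;
    *lo12 = imm & 0xFFFu;
}

/* Offline fusion: recompute the full constant from the two fields. */
static uint32_t fuse_lui_addi(uint32_t hi20, uint32_t lo12)
{
    int32_t lo = (int32_t)((lo12 & 0xFFFu) ^ 0x800u) - 0x800; /* sext12 */
    return (hi20 << 12) + (uint32_t)lo;
}
```

A hardware core does this merge dynamically as the pair flows through the pipeline; doing it once, offline, means the VM’s recompiler only ever sees the already-fused form.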
In my opinion the benefits here far outweigh the costs, which is having the user run a single tool after compilation to postprocess their smart contract. And in practice this should be handled completely automatically by the relevant language-specific tooling (e.g. our `cargo-contract` will do this automatically and transparently), so it mostly only affects people developing language-specific tooling and not end-user smart contract devs.
If you want to be pedantic then in a way this could be treated as a custom toolchain, but the important part is that the part of the toolchain that’s the biggest (millions of lines of code) and actually doing most of the legwork (rustc/LLVM) is going to be completely off-the-shelf, and only a comparatively tiny part (a few thousand lines of code?) will be custom (the postprocessor).
We are on the same page here. I definitely do not want the end users (the smart contract authors) to have to use any inline assembly. However, standard libraries that are included with a given smart contract platform (e.g. inside of ink!) are fair game.
Thank you! This is very useful!
I have only a passing familiarity with the Solidity ecosystem, so please forgive me if I’m missing something here, but since people are not going to use a general-purpose compiler (like rustc or clang) to compile their Solidity programs and will instead use something like your Hyperledger Solang (like I’ve already proposed), wouldn’t it actually make sense to have native support for u256 arithmetic in the VM, which you could directly take advantage of? Especially since even on 64-bit RISC-V that multiplication function is still 190 bytes. That’d make your life easier as you wouldn’t have to manually emulate 256-bit arithmetic, it would be smaller and faster, and it shouldn’t be too much trouble for me to actually implement.
(I said I want to keep the VM as small and minimal as possible, but I definitely do not want to make life harder for any of the existing projects in the ecosystem, so I’m very much interested in discussing potential solutions to problems like this one. Again, thank you for bringing this to my attention.)
I don’t think your idea of a custom instruction is much of a win. First of all, you’re going to need to wire it up in the VM and in LLVM, and now you’ve got out-of-tree patches: you’ve forked LLVM. So the maintenance burden is huge.
Secondly, how are you going to implement this in the JIT? SIMD does not buy you anything here. Compile the code above with:

```
/usr/bin/clang -Xclang -fexperimental-max-bitint-width=512 -O3 -march=x86-64-v4 -c bigint.c
```

and clang does not generate any SSE/AVX instructions; it just uses general-purpose instructions. Sure, if you look at the OpenSSL bn source code there are some SSE instructions used for some cases, but I don’t think a simple multiply is one of them. So it will just be implemented in regular general-purpose registers.
If you JIT the riscv64 256-bit multiply to x86-64, you will get very similar performance to your custom RISC-V instruction.
If you want to do a 256-bit syscall, the overhead of the system call is already too much. If you want to use the RISC-V vector extension: a) it is very hard to JIT, and b) it does not help with 256-bit mul anyway.
And this is not the only case where 64-bit RISC-V is a win. Say you want to copy an account id (32 bytes) from one place to another: that’s 4 loads + stores on 64-bit, and 8 loads + stores on 32-bit.
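To sketch that in C (illustrative helpers, not real pallet code; for a fixed 32-byte copy the compiler unrolls each loop into exactly those word moves):

```c
#include <stdint.h>

/* Copying a 32-byte account id with word-sized moves on each target. */
static void copy_account_rv64(uint64_t dst[4], const uint64_t src[4])
{
    for (int i = 0; i < 4; i++)  /* unrolls to 4 ld + 4 sd on rv64 */
        dst[i] = src[i];
}

static void copy_account_rv32(uint32_t dst[8], const uint32_t src[8])
{
    for (int i = 0; i < 8; i++)  /* unrolls to 8 lw + 8 sw on rv32 */
        dst[i] = src[i];
}
```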
I don’t need to wire it up in LLVM though?
I see two cases here:
- A program that’s written using a normal general-purpose compiler. The author can use inline assembly to access the extra instructions, and ideally that inline assembly will be wrapped up by whatever platform/library they use, so that the end user won’t have to touch it directly nor even know about it.
- A program that’s compiled using a special-purpose compiler like your solang. In this case your compiler can just generate the relevant instructions directly. (You don’t use LLVM to generate code, right? Or do you?)
Indeed, wide multiplication is notoriously difficult to actually speed up with SIMD, and AFAIK even e.g. AVX-512 only exposes at most a 64-bit multiply (mostly because wide hardware multiplication circuits are very expensive). But I’ve seen some tricks for e.g. additions that can be used to make them faster with SIMD. (Although I’m not really an expert in this particular field.)
But it does have other benefits, even if the VM supports 64-bit arithmetic. The code will be smaller, and it will most likely be faster (a naive translation of even 64-bit RISC-V code into amd64 is not going to produce equivalent performance because the architectures are just too different, although exactly how big the performance difference is would have to be benchmarked).
Sorry, I’m a little confused; nobody said anything about making syscalls? (:
This is a good point. However, in general for bigger copies the compiler will just insert a `memcpy` (and I just checked what `rustc` does here, and it did use a `memcpy`). And I do have plans to add an accelerated memcpy instruction to the VM, so in practice this will be a single instruction. (For WASM we’ve found out that there is a significant performance improvement in certain cases when the bulk memory ops extension is enabled.)
Anyway, you’ve made your point; thank you for your feedback. I’ll take it into consideration. You’ve convinced me that having the base registers be 64-bit would be useful.
Also, you’ve previously said RV64E is not standardized; this is incorrect. It is actually standardized.
Solang is an LLVM frontend. (Currently there’s also ask! (AssemblyScript / Binaryen) targeting the contracts pallet; I’m not entirely sure what RISC-V would mean for them.)
I see! My apologies; I didn’t know that.
In this case you should be able to use a completely off-the-shelf LLVM (assuming the rv32e/rv64e patch is upstreamed) to target the new VM, just with a RISC-V target and some other minor changes (e.g. telling LLVM to use a specific linker script that we’ll supply, supporting hostcalls, etc.; all of this will be documented), which shouldn’t be too much work. And then you’ll hand off the ELF file generated by LLVM to our postprocessor (which will be available both as a binary and as a library crate that you can compile into your program), which will do the rest of the work, spewing out the final binary blob that can be put on chain.
Interesting. Since it seems to be essentially an AssemblyScript transform plugin (which is inherently WASM-based), it wouldn’t be able to easily target RISC-V. But AFAIK for now we don’t have plans to actually drop support for existing WASM-based contracts, at least not until we have a clear migration path. (One option could be to have a WASM-to-RISC-V recompiler which would take a WASM-based smart contract and convert it. Even existing on-chain contracts could in theory be migrated; even with suboptimal codegen they should still run faster than under the current wasmi-under-wasmtime. Not sure if we’ll do that as this is a conversation we’ve not yet had, but it is a possibility.)
I’d say let’s not slow down rv32e experimentation due to 64 bit concerns. If rv32e works nicely, then we could think about the minimal delta between rv32e and existing rv64 options, aka what rv64e makes sense, benefits most from other RISC-V work, etc.
Nothing’s going to slow down. (:
The initial 0.1 release of the VM (coming soon!) will only support 32-bit. I will add 64-bit support down the line, and then (at least initially) maintain support for both so that I can compare them, and we’ll go from there.
As long as I keep the address space 32-bit (which I will even for the 64-bit variant) the extra support for the 64-bit target shouldn’t be too much trouble.
@koute no worries! Just giving our view on this; appreciate it being considered. Excited for the initial results!
For the custom binary format, I have an idea floating around changing how message constructors and dispatch work (or is there another place to discuss this than in this already big post?). This would decrease contract sizes but require some fundamental changes to the current model, so now seems like a good time to think about it. Currently, contracts export a `deploy` and a `call` function in the Wasm module. Both of these functions require some dispatching logic because there could be multiple constructors or messages. How about doing the following instead:
- Treat any exported function as callable.
- Remove the need for contracts to bring their own dispatching code. Realistically, I don’t see why this is something contract authors or even contract languages need control over. If exported functions can store the function selector and some flags (whether the export is a constructor, payable, or a fallback function), the contracts pallet could take over dispatching to the correct message. Another such flag could be whether the contract wants to read input, in which case the contracts pallet could provide the input and its length at a predefined memory location.
This will make migrating existing contracts harder, although I could see legacy support for contracts with just `call` and `deploy` exports working.
@Alex call me out if I’m missing something here.
For now here is probably fine, although in the future I suppose we’ll want to move such conversations into e.g. a GitHub issue.
Hm, from my point of view that seems reasonable! Although we’ll still want to explicitly mark callable functions instead of letting any function be callable (since exporting a function can inhibit optimizations and in general has extra overhead, among other things), but besides that we could add whatever metadata we’d like to imports/exports. Initially I’m going to make it strictly WASM-like (so that the new VM is a drop-in replacement for the old one), but a richer calling ABI is certainly something we can have.
@Cyrill We very carefully designed `pallet-contracts` to not be aware of any ABI. Baking those assumptions and complexity into the VM just to save some bytes in the contract is not worth it in my opinion. There is way lower-hanging fruit to pick that doesn’t require destroying this separation of concerns. Is switching on a 4-byte integer really that much code? You could construct your selectors in a way that lets you use a jump table instead of a nested set of blocks. Every bit of assumption we bake into the VM needs to be implemented by every new language, making it harder to create one and hindering innovation.
Regarding “exported functions”: this is a Wasm-specific concept. Usually, a function is a compiler concept and not visible to any consumer of the code such as `pallet-contracts`, except when including symbol tables, which are for debugging purposes only. In RISC-V we will have a single entry point and pass the information whether we `call` or `deploy` in a register.
Please note that fundamentally the VM will support multiple entry points because I do want the ability to run full fat Substrate runtimes on it. (And supporting multiple entry-points doesn’t really increase the complexity of the VM very much, because it’s essentially just telling the VM to initially jump to a different address.) As to whether we’ll make use of that for contracts, well, I’ll leave the decision up to you. From my point of view either way is fine.
But we do need the metadata to know what’s callable in the contract anyway, right? In which case we could add an extra field in there specifying the address of the entry point for a given call, removing the need to dispatch on it from within the contract itself; and then whatever is triggering that call (instead of passing the address as the argument to the contract’s single entry point) would pass it to `pallet-contracts`, telling it “hey, start execution at address 0x12345678”. This wouldn’t necessarily make `pallet-contracts` aware of any particular contract-specific ABI; it could still just pass along whatever it receives as-is, treating it as a black box if it wishes to do so.
Message dispatch is more involved than just that:
- Call the `input` runtime API
- Do a bounds check; bail if less than 4 bytes of data, otherwise store the selector into an int
- Switch on the selector
- (non-payable functions) Call the `value_transferred` runtime API and bail if value was received
- Bail or run the fallback function on no match
This leaves the contract having to do 1 or 2 costly API calls and multiple branches before the actual function or contract code can be executed; tiny optimizations like jump tables won’t help much. Regarding code size: while for any non-trivial contract the dispatcher will be a fraction of the overall code size, having this code in every single contract still adds up. That’s why I thought it may be worth solving this in the contracts pallet instead. I can run some benchmarks to get some numbers.
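To make the cost visible, here is roughly what that dispatcher looks like in C. The host-call names (`env_input`, `env_value_transferred`) and the selectors are made up, and the host calls are stubbed so the control flow is self-contained:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the host environment (hypothetical names, stubbed). */
static uint8_t g_input[64];
static size_t g_input_len;
static uint64_t g_value;

static const uint8_t *env_input(size_t *len) { *len = g_input_len; return g_input; }
static uint64_t env_value_transferred(void) { return g_value; }

#define SEL_TRANSFER 0xA9059CBBu /* example 4-byte selectors */
#define SEL_APPROVE  0x095EA7B3u

enum dispatch_result { DISPATCHED, REVERTED, FELL_BACK };

/* The per-contract dispatcher described in the list above. */
static enum dispatch_result dispatch(void)
{
    size_t len;
    const uint8_t *in = env_input(&len);   /* costly host call #1 */
    if (len < 4)
        return FELL_BACK;                  /* no selector: fallback (or bail) */
    uint32_t sel = (uint32_t)in[0] << 24 | (uint32_t)in[1] << 16
                 | (uint32_t)in[2] << 8  | in[3];
    switch (sel) {                         /* one branch per message */
    case SEL_TRANSFER:                     /* non-payable message */
        if (env_value_transferred() != 0)  /* costly host call #2 */
            return REVERTED;
        return DISPATCHED;                 /* would now run the actual message */
    case SEL_APPROVE:                      /* payable: no value check needed */
        return DISPATCHED;
    default:
        return FELL_BACK;
    }
}
```

Every contract ships (and pays gas for) some variant of this before a single line of actual message code runs.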
I agree, that makes a lot of sense. However, doesn’t requiring Wasm modules to export `call`, `deploy` and a `memory` already create some kind of ABI? I mean, for our smart contracts runtime we already have a custom VM, a custom binary format and possibly a custom ISA. Requiring a minimal amount of metadata about the contract’s functions and messages in this binary format shouldn’t really hinder any language from supporting it?
Unless the whole stack is not going to be used exclusively for smart contracts, which seems a possibility; in that case I agree we must be very careful with this (or provide a way to opt out without introducing much complexity).
I think so; if we replicate what we have with Wasm, the binary format must somehow provide a way to let the runtime know where the `call` and `deploy` functions reside.
Yeah, one point I’d like to make here is that even though we’ll have all of those custom things, the general idea is that for the most part anyone who’d like to target our VM wouldn’t actually have to deal with most of them, because our middle-end will take care of massaging the bog-standard RISC-V ELF file that a compiler emits into whatever is required by the VM. This will both simplify things for the languages targeting the VM (because the amount of custom stuff they’ll have to do besides targeting RISC-V will be minimized), and also simplify any alternative VM implementations (because what they’ll have to accept/support will be significantly cut down; e.g. our ISA is actually even simpler than RISC-V, with even fewer instructions, and it’s easier to decode, e.g. it has a nice regular way of decoding immediate operands instead of the craziness that normal RISC-V has).
But of course there is going to be some custom stuff that will be unavoidable, e.g. the programs will have to conform to a particular memory map to be accepted. (Although in the future our middle-end could maybe support ingesting fully relocatable programs and automatically relocate them.)
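As a concrete example of that decoding craziness: a standard RISC-V B-type (branch) immediate is scattered across four separate bit fields, which every decoder has to reassemble like this (a sketch; the custom format would instead use one contiguous immediate field):

```c
#include <stdint.h>

/* Decode the branch offset of a standard RISC-V B-type instruction.
   The 13-bit immediate (imm[0] is always 0) is scattered as:
     insn[31]    -> imm[12]
     insn[30:25] -> imm[10:5]
     insn[11:8]  -> imm[4:1]
     insn[7]     -> imm[11]                                        */
static int32_t decode_btype_imm(uint32_t insn)
{
    uint32_t imm = ((insn >> 31) & 0x1u)  << 12
                 | ((insn >> 25) & 0x3Fu) << 5
                 | ((insn >> 8)  & 0xFu)  << 1
                 | ((insn >> 7)  & 0x1u)  << 11;
    /* sign-extend from bit 12 */
    return (int32_t)(imm ^ 0x1000u) - 0x1000;
}
```

(The layout is this way so that hardware can reuse wires between instruction formats; a software VM gains nothing from it, which is why a regular encoding is nicer to process.)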
I think what Alex was perhaps thinking about (please correct me if I’m wrong!) was to just have a single implicit, preset entry point into the contract at a static address (or just simply make the entry point the very first instruction), in which case this wouldn’t have to be provided anywhere.
That’s essentially how my initial prototype VM worked. But please note that this wouldn’t necessarily make things easier for other alternative languages targeting our VM! Because you’d still have to force the entry point to start at a particular address, and very few languages (if any) actually guarantee that their entry point physically starts at the beginning of the code section.
Yes, sure. I see no fundamental problem in this. However, since `rustc` does not support `cdylib` for RISC-V (only `bin`), it seems more natural to have a single entry point.
No. This is the point. `pallet-contracts` is unaware of what is callable within the contract. It only requires the two exported functions `call` and `deploy`. So adding this dispatcher would be a fundamental change to the ABI between the contract and `pallet-contracts`.
Yes. What I meant is the ABI defined between contracts by the metadata.json. The ABI between the contract and `pallet-contracts` is a different story and is exactly what you pointed out. I want to keep this as minimal as possible.
My concern is also about complexity. Putting complexity into privileged code (`pallet-contracts`) is much more costly than into userspace code (contracts). As you pointed out, it doesn’t even stop at message dispatching: you also want to know if something is payable, so then you need to introduce this concept, too. I am very concerned about the complexity creep here.
But allowing to just pass an address within the contract to use as the entry point (as @koute suggested) could be a good solution. It wouldn’t require any additional on-chain metadata. The mapping from function to address will still be in the off-chain metadata.json, and the tooling will convert it into addresses.
Not necessarily. We can have a single entry point and just pass the information about which function was called in a register or memory.

> I think what Alex was perhaps thinking about (please correct me if I’m wrong!) was to just have a single implicit, preset entry point into the contract at a static address (or just simply make the entry point the very first instruction), in which case this wouldn’t have to be provided anywhere.
>
> That’s essentially how my initial prototype VM worked. But please note that this wouldn’t necessarily make things easier for other alternative languages targeting our VM! Because you’d still have to force the entry point to start at a particular address, and very few languages (if any) actually guarantee that their entry point physically starts at the beginning of the code section.
Correct. And this is mainly because you can’t have a `cdylib` in rustc with RISC-V. Having multiple entry points would be a bit awkward as you still need to have the main function. My idea here is that our custom binary format will contain a single entry address that it parses from the emitted ELF file. No need to force it to a specific address.

> Not necessarily. We can have a single entry point and just pass the information about which function was called in a register or memory.
I like this idea. Thinking it a bit further: instead of passing the call origin in a register, allow the contract to define some memory segment where the VM will map a struct containing information about the call context? It would contain things like the called function (deploy / call), the value transferred, and the input. Virtually any contract doing something meaningful would need to access this information anyway. Defining this segment could be completely optional as well. But it would spare the dispatcher from having to do multiple API calls before the actual contract message code is executed.

> And this is mainly because you can’t have a `cdylib` in rustc with RISC-V. Having multiple entry points would be a bit awkward as you still need to have the main function.
Are you sure that’s the case? From what I can see `cdylib` crates work just fine for RISC-V targets, and I can easily get multiple functions exported from my test program and don’t need a `main`. Maybe that’s only not supported for some of the RISC-V targets (after all there are 6 of them in total) or only in some older versions of the compiler?

> My idea here is that our custom binary format will contain a single entry address that it parses from the emitted ELF file. No need to force it to a specific address.
A side note: format-wise, if we want to go the route of a single entry point, I’d rather just mandate that for smart contracts there’s only a single specific entry-point export, and reject programs where there are more of them. (Practically this is going to be equivalent, but it would allow us to reuse exactly the same format for smart contracts and for full fat runtimes, and just have the smart contracts one be a strict subset of the full one.)