eBPF contracts hackathon

eBPF hack report

@Alex and I discovered that we both have been entertaining the idea of using eBPF for contracts instead of WebAssembly (wasm).

Disclaimer: Speaking for myself here, maybe @Alex can give his perspective below.

Intro

After working with wasm in various environments, we naturally gained a deeper understanding of wasm, it’s strengths and it’s costs.

It enables the safe and performant execution of portable code with easily achievable determinism. However, it is also more complicated than it could be if focusing on our scenario. Below are some examples that come to my mind.

Runtime Structure. The runtime structure of a wasm module is rather flexible. A wasm module can declare different entities: functions, memories, tables and globals. Those can be exported from a wasm module, or they could be also imported into, meaning that those items must be provided at instantiation time.

We don’t really need that flexibility and could get away with much less. E.g. modeling global variables as separate entities might improve compiled code quality, but for linear time compilers this does not really matter.

Interaction with host. In wasm the main way to interact with the host is through function imports. The module would specify functions, their names and signatures, and the host would provide implementations of those during the module instantiation.

Memories. Memories can declare an initial size and optional a maximum size. During runtime a memory can grow up to its maximum size.

I am pretty sure contracts can get away with a limited amount of memory. pallet-contracts used to limit the memory size at fixed 1 MiB. Solana contracts are enjoying 32 KiB heaps.

Bytecode. There are several things, so I will highlight on only a few.

Wasm bytecode is a stack machine. The stack is unbounded. Moreover, a function can declare arguments and locals. That all means you actually need to perform some register allocation to get decent performance.

Wasm bytecode operates on primitive values of I32, I64, F32, and F64 types. Contracts don’t really need floating points and if they need it badly, they can use software implementations. Also, I am pretty sure there is no need for two integer types, and having only 64-bit numbers is pretty much OK.

One of the most obscure corners of Wasm is the polymorphic stack. This is a mode that makes the operand stack switch to special rules. It is enabled when the execution is trivally unreachable. But since it’s unreachable, it does not really matter and could as well be eliminated.

All those features bear some cost, either in the complexity of the API, the complexity of the compiler implementation, or performance.

One of the reasons why Polkadot still does not embed the API for Wasm VM is its complexity.

Also, it’s easy to abuse typical optimizing compilers and adversaries can craft contracts doing that. A canonical solution for this is to employ a linear-time (in code size) compiler. At the same time, we want to have a fully optimizing compiler for the Substrate Runtime / PVF execution. That means that in one way or another, Runtimes and contracts are destined to use different VMs.

Those thoughts made me consider taking a look at the other options. That’s how the idea for the hackathon started.

Learnings from Hackathon

eBPF’s instruction set is register-based. eBPF defines 10 general-purpose registers. eBPF does not split code into functions. It’s just one big blanket of code. No args and no locals. That all makes it a good target for translation into machine code in a single pass. Check out the source code of eBPF to x86-64 from the Linux kernel.

A big obstacle was the lack of documentation. A lot of time was spent reading the code, issues, and doing the good old trial and error.

Vanilla eBPF is not a really welcoming environment to write contracts. Since this tech originated from the kernel, it limited the space of expressable control flow. Specifically, it limited the usage of loops. People would use C with special macros to specify bounds for each loop. That immediately constraints the usefulness of the whole thing.

To avoid this, we decided to use the tech from Solana. Namely, they forked rust/llvm and rbpf. The fact that eBPF requires usage of a custom rustc/LLVM backend kind of diminishes it’s value. Also, managing their custom toolchain wasn’t the best experience ever, but hey, at least we did not have to repeat all the work ourselves.

After getting all setup, we got our first artifact: an ELF file. ELF is a bit archaic but mostly fine. But how the interaction with the host is set up is a bit weird. I still haven’t figured it out completely, but the same instruction is used for calling the code within the module and the host (syscall).

From a glance, it seemed that the call instruction has a 4-byte immediate value to specify the call target. Usually, it seems to be an offset in the text section. However, syscalls seem to be specified with special magic cookie values. I guess this is not standard behavior, but registering a syscall in Solana’s RBPF VM would take a hash (murmur) from the syscall name, and I assume a call with that hash would be interpreted as a syscall. I can’t call that an elegant solution, but fine, that works.

It does not end here. Even though the hash values can be computed upfront, Solana’s runtime depended on dynamic relocations. That is, all syscalls would be emitted as

LABEL_001: call -1 # my_syscall

paired with a relocation entry that specifies that the particular my_syscall is called at LABEL_001. When the ELF is loaded prior to the execution, the loader would go over all relocations, compute hash, in this instance of my_syscall, then write fixup all values from -1 to the actual hash value.

To be fair, Solana folks do realize the problem since there seem to be some feature flags, that compute those hashes offline. But for now, it appears to be off by default.

Another quirk that we noticed about Solana toolchain is that it seems the recommended way is to produce shared object ELFs, i.e. .so files. Yet, despite being shared objects they have a single entry point, which is specified in the ELF header. The way how it works is unclear since a binary having a no_mangle function is automatically assumed to be an entrypoint by the forked rustc/LLVM. Solana’s RBPF VM also seems to make some assumptions about this.

Looking deeper into RBPF made me unhappy. The overall quality of code questionable. An unsoundness hole is looking back at you right from the primary user facing API. The memory model is also very questionable. Per their overview, their address space is 64 bit, is not linear (i.e. has gaps) yet really tiny amount of that memory is used. This requires them to emit a pretty complex bounds check before each actual memory access. RBPF does this by emitting a call into a non trivial Rust function that does bounds check. Once more, per one memory access they would do several more accesses while calling Rust and doing bunch of work. To give you an example, in-kernel eBPF elides all checks based on verification. Wasm typically catches out-of-bound memory accesses after the fact, being almost zero-cost.

On the bright side, the Substrate Runtime Interface for the eBPF turned out to be way simpler than the wasm counterpart. However, don’t be deceived, since the API in this case also includes strict promise on stability. Changing, and that includes fixing, even small things can lead to a disaster.

When it comes to the eBPF format itself, one unpleasant surprise was that each instruction is 8 bytes long. This might be good for decoding simplicity, but not really great size-wise. Although those would probably compress well.

Conclusions

I don’t really have a good grasp of all the little details of eBPF. It’s my first time interacting with it closely.

The first impression was not really great, but this is mostly because of Solana’s toolchain and VM.

However, it is still left unclear if eBPF is great for contracts target. More research is needed here.

Limiting loops is not really great since that prevents compiling a big chunk of software out-of-box. Not sure how much Rust would still compile into eBPF fine, and that requires more research.

However, forking eBPF is also not cheap. It would require you to make changes to rustc/LLVM. I suspect that lack of loops is the foundation of why e.g., memory checks could be elided. Messing with that may be tricky.

But if you are forking eBPF to change some little details, there is a temptation to fix more things. Shrink the instructions. Remove some unneeded instructions. But at this point you might as well design a thing from scratch that satisfies the whole. With more changes, less and less tooling, testing corpus, etc, could be used. That, however, writing an entire backend and all accompanying tools and libraries is a ton of work, and one of the main reasons why we picked Wasm in the first place.

11 Likes

Why eBPF

My biggest incentivation for eBPF over Wasm is the ability to write efficient and simple single pass compilers. Interpretation is the main contributor by a large margin why contracts are much much slower than the runtime itself. eBPF is easier to compile because it is a register machine just like any target machine we ever intent su support (x86 + ARM at the moment). It is a RISC ISA with few registers so that it can be 1to1 mapped to x86-64 and ARM without any register allocation overhead.

There are good reasons why we choose Wasm for the runtime: It is an industrie standard with a good compiler (wasmtime) and tooling. However, smart contracts is a very niche application which cannot profit from this good compiler as it is not single pass. We have to roll our own tooling anways. If we do so anyways we could do it for an instruction set that is more fitting.

The other big incentive is simplicity: Wasm has a lot of high level constructs. Those are nice for analyzing the code and staying portable. However, it makes compilers and interpeters complicated. Single pass compilers outright slow. Also, validation of Wasm code is complicated and a huge attack surfaces. Case in point: A call instruction in Wasm is not constant time on almost all execution engines (because of the high level concept of local variables). Not accointing for that (by accident) during validation will open up the embedder to a Dos vector. Impossible in eBPF where local variables are not a concept and need to be managed by the eBPF instructions itself.

Using Vanilla eBPF

LLVM (and rustc) have a upstream BPF targets: bpfeb-unknown-none and bpfel-unknown-none. Please note that:

  1. rustup does not include a core or std library for them
  2. lld cannot link bpf code.

Both of that can be worked around. There is bpf-linker which can link bpf code into a cdylib. Its README contains instructions on how to build core (we are using that for wasm contracts too in order to re-build core with the panic-immediate-abort feature flag for smaller contract sizes).

Hence you can build eBPF programs with the upstream rust compiler (even on stable).

Vanilla eBPF is not suitable for smart contracts

However, there is a catch: The upstream eBPF target is specifically built to target the Linux kernel’s VM. This VM statically verifies all programs to terminate and don’t access any out of bound memory. This requires all programs to not visit the same code twice: The vanilla eBPF backend will fail to compile a code if it can’t make sure that this is the case. It does this by inlining everything unrolling all loops. Please note that this will prevent any programs to be compiled that do not bound loops statically. This makes writing programs for it very cumbersome and basically requires all dependencies of a contract to be purposely built.

For that reason Solana went ahead and forked the eBPF backend of LLVM and removed this constraint. Termination is then garantueed by good old gas metering. We probably want to go the same route. Theoretically, this does not seem too hard as it is mainly removing some checks. But we don’t know which other changes they did as there is no documentation. The obvious downside of this is that they have to maintain a custom Rust toolchain and deliver it to their users. I don’t think this would be a showstopper. We would download the toolchain with cargo contract.

There is the temptation to make further modifications when forking. We need to make sure that it is worth the hassle, though. It might be even conceivable to modify the upstream backend to add some options that allow us to bend BPF to our needs:

  1. Disable static termination checks
  2. Change memory mapping (might already be possible with bpf-linker) to linearly map everything to low memory
  3. Disable relocations (we probably can just dump the text section and establish some convention for a magic arg to call that is the syscall)

LLVM binutils work on eBPF files

In order to inspect the elf files that are the result of compiling an ebpf file we can use the upsatream LLVM binutils. Some interesting commands:

llvm-readelf --all contract.so
llvm-objdump --disassemble contract.so
llvm-strip contract.so

Porting pallet-contracts to bpf

This was pretty straightforward. The biggest difference is that there is no imported function definition in BPF. A call of any function just adheres to the bpf calling conventions: r1-r5 are arguments and r0 is for the return value. I was able to expose the complete SEAL API to a bpf program by just passing a pointer to the SCALE encoded list of arguments in r1. By modifying the define_env macro this could be done for all functions at the same time. In wasm arguments are passed as single arguments.

I think it would be viable to support both wasm and bpf at the same time for a transition. Instruction benchmarks for bpf are not really necessary as there are no complicated high level concepts: We can just ad-hoc benchmark and use the same weight for every instruction. Host function benchmarks can be shared.

Porting ink! to eBPF

I also tried to port ink! to eBPF. In theory this should be fairly easy since there are very little platform specific constructs in ink!. The biggest chunk of work would be to refactor the seal API but this is well abstracted and is able to support multiple APIs and the same time (bpf + wasm).

In practice ink! didn’t compile because of some error regarding proc macros. When including extern crate proc_macro it was not able to find this crate. But not always. Just sometimes when included from proc_macro2.

This also happens with the upstream eBPF target. My my working theory is that it is related to the missing std target for eBPF. The upstream eBPF doesn’t have std (core and alloc does compile). Solana’s target has but it seems to be incomplete.

Usually this doesn’t matter as proc macros run as run as compiler plugin (they only run with the host target). However, in the ink! case they somehow manage to break this.

It is not totally clear what is going on here and it needs more investigation. A minimal reproducer would be needed to see when exactly it breaks.

Conclusion

I am somewhat confident that pallet-contracts can support both wasm and eBPF and the same time without needing to change any logic. Most ugliness can be hidden inside the define_env macro to generate similar APIs from the same definition.

Whether only slight modifications to the upstream eBPF or a completely custom backend would be the best course of action is still an open question. I have some ideas to deal with some of the downsides of eBPF without resorting to our own backend: For example, the relocation of syscalls could be done by the build tool and during upload the deployer needs to specify the highest requested syscall API version (for forwards compatibility). The idea is that we need zero scans or validation of the bytecode before we start compilation.

7 Likes

Thanks for sharing your results.
I’m currently creating an RFP for BPF-based ink! smart contracts and am reaching out to ask whether you would be willing to share the code and configs you used to attempt to compile the contracts, even though it’s not working yet.

In addition to that, I have a couple of questions:

  • what exactly do you mean by gas metering @Alex? Where can I learn more about the concept of gas metering in ink! smartcontracts?
  • whenever you mention backend I assume you refer to the backend part of the compiler, which takes care of the conversion to machine code, is my understanding correct here?
  • is eBPF really single pass? Afaik eBPF uses LLVM. And the LLVM optimizer seems to be multi-pass (sorry, maybe not the best source).

We didn’t get ink! working because the issues mentioned above with proc macros. Apart from this:

No special config needed to compile. Just install the solana toolchain from their website.

You can’t use any contracts frontend for this because they need metadata and expect ink! ABI. So just manually submit the extrinsics or runtime API calls. For example with polkadot.js Apps.

This is how gas metering works today for out wasm contracts: wasm-instrument/mod.rs at b51701088e3d4f13b77047237a2480b488e6d099 · paritytech/wasm-instrument · GitHub. We inject it as instrumentation into the contract when you deploy it to the chain.

Yes! I am talking about LLVM backends. They take LLVM IR and generate machine code.

C/Rust → eBPF uses LLVM as you pointed out and hence is unbounded. But we don’t care. This happens off-chain when you write the contract. We care about eBPF → MachineCode. Because this needs to happen every time when the contract is executed.

1 Like

@Alex Thanks for elaborating, that helps a lot. I have a couple of follow-up questions:

C/Rust → eBPF uses LLVM as you pointed out and hence unbounded.

Unbounded := multi-pass?! I’m not familiar with all the compiler related terms (yet) as I just started looking into this a couple of days ago.

We care about eBPF → MachineCode. Because this needs to happen every time when the contract is executed.

I see, so basically the process should be:

  1. Compile Rust-based ink! smart contracts → eBPF ELF file
  2. Store the ELF file on-chain
  3. Execute the ELF file within the eBPF VM that will convert it to machine code

correct?

I don’t think it is necessarily the same. You could theoretically have multiple passes that are all O(n). Not sure if that makes sense, though. In reality multi pass compilers implement all sorts of optimization passes with all kinds of complexities. Some might not even terminate for all inputs. This is why I said unbounded.

correct

1 Like