eBPF contracts hackathon

eBPF hack report

@Alex and I discovered that we both have been entertaining the idea of using eBPF for contracts instead of WebAssembly (wasm).

Disclaimer: Speaking for myself here, maybe @Alex can give his perspective below.

Intro

After working with wasm in various environments, we naturally gained a deeper understanding of wasm, it’s strengths and it’s costs.

It enables the safe and performant execution of portable code with easily achievable determinism. However, it is also more complicated than it could be if focusing on our scenario. Below are some examples that come to my mind.

Runtime Structure. The runtime structure of a wasm module is rather flexible. A wasm module can declare different entities: functions, memories, tables and globals. Those can be exported from a wasm module, or they could be also imported into, meaning that those items must be provided at instantiation time.

We don’t really need that flexibility and could get away with much less. E.g. modeling global variables as separate entities might improve compiled code quality, but for linear time compilers this does not really matter.

Interaction with host. In wasm the main way to interact with the host is through function imports. The module would specify functions, their names and signatures, and the host would provide implementations of those during the module instantiation.

Memories. Memories can declare an initial size and optional a maximum size. During runtime a memory can grow up to its maximum size.

I am pretty sure contracts can get away with a limited amount of memory. pallet-contracts used to limit the memory size at fixed 1 MiB. Solana contracts are enjoying 32 KiB heaps.

Bytecode. There are several things, so I will highlight on only a few.

Wasm bytecode is a stack machine. The stack is unbounded. Moreover, a function can declare arguments and locals. That all means you actually need to perform some register allocation to get decent performance.

Wasm bytecode operates on primitive values of I32, I64, F32, and F64 types. Contracts don’t really need floating points and if they need it badly, they can use software implementations. Also, I am pretty sure there is no need for two integer types, and having only 64-bit numbers is pretty much OK.

One of the most obscure corners of Wasm is the polymorphic stack. This is a mode that makes the operand stack switch to special rules. It is enabled when the execution is trivally unreachable. But since it’s unreachable, it does not really matter and could as well be eliminated.

All those features bear some cost, either in the complexity of the API, the complexity of the compiler implementation, or performance.

One of the reasons why Polkadot still does not embed the API for Wasm VM is its complexity.

Also, it’s easy to abuse typical optimizing compilers and adversaries can craft contracts doing that. A canonical solution for this is to employ a linear-time (in code size) compiler. At the same time, we want to have a fully optimizing compiler for the Substrate Runtime / PVF execution. That means that in one way or another, Runtimes and contracts are destined to use different VMs.

Those thoughts made me consider taking a look at the other options. That’s how the idea for the hackathon started.

Learnings from Hackathon

eBPF’s instruction set is register-based. eBPF defines 10 general-purpose registers. eBPF does not split code into functions. It’s just one big blanket of code. No args and no locals. That all makes it a good target for translation into machine code in a single pass. Check out the source code of eBPF to x86-64 from the Linux kernel.

A big obstacle was the lack of documentation. A lot of time was spent reading the code, issues, and doing the good old trial and error.

Vanilla eBPF is not a really welcoming environment to write contracts. Since this tech originated from the kernel, it limited the space of expressable control flow. Specifically, it limited the usage of loops. People would use C with special macros to specify bounds for each loop. That immediately constraints the usefulness of the whole thing.

To avoid this, we decided to use the tech from Solana. Namely, they forked rust/llvm and rbpf. The fact that eBPF requires usage of a custom rustc/LLVM backend kind of diminishes it’s value. Also, managing their custom toolchain wasn’t the best experience ever, but hey, at least we did not have to repeat all the work ourselves.

After getting all setup, we got our first artifact: an ELF file. ELF is a bit archaic but mostly fine. But how the interaction with the host is set up is a bit weird. I still haven’t figured it out completely, but the same instruction is used for calling the code within the module and the host (syscall).

From a glance, it seemed that the call instruction has a 4-byte immediate value to specify the call target. Usually, it seems to be an offset in the text section. However, syscalls seem to be specified with special magic cookie values. I guess this is not standard behavior, but registering a syscall in Solana’s RBPF VM would take a hash (murmur) from the syscall name, and I assume a call with that hash would be interpreted as a syscall. I can’t call that an elegant solution, but fine, that works.

It does not end here. Even though the hash values can be computed upfront, Solana’s runtime depended on dynamic relocations. That is, all syscalls would be emitted as

LABEL_001: call -1 # my_syscall

paired with a relocation entry that specifies that the particular my_syscall is called at LABEL_001. When the ELF is loaded prior to the execution, the loader would go over all relocations, compute hash, in this instance of my_syscall, then write fixup all values from -1 to the actual hash value.

To be fair, Solana folks do realize the problem since there seem to be some feature flags, that compute those hashes offline. But for now, it appears to be off by default.

Another quirk that we noticed about Solana toolchain is that it seems the recommended way is to produce shared object ELFs, i.e. .so files. Yet, despite being shared objects they have a single entry point, which is specified in the ELF header. The way how it works is unclear since a binary having a no_mangle function is automatically assumed to be an entrypoint by the forked rustc/LLVM. Solana’s RBPF VM also seems to make some assumptions about this.

Looking deeper into RBPF made me unhappy. The overall quality of code questionable. An unsoundness hole is looking back at you right from the primary user facing API. The memory model is also very questionable. Per their overview, their address space is 64 bit, is not linear (i.e. has gaps) yet really tiny amount of that memory is used. This requires them to emit a pretty complex bounds check before each actual memory access. RBPF does this by emitting a call into a non trivial Rust function that does bounds check. Once more, per one memory access they would do several more accesses while calling Rust and doing bunch of work. To give you an example, in-kernel eBPF elides all checks based on verification. Wasm typically catches out-of-bound memory accesses after the fact, being almost zero-cost.

On the bright side, the Substrate Runtime Interface for the eBPF turned out to be way simpler than the wasm counterpart. However, don’t be deceived, since the API in this case also includes strict promise on stability. Changing, and that includes fixing, even small things can lead to a disaster.

When it comes to the eBPF format itself, one unpleasant surprise was that each instruction is 8 bytes long. This might be good for decoding simplicity, but not really great size-wise. Although those would probably compress well.

Conclusions

I don’t really have a good grasp of all the little details of eBPF. It’s my first time interacting with it closely.

The first impression was not really great, but this is mostly because of Solana’s toolchain and VM.

However, it is still left unclear if eBPF is great for contracts target. More research is needed here.

Limiting loops is not really great since that prevents compiling a big chunk of software out-of-box. Not sure how much Rust would still compile into eBPF fine, and that requires more research.

However, forking eBPF is also not cheap. It would require you to make changes to rustc/LLVM. I suspect that lack of loops is the foundation of why e.g., memory checks could be elided. Messing with that may be tricky.

But if you are forking eBPF to change some little details, there is a temptation to fix more things. Shrink the instructions. Remove some unneeded instructions. But at this point you might as well design a thing from scratch that satisfies the whole. With more changes, less and less tooling, testing corpus, etc, could be used. That, however, writing an entire backend and all accompanying tools and libraries is a ton of work, and one of the main reasons why we picked Wasm in the first place.

14 Likes

Why eBPF

My biggest incentivation for eBPF over Wasm is the ability to write efficient and simple single pass compilers. Interpretation is the main contributor by a large margin why contracts are much much slower than the runtime itself. eBPF is easier to compile because it is a register machine just like any target machine we ever intent su support (x86 + ARM at the moment). It is a RISC ISA with few registers so that it can be 1to1 mapped to x86-64 and ARM without any register allocation overhead.

There are good reasons why we choose Wasm for the runtime: It is an industrie standard with a good compiler (wasmtime) and tooling. However, smart contracts is a very niche application which cannot profit from this good compiler as it is not single pass. We have to roll our own tooling anways. If we do so anyways we could do it for an instruction set that is more fitting.

The other big incentive is simplicity: Wasm has a lot of high level constructs. Those are nice for analyzing the code and staying portable. However, it makes compilers and interpeters complicated. Single pass compilers outright slow. Also, validation of Wasm code is complicated and a huge attack surfaces. Case in point: A call instruction in Wasm is not constant time on almost all execution engines (because of the high level concept of local variables). Not accointing for that (by accident) during validation will open up the embedder to a Dos vector. Impossible in eBPF where local variables are not a concept and need to be managed by the eBPF instructions itself.

Using Vanilla eBPF

LLVM (and rustc) have a upstream BPF targets: bpfeb-unknown-none and bpfel-unknown-none. Please note that:

  1. rustup does not include a core or std library for them
  2. lld cannot link bpf code.

Both of that can be worked around. There is bpf-linker which can link bpf code into a cdylib. Its README contains instructions on how to build core (we are using that for wasm contracts too in order to re-build core with the panic-immediate-abort feature flag for smaller contract sizes).

Hence you can build eBPF programs with the upstream rust compiler (even on stable).

Vanilla eBPF is not suitable for smart contracts

However, there is a catch: The upstream eBPF target is specifically built to target the Linux kernel’s VM. This VM statically verifies all programs to terminate and don’t access any out of bound memory. This requires all programs to not visit the same code twice: The vanilla eBPF backend will fail to compile a code if it can’t make sure that this is the case. It does this by inlining everything unrolling all loops. Please note that this will prevent any programs to be compiled that do not bound loops statically. This makes writing programs for it very cumbersome and basically requires all dependencies of a contract to be purposely built.

For that reason Solana went ahead and forked the eBPF backend of LLVM and removed this constraint. Termination is then garantueed by good old gas metering. We probably want to go the same route. Theoretically, this does not seem too hard as it is mainly removing some checks. But we don’t know which other changes they did as there is no documentation. The obvious downside of this is that they have to maintain a custom Rust toolchain and deliver it to their users. I don’t think this would be a showstopper. We would download the toolchain with cargo contract.

There is the temptation to make further modifications when forking. We need to make sure that it is worth the hassle, though. It might be even conceivable to modify the upstream backend to add some options that allow us to bend BPF to our needs:

  1. Disable static termination checks
  2. Change memory mapping (might already be possible with bpf-linker) to linearly map everything to low memory
  3. Disable relocations (we probably can just dump the text section and establish some convention for a magic arg to call that is the syscall)

LLVM binutils work on eBPF files

In order to inspect the elf files that are the result of compiling an ebpf file we can use the upsatream LLVM binutils. Some interesting commands:

llvm-readelf --all contract.so
llvm-objdump --disassemble contract.so
llvm-strip contract.so

Porting pallet-contracts to bpf

This was pretty straightforward. The biggest difference is that there is no imported function definition in BPF. A call of any function just adheres to the bpf calling conventions: r1-r5 are arguments and r0 is for the return value. I was able to expose the complete SEAL API to a bpf program by just passing a pointer to the SCALE encoded list of arguments in r1. By modifying the define_env macro this could be done for all functions at the same time. In wasm arguments are passed as single arguments.

I think it would be viable to support both wasm and bpf at the same time for a transition. Instruction benchmarks for bpf are not really necessary as there are no complicated high level concepts: We can just ad-hoc benchmark and use the same weight for every instruction. Host function benchmarks can be shared.

Porting ink! to eBPF

I also tried to port ink! to eBPF. In theory this should be fairly easy since there are very little platform specific constructs in ink!. The biggest chunk of work would be to refactor the seal API but this is well abstracted and is able to support multiple APIs and the same time (bpf + wasm).

In practice ink! didn’t compile because of some error regarding proc macros. When including extern crate proc_macro it was not able to find this crate. But not always. Just sometimes when included from proc_macro2.

This also happens with the upstream eBPF target. My my working theory is that it is related to the missing std target for eBPF. The upstream eBPF doesn’t have std (core and alloc does compile). Solana’s target has but it seems to be incomplete.

Usually this doesn’t matter as proc macros run as run as compiler plugin (they only run with the host target). However, in the ink! case they somehow manage to break this.

It is not totally clear what is going on here and it needs more investigation. A minimal reproducer would be needed to see when exactly it breaks.

Conclusion

I am somewhat confident that pallet-contracts can support both wasm and eBPF and the same time without needing to change any logic. Most ugliness can be hidden inside the define_env macro to generate similar APIs from the same definition.

Whether only slight modifications to the upstream eBPF or a completely custom backend would be the best course of action is still an open question. I have some ideas to deal with some of the downsides of eBPF without resorting to our own backend: For example, the relocation of syscalls could be done by the build tool and during upload the deployer needs to specify the highest requested syscall API version (for forwards compatibility). The idea is that we need zero scans or validation of the bytecode before we start compilation.

9 Likes

We didn’t get ink! working because the issues mentioned above with proc macros. Apart from this:

No special config needed to compile. Just install the solana toolchain from their website.

You can’t use any contracts frontend for this because they need metadata and expect ink! ABI. So just manually submit the extrinsics or runtime API calls. For example with polkadot.js Apps.

This is how gas metering works today for out wasm contracts: https://github.com/paritytech/wasm-instrument/blob/b51701088e3d4f13b77047237a2480b488e6d099/src/gas_metering/mod.rs. We inject it as instrumentation into the contract when you deploy it to the chain.

Yes! I am talking about LLVM backends. They take LLVM IR and generate machine code.

C/Rust → eBPF uses LLVM as you pointed out and hence is unbounded. But we don’t care. This happens off-chain when you write the contract. We care about eBPF → MachineCode. Because this needs to happen every time when the contract is executed.

2 Likes

@Alex Thanks for elaborating, that helps a lot. I have a couple of follow-up questions:

C/Rust → eBPF uses LLVM as you pointed out and hence unbounded.

Unbounded := multi-pass?! I’m not familiar with all the compiler related terms (yet) as I just started looking into this a couple of days ago.

We care about eBPF → MachineCode. Because this needs to happen every time when the contract is executed.

I see, so basically the process should be:

  1. Compile Rust-based ink! smart contracts → eBPF ELF file
  2. Store the ELF file on-chain
  3. Execute the ELF file within the eBPF VM that will convert it to machine code

correct?

I don’t think it is necessarily the same. You could theoretically have multiple passes that are all O(n). Not sure if that makes sense, though. In reality multi pass compilers implement all sorts of optimization passes with all kinds of complexities. Some might not even terminate for all inputs. This is why I said unbounded.

correct

1 Like

@Alex so Alexio took a look at my PR and commented on eBPF’s platform support. I think eBPF is overall more lightweight than rustc/LLVM due to less tooling and its register based architecture, hence “easier to compile”, correct? Is there any difference w.r.t. platform support (x86 & ARM) in comparison to its LLVM counterpart?

Also, feel free to have a look at the PR for the RFP yourself:)

It is not optimal to review a markdown file because the lines are long there when an in-depth review is called for (that not just corrects mistakes but want to add stuff). Of course I could manually quote sentences when commenting on a paragraph but I think we can do better. I suppose I am in a good position to write this proposal with all this knowledge fresh in my brain. I also care about this but barely have the time necessary to push this forward.

I think it could be more productive if I crate a PR with your PR as a target. I will extend the PR and then you can review this. But I can also post all of this as review comments if you want. What do you think?

In the meantime here are my thoughts on the “easier to compile” discussion. We should make something like this part of the proposal as context:

Simplicity in Validation

The simplicity we hope to achieve by switching away from Wasm is not only about compilation. This is certainly a big motivation for using BPF. Howeber, the simplicity also makes the validation of a contract much easier. When we say validation we mean the steps that the on chain logic needs to perform on every new contract to make sure it has a valid structure. If we fail to do so or have a bug in that logic that will most likely lead to security issues and leave the chain attackable. This is why we are very interested in making this step as simple as possible. Wasm works against this goal by being more similar to a high level language than a CPU instruction set such as x86, mips, arm or BPF. Wasm does this by including high level constructs usually absent from CPU instruction sets. Their existence requires us to engage with them and validate them:

Some examples are:

  • Functions
  • Local and Global Variables
  • Tables
  • Blocks
  • Types / Signatures
  • Start function

Let us look at one of them in more detail to understand which trouble they cause:

Functions

WASM: Can have different signatures in Wasm that need to validated. Execution engines might perform differently based on the number of arguments. With the multivalue proposal this extends to the return values, too. Additionally, the (startup) performance of a program might also depend on the number of functions: Creating a program with a lot of empty functions could be a possible DoS vector.

BPF: Functions are not a concept in BPF. Programming languages might group their code into functions but the program is just one big chunk of code when we receive it. A function call instruction in BPF has the same complexity every time. The actual logic of passing data to and from the function is implemented by the compiler and emitted as instructions which we have a much easier time understanding. Yes, there is a function call instruction. But functions are nowhere defined or declared and hence don’t need to be validated. There are no signatures that could influence the runtime of a function call. The call instruction just takes a a memory address and continues execution there. It might does some register housekeeping but all of this is constant time. If the call instruction doesn’t make sense (cause the rust compiler is broken for example) it doesn’t endanger our chain as it just becomes part of the deterministic (but faulty) behavior of the contract.

As you can see there is a lot to consider when trying to argue about the runtime of wasm contracts just by looking at this one feature.

Simplicity in compilation

The simplicity in validation also contributes to the simplicity in compilation because a compiler usually demands a program to be valid before it can be compiled. So any compiler would include some form of validation. This is quite important because this might add an additional pass to the compilation while a BPF compiler can “just start” right away. Please note that the validation that a smart contracts module built on substrate (e.g pallet-contracts) needs to perform goes above what a compiler does and is generally more tricky: The smart contracts module needs to make sure that any non linear or not accounted for behavior that arises from high level constructs (see see function example above) can’t happen. A compiler, if not written specifically for our use case, usually only cares about conformance to the Wasm standard.

But what might be even more interesting is the actual transformation from the VM bytecode (WASM, BPF) to native (Intel, Arm). To understand why BPF has a advantage over Wasm we should have a look on the two types of compilers: Multi pass (sometimes called optimizing compiler) and single pass compiler (sometimes called linear time compiler).

Multi Pass Compiler

A multi pass compiler usually transforms the source (WASM) into some intermediate representation (e.g LLVM IR), runs some optimizations on that IR and then generates the target code (Intel, Arm) from that IR. In reality it is of course much more complicated but I think this gets the point across. This approach has two major benefits over single pass compilers:

  1. It decouples the source language from the target language. Optimization can (theoretically) be written independently of the source or target language. So you don’t start from scratch when adding a new source or target language.

  2. It can generate fast code even when the source machine and target machine are very different or when the source → IR step doesn’t produce very efficient code (e.g rustc). This is why we can get away with using wasm for our runtime: We use wasmtime (a multi pass compiler) that delivers performance in the same order of magnitude as running native code.

This intermediate representation and optimization comes with a price. The transformation into the IR which is usually SSA (static single-assigment) is not linear over the input size in the general case. The optimizations then performed on the SSA are neither. Even if it was linear time there would still be multiple passes slowing us down. We conclude that those compilers are pretty slow because of the tradeoffs they make.

The runtime itself is compiled using a multi-pass compiler (wasmtime). This poses two (largely solved) challenges:

  1. The compile time of the runtime is unbounded in the general case. A malicious (or unlucky) parachain can run into the issue that the runtime can’t be compiled by validators (because it takes too long).

  2. Even for a benign runtime the compilation is slow. If we need to compile every time we execute the runtime it might probably not worthwhile in a lot of use cases to invest so much blockspace to compile the code. We rather interpret or single pass compile in this case.

These issues are addressed like this (I am no export on this issue so please correct me):

  1. Validators agree on a timeout of how long of a compilation time they would accept. They interrupt the compilation if this time is reached. They then vote whether the code could be successfully compiled in that time window. This is where my knowledge gets fuzzy: Disputes about the result of the compilation can be resolved as long as there is a honest majority. Bad validators not agreeing with the majority can be punished. What this means is that the compilation is part of or at least leverages the Polkadot protocol in order to resolve the in-determinism with regard to the subjective measure of time. This enables us to use a compiler of which we don’t have a solid grasp of the runtime complexity. We can transform this hard problem of guaranteed bounded compilation for every input using game theory.

  2. I am not very strong on the details here, either. I am just theorizing. It is not reflective of the actual code in substrate. Maybe someone with a solid understanding can chime in and correct me: The client can make use of caching. The code that needs to be run (the runtimes) are not too many in numbers. Having them all compiled in memory and ready to go is a reasonable expectation and we can design our weights around this expectation. Deploying a new runtime is a (purposefully) lengthy process that gives clients time to compile a runtime. Additionally, a runtime probably executes much more instructions after it is compiled: A whole block is run with the same code. So even if the runtime would be compiled for every block there is opportunity to amortize. Compare this to contracts where a block consists of calls to a lot of different codes. Given, they have less code each but they also often only execute a tiny fraction (a getter function for example).

Single Pass Compiler

If we look at the two issues raised above we can see they they are much harder to solve for contracts:

  1. A smart contract module that is part of the runtime (pallet-contracts). All of its actions are subject to consensus. It can’t resolve the in-determinism that arises from timing compilation. All validators need to come to exactly the same conclusion.

  2. In general we can’t make the validator clients cache anything in runtime logic. The PvF (parachain validation function) is stateless. That said, it is not impossible to make it happen. It just adds complexity but we might consider something like this in the future: As a remedy for being bottle necked by PoV sizes relay chain can offer a (paid) service to parachains to cache storage values. A similar service could be to offer the caching of compiled contracts: A client would be required to compile all of those contracts on startup and keep them around.

Looking at this it might be easier to “just” have a fast linear time compiler. We can throw any code at it and can be sure that its compilation time depends linearly only in the input size. This solves issue 1. The compilation step ideally is only a simple 1to1 mapping from source to target. This makes the compilation time not more than a linear scan and puts its startup time in the same league as interpreters.

Sadly, evidence points in the direction that this is not feasible with Wasm while also generating somewhat fast code. When going from a stack machine to a register machine one of the fundamental tasks is register allocation (deciding in which register to put the stack value). This is essentially an optimization problem and the quality of the allocation affects the performance. Only allowing linear time results in a far from optimal allocation and hence slower code. This is just one example. There is surely more that needs to be done which just makes the compilation rather slow affair. The only mature Wasm singlepass implementation I know of is Wasmer. We experimented with it. The startup speed is naturally much faster than wasmtime but a far cry from an interpreter. This makes it not worthwhile for a lot of contracts that just need to run a small portion of code.

BPF on the other hand is specifically designed so that it can be 1to1 mapped to a broad set of CPU architectures (certainly all we care about). The implementations are also dead simple. Just have a look at this compiler implemented within the Linux kernel in 2,5k lines of C code: bpf_jit_comp.c - arch/x86/net/bpf_jit_comp.c - Linux source code (v6.0.12) - Bootlin

Hi @Alex

Sure, feel free to do that.

Ah I see, thanks for clarifying. This wasn’t apparent to me indeed.

Pls let me know if you need any support regarding extending the RPF.

@Alex quick FYI: Since there hasn’t been any recent activity on the BFP contracts RFP PR and a sufficient number of W3F Grants Committee members approved it, we decided to merge it. However, don’t hesitate to update that RPF. You can always open up a new PR.

@Alex @seraya_w3f

Hello folks,
I’ve come along this thread and the RFP for BPF-based ink! smart contracts.
I’m asking if this is still needed, as I noticed that work for supporting RISC-V began a couple of month ago according to this commit.

If not, would there be an alternative RFP for further supporting the RISC-V target?
Thanks!

BPF was our first alternative target we tried simply because a VM already existed. However, since then we made the switch to RISC-V. Our current opinion is that it is a much better target. Hence there is no need for BPF anymore. More information here: contracts: Support RISC-V bytecode · Issue #13640 · paritytech/substrate · GitHub

Implementation is already ongoing. It is still a long way to go, though.

3 Likes

Would there be some other help the project could need that might be modelled into a similar RFP?

Yes. I think the best project we could finance would be to upstream RV32E support to LLVM. Which essentially means upstreaming this patch: ⚙ D70401 [RISCV] CodeGen of RVE and ilp32e/lp64e ABIs

1 Like