Announcing PolkaVM - a new RISC-V based VM for smart contracts (and possibly more!)

There’s no need to bother emitting WASM or something.

I’ve tested making the browser run the runtime, and since there’s no way for smoldot and the runtime to share memory, and no way to do cooperative execution of multiple Wasm instances in the same thread, every host function call has to be implemented by sending a message to a web worker and copying the necessary data.
This completely nullifies any performance gained by the execution itself.

Maybe in the distant future this will be advantageous, but seeing the speed at which Wasm features are shipped there’s really no need to rush.

I’ve seen it mentioned a couple of times now that this work is related to CorePlay/JAM, or that we’re betting on it with regard to those. Could someone speak to that?

The benefit is more that code like dalek should actually be optimized for WASM by now. So yes, the native vs VM gap increases to 5-10x, just like it does in WASM, but it’s really the WASM vs PolkaVM gap that’s most interesting for you.

curve25519-dalek had an explicit u32 feature, so even if it cannot recognize your target correctly, you can probably still switch it to doing arithmetic in the way rustc+LLVM handles better.

I think there are two factors at play here:

  1. CorePlay will require us to suspend programs and resume them in another block. That requires persisting the entire state of the program, which is not supported in wasmtime and is pretty hard to implement for Wasm, at least in a platform-independent and deterministic way. One of the reasons is that the stack is defined as infinite in the Wasm spec. In PolkaVM there is no stack, just memory and a finite number of registers, which are mapped 1:1 to native registers. This makes RISC-V a much better architecture for suspending the state.

  2. The number of different programs is expected to go up dramatically with Polkadot V2. Relying on pre-checking and caching is a source of all kinds of headaches in that scenario. Just knowing that every program is guaranteed to compile in linear time gives us a much bigger design space and reduces overall complexity and attack surface. Once attacking our pre-checking no longer requires purchasing a whole slot, some people here will sleep really badly.
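To make the first point a bit more concrete, here is a minimal sketch of what "persisting the entire state" can look like when the machine is nothing but a program counter, a fixed register file and linear memory. Everything in it is an assumption made for illustration: `VmState`, `NUM_REGS`, `to_bytes` and `from_bytes` are hypothetical names, the register count is arbitrary, and none of it is PolkaVM's actual API or snapshot format.

```rust
// Hypothetical sketch; none of these names come from PolkaVM's real API.
const NUM_REGS: usize = 16; // arbitrary register count, chosen for illustration

#[derive(Debug, Clone, PartialEq)]
struct VmState {
    pc: u32,
    regs: [u32; NUM_REGS],
    memory: Vec<u8>,
}

/// Read one little-endian u32 at `*pos` and advance the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let chunk = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

impl VmState {
    /// Serialize into a flat little-endian layout that does not depend on the host.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(4 + 4 * NUM_REGS + 4 + self.memory.len());
        out.extend_from_slice(&self.pc.to_le_bytes());
        for r in &self.regs {
            out.extend_from_slice(&r.to_le_bytes());
        }
        out.extend_from_slice(&(self.memory.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.memory);
        out
    }

    /// Rebuild the state; returns None if the blob is truncated or has trailing bytes.
    fn from_bytes(bytes: &[u8]) -> Option<VmState> {
        let mut pos = 0;
        let pc = read_u32(bytes, &mut pos)?;
        let mut regs = [0u32; NUM_REGS];
        for r in regs.iter_mut() {
            *r = read_u32(bytes, &mut pos)?;
        }
        let len = read_u32(bytes, &mut pos)? as usize;
        let end = pos.checked_add(len)?;
        let memory = bytes.get(pos..end)?.to_vec();
        if end != bytes.len() {
            return None;
        }
        Some(VmState { pc, regs, memory })
    }
}

fn main() {
    let mut state = VmState { pc: 0x1000, regs: [0; NUM_REGS], memory: vec![0; 16] };
    state.regs[1] = 42;
    state.memory[3] = 7;

    // "Suspend" in one block, "resume" in another.
    let blob = state.to_bytes();
    let resumed = VmState::from_bytes(&blob).expect("valid snapshot");
    assert_eq!(state, resumed);
    println!("snapshot is {} bytes", blob.len());
}
```

The point of the sketch is only that the snapshot has a fixed, enumerable shape. There is no host-managed call stack or engine-internal state that would also have to be captured.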

So here’s one more bonus update!

Compile time benchmarks

Due to popular demand I’ve also added compile time benchmarks.

The initial results were… good, but we weren’t the best. And us not being the best has the magic property of motivating me to make it so that we are the best. So while I was at it I also grabbed a profiler and improved our compile times a little bit. Here are the final results: (lower is better!)

  • PolkaVM: 433us
  • Wasmi (0.31.0): 603us
  • Wasmi (master, stack): 632us
  • Wasmi (master, register): 862us
  • Wazero: 1511us
  • PVF Executor: 2621us
  • Wasmer: 3000us
  • Wasmtime: 70578us

Yes, we have the fastest compilation times now; even faster than wasmi!

I also slightly improved the execution performance again, because why not:

  • PolkaVM: 5770us
  • Wasmtime: 6054us

So we have 5% faster execution times than wasmtime while having over 160 times faster compile times!
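For anyone who wants to check the two headline figures, the following sketch recomputes them from the numbers listed above. The helper name `speedup` is made up for this example; the values are copied from the two lists in this post.

```rust
/// Compile times in microseconds, copied from the list above.
const COMPILE_US: [(&str, f64); 8] = [
    ("PolkaVM", 433.0),
    ("Wasmi (0.31.0)", 603.0),
    ("Wasmi (master, stack)", 632.0),
    ("Wasmi (master, register)", 862.0),
    ("Wazero", 1511.0),
    ("PVF Executor", 2621.0),
    ("Wasmer", 3000.0),
    ("Wasmtime", 70578.0),
];

/// How many times faster the `fast` duration is compared to the `slow` one.
fn speedup(slow: f64, fast: f64) -> f64 {
    slow / fast
}

fn main() {
    let polkavm = COMPILE_US[0].1;
    for (name, us) in COMPILE_US.iter().skip(1) {
        println!("{}: {:.1}x slower to compile than PolkaVM", name, speedup(*us, polkavm));
    }

    // Execution times in microseconds: Wasmtime 6054, PolkaVM 5770.
    let exec = speedup(6054.0, 5770.0);
    println!("execution: PolkaVM is {:.1}% faster than Wasmtime", (exec - 1.0) * 100.0);
}
```

The Wasmtime compile-time ratio comes out just under 163, which is where "over 160 times" comes from.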

The initial post mentions a 19x difference between almost identical functions with different values, so there is probably some value in comparing performance across different CPUs. Here are benchmarks done on a NUC with an AMD Ryzen 9 7940HS, 64GB DDR5 and 2x 2TB PCIe 4.0 NVMe. They seem to be quite in line with koute’s benchmarks.

EDIT: unreliable data, measured without the correct profile → see the corrected benchmarks in a later post

Specs and raw test results
OS: Arch Linux x86_64
Kernel: 6.6.0-AMD
CPU: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics (16) @ 5.423GHz
GPU: AMD ATI c7:00.0 Phoenix1
Memory: 48721MiB / 60007MiB
Disks:
- /dev/nvme0n1          Hanye ME70-2TA01                       2.05  TB /   2.05  TB    512   B +  0 B   HA02010B
- /dev/nvme1n1          Hanye ME70-2TA01                       2.05  TB /   2.05  TB    512   B +  0 B   HA02010B
$ cargo run criterion
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `/home/user/src/polkavm/target/debug/benchtool criterion`
Benchmarking runtime/pinky/polkavm_no_gas
Benchmarking runtime/pinky/polkavm_no_gas: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.2s or enable flat sampling.
Benchmarking runtime/pinky/polkavm_no_gas: Collecting 10 samples in estimated 8.1955 s (55 iterations)
Benchmarking runtime/pinky/polkavm_no_gas: Analyzing
runtime/pinky/polkavm_no_gas
                        time:   [3.4724 ms 3.4929 ms 3.5141 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/pinky/polkavm_async_gas
Benchmarking runtime/pinky/polkavm_async_gas: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.9s or enable flat sampling.
Benchmarking runtime/pinky/polkavm_async_gas: Collecting 10 samples in estimated 8.8731 s (55 iterations)
Benchmarking runtime/pinky/polkavm_async_gas: Analyzing
runtime/pinky/polkavm_async_gas
                        time:   [3.9502 ms 3.9690 ms 3.9825 ms]
Benchmarking runtime/pinky/polkavm_sync_gas
Benchmarking runtime/pinky/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking runtime/pinky/polkavm_sync_gas: Collecting 10 samples in estimated 5.5443 s (30 iterations)
Benchmarking runtime/pinky/polkavm_sync_gas: Analyzing
runtime/pinky/polkavm_sync_gas
                        time:   [4.5688 ms 4.6044 ms 4.6381 ms]
Benchmarking runtime/pinky/wasmi_stack
Benchmarking runtime/pinky/wasmi_stack: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 792.6s.
Benchmarking runtime/pinky/wasmi_stack: Collecting 10 samples in estimated 792.57 s (10 iterations)
Benchmarking runtime/pinky/wasmi_stack: Analyzing
runtime/pinky/wasmi_stack
                        time:   [2.1397 s 2.1690 s 2.1993 s]
Benchmarking runtime/pinky/wasmi_register
Benchmarking runtime/pinky/wasmi_register: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 752.9s.
Benchmarking runtime/pinky/wasmi_register: Collecting 10 samples in estimated 752.92 s (10 iterations)
Benchmarking runtime/pinky/wasmi_register: Analyzing
runtime/pinky/wasmi_register
                        time:   [1.9806 s 2.0289 s 2.0808 s]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild
Benchmarking runtime/pinky/wasmtime_cranelift_default
Benchmarking runtime/pinky/wasmtime_cranelift_default: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.1s.
Benchmarking runtime/pinky/wasmtime_cranelift_default: Collecting 10 samples in estimated 6.1440 s (10 iterations)
Benchmarking runtime/pinky/wasmtime_cranelift_default: Analyzing
runtime/pinky/wasmtime_cranelift_default
                        time:   [4.3618 ms 4.4205 ms 4.4919 ms]
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 10.8s.
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 10.769 s (10 iterations)
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel: Analyzing
runtime/pinky/wasmtime_cranelift_with_fuel
                        time:   [5.9868 ms 6.0747 ms 6.1635 ms]
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.8s.
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 6.8213 s (10 iterations)
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch: Analyzing
runtime/pinky/wasmtime_cranelift_with_epoch
                        time:   [4.9971 ms 5.0361 ms 5.0767 ms]
Benchmarking runtime/pinky/wasmer
Benchmarking runtime/pinky/wasmer: Warming up for 3.0000 s
Benchmarking runtime/pinky/wasmer: Collecting 10 samples in estimated 8.1066 s (20 iterations)
Benchmarking runtime/pinky/wasmer: Analyzing
runtime/pinky/wasmer    time:   [9.2409 ms 9.3848 ms 9.5353 ms]
Benchmarking runtime/pinky/wazero
Benchmarking runtime/pinky/wazero: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.9s.
Benchmarking runtime/pinky/wazero: Collecting 10 samples in estimated 6.8980 s (10 iterations)
Benchmarking runtime/pinky/wazero: Analyzing
runtime/pinky/wazero    time:   [16.626 ms 16.979 ms 17.356 ms]
Benchmarking runtime/pinky/pvfexecutor
Benchmarking runtime/pinky/pvfexecutor: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.1s.
Benchmarking runtime/pinky/pvfexecutor: Collecting 10 samples in estimated 7.0689 s (10 iterations)
Benchmarking runtime/pinky/pvfexecutor: Analyzing
runtime/pinky/pvfexecutor
                        time:   [17.213 ms 17.357 ms 17.531 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/pinky/wasm3
Benchmarking runtime/pinky/wasm3: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 125.9s.
Benchmarking runtime/pinky/wasm3: Collecting 10 samples in estimated 125.87 s (10 iterations)
Benchmarking runtime/pinky/wasm3: Analyzing
runtime/pinky/wasm3     time:   [324.79 ms 334.33 ms 345.52 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/pinky/native
Benchmarking runtime/pinky/native: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.7s or enable flat sampling.
Benchmarking runtime/pinky/native: Collecting 10 samples in estimated 6.6911 s (55 iterations)
Benchmarking runtime/pinky/native: Analyzing
runtime/pinky/native    time:   [3.1249 ms 3.2632 ms 3.3393 ms]

Benchmarking runtime/prime-sieve/polkavm_no_gas
Benchmarking runtime/prime-sieve/polkavm_no_gas: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/polkavm_no_gas: Collecting 10 samples in estimated 8.3846 s (110 iterations)
Benchmarking runtime/prime-sieve/polkavm_no_gas: Analyzing
runtime/prime-sieve/polkavm_no_gas
                        time:   [1.7048 ms 1.7479 ms 1.7901 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/prime-sieve/polkavm_async_gas
Benchmarking runtime/prime-sieve/polkavm_async_gas: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/polkavm_async_gas: Collecting 10 samples in estimated 8.6181 s (110 iterations)
Benchmarking runtime/prime-sieve/polkavm_async_gas: Analyzing
runtime/prime-sieve/polkavm_async_gas
                        time:   [1.7024 ms 1.7387 ms 1.7741 ms]
Benchmarking runtime/prime-sieve/polkavm_sync_gas
Benchmarking runtime/prime-sieve/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/polkavm_sync_gas: Collecting 10 samples in estimated 9.5357 s (110 iterations)
Benchmarking runtime/prime-sieve/polkavm_sync_gas: Analyzing
runtime/prime-sieve/polkavm_sync_gas
                        time:   [2.0300 ms 2.0557 ms 2.0782 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/prime-sieve/wasmi_stack
Benchmarking runtime/prime-sieve/wasmi_stack: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 563.3s.
Benchmarking runtime/prime-sieve/wasmi_stack: Collecting 10 samples in estimated 563.34 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmi_stack: Analyzing
runtime/prime-sieve/wasmi_stack
                        time:   [1.4281 s 1.5005 s 1.5865 s]
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
Benchmarking runtime/prime-sieve/wasmi_register
Benchmarking runtime/prime-sieve/wasmi_register: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 590.2s.
Benchmarking runtime/prime-sieve/wasmi_register: Collecting 10 samples in estimated 590.25 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmi_register: Analyzing
runtime/prime-sieve/wasmi_register
                        time:   [1.2612 s 1.3140 s 1.3821 s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 14.1s.
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default: Collecting 10 samples in estimated 14.065 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default: Analyzing
runtime/prime-sieve/wasmtime_cranelift_default
                        time:   [1.6574 ms 1.7477 ms 1.8427 ms]
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 18.9s.
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 18.941 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel: Analyzing
runtime/prime-sieve/wasmtime_cranelift_with_fuel
                        time:   [2.0841 ms 2.1284 ms 2.1800 ms]
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 16.2s.
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 16.204 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch: Analyzing
runtime/prime-sieve/wasmtime_cranelift_with_epoch
                        time:   [1.8622 ms 1.9513 ms 2.0411 ms]
Benchmarking runtime/prime-sieve/wasmer
Benchmarking runtime/prime-sieve/wasmer: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/wasmer: Collecting 10 samples in estimated 5.4978 s (30 iterations)
Benchmarking runtime/prime-sieve/wasmer: Analyzing
runtime/prime-sieve/wasmer
                        time:   [4.6220 ms 4.6705 ms 4.7260 ms]
Benchmarking runtime/prime-sieve/wazero
Benchmarking runtime/prime-sieve/wazero: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
Benchmarking runtime/prime-sieve/wazero: Collecting 10 samples in estimated 8.8448 s (55 iterations)
Benchmarking runtime/prime-sieve/wazero: Analyzing
runtime/prime-sieve/wazero
                        time:   [4.2059 ms 4.2347 ms 4.2708 ms]
Benchmarking runtime/prime-sieve/pvfexecutor
Benchmarking runtime/prime-sieve/pvfexecutor: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/pvfexecutor: Collecting 10 samples in estimated 5.3890 s (20 iterations)
Benchmarking runtime/prime-sieve/pvfexecutor: Analyzing
runtime/prime-sieve/pvfexecutor
                        time:   [6.0807 ms 6.1702 ms 6.2776 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking runtime/prime-sieve/wasm3
Benchmarking runtime/prime-sieve/wasm3: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 64.5s.
Benchmarking runtime/prime-sieve/wasm3: Collecting 10 samples in estimated 64.465 s (10 iterations)
Benchmarking runtime/prime-sieve/wasm3: Analyzing
runtime/prime-sieve/wasm3
                        time:   [168.93 ms 171.66 ms 174.36 ms]
Benchmarking runtime/prime-sieve/native
Benchmarking runtime/prime-sieve/native: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/native: Collecting 10 samples in estimated 7.3036 s (165 iterations)
Benchmarking runtime/prime-sieve/native: Analyzing
runtime/prime-sieve/native
                        time:   [1.1463 ms 1.1470 ms 1.1479 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benchmarking compilation/pinky/polkavm_no_gas
Benchmarking compilation/pinky/polkavm_no_gas: Warming up for 3.0000 s
Benchmarking compilation/pinky/polkavm_no_gas: Collecting 10 samples in estimated 5.1307 s (1100 iterations)
Benchmarking compilation/pinky/polkavm_no_gas: Analyzing
compilation/pinky/polkavm_no_gas
                        time:   [4.6757 ms 4.7766 ms 4.8837 ms]
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low mild
  2 (20.00%) high severe
Benchmarking compilation/pinky/polkavm_async_gas
Benchmarking compilation/pinky/polkavm_async_gas: Warming up for 3.0000 s
Benchmarking compilation/pinky/polkavm_async_gas: Collecting 10 samples in estimated 5.0304 s (990 iterations)
Benchmarking compilation/pinky/polkavm_async_gas: Analyzing
compilation/pinky/polkavm_async_gas
                        time:   [5.0612 ms 5.0811 ms 5.0998 ms]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild
Benchmarking compilation/pinky/polkavm_sync_gas
Benchmarking compilation/pinky/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking compilation/pinky/polkavm_sync_gas: Collecting 10 samples in estimated 5.2140 s (935 iterations)
Benchmarking compilation/pinky/polkavm_sync_gas: Analyzing
compilation/pinky/polkavm_sync_gas
                        time:   [5.5725 ms 5.6143 ms 5.6586 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking compilation/pinky/wasmi_stack
Benchmarking compilation/pinky/wasmi_stack: Warming up for 3.0000 s
Benchmarking compilation/pinky/wasmi_stack: Collecting 10 samples in estimated 5.6407 s (385 iterations)
Benchmarking compilation/pinky/wasmi_stack: Analyzing
compilation/pinky/wasmi_stack
                        time:   [14.454 ms 14.492 ms 14.530 ms]
Benchmarking compilation/pinky/wasmi_register
Benchmarking compilation/pinky/wasmi_register: Warming up for 3.0000 s
Benchmarking compilation/pinky/wasmi_register: Collecting 10 samples in estimated 5.7161 s (330 iterations)
Benchmarking compilation/pinky/wasmi_register: Analyzing
compilation/pinky/wasmi_register
                        time:   [17.204 ms 17.266 ms 17.312 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking compilation/pinky/wasmtime_cranelift_default
Benchmarking compilation/pinky/wasmtime_cranelift_default: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 10.3s.
Benchmarking compilation/pinky/wasmtime_cranelift_default: Collecting 10 samples in estimated 10.326 s (10 iterations)
Benchmarking compilation/pinky/wasmtime_cranelift_default: Analyzing
compilation/pinky/wasmtime_cranelift_default
                        time:   [1.0226 s 1.0258 s 1.0294 s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.0s.
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 13.018 s (10 iterations)
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel: Analyzing
compilation/pinky/wasmtime_cranelift_with_fuel
                        time:   [1.3147 s 1.3325 s 1.3549 s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 12.1s.
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 12.146 s (10 iterations)
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch: Analyzing
compilation/pinky/wasmtime_cranelift_with_epoch
                        time:   [1.2184 s 1.2362 s 1.2550 s]
Benchmarking compilation/pinky/wasmer
Benchmarking compilation/pinky/wasmer: Warming up for 3.0000 s
Benchmarking compilation/pinky/wasmer: Collecting 10 samples in estimated 5.7392 s (275 iterations)
Benchmarking compilation/pinky/wasmer: Analyzing
compilation/pinky/wasmer
                        time:   [20.024 ms 20.383 ms 20.731 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
Benchmarking compilation/pinky/wazero
Benchmarking compilation/pinky/wazero: Warming up for 3.0000 s
Benchmarking compilation/pinky/wazero: Collecting 10 samples in estimated 5.0228 s (3245 iterations)
Benchmarking compilation/pinky/wazero: Analyzing
compilation/pinky/wazero
                        time:   [1.3963 ms 1.4322 ms 1.4836 ms]
Benchmarking compilation/pinky/pvfexecutor
Benchmarking compilation/pinky/pvfexecutor: Warming up for 3.0000 s
Benchmarking compilation/pinky/pvfexecutor: Collecting 10 samples in estimated 5.2229 s (495 iterations)
Benchmarking compilation/pinky/pvfexecutor: Analyzing
compilation/pinky/pvfexecutor
                        time:   [10.507 ms 10.579 ms 10.642 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

Benchmarking compilation/prime-sieve/polkavm_no_gas
Benchmarking compilation/prime-sieve/polkavm_no_gas: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/polkavm_no_gas: Collecting 10 samples in estimated 5.6755 s (440 iterations)
Benchmarking compilation/prime-sieve/polkavm_no_gas: Analyzing
compilation/prime-sieve/polkavm_no_gas
                        time:   [12.815 ms 12.900 ms 12.959 ms]
Benchmarking compilation/prime-sieve/polkavm_async_gas
Benchmarking compilation/prime-sieve/polkavm_async_gas: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/polkavm_async_gas: Collecting 10 samples in estimated 5.3764 s (385 iterations)
Benchmarking compilation/prime-sieve/polkavm_async_gas: Analyzing
compilation/prime-sieve/polkavm_async_gas
                        time:   [13.927 ms 13.964 ms 14.006 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking compilation/prime-sieve/polkavm_sync_gas
Benchmarking compilation/prime-sieve/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/polkavm_sync_gas: Collecting 10 samples in estimated 5.0258 s (330 iterations)
Benchmarking compilation/prime-sieve/polkavm_sync_gas: Analyzing
compilation/prime-sieve/polkavm_sync_gas
                        time:   [15.127 ms 15.199 ms 15.272 ms]
Benchmarking compilation/prime-sieve/wasmi_stack
Benchmarking compilation/prime-sieve/wasmi_stack: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wasmi_stack: Collecting 10 samples in estimated 5.2597 s (165 iterations)
Benchmarking compilation/prime-sieve/wasmi_stack: Analyzing
compilation/prime-sieve/wasmi_stack
                        time:   [31.365 ms 31.498 ms 31.596 ms]
Benchmarking compilation/prime-sieve/wasmi_register
Benchmarking compilation/prime-sieve/wasmi_register: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wasmi_register: Collecting 10 samples in estimated 6.1325 s (165 iterations)
Benchmarking compilation/prime-sieve/wasmi_register: Analyzing
compilation/prime-sieve/wasmi_register
                        time:   [36.949 ms 37.048 ms 37.130 ms]
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 20.4s.
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default: Collecting 10 samples in estimated 20.426 s (10 iterations)
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default: Analyzing
compilation/prime-sieve/wasmtime_cranelift_default
                        time:   [2.1173 s 2.2556 s 2.4112 s]
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 26.6s.
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 26.634 s (10 iterations)
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel: Analyzing
compilation/prime-sieve/wasmtime_cranelift_with_fuel
                        time:   [2.5312 s 2.5530 s 2.5723 s]
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 22.3s.
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 22.295 s (10 iterations)
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch: Analyzing
compilation/prime-sieve/wasmtime_cranelift_with_epoch
                        time:   [2.1562 s 2.2061 s 2.2580 s]
Benchmarking compilation/prime-sieve/wasmer
Benchmarking compilation/prime-sieve/wasmer: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wasmer: Collecting 10 samples in estimated 7.3709 s (165 iterations)
Benchmarking compilation/prime-sieve/wasmer: Analyzing
compilation/prime-sieve/wasmer
                        time:   [44.534 ms 45.266 ms 46.283 ms]
Benchmarking compilation/prime-sieve/wazero
Benchmarking compilation/prime-sieve/wazero: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wazero: Collecting 10 samples in estimated 5.0900 s (1650 iterations)
Benchmarking compilation/prime-sieve/wazero: Analyzing
compilation/prime-sieve/wazero
                        time:   [2.8460 ms 2.9335 ms 3.0683 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking compilation/prime-sieve/pvfexecutor
Benchmarking compilation/prime-sieve/pvfexecutor: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/pvfexecutor: Collecting 10 samples in estimated 6.5460 s (220 iterations)
Benchmarking compilation/prime-sieve/pvfexecutor: Analyzing
compilation/prime-sieve/pvfexecutor
                        time:   [29.539 ms 29.675 ms 29.833 ms]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) low mild
benchtool benchmark
runtime/pinky/polkavm_no_gas: 3562us
compilation/pinky/polkavm_no_gas: 4429us
runtime/pinky/polkavm_async_gas: 4058us
compilation/pinky/polkavm_async_gas: 4381us
runtime/pinky/polkavm_sync_gas: 4622us
compilation/pinky/polkavm_sync_gas: 5175us
runtime/pinky/wasmi_stack: 2.193s
compilation/pinky/wasmi_stack: 12ms
runtime/pinky/wasmi_register: 1.890s
compilation/pinky/wasmi_register: 14ms
runtime/pinky/wasmtime_cranelift_default: 4087us
compilation/pinky/wasmtime_cranelift_default: 857ms
runtime/pinky/wasmtime_cranelift_with_fuel: 5955us
compilation/pinky/wasmtime_cranelift_with_fuel: 1.135s
runtime/pinky/wasmtime_cranelift_with_epoch: 4790us
compilation/pinky/wasmtime_cranelift_with_epoch: 996ms
runtime/pinky/wasmer: 8796us
compilation/pinky/wasmer: 14ms
runtime/pinky/wazero: 14ms
compilation/pinky/wazero: 1044us
runtime/pinky/pvfexecutor: 13ms
compilation/pinky/pvfexecutor: 8547us
runtime/pinky/wasm3: 298ms
runtime/pinky/native: 2770us
runtime/prime-sieve/polkavm_no_gas: 1539us
compilation/prime-sieve/polkavm_no_gas: 12ms
runtime/prime-sieve/polkavm_async_gas: 1596us
compilation/prime-sieve/polkavm_async_gas: 12ms
runtime/prime-sieve/polkavm_sync_gas: 1674us
compilation/prime-sieve/polkavm_sync_gas: 14ms
runtime/prime-sieve/wasmi_stack: 1.253s
compilation/prime-sieve/wasmi_stack: 27ms
runtime/prime-sieve/wasmi_register: 1.139s
compilation/prime-sieve/wasmi_register: 32ms
runtime/prime-sieve/wasmtime_cranelift_default: 1431us
compilation/prime-sieve/wasmtime_cranelift_default: 1.773s
runtime/prime-sieve/wasmtime_cranelift_with_fuel: 1605us
compilation/prime-sieve/wasmtime_cranelift_with_fuel: 2.289s
runtime/prime-sieve/wasmtime_cranelift_with_epoch: 1657us
compilation/prime-sieve/wasmtime_cranelift_with_epoch: 1.893s
runtime/prime-sieve/wasmer: 4446us
compilation/prime-sieve/wasmer: 37ms
runtime/prime-sieve/wazero: 3704us
compilation/prime-sieve/wazero: 2153us
runtime/prime-sieve/pvfexecutor: 5683us
compilation/prime-sieve/pvfexecutor: 25ms
runtime/prime-sieve/wasm3: 158ms
runtime/prime-sieve/native: 982us

This is absolutely wild.

Did I read that right? On newer hardware (@hitchhooker’s 7940HS) we’re almost as fast as native code now!? (3492us for PolkaVM vs 3263us for native) Okay, even I didn’t expect that!

Anyway, thanks for the extra benchmark results!

The PolkaVM results are indeed impressive.

Looking at the wasmi results, am I right that they say wasmi is roughly 682-1275 times slower than native execution? In my own experience I saw slowdowns of 60-80x (stack machine) and 30-40x (register machine). The 160x for Wasm3 also looks a bit off. The benchmarks posted by Koute earlier were much more in line with what I have seen so far when benchmarking wasmi. One explanation is that wasmi ran in debug mode, or at least without proper optimization settings. :sweat_smile:
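The slowdown factors quoted here can be reproduced from the `benchtool benchmark` lines in the earlier post. The sketch below normalises the mixed `us`/`ms`/`s` units to microseconds and divides by the native time; `parse_us` and `slowdown` are hypothetical helper names for this example, not part of benchtool.

```rust
/// Parse a benchtool-style duration such as "982us", "14ms" or "2.193s" into microseconds.
fn parse_us(text: &str) -> Option<f64> {
    // Check the two-letter suffixes first, because "us" and "ms" also end in 's'.
    let (number, scale) = if let Some(n) = text.strip_suffix("us") {
        (n, 1.0)
    } else if let Some(n) = text.strip_suffix("ms") {
        (n, 1_000.0)
    } else if let Some(n) = text.strip_suffix('s') {
        (n, 1_000_000.0)
    } else {
        return None;
    };
    number.parse::<f64>().ok().map(|v| v * scale)
}

/// How many times slower `vm` is than `native`.
fn slowdown(vm: &str, native: &str) -> Option<f64> {
    Some(parse_us(vm)? / parse_us(native)?)
}

fn main() {
    // (label, VM time, native time), copied from the `benchtool benchmark` output.
    let cases = [
        ("pinky / wasmi_register", "1.890s", "2770us"),
        ("pinky / wasmi_stack", "2.193s", "2770us"),
        ("prime-sieve / wasmi_register", "1.139s", "982us"),
        ("prime-sieve / wasmi_stack", "1.253s", "982us"),
        ("prime-sieve / wasm3", "158ms", "982us"),
    ];
    for &(label, vm, native) in cases.iter() {
        match slowdown(vm, native) {
            Some(x) => println!("{}: {:.0}x slower than native", label, x),
            None => println!("{}: could not parse", label),
        }
    }
}
```

The low end of the range is pinky with wasmi (register) and the high end is prime-sieve with wasmi (stack).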

Maybe this also explains the small gap between PolkaVM and native code, if the native code was not compiled with the best optimization settings either? For wasmi it is critical to compile with:

[profile.release]
codegen-units = 1
lto = "fat"
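One cheap way to catch this kind of mistake is to have the benchmark binary complain when it was built without optimizations. The sketch below is only an illustration and is not something benchtool is known to do; `profile_kind` is a made-up helper. It relies on `cfg!(debug_assertions)`, which is on under Cargo's default `dev` profile and off under the default `release` profile, so it is a proxy for the profile rather than a direct check (a custom profile can override it).

```rust
/// Describe the kind of build implied by the `debug_assertions` flag.
fn profile_kind(debug_assertions: bool) -> &'static str {
    if debug_assertions {
        "dev (unoptimized)"
    } else {
        "release-like (optimized)"
    }
}

fn main() {
    // True for a default `cargo run`, false for a default `cargo run --release`.
    let debug = cfg!(debug_assertions);
    println!("built with a {} profile", profile_kind(debug));
    if debug {
        eprintln!("warning: benchmark numbers from this build are not meaningful");
    }
}
```

The `Finished dev [unoptimized + debuginfo]` line in the raw log above is the same signal, just easier to miss.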

Pardon me. I was clearly high on E1M1 riffs, skipping the correct profile and running the benchmarks on a machine with 6 days of uptime and dev processes hindering native execution.

I rebooted and redid the criterion tests with the correct profile:

runtime/pinky/native: 2.4457 ms
runtime/pinky/polkavm_no_gas: 3.4234 ms
runtime/pinky/polkavm_async_gas: 4.0281 ms
runtime/pinky/polkavm_sync_gas: 4.4950 ms
runtime/pinky/wasmtime_cranelift_default: 3.7975 ms
runtime/pinky/wasmtime_cranelift_with_fuel: 5.2632 ms
runtime/pinky/wasmtime_cranelift_with_epoch: 4.2711 ms

hwspecs
OS: Arch Linux x86_64
Kernel: 6.6.0-AMD
CPU: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics (16) @ 5.423GHz
GPU: AMD ATI c7:00.0 Phoenix1
Memory: 48721MiB / 60007MiB
Disks(striped-zfs):
- /dev/nvme0n1          Hanye ME70-2TA01                       2.05  TB /   2.05  TB    512   B +  0 B   HA02010B
- /dev/nvme1n1          Hanye ME70-2TA01                       2.05  TB /   2.05  TB    512   B +  0 B   HA02010B

benchtool logs
plot gen gist

This looks way more like what I would expect. Still impressive from PolkaVM!

Thanks a lot for redoing the benchmarks @hitchhooker !

Btw.: To clarify the compile-time benchmarks of Wasm3: They are simply not yet implemented in the benchmarks. Wasm3 has 2 configs, lazy and eager compilation. Eager compilation is slower than wasmi (register) whereas lazy compilation is way faster than wasmi (stack) since it skips both compiling and even validating Wasm.

Key take-aways from the new benchmarks:

  • PolkaVM executes just 40-55% slower than native.
  • PolkaVM compiles 97-171x faster than Wasmtime.
  • PolkaVM performance is on par with Wasmtime, being slower in one and faster in another benchmark.
  • PolkaVM executes 16-18 times faster than wasmi (register).
  • PolkaVM compiles 30-40% faster than wasmi (stack) and 65-80% faster than wasmi (register).
  • wasmi (register) is ~10% faster than Wasm3 in one benchmark. (:partying_face:)
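As a partial cross-check, the overhead relative to native can be recomputed from the corrected pinky numbers posted in this thread. Only the pinky figures were quoted inline, so this sketch can only confirm one end of each range; the rest would need the linked benchtool logs. `overhead_percent` is a name made up for this example.

```rust
/// Percentage by which `measured_ms` exceeds `baseline_ms`.
fn overhead_percent(measured_ms: f64, baseline_ms: f64) -> f64 {
    (measured_ms / baseline_ms - 1.0) * 100.0
}

fn main() {
    // Corrected criterion results for pinky, in milliseconds, copied from the thread.
    let native = 2.4457;
    let rows = [
        ("polkavm_no_gas", 3.4234),
        ("polkavm_async_gas", 4.0281),
        ("polkavm_sync_gas", 4.4950),
        ("wasmtime_cranelift_default", 3.7975),
        ("wasmtime_cranelift_with_fuel", 5.2632),
        ("wasmtime_cranelift_with_epoch", 4.2711),
    ];
    for &(name, ms) in rows.iter() {
        println!("{}: {:.1}% slower than native", name, overhead_percent(ms, native));
    }
}
```

For pinky without gas metering this gives roughly 40%, which matches the lower bound in the first take-away.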

Again, impressive benchmarks! :slight_smile: These results are exciting. Even if it’s a non-goal to serve use cases outside Polkadot, I think those use cases will come anyway to benefit from PolkaVM’s advantages, just as WASM started being used outside of the browser because it was a great alternative to existing solutions.
One of the non-goals I wish I could help with is embedded. Seeing that PVM is quite light and memory friendly, I wouldn’t be surprised if it could fit in that world and compete with Wasm3 (which I use when playing with toys on weekends :child:). I’m fairly clueless about compilers and about the feasibility of the task, but I would love to try to get this onto Xtensa or Cortex-M (help appreciated).

Well, the first question you’d have to ask is - do you want a full recompiler, or just an interpreter?

The interpreter should already mostly work, possibly with some very minor tweaks required. It’s not going to be fast because the interpreter is currently not optimized at all, but it should be functional. For deeply embedded use you might need to make the polkavm crate no_std (which it currently isn’t), but I’m not opposed to making it so (and in fact I’m planning on making it no_std myself eventually).

So making the interpreter work for whatever use case you have should be the first step. Once you have that fully working then you could think about making a recompiler work, and that would be significantly more complex (but still probably not very hard, since the VM is explicitly designed to be simple). It would need at least a sandbox implementation (probably based on the current generic sandbox), and a codegen backend.

You could do this as a prototype, and fundamentally I wouldn’t be opposed to merging it in, but definitely not before PolkaVM 1.0. (Because things are currently still in flux and I don’t want to have to simultaneously update multiple recompiler backends while I’m still changing things in a major way.)


It has been mentioned before, but I want to say it too: at this stage, I think it is more important for the VM to be provable than anything else. If RISC-V is viable for a VM, much more attention should be given to the implementation of RiscZero.

No it’s not, because a ZK VM like risc0 is not a general purpose VM. Yes, it’s cool, and it gives unique capabilities, but the use cases are, in general, different and it is not a replacement for a normal, general purpose VM.

Let me demonstrate.

I compiled my usual benchmark for both risc0 and PolkaVM, just slightly modified to run in one-shot mode (so that I don’t have to call into the VM multiple times, which I’m not sure risc0 even supports):

  • PolkaVM: 52 milliseconds
  • risc0 (dev mode, so no proof is generated): 44 seconds
  • risc0 (proof generation, CPU-only, on my 32-core Threadripper using all cores): didn’t finish in 10 minutes (I got bored and killed it), peak memory usage was 8.7GB
  • risc0 (proof generation, with GPU acceleration on my $2000 RTX 4090): 6 hours 10 minutes, peak RAM usage was 8.0GB, peak VRAM usage was 19GB
  • risc0 (proof verification): 37 seconds

(The peak memory usage figures might be inaccurate as I was just eyeballing them in top/nvidia-smi; they could have been higher while I was not looking.)

So, yeah, generating a proof is only, let me see… 436338x slower than simply executing this program in PolkaVM, and in the time it takes for risc0 to verify that proof (which is supposed to be the blazingly fast part!) PolkaVM can execute this program… a mere 725 times, no big deal.
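
The ratios can be reproduced approximately from the rounded timings in the list above. This is only a sketch of the arithmetic: because it starts from the rounded values (52 ms, 6 h 10 min, 37 s) rather than the exact measurements, its results land near, not exactly on, the 436338x and 725 figures quoted above. The constant and function names are made up for illustration.

```rust
// Rounded timings quoted in the list above, in seconds.
const POLKAVM_S: f64 = 0.052;
const RISC0_DEV_MODE_S: f64 = 44.0;
const RISC0_GPU_PROOF_S: f64 = (6.0 * 60.0 + 10.0) * 60.0; // 6 h 10 min
const RISC0_VERIFY_S: f64 = 37.0;

/// How many times longer `slow` takes than `fast`.
fn ratio(slow: f64, fast: f64) -> f64 {
    slow / fast
}

fn main() {
    println!("dev mode vs PolkaVM:     {:.0}x", ratio(RISC0_DEV_MODE_S, POLKAVM_S));
    println!("GPU proving vs PolkaVM:  {:.0}x", ratio(RISC0_GPU_PROOF_S, POLKAVM_S));
    // Number of PolkaVM executions that fit into one proof verification.
    println!("PolkaVM runs per verify: {:.0}", ratio(RISC0_VERIFY_S, POLKAVM_S));
}
```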

Again, this isn’t meant to show that PolkaVM is better or anything like that, simply because those are entirely different things with entirely different use cases! Apples and oranges. One is not a replacement for the other.


Are you referring to the NES emulator benchmark?

Sorry if I missed it, but I can’t seem to find the code to be able to independently verify and look into this benchmark. Can you link me to this?


You can find the benchmark here.

Usually I benchmark it by calling initialize once (but without timing it), and then I call run multiple times and then time only those run calls. But in this case to make things simpler I essentially measured a single initialize + run.
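
The usual measurement pattern described above (untimed initialize, then time only the repeated run calls) can be sketched as follows. `Workload` and its placeholder arithmetic are hypothetical stand-ins, not the actual benchmark program or the PolkaVM API; only the timing structure is the point.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the guest program; not the real benchmark.
struct Workload {
    state: u64,
}

impl Workload {
    /// One-off setup, deliberately excluded from the measurement.
    fn initialize() -> Self {
        Workload { state: 1 }
    }

    /// One unit of work (placeholder arithmetic only).
    fn run(&mut self) {
        for i in 0..100_000u64 {
            self.state = self
                .state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(i);
        }
    }
}

/// Times `iterations` calls to `run`; `initialize` happens before the clock starts.
fn time_runs(iterations: u32) -> (Duration, u64) {
    let mut workload = Workload::initialize(); // untimed
    let start = Instant::now();
    for _ in 0..iterations {
        workload.run();
    }
    let elapsed = start.elapsed();
    // Return the final state so the compiler cannot discard the work.
    (elapsed, black_box(workload.state))
}

fn main() {
    let iterations = 10;
    let (elapsed, _) = time_runs(iterations);
    println!("{iterations} runs took {:?} ({:?} per run)", elapsed, elapsed / iterations);
}
```

The one-shot variant mentioned above would simply move the `Instant::now()` call before `initialize` and use a single iteration.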

Here’s a diff against risc0 v0.19.0 with the example modified to run the benchmark. (You still need to manually copy the ROM after applying the patch.)

Yeah, let’s not confuse goals here :slightly_smiling_face: Adding to Koute’s response: while risc0 is definitely fun and games, please keep in mind that it’s still experimental (a huge disclaimer banner saying so is the first thing in the risc0 GitHub repository README, and that hasn’t changed for a year or so). And ZK VMs still have a looong way to go regarding execution (proving) times of their guests.

However, I’d argue that PolkaVM guests are in principle provable in risc0, or in any other RISC-V based ZK VM for that matter, anyway. At the very least they are guaranteed to compile for that ZK VM too: PolkaVM using the most minimal instruction set that can realistically work plays massively in our favor. This means that anyone who considers the privacy gains offered by those ZK VMs worth the premium in execution time can theoretically already do that today. Note that, next to the contract execution itself, you’d also need to commit over the runtime / environment state, making it even more expensive.


I mean, this seems like a pretty egregious comparison, because you’re comparing just the execution of the program in your VM against execution plus splitting the program into segments for continuations (so that proof generation can be parallelized), the I/O of storing the segment data to temp storage, which includes dirty pages from execution (the default), and other related logic.

I haven’t yet looked into and reproduced the proving time and verification of this very specific benchmark, but I guess that the reason the figure you cited seems orders of magnitude off is possibly one of:

  • Your verification log is incorrect: println!("Verified in: {}", elapsed.as_secs_f64() * 1000.0); in your pastebin prints milliseconds, not seconds.
  • The proof is not generated recursively, so verification instead checks each of the 636 segment proofs of this 666,000,000+ cycle program individually.
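
On the first point, a minimal sketch of what the quoted expression computes: `Duration::as_secs_f64()` returns seconds, so multiplying by 1000.0 yields milliseconds, and printing that value without a unit is easy to misread. The 37 ms input and the helper name `as_millis_f64` are chosen purely for illustration; this does not show what the actual measurement was.

```rust
use std::time::Duration;

/// Converts a duration to milliseconds, the same way the quoted `println!` does.
/// Hypothetical helper name, used only for this illustration.
fn as_millis_f64(elapsed: Duration) -> f64 {
    elapsed.as_secs_f64() * 1000.0
}

fn main() {
    // Hypothetical measurement, not a real verification time.
    let elapsed = Duration::from_millis(37);

    // Same shape as the quoted log line: a bare number whose unit is milliseconds.
    println!("Verified in: {}", as_millis_f64(elapsed));

    // Less ambiguous alternatives: state the unit, or use Debug formatting,
    // which appends a unit automatically.
    println!("Verified in: {:.3} s", elapsed.as_secs_f64());
    println!("Verified in: {:?}", elapsed);
}
```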

Not completely sure about this, as I also can’t verify your exact workflow, but I just wanted to share some perspective!