There’s no need to bother emitting Wasm or anything like that.
I’ve tested making the browser run the runtime, and since there’s no way for smoldot and the runtime to share memory, nor any way to cooperatively execute multiple Wasm instances in the same thread, every host function call must be implemented by sending a message to a web worker and copying the necessary data across.
This completely nullifies any performance gained by the execution itself.
Maybe in the distant future this will be advantageous, but seeing the speed at which Wasm features are shipped there’s really no need to rush.
The caveat is more that code like dalek should actually be optimized for Wasm by now, so yes, the native-vs-VM gap increases to 5-10x, just like it does in Wasm, but it’s really the Wasm-vs-PolkaVM gap that’s most interesting for you.
curve25519-dalek had an explicit u32 backend feature, so even if it cannot recognize your target correctly, you can probably still switch it to doing arithmetic in a way that rustc+LLVM handles better.
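For illustration, in the curve25519-dalek 3.x series this was a Cargo feature; the feature names below are from that release series, and newer versions select the backend differently, so treat this as a sketch:

```toml
# Hypothetical Cargo.toml fragment: opt out of the default 64-bit backend
# and select the 32-bit serial arithmetic backend instead.
[dependencies]
curve25519-dalek = { version = "3", default-features = false, features = ["u32_backend"] }
```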
CorePlay will require us to suspend programs and resume them in another block. That requires persisting the entire state of the program, which is not supported in wasmtime and is pretty hard to implement for Wasm at all, at least in a platform-independent and deterministic way. One of the reasons is that the stack is defined as infinite in the Wasm spec. In PolkaVM there is no such stack: just memory and a finite number of registers, mapped 1:1 to native registers. This makes RISC-V a much better architecture for suspending state.
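To make the contrast concrete, here is a minimal sketch (not the actual PolkaVM API; all names and the register count are made up for illustration) of why a finite register file plus plain memory makes suspension straightforward: the entire execution state is just data that can be copied out and restored later.

```rust
// Hypothetical sketch: the complete state of a PolkaVM-style program is a
// finite set of registers plus its flat memory, so a snapshot is a plain copy.
#[derive(Clone, PartialEq, Debug)]
struct VmState {
    regs: [u32; 13],  // RISC-V-ish general-purpose registers (finite, known set)
    pc: u32,          // program counter
    memory: Vec<u8>,  // flat guest memory; no hidden native stack to capture
}

impl VmState {
    /// Suspend: persist the state (here just a clone; it could equally be
    /// serialized to disk and resumed in a later block).
    fn snapshot(&self) -> VmState {
        self.clone()
    }

    /// Resume: restore a previously persisted state.
    fn restore(snapshot: VmState) -> VmState {
        snapshot
    }
}

fn main() {
    let state = VmState { regs: [7; 13], pc: 0x100, memory: vec![0; 64] };
    let saved = state.snapshot();
    let resumed = VmState::restore(saved);
    assert_eq!(resumed, state);
    println!("state round-trips, pc = {:#x}", resumed.pc);
}
```

With an infinite, implementation-defined Wasm stack there is no such closed description of "the whole state" to copy.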
The number of different programs is expected to go up dramatically with Polkadot V2. Relying on pre-checking and caching is a source of all kinds of headaches in that scenario. Just knowing that every program is guaranteed to compile in linear time gives us a much bigger design space and reduces overall complexity and attack surface. Once attacking our pre-checking no longer requires purchasing a whole slot, some people here will sleep really badly.
Due to popular demand I’ve also added compile time benchmarks.
The initial results were… good, but we weren’t the best. And us not being the best has the magic property of motivating me to make it so that we are the best. So while I was at it I also grabbed a profiler and improved our compile times a little bit. Here are the final results: (lower is better!)
PolkaVM: 433us
Wasmi (0.31.0): 603us
Wasmi (master, stack): 632us
Wasmi (master, register): 862us
Wazero: 1511us
PVF Executor: 2621us
Wasmer: 3000us
Wasmtime: 70578us
Yes, we have the fastest compilation times now; even faster than wasmi!
I also slightly improved the execution performance again, because why not:
PolkaVM: 5770us
Wasmtime: 6054us
So we have 5% faster execution times than wasmtime while having over 160 times faster compile times!
The initial post mentions a 19x difference between almost identical functions with different values, so there is probably some value in comparing performance across different CPUs. Here are benchmarks done on a NUC with an AMD Ryzen 7940HS, 64GB DDR5, and 2x 2TB PCIe 4.0 NVMe. They seem to be quite in line with koute’s benchmarks.
EDIT: unreliable data, run without the correct profile → see the corrected benchmarks in a later post.
OS: Arch Linux x86_64
Kernel: 6.6.0-AMD
CPU: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics (16) @ 5.423GHz
GPU: AMD ATI c7:00.0 Phoenix1
Memory: 48721MiB / 60007MiB
Disks:
- /dev/nvme0n1 Hanye ME70-2TA01 2.05 TB / 2.05 TB 512 B + 0 B HA02010B
- /dev/nvme1n1 Hanye ME70-2TA01 2.05 TB / 2.05 TB 512 B + 0 B HA02010B
cargo run criterion
Finished dev [unoptimized + debuginfo] target(s) in 0.14s
Running `/home/user/src/polkavm/target/debug/benchtool criterion`
Benchmarking runtime/pinky/polkavm_no_gas
Benchmarking runtime/pinky/polkavm_no_gas: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.2s or enable flat sampling.
Benchmarking runtime/pinky/polkavm_no_gas: Collecting 10 samples in estimated 8.1955 s (55 iterations)
Benchmarking runtime/pinky/polkavm_no_gas: Analyzing
runtime/pinky/polkavm_no_gas
time: [3.4724 ms 3.4929 ms 3.5141 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/pinky/polkavm_async_gas
Benchmarking runtime/pinky/polkavm_async_gas: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.9s or enable flat sampling.
Benchmarking runtime/pinky/polkavm_async_gas: Collecting 10 samples in estimated 8.8731 s (55 iterations)
Benchmarking runtime/pinky/polkavm_async_gas: Analyzing
runtime/pinky/polkavm_async_gas
time: [3.9502 ms 3.9690 ms 3.9825 ms]
Benchmarking runtime/pinky/polkavm_sync_gas
Benchmarking runtime/pinky/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking runtime/pinky/polkavm_sync_gas: Collecting 10 samples in estimated 5.5443 s (30 iterations)
Benchmarking runtime/pinky/polkavm_sync_gas: Analyzing
runtime/pinky/polkavm_sync_gas
time: [4.5688 ms 4.6044 ms 4.6381 ms]
Benchmarking runtime/pinky/wasmi_stack
Benchmarking runtime/pinky/wasmi_stack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 792.6s.
Benchmarking runtime/pinky/wasmi_stack: Collecting 10 samples in estimated 792.57 s (10 iterations)
Benchmarking runtime/pinky/wasmi_stack: Analyzing
runtime/pinky/wasmi_stack
time: [2.1397 s 2.1690 s 2.1993 s]
Benchmarking runtime/pinky/wasmi_register
Benchmarking runtime/pinky/wasmi_register: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 752.9s.
Benchmarking runtime/pinky/wasmi_register: Collecting 10 samples in estimated 752.92 s (10 iterations)
Benchmarking runtime/pinky/wasmi_register: Analyzing
runtime/pinky/wasmi_register
time: [1.9806 s 2.0289 s 2.0808 s]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low mild
1 (10.00%) high mild
Benchmarking runtime/pinky/wasmtime_cranelift_default
Benchmarking runtime/pinky/wasmtime_cranelift_default: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.1s.
Benchmarking runtime/pinky/wasmtime_cranelift_default: Collecting 10 samples in estimated 6.1440 s (10 iterations)
Benchmarking runtime/pinky/wasmtime_cranelift_default: Analyzing
runtime/pinky/wasmtime_cranelift_default
time: [4.3618 ms 4.4205 ms 4.4919 ms]
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 10.8s.
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 10.769 s (10 iterations)
Benchmarking runtime/pinky/wasmtime_cranelift_with_fuel: Analyzing
runtime/pinky/wasmtime_cranelift_with_fuel
time: [5.9868 ms 6.0747 ms 6.1635 ms]
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.8s.
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 6.8213 s (10 iterations)
Benchmarking runtime/pinky/wasmtime_cranelift_with_epoch: Analyzing
runtime/pinky/wasmtime_cranelift_with_epoch
time: [4.9971 ms 5.0361 ms 5.0767 ms]
Benchmarking runtime/pinky/wasmer
Benchmarking runtime/pinky/wasmer: Warming up for 3.0000 s
Benchmarking runtime/pinky/wasmer: Collecting 10 samples in estimated 8.1066 s (20 iterations)
Benchmarking runtime/pinky/wasmer: Analyzing
runtime/pinky/wasmer time: [9.2409 ms 9.3848 ms 9.5353 ms]
Benchmarking runtime/pinky/wazero
Benchmarking runtime/pinky/wazero: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.9s.
Benchmarking runtime/pinky/wazero: Collecting 10 samples in estimated 6.8980 s (10 iterations)
Benchmarking runtime/pinky/wazero: Analyzing
runtime/pinky/wazero time: [16.626 ms 16.979 ms 17.356 ms]
Benchmarking runtime/pinky/pvfexecutor
Benchmarking runtime/pinky/pvfexecutor: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.1s.
Benchmarking runtime/pinky/pvfexecutor: Collecting 10 samples in estimated 7.0689 s (10 iterations)
Benchmarking runtime/pinky/pvfexecutor: Analyzing
runtime/pinky/pvfexecutor
time: [17.213 ms 17.357 ms 17.531 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/pinky/wasm3
Benchmarking runtime/pinky/wasm3: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 125.9s.
Benchmarking runtime/pinky/wasm3: Collecting 10 samples in estimated 125.87 s (10 iterations)
Benchmarking runtime/pinky/wasm3: Analyzing
runtime/pinky/wasm3 time: [324.79 ms 334.33 ms 345.52 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/pinky/native
Benchmarking runtime/pinky/native: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 6.7s or enable flat sampling.
Benchmarking runtime/pinky/native: Collecting 10 samples in estimated 6.6911 s (55 iterations)
Benchmarking runtime/pinky/native: Analyzing
runtime/pinky/native time: [3.1249 ms 3.2632 ms 3.3393 ms]
Benchmarking runtime/prime-sieve/polkavm_no_gas
Benchmarking runtime/prime-sieve/polkavm_no_gas: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/polkavm_no_gas: Collecting 10 samples in estimated 8.3846 s (110 iterations)
Benchmarking runtime/prime-sieve/polkavm_no_gas: Analyzing
runtime/prime-sieve/polkavm_no_gas
time: [1.7048 ms 1.7479 ms 1.7901 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/prime-sieve/polkavm_async_gas
Benchmarking runtime/prime-sieve/polkavm_async_gas: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/polkavm_async_gas: Collecting 10 samples in estimated 8.6181 s (110 iterations)
Benchmarking runtime/prime-sieve/polkavm_async_gas: Analyzing
runtime/prime-sieve/polkavm_async_gas
time: [1.7024 ms 1.7387 ms 1.7741 ms]
Benchmarking runtime/prime-sieve/polkavm_sync_gas
Benchmarking runtime/prime-sieve/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/polkavm_sync_gas: Collecting 10 samples in estimated 9.5357 s (110 iterations)
Benchmarking runtime/prime-sieve/polkavm_sync_gas: Analyzing
runtime/prime-sieve/polkavm_sync_gas
time: [2.0300 ms 2.0557 ms 2.0782 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/prime-sieve/wasmi_stack
Benchmarking runtime/prime-sieve/wasmi_stack: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 563.3s.
Benchmarking runtime/prime-sieve/wasmi_stack: Collecting 10 samples in estimated 563.34 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmi_stack: Analyzing
runtime/prime-sieve/wasmi_stack
time: [1.4281 s 1.5005 s 1.5865 s]
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high mild
Benchmarking runtime/prime-sieve/wasmi_register
Benchmarking runtime/prime-sieve/wasmi_register: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 590.2s.
Benchmarking runtime/prime-sieve/wasmi_register: Collecting 10 samples in estimated 590.25 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmi_register: Analyzing
runtime/prime-sieve/wasmi_register
time: [1.2612 s 1.3140 s 1.3821 s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 14.1s.
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default: Collecting 10 samples in estimated 14.065 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmtime_cranelift_default: Analyzing
runtime/prime-sieve/wasmtime_cranelift_default
time: [1.6574 ms 1.7477 ms 1.8427 ms]
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 18.9s.
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 18.941 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_fuel: Analyzing
runtime/prime-sieve/wasmtime_cranelift_with_fuel
time: [2.0841 ms 2.1284 ms 2.1800 ms]
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 16.2s.
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 16.204 s (10 iterations)
Benchmarking runtime/prime-sieve/wasmtime_cranelift_with_epoch: Analyzing
runtime/prime-sieve/wasmtime_cranelift_with_epoch
time: [1.8622 ms 1.9513 ms 2.0411 ms]
Benchmarking runtime/prime-sieve/wasmer
Benchmarking runtime/prime-sieve/wasmer: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/wasmer: Collecting 10 samples in estimated 5.4978 s (30 iterations)
Benchmarking runtime/prime-sieve/wasmer: Analyzing
runtime/prime-sieve/wasmer
time: [4.6220 ms 4.6705 ms 4.7260 ms]
Benchmarking runtime/prime-sieve/wazero
Benchmarking runtime/prime-sieve/wazero: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
Benchmarking runtime/prime-sieve/wazero: Collecting 10 samples in estimated 8.8448 s (55 iterations)
Benchmarking runtime/prime-sieve/wazero: Analyzing
runtime/prime-sieve/wazero
time: [4.2059 ms 4.2347 ms 4.2708 ms]
Benchmarking runtime/prime-sieve/pvfexecutor
Benchmarking runtime/prime-sieve/pvfexecutor: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/pvfexecutor: Collecting 10 samples in estimated 5.3890 s (20 iterations)
Benchmarking runtime/prime-sieve/pvfexecutor: Analyzing
runtime/prime-sieve/pvfexecutor
time: [6.0807 ms 6.1702 ms 6.2776 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking runtime/prime-sieve/wasm3
Benchmarking runtime/prime-sieve/wasm3: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 64.5s.
Benchmarking runtime/prime-sieve/wasm3: Collecting 10 samples in estimated 64.465 s (10 iterations)
Benchmarking runtime/prime-sieve/wasm3: Analyzing
runtime/prime-sieve/wasm3
time: [168.93 ms 171.66 ms 174.36 ms]
Benchmarking runtime/prime-sieve/native
Benchmarking runtime/prime-sieve/native: Warming up for 3.0000 s
Benchmarking runtime/prime-sieve/native: Collecting 10 samples in estimated 7.3036 s (165 iterations)
Benchmarking runtime/prime-sieve/native: Analyzing
runtime/prime-sieve/native
time: [1.1463 ms 1.1470 ms 1.1479 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/pinky/polkavm_no_gas
Benchmarking compilation/pinky/polkavm_no_gas: Warming up for 3.0000 s
Benchmarking compilation/pinky/polkavm_no_gas: Collecting 10 samples in estimated 5.1307 s (1100 iterations)
Benchmarking compilation/pinky/polkavm_no_gas: Analyzing
compilation/pinky/polkavm_no_gas
time: [4.6757 ms 4.7766 ms 4.8837 ms]
Found 3 outliers among 10 measurements (30.00%)
1 (10.00%) low mild
2 (20.00%) high severe
Benchmarking compilation/pinky/polkavm_async_gas
Benchmarking compilation/pinky/polkavm_async_gas: Warming up for 3.0000 s
Benchmarking compilation/pinky/polkavm_async_gas: Collecting 10 samples in estimated 5.0304 s (990 iterations)
Benchmarking compilation/pinky/polkavm_async_gas: Analyzing
compilation/pinky/polkavm_async_gas
time: [5.0612 ms 5.0811 ms 5.0998 ms]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low mild
1 (10.00%) high mild
Benchmarking compilation/pinky/polkavm_sync_gas
Benchmarking compilation/pinky/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking compilation/pinky/polkavm_sync_gas: Collecting 10 samples in estimated 5.2140 s (935 iterations)
Benchmarking compilation/pinky/polkavm_sync_gas: Analyzing
compilation/pinky/polkavm_sync_gas
time: [5.5725 ms 5.6143 ms 5.6586 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/pinky/wasmi_stack
Benchmarking compilation/pinky/wasmi_stack: Warming up for 3.0000 s
Benchmarking compilation/pinky/wasmi_stack: Collecting 10 samples in estimated 5.6407 s (385 iterations)
Benchmarking compilation/pinky/wasmi_stack: Analyzing
compilation/pinky/wasmi_stack
time: [14.454 ms 14.492 ms 14.530 ms]
Benchmarking compilation/pinky/wasmi_register
Benchmarking compilation/pinky/wasmi_register: Warming up for 3.0000 s
Benchmarking compilation/pinky/wasmi_register: Collecting 10 samples in estimated 5.7161 s (330 iterations)
Benchmarking compilation/pinky/wasmi_register: Analyzing
compilation/pinky/wasmi_register
time: [17.204 ms 17.266 ms 17.312 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/pinky/wasmtime_cranelift_default
Benchmarking compilation/pinky/wasmtime_cranelift_default: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 10.3s.
Benchmarking compilation/pinky/wasmtime_cranelift_default: Collecting 10 samples in estimated 10.326 s (10 iterations)
Benchmarking compilation/pinky/wasmtime_cranelift_default: Analyzing
compilation/pinky/wasmtime_cranelift_default
time: [1.0226 s 1.0258 s 1.0294 s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 13.0s.
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 13.018 s (10 iterations)
Benchmarking compilation/pinky/wasmtime_cranelift_with_fuel: Analyzing
compilation/pinky/wasmtime_cranelift_with_fuel
time: [1.3147 s 1.3325 s 1.3549 s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 12.1s.
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 12.146 s (10 iterations)
Benchmarking compilation/pinky/wasmtime_cranelift_with_epoch: Analyzing
compilation/pinky/wasmtime_cranelift_with_epoch
time: [1.2184 s 1.2362 s 1.2550 s]
Benchmarking compilation/pinky/wasmer
Benchmarking compilation/pinky/wasmer: Warming up for 3.0000 s
Benchmarking compilation/pinky/wasmer: Collecting 10 samples in estimated 5.7392 s (275 iterations)
Benchmarking compilation/pinky/wasmer: Analyzing
compilation/pinky/wasmer
time: [20.024 ms 20.383 ms 20.731 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Benchmarking compilation/pinky/wazero
Benchmarking compilation/pinky/wazero: Warming up for 3.0000 s
Benchmarking compilation/pinky/wazero: Collecting 10 samples in estimated 5.0228 s (3245 iterations)
Benchmarking compilation/pinky/wazero: Analyzing
compilation/pinky/wazero
time: [1.3963 ms 1.4322 ms 1.4836 ms]
Benchmarking compilation/pinky/pvfexecutor
Benchmarking compilation/pinky/pvfexecutor: Warming up for 3.0000 s
Benchmarking compilation/pinky/pvfexecutor: Collecting 10 samples in estimated 5.2229 s (495 iterations)
Benchmarking compilation/pinky/pvfexecutor: Analyzing
compilation/pinky/pvfexecutor
time: [10.507 ms 10.579 ms 10.642 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Benchmarking compilation/prime-sieve/polkavm_no_gas
Benchmarking compilation/prime-sieve/polkavm_no_gas: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/polkavm_no_gas: Collecting 10 samples in estimated 5.6755 s (440 iterations)
Benchmarking compilation/prime-sieve/polkavm_no_gas: Analyzing
compilation/prime-sieve/polkavm_no_gas
time: [12.815 ms 12.900 ms 12.959 ms]
Benchmarking compilation/prime-sieve/polkavm_async_gas
Benchmarking compilation/prime-sieve/polkavm_async_gas: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/polkavm_async_gas: Collecting 10 samples in estimated 5.3764 s (385 iterations)
Benchmarking compilation/prime-sieve/polkavm_async_gas: Analyzing
compilation/prime-sieve/polkavm_async_gas
time: [13.927 ms 13.964 ms 14.006 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/prime-sieve/polkavm_sync_gas
Benchmarking compilation/prime-sieve/polkavm_sync_gas: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/polkavm_sync_gas: Collecting 10 samples in estimated 5.0258 s (330 iterations)
Benchmarking compilation/prime-sieve/polkavm_sync_gas: Analyzing
compilation/prime-sieve/polkavm_sync_gas
time: [15.127 ms 15.199 ms 15.272 ms]
Benchmarking compilation/prime-sieve/wasmi_stack
Benchmarking compilation/prime-sieve/wasmi_stack: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wasmi_stack: Collecting 10 samples in estimated 5.2597 s (165 iterations)
Benchmarking compilation/prime-sieve/wasmi_stack: Analyzing
compilation/prime-sieve/wasmi_stack
time: [31.365 ms 31.498 ms 31.596 ms]
Benchmarking compilation/prime-sieve/wasmi_register
Benchmarking compilation/prime-sieve/wasmi_register: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wasmi_register: Collecting 10 samples in estimated 6.1325 s (165 iterations)
Benchmarking compilation/prime-sieve/wasmi_register: Analyzing
compilation/prime-sieve/wasmi_register
time: [36.949 ms 37.048 ms 37.130 ms]
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 20.4s.
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default: Collecting 10 samples in estimated 20.426 s (10 iterations)
Benchmarking compilation/prime-sieve/wasmtime_cranelift_default: Analyzing
compilation/prime-sieve/wasmtime_cranelift_default
time: [2.1173 s 2.2556 s 2.4112 s]
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 26.6s.
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel: Collecting 10 samples in estimated 26.634 s (10 iterations)
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_fuel: Analyzing
compilation/prime-sieve/wasmtime_cranelift_with_fuel
time: [2.5312 s 2.5530 s 2.5723 s]
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 22.3s.
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch: Collecting 10 samples in estimated 22.295 s (10 iterations)
Benchmarking compilation/prime-sieve/wasmtime_cranelift_with_epoch: Analyzing
compilation/prime-sieve/wasmtime_cranelift_with_epoch
time: [2.1562 s 2.2061 s 2.2580 s]
Benchmarking compilation/prime-sieve/wasmer
Benchmarking compilation/prime-sieve/wasmer: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wasmer: Collecting 10 samples in estimated 7.3709 s (165 iterations)
Benchmarking compilation/prime-sieve/wasmer: Analyzing
compilation/prime-sieve/wasmer
time: [44.534 ms 45.266 ms 46.283 ms]
Benchmarking compilation/prime-sieve/wazero
Benchmarking compilation/prime-sieve/wazero: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/wazero: Collecting 10 samples in estimated 5.0900 s (1650 iterations)
Benchmarking compilation/prime-sieve/wazero: Analyzing
compilation/prime-sieve/wazero
time: [2.8460 ms 2.9335 ms 3.0683 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking compilation/prime-sieve/pvfexecutor
Benchmarking compilation/prime-sieve/pvfexecutor: Warming up for 3.0000 s
Benchmarking compilation/prime-sieve/pvfexecutor: Collecting 10 samples in estimated 6.5460 s (220 iterations)
Benchmarking compilation/prime-sieve/pvfexecutor: Analyzing
compilation/prime-sieve/pvfexecutor
time: [29.539 ms 29.675 ms 29.833 ms]
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low severe
1 (10.00%) low mild
Did I read that right? On newer hardware (@hitchhooker’s 7940HS) we’re almost as fast as native code now!? (3492us for PolkaVM vs 3263us for native) Okay, even I didn’t expect that!
Looking at the wasmi results, am I right that they say wasmi is roughly 682-1275 times slower than native execution? In my own experience I saw slowdowns between 60-80x (stack machine) and 30-40x (register machine). Also, 160x for Wasm3 looks a bit off. The benchmarks posted by Koute earlier were much more aligned with what I have experienced so far when benchmarking wasmi. One explanation is that wasmi ran in debug mode, or at least without proper optimization settings.
Maybe this also explains the small gap between PolkaVM and native code, if the native code was not compiled with the best optimization settings either? For wasmi it is critical to be compiled with full optimizations.
Pardon me. I was clearly high on E1M1 riffs, skipping the correct profiles and running benchmarks on a box with 6 days of uptime and dev processes hindering native execution.
I rebooted and redid the criterion tests with the correct profile:
runtime/pinky/native: 2.4457ms
runtime/pinky/polkavm_no_gas: 3.4234 ms
runtime/pinky/polkavm_async_gas: 4.0281 ms
runtime/pinky/polkavm_sync_gas: 4.4950 ms
runtime/pinky/wasmtime_cranelift_default: 3.7975 ms
runtime/pinky/wasmtime_cranelift_with_fuel: 5.2632 ms
runtime/pinky/wasmtime_cranelift_with_epoch: 4.2711 ms
This looks way more like what I would expect. Still impressive from PolkaVM!
Thanks a lot for redoing the benchmarks @hitchhooker !
Btw, to clarify the compile-time benchmarks of Wasm3: they are simply not yet implemented in the benchmark suite. Wasm3 has two configs, lazy and eager compilation. Eager compilation is slower than wasmi (register), whereas lazy compilation is way faster than wasmi (stack), since it skips both compiling and even validating the Wasm.
Key take-aways from the new benchmarks:
- PolkaVM executes just 40-55% slower than native.
- PolkaVM compiles 97-171x faster than Wasmtime.
- PolkaVM execution performance is on par with Wasmtime, being slower in one benchmark and faster in the other.
- PolkaVM executes 16-18 times faster than wasmi (register).
- PolkaVM compiles 30-40% faster than wasmi (stack) and 65-80% faster than wasmi (register).
- wasmi (register) is ~10% faster than Wasm3 in one benchmark.
Again, impressive benchmarks! These results are exciting. Even if it’s a non-goal to serve use cases outside Polkadot, I think the use cases will just come to benefit from PolkaVM’s advantages, just as Wasm started being used outside the browser because it was a great alternative to existing solutions.
One of the non-goals I wish I could help with in some way is embedded. Seeing that PolkaVM is quite light and memory friendly, I wouldn’t be surprised if it could fit in that world and compete with Wasm3 (which I use when playing with toys on weekends). I’m fairly clueless about compilers and the feasibility of the task, but I would love to try to get this onto Xtensa or Cortex-M (help appreciated).
Well, the first question you’d have to ask is - do you want a full recompiler, or just an interpreter?
The interpreter should already mostly work, possibly with some very minor tweaks. It’s not going to be fast, because the interpreter is currently not optimized at all, but it should be functional. For a deeply embedded use case you might need to make the polkavm crate no_std (it currently isn’t), but I’m not opposed to making it so (and in fact I’m planning on making it no_std myself eventually).
So making the interpreter work for whatever use case you have should be the first step. Once you have that fully working then you could think about making a recompiler work, and that would be significantly more complex (but still probably not very hard, since the VM is explicitly designed to be simple). It would need at least a sandbox implementation (probably based on the current generic sandbox), and a codegen backend.
You could do this as a prototype, and fundamentally I wouldn’t be opposed to merging it in, but definitely not before PolkaVM 1.0. (Because things are currently still in flux and I don’t want to have to simultaneously update multiple recompiler backends while I’m still changing things in a major way.)
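For a sense of scale, the interpreter path really is just a dispatch loop over a small instruction set. A toy sketch (this is not PolkaVM’s real instruction set or encoding, just an illustration of the shape of the work):

```rust
// Toy interpreter sketch: a tiny register machine with a handful of
// instructions. PolkaVM's actual instruction set and encoding differ;
// this only illustrates the fetch-decode-execute structure.
#[derive(Clone, Copy)]
#[allow(dead_code)]
enum Inst {
    LoadImm { dst: usize, imm: i64 },
    Add { dst: usize, a: usize, b: usize },
    JumpIfZero { cond: usize, target: usize },
    Halt,
}

fn run(program: &[Inst]) -> [i64; 4] {
    let mut regs = [0i64; 4];
    let mut pc = 0;
    loop {
        match program[pc] {
            Inst::LoadImm { dst, imm } => { regs[dst] = imm; pc += 1; }
            Inst::Add { dst, a, b } => { regs[dst] = regs[a].wrapping_add(regs[b]); pc += 1; }
            Inst::JumpIfZero { cond, target } => {
                pc = if regs[cond] == 0 { target } else { pc + 1 };
            }
            Inst::Halt => return regs,
        }
    }
}

fn main() {
    // r0 = 2 + 3
    let program = [
        Inst::LoadImm { dst: 1, imm: 2 },
        Inst::LoadImm { dst: 2, imm: 3 },
        Inst::Add { dst: 0, a: 1, b: 2 },
        Inst::Halt,
    ];
    let regs = run(&program);
    assert_eq!(regs[0], 5);
    println!("r0 = {}", regs[0]);
}
```

A loop like this has no platform dependencies at all, which is why porting the interpreter to an embedded target is mostly a matter of making the crate no_std; the recompiler is where the per-architecture work lives.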
It has been mentioned before on this subject, but I want to say it too. At this stage, I think it is more important for the VM to be provable than anything else. If RISC-V is a viable VM architecture, much more attention should be given to the implementation of RISC Zero.
No, it’s not, because a ZK VM like risc0 is not a general-purpose VM. Yes, it’s cool, and it gives you unique capabilities, but the use cases are, in general, different, and it is not a replacement for a normal, general-purpose VM.
Let me demonstrate.
I compiled my usual benchmark for both risc0 and PolkaVM, just slightly modified to run in one-shot mode (so that I don’t have to call into the VM multiple times, which I’m not sure risc0 even supports):
PolkaVM: 52 milliseconds
risc0 (dev mode, so no proof is generated): 44 seconds
risc0 (proof generation, CPU-only, on my 32-core Threadripper using all cores): didn’t finish in 10 minutes (I got bored and killed it), peak memory usage was 8.7GB
risc0 (proof generation, with GPU acceleration on my $2000 RTX 4090): 6 hours 10 minutes, peak RAM usage was 8.0GB, peak VRAM usage was 19GB
risc0 (proof verification): 37 seconds
(The peak memory usage figures might be inaccurate as I was just eyeballing them in top/nvidia-smi; they could have been higher while I was not looking.)
So, yeah, generating a proof is only, let me see… 436338x slower than simply executing this program in PolkaVM, and in the time it takes for risc0 to verify that proof (which is supposed to be the blazingly fast part!) PolkaVM can execute this program… a mere 725 times, no big deal.
Again, this isn’t meant to show that PolkaVM is better or anything like that, simply because those are entirely different things with entirely different use cases! Apples and oranges. One is not a replacement for the other.
Usually I benchmark it by calling initialize once (but without timing it), and then I call run multiple times and then time only those run calls. But in this case to make things simpler I essentially measured a single initialize + run.
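As a sketch of that usual methodology (the `initialize` and `run` functions below are hypothetical stand-ins for the real VM entry points):

```rust
use std::time::Instant;

// Hypothetical stand-ins for the real VM calls: set up state once,
// then do repeatable work against it.
fn initialize() -> Vec<u64> {
    (0..1_000).collect()
}

fn run(state: &mut Vec<u64>) {
    for x in state.iter_mut() {
        *x = x.wrapping_mul(31).wrapping_add(7);
    }
}

fn main() {
    // Usual methodology: initialize once, untimed...
    let mut state = initialize();

    // ...then time only the repeated `run` calls and average them.
    let iterations: u32 = 100;
    let start = Instant::now();
    for _ in 0..iterations {
        run(&mut state);
    }
    let per_call = start.elapsed() / iterations;
    println!("average per run call: {:?}", per_call);

    // For the risc0 comparison above, a single initialize + run was
    // measured together instead, to keep things simple.
}
```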
Here’s a diff against risc0 v0.19.0 with the example modified to run the benchmark. (You still need to manually copy the ROM after applying the patch.)
Yeah, let’s not confuse goals here. Adding to koute’s response: while risc0 is definitely fun and games, please remind yourself that it’s still experimental (a huge disclaimer banner about that is the first thing in the risc0 GitHub repository README, and that hasn’t changed for a year or so). And ZK VMs still have a looong way to go regarding execution (proving) times of their guests.
However, I argue that PolkaVM guests are in principle provable in risc0, or any other RISC-V-based ZK VM for that matter, anyway. At the very least they are guaranteed to compile for that ZK VM too: PolkaVM using the most minimal instruction set that can realistically work plays massively in our favor. Meaning that anyone who considers the privacy gains offered by those ZK VMs worth the premium in execution time can theoretically already do that today. Note that, next to the contract execution itself, you’d also need to commit over the runtime/environment state, making it even more expensive.
I mean, this seems like a pretty egregious comparison, because you’re comparing just the execution of the program in your VM against the execution, plus the splitting of the program into segments for continuations (so that proof generation can be parallelized), plus the I/O of storing the segment data, which by default includes dirty pages from execution, to temporary storage, as well as other related logic.
I haven’t yet looked into or reproduced the proving time and verification of this very specific benchmark, but I suspect the figure you cited seems orders of magnitude off for possibly one of these reasons:
- Your verification log is incorrect: `println!("Verified in: {}", elapsed.as_secs_f64() * 1000.0);` in your pastebin.
- The proof is not generated recursively, and you are instead verifying the 636 segment proofs of this 666,000,000+ cycle-count program individually.
I’m not completely sure about this, as I can’t verify your exact workflow, but I just wanted to share some perspective!
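On the first point, one plausible reading of the objection is about units: multiplying `as_secs_f64()` by 1000 yields milliseconds, but the quoted log line prints no unit, so the number is easy to misread. A minimal reproduction (the 37 ms figure below is just an example value, not a measurement):

```rust
use std::time::Duration;

fn main() {
    // Suppose verification took 37 ms of wall time.
    let elapsed = Duration::from_millis(37);

    // The quoted log line multiplies seconds by 1000, so the printed
    // number is in *milliseconds*, even though no unit is shown.
    let printed = elapsed.as_secs_f64() * 1000.0;
    println!("Verified in: {}", printed);

    // A reader assuming the number is seconds would overestimate by 1000x.
    assert_eq!(printed, 37.0);
}
```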