Announcing PolkaVM - a new RISC-V based VM for smart contracts (and possibly more!)

Is all of this required to execute the guest? If that were removed, we could just as well remove the time it takes for PolkaVM to JIT the code and a bunch of other things (which is beside the point).

Nope, I don’t think so. I can’t say for sure as I didn’t design the system, but I assume it’s to have the same workflow and no differences between this dev mode execution, and the execution that’s split for proving.

Here’s what the docs say:

A risc0 project, when run in dev-mode by setting the `RISC0_DEV_MODE` environment variable, supports ([fake]) receipt creation and pass-through 'verification' function, so that dev-mode may be switched on and off at runtime without impacting project workflows.

Remember that Risc0 is designed for a completely different purpose than PolkaVM

…so what exactly should I have compared here?

I mean, the whole point of risc0 is that you give it an arbitrary program, it runs that program, and then it gives you a proof which can later be used as an alternative to running the program again.

So I think it’s entirely reasonable to compare this with how long it actually takes to run the program again, if your use case doesn’t actually need the zero knowledge properties it gives you. (And if you do need this property then this is a moot point because no matter how fast a general purpose VM like PolkaVM is you can’t use it anyway.)

What do you mean it’s incorrect? I’m timing the verify function, am I not?

If it’s about the * 1000.0 then that’s intended. Well, kind of. Originally I didn’t expect the whole run to take 6 hours so I wanted to multiply the time to get the number of milliseconds for convenience, and I left it as-is.

But I properly took this into account when reporting the results. I also had a secondary `time` shell command running which measured the execution time of the whole thing.
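For readers following along, the measurement pattern described here (time a single call, then multiply the elapsed seconds by 1000.0 to report milliseconds) can be sketched as below. This is only an illustration in Python, not the patch that was actually used; `verify_stub` and `time_ms` are hypothetical names standing in for the real verification call and the timing wrapper.

```python
import time


def verify_stub():
    # Hypothetical stand-in for the real verification call being timed.
    sum(range(100_000))


def time_ms(fn):
    """Run fn once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    fn()
    elapsed_s = time.perf_counter() - start
    # Seconds to milliseconds: this is the "* 1000.0" from the post.
    return elapsed_s * 1000.0


elapsed_ms = time_ms(verify_stub)
print(f"verify took {elapsed_ms:.3f} ms")
```

An outer `time` on the whole process, as mentioned above, is a useful cross-check because it also captures setup work that the inner timer does not see.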

Okay, if that’s the case then how do I make it generate it recursively?

Well, I pasted the exact patch that I used, so that’s my whole workflow. (: Is there anything else you need?

If I made any mistake or misrepresented the performance of risc0 then please do correct me; I can rerun the benchmark again if there’s some trick that can be used to make the proof generation not take 6 hours for this particular program.

I know. Was the “Again, this isn’t meant to show that PolkaVM is better or anything like that, simply because those are entirely different things with entirely different use cases! Apples and oranges. One is not a replacement for the other.” that I wrote in my original post not clear enough? (:

The whole point of my post wasn’t to seriously measure the performance of risc0 as a general purpose VM, deride it for being slow, or anything like that.

I just wanted to preempt people coming out of the woodwork, seeing the “RISC-V” keyword, matching it to the “ZK” keyword in their head, and suggesting that we should do that instead, where, as you’ve aptly noted, those kinds of VMs are designed for a completely different purpose than what we’re doing! So I wanted to highlight this in easy to understand numbers. That’s all.

5 Likes

I’ll avoid bikeshedding this awesome post, so if you want to continue to resolve our miscommunications, feel free to DM me :slight_smile: I’d love to learn about the differences and how your VM works.

I just want to be clear I was only pushing back on the direct benchmark comparison of different systems doing different things:

And

when you don’t provide any context to the comparisons or understand the system you are benchmarking against.

1 Like

@alex (@koute) Based on this breakthrough, we’d like to prepare a path in 2024 for IDE tooling for

  1. ink! 5.0 with PolkaVM
  2. CorePlay with PolkaVM

and prep a “Decentralized Futures” plan to follow up on some ChainIDE + Polkaholic.io WASM Contract Explorer Integration work we did this summer.

Questions:

  1. Can we get your best guess on 2024 PoC timing of 1+2 so we can plan 2024 accordingly?

  2. It seems obvious that between
    (a) IDE for ink! 5.0 with PolkaVM => rococo-contracts
    (b) IDE for CorePlay => rococo-coreplay
    Polkadot builders would get considerably more excited about (b) relative to (a) – is this the case and should it drive PoC prioritization? I know it’s not an either-or thing.

  3. I think getting a builder community of CorePlay by Spring, even if it’s on a PoC, would be phenomenal and can be done in parallel with further improvements in core PolkaVM (on performance) - is this judgement correct? If not, what are the outstanding PolkaVM items needed to attack CorePlay?

  4. I imagine your PoC for (2) [Coreplay with PolkaVM] would have ink! flip replaced with something like fibonacci(n) that illustrates the breakthrough idea of long-running Coreplay suspend/resume – what are your favorite tutorial examples? Or, rather, where are the first few CorePlay test cases?

  5. I would like your ideas of how Coreplay developers can reason about their program being suspended and resumed in an IDE with PolkaVM. Can you give us your ideas on this? What else would be on your wishlist of what a 1.0 Coreplay IDE should contain?

  6. Can you provide us with a post-PoC CorePlay rollout plan in 2024 going from something like ‘rococo-coreplay’ to Kusama and Polkadot? Or, what are the bottlenecks in the R&D path?

Thank you so much, very excited to help build on this amazing breakthrough!

1 Like

I’m stunned that the risc0 prover winds up only 436338x slower than execution. I’d expect risc0 provers to be maybe 10-100 million times slower than PolkaVM. It’s multi-threading on that threadripper plus GPU acceleration I guess.

All blockchains produce “proofs” but in various distributed systems threat models. That’s what blockchains do!

All parachain blocks finalized by Polkadot are “proven” in pretty close to the classical byzantine threat model, aka 2/3rds honesty of polkadot validators plus some networking assumptions. A bridge ecosystem like Cosmos “proves” blocks subject to multiple threat models, ala 2/3rds honest on each interacting zone plus whatever.

ETH wants zk roll ups, and less secure optimistic roll ups, because they designed themselves into a corner. If you’ve ETH style per block zk roll ups then you still trust the ETH execution committee for the chain. Amusingly, an execution committee being small suggests this winds up worse than using state channels ala lightning or polkadot’s byzantine threat models, although this difference may be academic.

If you’ve some full chain recursive roll up like Mina, then your light client would not necessarily check if proof verification takes half a minute, or maybe they’d be paranoid enough to do so.

In theory, you could’ve no-chain user-side “roll ups” using TEEs, meaning each user’s device tracks that user’s money inside its TEE, and you perform transactions by running a protocol between two users’ TEEs which ensures that all money added first got deducted from the other TEE (and permits losing money). This is a “proof” in the TEE attestation threat model. It’s perfectly private even. It’s unlikely folks trust this threat model though, given how SGX gets broken every 6-12 months.

4 Likes

I expect that ink! will be adding a CorePlay backend and some abstractions to deal with the concurrency. The execution environments of pallet_contracts and CorePlay are similar enough. If you think about it, CorePlay is strictly more powerful than pallet_contracts, i.e. pallet_contracts is a special case of CorePlay where all interactions fit into one block. CorePlay will support uninterrupted execution via its call entry point.

Ideally, you will be able to recompile your ink! program to run on CorePlay without any changes. However, to make use of multi block execution you will need to add a few things.

Meaning: Every ink! program is a (crippled) CorePlay program but not vice versa.
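To make the "special case" relationship concrete, here is a toy sketch using Python generators. It is not CorePlay's or pallet_contracts' API; `fibonacci_task`, `run_in_blocks` and the per-block step budget are hypothetical names invented for illustration. A task yields at points where the host may suspend it, and the host resumes it in the next block. A program that never needs more than one block's budget behaves like an ordinary contract call.

```python
def fibonacci_task(n):
    """Compute fib(n), yielding after every step so the host may suspend us."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
        yield  # hypothetical suspension point
    return a


def run_in_blocks(task, steps_per_block):
    """Drive a task with a fixed step budget per block.

    Returns (result, number_of_blocks_used).
    """
    blocks = 0
    while True:
        blocks += 1
        for _ in range(steps_per_block):
            try:
                next(task)  # resume the task until its next suspension point
            except StopIteration as done:
                return done.value, blocks
        # Budget exhausted: the task stays suspended until the next block.


# Multi-block execution: small budget, the task is suspended and resumed.
multi_result, multi_blocks = run_in_blocks(fibonacci_task(10), steps_per_block=4)

# Single-block execution: the budget is large enough to finish in one go.
single_result, single_blocks = run_in_blocks(fibonacci_task(10), steps_per_block=1000)

print(multi_result, multi_blocks)
print(single_result, single_blocks)
```

Both runs compute the same value; only the number of blocks differs, which is the sense in which a single-block program is a restricted multi-block one.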

  1. I can’t comment on any timelines.

  2. There is no prioritization to be done. Completely different teams. Nothing much changes about contracts except its VM. Hence it is a much smaller endeavour than CorePlay where everything changes and the change to PolkaVM is more or less a nice to have. This is why we expect it to be ready much earlier. Not because the priority is for contracts. PolkaVM was specifically created because Wasm didn’t work well for contracts. We didn’t expect it to perform so well that it can also be used outside of contracts.

  3. I don’t think PolkaVM will be the blocker here. I assume the 1.0 Milestone needs to be done first.

  4. The most practical thing that comes to my mind is probably a storage migration :smiley:

  5. Our idea so far for debuggability is to use execution traces. Meaning that PolkaVM will collect a trace of an execution and send that to the IDE. It allows you to go forward and backwards in time. This is different from an embedded gdbserver where this is not possible. It is probably also a much simpler protocol.

  6. Can’t comment on timelines.
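The execution-trace idea from point 5 can be illustrated with a small sketch. This is not PolkaVM's trace format or protocol, which is not described in this thread; `TraceEntry`, `TraceCursor` and the register names are hypothetical. The point is only that once the whole trace has been recorded, an IDE can step backwards as easily as forwards by moving an index, without re-executing anything.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    """One recorded step: program counter plus a register snapshot (hypothetical layout)."""
    pc: int
    registers: dict


class TraceCursor:
    """Navigate a recorded trace forwards and backwards."""

    def __init__(self, entries):
        if not entries:
            raise ValueError("trace must contain at least one entry")
        self._entries = list(entries)
        self._index = 0

    @property
    def current(self):
        return self._entries[self._index]

    def step_forward(self):
        # Clamp at the last recorded step.
        if self._index < len(self._entries) - 1:
            self._index += 1
        return self.current

    def step_back(self):
        # Clamp at the first recorded step.
        if self._index > 0:
            self._index -= 1
        return self.current


trace = [
    TraceEntry(pc=0, registers={"a0": 0}),
    TraceEntry(pc=4, registers={"a0": 1}),
    TraceEntry(pc=8, registers={"a0": 3}),
]
cursor = TraceCursor(trace)
cursor.step_forward()
cursor.step_forward()
previous = cursor.step_back()  # "go back in time" by one step
print(previous.pc, previous.registers)
```

A live debugger attached to a running process cannot rewind like this unless it records state as well, which is the contrast with an embedded gdbserver drawn above.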

3 Likes

I think we should focus more on the speed of normal execution rather than proving time (the part that Koute refers to as dev mode). To be honest, I didn’t expect there to be a serious difference in dev mode, so I’m not exactly sure about the ideal scenario right now. However, the reason why I want it to be provable is that I think it makes more sense to produce a ZK proof when a proof is needed, rather than constantly producing proofs like ZK rollups do. It’s like the fisherman proof requirement that Polkadot used to have. I think there are many advantages to being able to prove with ZK. It makes sense both in general, in scenarios where the Polkadot SDK is being considered, and within Polkadot itself. But how it can be useful within Polkadot is something you know better. If you think that non-Polkadot scenarios are also useful for developer adoption, I can guarantee that they will be useful there.

To give a small example, consider a rollup that uses Polkadot for DA and also for inter-rollup communication. It sends block information via the Polkadot-Ethereum bridge. When there is a dispute, it proves its case with ZK. It is possible to establish structures that are mostly connected to Polkadot in short-term interactions, and that show Optimistic Rollup characteristics in the long term.

CPU time is money. In theory, we could’ve some risc0-ish layer which produces snarks for all parachains. If we use the recent risc0 hype numbers of 180 tx for like 22 USD then we’re looking at

10 block/min * 22 USD/deci-block = 1.16 billion USD per polkacore-year

I’m cautiously optimistic each relay chain could run 500 polkacores, so we’d exceed the costs of even bitcoin by doing this. We’ve used a centralized prover in this computation too, so whatever decentralization those require adds some opportunity costs too.
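Spelling out the units in that estimate, assuming "deci-block" means one tenth of a block (so the quoted 180 tx for 22 USD is treated as a tenth of a block's worth of work) and a 365-day year. Both readings are assumptions made here to reproduce the 1.16 billion figure, and the constant names are made up for the sketch:

```python
BLOCKS_PER_MINUTE = 10
USD_PER_DECI_BLOCK = 22
DECI_BLOCKS_PER_BLOCK = 10            # assumption: a deci-block is 1/10 of a block
MINUTES_PER_YEAR = 60 * 24 * 365      # assumption: 365-day year

usd_per_polkacore_year = (
    BLOCKS_PER_MINUTE
    * USD_PER_DECI_BLOCK
    * DECI_BLOCKS_PER_BLOCK
    * MINUTES_PER_YEAR
)

POLKACORES = 500                      # the "cautiously optimistic" core count from the post
usd_per_relay_chain_year = usd_per_polkacore_year * POLKACORES

print(f"{usd_per_polkacore_year / 1e9:.2f} billion USD per polkacore-year")
print(f"{usd_per_relay_chain_year / 1e9:.0f} billion USD per year for {POLKACORES} polkacores")
```

Under these assumptions the per-core figure rounds to the 1.16 billion USD quoted above, and 500 cores come to roughly 578 billion USD per year.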

Ain’t clear what dispute even means here, nor how sometimes-zk-proofs help.

As a rule, you could only dispute when you’ve some party who knows the exact execution location and some reason to bring the dispute, so usually a “dispute in a bridge” only refers to the operation of the bridge itself.(*) Your bridge could propagate some fault from anywhere in the ecosystem, even hidden behind zk, and which occurred arbitrarily far back, in principle even before the bridge existed.

We prove proactively in polkadot, zk roll ups, omniledger, etc. specifically so that everyone learns when some fault occurs. Users could trust the sometimes-zk-proof bridge only if the zk proofs were superfluous dead code.

An “optimistic roll up” means “adding significant latency in which you locate historical faults”. It makes sense if you narrow the functionality & parties like state channels do. It becomes problematic in flexible tx models like parachains or smart contracts.

(*) The sampling based BEEFY bridge was secure-ish for PoW ETH. In PoS ETH, randao sucks so we’ll replace sampling with web3sum BEEFY. It’s a bespoke plonk-sans-wires that proves a given BLS aggregate public key represents 2/3rds of polkadot validators.

Thank you!

Can you comment on Decentralized Futures/OpenGov: "Polkadot Does What Ethereum Cannot" 2024 Brand Marketing Campaign

You know you have a once in a decade breakthrough.

We need Polkadot 2.0 brand marketing to match it.

I ended up finding koute’s speech in a YT stream and re-uploaded it, transcribed, here:

Also based on the speech I wrote an article here.

5 Likes

Optimistic Rollup adds latency between your Rollup and Settlement layer (Ethereum). Polkadot will act as a Sequencer so you don’t need to wait to be able to interact. You can look at how Optimism plans to use Superchain as an interoperability solution, but theirs is pretty much centralized. We can do the same with Polkadot but in a decentralized manner. Since we are only proving when there is a problem in execution, it is not necessary to calculate the proving cost. If everything works there will be no incorrect execution unless Polkadot validators are a byzantine majority.

I would normally not talk about these with the Polkadot community, but considering that all the major Polkadot projects are looking at becoming a Rollup, maybe I should talk more about these.

Polkadot can act as both Sequencer and DA layer and interoperability solution for Rollups.

Also, we technically can prove PolkaVM inside Risc0 too, but that is two layers of complexity instead of one. If we could have a VM that is both provable and easy to execute in a normal way (without proving), that would be better I think.

Yes of course, Polkadot could be a settlement layer over ETH, using the BEEFY bridge. In fact, Polkadot could be a settlement layer over BTC, if we deploy a large threshold DKG for Schnorr signatures. I think neither adds too much latency though because polkadot does proofs proactively.

It’s also fine if a paranoid party uses the BEEFY bridge plus a “governance delay”, assuming they made this make sense somehow.

There is no benefit from involving high-latency roll up technologies though: If you’ve optimism L2s or risc0 L2s then any messages between roll ups must wait out their full dispute or prover delays. Otherwise, if unsoundness propagates then adversaries could easily hide the unsoundness, making disputes or proofs impossible.

We inherently have low latency messaging in Polkadot, and our proactive proofs stop unsoundness, but only within our threat model. Your parachain cannot participate in messages if you want to benefit from the roll up’s threat model. As a rule, these optimistic roll ups have no sensible threat model anyways, so there is negligible benefit from their threat model.

In fact, I learned recently that ETH execution committees have only like 256 nodes, so 1 in every 3418 nodes, or 18 M USD in total. Also, randao sucks and ETH ignores network attacks like BGP, so it’s maybe easier to have a malicious execution committee and break polkadot’s soundness. lol

1 Like

A Rollup is basically just a bridge. They have different properties but still they are just bridges. If Polkadot somehow applies censorship to prevent tx inclusion or withdraw requests from Ethereum, then this bridge can result in losses on the Polkadot side, which is a reasonable outcome. Parachains have different bridges that can result in losses. And in this particular case it is Polkadot’s fault if this kind of thing happens.

You are not going to care about Optimism Rollups. What I am talking about is some kind of sub-ecosystem of Rollups in Ethereum that are also Polkadot parachains and can interact with each other because they are all using Polkadot as their Shared Sequencer.

Do you, or anyone in this thread, have any hunch about the potential speedup available if this was run on a FPGA? Say Amazon’s F1 as a reference.

Given there are RISC-V’s that have been instantiated on FPGA’s I wonder how FPGA friendly this might be?

no. polkavm (a general purpose user-level RISC-V based virtual machine) currently runs natively only on x86 CPUs. it's difficult to think of any near future incentive for going that extra mile and writing an implementation for fpga. afaik execution of contracts/pvf is meant to be done by the collators/validators. it's unlikely for polkadot weights to get so demanding that you would need to offload execution to an fpga.

but I feel there is enough derailing in this thread already. im glad to announce that we applied for a grant this week to build a web app disassembler for polkavm binaries.

2 Likes

Hi @koute

It’s been on my list for a while to reply! First of all, thanks so much for driving the discussion so openly in the forum and sharing all of the stats :clap: Makes it very easy to follow and join the discussion.

From a product marketing perspective, I have one thing to add: I’d recommend using another name than PolkaVM, and something more descriptive (e.g. RISC-V VM).

Reasons:

  1. The biggest reason: We would undermine one of the biggest selling points for not using EVM: EVM is proprietary, Web3 specific, etc., and we always sold WASM as a more widely-used standard, it’s open, more efficient, … - if we start saying “On Polkadot, we use PolkaVM”, people would think it’s proprietary and just another environment to learn.
    Using RISC-V is amazing, but “hiding” it behind a new name could not only harm your VM implementation, but also adoption of Polkadot (if the RISC-V VM is becoming the new standard).
  2. Even as a general rule of thumb, we should avoid coming up with our own names if possible. The PD ecosystem is already infamous for its lingo, and if we have a chance to avoid another moment where people have to check a glossary, we should use it. We have so many special terms that can be easily explained by one single term from the industry:

When you want to build parachains (=appchains) on Polkadot, you can use pallets (=modules) from Substrate (=our SDK).

Maybe because smart contracts were never much in focus, we don’t have that lingo yet, and I’d plead for keeping it simple and understandable too.

  3. Calling it RISC-V VM (or something similarly descriptive) has the advantage that, for those who know the benefits of RISC-V, you ring the right bells and save a paragraph of explanation. In your previous post, you mention the benefits of RISC-V:

→ These are amazing qualities we want to bring to people’s minds, and we don’t want to bring up the negative associations with a potentially proprietary VM like PolkaVM.

Curious what you (and others) think!

3 Likes

On the flipside, if PolkaVM becomes a huge success and other ecosystems copy it, we at least have our branding there :sunglasses: so users will always know where the great tech is actually coming from.

Naming-wise calling it a “RISC-V VM” isn’t strictly accurate because it’s not a pure RISC-V VM (due to numerous technical reasons), so as a consequence it doesn’t run raw RISC-V machine code, it has a couple of VM-specific extensions and it doesn’t pretend that it’s an actual von Neumann-like RISC-V CPU. That’s why I’ve been calling it “RISC-V based”.

Calling EVM “proprietary” is also something that I think is very inaccurate. EVM is not really proprietary - there’s a spec, and there are multiple implementations, and there’s nothing legally preventing people from using it. It’s the exact opposite of proprietary!

But what EVM is not is “general purpose”. It’s a terrible fit for running general computations (because it’s been designed with one, very narrow use case in mind), and it requires special snowflake languages and compilers. You can’t just take a normal, random program in a normal programming language that normal programmers use and have it run on the EVM. You can do that (possibly with minor modifications) on WebAssembly and PolkaVM. And that is what mainly separates EVM from WebAssembly and PolkaVM! (Besides other auxiliary reasons like efficiency, etc.)

Anyway, I don’t think we’ll be changing the name of PolkaVM the implementation, but I would be open to changing the name of PolkaVM the standard (because there will be a spec and multiple implementations!) if someone can come up with a good name. But I really don’t think a generic “RISC-V VM” would be a good name, for the same reasons that e.g. Google Chrome is not just called “a standards based web browser” or Linux is not called “Unix-like kernel” - it needs to be something that isn’t generic (there are a ton of RISC-V VMs out there, so saying “RISC-V VM” could mean any of them).

2 Likes

Let’s cross that bridge when we get there :innocent: It can always be added (“powered by”) later, and I’d vouch for naming for early success rather than for potentially later success :slight_smile:

2 Likes