Elastic Scaling

TLDR: Elastic scaling is very useful for parachains that need higher throughput than the current protocol allows. Polkadot is going to deliver elastic scaling in the near future.

Polkadot’s mission is to deliver a platform with excellent scaling and security, so that decentralised applications can operate under the best conditions possible. In this article we review how Polkadot scales, the new changes that make it even more scalable and enable elastic scaling, and conclude by explaining why elastic scaling is cool.

Polkadot Scaling
Polkadot scales by applying hierarchy to the platform architecture. Each parachain submits one block per relay chain block to the relay chain, the chain that provides security. The relay chain can serve many parachains, on the order of 50-300. Another feature this architecture enables is shared security: parachain projects can pool their security resources behind a single strong backing, making attacks extremely expensive, whereas if each parachain ran its own blockchain those resources would be split up and attacks would be cheap and easy to carry out.

Recently, exciting changes have been proposed for Polkadot that open up more scalability opportunities.

Scaling a parachain beyond one Core
Currently, Polkadot validates at most one block per parachain per relay chain block. The newest change Polkadot is enabling, as part of elastic scaling, is to allow a parachain to produce several parachain blocks per relay chain block and have them all validated. These parachain blocks can still be built sequentially, but the relay chain processes them in parallel.

Polkadot can validate many parachain blocks at a time. We refer to the relay chain resources and validator time used to validate one parachain block on the relay chain as a core, similar to what was previously loosely referred to as a slot. So if the relay chain can validate 50 parachain blocks at a time, we say it has 50 cores, by analogy with a 50-core processor executing 50 threads at a time. With elastic scaling, a parachain will be able to obtain more than one core at the same time, which allows high-throughput parachains to get their transactions executed faster.
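
To make the arithmetic concrete, here is a small back-of-envelope sketch (not protocol code) of how extra cores multiply a parachain’s throughput. The 6-second relay chain block time is real; the transactions-per-parachain-block figure is a made-up placeholder that will vary per parachain.

```rust
// Back-of-envelope sketch: how extra cores translate into parachain throughput.
fn main() {
    let relay_block_time_secs = 6.0_f64;
    let txs_per_parachain_block = 1_000.0_f64; // hypothetical workload, varies per parachain

    for cores in [1u32, 2, 4, 8] {
        // With `cores` cores, up to `cores` parachain blocks get validated per relay chain block.
        let tps = f64::from(cores) * txs_per_parachain_block / relay_block_time_secs;
        println!("{cores} core(s): ~{tps:.0} tx/s");
    }
}
```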

Core assignment
Currently, prospective projects apply for a slot on Polkadot by participating in an auction; once they win the auction, they become a parachain. The auction determines how many tokens need to be locked. Polkadot then grants the use of a single core, i.e. one parachain block per relay chain block, for the lease period covered by the auction, which can be anywhere from 6 months to 2 years.

Most recently, agile coretime is being implemented, which allows a more flexible assignment of cores. Coretime refers to the right to use a core on the relay chain. The new changes (more information in the RFC and FAQ) will allow the purchase of one or more cores for shorter periods of time, such as one month, one hour, or even one block, either through an on-chain purchase or on a secondary market for coretime.

Elastic Scaling
These two changes, multiple cores per parachain and agile coretime, come together as follows to enable elastic scaling: a parachain can lease extra cores on short notice for a short time and use them to get its parachain blocks validated at a faster rate and execute more transactions. Elastic scaling is useful to various entities in the blockchain space. Here we review its benefits from their perspectives:

For service providers: they can serve more customers (application developers), which is better for their business and increases their popularity.

For applications: many applications have bursty coretime usage. With elastic scaling they can save costs by paying only for the coretime they need at any given time, instead of having to choose between high cost and low performance, a choice that would hurt the quality of their application for end-users and its popularity. Moreover, applications that are just starting out may have far fewer users; with elastic scaling they can adjust how much coretime they pay for as their user base grows. It is hard to estimate up front how much coretime an application will need over the next six months to two years. With rigid scaling, they either have to buy a lot of coretime at the start and pay a high price, or end up slow and lose end-users once they become popular. What many applications tend to do is acquire more coretime than they need, which raises the price for everyone interested in coretime and raises the bar to entry for application developers. Elastic scaling lets them pay for coretime only when needed, which reduces the price for everyone. And if they do misestimate when acquiring coretime, elastic scaling allows them to load-balance and, using agile coretime’s secondary markets, resell any coretime they won’t need in the future.
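
As a rough illustration of the cost argument, the sketch below compares paying for peak capacity all day (rigid) with paying only for what is used each hour (elastic). The demand profile and the unit price are invented purely for the example.

```rust
// Toy comparison of rigid vs. elastic coretime purchasing for a bursty workload.
fn main() {
    // Hourly core demand over one day: mostly 1 core, with an evening burst needing up to 4.
    let hourly_demand: [u32; 24] = [
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 2, 3, 4, 4, 3, 2, 1, 1,
    ];
    let price_per_core_hour = 1.0_f64; // hypothetical unit price

    // Rigid: provision for the peak, all day long.
    let peak = *hourly_demand.iter().max().unwrap();
    let rigid_cost = f64::from(peak) * 24.0 * price_per_core_hour;

    // Elastic: pay only for the cores actually used each hour.
    let elastic_cost: f64 = hourly_demand
        .iter()
        .map(|&c| f64::from(c) * price_per_core_hour)
        .sum();

    println!("rigid (peak provisioning): {rigid_cost} units of coretime");
    println!("elastic (pay as you go):   {elastic_cost} units of coretime");
}
```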

Let’s dive into a couple of examples to show that most end-user applications have bursty activity periods, needing a large amount of coretime at some times rather than a constant amount. Any application that runs auctions is a great example of bursty activity. Other examples are messaging, betting, and social media, or really any application with human end-users, since for most applications those users are not evenly distributed across time zones. A large study of text messaging between students shows that messaging activity peaks towards the evening and night [1]; humans are creatures of habit. For example, a messaging application that is popular in Japan will see a moderate amount of messaging during the day, bursts of messages in the evening, and very little from 1 am to 6 am Japan time.

For end-users: they don’t have to choose between low cost and good performance (or, in the worst case, a halt of the service) when choosing which applications to use.

Comparison to other scaling models
Alternative scaling models include very high throughput chains, where throughputs of as much as 2,000-3,000 transactions per second (of non-protocol transactions, i.e. end-user transactions) have been reported. It is not clear how such chains could grow once demand exceeds these numbers, even if the numbers became ten- or twenty-fold higher. To put this into perspective, just a couple of major credit card networks need to process roughly 20-30 times this number of transactions per second. If we want blockchains to be capable of handling transactions beyond that, we need a plan to scale further, for example for IoT applications, which will become mainstream sooner or later with a massive number of global users. Blockchains are excellent for collecting data from IoT devices, in terms of preventing abuse and providing privacy, but scaling to these numbers would not be nearly enough. Elastic scaling allows Polkadot to optimise the use of its cores and enables parachains to become high-throughput chains themselves at minimum cost.
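
For reference, the sketch below just restates the ranges mentioned above as arithmetic; none of the figures are measurements.

```rust
// Rough ranges quoted in the text, not measurements.
fn main() {
    let reported_tps = (2_000u64, 3_000u64);    // very high throughput single chains
    let card_network_multiple = (20u64, 30u64); // "20-30 fold" of that

    let needed_low = reported_tps.0 * card_network_multiple.0;  // 40,000 tx/s
    let needed_high = reported_tps.1 * card_network_multiple.1; // 90,000 tx/s
    println!(
        "just matching a couple of major card networks: roughly {needed_low}-{needed_high} tx/s"
    );
}
```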

Another alternative is rollups, which solve the scalability problem better than monolithic high-throughput chains. However, the current solutions, being either optimistic or zero-knowledge based, suffer respectively from weaker security or from heavy computation for nodes. In rollups, the execution of blocks is delegated outside the set of validators, so the bulk of the computation and storage is off-chain. In optimistic rollups, off-chain blocks are assumed to be valid, and a fraud-proving scheme is meant to detect blocks that were not computed correctly. Even though in principle ’any party’ can attempt to check and detect invalid blocks, it is not clear how such schemes ensure provable security from a formal standpoint, since it is not specified how to ensure that the blocks are in fact checked. In zero-knowledge rollups, a cryptographic proof of validity for a batch of transactions is published to the on-chain validators, who can verify it. Producing the proofs of validity, however, is computationally expensive (e.g., 50,000 times slower than native execution), which may lead to centralization of proving. In comparison, parachains enjoy high security (compared to optimistic rollups) and low execution cost and better decentralisation (compared to zero-knowledge rollups).

References
[1] “A large scale study of text-messaging use”, Agathe Battestini, Vidya Setlur, Timothy Sohn, Proceedings of the 12th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2010, Lisbon, Portugal, September 7-10, 2010.

Bonus: What else to look forward to after Elastic Scaling?
CoreJam is a more far-reaching and general extension of Polkadot’s core model than asynchronous backing and elastic scaling. Polkadot can validate more than just chains. For example, smart contracts on parachains have the limitation that while calls between contracts on the same chain are synchronous and fast, cross-chain calls are always slow and asynchronous, which forces a difficult choice of which chain to deploy on and therefore which contracts are easy to interoperate with. The CorePlay idea, which CoreJam will make possible, is that the same smart contract can be scheduled together with different smart contracts as demand for faster calls requires.

6 Likes

Thanks for taking the time to post these details.

The only thing that struck me as ‘pump-DOT’ was this claim:

I believe possible ‘centralization of proving’ is an unreasonable characterization of the risks in the Zero-Knowledge model for scalable/elastic computation.
Specifically, if you look at Mina you see clearly the presence of decentralized proof generation and even a third-party proof marketplace. More generally, there are several projects exploring proof marketplaces, e.g. Succinct.

Finally, it is not clear to me that the computation offered in the ZK model can be a substitute for the computation offered by Substrate-based chains.
To that extent, making cost comparisons seems unreasonable - unless there is some reason to believe the ZK compute is some subset of that available from Polkadot?

It is true that Mina has some ‘type’ of centralized storage. IIUC, Polkadot has that issue to some degree too.

Also, I believe the Mina protocol does not provide for, nor have on its road map, reciprocal accounts between two Mina chains. In that sense it too is a centralized-decentralized ecosystem, and has the “elasticity” limit that entails.
As do Substrate based chains. See here for some discussion on that topic.

1 Like

Why should you not be able to compare them? If you run 1 + 1 natively versus building a proof for it, why not compare the costs? You want to achieve the same result: prove that you have done the correct computation. However, ZK proof generation is quite expensive, and that brings a risk of centralization, especially when you want to prove actual transactions and not just 1 + 1.

This number is not really correct. It was always a guess, never backed by any numbers. The reality is that we should be able to go much higher than this (probably).

2 Likes

I didn’t say you should not. I said it seemed unreasonable to me. Yes, ZK calculations can cater to the classic use cases, everything public, etc. And there are such use cases where the two are comparable; in those narrowly defined use cases maybe the cost comparison makes sense - it is more likely that other functionality enters the trade-off space, undermining the cost comparison.

I believe I provided current evidence that expressly disproves this claim.

I even linked to the web pages that show third parties thinking about proof markets, which to my mind are a form of decentralization as well as a source of elastic compute. This is where ZK will likely have a comparative advantage - one piece of tech helps address three problems: privacy (sufficient but possibly imperfect), decentralized consensus, and elastic compute.

Of course this is all “Wet Paint”. It is possible that ZK services are more expensive - I can also imagine economies of scale and tech such that the reverse is true - but higher costs are not axiomatically connected to centralization.

Privacy can mean that your users do not want you to tell the world that in their circumstances you calculated 2 * 1. For that user, a public 1 + 1 is not a substitute calculation.
Likewise, many users will not want you to tell the world they bought/sold N units, just that they paid/received the correct price from an authorized seller/buyer.
Obviously a contrived example, but hopefully the point about privacy is clear.

Also, hopefully it is clear that a user who wants you to keep their calculation or txn private will not regard your offer, to calculate or display everything in public, as a drop-in substitute.

Making price comparisons between things that are not substitutes seems unreasonable to me - like comparing the price of an ice pack with a hot water bottle.

Updated: to clarify that even with txns some users will value privacy.

1 Like

I worked on Polkadot a lot, so I think it is very fair to say I am a bit biased. However, I do believe in stating facts no matter what my affiliation or preferences are. Science and technology only improve through many rounds of feedback and of trying to overcome limitations over time.

If someone says zero knowledge based chains provide more privacy than Polkadot I would not say to them “you are pumping (any zk)-coin.”

I am not very familiar with the Mina technology, and I am happy to believe you that their solution reduces centralisation significantly. Yet the fact that such a solution is needed is further evidence that this is a general problem of zk technologies, and it adds to the cost of running a chain. I understand it is a cost one needs to pay for obtaining features like privacy, but that doesn’t make it nonexistent. I did not claim that Polkadot provides the same level of scalability and privacy for a cheaper price; Polkadot is not trying to replace Mina nor any other zk technology. I compared Polkadot’s sharding model to the rollup model, which I think is a fair comparison in the scalability context.

I disagree. One can run a zero-knowledge parachain on Polkadot too; however, the centralisation problem would then exist there as well, at the moment. If an application developer doesn’t need zero knowledge for privacy, but is looking for a platform to run their application with optimal scalability and interoperability, then the question is whether they go to a sharded ecosystem or a rollup one. If they need zero knowledge and interoperability is useless to them, then the rollup model and the sharding model differ less for them in terms of scalability.

Sorry, I am happy to change it. Can you give me an approximate range?

Not really. It depends upon way too many factors, but we discuss the parameters here:

It roughly looks like (needed_approvals + backers) * num_cores = (relayVrfModuloSamples+1) * num_validators, where

  • backers = 5 is a liveness parameter, which nobody likes making smaller,
  • needed_approvals = 30 is our security parameter from our simulations and the machine elves paper, which cannot easily be changed,
  • num_validators has quadratic costs in gossip bandwidth, so likely stays around 1000,
  • num_cores counts the number of parablocks being executed per relay chain block,
  • relayVrfModuloSamples depends upon the node specifications and how many resources each core consumes.

We’ll have larger relayVrfModuloSamples and more cores if we increase minimum node specs, but this demands validators change hardware, which sucks. We’ll have more cores if we optimize substrate better, like fixing the stupid storage, or make runtimes more strict using PolkaVM.

We’ll have smaller relayVrfModuloSamples and fewer cores if we give cores 3 seconds of execution instead of 1 second, or enable multi-threading ala solana, or bundle multiple parablocks onto the same core.

All that said, we’re doing very parallelizable work here, so if you imagine validators have 4 physical CPU cores, spend 1 running the relay chain, and spend the other three doing parachain work, which requires 2 seconds of CPU for every 1 second of execution, and that bandwidth is not a problem, then you’d have something like relayVrfModuloSamples+1 = 9 and num_cores = 252, but really a bit less to leave some slack.
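
A quick sanity check of the formula with those hypothetical numbers (just a sketch, using the parameters stated above):

```rust
// Plugging the stated parameters into
//   (needed_approvals + backers) * num_cores = (relayVrfModuloSamples + 1) * num_validators
// and solving for num_cores.
fn main() {
    let needed_approvals = 30u32;
    let backers = 5u32;
    let num_validators = 1_000u32;
    let relay_vrf_modulo_samples_plus_one = 9u32; // the hypothetical figure above

    let num_cores =
        relay_vrf_modulo_samples_plus_one * num_validators / (needed_approvals + backers);
    // 9 * 1000 / 35 ≈ 257; the post quotes ~252 because some slack is left.
    println!("num_cores ≈ {num_cores}");
}
```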

At present, our numbers are much worse than this for many reasons: we need way more than 2 seconds of CPU time per 1 second of execution without PolkaVM; all this other optimization work remains; bandwidth actually matters; folks run under-spec validators; and our validator specs say silly things like “prefer single threaded performance over core count.”

As a goal, I’ve always trotted out one second of execution every six seconds per two validators, so 500 one-second cores on 1000 validators. At first blush, this sounds impossible since execution would fully utilize those hypothetical 3 physical cores, but that makes it interesting. :wink:

As for your question…

I’d say something vague like: Polkadot maximizes on-chain execution time, while still operating within a conventional byzantine threat model.

1 Like

Since seeing Polkadot as a rollup service is a topic again nowadays, I want to share my observation here that:

  1. The advantage of Polkadot over rollups is indeed the higher security guarantees that parachains get, due to a constant (approval + backing) checking process by the relay chain validators.
  2. An optimistic rollup system has “elastic scaling” out of the box, assuming finality can be delayed, as the rollup’s state transition is never executed on the settlement layer unless disputed. So a rollup’s “tps” is variable by default, and merely a mirror of how fast or slow it wishes to send its data to the DA layer.

I wonder if my second point is correct and resonates with others?

2 Likes

We know of prover improvements like Binius, aka the DP SNARK, from the last six months, but they’re focused upon really general execution. As they start from that slower end, they’ll likely cost significantly more than 50,000 times the CPU time of a single verifier.

As of six months ago, Justin Thaler (a16z) stated that zk roll up provers cost 1-100 million times the CPU time of a single verifier; likely this still holds true today. This does not mean they’re unprofitable, since ETH pays 1 million validators!

All this is to say, zk roll ups incur significant bills for CPU time, power, etc., money which ultimately comes from somewhere. There are always opportunity costs associated with decentralization too, which multiply these bills further, but they’ll stay expensive regardless.

As for centralization, zk roll ups could only become raw-cost competitive if they reduced prover costs down below the bandwidth premium, but polkadot’s bandwidth costs look like needed_approvals + backers + 3 (availability) + num_collators = 38 + num_collators. We only need num_collators = 10 or 15 or so, since they only control liveness. A centralized prover could’ve 1 here, but a decentralized prover infrastructure has the same liveness concerns, and likely worse safety concerns too. In other words, decentralized provers cannot ever beat polkadot’s costs, even if all other zk fantasies panned out.
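
For concreteness, a small sketch of that replication count, using the parameters quoted earlier in the thread and the 10-15 collator figure above:

```rust
// needed_approvals + backers + 3 (availability) + num_collators = 38 + num_collators
fn main() {
    let needed_approvals = 30u32;
    let backers = 5u32;
    let availability = 3u32;

    for num_collators in [10u32, 15] {
        let replication = needed_approvals + backers + availability + num_collators;
        println!("{num_collators} collators -> each parablock touches ~{replication} nodes");
    }
}
```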

We could make the same statement about network-of-bridges schemes like cosmos, but at least if each cosmos zone has 100 validators then cosmos execution only costs like 2x what polkadot costs, which they easily recoup by using native code. Cosmos assumes 2/3 honest on each zone though, which, while still “decentralized”, makes a pretty big exploit target.

As zcash makes clear, these roll up costs have zero impact upon deploying privacy, since privacy means end users doing the zk proofs on their own hardware, and then merely batching those on the node. You do however pay those high costs if you roll up nullifier tracking.

Anyways…

Yes, there are apples & oranges here, but real costs exist, latency exists, etc., and polkadot brings these way down relative to competitors, while operating within standard execution & threat models.

Correct.

It is also worth noting an implicit assumption in all these claims is that both networks operate similar (comparable) speculator tokens, and that all cost differentials exclude any token effects. That is probably reasonable in the current state-of-play.

If a consumer token ever emerges all bets are off in terms of the costs consumers pay in a network designed around speculators and one designed around consumers.

I suspect then everyone comfortable with un-like comparisons will start objecting. I’m not suggesting you have taken this position; this is to make it clear to a wider audience:

  • Claim: A ZK txn on network X is 10% cheaper than the same txn in FK (Full Knowledge) form on Polkadot.
  • DOT-Gallery: You can’t say that. They are different networks. You are measuring different things. etc. etc.
  • Claimant: No. You are selectively measuring, and excluding what is inconvenient to you.
  • DOT-Gallery: Well, “making price comparisons between things that are not substitutes seems unreasonable” to us.

Live by the zword (pun intended), die by the zword.

Is such a change likely to reduce Total costs by a couple of orders of magnitude or more?
Who knows. Some metrics do shift that much. It is not yet clear if the consumer surplus or ‘prices’ shift that much.

Curious. Do you place Mina in the category of ZK rollups?

Yes, Mina is a zk roll up in cost terms. These high costs come firstly from accessing storage, but sometimes from doing recursion, non-native arithmetic, etc.

Zcash only requires two-ish storage accesses per tx, as nullifiers are handled transparently, so nobody cares if this takes 20 seconds on the user’s device.

It’s impossible in terms of energy spent.

It’s maybe possible in terms of user fees spent while VCs fund the zk roll up, and dot stakers earn far more than validators, but they must overcome a huge cost differential.

At least one of these zk roll ups was crowing about $22 per block with 182 txs on AWS, plus $5 in ETH gas. Well, those AWS fees cost $116 million per year if you want 6-second block times. You want 10x that tx rate too, so now we’re talking more than $1 billion per year.
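
For anyone who wants to reproduce that back-of-envelope number, a quick sketch; $22 per block and 6-second blocks are the quoted figures, nothing here is measured:

```rust
fn main() {
    let cost_per_block_usd = 22.0_f64; // quoted AWS proving cost per block
    let block_time_secs = 6.0_f64;     // target block time

    let blocks_per_year = 365.25 * 24.0 * 3600.0 / block_time_secs; // ~5.26 million
    let annual_cost = cost_per_block_usd * blocks_per_year;         // ~$116 million
    let at_10x_tx_rate = annual_cost * 10.0;                        // ~$1.16 billion

    println!("blocks per year:         {blocks_per_year:.0}");
    println!("annual AWS proving cost: ${:.0}M", annual_cost / 1e6);
    println!("at 10x the tx rate:      ${:.1}B", at_10x_tx_rate / 1e9);
}
```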

That’s more than all of polkadot’s total inflation, burned on a single parachain core equivalent. They can cut costs by not using AWS, using a better proving system, FPGAs, etc. Yet they still must pay people for other stuff polkadot does, like decentralization information. And here they’ll face roughly the same arguments for & against inflation as polkadot (see below).

Again, the zk roll up maybe cheaper than paying ETH directly, but that’s because ETH has nominally 1 million validators, which drives up costs.

Also, the zk roll up can do almost nothing when they’ve no users, so the VC can recoup by dumping their tokens before anyone sees the real costs. Polkadot must pay for its whole network all the time, even while we stupidly do not build applications ourselves to drive usage.

We optimize designs for real world costs, especially understandable ones like energy usage. I’d expect “token effects” and “VC effects” disappear eventually, or at least wind up being priced similarly across chains, or that dot holders profit from the efficiency, or that the whole industry disappears.

We’ve ongoing casual discussions about slowly bringing rewards more into line with costs, primarily through minimum commission or separate validator payouts, and somewhat reduced inflation, but the ecosystem has deflationists who prefer more radical approaches.

All proof-of-stake blockchains pay stakers for information about which validators they trust. Afaik, nobody ever analyzed if this even makes sense, but it’s usually much less bad than how corporate board elections work, and polkadot worked extra hard here ala NPoS. Also, some chains like dfinity vote much like corporate board elections, which makes them worse. lol

Anyways, this information always incurs costs: Tor solves this with ecosystem politics, witness the drama when they expelled Appelbaum and later Lovecruft. Although tricky to price, it’s far cheaper than staking, and likely far more secure, but not free either.

2 Likes

Agreed. That is all one can do now.
Network take rates are equally real-world, but only when they become measurable. Right now they are pretty much hidden. Witness the absence of any term structure: a ZK txn to purchase your daily milk supply is indistinguishable from a txn to purchase a Boeing.

The interesting part is when such take rates become observable: is there a scramble to morph tokens, or do we discover, and admit, that all along people were only ever interested in tokens in the form of securities?

In case anyone following is interested… things are moving along:

Relative to native execution of a RISC-V program, how much slower is the Jolt prover today? The answer is about 500,000 times slower.
Since Jolt is 6x (or more) faster than deployed zkVMs (RISC Zero, Stone, etc.), this means that today’s deployments are actually millions of times slower than native execution.

ZK likely has the scale economies to warrant specialized hardware. How much does that change things:

Because Binius works over the field GF[2^128] (rather than the scalar field of an elliptic curve like BN254, which is what Jolt uses today), field operations in the sum-check protocol will get much cheaper. Exactly how much is hardware-dependent, but Ulvetanna is working hard to ensure the speedups are substantial. For example, on ASICs and FPGAs, it will likely be 30x or more (even without bringing in algorithmic optimizations that minimize the prover’s field work over small-characteristic fields). On ARM machines, the speedups should also be substantial: while Binius is targeted at the so-called tower basis of GF[2^128], a lot of the prover’s work can happen over the POLYVAL basis, and ARM chips support fast multiplication in this basis.

Call all that improvement 60x (conservative). And today the expectation is an 8k-9k speed differential; given the historical record of compute costs, that cost will be trending down.
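
Spelling out the arithmetic behind that figure (the ~60x speedup is the conservative assumption above, not a measurement):

```rust
fn main() {
    let current_prover_overhead = 500_000.0_f64; // Jolt prover vs native, per the quote above
    let assumed_speedup = 60.0_f64;              // "conservative" combined hardware/field estimate

    let remaining_overhead = current_prover_overhead / assumed_speedup;
    println!("remaining prover overhead vs native: ~{remaining_overhead:.0}x"); // ~8,300x
}
```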

I can believe a non-private network being left with a take rate in the order of 1% (milk purchases), and a private network (birthday gift and surprise holiday purchases) capturing a take rate in the order of 10%.

What prices are the take rates calculated on for a private network to be the same or cheaper (economies of scale likely mean they won’t stop at matching your total costs) than a non-private network?

Interesting times.

We’ll see if others’ benchmarks match a16z’s press releases.

Relative to native execution of a RISC-V program, how much slower is the Jolt prover today? The answer is about 500,000 times slower.

You’ll still have opportunity cost overhead from “decentralization” too, which zk roll up guys ignore so far. As I said above, decentralized provers should nerf any possible cost advantage for zk roll ups.

ZK likely has the scale economies to warrant specialized hardware. How much does that change things:

Bitcoin has ASICs in part because sha256 never changes. ASICs look high risk while algorithms change so fast.

Because Binius works over the field GF[2^128] (rather than the scalar field of an elliptic curve like BN254, which is what Jolt uses today), field operations in the sum-check protocol will get much cheaper.

Awful lot of supposition here; so far those SNARKs have come in much slower, meaning Binius has a lot of catching up to do first.

It’s all kinda irrelevant…

1st) We’ve our own conventional optimizations which we must deliver, whatever the zk roll up people deliver. Aside from conventional optimizations, we’d maybe gain another 2x or 3x by using better threshold randomness too, which complicates our protocol.

2nd) We’ve a much nicer computation model, which simplifies development. And the game is really just to ship applications that real people use.

3rd) In fact, we’ve access to variant computation models too, including some crazy nice oracle things, but at much worse costs. I’d wager some polkadot fork explores those, not polkadot itself.