Distributed validator infrastructure for Polkadot

Hello, I posted an issue with some thoughts on a distributed validator infrastructure (DVI) for Polkadot validators that would ideally support any Substrate chain using GRANDPA/BABE consensus. I'm looking for feedback, insights, advice, and interest from anyone who might be open to exploring this with me and the Tangle (https://tangle.tools) team!

Goal

One major goal of building out a DVI is to create decentralized staking infrastructure and an LST that earns native yield from Polkadot validation and provides a rich primitive for applications such as restaking. A major goal for Tangle is to bring DOT liquidity into restaking security and offer an ecosystem similar to Eigenlayer, but here in Polkadot. LRSTs are an effective way to keep bootstrapping security and open up new yield opportunities, and if this infrastructure can be built I think it would provide a lot of value to the ecosystem.

Thanks for bringing these pieces of tech to light:

This would be similar to what Obol and SSV Network are doing on Ethereum. Examples of what a distributed validator cluster looks like on Ethereum can be found here

IIUC, this provides a validator with some redundancy - the motivation being that they could avoid some penalty if their infrastructure falls over at a point in time when they are expected to produce/validate a block, etc.

Rather than reinvent the wheel… I wonder if this is a starting point or shared functionality for validating across relay chains?

For motivation I'll note that some very rough and preliminary calculations suggest that for an equilibrium state you might need on the order of 2K-3K participants. What is the best definition of a participant? Who knows. But for current purposes, let us say it is validators. The good news is that the number is not 1 million; the bad news is that it is not 100. For context see these figures from @burdges:

The pertinent observation is that for more than 1K validators you are likely looking at more than one relay chain.

Unfortunately some obstinately refuse to acknowledge the focus of development really needs to be the relay chain.

Any thing you can do to move the ball forward would be great - even if all you do is establish what won’t work or won’t help.

You might wonder if the recent gray/JAM paper improves matters. While it does make several assertions about economic security, and it does correctly (in my view) acknowledge the critical role this plays, on this topic it is, unfortunately, another example of crypto-obscurantism. As best I can tell it does not provide anything new on the economic security front, and it is silent on the critical question of what the ideal number, and more importantly the minimum number, of participants is under different conditions.

I'll reiterate the preliminary and incomplete nature of what I raised above, and point out the obvious problem that we have no data from a system (ETH, DOT, etc.) that we think is capable of reaching a steady state, or that we can reasonably conjecture is in that state, and so could be used to inform our parameter estimates - in fact we aren't even sure we have systems capable of maintaining an equilibrium in the face of adverse events. Also I'll note that my figures above relate to what I would call a non-speculative token design, while ETH, DOT, etc. are speculator token designs (aka securities).
Finally, there may be more than one way to skin this particular cat, and it is possible the calculations referred to above are correct and yet irrelevant - a better alternative being available.

A “distributed validator” seems nonsensical. All these blockchains are already distributed systems.

A “threshold validator” makes sense: it's several physical machines doing redundant work but producing threshold consensus signatures.

Afaik “distributed validator” could only really mean “threshold validator” with different subnodes under the control of different sysadmins. That is way less than what people usually mean by “distributed”. It's possible a “threshold validator” would provide operational security benefits for validator operators, maybe even if the same sysadmin controls all subnodes.
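To make that structure concrete, here's a minimal sketch (plain Rust, standard library only; the share and aggregate types are placeholders, not real cryptography) of the t-of-n shape: each subnode contributes a signature share, and the validator's signature only exists once the threshold is met.

```rust
use std::collections::BTreeMap;

/// Placeholder for one subnode's partial signature over a message.
/// A real scheme (e.g. a FROST-style threshold Schnorr) would carry curve points here.
#[derive(Clone, Debug)]
struct SignatureShare {
    signer_index: u32,
    bytes: Vec<u8>,
}

/// Placeholder for the combined signature the chain would actually verify.
#[derive(Debug)]
struct AggregateSignature {
    contributors: Vec<u32>,
}

/// Collects shares from the subnodes of one "threshold validator".
struct ShareCollector {
    threshold: usize,                      // t: shares needed before a signature exists
    shares: BTreeMap<u32, SignatureShare>, // keyed by subnode index
}

impl ShareCollector {
    fn new(threshold: usize) -> Self {
        Self { threshold, shares: BTreeMap::new() }
    }

    /// Record a share; once `threshold` distinct subnodes have contributed,
    /// return the (mock) aggregate standing in for the validator's signature.
    fn add_share(&mut self, share: SignatureShare) -> Option<AggregateSignature> {
        self.shares.insert(share.signer_index, share);
        if self.shares.len() >= self.threshold {
            Some(AggregateSignature { contributors: self.shares.keys().copied().collect() })
        } else {
            None
        }
    }
}

fn main() {
    // 2-of-3: any two subnodes (possibly run by different sysadmins) suffice.
    let mut collector = ShareCollector::new(2);
    for idx in [0u32, 2] {
        let share = SignatureShare { signer_index: idx, bytes: vec![idx as u8] };
        if let Some(sig) = collector.add_share(share) {
            println!("threshold met, aggregate from subnodes {:?}", sig.contributors);
        }
    }
}
```

The point of the structure is that no single machine ever holds a key that can sign on its own, which is where the claimed operational security benefit would come from.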

I replied to the threshold validator issue on github of course. As I said there…

We do keep our crypto threshold friendly whenever possible, but at least from polkadot’s perspective threshold validators provide little value, even though the underlying idea makes sense. If we needed more “decentralization” then we should adjust our parameters.

Kusama should not use threshold validators. We should debug & deploy NIST post-quantum crypto in consensus on kusama temporarily, which proves polkadot could deploy post-quantum crypto in production. There is no chance that NIST selects crypto with simple & secure threshold flavors.

Unfortunately some obstinately refuse to acknowledge the focus of development really needs to be the relay chain.

Rob’s comments look unrelated.

We do actively develop the relay chain, adding features & improving performance, but…

At present, we've no resource pressure on polkadot, so we need more development of example applications, like games or whatever, probably both externally and in-house.

As I said elsewhere…

You could bridge polkadot, kusama, and similar projects, but these bridges require 2/3rd honesty on both sides, like what cosmos assumes. You'd expect social engineering attacks to bring down cosmos-like bridge ecosystems eventually.

We envision multiple parallel relay chains randomly divvying up one large validator set, selected by NPoS election, using only DOT staking of course. In this, we'd prove 2/3rd honesty on each chain, instead of assuming it like cosmos does. We know only two ways to do this proof:

  1. We make all validator operators run equally many nodes for each relay chain.
  2. We require (a) the heavier 80% honest security assumption, as well as (b) shuffling of validators between the relay chains using (c) threshold randomness.
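As a toy illustration of option 2's shuffling step (nothing here is Polkadot code; the seed standing in for the threshold randomness, the PRNG, and the round-robin split are assumptions made just for the sketch), every node that knows the shared seed can derive the same assignment of validators to relay chains:

```rust
/// Toy splitmix64 PRNG so the sketch stays self-contained; a real design would
/// draw the seed from the threshold randomness mentioned in (c).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Fisher-Yates shuffle driven by the shared seed, then a round-robin split,
/// so every honest node computes the same validator-to-relay-chain assignment.
fn shuffle_assign(validators: &mut Vec<u32>, num_chains: usize, seed: u64) -> Vec<Vec<u32>> {
    let mut state = seed;
    for i in (1..validators.len()).rev() {
        let j = (splitmix64(&mut state) as usize) % (i + 1);
        validators.swap(i, j);
    }
    let mut chains = vec![Vec::new(); num_chains];
    for (i, v) in validators.iter().enumerate() {
        chains[i % num_chains].push(*v);
    }
    chains
}

fn main() {
    let mut validators: Vec<u32> = (0..12).collect();
    for (c, set) in shuffle_assign(&mut validators, 3, 0xC0FFEE).iter().enumerate() {
        println!("relay chain {c}: {set:?}");
    }
}
```

The hard part is not the shuffle itself but obtaining an unbiasable shared seed, which is exactly why (c) calls for threshold randomness.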

Anyways…

  • Threshold validators need threshold crypto, not relay chain features.
  • Distributed validator could only mean threshold validator. Staking is irrelevant there.
  • NPoS is already better than other “liquid staking” ideas.

There do exist other reasons to provide more staking features, like staked DOTs being referenced by collators, who require only liveness assurances, not safety or soundness.

That seems reasonable.

My understanding was the current use case was as you note:

and to do so independently of the RC configuration/preferences.

Agreed. Although my use case is subtly different: The Attribute X may be adequate for Property Y, but inadequate for Property Z. That is on me - I did weakly cast this as being validator specific, but it needn’t be. What is generic is the presence of more than one RC.

I’m not disputing the RC development pace, and the use case is more strategic than tactical. But, as I acknowledged, this use case is a non-speculative token and that is categorically different from DOT.

So far, as best I can tell, the differences fall within the scope of parameter settings. Apologies for not being clearer, the RC code base is working its way up my todo list.

This should focus things: Are there integration (or unit) tests exercising the multi-RC use case? Or even documented rules-of-thumb about the trade-offs?

Here you mean non-BEEFY bridges? Or does BEEFY share these properties?

I’m inclined to agree with your GH comment that OmniLedger probably has some useful insights/results.

Thanks for the feedback and thoughts.

Yea, distributed validator or threshold validator - I'm using both terms to describe the same system. The MPC would presumably also be relevant for other use cases, e.g. key management/custody applications that want to interact with Polkadot.

I'd like to keep pushing out thoughts on a design and see whether @burdges @taqtiqa-mark you have more thoughts here, but going deeper begins to expose the possibility of PBS-style block building. Specifically, if you distribute the signing of BABE and GRANDPA blocks across a cluster or a network of nodes, you run into the following decisions:

  1. For BABE, consensus needs to be reached on what block to sign.
  2. For GRANDPA, nodes can run the finality client and generate threshold signatures for the blocks they want to finalize with less coordination (or maybe consensus is also required here, although I don't immediately see what malicious behaviour arises without it); see the sketch after this list.
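To make point 2 a bit more concrete, here is a rough, non-cryptographic sketch (plain Rust; the types and the partial signature are placeholders invented for illustration) of a cluster collecting finality-vote shares and only emitting a vote once enough subnodes agree on the same block hash:

```rust
use std::collections::HashMap;

/// One subnode's vote share: the block it wants to finalize plus its partial signature.
struct VoteShare {
    node_index: u32,
    block_hash: [u8; 32],
    partial_sig: Vec<u8>, // placeholder for a real threshold-signature share
}

/// Groups shares by block hash; a finality vote is only produced once `threshold`
/// distinct subnodes agree, so disagreement simply means no vote is emitted.
struct FinalityVoteCollector {
    threshold: usize,
    shares: HashMap<[u8; 32], Vec<VoteShare>>,
}

impl FinalityVoteCollector {
    fn new(threshold: usize) -> Self {
        Self { threshold, shares: HashMap::new() }
    }

    /// Record a share; return the block hash once the threshold is met for it,
    /// i.e. the point at which the aggregate GRANDPA-style vote could be formed.
    fn submit(&mut self, share: VoteShare) -> Option<[u8; 32]> {
        let hash = share.block_hash;
        let entry = self.shares.entry(hash).or_default();
        if entry.iter().all(|s| s.node_index != share.node_index) {
            entry.push(share);
        }
        (entry.len() >= self.threshold).then_some(hash)
    }
}

fn main() {
    // 2-of-3 cluster; two subnodes independently decide to finalize the same block.
    let mut collector = FinalityVoteCollector::new(2);
    let block = [9u8; 32];
    for node in [0u32, 1] {
        let share = VoteShare { node_index: node, block_hash: block, partial_sig: vec![node as u8] };
        if let Some(hash) = collector.submit(share) {
            println!("threshold reached, aggregate finality vote for {:?}", &hash[..4]);
        }
    }
}
```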

For BABE it seems that if the cluster is managed by different entities, they may have some say in how they collectively want to build a block before signing it with their key shares (still considering the threshold validator). This process seems like an avenue to explore proposer-builder separation, which I know @rphmeier has mentioned as interesting for Polkadot. Builders could send blocks w/ proofs to the threshold validator (proposer), who signs off on a block without having seen it. Of course a threshold validator is necessary to accomplish this, but as I've thought about this more I've realised the space in between these steps has room for exploration. Or maybe not, and I still have more to understand.
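As a sketch of that blind-signing flow (every name here is hypothetical and the proof check is a stub; it only shows the shape of the builder/proposer interaction), the builder sends a commitment plus a proof, and the threshold proposer signs the commitment without ever seeing the block body:

```rust
/// Hypothetical message a builder sends to the threshold proposer: a commitment
/// to the block (e.g. its hash) plus some validity proof, but not the block body.
struct BuilderBid {
    block_commitment: [u8; 32],
    validity_proof: Vec<u8>,
}

/// Stub check; a real design would verify the proof against the protocol rules.
fn proof_is_valid(bid: &BuilderBid) -> bool {
    !bid.validity_proof.is_empty()
}

/// The proposer (here the threshold cluster) signs only the commitment, never the
/// full block: the "signs off on a block without having seen it" step above.
/// The threshold-signing round from the earlier sketches would run at this point.
fn propose(bid: &BuilderBid) -> Option<[u8; 32]> {
    proof_is_valid(bid).then_some(bid.block_commitment)
}

fn main() {
    let bid = BuilderBid { block_commitment: [7u8; 32], validity_proof: vec![1] };
    if let Some(commitment) = propose(&bid) {
        println!("committed blindly to block {:?}", &commitment[..4]);
    }
}
```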

and to do so independently of the RC configuration/preferences.

IIUC you’re saying that it would be potentially possible to reuse a cluster across relay chains? That does sound like an interesting extension.

No. It's closer to saying it MAY be necessary to have more validators than one RC can support.
This is very much wet paint, and as I emphasized it likely depends on how an RC is parameterized/configured. So you may be able to configure a chain so that this threshold changes - it's all about tradeoffs - and you'll need to accept that tradeoff.

You should also bear in mind that the issues I have in mind arise from a consumer token design, while Polkadot is a speculator token (that consumers have no choice but to use). These are categorically distinct, and it seems reasonable to expect that details that are an “issue” for one are inconsequential for the other.