Background signature verification

There is an open PR by Moonbeam to bring back background signature verification:

I’m personally in favor of this PR (even though I procrastinated hard on it :see_no_evil:). I think the current format should be okay, besides one more change that we should add. This was proposed by Pierre some time ago: we should attach a unique ID per background validation instance. This way we could have multiple of these instances running at the same time.

One important thing that we need to consider before merging this pull request is the impact on parachain validation. If we add support for multiple validation instances, we should probably add a cap to prevent spawning an unbounded number of these instances at the same time. There may also be other things I’m missing that we should consider.

I’m posting this here mainly to get some feedback/ideas before merging it.


Did someone benchmark what the performance gain from this actually is?
It aims at reducing the import time of blocks with many sr25519 extrinsics I assume?

Back in the day we had some benchmarks, but nothing that can be compared to what we have today.

Not only sr25519: any of ed25519/sr25519/ecdsa.

Do we have a benchmark that uses a full block? Then we could use this there.

Granted that the PVF gets gas metered, it’s not entirely clear how to charge gas for the things running in parallel.

How do we charge gas for host functions at all? Do we just assign it some fixed value?

Not necessarily. It can be a formula that depends on the arguments or the state. The API even allows the pay-as-you-go approach.
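To illustrate the three options mentioned here, a minimal sketch (all names and constants are hypothetical, not the actual Substrate host-function API): a fixed price per call, a formula over the arguments, and a pay-as-you-go meter charged while the work runs.

```rust
// Hypothetical sketch of host-function fuel pricing strategies;
// names and constants are illustrative only.

/// Fixed price: every call costs the same.
fn fuel_fixed() -> u64 {
    1_000
}

/// Formula-based price: cost scales with the input,
/// e.g. hashing `input_len` bytes.
fn fuel_by_input_len(input_len: usize) -> u64 {
    const BASE: u64 = 500;
    const PER_BYTE: u64 = 3;
    BASE + PER_BYTE * input_len as u64
}

/// Pay-as-you-go: charge a fuel meter while the work proceeds,
/// aborting as soon as the meter is exhausted.
fn fuel_pay_as_you_go(meter: &mut u64, chunks: &[&[u8]]) -> Result<(), ()> {
    for chunk in chunks {
        let cost = fuel_by_input_len(chunk.len());
        *meter = meter.checked_sub(cost).ok_or(())?;
        // ... process `chunk` ...
    }
    Ok(())
}

fn main() {
    assert_eq!(fuel_fixed(), 1_000);
    assert_eq!(fuel_by_input_len(100), 800);

    let data = [0u8; 100];
    let chunks: Vec<&[u8]> = vec![&data, &data];
    let mut meter = 2_000;
    assert!(fuel_pay_as_you_go(&mut meter, &chunks).is_ok());
    assert_eq!(meter, 400); // 2000 - 800 - 800
    // Not enough fuel left for another 800-cost chunk.
    assert!(fuel_pay_as_you_go(&mut meter, &chunks[..1]).is_err());
}
```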

How would we estimate the fuel cost of any host function call? Do some sort of benchmarking and assign some fuel cost based on this? Could we then not do the same for the background validation? Something like measure 1000 balance transfers with sync signature validation vs background validation. Then we should hopefully see that background validation is faster and be able to calculate the rough cost of one background validation request.

The benchmark extrinsic bench builds a full block and then re-imports it many times.
I tried to compile with background-signature-verification but somehow it’s not calling the code :man_shrugging:.
I am probably using it wrong somehow.

Are you sure that you activated the feature correctly?


Assuming we want to have wall clock time spent and the amount of spent fuel correspondence, this might not work. The reason for this is that PVF is an adversarial environment.

Assume you assign the fuel cost according to the benchmarked average execution time. Given that the same operations executed serially take more time (otherwise, there would be no benefit in parallelization in the first place), the adversary can make the PVF execution take longer than expected for that fuel amount. This is because the adversary can make the parallel checks essentially serial by spawning an execution and immediately joining it by requiring the result.

One potential way to side-step that is to change the API so that the result of the verification is not observable from within the PVF. For example, the batch verification can only be spawned, and if any of the signatures does not check out, the whole PVF execution is aborted.

That implies a few things:

  1. The parachain author must ensure that all calls to the batch verification APIs always succeed, at the risk of botched candidates otherwise. In practice, the collator will have to pre-validate each signature.
  2. Because of that, there are limits to the usability of the API. Contracts won’t be able to use the batched verification APIs, and the XCM VM should not be able to use it either.
  3. There is no way to fall back to serial validation in case such a need arises.
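A minimal sketch of what such a "spawn-only" interface could look like (all names hypothetical): the PVF submits checks but can never branch on an individual result; only the host sees the accumulated outcome, and it rejects the whole candidate at the end if anything failed.

```rust
// Hypothetical sketch of a spawn-only batch verification interface.
// The PVF can submit signature checks but never observe their results;
// the host aborts the whole execution at finalization if any failed.

#[derive(Default)]
struct BatchVerifier {
    // Hidden host-side state; invisible to the PVF.
    failed: bool,
}

impl BatchVerifier {
    /// Submit a signature check. Returns nothing: the PVF cannot
    /// make its control flow depend on the outcome.
    fn submit(&mut self, valid: bool) {
        // A real host would spawn the check on a background thread;
        // here we just fold the outcome into the hidden flag.
        self.failed |= !valid;
    }

    /// Called by the host when the PVF finishes. If any signature
    /// failed, the whole candidate is rejected.
    fn finalize(self) -> Result<(), &'static str> {
        if self.failed {
            Err("candidate rejected: bad signature in batch")
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut good = BatchVerifier::default();
    good.submit(true);
    good.submit(true);
    assert!(good.finalize().is_ok());

    let mut bad = BatchVerifier::default();
    bad.submit(true);
    bad.submit(false); // the PVF never sees this; only the host does
    assert!(bad.finalize().is_err());
}
```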

At moonbeam we are very interested in being able to experiment with this feature quickly, even in a very conservative way.

I think that even without changing the benchmark, this feature offers gains for some parachains, because sometimes the collator that produces the blocks has more resources than the validator core dedicated to the PVF.
In the case of Moonbeam, we observe that some of our block candidates are rejected without explanation; we think they may not be validated in time, and this feature could perhaps help us reduce this problem.

At first, we can be very conservative and limit ourselves to 2 instances (the main one + 1 to check the signatures); that should already offer significant gains.

This is not a problem, because the limiting factor is not the production of the block, but its import.

They can use it to verify their assertions, but there can be no conditions based on the result of the verification.

The Core runtime API can be extended by adding functions execute_block_in_parallel / execute_block_serial (following the convention defined by default for execute_block).
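As a rough sketch of that shape (plain trait here instead of the actual Substrate `decl_runtime_apis!` machinery; all names hypothetical): each client picks the import mode it wants, and validators would only ever call the serial variant.

```rust
// Hypothetical sketch of extending the Core runtime API so each
// client can choose the import mode. Names are illustrative only;
// a real runtime API would go through Substrate's macro machinery.

trait CoreBlockExecution {
    /// Conventional, deterministic serial import (what validators run).
    fn execute_block_serial(&self, block: &[u8]) -> Result<(), ()>;

    /// Opt-in import with background signature verification
    /// (what a collator importing parablocks might use).
    fn execute_block_in_parallel(&self, block: &[u8]) -> Result<(), ()>;
}

struct DummyRuntime;

impl CoreBlockExecution for DummyRuntime {
    fn execute_block_serial(&self, _block: &[u8]) -> Result<(), ()> {
        // ... apply extrinsics, verifying each signature inline ...
        Ok(())
    }

    fn execute_block_in_parallel(&self, _block: &[u8]) -> Result<(), ()> {
        // ... apply extrinsics, pushing signatures to a background batch ...
        Ok(())
    }
}

fn main() {
    let rt = DummyRuntime;
    assert!(rt.execute_block_serial(b"block").is_ok());
    assert!(rt.execute_block_in_parallel(b"block").is_ok());
}
```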

I ran some benchmarks with this command; here is what I get for a block with 30 transfers:

Background signature verification enabled:

Min: 58922, Max: 60312
Average: 59329, Median: 59271, Stddev: 239.6
Percentiles 99th, 95th, 75th: 60041, 59727, 59491  

Background signature verification disabled:

Total: 9427603
Min: 90389, Max: 152408
Average: 94276, Median: 91182, Stddev: 9922.74
Percentiles 99th, 95th, 75th: 136476, 113739, 91640

The gains seem to be around 30%.
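A quick sanity check on those numbers, using the two medians reported above:

```rust
fn main() {
    // Median import times from the two runs above (units as reported).
    let enabled = 59_271.0_f64;
    let disabled = 91_182.0_f64;
    let reduction = 1.0 - enabled / disabled;
    // Roughly a third of the import time saved.
    assert!((0.30..0.40).contains(&reduction));
    println!("reduction: {:.1}%", reduction * 100.0);
}
```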

In practice we don’t want the PVFs to be parallelized internally, at least as validators execute them. The reason is that validators are already executing multiple PVFs in parallel, so their cores are likely already fully occupied. Further parallelization will just add scheduling overhead, as well as complexity.

However, we do want stuff like this for optimizing block authorship. As we progress towards asynchronous backing and Parachain Boost, authorship will likely become the bottleneck.

I would support this change if it could be enabled in authorship and disabled in the actual PVF executed by validators. We’d need some logic to ensure that deterministic limits are adhered to. I’d expect that it should be possible to predict an upper bound for the amount of ‘gas’ a signature verification can take, and have block authorship utilize this estimate.


Good point, worth mentioning. Although TBH, I don’t have a good grasp of the hardware available to parachain validators or of their current load.

I might’ve expressed myself poorly. In that post, I explained why a certain shape of the threading API (namely, if the result of a thread could be requested eagerly) is not sound.

I’m not sure how impactful it will be for authorship, but I think the PVF part is important.
I understand your concerns, especially if multiple PVFs are meant to execute at the same time
(I suspect that would happen once asynchronous backing is there and a parachain with delays sends multiple blocks at once, which can be processed in parallel).
However, the PVF is the most limiting factor for now (and I believe also in the future), and being able to reduce its time by up to 50% is a good improvement.

I thought of having a dedicated CPU thread for signature verifications, but the bottleneck in the case of many PVFs to execute could become a challenge.

I think it comes down to the math of how many CPU threads are considered “required” to run a validator, versus the number of expected parallel PVFs.

I will think on this a bit more to formulate a more detailed response, but in the meantime:

In fact, due to the nature of the approvals protocol the nodes are already executing multiple PVFs in parallel, and the amount that they are required to is a function of the number of availability cores in the system. We would like to see scaling via stuff like Parachain Scaling by Parablock splitting that allows approvals to validate sequential blocks in parallel.

Hi Rob,

I think the other scaling ideas are good, but I also feel this one is a low-hanging fruit we could benefit from. We already know that the signature part is independent of the execution of the transaction (for all blockchains/parachains, I believe), making it easy to run concurrently.

If you don’t want to introduce concurrency in the PVF (to avoid a possible bottleneck or hard-to-predict resource management), I can understand. Please let us know your decision so we can move forward with this.

At the moment I believe that introducing PVF parallelism adds much more complexity without any apparent benefit, so I am opposed to it. However, I also believe that down the road block authorship will become the bottleneck, and parallelism in block authorship is a good design space to explore.

@librelois Do you think it would be possible to have the concurrent execution be “controlled” by the client? That is, have it enabled, but fall back to sequential execution if required by async backing or for some other reason?

Otherwise, how much work would it be to have it only for authorship? As it is supposed to run the same code as the PVF, right?

If we want to keep sequential execution on the PVF side, the only solution I see is to create another runtime API, execute_block_in_parallel, which would be used by the collators to import the parablocks.
This way each “client” could choose whether to import the block sequentially or in parallel.

No, the creation of the block does not use the same code as the import. And the parallelization proposal discussed here is only possible for the import.

Also, I think that this functionality (background signature verification) is still interesting for the relay chain and for standalone substrate chains.

For authorship we don’t really need this. Authorship could be improved very easily and much more than by background verification. The magical solution here would be to skip signature verification completely during block building. We already verify signatures when a transaction enters the tx pool, so there is no need to do this again at block production.
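A rough sketch of that idea (all names hypothetical): perform the expensive check exactly once when the transaction enters the pool, carry that fact with it, and let block building apply the extrinsics without re-verifying.

```rust
// Hypothetical sketch: verify a signature once on pool entry and
// record that fact, so block building can skip re-verification.
// Names are illustrative, not the actual tx-pool API.

struct PoolTransaction {
    payload: Vec<u8>,
    signature_verified: bool,
}

/// The expensive signature check happens exactly once, here.
fn enter_pool(payload: Vec<u8>, signature_ok: bool) -> Option<PoolTransaction> {
    if !signature_ok {
        return None; // never admitted to the pool
    }
    Some(PoolTransaction { payload, signature_verified: true })
}

/// Block production trusts the pool's earlier verification and
/// simply applies the extrinsics, without re-checking signatures.
fn build_block(pool: &[PoolTransaction]) -> Vec<Vec<u8>> {
    pool.iter()
        .filter(|tx| tx.signature_verified)
        .map(|tx| tx.payload.clone())
        .collect()
}

fn main() {
    let pool: Vec<_> = [(b"a".to_vec(), true), (b"b".to_vec(), true)]
        .into_iter()
        .filter_map(|(payload, ok)| enter_pool(payload, ok))
        .collect();
    assert_eq!(build_block(&pool).len(), 2);
    // A transaction with a bad signature never reaches the pool at all.
    assert!(enter_pool(b"c".to_vec(), false).is_none());
}
```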