Relay Chain Vulnerability: False Validator Slashing Due to Proof Verification Bug

Affected Systems: Polkadot Relay Chain, Kusama Relay Chain
Severity: Critical

Notes on Responsible Disclosure

We first want to address responsible disclosure. This is an unusual vulnerability: publishing its full details strengthens the defenders’ position and makes the attackers’ position much more difficult. The details let everyone monitor the attackers’ potential attack vectors, which effectively makes the attack impossible and buys time until the vulnerability is patched. This is explained in more detail below.

We also, of course, want to grumble that Parity/W3F did not treat the bug seriously at first, but this did not affect our consideration of Polkadot’s network security.

Summary

PR#11738 fixes a proof-verification bug in the Polkadot SDK of the same class as the recent Hyperbridge vulnerability. Developers initially labelled the change simply as “improved to be more secure”. However, our further analysis indicates that this is an actual critical security vulnerability that enables false validator slashing, and the fix should be deployed immediately.

Attack Scenarios

The attack is carried out through BEEFY equivocation reports. Until PR#11738 is deployed, forged MMR ancestry proofs let anyone submit a BEEFY fork-voting offence report. The full trace from the extrinsic to the vulnerable primitive is as follows:

attacker signed extrinsic
    pallet_beefy::report_fork_voting[_unsigned](ForkVotingProof, KeyOwnerProof)
        T::EquivocationReportSystem::process_evidence(reporter, ForkVotingProof)
            check_equivocation_proof:
                is_proof_optimal(...)
                extract_validation_context(header) -> canonical_mmr_root
                T::AncestryHelper::is_non_canonical(commitment, proof, validation_context)
                    [runtime wiring: type AncestryHelper = BeefyMmrLeaf]
                    pallet_beefy_mmr::Pallet::is_non_canonical
                        pallet_mmr::Pallet::verify_ancestry_proof(canonical_mmr_root, proof)
                            [buggy function]
                            returns Ok(attacker_controlled_prev_root) for any prev_peaks pre-fix
                if canonical_prev_root != commitment_root -> return true (non-canonical)
                check_commitment_signature(commitment, id, signature)
        slash_fraction() = Some(Perbill::from_percent(50))
        Offences pallet -> StakingAhClient (relay) -> StakingRcClient (Asset Hub)
        pallet-staking::on_offence -> UnappliedSlashes entry (deferred)
        Session: DisabledValidators updated immediately (not deferred)

The offence report then triggers validator slashing, along with all of its downstream effects.
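The bug class can be illustrated with a deliberately simplified Rust sketch. The sketch assumes a toy model in which the trusted current root commits to the previous root through a single hash link. A real MMR ancestry proof works differently: it rebuilds the current peaks from prev_peaks plus sibling nodes. ToyAncestryProof, verify_vulnerable, verify_fixed, is_non_canonical and the DefaultHasher-based hash are all hypothetical and are not the pallet_mmr or pallet_beefy_mmr API. The sketch only shows the shape of the flaw. Pre-fix, the returned previous root depends solely on attacker input, so a genuinely signed commitment can be made to look non-canonical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy, NON-cryptographic hash standing in for the real MMR hasher (assumption).
fn h(parts: &[u64]) -> u64 {
    let mut s = DefaultHasher::new();
    parts.hash(&mut s);
    s.finish()
}

// Simplified "bagging": fold the peaks into a single root.
fn bag(peaks: &[u64]) -> u64 {
    peaks.iter().fold(0u64, |acc, p| h(&[acc, *p]))
}

// Hypothetical toy proof: attacker-supplied previous peaks plus one link value
// that, in this toy model only, binds the previous root to the current root.
struct ToyAncestryProof {
    prev_peaks: Vec<u64>,
    link: u64,
}

type Verifier = fn(u64, &ToyAncestryProof) -> Result<u64, &'static str>;

// Vulnerable shape: the "previous root" is derived purely from attacker input;
// the trusted current root is never consulted.
fn verify_vulnerable(_current_root: u64, proof: &ToyAncestryProof) -> Result<u64, &'static str> {
    Ok(bag(&proof.prev_peaks))
}

// Fixed shape: the previous root is returned only if it is bound to the trusted root.
fn verify_fixed(current_root: u64, proof: &ToyAncestryProof) -> Result<u64, &'static str> {
    let prev_root = bag(&proof.prev_peaks);
    if h(&[prev_root, proof.link]) == current_root {
        Ok(prev_root)
    } else {
        Err("prev_peaks are not bound to the trusted root")
    }
}

// Mirrors the comparison in the trace: a root mismatch means "non-canonical".
// A proof that fails verification causes the report to be rejected (false).
fn is_non_canonical(commitment_root: u64, current_root: u64, proof: &ToyAncestryProof, verify: Verifier) -> bool {
    match verify(current_root, proof) {
        Ok(prev_root) => prev_root != commitment_root,
        Err(_) => false,
    }
}

fn main() {
    // Honest chain: validator V signed a commitment carrying the real previous root.
    let link = 99;
    let real_prev_root = bag(&[11, 22, 33]);
    let current_root = h(&[real_prev_root, link]);

    // Attacker: arbitrary prev_peaks, no private keys involved.
    let forged = ToyAncestryProof { prev_peaks: vec![1, 2, 3], link };

    println!("pre-fix flags honest vote:  {}", is_non_canonical(real_prev_root, current_root, &forged, verify_vulnerable));
    println!("post-fix flags honest vote: {}", is_non_canonical(real_prev_root, current_root, &forged, verify_fixed));
}
```

In this toy model, the forged proof flags the honest commitment pre-fix and is rejected post-fix. In the real trace, the subsequent signature check still passes, because V really did sign the commitment.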

Exploit

All required inputs are public:

  1. A real BEEFY signed commitment from the target validator V. Every validator signs every round, and all signed commitments are gossiped publicly on the BEEFY p2p layer.
  2. A valid key_owner_proof for V’s BEEFY key in some historical session, derivable from pallet-session::Historical’s publicly readable Merkle tree.
  3. Any valid finalized relay header with an MMR-root digest, to serve as the header for extract_validation_context.
  4. An arbitrary ancestry_proof with attacker-chosen prev_peaks. Pre-fix, the verifier accepts these without checking that they bag to a real ancestor root.

No validator cooperation required, no private keys, no bond or deposit.

Maximum disruption model

With UpToLimitWithReEnablingDisablingStrategy<3> in pallet-session:

  • Max concurrent session-disabled: 199 out of 600 validators ((n-1)/3)
  • Slashing cap: no cap (all 599 non-invulnerable validators can be slashed; each slash applies independently)

An attacker can keep 199 validators disabled indefinitely by submitting fresh reports against different (set_id, round) time slots each session.
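The 199 figure follows from the (n-1)/3 limit stated above. The short Rust sketch below reproduces the arithmetic. disabling_limit is a hypothetical helper that mirrors the formula given in the text. It is not copied from pallet-session.

```rust
// Disabling limit as described in the text: floor((n - 1) / factor), with factor = 3.
fn disabling_limit(validators: usize, factor: usize) -> usize {
    validators.saturating_sub(1) / factor
}

fn main() {
    let n = 600; // active validator set on Polkadot, per the text
    let limit = disabling_limit(n, 3);
    let enabled = n - limit;
    println!("max disabled: {limit}, still enabled: {enabled}");
}
```

With 600 validators this gives 199 disabled and 401 still enabled. That is also the "1/3 sustained disabling" level used in the consequences table.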

The “perceived” attack economics are also huge. However, there is an important caveat, which we explain below in the “Governance” section.

  • Active validator set: 600 (Polkadot)
  • Total staked: 892,209,096 DOT
  • Mean/median bond per validator: 1.49M / 1.45M DOT
  • Slash fraction (fork-voting): Perbill::from_percent(50) fixed (not the dynamic (3k/n)² GRANDPA/BABE formula)
  • Per-target slash: ~725,000 DOT (~$2.9M at DOT=$4)
  • Whole-set slash: ~446,000,000 DOT (~$1.78B, ~30% of DOT market cap)
  • Attack cost: zero (Pays::No waives the fee on a successful report)
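The per-target and whole-set figures follow from applying the fixed 50% fraction to the median bond and to the total stake. Below is a minimal Rust sketch of that arithmetic. It represents the fraction in parts per billion, in the spirit of Perbill, but perbill_mul is a hypothetical helper and not the sp_arithmetic API. DOT = $4 is the same illustrative price used above.

```rust
// Parts-per-billion multiplication, computed in u128 to avoid overflow.
// Hypothetical helper; not the sp_arithmetic Perbill implementation.
const BILLION: u128 = 1_000_000_000;

fn perbill_mul(amount: u128, parts_per_billion: u128) -> u128 {
    amount * parts_per_billion / BILLION
}

fn main() {
    let fifty_percent = 500_000_000; // 50% expressed in parts per billion
    let median_bond_dot = 1_450_000;
    let total_staked_dot = 892_209_096;
    let dot_price_usd = 4; // illustrative price from the text

    let per_target = perbill_mul(median_bond_dot, fifty_percent);
    let whole_set = perbill_mul(total_staked_dot, fifty_percent);
    println!("per target: {per_target} DOT (${})", per_target * dot_price_usd);
    println!("whole set: {whole_set} DOT (${})", whole_set * dot_price_usd);
}
```

This gives 725,000 DOT ($2.9M) per target and 446,104,548 DOT (about $1.78B) for the whole set, which matches the rounded figures in the list.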

Consequences at 1/3 sustained disabling

System | Result
Relay block production (BABE) | Degraded: block time 6s → ~8-9s
Relay finality (GRANDPA) | Unaffected. All 600 authorities continue to vote.
Asset Hub / Bridge Hub / Collectives / People / Coretime | Degraded to ~95% throughput (group-backing probability: P(≥2 enabled in size-5 group) = 95.5%)
Snowbridge outbound (Polkadot → Ethereum) | All 599 BEEFY authorities continue to sign. Outbound is gated only by Bridge Hub parachain throughput (~95%).
Snowbridge inbound, Kusama↔Polkadot bridge | Degraded to ~95%, not broken (neither uses BEEFY)
Governance extrinsic processing (Referenda, Fellowship) | Functional but slower (~95% parachain throughput)
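The 95.5% backing figure can be reproduced with a binomial model. The Rust sketch below assumes that each seat in a 5-validator backing group is independently enabled with probability 2/3. That is an approximation: real group assignment draws without replacement from the validator set. binomial_at_least and binom are hypothetical helpers written for this example.

```rust
// n choose k as f64 (adequate for small n).
fn binom(n: u32, k: u32) -> f64 {
    (0..k).fold(1.0, |acc, i| acc * (n - i) as f64 / (i + 1) as f64)
}

// P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k).
fn binomial_at_least(n: u32, k: u32, p: f64) -> f64 {
    let mut below = 0.0;
    for i in 0..k {
        below += binom(n, i) * p.powi(i as i32) * (1.0 - p).powi((n - i) as i32);
    }
    1.0 - below
}

fn main() {
    // Assumption: 1/3 of validators disabled, seats independent.
    let p_enabled = 2.0 / 3.0;
    let p = binomial_at_least(5, 2, p_enabled);
    println!("P(>=2 enabled in a 5-validator group) = {:.3}", p);
}
```

In closed form, the result is 1 - (1/3)^5 - 5·(2/3)·(1/3)^4 = 232/243 ≈ 0.955.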

The chain does not halt; services are degraded.
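The finality and BEEFY rows depend on one fact: disabling does not remove validators from the GRANDPA/BEEFY voter sets. The Rust sketch below contrasts the actual margin above the signing threshold with the counterfactual margin if disabled validators were dropped. It assumes the standard BFT threshold n - floor((n-1)/3) for both protocols. supermajority_threshold is a hypothetical helper.

```rust
// Standard BFT supermajority threshold: n minus the maximum tolerated faulty f = floor((n-1)/3).
// Assumption: GRANDPA and BEEFY both use a threshold of this shape.
fn supermajority_threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

fn main() {
    let authorities = 600;
    let disabled = 199;
    let threshold = supermajority_threshold(authorities);

    // Disabled validators stay in the voter sets, so all authorities can still vote or sign.
    let voters_actual = authorities;
    // Counterfactual only: the margin if disabled validators were removed from the voter sets.
    let voters_if_dropped = authorities - disabled;

    println!(
        "threshold {threshold}: actual margin {}, counterfactual margin {}",
        voters_actual - threshold,
        voters_if_dropped - threshold
    );
}
```

With these numbers the threshold is 401. The real margin is 199 voters. The counterfactual margin is exactly zero, which would be a knife-edge situation.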

Governance

However, slashes take 28 days to apply. As long as community members keep monitoring the chain for incorrect slashes and actively participate in slash-cancellation referenda, the risk from this vulnerability is reduced to a minimum.

Timeline

Event | Date | Commit / reference | Delta
Bug introduced in polkadot-sdk | 2024-05-13 | f4b73bd182 (PR #4430, Serban Iorga) “Add generate and verify logic for AncestryProof” | 0
BEEFY caller wired in | 2024-07-03 | b6f1823244 (PR #4522, same author) “BEEFY: Add runtime support for reporting fork voting” | +51 days
First stable SDK branch | 2024-07 | stable2407 | +~60 days
Pulled into Polkadot relay runtime | 2024-12-12 | polkadot-fellows/runtimes 7b096c14 “Update to SDK stable2409-1” | +213 days
Referendum #1877 whitelists v2.2.0 | 2026-04-12 | subsquare referenda 1877 | +699 days
Fix merged | 2026-04-13 | 122cb84 (PR #11738, Serban Iorga) | +700 days
Referendum #1877 cancelled | 2026-04-13 | (same day) | +700 days

Dwell time:

  • In polkadot-sdk master: 700 days (1.92 years)
  • In Polkadot relay chain production (v1.12+ runtime): ~485 days
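The day deltas in the timeline can be checked with plain calendar arithmetic. The Rust sketch below uses Howard Hinnant's days-from-civil algorithm for the proleptic Gregorian calendar. It needs no external crates. delta is a hypothetical helper, and only differences between dates are meaningful.

```rust
// Days since a fixed epoch (0000-03-01) for a proleptic Gregorian date,
// following Howard Hinnant's days_from_civil algorithm.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // treat Jan/Feb as months of the previous year
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, March-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe
}

fn delta(from: (i64, i64, i64), to: (i64, i64, i64)) -> i64 {
    days_from_civil(to.0, to.1, to.2) - days_from_civil(from.0, from.1, from.2)
}

fn main() {
    let introduced = (2024, 5, 13);
    let rows = [
        ("BEEFY caller wired in", (2024, 7, 3)),
        ("Pulled into Polkadot relay runtime", (2024, 12, 12)),
        ("Referendum #1877 whitelists v2.2.0", (2026, 4, 12)),
        ("Fix merged", (2026, 4, 13)),
    ];
    for (event, date) in rows {
        println!("{event}: +{} days", delta(introduced, date));
    }
    let dwell = delta(introduced, (2026, 4, 13));
    println!("dwell in master: {dwell} days ({:.2} years)", dwell as f64 / 365.25);
}
```

This reproduces the +51, +213, +699 and +700 day deltas and the 1.92-year dwell time in polkadot-sdk master.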

Disclosure Rationale

Pre-disclosure, on-chain events for a hypothetical attack were technically public but practically unmonitored. ConcurrentReportsIndex[b"beefy:equivocati"] had been empty for 4,713 sessions; essentially no infrastructure operator maintained targeted alerting on BEEFY-specific offence kinds because no historical baseline existed.

By publishing, we create the necessary monitoring population. Every developer, exchange, indexer, and bridge operator now has an incentive to alert on BEEFY offence reports, which converts latent observability into active surveillance. This actively dissuades attackers.
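A targeted alert can be very small. The Rust sketch below filters already-decoded offence events for the b"beefy:equivocati" kind quoted above. OffenceEvent and beefy_alerts are hypothetical. The sketch does not show how events are fetched from a chain client. The second kind in the example is only an illustrative non-BEEFY 16-byte identifier.

```rust
// Hypothetical decoded offence event; fetching and decoding from a node is out of scope.
struct OffenceEvent {
    kind: [u8; 16], // offence kind identifier (16 bytes)
    block: u64,     // block in which the report landed
}

// BEEFY offence kind as quoted in the text.
const BEEFY_KIND: [u8; 16] = *b"beefy:equivocati";

// Return the blocks containing BEEFY offence reports. Since the historical
// baseline for this kind is empty, every hit deserves human review.
fn beefy_alerts(events: &[OffenceEvent]) -> Vec<u64> {
    events.iter().filter(|e| e.kind == BEEFY_KIND).map(|e| e.block).collect()
}

fn main() {
    let events = vec![
        OffenceEvent { kind: *b"babe:equivocatio", block: 100 }, // illustrative non-BEEFY kind
        OffenceEvent { kind: BEEFY_KIND, block: 101 },
    ];
    for block in beefy_alerts(&events) {
        println!("ALERT: BEEFY offence report in block {block}");
    }
}
```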

The conventional “silent patch reduces 1-day window” heuristic assumes defender attention is already maximally focused on the bug’s signals. For this bug, that assumption fails: nobody was watching the right storage map. Publishing the fix does more to activate defender attention than it does to accelerate attacker discovery, because:

  • The attacker discovery rate pre-publication was effectively zero over 485 days
  • The defender attention rate pre-publication was also effectively zero
  • Publication recruits every patch-reviewing security researcher (many) and every responsive infrastructure operator (many) to the defender side
  • Publication recruits attackers only to the extent that attackers who couldn’t find the bug themselves now can — a smaller effect given the hypothesis that attackers aren’t reading this code

Recommendations

An emergency runtime upgrade is of course recommended.

In addition, this is not the only proof-verification bug in the Polkadot SDK codebase. Fortunately, the other two were not exploitable:

  • PR#11144: Pre-fix, tx_index wasn’t validated, but that can’t really be exploited.
  • PR#11739: PR#11144 itself introduced a serious unlimited-mint bug in Snowbridge, which was fixed by PR#11739 (as the PR author wanted to note, this PR was the reason runtime 2.2 was delayed).

Because of the above, we also recommend a full audit of the Polkadot SDK codebase for proof-verification issues similar to the Hyperbridge one.

Did you manage to run a local relay chain and successfully trigger the exploit in a local environment?

Hi Wei, just tunnel visioning on these two statements:

Consequences at 1/3 sustained disabling

[…]

Relay finality (GRANDPA) Knife-edge: exactly 2/3+1 (401/600) enabled; any additional validator outage stalls finality

[…]

Snowbridge outbound (Polkadot → Ethereum) HALTED — BEEFY signing set drops below 2/3+1 signer threshold

[…]

It’s true that when validators are disabled, this takes effect immediately for the remainder of the session associated with the report (except where a validator is re-enabled because the disabling limit was reached). However, this does not drop the disabled validator(s) from the GRANDPA/BEEFY active sets. So disabling would not on its own affect finality or BEEFY rounds, and hence would not affect Polkadot->Ethereum outbound either.
Happy to discuss if you disagree.

Thanks for the correction! Fixed.