Polkadot state sync

Polkadot Host implementations use Fast/Warp sync to synchronize missing block headers. They are followed by the state sync. The faster we can synchronize missing block headers and the most recent state, the faster our node will be able to start executing the most recent blocks and create new blocks (if we are running validator). In this post we will explore current strategies to sync the state as well as propose a new approach.

The goal is to come up with state sync protocol that:
1. Easily parallelizable
2. To save bandwidth responses do not contain overlapping data
3. Resistant to DOS attacks

If you are already familiar with how state sync works, just scroll down to Proposed solution section.

State

Key-value

Each block in polkadot blockchain has associated state.
State is key-value collection.

state["money:alice"] = "100$"
state["money:bob"] = "200$"

State hash

Block state is immutable.
Each block stores hash of state in a form of merkle root to ensure state integrity.
Forged state will have different hash.

hash1 = hash(state({ "money:alice": "100$" }))
hash2 = hash(state({ "money:alice": "1000$" }))
hash1 != hash2

Polkadot trie

Polkadot stores state as merkle trie.

Trie

Trie is prefix tree.

Nodes with common prefix are children of parent node with that prefix.

[
  "money:alice",
  "money:bob",
]
->
node("money:", [
  node("money:alice"),
  node("money:bob"),
])

Parent node stores children by letter which follows parent prefix.

[
  "money:alice",
  "money:bob",
]
->
node("money:", {
  "a": node("money:alice"),
  "b": node("money:bob"),
})

Node key omits redundant prefix.

[
  "money:alice",
  "money:bob",
]
->
node(partial="money:", {
  "a": node(partial="lice"),
  "b": node(partial="ob"),
})

Merkle trie

Merkle trie is merkle tree.
Block stores hash of root trie node.
Immutable node is uniquely identified by it’s hash.
Parent node stores child hash.
Polkadot node database stores node by hash.

trie1 = node("money:", [
  node("money:alice"="100$"),
  node("money:bob"="200$"),
])
trie2 = node("money:", [
  node("money:alice"="1000$"),
  node("money:bob"="200$"),
])
->
{
  hash1: node("money:", [hash3, hash4]),
  hash2: node("money:", [hash5, hash4]),
  hash3: node("money:alice"="100$"),
  hash4: node("money:bob"="200$"),
  hash5: node("money:alice"="1000$"),
}

/state/2

Polkadot “/state/2” protocol is used to obtain state from other peers.
There are two subprotocols inside “/state/2” protocol.
They are distinguished by request.proof: bool field.

/state/2 proof=false

This protocol requests next page of key-values.
It stops when response.end: bool field is true.
Then it re-creates trie from key-values.

step 1
  request(after="")
  ->
  response(
    "money:alice"="100$",
    "money:bob"="200$",
    end=false,
  )

step 2
  request(after="money:bob")
  ->
  response(
    "money:charlie"="300$",
    "money:dave"="400$",
    end=true,
  )

Problems

We can’t check root hash until we receive all key-values.

Re-creating trie nodes from key-values may change root hash (inconsistency during trie v1 migration).

Waiting for all requests sequentially may be slow.
We may want to send parallel requests to different peers.
But we don’t know range of keys in trie.
We may blindly guess keys, but responses may overlap , increasing bandwidth.

step 1 (parallel)
  request(after="a")
  request(after="b")
  request(after="c")
  ->
  response(["a", "b", "c"])
  response([     "b", "c"])
  response([          "c"])

/state/2 proof=true

This protocol requests trie nodes (merkle proof) required to get next page of key-values.
That increases response size, especially if trie is getting deeper.
It stops when client receives all trie nodes.

step 1
  request(after="")
  ->
  response({
    hash1: node("money:", [hash2, hash3, hash4, hash5]),
    hash2: node("money:alice"),
    hash3: node("money:bob"),
  })

step 2
  request(after="money:bob")
  ->
  response({
    hash1: node("money:", [hash2, hash3, hash4, hash5]),
    hash4: node("money:charlie"),
    hash5: node("money:dave"),
  })

We can check root hash for each request.

It receives trie nodes as is, so doesn’t need to re-create them.

Prefixes from received nodes limit possible keys for future requests.
But we don’t know how many keys exist under prefix, so responses may overlap.

step 1
  request(after="")
  ->
  response({
    hash1: node("money:", {
      "a": hash2,
      "b": hash3,
      "c": hash4,
      "d": hash5,
    }),
  })

step 2 (parallel)
  request(after="money:a")
  request(after="money:b")
  request(after="money:c")
  request(after="money:d")
  ->
  response([hash1 hash2 hash3])
  response([hash1       hash3 hash4])
  response([hash1             hash4 hash5])
  response([hash1                   hash5])

Even sequential request responses duplicate parent nodes (root, “m”, “mo”, “mon”, “money:”).

Problems

  • Responses duplicate parent nodes.
    • Uses more network bandwith.
    • Client already knows these nodes.
  • Parallel responses overlap.
    • Uses more network bandwith.
    • Reduces profit of parallel over sequential requests.

Proposed solution

Protocol can request trie nodes by hashes.
Response will contain one or more requested nodes.
Response may contain children/decendants of requested nodes.
Unknown child hashes from response can be used in new requests.
Unknown hashes can be requested in parallel from different peers.

Response doesn’t return keys outside of requested node prefix.
Client requests non-overlapping subtrees.
So responses don’t overlap.

Response doesn’t contain parent nodes, so it doesn’t duplicate them.

tree
  hash1 => node([
    hash2 => node([
      hash4 => node(…)
      hash5 => node(…)
    ])
    hash3 => node([
      hash6 => node(…)
      hash7 => node(…)
    ])
  ])

step 1
  request(hash1)
  ->
  response(node([hash2, hash3]))

step 2 (parallel)
  request(hash2)
  request(hash3)
  ->
  response(node([hash4, hash5]))
  response(node([hash6, hash7]))

step 3 (parallel)
  request(hash4)
  request(hash5)
  request(hash6)
  request(hash7)
  ->
  response(node(…))
  response(node(…))
  response(node(…))
  response(node(…))

done

If client needs only particular keys (like “/light/2” protocol), or wants to receive them as early as possible,
request can contain such keys.
Server will return nodes on path along this key, ignoring nodes outside this prefix.

Spam

Malicious client can request many invalid hashes from server.
Server may not have all valid nodes, so it can’t prove that hash is invalid.
Server will process these hashes independently.
Such invalid request with N invalid hashes will be handled in O(N) db reads.

Client may prove server that requested hashes exist in trie.
Simple merkle proof can consist of trie nodes containing requested hashes.
Valid node hash must be root, or child of valid node.

Server already knows all valid trie nodes which client may request.
So proof can use hashes instead of nodes.
Proof and request hashes will be organized as tree, to preserve parent-child relation.

request(
  hashes=[hash8, hash7],
  proof={
    hash1: node(…, [hash2, hash3, hash4, …]),
    hash2: node(…, [hash5, hash6, hash7, …]),
    hash5: node(…, [hash8, hash9, …]),
  },
)
->
compact(hash1, [
  compact(hash2, [
    compact(hash5, [
      compact(hash8, []),
    ]),
    compact(hash7, []),
  ]),
])

Proof validation bound is same for valid and invalid proof.
First db read will check if server has root node.
If it doesn’t it stops.
If it has node, before next db read it will check that next hash is child of current node.
So invalid hashes won’t be read from db.

hash1 = node([hash2, hash3])
valid = compact(hash1, [
  compact(hash2, […]),
])
invalid = compact(hash1, [
  compact(hash2, […]),
  compact(invalid1, […]),
  compact(invalid2, […]),
  …
])
1 Like