Block header pruning

This thread follows up on the GitHub issue: Add APIs for block pruning manually · Issue #1570 · paritytech/polkadot-sdk · GitHub

Block header pruning seems like a logical continuation of state and block pruning, and it would be a nice feature to have.

How to implement block header pruning correctly, though?

Our initial experiments show that it requires several steps:

  • change the fork calculation logic (the previous algorithm assumed that block headers always exist)
  • add block header pruning by:
    • removing data from the DB using columns::KEY_LOOKUP, columns::HEADER, and the block hash
    • removing data from the block header in-memory cache
    • removing data from the header metadata in-memory cache (the last two using remove_header_metadata)
  • fix the metadata saving algorithm: the prune_blocks_on_finalize_and_reorg test marks several blocks in a row as finalized and commits the transaction afterwards, in contrast to prune_blocks_on_finalize, which finalizes every block in a separate transaction. If we modify the prune_blocks_on_finalize_and_reorg test to query a header from the pruned fork, we get one even though the database no longer contains it. It seems that the cache gets polluted by dirty reads originating in meta updates, because meta updates are saved separately from the transaction and after it.
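
To make the pruning and cache steps above more concrete, here is a minimal toy model in Rust. It is not the actual sc-client-db code: ToyBackend, Transaction, prune_header, and commit are hypothetical names, the two maps only stand in for columns::KEY_LOOKUP and columns::HEADER, and a single map stands in for both in-memory caches. The sketch assumes that one way to avoid a stale cached header is to invalidate the cache in the same step that applies the transaction, rather than before or separately from it.

```rust
use std::collections::HashMap;

// Toy types; the real backend is generic over the runtime's block type.
type Hash = [u8; 4];
type Number = u64;

#[derive(Clone, Debug, PartialEq)]
struct Header {
    number: Number,
    parent: Hash,
}

/// Hypothetical stand-in for the client DB backend.
#[derive(Default)]
struct ToyBackend {
    key_lookup: HashMap<Hash, Number>,   // stands in for columns::KEY_LOOKUP
    headers: HashMap<Hash, Header>,      // stands in for columns::HEADER
    header_cache: HashMap<Hash, Header>, // stands in for the in-memory caches
}

/// Hypothetical pending change set, applied in one step by `commit`.
#[derive(Default)]
struct Transaction {
    pruned: Vec<Hash>,
}

impl Transaction {
    /// Stage a header for removal; nothing is visible until commit.
    fn prune_header(&mut self, hash: Hash) {
        self.pruned.push(hash);
    }
}

impl ToyBackend {
    fn insert_header(&mut self, hash: Hash, header: Header) {
        self.key_lookup.insert(hash, header.number);
        self.headers.insert(hash, header);
    }

    /// Read-through lookup: a hit on the "DB" also populates the cache.
    fn header(&mut self, hash: &Hash) -> Option<Header> {
        if let Some(cached) = self.header_cache.get(hash) {
            return Some(cached.clone());
        }
        let header = self.headers.get(hash).cloned()?;
        self.header_cache.insert(*hash, header.clone());
        Some(header)
    }

    /// Apply the staged removals and invalidate the cache in the same step,
    /// so a read that happened before commit cannot leave a stale entry.
    fn commit(&mut self, tx: Transaction) {
        for hash in tx.pruned {
            self.key_lookup.remove(&hash);
            self.headers.remove(&hash);
            self.header_cache.remove(&hash);
        }
    }
}
```

In this model, a header read between prune_header and commit still succeeds and repopulates the cache, but commit clears that entry again, so a later lookup of a pruned header returns None. If the cache were instead invalidated when the removal is staged, the intermediate read would leave a cached header that the "database" no longer contains, which resembles the symptom described for prune_blocks_on_finalize_and_reorg.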

I would appreciate comments.

Part of the discussion continues here: Change forks pruning algorithm. by shamil-gadelshin · Pull Request #3962 · paritytech/polkadot-sdk · GitHub