Referendum #234: Data availability, retroactive funding and on-chain invoices


Please consider adding contextual information about what this referendum is about so that the community is aware and can vote accordingly. - Anaelle - Parity, Polkassembly

One of the aims of Referendum #234 is to raise awareness about the off-chain nature of proposal data and the downstream dependencies and centralisation created when apps such as Nova and Subsquare read title and description from Polkassembly, rather than directly from the chain state.

In this regard it has been successful, as Polkassembly will soon read system remarks and add this info to proposal descriptions - a first step that can be expanded further as detailed later in the post.

It was also a chance to test out a simple version of a fully on-chain invoice that can help provide longer term assurances for contributing teams in terms of data availability.

The associated invoice relates to retroactive payment for 12 months of common good research and development in the ecosystem, of which this post constitutes an ongoing part.

Existing proposals

Currently, individuals and teams submit proposals to the common good treasuries, beginning with a discussion post that is hosted in a governance UI (a publishing platform) such as Polkassembly or Subsquare.

The post serves as a marketing document for an idea, concept or early stage project.

Those logged in via email or a Web3 address can share their opinions on the proposal. This feedback, alongside comments received on shows such as Attempts At Governance, constitutes an off-chain discussion phase, since the data is stored in Polkassembly’s databases, where it is then read by other ecosystem apps.

Once a team has taken feedback on board and feels confident in their proposal, they move to the on-chain phase, constructing extrinsics in OpenGov to request funds for work yet to be done.

Since 2020, the Kusama and Polkadot treasuries have spent c. $50m on ecosystem development.

Kusama treasury annual spending analysis - 2020 > 2023 (june)
Polkadot Treasury annual spending in USD & DOT | 2020 - 2023 June

Retroactive payments

Due to concerns around spending, the culture in Kusama has shifted, through both off-chain social convention and on-chain voting, towards teams submitting requests for retroactive payments.

Unlike funding for work yet to exist, retroactive payments relate to work already completed or nearing completion.

As a result, retroactive payments don’t rely on a convincing pitch, nor on discussion, since the value delivered can be objectively assessed on its own merits.

People either appreciate the work, or they don’t, a distinction that aligns with the binary outcomes of referendums which require a simple majority to pass.

A further benefit of retroactive payments is they can be more easily standardised into fully on-chain forms.

Data assurances

Perhaps the primary value proposition of a blockchain is to offer long term assurances as to the availability, consistency and provenance of data.

In the case of individuals and organisations requesting payments from a treasury, there is huge value in knowing that payments, and the associated context, are preserved for the foreseeable future for legal and accounting reasons.

Although it is possible to pin data to IPFS, it is preferable to contain relevant data within the confines of the sovereign state to which the payment relates, rather than creating a secondary dependency.

System remarks

The vast majority of Substrate’s basic super-powers remain underused and unexplored.

One such super-power is system remarks - a tool better known in the ecosystem as the hack that launched RMRK.

Remarks are like notes, like graffiti on the blocks. The information is not stored in the chain’s trie, but along blocks as input. Remarks are no-effect extrinsics (external inputs), which means they do not alter the chain’s storage, but are stored on the hard drive of the nodes alongside block records.

Two days ago I submitted a preimage via a single system remark function using the following calls:

I included the following text, which was uploaded as .txt and then encoded into hex before being graffitied onto the chain, where it will exist as a permanent record.

Date: 12 July 2023
Proposal to: Kusama common good fund
Summary: 12 months independent research and analysis
Term: 1st August 2022 → 31st July 2023   
Type: Applied 
Structure: Retroactive
Approval: Simple Majority
Comparable: Web3 foundation researcher salary
Cost: £100k 
VAT: £20k
Total: £120k
Total USD: $130k 
Price: $23.89 (2023-07-12 16:01:06 (+UTC), Block #18758035)
Total KSM: 5,441.461
Organisation: Decent Partners
Structure: Hybrid  
UK LTD company #10153409
VAT number: 931912824
Project Partner(s) - Richard Welsh
Onchain: jXC7ghviUzVcVySg3eD8prB7C9m6VzK6cw2MTyMTDTUye5q
Conflicts of interest: none
Governance: 1p1v
Use of funds:
- Working capital
- Recoupable loans
- Decision deposits
- Voting
Voting policy: 
Aye - directions
Nay - domains 
Abstain - real or perceived conflict of interest
Donations: FvrbaMus8iASyrQYkajQWDxsYvG5gb72PFPuvy8TvkFFVGn 
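The encoding step described above is a simple round trip: the invoice text is UTF-8 encoded, then hex-encoded for the remark payload, and any UI can decode it back. A minimal sketch of that round trip (the function names are mine for illustration, not part of any actual tooling):

```python
# Minimal sketch of hex-encoding invoice text for a system.remark
# payload, and decoding it back. Illustrative only - not the actual
# tooling used for the referendum.
def encode_remark(text: str) -> str:
    """Hex-encode UTF-8 text, 0x-prefixed, for a remark payload."""
    return "0x" + text.encode("utf-8").hex()

def decode_remark(payload: str) -> str:
    """Recover the original text from a hex-encoded remark."""
    return bytes.fromhex(payload.removeprefix("0x")).decode("utf-8")

invoice = "Date: 12 July 2023\nProposal to: Kusama common good fund"
payload = encode_remark(invoice)
assert decode_remark(payload) == invoice
```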

Since it was asked elsewhere, the KPIs link points to an aggregate overview of the work to date, rather than a single defining metric. On their own, each of these metrics can be gamed, but they do at least constitute concrete and targeted outcomes, since a key objective of independent research is to challenge orthodoxy.

On-chain invoices

Since it is impossible (afaik) to preserve text formatting when encoding/decoding to/from hex, we can allow off-chain UIs to interpret a batch of system remarks to populate a standardised invoicing table and framework that exists entirely on-chain.
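One way an off-chain UI could interpret such a batch is to treat each "Key: Value" line of a remark as a field of the invoice table. A rough sketch under that assumption (the parsing convention is illustrative, not a finalised standard):

```python
# Illustrative parser: read "Key: Value" remark lines into a dict
# that a UI could render as a standardised invoice table.
# The key/value convention is an assumption, not an agreed schema.
def parse_invoice(remark_text: str) -> dict:
    fields = {}
    for line in remark_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

table = parse_invoice("Date: 12 July 2023\nTotal KSM: 5,441.461")
assert table["Total KSM"] == "5,441.461"
```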

To look further ahead, should we begin to identify theses for what might constitute useful on-chain metrics, or batches of on-chain metrics, we can move towards funding requests that offer performance related bonuses and fees to effective ecosystem contributors.


Let’s NOT put strings on a blockchain. That’s NOT what blockchains are for.

There’s an easy way to address this which Centrifuge is working on much more generally: push content to IPFS and link it on chain. This is decentralized, adds minimal onchain data and Centrifuge Proposals have been doing this with system.remark for a long time. Here’s an example of a proposal with utility.batch that includes an onchain action and a system.remark pointing to an IPFS hash: Referenda 31

We currently just put a simple proposal text on IPFS, but ideally we’d come up with some well structured JSON file that could then be parsed and displayed in any tool such as Subscan, Subsquare or even your wallet. We are thinking of improving on this by building an extension for Substrate to automatically pin files (i.e. storing a local copy of the data behind an IPFS hash) that are referenced in democracy proposals. You could possibly extend this to other places - for example, automatically pinning all NFT metadata for any NFT minted.
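A minimal sketch of what such a structured file might look like, reusing fields from the invoice earlier in the thread - every key name here is hypothetical, not an agreed schema:

```json
{
  "version": "0.1",
  "title": "12 months independent research and analysis",
  "organisation": "Decent Partners",
  "beneficiary": "jXC7ghviUzVcVySg3eD8prB7C9m6VzK6cw2MTyMTDTUye5q",
  "structure": "retroactive",
  "total": { "amount": "5441.461", "currency": "KSM" },
  "body": "ipfs://<CID of the full proposal text>"
}
```

Any tool that understands the schema could then render the same invoice table, whether the file lives on IPFS or inside a remark.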


This seems fairly subjective.

Per the above, if blockchains are useful for anything it’s preserving data, and the methods used to ensure that data is preserved are myriad. Are Ordinals not what Bitcoin is for?

In the end what matters is what people end up using, not some dictatorial approach that declares some nascent tech better than some other nascent tech.

Could you expand on this a little please?

Following on from Lucas’ thoughts, I think Rich does have a point when he warns about the danger of a secondary dependency. Crust would be an option, as we used it for contextual information on child bounties back when the infrastructure maintenance bounty was up and running, but what happens if Crust stops one day? I think IPFS is fine without a chain, as Lucas mentioned once: “with IPFS, you get integrity. And it’s such a small amount of data that it’s easy to just store it and not worry”. But coming back to the issue of a secondary dependency: is a model like IPFS usage sustainable in the long term?

I like the idea of a JSON file that can be parsed and displayed in any tool, but worry about the minimal vision of the relay chain. If we use the relay chain, what’s the usage limit? Rich’s proposal is not in harmony with Polkadot’s minimal relay chain architecture vision (this vision could change in the future, but with the recent discussions on system parachains, moving governance to a common good chain, etc., I doubt the minimal vision has changed).

I would see this as a prompt rather than the end solution - we could use Statemine/t. I am the least technically capable here but the issue is real and obvious and should have a simple enough solution that does not introduce secondary dependencies.

If all functions are to move from the relay into system chains then the question to ask is one about persistence - where do we store the most critical long term info?


I think ideally, we would do this on a system parachain: as a temporary solution, system.remark might work, but also IPFS could do the trick temporarily. I need to check if Statemine/t is the right choice and what the process would be for submitters to execute this.


Subsquare is now reading system remarks from Kusama, where you can see the Referendum 234 invoice.


I’m gonna play devil’s advocate here - @RTTI-5220 @lucasvo, right now there is no issue with storage on substrate chains… in fact the counter is true, we have masses of empty blockspace, everywhere.

How many invoices might be submitted on-chain each year? 100? Maybe 250? 1000?

Will this approach really ‘clog’ the chain?

And if it does, well, maybe we should celebrate the over-use of some particular feature as a positive sign, rather than continuing the current culture, which seems intent on over-building and expanding for some future demand to satisfy the number-go-up culture of announcements and hype.

So sure, we can use JSON to structure data and append that to a system remark, or figure out some specific system chain for storage, but in the meantime, does this simple hack fulfil the most important quality of solving the problem at hand? Yes.

Technology doesn’t grow by continuing to extend and expand features without some product-market fit - more often the inverse is true: you remove features… or, as Paul Graham wisely advises, do things that don’t scale.

Over-engaging with early users is not just a permissible technique for getting growth rolling. For most successful startups it’s a necessary part of the feedback loop that makes the product good… And except in domains with big penalties for making mistakes, it’s often better not to aim for perfection initially. In software, especially, it usually works best to get something in front of users as soon as it has a quantum of utility, and then see what they do with it.

Polkassembly will be pushing their approach to reading system remarks today.

I would advise against a system parachain. Why? Other parachains don’t have system parachains to offload their governance data to. It’s also something that can’t be extended easily then.

IPFS has a big base of support - anything from client libraries to both Brave and Opera supporting it natively. We only need to add a type to substrate that’s a dedicated IPFS hash and immediately anyone will be able to trustlessly fetch offchain data in any frontend. This will scale to any parachain and only adds ~40 bytes of data to any proposal.
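On the size point: a CIDv0 hash is a sha2-256 multihash - a two-byte prefix (hash function code 0x12, digest length 0x20) followed by the 32-byte digest, i.e. 34 bytes, in line with the ~40 bytes figure. A simplified sketch (real IPFS hashes the chunked, DAG-encoded block, so this digest will not match an actual CID for the same content, but the on-chain storage cost is the same):

```python
# Simplified sketch of a CIDv0-style multihash, to show the on-chain
# storage cost: 2 prefix bytes + 32 digest bytes = 34 bytes.
# Note: real IPFS hashes the DAG-encoded block, not the raw bytes,
# so this digest will not match an actual CID for the same content.
import hashlib

def sha256_multihash(content: bytes) -> bytes:
    digest = hashlib.sha256(content).digest()
    return bytes([0x12, 0x20]) + digest  # sha2-256 code, 32-byte length

mh = sha256_multihash(b"full proposal text")
assert len(mh) == 34
```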

Every snapshot voting project on Ethereum uses IPFS as storage, MakerDAO uses it, and OpenSquare makes data available on IPFS. It has critical mass. There are no dependencies to introduce it, and OpenSquare/Subsquare are surely happier to do this than to load directly from on-chain remarks.

I would much rather have a solution that doesn’t rely on a system parachain. I think the long-term plan is for governance itself to maybe move to a system parachain, at which point the data issue isn’t a problem for the relay chain anymore and the proposed IPFS solution would seamlessly transition over.

Job to be done: storing and persisting important data within the sovereign state of the network in a simple and secure way.

Here are the issues with relying on IPFS:

1. Persistence

  • IPFS is not in the business of persisting data, but of addressing it.
  • IPFS isn’t so much about storage as it is about distribution.
  • IPFS does not guarantee persistence, just that the data will be immutable via a Content ID hash.
  • If you want persistence, then you need to pin IPFS data, or it can and will disappear.
  • Files put into IPFS will eventually age out of the system over time unless they are either frequently accessed or “pinned” permanently into one or more IPFS servers, which has a cost…

2. Value accrual

  • If you then decide to pin data, then you need to pay fees… enter Filecoin or Arweave.
  • If you pay fees to a third party network, that value accrues outside of the local economic arena.
  • Then you get into questions like ‘who pays these fees’?
  • Maybe the network agrees to pay some small fees in the short term, but as this scales, maybe people change their mind, maybe a whale wants to prove a point and cuts off this route via governance.
  • Suddenly data on IPFS starts disappearing. Not everything, but random things.

3. Network effects

This is true and yet none of these network effects accrue to Polkadot / Kusama or the local economy.

This is a perfect example of the kind of thinking that reinforces the status of the incumbent networks (Ethereum) rather than building your own network effects.

Polkassembly now reads System Remarks and populates the description. Thanks @Jas and team.

Still working out the title, which needs to be read from either a separate remark or structured data in a JSON file.

I’d like to point out that system remarks also don’t solve the problem, as the archive of Kusama/Polkadot suffers from exactly the same problems as IPFS.

Nobody in the blockchain ecosystem has yet figured out a proper solution to this problem.


That’s fair - so it’s important when we say IPFS good, System Remark bad, we are clear on which part of the problem we are talking about.

System remarks do at least make it a local problem, rather than a global one.

The only solution I can see is if we encourage personal servers within a bounded collective. We had some of this debate in the back and forth in Rethinking The Road to Decentralisation, and then in Data Storage and Availability.

A core reason for pushing proposals this way is to press on the problem so we can work on ideas / approaches.

I think IPFS doesn’t make it worse than remarks but has several benefits that would be hard to address with a blockchain and does it in a much more scalable way. I think we should move away from people just linking a google doc in their proposal which is what a lot of people do on their polkassembly posts. If people just make a small remark with a google doc link we’ve not accomplished anything. IPFS would allow us to verify the integrity and have easy trustless distribution of a 1MB proposal vs. a few kb which we could realistically do with a remark.

As for pinning/availability: Centrifuge is working on a plugin to Substrate that can autopin IPFS hashes it detects on chain so any full-node operator is encouraged to help with data-availability. This is how I think we can solve it (there still won’t be any guarantees on availability) but as Pierre pointed out, that’s not the case with remarks either.

Definitely, I think we all agree.

So why can’t we hash and address “natively” and require the costs to be charged by node operators in KSM/DOT? Or at least a local network?

Decred do exactly this for their gov/proposal system, Politeia. They are by far the most structured when it comes to data immutability, availability and persistence around proposals.

From their docs:

All proposals, comments, and votes are anchored on the Decred blockchain using dcrtime and stored in a public git repository. See the Navigating Politeia Data page for instructions on accessing and interpreting Politeia data.

They also utilise the notion of “transparent censorship”:

Politeia is built around the concept of transparent censorship , using dcrtime. Users cannot be silently censored; they can prove that censorship has occurred. When a user registers, a cryptographic identity (pub/priv key pair) is created. This cryptographic identity is then used to create a “censorship token” for each user submission (proposal, comment, comment upvote/downvote). If a user is censored, these tokens can be used to prove that a specific submission was submitted, the time it was submitted, and the exact form of the submission. This cryptographic identity is stored in the user’s browser by default, but can be exported and re-imported at any time.
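The dcrtime mechanism quoted above is, at its core, proof of existence: only a digest of each submission needs to be anchored on-chain, and anyone holding the original bytes can later prove exactly what was submitted and when. A minimal sketch of that idea (not Decred’s actual implementation):

```python
# Minimal proof-of-existence sketch: anchor only a digest of the
# submission (e.g. in a system remark); later, the original bytes
# can be checked against it. Not dcrtime itself - just the core idea.
import hashlib

def anchor(submission: bytes) -> str:
    """Digest that would be committed on-chain."""
    return hashlib.sha256(submission).hexdigest()

def verify(submission: bytes, anchored: str) -> bool:
    """Prove a claimed original matches the anchored digest."""
    return hashlib.sha256(submission).hexdigest() == anchored

token = anchor(b"proposal v1")
assert verify(b"proposal v1", token)
assert not verify(b"proposal v2", token)
```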


Discussing a standard JSON structure with @Jas for Polkassembly inviting him to comment to the thread too.

Also note that Polkadot/Kusama use the same DHT implementation as IPFS, and implement the bitswap protocol that IPFS uses. Basically, Polkadot/Kusama are more or less supersets of IPFS. The only thing that’s missing is ways to easily interact/access these capabilities.

The integration of IPFS within Substrate is one of these many things that was started but never really finished, because the 10 people who have the technical knowledge to actually finish it have an enormous TODO list.

I knew that Polkadot was loosely based on libp2p and that making it also an IPFS node should be a small step. Thanks for confirming that.

That’s very helpful to know - we’re working on a spec and will perhaps nudge you on giving some feedback.

I will also reach out to other governance projects, UIs etc. to discuss standardizing on the actual format of the offchain data.

Incentive systems make it extremely complex - we end up with the same challenges that filecoin has had 100+ people work on for years… Full nodes should be expected to make this data available and people paying for full nodes are implicitly paying for this data availability the same way they are paying for archival data and other full node services.

To be clear, I’m not proposing to move voting off chain - because that does get messy and we’ll have issues like censorship. We don’t have those with just proposal metadata as we know of at least one person incentivized to make this data available on IPFS: the proposer who wants this proposal to be passed and thus have the data be available. What Decred is doing is working on offchain voting/private voting from what I understand which is a different beast and should be out of scope for this discussion.

So on the pinning / value accrual / network effects question, the fact remains that the IPFS “service” is situated outside the local economic bound of KSM/DOT and usage does not accrue direct benefits to the core.

This is a trade off but my own view is all decisions should be seen through this lens - as they each compound and concentrate value into the common good. Every decision that doesn’t follow this strategy dilutes this.

It’s the use of DCRtime which is the key here - aka proof of existence aka hashing/addressing relevant data.

All proposals, comments, and votes are anchored on the Decred blockchain using dcrtime and stored in a public git repository. See the Navigating Politeia Data page for instructions on accessing and interpreting Politeia data.