Governance discussion data decentralization

Apologies for cross-posting: I posted this on subsquare as well, but I want to draw more attention and invite more community members to review and discuss it, so I am posting it here too.

We are the OpenSquare team. We currently focus on building infrastructure for the dotsama ecosystem, and we have developed and maintain products including subsquare, dotreasury, statescan, etc. We are proposing a specification to decentralize dotsama off-chain governance discussion data.


In the current governance workflow, community users provide proposal context and hold discussions on centralized platforms like Polkassembly, Subsquare, etc. Although these platforms usually have public APIs through which community members or other platforms can sync the data, several problems remain.

  1. We cannot verify users’ data. This is a common problem in web2 environments. These platforms may have no motivation to tamper with the data, but we need more verification and less trust, so that we can be sure the data belongs to its owner.
  2. Users’ data may be lost through a platform’s operational mistake, or a platform may simply stop operating. We need a way to keep the data always accessible.
  3. Data should be more auditable. Currently, platforms don’t save all data modifications. A user can modify previously posted context, and the website will only show that the context was updated, so other users may not be able to see the old data, which may include the proposal owner’s commitments.
  4. The more users’ data a single centralized governance platform owns, the less likely it is that another platform can offer better solutions. We need a relatively decentralized data hosting solution that does not rely on a single platform or a dedicated group of people.
  5. Differences between platforms’ API data formats make it hard to sync and adapt data from other platforms.

We propose the SIMA spec to solve the above problems. SIMA defines a set of user actions and data standards to decentralize off-chain governance data for Substrate-based blockchains. In general, with this spec governance users sign their action data with their Polkadot keys and submit the signed data to spec implementers. Spec implementers are then responsible for submitting the IPFS CID of the user action data to the blockchain with a system#remark extrinsic.

It’s still in draft status, and we would greatly appreciate any suggestions. We will begin development once the community reaches some degree of consensus on its feasibility. Please check the full spec.

Overall agree with the problems, and hope Substrate can lay the foundations on which to solve them, as I know it’s a much larger problem than just Polkadot governance!

I wanted to make sure I understand the process correctly:

  • Is every user action “off-chain” actually mapped one-to-one with a remark? If so, what other than the data is “off-chain” about it? It seems a pallet to manage these interactions on chain could be made more expressive, performant, and overall more useful IMHO.
  • All interactions and JSON objects are stand-alone, and thus nothing prevents publishing of possibly masking / misleading statements, like amending a post that did not previously exist, for example… correct? For example, referencing a CID that is junk (intentionally) or not available (perhaps even lost forever to IPFS nodes). I don’t see protocols in this spec to address validity of order of operations and referencing IPFS data that may be junk.

What other work has been done in this area, in the Polkadot ecosystem as well as beyond, that you are considering or being inspired by?

@NukeManDan Thank you for your time.

Is every user action “off-chain” actually mapped one-to-one with a remark?

No. Every user action is represented by a JSON object, mapped one-to-one with a CID. A remark extrinsic carries a CID that maps to a JSON object containing an array of user action CIDs.
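To make the two-level mapping concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `content_id` uses a SHA-256 hex digest as a stand-in for a real IPFS CID, and the field names (`action`, `content`, `target`, `timestamp`, `actions`) are hypothetical, not taken from the spec.

```python
import hashlib
import json


def content_id(obj: dict) -> str:
    """Stand-in for an IPFS CID: SHA-256 over canonical JSON (NOT a real CID)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_batch(actions: list) -> tuple:
    """Return the batch object and the single identifier a remark would carry."""
    batch = {"actions": [content_id(a) for a in actions]}
    return batch, content_id(batch)


# Hypothetical user actions; the real field names are defined by the spec.
actions = [
    {"action": "comment", "content": "I support this.", "timestamp": 1700000000},
    {"action": "upvote", "target": "post-1", "timestamp": 1700000060},
]
batch, batch_id = build_batch(actions)
print(len(batch["actions"]), "action ids sit behind one remark id")
```

However many actions are grouped, the remark only has to carry `batch_id`.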

All interactions and JSON objects are stand-alone, and thus nothing prevents publishing of possibly masking / misleading statements, like amending a post that did not previously exist, for example… correct?

  • A discussion post cannot be amended, but the author can append extra content, which will be highlighted.
  • For proposal context, proposal authors can override the previously provided context with new actions.
  • Comments cannot be amended. Users can only leave new comments to explain previous ones.
  • I think the spec is missing actions to cancel an upvote/downvote.
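The rules above can be read as a replay over an append-only action log. The following Python sketch is one possible reading, not the spec's data model: the action names (`post`, `append`, `set_context`, `comment`) and the field names are hypothetical.

```python
def replay(actions: list) -> dict:
    """Fold an action log into the view a platform might render."""
    state = {"post": None, "appendices": [], "context_history": [], "comments": []}
    # Python's sort is stable, so actions with equal timestamps keep log order.
    for a in sorted(actions, key=lambda a: a["timestamp"]):
        kind = a["action"]
        if kind == "post":
            if state["post"] is None:  # the original post is never amended
                state["post"] = {"author": a["author"], "content": a["content"]}
        elif kind == "append":
            # Only the post's author may append extra (highlighted) content.
            if state["post"] and a["author"] == state["post"]["author"]:
                state["appendices"].append(a["content"])
        elif kind == "set_context":
            # An override is recorded as a new version; old versions stay visible.
            state["context_history"].append(a["content"])
        elif kind == "comment":
            state["comments"].append(a["content"])  # comments are append-only
    return state


log = [
    {"action": "post", "author": "alice", "content": "Original text", "timestamp": 1},
    {"action": "post", "author": "alice", "content": "Rewritten text", "timestamp": 2},
    {"action": "append", "author": "alice", "content": "Extra detail", "timestamp": 3},
    {"action": "append", "author": "bob", "content": "Not the author", "timestamp": 4},
    {"action": "set_context", "author": "alice", "content": "v1", "timestamp": 5},
    {"action": "set_context", "author": "alice", "content": "v2", "timestamp": 6},
]
state = replay(log)
```

Because every version of the context is kept in `context_history`, earlier commitments by the proposal author remain auditable even after an override.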

For example referencing a CID that is junk (intentionally) or not available (perhaps even lost forever to IPFS nodes).

For the very first implementation, I don’t want to make it mandatory for spec implementers to upload user action data to arweave, crust, etc., but I believe decentralized storage should be the way to address data loss. Subsquare, as a potential spec implementer, will upload the data to multiple public IPFS hosting services.

I don’t see protocols in this spec to address validity of order of operations and referencing IPFS data that may be junk.

Every user action contains a timestamp field, and spec implementers should not accept a user submission whose timestamp is far from the current time. Yes, there may be malicious spec implementers; each spec implementer can choose which other implementers to trust.
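A minimal Python sketch of that timestamp check, assuming Unix timestamps in seconds. The 300-second tolerance and the function name are assumptions chosen for illustration, not values from the spec.

```python
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # assumed tolerance, not taken from the spec


def timestamp_acceptable(action_ts: float,
                         now: Optional[float] = None,
                         max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """Accept a submission only if its timestamp is close to the current time."""
    if now is None:
        now = time.time()
    # Reject timestamps too far in the past and too far in the future.
    return abs(now - action_ts) <= max_skew


# Fixed "now" so the example is deterministic.
print(timestamp_acceptable(1_700_000_000, now=1_700_000_100))  # within tolerance
print(timestamp_acceptable(1_700_000_000, now=1_700_010_000))  # too old
```

This only bounds how far a submission can be backdated or postdated. It does not by itself establish an ordering between two actions that fall within the tolerance window.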

What other work has been done in this area, in the Polkadot ecosystem as well as beyond, that you are considering or being inspired by?

I was inspired simply by the problems we encountered while developing subsquare, dotreasury, and off-chain voting.
It’s impossible for us to find all related work in this area. We studied Uniswap’s governance portal. I think they upload the full proposal text to the Ethereum blockchain, which I think is too expensive and not really necessary.

I think this is a step in the right direction, thanks for preparing this spec proposal.

My thoughts on the current proposal
One problem that the document tries to solve is providing authenticity of messages on governance platforms. Signing the action, context, change and timestamp with the key of the author already helps to provide authenticity that is verifiable in the future (i.e. some action really was performed by some user in a given context at a given time). This is already a great improvement compared to the current situation, as governance platforms could currently easily modify the contents of their page without anyone being able to verify it. Having a valid signature, the raw data and another external service to verify the data is already a huge benefit from my perspective.

Another big feature that the spec provides is the unification of all governance platforms: Any governance platform would be able to utilize the data from other governance platforms, thus creating a unified governance database. New governance platforms could easily catch up with all the data and even participate in adding new data. I think that utilizing identities can help here immensely to verify the authenticity of the provider. I think that using system.remark is a good generalized solution, as it is available on any Substrate chain as of now. In addition, data usually is not deleted, so there should be barely any storage benefit when providing a specialized pallet for that. I like the git-like approach of providing deltas, so the complete history of changes can be replicated and interpreted by different software products.

My improvement proposal
What is the purpose of the blockchain here?
Since the governance data is stored in IPFS and all of it can be verified (it contains all necessary metadata and a signature to prove authenticity and reconstruct the correct temporal order of changes), why do we need to store all the CIDs on the blockchain at all? Instead, we could create a public IPFS cluster that utilizes the blockchain ONLY to retrieve authorized keys that can push data into that cluster. That way, any governance platform could apply to be listed as a verified governance platform and have their keys added by the governance body into the “verified IPFS cluster” set on the blockchain. The IPFS cluster retrieves the keys authorized to add data from the blockchain. This would immensely reduce the storage requirement on the chain, as all the actual data lives in IPFS, whereas only the set of keys authorized to add content to the IPFS cluster lives on the blockchain.

Instead, we could create a public IPFS cluster that utilizes the blockchain ONLY to retrieve authorized keys that can push data into that cluster.

I’m not an expert in setting up an IPFS cluster, but I have the following worries:

  1. We cannot assume the cluster is always stable, and I’d prefer redundant public IPFS gateways for stability and availability.
  2. I’d prefer the whole process to be open and permissionless, but having to apply to join the cluster seems unreasonable and may prevent a third party from becoming a spec implementer.
  3. I think these cluster nodes should be maintained by spec implementers. Even though only permissioned members can push data, it is still possible that one node pushes a huge amount of irrelevant data, which would make it hard for spec implementers to verify and recover data.

What is the purpose of the blockchain here?

Two kinds of blockchains are involved. One is Substrate chains like dotsama, and the other is storage chains like arweave. Both provide guarantees of stability and availability.

  • system#remark extrinsics at different block heights coordinate the process of recovering data.
  • The gas cost of remark extrinsics will be completely acceptable. I’ll give the details below.
  • The gas cost of remark extrinsics will deter attacks that flood the system with huge amounts of irrelevant data.
  • Arweave/crust chains will make it as likely as possible that data is stored forever.

This would immensely reduce the storage requirement on the chain, as all the actual data lives in IPFS, whereas only the set of keys authorized to add content to the IPFS cluster lives on the blockchain

In the current spec, the extrinsics and remark data that spec implementers submit are under control, and the maximum volume is acceptable.

  1. A spec implementer won’t submit every action CID; it submits only the CID of a group of actions. Check here for details.
  2. Spec implementers can control how often they submit remark extrinsics. For example, if a spec implementer submits 1 extrinsic every 5 minutes, at most 288 remark extrinsics will be submitted in one day.
  3. A spec implementer doesn’t have to submit an extrinsic if there is no user action data. This spec is designed for a forum-like governance discussion solution, not an IM product. Users’ action frequency is usually not that high, so I expect the number of remark extrinsics to be much lower than the calculated maximum.
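The upper bound in point 2 is simple arithmetic: 24 × 60 / 5 = 288. As a small Python sketch (the function name is just for illustration):

```python
import math

MINUTES_PER_DAY = 24 * 60


def max_remarks_per_day(interval_minutes: int) -> int:
    """Upper bound on remark extrinsics per day for one implementer
    that submits at most once per interval."""
    return math.ceil(MINUTES_PER_DAY / interval_minutes)


print(max_remarks_per_day(5))  # 288
```

Since an implementer skips intervals with no user actions, the real count would normally stay below this bound.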

Well written, but I have a question.

You state here that users’ data may be lost through a platform’s operational mistake, or a platform may just stop operating, and that we need a way to keep the data always accessible.

In what ways do you intend to keep users’ data safe and accessible at the same time?

  1. Users sign their data, which can then be verified, so a centralized platform cannot modify it undetected.
  2. We upload the data to IPFS, both to different IPFS nodes and to decentralized storage solutions like arweave or crust.
  3. We submit system#remark calls that record the IPFS CIDs of user data, so other platforms can scan them and recover all user data and actions.
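As a rough illustration of the recovery path in point 3, here is a Python sketch under loud assumptions: an in-memory dict stands in for IPFS, a SHA-256 hex digest stands in for a real CID, the field names are hypothetical, and signature verification (point 1) is left out because it needs a cryptography library.

```python
import hashlib
import json


def content_id(obj: dict) -> str:
    """Stand-in for an IPFS CID: SHA-256 over canonical JSON (NOT a real CID)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def recover(remark_ids: list, store: dict) -> list:
    """Rebuild user actions from the identifiers found in scanned remarks.

    Anything missing from the store, or whose content does not hash to its
    identifier, is skipped rather than trusted.
    """
    recovered = []
    for batch_id in remark_ids:
        batch = store.get(batch_id)
        if batch is None or content_id(batch) != batch_id:
            continue  # junk or unavailable batch reference
        for action_id in batch.get("actions", []):
            action = store.get(action_id)
            if action is not None and content_id(action) == action_id:
                recovered.append(action)
    return recovered


a1 = {"action": "comment", "content": "First", "timestamp": 1700000000}
a2 = {"action": "comment", "content": "Second", "timestamp": 1700000060}
batch = {"actions": [content_id(a1), content_id(a2)]}

# a2 was never pinned, and the second remark points at nothing.
store = {content_id(a1): a1, content_id(batch): batch}
remarks = [content_id(batch), "0" * 64]
recovered = recover(remarks, store)
```

In this toy run only `a1` is recovered. Content addressing lets a scanner detect junk or missing references, but it cannot bring back data that no node has kept, which is why redundant hosting matters.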

Check sima-spec for more details.

Nice write-up. I believe that the essence of web3 is decentralisation, so innovative ideas like this should be encouraged.