Updated Squidsway proposal, Nov/Dec 2025

I’m looking for feedback on this proposal.
Since the last post, I’ve focused more on what governance insights are and what they’re for,
and I’ve tried to make it clearer what kinds of data the tool will ingest.

So, community, please let me know … is the proposal clear?
Can you understand from this what the tool does?
Why we need governance insights?
Why it’s necessary for the project and tool to be agile and ongoing?

Squidsway Governance Report and Tool

Actionable governance insights, from a rich data chain indexer

GOVERNANCE FAILURES ARE A TREASURY ISSUE.

SQUIDSWAY WILL SOLVE THOSE FAILURES FASTER.

I want to improve Polkadot governance because I’m a cypherpunk and I think Polkadot can lead the world, not just in governance of blockchains, but in blockchain-based governance of the offchain world.
Governance is a product on Polkadot, it’s a field we are leading in, and we should invest in growing the lead we have - make it something to showcase.

But you, dear tokenholder, should fund improving Polkadot governance because

GOVERNANCE FAILURES ARE A TREASURY ISSUE

We are iterating our processes based on assumptions, hunches and louder voices, instead of evidence.
That wastes time and costs money.

The alternative to iterating based on vibes is data.
Squidsway is a proposal to collect and compile specific bespoke data, targeted at objectively assessing how OpenGov users respond to everything we do in OpenGov - and to generate insights from these assessments, in order to inform how we continue to iterate OpenGov.

The proposal for Squidsway funds two things:
The Squidsway tool:

A chain indexer with rich data ingestion modules,
for testing and quickly iterating hypotheses and generating actionable insights about user behaviour.

The Squidsway project:

Publishing governance insight reports roughly every 3 months (and on shorter timescales case-by-case).
Continually adding modules to the tool, to support further investigation through it and to generate insights.

The tool will be open source, for any dev (eg, ecosystem product teams) to use, and future work includes an LLM-based frontend for non-devs to query it.
The project will be funded by the community on an ongoing basis, so will be focused on live, open questions that the community is discussing at any given time. There will be a mechanism for the community to request data on issues of interest.
This proposal funds only the first three months. If the community likes what it sees, then subsequent proposals will fund ongoing work.

Deliverables

This first proposal is for $8k USDC, to fund 80 (= 40 + 40) hours over around 3 months,
covering the development of an MVP, followed by the first half of the validation phase.

At the end of the work funded by this proposal, the tool should consist of:

modules to:
- ingest relevant governance events from chain data
- ingest structured/quantitative offchain data (e.g. from Polkassembly)
- curate data (using queries to assign tags, e.g. “whale”, “shrimp”)

and
- an indexer capable of reindexing based on these types of data.
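
To make the module/indexer split a bit more concrete, here’s a rough sketch in TypeScript (the tool builds on an SQD-style indexer, which is TypeScript-based) of what the interfaces could look like. Everything here is a hypothetical placeholder, not a committed API:

```typescript
// Hypothetical module interfaces for the Squidsway MVP - a sketch, not a committed API.

// A raw record produced by any ingestion module (onchain event or offchain item).
interface IngestedRecord {
  source: "chain" | "polkassembly";  // where the record came from
  block?: number;                    // block height, if onchain
  account?: string;                  // account involved, if any
  kind: string;                      // e.g. "vote", "delegation", "comment"
  payload: Record<string, unknown>;  // source-specific fields
}

// An ingestion module turns some external source into IngestedRecords.
interface IngestionModule {
  name: string;
  ingest(fromBlock: number, toBlock: number): Promise<IngestedRecord[]>;
}

// Placeholder for whatever store the indexer writes into (Postgres, SQLite, ...).
interface IndexedStore {
  query<T>(sql: string, params?: unknown[]): Promise<T[]>;
}

// A tag assigned by curation; block-scoped because e.g. balances change over time.
interface Tag {
  account: string;
  label: string;      // "whale", "shrimp", ...
  asOfBlock: number;
}

// A curation module runs queries over already-indexed data and assigns tags.
interface CurationModule {
  name: string;
  curate(db: IndexedStore): Promise<Tag[]>;
}
```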

The second proposal would fund the second half of the validation phase.
By the end of that work, I intend that the tool will be ingesting qualitative (natural language) data, and its outputs should begin to demonstrate what is possible with the tool. I should also have some basic benchmarking to flag up any feasibility questions and potential non-labour costs for the future.

Methodology

The methodology is intended to be very, very agile.
The idea of generating insights is to tell us something we didn’t know, rather than setting out to prove or disprove a pre-defined set of hypotheses.
Central to that is the ability to, in investigative terms, ‘pull on threads’ - or, in software terms, to ‘rapidly iterate’. This means that, for each sprint/each proposal, the treasury will be funding something whose exact shape is not known in advance.

This agile way of working is necessary because:

1 - We need to go where the evidence takes us
2 - It’s likely that many of the small technical steps that would make up a milestone can only be identified once a previous step is complete, so identifying and costing out these small technical steps in advance would either lead to wasted labour or force investigations down an inflexible path.

The fact that, in the base case of Squidsway funding referenda, the treasury will be funding something unknown should be mitigated by the ongoing nature of the project, and by the fact that each ‘milestone’ (ie funding period) is a small amount.

What kind of user behaviour are we trying to encourage?

Defining and encouraging the desired outcomes is a question for OpenGov or for the teams making use of Squidsway.
Squidsway is not the part which incentivises or encourages user behaviours
– it’s the part which identifies where the opportunities are to do that.

‘user behaviour’:

To illustrate the meaning, though: ‘user behaviours’ will generally be (individual or aggregated) measurable actions that can be taken onchain in the Polkadot ecosystem, such as voting, staking or liquidity provision, but likely in more specific detail than this, such as “voting by pre-existing wallets that never voted before”.

‘encourage’:
Already, we seek to change user behaviours all the time - incentivising adoption and liquidity, using social norms to encourage delegation and voting, working on UX to reduce friction and using (some pretty blunt) technical instruments to encourage the adoption of procedures and norms for proposer and delegate behaviours.
The mechanisms we are using - game theory, finely targeted incentivisation and the like - are powerful but we are often applying them amateurishly, iterating our processes and mechanisms based on just guessing what works.

The idea behind Squidsway is that we encourage these kinds of user behaviours by more empirical (ie more reliable) means.

WTF is ‘rich data’ / ‘chain indexer’?

A chain indexer is a tool that indexes and stores, in greater or lesser detail, a blockchain’s data. Most relevant data in a blockchain (even data as basic as account balances) is not accessible unless you consult a node or an indexer. The RPCs under the hood of polkadot.js or your wallet software connect to full nodes, but data applications like most block explorers, or data dashboards, use chain indexers on their backend.

Applications that process blockchain data usually index and store the information which is easiest to obtain, and when they want to combine these different data sources (such as comparing the voting frequency of wallets against those wallets’ balances), they combine already-indexed datasets. This is faster, but limits the complexity of the combination.
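
As a toy illustration of that ‘combine already-indexed datasets’ pattern, the voting-frequency-vs-balances comparison is basically just a join over two tables that were indexed separately. The table names, column names and store interface below are made up:

```typescript
// Toy example of combining two already-indexed datasets after the fact.
// Table and column names are made up; any SQL-backed indexer would look similar.

interface Store {
  query<T>(sql: string): Promise<T[]>;
}

const votingFrequencyVsBalance = `
  SELECT v.account,
         COUNT(*)    AS votes_cast,
         MAX(b.free) AS latest_free_balance
  FROM   votes v
  JOIN   balances b ON b.account = v.account
  GROUP  BY v.account
`;

async function compare(db: Store) {
  return db.query<{ account: string; votes_cast: number; latest_free_balance: string }>(
    votingFrequencyVsBalance
  );
}
```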
More complex data applications, such as Chainalysis’s, perform some degree of multi-step indexing, allowing them to retrieve additional data during indexing and treat their datasets as graph data (meaning the indexer can follow trails at index time).
The Squidsway tool takes this a couple of steps further with what I’ll call ‘compiled’ and ‘curated’ data.

compiled data is just data that has been indexed and combined through multi-step context-aware indexing. It could be, for example, “average conviction” for each account (across the accounts’ lifetimes), or “voted on with higher/ lower/ usual conviction” for each proposal.
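
To give a feel for what ‘compiling’ means in practice, here’s a minimal sketch of building up “average conviction per account” while indexing, rather than joining tables afterwards. The event shape and handler hook are hypothetical stand-ins, not the real indexer API:

```typescript
// Sketch: compiling "average conviction per account" at index time.
// Event shape and handler are hypothetical stand-ins for the real indexer.

interface VoteEvent {
  account: string;
  conviction: number; // conviction multiplier bucket, e.g. 0..6
}

interface ConvictionStats {
  votes: number;
  avgConviction: number;
}

const stats = new Map<string, ConvictionStats>();

// Called once per vote event as blocks are processed, so the compiled value
// is built up with full context rather than joined together after the fact.
function onVote(ev: VoteEvent): void {
  const prev = stats.get(ev.account) ?? { votes: 0, avgConviction: 0 };
  const votes = prev.votes + 1;
  const avgConviction = prev.avgConviction + (ev.conviction - prev.avgConviction) / votes;
  stats.set(ev.account, { votes, avgConviction });
}

// After a run, `stats` can itself be written back as a new dataset, ready to be
// compared against (e.g.) each proposal's conviction profile on the next pass.
```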

curated data uses tags for fast reindexing of commonly used conclusions - for example, accounts could be tagged with categories from “whale” to “shrimp” despite the fact that account balances change over time.
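
A minimal sketch of what one curation step could look like, assuming balances have already been indexed per account and per block; the threshold, table shape and tag names are placeholders:

```typescript
// Sketch: curating accounts with "whale" / "shrimp" tags from indexed balances.
// Threshold, table shape and tag names are placeholders.

interface BalanceRow {
  account: string;
  free: bigint;   // plancks
  block: number;
}

type SizeTag = "whale" | "shrimp";

const DOT = 10_000_000_000n;            // 10^10 plancks per DOT
const WHALE_THRESHOLD = 100_000n * DOT; // placeholder cut-off

function sizeTag(row: BalanceRow): SizeTag {
  return row.free >= WHALE_THRESHOLD ? "whale" : "shrimp";
}

// Because balances change over time, tags are keyed per block (or per era),
// so later reindexing runs can ask "was this account a whale *when it voted*?".
function curate(rows: BalanceRow[]): Map<string, SizeTag> {
  const tags = new Map<string, SizeTag>();
  for (const row of rows) tags.set(`${row.account}@${row.block}`, sizeTag(row));
  return tags;
}
```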

In addition to these, the third and most powerful kind of rich data the Squidsway tool will index is
offchain data
Since the tool will reindex multiple times, there is less need for its data sources to be fast.
This opens up the possibility to make use of (API-based) web data and, at a higher processing cost, scraped web data and LLM outputs.
For example, the tool will ingest discussions from Polkassembly / Subsquare / the Polkadot forum and process the natural language there in order to generate tags for sentiment, contentiousness, compliance with each norm, etc.
I hope that this particular feature will help proposers avoid creating proposals that fail for predictable reasons, and create a healthier environment in online governance discussions in general.
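
For a rough idea of what such an offchain ingestion module might look like, here is a sketch; the endpoint URL, response shape and classifier are placeholders, not the real Polkassembly API or a committed design:

```typescript
// Sketch: an offchain ingestion module that pulls referendum discussions and
// produces per-post tags. URL, response shape and classifier are placeholders.

interface DiscussionPost {
  referendumIndex: number;
  author: string;
  text: string;
}

interface DiscussionTag {
  referendumIndex: number;
  label: string; // e.g. "negative-sentiment", "contentious", "norm:pre-proposal-discussion"
}

async function fetchPosts(refIndex: number): Promise<DiscussionPost[]> {
  // Placeholder endpoint; the real module would use whichever API the forum exposes.
  const res = await fetch(`https://example.org/api/referendum/${refIndex}/comments`);
  return (await res.json()) as DiscussionPost[];
}

// Stand-in for whatever NLP/LLM step assigns labels; could be a local model or an API call.
async function classify(post: DiscussionPost): Promise<string[]> {
  return post.text.includes("!") ? ["contentious"] : [];
}

async function ingestDiscussion(refIndex: number): Promise<DiscussionTag[]> {
  const posts = await fetchPosts(refIndex);
  const tags: DiscussionTag[] = [];
  for (const post of posts) {
    for (const label of await classify(post)) {
      tags.push({ referendumIndex: refIndex, label });
    }
  }
  return tags;
}
```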

(Full proposal)

gm @Mork

IMO, the only thing that needs further clarification is the deliverables. tool + research. (Squidsway tool vs squidsway project). Prob break it into 2 proposals.

Additionally, I echo the suggestion on your previous post that your project should have a feature to index data for new proponents, like SEO indexing.

I also read your reply, but I still believe that we need a discovery tool for newcomers. Or maybe a solution would be implementing other off-chain data sources (e.g., eth forum, bitcoin forum, etc.), to see if they are active in different ecosystems, something like an off-CrossChain data indexer.

gm @wariomx ser,

Thanks for taking the time to feed back.
Sorry, but I’m not completely clear on some of your points :confused: Hope you don’t mind clarifying?
But first I’ll answer what I think you mean by this first suggestion:

the only thing that needs further clarification is the deliverables

Do you mean specify in more detail what the deliverables look like?
Or do you mean as in:

tool + research. Prob break it into 2 proposals.

If it’s the latter, I’m not sure how splitting it into two proposals would make sense - maybe I need to think on that.

I guess it’s obvious why a proposal for the insights report makes no sense without the tool (ie the tool is necessary in order to investigate in enough detail to get info which is valuable)

Would the tool make sense without the governance insights report?
I mean, yeah, kinda, in that it would be a useful platform for product teams to evaluate their own metrics, and for any community member who is sufficiently nerdy to dig in and generate insights on their own. But I’m not sure that anyone would take the time to do that if they had to write all of the ingestion modules themselves :thinking:

Maybe if I explain the flow more, it will be easier to see why …
The tool at its most basic is just a chain indexer like SQD, run multiple times.
But what gives it its value is adding modules for ingestion of ‘rich data’. So, sure, this could be a single data source, such as natural-language posts in this forum.
But then that data alone doesn’t tell us very much. What gives it value is ‘compiling’ and ‘curating’ the ingested data, and then ingesting and compiling the new data generated, rinse and repeat.

Each time a new dataset is compiled, or curated, the user (in this case, me) is making a choice of what to look for - to ‘pull on threads’.
When the datasets are combined, this will (not always, but for argument’s sake, let’s think of it like this) be a new module to insert into the tool.

So, while following threads through creates value by getting us nearer to an insight, it also adds modules to the tool - it makes the tool itself more valuable, irrespective of the tool’s results on a given run.
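
In code terms, that ‘rinse and repeat’ loop might look something like this sketch (all types and names hypothetical): each run can register the dataset it produced as a new module, so the next reindex has more context to work with.

```typescript
// Sketch of the "pull on threads" loop: each run can register the dataset it
// produced as a new module, so the next reindex has more context to work with.
// All types and names are hypothetical.

interface Module {
  name: string;
  run(previousOutputs: Map<string, unknown>): Promise<unknown>;
}

async function reindex(modules: Module[], passes: number): Promise<Map<string, unknown>> {
  const outputs = new Map<string, unknown>();
  for (let pass = 0; pass < passes; pass++) {
    for (const mod of modules) {
      // Each module sees everything compiled or curated so far.
      outputs.set(mod.name, await mod.run(outputs));
    }
  }
  return outputs;
}

// Pulling a thread = writing one more small Module and running reindex() again;
// the module stays in the toolbox even if this particular thread goes nowhere.
```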

Without adding modules for compiled and curated data, the tool would be just a chain indexer that is also able to ingest offchain data sources. Which is cool - but if a dev is skilled enough to write a module to ingest some kind of non-API offchain data (that being the messiest kind of data), then the hard part is already done - they would just combine that with a simple chain indexer or (if the dataset they are comparing it to already exists) run a query using some database-based solution or, eg, Dune Analytics.

Maybe I got too deep in the weeds there but my point is, I guess, that the value of the tool comes from it being a living, evolving thing (unlike a dashboard-like solution that repeats the same query on newer data) - and what makes it live and evolve is it being used to drill down into insights.

(some of this was covered in the more detailed proposal background - I’m still working on that (hence asking for feedback here!) but if you are interested in more detail, hit me up on tg and I’ll send you the WIP)

your project should have a feature to index data for new proponents, like SEO indexing.

If I have understood this right, you mean somebody puts up a proposal, and then the Squidsway project kinda background-checks the proponent using SEO scores (and maybe some other info)?

Yeah, sure that is possible, but it’s a pretty simplistic use of the tool - though certainly the tool would help with lookups of onchain data (previous proposals, maybe an activity score). But that onchain part would be pretty simple work to do.
If there is interest when the tool is up and running, I would suggest making a request as a community member for such a module, and if there is support in the community, sure, I could add that - the query could just be run by a bot whenever a new proposal is posted.

like SEO indexing.

As I argued in that other post, I feel that subjective results like SEO would be out-of-scope for the Squidsway project, funded by the treasury.
But it would certainly be possible (with whatever degree of subjectivity the user chooses to accept) with the Squidsway tool, and I would of course give technical support to you, or Anaelle, or whoever wants to build modules.
Alternatively, maybe the community will tell me this is what the project should be doing rather than governance insights, and then, sure, the project can pivot :person_shrugging:. I’ll wait for a lot more discussion than just this before doing that pivot, though :wink:

a discovery tool for newcomers

And I’m guessing here - apologies if I’ve got it wrong - you mean ‘discovery’ in the sense of like ‘background-checks’, as I mentioned above?

e.g., eth forum, bitcoin forum, etc.), to see if they are active in different ecosystems,

… this goes to the subjectivity issue again, but here on a more technical level - the fuzzier the data source, the more the results will tend towards meaninglessness, and in the case of forum posts outside of a given narrow and well-defined context, that would be some pretty fuzzy data. Then multiply that error by the low reliability of identifying individuals across different forums … it’s really a job for a different kind of tool with a different kind of output.