Getting consistent, well defined, and thus useful metrics is challenging. I’d like to take steps to address this if the community agrees and encourages it.
Blockchains are incredibly transparent but lack visibility in many areas. For example, anyone can view the activity of any address, but gaining visibility into the total number of active addresses can be challenging. The core of the challenge lies in differing definitions of what constitutes a metric. Some may define an active address as any address that has sent a signed extrinsic; others include addresses that have received a transfer. Similar nuances exist across most metrics. This leads to differing definitions and thus different results for a metric with the same name. The problem is exacerbated when analyzing metrics across different chains and ecosystems.
Example: Moonscan, Subscan, and Polkaholic are great, but which one has the most accurate answer for Active Addresses from Jan 11th 2023 in the table below?
Having clearly defined metric definitions is super valuable to each chain’s community (devs, token holders, investors). We strongly believe it’s essential that parachains monitor these metrics on a regular basis to decide whether their ecosystem is healthy, growing, and sustainable, and that Polkadot also monitor parachain health to keep [repeat] demand for slots and quality block space high. Communities need to look at metrics to know how to improve them, and you can’t improve a metric without having a clear idea of how it’s defined and produced.
We [Colorful Notion, devs of Polkaholic.io] aren’t sure exactly how everyone else computes their metrics, but I think we and everyone should welcome a systematic effort to get metrics for the ecosystem defined and produced in the same way. Most of the first metrics we can think of have a pretty simple computation, but once you start getting into, say, TVL of relatively illiquid DeFi pools, you can get pretty different answers. We’d love to come to consensus on how to compute every metric and generate the same results. It turns out there are a lot of things missing for indexers beyond metric definitions (e.g. public archive nodes, bootnodes, etc.), and I think surfacing these missing things will be critical.
That said, whatever these metrics are, computing them tends to output a leaderboard where Substrate parachains look like they are trying to compete to be #1 on these metrics. Even if we set aside outright gaming of such a leaderboard, I don’t think this is healthy, because I don’t think parachains should compete with each other on these metrics: a bit of sibling rivalry between parachains is OK, but it’s ultimately a zero-sum game for users within the Polkadot ecosystem.
Instead, I think we should be computing these metrics for other decentralized ecosystems. You can make the scope of this arbitrarily large, but I would recommend the chains on defillama. If it is possible to put these metrics against whatever is available from CeFi (Binance, Coinbase, OpenSea, … ), then we can know if DeFi is gaining ground on CeFi. We can compute all the metrics in the world, but the expectation is that we be able to interpret them so as to know if anyone is making a dent with the incredible innovation that happens here.
Well said. I think a good starting point is to work together to define a few of these metrics for some parachains, set a best practice, then start doing the same for other decentralized ecosystems and compare to CeFi.
Second this. We at SubWallet and Dotinsights are seeking solutions to this too. I strongly support having a standardized set of metrics in terms of definition. Not sure if all the teams working on data can get together and discuss?
Hey Nate, thanks for kickstarting this important discussion; we from the Data Team in Parity think it’s a step in the right direction because we care very much about precision, consistency, transparency and eco-wide consensus when it comes to key ecosystem metrics.
We have come very far in building our own internal data capabilities and data platform in Parity, and we try to engage with and support as many data ecosystem teams as possible (both users and providers) to exchange know-how and best practices on both the engineering and analytics sides of things. In doing this, we often came across ambiguous or unclear definitions and computations of key metrics (from basics such as Daily active users, Developer activity or TVL). To improve this situation, having a shared open space (Wiki or alike) to systematically publish, share and align information on data retrieval/transformation/computation is crucial. Teams dealing with data will hopefully see more benefit in opening up their know-how than in keeping their metric logic hidden, because they can leverage peer review of their work in order to improve it.
We encourage everyone in the data field to join the TG discussion where we can plan first steps in hopefully creating an ecosystem Data Guild/Alliance/Community consisting of representatives of all data teams taking part in using or serving Polkadot metrics.
Daily: The 24-hour timespan of a day that is >= 00:00:00 UTC and < 00:00:00 UTC of the following day
Weekly: The 7-day timespan of a week that is >= Monday 00:00:00 UTC and < Monday 00:00:00 UTC of the following week
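The two timeframes above are half-open UTC intervals, which can be sketched in Python as follows. This is only an illustration of the boundaries as stated; the function names `daily_window`, `weekly_window` and `in_window` are made up for this example and do not come from any indexer.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

Window = Tuple[datetime, datetime]  # half-open interval [start, end)

def daily_window(ts: datetime) -> Window:
    """Return the UTC day containing ts: >= 00:00:00 UTC and < 00:00:00 UTC next day."""
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous, so this sketch refuses them.
        raise ValueError("timestamp must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)

def weekly_window(ts: datetime) -> Window:
    """Return the UTC week containing ts: >= Monday 00:00:00 UTC and < the next Monday."""
    day_start, _ = daily_window(ts)
    # datetime.weekday() returns 0 for Monday, so this steps back to Monday.
    start = day_start - timedelta(days=day_start.weekday())
    return start, start + timedelta(days=7)

def in_window(ts: datetime, window: Window) -> bool:
    """True if ts falls inside the half-open window."""
    start, end = window
    return start <= ts.astimezone(timezone.utc) < end
```

Using half-open intervals means a block stamped exactly at 00:00:00 UTC belongs to the new day only, so no event is counted in two adjacent windows.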
All transactions in the given timeframe
Extrinsics that are performed in that timeframe
Inherents are excluded
All accounts that create transactions in the given timeframe
Users that send at least 1 extrinsic in the given timeframe
Accounts that only perform validation/collation transactions and no other extrinsics do not fall under this definition
This also does not include accounts that just receive funds without performing extrinsics
Accounts/Users that are the target of a fungible/non-fungible token transfer or any other activity
This might be hard or impossible to measure, or hard to achieve consensus on, in cases where a lot of accounts are targeted by actions in the system (e.g. migrating accounts, airdrops, etc.), and it can vary in exact definition from chain to chain.
Accounts/Users (without adjectives)
The union of all active and passive Accounts/Users of the given timeframe
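To make the definitions above concrete, here is a minimal Python sketch that computes them over the extrinsics of one timeframe. The `Extrinsic` record and its fields (`signer`, `is_inherent`, `is_validation`, `targets`) are hypothetical and chosen only for illustration. The sketch also assumes that an account which both signs and is targeted appears in both the active and passive sets, which the definitions above do not settle.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

@dataclass
class Extrinsic:
    # Hypothetical record shape, not the schema of any real indexer.
    signer: Optional[str] = None      # None for unsigned extrinsics and inherents
    is_inherent: bool = False         # inherents are excluded from transactions
    is_validation: bool = False       # validation/collation-only activity
    targets: List[str] = field(default_factory=list)  # accounts acted upon

def summarize(extrinsics: Iterable[Extrinsic]) -> Dict[str, object]:
    """Compute transactions, active, passive and all accounts for one timeframe."""
    counted = [x for x in extrinsics if not x.is_inherent]
    # Active: signed at least one extrinsic that is not validation/collation.
    active = {
        x.signer for x in counted
        if x.signer is not None and not x.is_validation
    }
    # Passive: targeted by a transfer or any other activity.
    passive = {t for x in counted for t in x.targets}
    return {
        "transactions": len(counted),
        "active": active,
        "passive": passive,
        "accounts": active | passive,  # union of active and passive
    }
```

With this shape, an account that only receives funds ends up in `passive` and `accounts` but never in `active`, and a validator that submits nothing else is not counted as active.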
Would Active Addresses be synonymous with Active Users?
Active Users is a subset of Active Accounts, correct?
As for the passive accounts/users, maybe it remains a work in progress for now and eventually becomes more defined (collecting thoughts here is at least a great start!). In the meantime we could still represent it as a metric like so: Passive Accounts = All Accounts - Active Accounts ?
Thank you for the invitation and mentioning Subscan. It was a great pleasure to participate in the discussion. This is an excellent topic! Defining a data standard is of immense significance to the data statistics and analysis of the entire Substrate ecosystem. Community participation and cooperation are particularly essential, especially for the definition of some “custom” data. We would love to join the discussion and apply it to Subscan.
Regarding the “active account” definition mentioned above, this is also data that users care a great deal about. I am delighted to explain more here for your reference and discussion. At first, we only counted accounts that actively sent extrinsics. However, that approach cannot capture all active accounts, such as EVM accounts, multisig accounts, proxied accounts, and others. We have received some feedback from users and partners, and the current statistical rules for active accounts have been expanded to include:
Accounts that sent the extrinsic
Accounts that sent the EVM transaction (if supported)
Actual execution accounts: proxied accounts, multisig accounts
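The three rules listed above amount to a union over several account roles. The sketch below is not Subscan’s implementation; the `Activity` record and its field names (`signer`, `evm_from`, `executed_as`) are assumptions made up to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

@dataclass
class Activity:
    # Hypothetical record shape for one on-chain action.
    signer: Optional[str] = None       # account that sent the extrinsic
    evm_from: Optional[str] = None     # sender of an EVM transaction, if supported
    executed_as: Optional[str] = None  # proxied or multisig account the call ran as

def expanded_active_accounts(activities: Iterable[Activity]) -> Set[str]:
    """Union of extrinsic senders, EVM senders and actual execution accounts."""
    active: Set[str] = set()
    for a in activities:
        for account in (a.signer, a.evm_from, a.executed_as):
            if account is not None:
                active.add(account)
    return active
```

Under this rule a proxy call contributes two accounts, the signing proxy and the proxied account it acts for, which is why counts produced this way can be higher than a signer-only count.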
Of course, expanding the scope of the discussion to more teams, especially data professionals, will make everything more standardized and meaningful. This is the organization we have been looking for, and we would love to join and discuss more!
As marketing people, “investors”, competing ecosystems, govts monitoring Web2, etc. use DAU and MAU vocabulary, I strongly believe that it’s important to retain the notion of “active” and not abandon it because it’s difficult to define otherwise. In particular, I think we should arrive at a definition that conforms to human intuition for “active” - something like “clicked the sign button on a wallet” with only one private key, or a bot doing so via equivalent code (which we can’t distinguish), and corresponds to signed extrinsics (or signed EVM Transactions). I don’t think it’s healthy to attempt to include accounts that haven’t passed the “is there a signature?” test as “active.”
Let’s consider four cases:
If P is a proxy account for X, and there is an extrinsic signed by P to transfer Assets from Account X to Account Y, only P is active, while both X and Y are passive.
If there is a 2-of-3 multisig account with S_1+S_2+S_3 controlling X, and there are two extrinsics from S_1 and S_2 to transfer an Asset from Account X to Account Y, only S_1 and S_2 are active, while Y is passive and S_3 has not performed any action onchain, so it is considered neither active nor passive.
If there is an XCM transfer from an origination chain C1 of account S_4 with an amount A to a destination chain C2 with a beneficiary Y, only S_4 is active, while Y on destination chain is passive.
If there is remote execution from an origination chain C1 of account S_5 causing some Transact operation with a destination chain C2 with some keyless derivative account Y who does a transfer of some amount A to account Z, only S_5 is active, while Y and Z are passive.
The border between active and passive is guided by the simple “is there a signature?” test, where signing is the fundamental operation that drives our human intuition of “active”. I recommend refining (or coming to consensus) by explicitly considering these 4 cases, and adding additional cases to refine further. If we wanted to artificially inflate the number of accounts, we could consider X + Y + S_3 + Z as “active” accounts. However, this would be overzealous and potentially misleading, and it goes against human intuition. These accounts should remain “passive.” I understand the marketer within us may want to claim bigger numbers of active accounts, but honest analytics should not be driven by these insecurities.
The Polkadot ecosystem is led by people who consistently do the right thing and take the long-term view. We shouldn’t twist “active” based on any other psychology. I would like the industry to conduct cross-ecosystem analytics in a multichain objective way, with Polkadot doing the right thing. We should not suddenly go against human intuition unless CeFi entities (Coinbase, Binance, OpenSea, etc.) publicly start using “active” in this manner.
Therefore, we keep the “active” definition simple and in line with human intuition:
Active Accounts: (Substrate)
Accounts that have signed an extrinsic
Passive Accounts: (Substrate)
Accounts that aren’t active but have any balance changes for any asset (native or non-native asset, within chain or cross-chain). This can include proxy and multisig (X, Y, S_3 in the above examples), remote execution triggered by other accounts, transfer recipients (including crowdloan and staking rewards distribution), and balance deductions (e.g., pre-authorized transferFrom/proxy called by other active accounts).
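As a rough illustration of these two definitions, the Python sketch below applies the “is there a signature?” test to the four cases discussed earlier. The `Event` shape (`signers`, `balance_changed`) is hypothetical, and the cross-chain cases are simplified by merging origin and destination chains into one list of events; per-chain accounting would split them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

@dataclass
class Event:
    # Hypothetical record shape: who signed, and whose balances changed.
    signers: List[str] = field(default_factory=list)
    balance_changed: List[str] = field(default_factory=list)

def classify(events: Iterable[Event]) -> Tuple[Set[str], Set[str]]:
    """Return (active, passive): signers, and non-signers with balance changes."""
    events = list(events)
    active = {s for e in events for s in e.signers}
    touched = {a for e in events for a in e.balance_changed}
    return active, touched - active  # an active account is never also passive

cases = {
    # Proxy P signs a transfer from X to Y.
    "proxy": [Event(signers=["P"], balance_changed=["X", "Y"])],
    # 2-of-3 multisig: S_1 and S_2 sign, S_3 does nothing.
    "multisig": [
        Event(signers=["S_1"]),
        Event(signers=["S_2"], balance_changed=["X", "Y"]),
    ],
    # XCM transfer from S_4 on chain C1 to beneficiary Y on chain C2.
    "xcm_transfer": [Event(signers=["S_4"], balance_changed=["S_4", "Y"])],
    # Remote execution: S_5 signs, derivative account Y pays Z.
    "remote_execution": [Event(signers=["S_5"], balance_changed=["Y", "Z"])],
}

for name, events in cases.items():
    active, passive = classify(events)
    print(name, sorted(active), sorted(passive))
```

Because membership is decided only by signatures and balance changes, an account such as S_3 that neither signs nor has its balance touched lands in neither set.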
We have mechanized the above definitions with exact substrate-etl BigQuery here:
The active/passive distinctions in the 4 cases are covered by the above definition, and we imagine being able to refine both definitions precisely with open source, transparent code as new forms of active/passive evolve.
The open source approach taken here is of transparency and reproducibility: with exact BigQuery computations, anyone can reproduce any datapoint and improve any definition with simple code adjustments to the query form.
The contributions by the folks on this thread and the telegram channel have been great. We created a proposal and just published it for feedback. Feel free to chime in on the discussion in polkassembly to help move this forward.