All metrics are imperfect, but many are useful. Let's make them more available

dataPhysicist · January 29, 2023, 10:44pm

Getting consistent, well-defined, and thus useful metrics is challenging. I’d like to take steps to address this if the community agrees and encourages it.

Problem Statement

Blockchains are incredibly transparent but lack visibility in many areas. For example, anyone can view the activity of any address, but gaining visibility into the total number of active addresses can be challenging. The core of the challenge lies in differing definitions of what a metric measures. Some may define an active address as any address that has signed and sent an extrinsic; others also include addresses that have received a transfer. Similar nuances exist for most metrics. This leads to differing definitions, and thus different results, for metrics with the same name. The problem is exacerbated when analyzing metrics across different chains and ecosystems.

Example: Moonscan, Subscan, and Polkaholic are great, but which one has the most accurate answer for Active Addresses on Jan 11th, 2023 in the table below?

Moonscan	Subscan	Polkaholic
4,504	5,917	4,346

It’s a trick question: they are likely all correct, but they differ because their calculations are based on different formulas.

It would be valuable to the ecosystem to have a place where anyone can acquire data from various sources, with detailed descriptions of each metric and of any calculations/transformations it has undergone.

Proposed solution

Collaborate to publicly define metrics as they exist today.
Propose recommended (standardized) definitions and seek feedback from the community.
Publish an initial set of metrics and their definitions publicly.
Continually add and refine metrics as suggested by the community.

If there is interest, we will submit a treasury proposal to make it happen. We’d love to hear from you, so please chime in here. For example:

Would this be useful to you and your ecosystem?
What metrics would you like to see?
What parachains?
From what sources?

Note: the forum has a two-link limit, so here is the Moonscan link: moonscan .io/chart/active-address

sourabhniyogi · January 30, 2023, 5:27am

Having clearly defined metrics is super valuable to each chain’s community (devs, token holders, investors). We strongly believe it’s essential that parachains monitor these metrics regularly to decide whether their ecosystems are healthy, growing, and sustainable, and that Polkadot also monitor parachain health to keep [repeat] demand for slots and quality blockspace high. Communities need to look at metrics to know how to improve them, and you can’t improve a metric without a clear idea of how it’s defined and produced.

We [Colorful Notion, devs of Polkaholic.io] aren’t sure exactly how everyone else computes their metrics, but I think we and everyone else should welcome a systematic effort to define and produce the ecosystem’s metrics in the same way. Most of the first metrics we can think of have fairly simple computations, but once you get into, say, the TVL of relatively illiquid DeFi pools, you can get very different answers. We’d love to reach consensus on how to compute every metric so that everyone generates the same results. It turns out that, beyond metric definitions, indexers are missing a lot of things (e.g., public archive nodes, bootnodes), and I think surfacing these gaps will be critical.

That said, whatever these metrics are, computing them tends to produce a leaderboard on which Substrate parachains appear to compete to be #1. Even setting aside outright gaming of such a leaderboard, I don’t think this is healthy, because parachains shouldn’t compete with each other on these metrics: a bit of sibling rivalry between parachains is OK, but it’s ultimately a zero-sum game for users within the Polkadot ecosystem.

Instead, I think we should be computing these metrics for other decentralized ecosystems. You can make the scope arbitrarily large, but I would recommend the chains on DefiLlama. If these metrics can be compared against whatever is available from CeFi (Binance, Coinbase, OpenSea, …), then we can tell whether DeFi is gaining ground on CeFi. We can compute all the metrics in the world, but we need to be able to interpret them to know whether the incredible innovation happening here is making a dent.

dataPhysicist · January 30, 2023, 1:09pm

Well said. I think a good starting point is to work together to define a few of these metrics for some parachains, set a best practice, then start doing the same for other decentralized ecosystems and compare to CeFi.

djhatrang · February 1, 2023, 9:54am

I second this. We at SubWallet and Dotinsights are also seeking solutions to this. I strongly support a standardized set of metric definitions. Could all the teams working on data get together and discuss it?

dataPhysicist · February 1, 2023, 2:59pm

Some have suggested a Telegram channel, so I just created one to get things started. TG might not be the best long-term option; we will still use this forum for the main dialog.

pavla · February 2, 2023, 11:27am

Hey Nate, thanks for kickstarting this important discussion. We in the Data Team at Parity think it’s a step in the right direction, because we care a great deal about precision, consistency, transparency, and ecosystem-wide consensus on key ecosystem metrics.

We have come a long way in building our own internal data capabilities and data platform at Parity, and we try to engage with and support as many ecosystem data teams as possible (both users and providers) to exchange know-how and best practices on both the engineering and analytics sides. In doing so, we have often come across ambiguous or unclear definitions and computations of key metrics (from basics such as daily active users, developer activity, or TVL). To improve this situation, a shared open space (a wiki or similar) to systematically publish, share, and align information on data retrieval, transformation, and computation is crucial. Teams dealing with data will hopefully see more benefit in opening up their know-how than in keeping their metric logic hidden, because they can leverage peer review of their work to improve it.

We encourage everyone in the data field to join the TG discussion, where we can plan the first steps toward, hopefully, an ecosystem Data Guild/Alliance/Community made up of representatives of all data teams that use or serve Polkadot metrics.

– @Karim & Pavla

alice_und_bob · February 2, 2023, 4:12pm

I am giving it a first try:

Definitions

Transactions: Any type of transaction as defined by Substrate in the Substrate Docs. This includes signed, unsigned and inherent transactions
Extrinsic: A transaction originating from outside the chain. This includes signed and unsigned transactions, but excludes inherents
Account: as defined in the Substrate Docs.
Block Producers: An account that creates blocks
User: An account that is interacting with the chain by sending an extrinsic
Time
- Time Definitions follow ISO 8601
- Daily: The 24-hour timespan of a day that is >= 00:00:00 UTC and < 00:00:00 UTC of the following day
- Weekly: The 7-day timespan of a week that is >= Monday 00:00:00 UTC and < Monday 00:00:00 UTC of the following week
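To make the time definitions concrete, here is a minimal Python sketch (standard library only) that maps a timestamp to its half-open daily and weekly UTC windows and to an ISO 8601 week label. The function names are hypothetical, and the sketch assumes timestamps are timezone-aware; naive timestamps are rejected rather than guessed.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def _to_utc(ts: datetime) -> datetime:
    # Naive datetimes are ambiguous, so require an explicit timezone.
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc)


def daily_window(ts: datetime) -> tuple[datetime, datetime]:
    """Half-open UTC day [00:00:00, next day 00:00:00) containing ts."""
    start = _to_utc(ts).replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


def weekly_window(ts: datetime) -> tuple[datetime, datetime]:
    """Half-open week [Monday 00:00:00 UTC, next Monday 00:00:00 UTC) containing ts."""
    day_start, _ = daily_window(ts)
    start = day_start - timedelta(days=day_start.weekday())  # weekday(): Monday == 0
    return start, start + timedelta(days=7)


def iso_week_label(ts: datetime) -> str:
    """ISO 8601 week label such as '2023-W02' for the UTC week containing ts."""
    year, week, _ = _to_utc(ts).isocalendar()
    return f"{year}-W{week:02d}"


ts = datetime(2023, 1, 11, 23, 59, 59, tzinfo=timezone.utc)
print(daily_window(ts)[0].date(), weekly_window(ts)[0].date(), iso_week_label(ts))
# prints: 2023-01-11 2023-01-09 2023-W02
```

Using half-open intervals means an event at exactly 00:00:00 UTC belongs to the new day (or week), so no event is counted in two windows.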

Metrics

Transactions
- All transactions in the given timeframe
Extrinsics
- All extrinsics in the given timeframe
- Inherents are excluded
Active Accounts
- All accounts that create transactions in the given timeframe
Active Users
- Users that send at least 1 extrinsic in the given timeframe
- Accounts that only perform validation/collation transactions and no other extrinsics do not fall under this definition
- This also does not include accounts that just receive funds without sending extrinsics
Passive Accounts/Users
- Accounts/Users that are the target of a fungible/non-fungible token transfer or any other activity
- This might be hard or impossible to measure, or hard to reach consensus on, when many accounts are targeted by system-wide actions (e.g., account migrations, airdrops), and the exact definition can vary from chain to chain.
Accounts/Users (without adjectives)
- The union of all active and passive Accounts/Users of the given timeframe
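A minimal sketch of how these metrics could be computed from one timeframe's transactions. It assumes a hypothetical record type `Tx` with a kind (`signed`, `unsigned` or `inherent`), the account the transaction can be attributed to (if any), a flag for validation/collation-only calls, and the accounts the transaction targets. How an indexer fills these fields is chain-specific and not covered here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tx:
    """One transaction in the timeframe (hypothetical record layout)."""
    kind: str                       # "signed", "unsigned" or "inherent"
    origin: Optional[str] = None    # attributable account; None if there is none
    validation_only: bool = False   # True for validation/collation-only calls
    targets: tuple[str, ...] = ()   # accounts targeted, e.g. transfer recipients


def compute_metrics(txs: list[Tx]) -> dict:
    extrinsics = [t for t in txs if t.kind != "inherent"]  # inherents are excluded
    active_accounts = {t.origin for t in txs if t.origin is not None}
    # Users send at least one extrinsic that is not validation/collation-only.
    active_users = {
        t.origin for t in extrinsics if t.origin is not None and not t.validation_only
    }
    passive = {a for t in txs for a in t.targets}
    return {
        "transactions": len(txs),
        "extrinsics": len(extrinsics),
        "active_accounts": active_accounts,
        "active_users": active_users,
        "passive_accounts": passive,
        "accounts": active_accounts | passive,  # union: each account counted once
    }


txs = [
    Tx("inherent"),                               # e.g. a timestamp inherent
    Tx("signed", "alice", targets=("bob",)),      # transfer alice -> bob
    Tx("signed", "alice", targets=("carol",)),    # transfer alice -> carol
    Tx("signed", "val1", validation_only=True),   # validation-only activity
    Tx("unsigned"),                               # unsigned extrinsic, no origin
]
m = compute_metrics(txs)
```

Here `val1` is an Active Account but not an Active User, and the passive set is simply "targeted" accounts, so it may overlap with the active set; the union still counts each account once.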

ArbiterOfData · February 2, 2023, 5:51pm

I like your breakdown. Couple questions:

Would Active Addresses be synonymous with Active Users?
Active Users is a subset of Active Accounts, correct?
As for the passive accounts/users, maybe it remains a work in progress for now and eventually becomes more defined (collecting thoughts here is at least a great start!). In the meantime, we could still represent it as a metric like so: Passive Accounts = All Accounts - Active Accounts?
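The formula proposed above can be expressed as a set difference. This hypothetical helper also checks the subset relation asked about (Active Users within Active Accounts), assuming "All Accounts" already contains every active account:

```python
from __future__ import annotations


def passive_accounts(
    all_accounts: set[str], active_accounts: set[str], active_users: set[str]
) -> set[str]:
    """Passive Accounts = All Accounts - Active Accounts, after sanity checks."""
    # Every Active User sent an extrinsic, so it must also be an Active Account.
    if not active_users <= active_accounts:
        raise ValueError("Active Users must be a subset of Active Accounts")
    # "All Accounts" is assumed to already include every active account.
    if not active_accounts <= all_accounts:
        raise ValueError("Active Accounts must be a subset of All Accounts")
    return all_accounts - active_accounts


all_accounts = {"alice", "bob", "carol", "val1"}
active_accounts = {"alice", "val1"}
active_users = {"alice"}
passive = passive_accounts(all_accounts, active_accounts, active_users)
```

Defined this way, Passive and Active Accounts are disjoint, so their counts add up to All Accounts, and an account that both signs an extrinsic and receives a transfer is counted once, as active.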

hao · February 3, 2023, 4:18am

hi all,
This is Hao from the Web3Go team. We are working on data analysis for the Polkadot ecosystem.

Please take a look at what we have done for Moonbeam:
https://app.web3go.xyz/#/MoonbeamPublicDashboard

Happy to join!

yakio · February 17, 2023, 9:28am

Hi @dataPhysicist @djhatrang and everyone here,

Thank you for the invitation and for mentioning Subscan. It was a great pleasure to participate in the discussion. This is an excellent topic! Defining a data standard is immensely significant for data statistics and analysis across the entire Substrate ecosystem. Community participation and cooperation are essential, especially for defining some “custom” data. We would love to join the discussion and apply the results to Subscan.

Regarding the “active account” definition mentioned above, this is also a metric users care a lot about, so I am delighted to explain our approach here for your reference and discussion. At first, we only counted accounts that actively sent extrinsics. However, that misses many active accounts, such as EVM accounts, multisig accounts, and proxied accounts. Based on feedback from users and partners, our rules for counting active accounts have been expanded to include:

Accounts that sent the extrinsic
Accounts that sent the EVM transaction (if supported)
Actual execution accounts: proxied accounts, multisig accounts
Accounts with passive balance changes: receive transfers (native token, asset, custom token, ERC-20 token)
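A sketch of the expanded rule as a set union, assuming the four groups of accounts for a day have already been extracted. The `DayActivity` type and its field names are hypothetical, not Subscan's implementation. It also shows why a union of broader rules yields a larger count than the signed-extrinsics-only rule on the same data, which is one way explorers end up reporting different "active" numbers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DayActivity:
    """Accounts observed on one day, grouped by the four rules listed above."""
    extrinsic_signers: set[str] = field(default_factory=set)
    evm_senders: set[str] = field(default_factory=set)          # only if EVM is supported
    execution_accounts: set[str] = field(default_factory=set)   # proxied / multisig accounts
    transfer_recipients: set[str] = field(default_factory=set)  # native, asset, custom, ERC-20


def active_expanded(day: DayActivity) -> set[str]:
    # An account counts once, even if it matches several rules.
    return day.extrinsic_signers | day.evm_senders | day.execution_accounts | day.transfer_recipients


def active_signed_only(day: DayActivity) -> set[str]:
    # The earlier, narrower rule: only accounts that sent extrinsics.
    return set(day.extrinsic_signers)


day = DayActivity(
    extrinsic_signers={"P", "S_1"},
    evm_senders={"0xabc"},
    execution_accounts={"X"},
    transfer_recipients={"Y", "P"},  # P both signed and received, counted once
)
print(len(active_expanded(day)), len(active_signed_only(day)))
```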

Of course, expanding the scope of the discussion to more teams, especially data professionals, will make everything more standardized and meaningful. This is the organization we have been looking for, and we would love to join and discuss more!

alice_und_bob · February 17, 2023, 11:43am

The thing about counting passive balance changes as active accounts is that accounts can receive several kinds of changes, e.g., state migrations, airdrops, payouts, etc.

Since these changes are passive, I wonder whether they should really be counted as active actions. This is why I suggested “Accounts” to summarize all those concepts without using the adjective “active”.

sourabhniyogi · February 18, 2023, 2:39pm

Because marketing people, “investors”, competing ecosystems, governments monitoring Web2, etc. use DAU and MAU vocabulary, I strongly believe it’s important to retain the notion of “active” rather than abandon it because it’s difficult to define. In particular, I think we should arrive at a definition that conforms to human intuition for “active”: something like “clicked the sign button on a wallet” with a single private key, or a bot doing so via equivalent code (which we can’t distinguish from a human). This corresponds to signed extrinsics (or signed EVM transactions). I don’t think it’s healthy to count accounts that haven’t passed the “is there a signature?” test as “active.”

Let’s consider four cases:

If P is a proxy account for X, and there is an extrinsic signed by P to transfer Assets from Account X to Account Y, only P is active, while both X and Y are passive.
If there is a 2-of-3 multisig account X controlled by S_1, S_2, and S_3, and S_1 and S_2 sign two extrinsics to transfer an asset from Account X to Account Y, only S_1 and S_2 are active, while X and Y are passive, and S_3 has performed no on-chain action and so is neither active nor passive.
If there is an XCM transfer from an origination chain C1 of account S_4 with an amount A to a destination chain C2 with a beneficiary Y, only S_4 is active, while Y on destination chain is passive.
If there is remote execution from account S_5 on an origination chain C1, causing a Transact operation on a destination chain C2 in which some keyless derivative account Y transfers some amount A to account Z, only S_5 is active, while Y and Z are passive.

The border between active and passive is set by the simple “is there a signature?” test, since signing is the fundamental operation that drives our human intuition of “active”. I recommend refining the definition (or reaching consensus on it) by explicitly considering these four cases and adding further cases as needed. If we wanted to artificially inflate the number of accounts, we could consider X + Y + S_3 + Z as “active” accounts. However, this would be overzealous and potentially misleading, and it goes against human intuition. These accounts should remain “passive.” I understand the marketer within us may want to claim bigger numbers of active accounts, but honest analytics should not be driven by these insecurities.
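The four cases can be checked mechanically with the signature test. In this sketch (hypothetical names; cross-chain accounts are tagged with their chain, e.g. `Y@C2`), each case lists the accounts that signed and the accounts whose balances changed, including fees paid by signers; anything whose balance changed without signing is passive. Under this rule X in the multisig case is passive because its balance is debited, and S_3 lands in neither set.

```python
from __future__ import annotations


def classify(signers: set[str], balance_changed: set[str]) -> tuple[set[str], set[str]]:
    """Apply the "is there a signature?" test.

    Active: accounts that signed. Passive: accounts whose balance changed
    but that did not sign. Accounts that did neither are in neither set.
    """
    active = set(signers)
    passive = set(balance_changed) - active
    return active, passive


# (signers, accounts with a balance change) per case; signers also pay fees.
CASES = {
    "1. proxy": ({"P"}, {"P", "X", "Y"}),
    "2. 2-of-3 multisig": ({"S_1", "S_2"}, {"S_1", "S_2", "X", "Y"}),
    "3. XCM transfer": ({"S_4@C1"}, {"S_4@C1", "Y@C2"}),
    "4. remote execution": ({"S_5@C1"}, {"S_5@C1", "Y@C2", "Z@C2"}),
}

for name, (signers, changed) in CASES.items():
    active, passive = classify(signers, changed)
    print(f"{name}: active={sorted(active)} passive={sorted(passive)}")
```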

The Polkadot ecosystem is led by people who consistently do the right thing and take the long-term view. We shouldn’t twist “active” based on any other psychology. I would like the industry to conduct cross-ecosystem analytics in an objective, multichain way, with Polkadot doing the right thing. We should not suddenly go against human intuition unless CeFi entities (Coinbase, Binance, OpenSea, etc.) publicly start using “active” in this manner.

Therefore, we keep the “active” definition simple and in line with human intuition:

Active Accounts: (Substrate)
Accounts that have signed an extrinsic

Passive Accounts: (Substrate)
Accounts that aren’t active but have any balance changes for any asset (native or non-native asset, within chain or cross-chain). This can include proxy and multisig (X, Y, S_3 in the above examples), remote execution triggered by other accounts, transfer recipients (including crowdloan and staking rewards distribution), and balance deductions (e.g., pre-authorized transferFrom/proxy called by other active accounts).

We have mechanized the above definitions with exact substrate-etl BigQuery queries here:

github.com/colorfulnotion/substrate-etl/blob/ff793176f5c49a39ed5e760409fe0b16a497e65a/DEFINITIONS.md

# Substrate-etl Definitions

_Note: These are tentative, and may be revised based on community feedback._

To support precise transparent definitions of all data summarized within substrate-etl report summaries (and used in polkaholic.io)
(see [All metrics are imperfect, but many are useful. Let’s make them more available](https://forum.polkadot.network/t/all-metrics-are-imperfect-but-many-are-useful-lets-make-them-more-available/1858/4)),
we attempt to define terms used in this repo both in English and through BigQuery on the substrate-etl datasets.

The open source approach taken here is of _transparency_ and _reproducibility_: with exact BigQuery computations, anyone can reproduce any datapoint and improve any definition with adjustments to query form.  

## Account Metrics (Substrate)

* _Active Accounts_ (Substrate): Accounts that have signed an extrinsic on a Substrate chain 
* _System Accounts_ (Substrate): Accounts that have participated in consensus and produced a block
* _Passive Accounts_ (Substrate): Accounts that aren't active but have any balance changes for any asset (native or non-native asset, within chain or cross-chain). This can include proxy and multisig accounts, remote execution triggered by other accounts, transfer recipients (including crowdloan and staking rewards distribution), and balance deductions (e.g., pre-authorized transferFrom/proxy called by other active accounts).

The above definitions are mechanized in `substrate-etl` BigQuery below.  The following computes _Active Accounts_, _System Accounts_ and _Passive Accounts_ for the Kusama relay chain for February 1, 2023 using the following public data:

* `substrate-etl.kusama.extrinsics0`
* `substrate-etl.kusama.blocks0`

This file has been truncated.
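For readers without BigQuery access, here is a rough Python equivalent of the three definitions for a single UTC day. It is not the substrate-etl query: the row layout (dicts with `ts`, `signer`, `author` and `account` keys) is hypothetical, and the balance-change input is an assumption, since the excerpt above only names the extrinsics and blocks tables before it is truncated. Read literally, the Passive definition does not exclude System Accounts, so a block author whose balance changed without signing anything would be counted as passive here.

```python
from __future__ import annotations

from datetime import date, datetime, timezone


def account_metrics(extrinsics, blocks, balance_changes, day: date) -> dict[str, int]:
    """Count Active, System and Passive Accounts for one UTC day.

    Rows are dicts with an aware datetime under "ts", plus "signer" (None if
    unsigned) for extrinsics, "author" for blocks and "account" for balance
    changes. These field names are hypothetical, not the substrate-etl schema.
    """
    def on_day(row) -> bool:
        return row["ts"].astimezone(timezone.utc).date() == day

    active = {e["signer"] for e in extrinsics if on_day(e) and e["signer"] is not None}
    system = {b["author"] for b in blocks if on_day(b)}
    # Passive: any balance change that day, minus accounts that signed.
    passive = {c["account"] for c in balance_changes if on_day(c)} - active
    return {"active": len(active), "system": len(system), "passive": len(passive)}


UTC = timezone.utc
extrinsics = [
    {"ts": datetime(2023, 2, 1, 10, tzinfo=UTC), "signer": "A"},
    {"ts": datetime(2023, 2, 1, 11, tzinfo=UTC), "signer": None},      # unsigned
    {"ts": datetime(2023, 1, 31, 23, 59, tzinfo=UTC), "signer": "B"},  # previous day
    {"ts": datetime(2023, 2, 1, 12, tzinfo=UTC), "signer": "A"},
]
blocks = [
    {"ts": datetime(2023, 2, 1, 0, 0, 6, tzinfo=UTC), "author": "V1"},
    {"ts": datetime(2023, 2, 1, 0, 0, 12, tzinfo=UTC), "author": "V2"},
    {"ts": datetime(2023, 2, 1, 0, 0, 18, tzinfo=UTC), "author": "V1"},
]
balance_changes = [
    {"ts": datetime(2023, 2, 1, 10, tzinfo=UTC), "account": acct}
    for acct in ("A", "C", "D", "B")
]
metrics = account_metrics(extrinsics, blocks, balance_changes, date(2023, 2, 1))
```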

The above definitions cover the active/passive distinctions in all four cases, but we expect to refine both definitions precisely, with open-source, transparent code, as new forms of active and passive activity evolve.

The open source approach taken here is of transparency and reproducibility: with exact BigQuery computations, anyone can reproduce any datapoint and improve any definition with simple code adjustments to the query form.

dataPhysicist · March 5, 2023, 8:19pm

The contributions from the folks on this thread and in the Telegram channel have been great. We created a proposal and just published it for feedback. Feel free to chime in on the discussion on Polkassembly to help move this forward.

dataPhysicist · April 24, 2023, 11:58am

The proposed solution has been implemented. The goal is to get feedback and measure engagement; check it out at web3metrics.com.
