Exploring alternatives for Substrate Telemetry

We rely on the centralized substrate-telemetry service to collect telemetry for the chain. It provides real-time data but comes with associated costs and maintenance overhead.

One alternative for collecting telemetry is a fully decentralized p2p application.

P2P Network Discovery

In the subp2p-explorer experiment, we are able to crawl the network of any substrate-based chain. After crawling the network for a few minutes, we have enough detail to build a snapshot of the p2p network, including the geolocation of peers.

Peer 12D3KooWAdHQjjtvXvkMWMKZYdrnGWG7PQ2Fy4wmUPQEXh9hvcic: Location
{ city: "Ashburn", accuracy_radius: Some(1000), latitude: Some(39.0469), longitude: Some(-77.4903), metro_code: Some(511), time_zone: Some("America/New_York") }

The tool is also capable of:

  • crawling the p2p network and collecting Identify protocol information
  • validating the connectivity of bootnodes from a chainSpec
  • submitting extrinsics directly on the p2p network

In practice, a small application could submit multiple Kademlia queries to obtain a network snapshot.
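The crawl loop could be sketched as follows. This is a simplified illustration, not the subp2p-explorer implementation: `find_node` is a hypothetical stand-in for a real libp2p Kademlia `FIND_NODE` query, and the peer and key types are simplified.

```rust
use std::collections::{HashSet, VecDeque};

/// A 32-byte Kademlia key; querying random keys spreads the
/// lookups across the whole keyspace.
type Key = [u8; 32];
type PeerId = String;

/// Hypothetical stand-in for a libp2p Kademlia FIND_NODE query:
/// ask `peer` for the peers closest to `target`.
fn find_node(_target: &Key, _peer: &PeerId) -> Vec<PeerId> {
    // A real implementation would issue the query over the network.
    Vec::new()
}

/// Crawl: repeatedly query newly discovered peers with random keys,
/// deduplicating results into a snapshot of the network.
fn crawl(bootnodes: Vec<PeerId>, random_keys: &[Key]) -> HashSet<PeerId> {
    let mut snapshot: HashSet<PeerId> = HashSet::new();
    let mut pending: VecDeque<PeerId> = bootnodes.into();
    while let Some(peer) = pending.pop_front() {
        if !snapshot.insert(peer.clone()) {
            continue; // already visited this peer
        }
        for key in random_keys {
            for discovered in find_node(key, &peer) {
                if !snapshot.contains(&discovered) {
                    pending.push_back(discovered);
                }
            }
        }
    }
    snapshot
}
```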

The p2p network currently lacks a protocol dedicated to telemetry.


We propose introducing a new p2p notification protocol that is enabled by default on all substrate-based chains, and disabled when the node is started with --no-telemetry.

The protocol uses the /genesis/telemetry/1 id string and exposes telemetry data including node version, kernel version, OS, CPU, etc. (Polkadot Telemetry).
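A sketch of how the protocol id could be derived, assuming it follows the `/<genesis-hash>/<name>/<version>` convention used by existing notification protocols such as block-announces (the exact format would be settled in the RFC):

```rust
/// Build the proposed telemetry protocol id from a chain's genesis hash,
/// following the `/<genesis-hash>/<name>/<version>` convention used by
/// protocols like block-announces. The exact format is an assumption.
fn telemetry_protocol_id(genesis_hash: &[u8; 32]) -> String {
    let hex: String = genesis_hash.iter().map(|b| format!("{:02x}", b)).collect();
    format!("/{}/telemetry/1", hex)
}
```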

Extra block information could be obtained by decoding the handshake of /genesis/block-announces/1 (BlockAnnouncesHandshake), which contains the best block and the genesis hash.
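That handshake can be decoded by hand; a minimal sketch, assuming the Polkadot field layout (a `roles` byte, a `u32` little-endian best block number, then two 32-byte hashes). In Substrate the block number type is generic, so the width may differ on other chains:

```rust
/// Decoded form of the `/.../block-announces/1` handshake. The layout
/// assumes 32-bit block numbers (as on Polkadot); in Substrate the block
/// number type is generic, so other chains may differ.
#[derive(Debug, PartialEq)]
struct BlockAnnouncesHandshake {
    roles: u8,            // role bitflags (full node, authority, ...)
    best_number: u32,     // SCALE fixed-width, little-endian
    best_hash: [u8; 32],
    genesis_hash: [u8; 32],
}

fn decode_handshake(bytes: &[u8]) -> Option<BlockAnnouncesHandshake> {
    if bytes.len() != 1 + 4 + 32 + 32 {
        return None; // unexpected length for this assumed layout
    }
    let mut best_hash = [0u8; 32];
    best_hash.copy_from_slice(&bytes[5..37]);
    let mut genesis_hash = [0u8; 32];
    genesis_hash.copy_from_slice(&bytes[37..69]);
    Some(BlockAnnouncesHandshake {
        roles: bytes[0],
        best_number: u32::from_le_bytes(bytes[1..5].try_into().ok()?),
        best_hash,
        genesis_hash,
    })
}
```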

Light Client

This service could run entirely in the browser by compiling the Rust code that collects the data to WASM. One option for doing this is smoldot, which is already used from within browser environments.

Authority Data

The authority data can be extracted via the AuthorityDiscoveryApi runtime API.
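One way to query this from a node is the `state_call` RPC with the method name `AuthorityDiscoveryApi_authorities` and empty parameters, which returns a SCALE-encoded `Vec` of 32-byte authority keys. A minimal decoder sketch for that response; the compact-length handling covers only the one- and two-byte modes, which is enough for realistic authority-set sizes:

```rust
/// Decode a SCALE compact length prefix. Only the single- and two-byte
/// modes are handled (lengths up to 2^14 - 1), which is plenty for an
/// authority set. Returns (length, bytes consumed).
fn decode_compact_len(bytes: &[u8]) -> Option<(usize, usize)> {
    let first = *bytes.first()?;
    match first & 0b11 {
        0b00 => Some(((first >> 2) as usize, 1)),
        0b01 => {
            let second = *bytes.get(1)?;
            Some(((u16::from_le_bytes([first, second]) >> 2) as usize, 2))
        }
        _ => None, // four-byte and big-integer modes omitted in this sketch
    }
}

/// Decode a SCALE-encoded `Vec` of 32-byte authority keys, as returned
/// by the `AuthorityDiscoveryApi_authorities` runtime call.
fn decode_authorities(bytes: &[u8]) -> Option<Vec<[u8; 32]>> {
    let (count, offset) = decode_compact_len(bytes)?;
    let body = &bytes[offset..];
    if body.len() != count * 32 {
        return None; // length prefix does not match the payload
    }
    Some(
        body.chunks_exact(32)
            .map(|chunk| {
                let mut key = [0u8; 32];
                key.copy_from_slice(chunk);
                key
            })
            .collect(),
    )
}
```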


The substrate-telemetry UI could be reused for this decentralized application.

Would love to hear your thoughts on this :pray:
(@tomaka @bkchr @altonen @jsdw)


I like the proposal, but if a new protocol is introduced, it will probably have to go through the RFC process.

I don’t know to what extent the information from substrate-telemetry is used, but I’m not sure this can be a full replacement. AFAIU this tool is limited to nodes it can dial successfully, and there are plenty of undialable nodes in our network, so telemetry from them wouldn’t be collected (unless they connect to this telemetry node). With substrate-telemetry, even these unreachable nodes are able to submit their telemetry information to the server, yes? Is there a plan to work around this?

As far as I’m aware, the telemetry is used for the W3F 1000 validators program, in order to make sure that validators are online and running.
However, being accessible from the public Internet is a requirement for being a conforming validator, so this can be a replacement. It would even be beneficial, as I imagine that accidentally not being accessible from the Internet is not an uncommon issue for validators.

Apart from this, I imagine that some node operators that run multiple nodes might use the telemetry in order to take a look at their nodes and check whether everything is working okay. But they really should be using Prometheus for that, and not the telemetry.

Hi everyone,

By chance, I just came across this thread and wanted to chime in. I want to bring to your attention a tool that we (ProbeLab, formerly with Protocol Labs) have built: Nebula.

The tool has supported Polkadot/Kusama/Westend for quite some time, and we have been running a DHT crawl every 30 minutes for the last 12 months on all three networks. So we are sitting on quite a lot of data but just haven’t had the time to analyze it.

The tool is similar to the linked subp2p-explorer, but instead of doing queries for random keys it structurally enumerates all peers in the DHT, starting from a configurable set of bootstrap peers. A full Polkadot network crawl takes around 4 minutes.

We gather the same information as in that “Experiment: P2P Network Visualization” and more (except the authority API parts, yet). (The visualization is great, btw.)

Nebula powers IPFS weekly reports like this recent one (discourse won’t let me post more links - see the end of this post):

github /plprobelab/network-measurements/blob/master/reports/2024/calendar-week-02/ipfs/README.md

We also produced one for Polkadot:

github /plprobelab/network-measurements/blob/master/reports/2023/calendar-week-41/polkadot/README.md

Which we discussed a few weeks back in this topic: /t/proposal-polkadot-p2p-network-health-weekly-reports/5094/6

For the IPFS network, we also discussed a /genesis/telemetry protocol a couple of times in the past and personally I’m totally in favour of something like this. Probably opt-in, but if it can be combined with some differential privacy concepts maybe even opt-out.

The ProbeLab team is keen to tailor Nebula more towards the Polkadot ecosystem. This could include Polkadot-specific logic like employing the AuthorityDiscoveryApi or trying to gather more information from, e.g., handshake data of another protocol.

Further, we could build an API to access the data and/or offer more sophisticated visualizations (the weekly reports were only an interim solution).

For that, we’re seeking funding and would like to submit a grant application to polkassembly.io. It would be great to get feedback on what other data the community would be interested in that can be extracted from the DHT/p2p layer to make the grant application as relevant as possible.

PS: This Discourse instance wouldn’t let me post more links, which is why I didn’t format them as such. I always got “An error occurred: Sorry you cannot post a link to that host.”, which is super weird as I was only linking to GitHub or this forum. That’s why the links aren’t proper links.