I’ve been exploring Polkadot telemetry for a few years now. While I’ve made some progress, I’ve also hit roadblocks, which led me to pause the project. However, I’m giving it another shot and would love your guidance!
In Polkadot telemetry, there’s a map showing the locations of active validators, built as a TypeScript application. I’ve tried reverse engineering this map to understand how the data is retrieved and displayed, but so far, I haven’t had much luck.
Using the WebSocket feed (wss://feed.telemetry.polkadot.io/feed) provides some network data, but it doesn’t seem to include detailed validator telemetry information.
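For reference, here is a minimal sketch of how one frame from that feed might be decoded. It rests on assumptions not confirmed in this thread: that each frame is a flat JSON array of alternating action codes and payloads, and that action code 5 carries a node location as [node_id, latitude, longitude, city]. Both should be checked against the telemetry frontend source. The feed may also send per-node messages only after the client subscribes to a chain. The sample frame below is made up.

```python
import json

# Assumption: action code 5 marks a "located node" message whose payload is
# [node_id, latitude, longitude, city]. Verify this against the telemetry
# frontend source before relying on it.
LOCATED_NODE = 5


def decode_frame(raw):
    """Split one feed frame into (action, payload) pairs.

    Assumes the frame is a flat JSON list: [action, payload, action, payload, ...].
    """
    items = json.loads(raw)
    if not isinstance(items, list) or len(items) % 2 != 0:
        raise ValueError("expected a flat list of action/payload pairs")
    return list(zip(items[0::2], items[1::2]))


def located_nodes(raw):
    """Return the location records found in one frame."""
    records = []
    for action, payload in decode_frame(raw):
        if action == LOCATED_NODE:
            node_id, lat, lon, city = payload
            records.append({"node_id": node_id, "lat": lat, "lon": lon, "city": city})
    return records


# Made-up frame: one unrelated message followed by one location message.
sample = json.dumps([1, [100, 1700000000000, None], 5, [42, 52.52, 13.40, "Berlin"]])
print(located_nodes(sample))
```

Keeping the decoding separate from the network connection means the parsing can be tested against recorded frames before any WebSocket client is attached.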
I’m wondering:
Is there another WebSocket feed (or a similar endpoint) that streams validator telemetry data, including geolocation details like city and latitude/longitude?
Are there tools or methods I could use to scrape or interact with the telemetry backend to extract validator-specific information (e.g., geolocations)?
I am also running my own telemetry server instance. Is it possible to modify this instance to create a custom WebSocket feed that includes the information I need? Specifically, I’d like a telemetry WebSocket that provides validator details, including city and latitude/longitude data.
My goal is to start logging validator data across Polkadot and other Substrate networks. With this data, I’d like to analyze the geographic distribution of validators and build a dashboard to visualize how decentralized the network is geographically.
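A rough sketch of the kind of analysis this could feed, assuming each logged validator record carries a country code. The record layout, the function names, and the one-third threshold are all hypothetical choices for illustration. It computes each country's share of validators, the Shannon entropy of that distribution (higher means more evenly spread), and the smallest number of countries that together exceed a given share.

```python
import math
from collections import Counter


def country_shares(validators):
    """Fraction of validators located in each country."""
    counts = Counter(v["country"] for v in validators)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.most_common()}


def shannon_entropy(shares):
    """Entropy in bits; 0 means every validator is in one country."""
    return -sum(p * math.log2(p) for p in shares.values() if p > 0)


def countries_to_reach(shares, threshold=1 / 3):
    """Smallest number of countries whose combined share exceeds the threshold."""
    running = 0.0
    for i, p in enumerate(sorted(shares.values(), reverse=True), start=1):
        running += p
        if running > threshold:
            return i
    return len(shares)


# Hypothetical sample data, not real validator locations.
validators = [
    {"id": "v1", "country": "DE"},
    {"id": "v2", "country": "DE"},
    {"id": "v3", "country": "US"},
    {"id": "v4", "country": "AU"},
]

shares = country_shares(validators)
print(shares)
print(shannon_entropy(shares))
print(countries_to_reach(shares))
```

The same functions work at city or hosting-provider granularity if the grouping key is swapped.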
Assuming this goes well, I’m also curious if it might be possible to create a mechanism to influence validator nominations to favor geographic decentralization. Does this sound like a crazy idea?
Any assistance with accessing telemetry data, modifying my own instance, or alternative approaches would be greatly appreciated. I’m excited to contribute insights into the decentralization of the network!
Thanks in advance for your help, and I look forward to hearing your thoughts.
You just touched on a topic that’s very dear to me and the whole Data team at Parity, as we collect this data mainly for debugging/statistical reasons.
It is relatively straightforward to set up, as the team put great effort into the documentation. From the README: “It queries the GeoLite2 database from Maxmind to extract country information about the IP addresses and UdgerDB to detect datacenters.” So it should give you the data that you require.
This is a huge help. I am playing with the DHT tool and it seems like I might be able to get this all working. Thanks so much! I might have more questions, so stay tuned!
Dennis from ProbeLab here. To me this sounds like a valuable project and I believe that we have the data that you are looking for. Feel free to reach out via DM or email at team@probelab.io and we could set up a call to discuss this in more detail!
That’s a lot of steps to reach the expected conclusion: validators run nodes where they want, and can’t be influenced to complicate their lives unless they are rewarded more for relocating a node to a place where they don’t want to run one because it’s more remote, slower, costlier, etc.
But I’m curious how it turns out, so it’d be great if you post your findings 2-3 months from now!
Another comment - just noticed “other Substrate networks” (missed that part before, thought you wanted to do Polkadot alone) - xx Network stores this info on-chain and ups the rewards of “remote” (it’s a funny word because it’s not the right term, but I like it because it’s intuitive) nodes.
On xx Network this is largely a solved problem: the xx Foundation occasionally updates these incentive factors when they get out of whack (e.g. someone adds a bunch of nodes in Paraguay, after a while someone highlights that, then incentives get recalculated from on-chain stats and the South America multiplier gets dialed down).
So it’s not completely decentralized and automatic, but I think it works well enough, and it’s better to have humans in the loop to prevent gaming, malicious moves or simply bugs. It has been running successfully in production for 3 years.
As an example (and FYI), ANZ-based nodes get around 30% more, and Central Europe 10% less (lots of nodes there), than the “default”. These differences are significant because xx Network works differently from (relatively) simpler chain-only validators: our main service is a mixnet, which is where the bulk of rewards come from, and it’s sensitive to latency.
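To make the arithmetic concrete, here is a small sketch of how such regional factors would change a payout. Only the two percentages come from the figures above. The base reward of 100, the region keys, and the function name are hypothetical, and this is not how the on-chain logic is actually implemented.

```python
# Factors taken from the approximate figures above; any region not listed
# falls back to the "default" factor of 1.0.
REGION_MULTIPLIER = {
    "ANZ": 1.30,             # around 30% more
    "Central Europe": 0.90,  # 10% less
}


def adjusted_reward(base_reward, region):
    """Scale a base reward by the region's incentive factor."""
    return base_reward * REGION_MULTIPLIER.get(region, 1.0)


# Hypothetical base reward of 100 units per era.
for region in ("ANZ", "Central Europe", "Elsewhere"):
    print(region, round(adjusted_reward(100, region), 2))
```

With these numbers, an ANZ node earns roughly 44% more than a Central European one (130 versus 90) for the same base reward.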