Polkadot Provider API: a common interface for building decentralized applications

:wave: Everyone!

Following my initial input in the "Developer Experience must be our #1 Priority" discussion, I’m here to elaborate on a significant advancement that I believe is crucial for the enhancement and expansion of our ecosystem: the Polkadot Provider API.

Why Do We Need the Polkadot Provider API?

Our ecosystem has a gap. We’re missing a standardized, simple convention to address some cross-cutting concerns for dApps, such as:

  • Finding available chains for dApp connection.
  • Proposing new chains for dApp connectivity.
  • Seamlessly connecting to a chosen chain via a minimal interface with the new JSON-RPC API.
  • Identifying which accounts are accessible for the connected chain(s).
  • Simplifying transaction creation by only requiring the transaction method and arguments, with the Provider taking care of the complexities (signed extensions with their respective data, signing the transaction, checking whether the transaction would raise errors, and so on).

The Impact and Opportunities of This API:

The introduction of this minimal and library-agnostic convention will pave the way for numerous interoperable opportunities:

  • Creation of PolkadotJs Alternatives: Empowering library authors to develop alternatives to PolkadotJs.
  • Wallet Flexibility: Enabling wallets to implement the Polkadot Provider API, freeing them from tight coupling to a PolkadotJs specific API.
  • Enhancer Development: Facilitating the creation of enhancers to handle transaction complexities, simplifying integration with Ledger, WalletConnect, etc.
  • Native App Support: Allowing the development of native Android/iOS apps with webviews to load compliant dApps.
  • Progressive Web App Providers: Enabling the creation of Polkadot Providers for Progressive Web Apps.
  • Seamless Wallet Integration: Allowing the smooth integration of favorite dApps into wallet interfaces.
  • And much more!

Current Limitations and the Opportunity Ahead of Us:

Our ecosystem is somewhat limited by PolkadotJs (PJS), because it forces all wallets to adapt to a less-than-ideal API for creating transactions. This limitation hinders seamless communication and interoperability, creating a roadblock for expansive and inclusive dApp and wallet development.

That being said, I have immense gratitude and respect for PJS and its creator and maintainer, Jaco. Although its current design limitations may seem evident in hindsight, they could not have been foreseen when PJS was initially developed.

Nevertheless, we’ve reached a pivotal point in building truly decentralized applications. Mass light-client adoption can only be achieved by building proper tooling that integrates with the new light-client-friendly JSON-RPC spec, something that PolkadotJs is unable to do. As a result, it’s paramount that we center our efforts around a streamlined, unified abstraction that addresses the cross-cutting concerns mentioned earlier; hence the introduction of the Polkadot Provider API.

The Polkadot Provider API will stand as a beacon of growth, inclusion, and seamless operations for our ecosystem, providing a future of enhanced interoperability, flexibility, and overall development.

Polkadot Provider API proposal:

For clarity, here are the TypeScript definitions (with some comments) encapsulating the envisioned API:

type Callback<T> = (value: T) => void
type UnsubscribeFn = () => void

interface PolkadotProvider {
  // Retrieves the current list of available Chains
  // that the dApp can connect to
  getChains: () => Promise<Chains>

  // Registers a callback invoked when the list
  // of available chains changes
  onChainsChange: (chains: Callback<Chains>) => UnsubscribeFn

  // Allows the dApp to request the Provider to register a Chain
  addChain: (chainspec: string) => Promise<Chain>
}

// The key is the genesis hash of the chain
type Chains = Record<string, Chain>

interface Chain {
  genesisHash: string
  name: string
  connect: (
    // the listener callback that the JsonRpcProvider
    // will be sending messages to
    onMessage: Callback<string>,
    // the listener that will be notified when the connectivity changes
    onStatusChange: Callback<ProviderStatus>
  ) => Promise<ChainProvider>
}

type ProviderStatus = "connected" | "disconnected"

interface ChainProvider {
  // it pulls the current list of available accounts for this Chain
  getAccounts: () => Accounts

  // registers a callback that will be invoked whenever the list
  // of available accounts for this chain has changed
  onAccountsChange: (accounts: Callback<Accounts>) => UnsubscribeFn

  // contains a JSON RPC Provider that is compliant with the new
  // JSON-RPC API spec:
  // https://paritytech.github.io/json-rpc-interface-spec/api.html
  provider: JsonRpcProvider
}

type Accounts = Array<Account>

interface Account {
  // SS58-formatted public key
  publicKey: string
  
  // The provider may have captured a display name
  displayName?: string

  // `callData` is the scale encoded call-data
  // (module index, call index and args)
  createTx: (callData: Uint8Array) => Promise<Uint8Array>
}

interface JsonRpcProvider {
  // it sends messages to the JSON RPC Server
  send: Callback<string>

  // it disconnects from the JSON RPC Server and it de-registers
  // the `onMessage` and `onStatusChange` callbacks that
  // were previously registered
  disconnect: UnsubscribeFn
}

These definitions articulate the framework and essential structure we need for this proposed direction.
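
To make the shape of the API more tangible, here is a minimal, hypothetical usage sketch. It reuses the type definitions above and assumes that a wallet or extension has already injected a `provider` object implementing PolkadotProvider; the genesis-hash constant and the JSON-RPC method shown are placeholders for illustration only:

// Hypothetical: a Provider injected by a wallet or extension.
declare const provider: PolkadotProvider

// Placeholder for the genesis hash of the chain the dApp cares about.
const POLKADOT_GENESIS = "0x…"

async function demo() {
  // 1. Discover the chains that the Provider exposes.
  const chains = await provider.getChains()
  const polkadot = chains[POLKADOT_GENESIS]
  if (!polkadot) throw new Error("Polkadot is not available in this Provider")

  // 2. Connect, wiring up the JSON-RPC message and status listeners.
  const chainProvider = await polkadot.connect(
    (message) => console.log("JSON-RPC message:", message),
    (status) => console.log("provider status:", status),
  )

  // 3. Talk to the chain through the new JSON-RPC API
  //    (method name shown only as an example).
  chainProvider.provider.send(
    JSON.stringify({ jsonrpc: "2.0", id: 1, method: "chainSpec_v1_chainName", params: [] }),
  )

  // 4. Let the Provider create and sign a transaction for the first account.
  const [account] = chainProvider.getAccounts()
  if (account) {
    const callData = new Uint8Array([/* SCALE-encoded call data */])
    const signedTx = await account.createTx(callData)
    console.log("signed extrinsic:", signedTx)
  }

  // 5. Tear everything down when done.
  chainProvider.provider.disconnect()
}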

Addressing Potential Questions

You might reasonably wonder about the practicality of creating a new library based on an interface with no current implementations. Also, won’t the new libraries built around this Polkadot Provider be unable to communicate with existing PolkadotJs wallets?

Incorporating Existing Tools

Something exceptional about the Polkadot Provider API is its minimalism, which allows existing tools to be integrated seamlessly. For example, a library could be (and will be) developed that exposes a function taking a PolkadotJs InjectedExtension as an argument. The library would then perform the necessary wiring using substrate-connect behind the scenes to produce a compliant Polkadot Provider API.
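
As a rough sketch of that wiring (reusing the types from the proposal above), the adapter could look something like the following. The function name, the way the genesis hash is obtained, and the handling of accounts are all illustrative assumptions; the real library will deal with exactly those details:

import { createScClient } from "@substrate/connect"
import type { InjectedExtension } from "@polkadot/extension-inject/types"

// Hypothetical adapter: neither the name nor the wiring below is final.
export function fromInjectedExtension(extension: InjectedExtension): PolkadotProvider {
  const scClient = createScClient()
  const chains: Chains = {}
  const listeners = new Set<Callback<Chains>>()

  const makeChain = (genesisHash: string, name: string, chainspec: string): Chain => ({
    genesisHash,
    name,
    connect: async (onMessage, onStatusChange) => {
      // substrate-connect runs the light client and forwards JSON-RPC responses.
      const scChain = await scClient.addChain(chainspec, onMessage)
      onStatusChange("connected")
      return {
        // In a real adapter the accounts would come from `extension.accounts`
        // and `createTx` would be built on top of `extension.signer`; both are
        // elided here, since hiding that complexity is the whole point of the library.
        getAccounts: () => [],
        onAccountsChange: () => () => {},
        provider: {
          send: (message) => scChain.sendJsonRpc(message),
          disconnect: () => {
            scChain.remove()
            onStatusChange("disconnected")
          },
        },
      }
    },
  })

  return {
    getChains: async () => chains,
    onChainsChange: (cb) => {
      listeners.add(cb)
      return () => {
        listeners.delete(cb)
      }
    },
    addChain: async (chainspec) => {
      // Deriving the genesis hash from a chainspec is non-trivial (it generally
      // requires asking the chain itself); a placeholder is used in this sketch.
      const genesisHash = "0x…"
      const name: string = JSON.parse(chainspec).name ?? "unknown"
      const chain = makeChain(genesisHash, name, chainspec)
      chains[genesisHash] = chain
      listeners.forEach((cb) => cb(chains))
      return chain
    },
  }
}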

Even without extensions, a Polkadot Provider can still be established internally using substrate-connect, allowing dApps to establish the necessary connections, albeit always returning an empty list of Accounts for all Chains.

Expanding to Various Environments

The versatility of the Polkadot Provider API extends to creating providers for node.js or any other environment, demonstrating its wide-reaching applicability.

Our Efforts at Parity

What are we doing at Parity to crystallize this vision?

We are in the process of developing a composable, modular, and “light-client first” alternative to PJS, fundamentally built on the Polkadot Provider API.

In addition, my team is dedicated to empowering the community by offering the essential building blocks for creating these kinds of Providers. We are actively developing two different libraries, aiming for a swift release to expedite the enhancement of the Polkadot ecosystem.

I would like to share with the community the APIs of these 2 different libraries:

@polkadot-api/light-client-extension-helpers

This library is meant to be used from within extensions and it basically encapsulates all the challenging problems that have been solved over the years in the @substrate/connect extension. The library will expose four primary components:

  • backgroundHelper: a function that must be imported and registered in the extension’s background script.
  • contentScriptHelper: a function that will handle the communication between the tab’s webpage and the extension’s background script.
  • webPageHelper: an interface meant to be used from the tab’s webpage.
  • extensionPagesHelper: a group of functions that are meant to be used from within the extension’s page.

Let’s dig deeper into the responsibilities and the APIs of these different components:

backgroundHelper

The core logic that will be running in the background. Internally, it will register an instance of smoldot, keep the relevant chains persisted in storage, keep the sync snapshots of those chains up to date, manage the active connections and bootnodes, etc. All this while communicating with the content-script and the rest of the APIs from the extension.

Its API is fairly straightforward:

export type BackgroundHelper = (
  // A callback invoked when a dApp developer tries to add a new Chain.
  // The returned promise either rejects if the user denies or resolves if the user agrees.
  onAddChainByUser: (input: InputChain, tabId: number) => Promise<void>
) => void

export interface InputChain {
  genesisHash: string,
  name: string,
  chainspec: string
}
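
For illustration, registering it from the extension’s background script might look roughly like this; the import path and the consent prompt are placeholders, and only the BackgroundHelper signature above is part of the proposal:

// Hypothetical import path; the final package entry points are not settled.
import { backgroundHelper } from "@polkadot-api/light-client-extension-helpers/background"

// Placeholder for whatever consent UI the extension provides.
declare function promptUser(message: string): Promise<boolean>

backgroundHelper(async (inputChain, tabId) => {
  // Ask the user whether the dApp running in `tabId` may add `inputChain`.
  // The contract is: resolve to accept, reject (throw) to deny.
  const approved = await promptUser(
    `Allow tab ${tabId} to add chain "${inputChain.name}"?`,
  )
  if (!approved) throw new Error("User rejected the chain")
})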

contentScriptHelper

It will be a void function that will handle the communication between the tab’s webpage and the extension’s background script.

webPageHelper

When imported from within the web-page it will expose the following interface:

import type { ProviderStatus } from "@polkadot-api/json-rpc-provider"

type Callback<T> = (value: T) => void
type UnsubscribeFn = () => void

export interface LightClientProvider {
  // Allows dApp developers to request the provider to register their chain
  addChain: (chainspec: string) => Promise<RawChain>

  // Retrieves the current list of available Chains
  getChains: () => Promise<Record<string, RawChain>>

  // Registers a callback invoked when the list of available chains changes
  onChainsChange: (chains: Callback<RawChains>) => UnsubscribeFn
}

// The key is the genesis hash
type RawChains = Record<string, RawChain>

export interface RawChain {
  genesisHash: string
  
  name: string

  connect: (
    // the listener callback that the JsonRpcProvider will be sending messages to.
    onMessage: Callback<string>,
    // the listener that will be notified when the connection is lost/re-established
    onStatusChange: Callback<ProviderStatus>
  ) => Promise<JsonRpcProvider>
}

export interface JsonRpcProvider {
  // it sends messages to the JSON RPC Server
  send: Callback<string>

  // it disconnects from the JSON RPC Server and it de-registers
  // the `onMessage` and `onStatusChange` callbacks that were previously registered
  disconnect: UnsubscribeFn
}
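
From the dApp’s point of view, consuming this interface could look roughly like the sketch below; the entry-point name and the chainspec variable are placeholders, since the exact exports are not settled yet:

// Hypothetical entry point exposed by webPageHelper.
import { getLightClientProvider } from "@polkadot-api/light-client-extension-helpers/web-page"

async function demo(myParachainChainspec: string) {
  const lightClientProvider: LightClientProvider = await getLightClientProvider()

  // Chains the extension already exposes, keyed by genesis hash.
  const chains = await lightClientProvider.getChains()
  console.log("available chains:", Object.keys(chains))

  // Ask the extension to make one additional chain available to this dApp.
  const chain = await lightClientProvider.addChain(myParachainChainspec)

  // Open a JSON-RPC connection backed by the light client
  // (the method name below is only an example).
  const jsonRpcProvider = await chain.connect(
    (message) => console.log("response:", message),
    (status) => console.log("status:", status),
  )
  jsonRpcProvider.send(
    JSON.stringify({ jsonrpc: "2.0", id: 1, method: "chainSpec_v1_chainName", params: [] }),
  )
}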

extensionPagesHelper

An interface with a set of functions to allow the extension page to manage the persisted chains, their connections and boot-nodes. Its API is as follows:

import type { GetProvider } from "@polkadot-api/json-rpc-provider"

export interface LightClientPagesHelper {
  deleteChain: (genesisHash: string) => Promise<void>
  persistChain: (chainspec: string) => Promise<void>
  getChain: (genesisHash: string) => Promise<PagesChain>
  getActiveConnections: () => Promise<
    Array<{tabId: number, genesisHash: string}>
  >
  disconnect: (tabId: number, genesisHash: string) => Promise<void>
  setBootNodes: (genesisHash: string, bootNodes: Array<string>) => Promise<void>
}

export interface PagesChain {
  genesisHash: string
  name: string
  ss58format: number
  nPeers: number
  bootNodes: Array<string>
  provider: GetProvider
}
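
A short sketch of how an extension’s own pages might drive this helper; the import path and the bootnode address are placeholders:

// Hypothetical import path for illustration only.
import { lightClientPagesHelper } from "@polkadot-api/light-client-extension-helpers/extension-page"

async function showConnectionsAndAddBootNode(genesisHash: string) {
  // List which tabs are currently connected to which chains.
  const connections = await lightClientPagesHelper.getActiveConnections()
  for (const connection of connections) {
    console.log(`tab ${connection.tabId} is connected to ${connection.genesisHash}`)
  }

  // Inspect one persisted chain and update its boot nodes.
  const chain = await lightClientPagesHelper.getChain(genesisHash)
  console.log(chain.name, "peers:", chain.nPeers)
  await lightClientPagesHelper.setBootNodes(genesisHash, [
    ...chain.bootNodes,
    "/dns/my-bootnode.example/tcp/30333/p2p/<peer-id>", // placeholder multiaddr
  ])
}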

@polkadot-api/tx-helper

A library that enables consumers to integrate their custom UIs. This approach facilitates the safe signing of transaction data by users, capturing all pertinent information from signed extensions.

Notably, whereas @polkadot-api/light-client-extension-helpers is designed exclusively for browser extensions, @polkadot-api/tx-helper exhibits a broader utility, apt for different contexts such as CLI-based programs.

Its public API:

import type { GetProvider } from "@polkadot-api/json-rpc-provider"

export type GetTxCreator = (
  // The `TransactionCreator` communicates with the Chain to obtain metadata, latest block, nonce, etc.
  chainProvider: GetProvider,

  // This callback is invoked in order to capture the necessary user input
  // for creating the transaction.
  onCreateTx: <UserSignedExtensionsName extends Array<UserSignedExtensionName>>(
    // The sender of the transaction
    from: MultiAddress,

    // The scale encoded call-data (module index, call index and args)
    callData: Uint8Array,

    // The list of signed extensions that require user input.
    // The user interface should know how to adapt to the possibility that
    // different chains may require a different set of these.
    userSignedExtensionsName: UserSignedExtensionsName,

    // The user interface may need the metadata for many different reasons:
    // knowing how to decode and present the arguments of the call to the user,
    // validate the user input, later decode the signer data, etc.
    metadata: Uint8Array,

    // An Array containing the signed extensions which are unknown
    // to the library and that require a value for the "extra" field
    // and/or additional signed data. This gives the consumer the opportunity
    // to provide that data via the `overrides` field of the callback.
    unknownSignedExtensions: Array<string>,

    // The function to call once the user has decided to cancel or proceed with the tx.
    // Passing `null` indicates that the user has decided not to sign the tx.
    callback: Callback<null | {
      // A tuple with the user-data corresponding to the provided `userSignedExtensionsName`
      userSignedExtensionsData: UserSignedExtensionsData<UserSignedExtensionsName>

      // in case that the consumer wants to add some signed-extension overrides
      overrides: Record<string, {value: Uint8Array, additionalSigned: Uint8Array}>,

      // The type of the signature method
      signingType: SigningType

      // The function that will be called in order to finally sign the signature payload.
      // It is the responsibility of the consumer of the library (who is providing this callback),
      // to decode the provided input and ensure that the data it is being asked to sign
      // matches the previously provided `callData` and the user-provided
      // `userSignedExtensions`. If there are any inconsistencies, then the returned Promise
      // should reject. Otherwise, it should resolve with the signed data.
      signer: (input: Uint8Array) => Promise<Uint8Array>
    }>,
  ) => void,
) => {
  createTx: CreateTx,
  
  // it disconnects the Provider and kills this instance
  destroy: () => void
}

export type CreateTx = (
  from: MultiAddress,
  callData: Uint8Array,
) => Promise<Uint8Array>

export type SigningType = "Ed25519" | "Sr25519" | "Ecdsa"
export type AddressType = "Index" | "Raw" | "Address32" | "Address20"

export interface MultiAddress {
  addressType: AddressType
  publicKey: Uint8Array
}

type SignedExtension<Name extends string, InputType> = {
  name: Name
  data: InputType
}

// This list is still a WIP
export type UserSignedExtensions =
  | SignedExtension<"mortality", boolean>
  | SignedExtension<"tip", bigint>
  | SignedExtension<"assetTip", { tip: bigint, assetId?: bigint | undefined }>

export type UserSignedExtensionName = UserSignedExtensions["name"]
export type UserSignedExtensionsData<T extends Array<UserSignedExtensionName>> =
  {
    [K in keyof T]: (UserSignedExtensions & { name: T[K] })["data"]
  }
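
To give a feel for how a wallet-like consumer could plug its own UI and signer into this API, here is a rough sketch. The getTxCreator function name, the provider, and the askUser/signWithKeyring helpers are hypothetical; only the types above are part of the proposal:

// Hypothetical: assume the package exports a `getTxCreator` implementing `GetTxCreator`.
import { getTxCreator } from "@polkadot-api/tx-helper"
import type { GetProvider } from "@polkadot-api/json-rpc-provider"

// e.g. obtained from a Polkadot Provider's chain.
declare const provider: GetProvider
// Placeholders for the consumer's own UI and key management.
declare function askUser(callData: Uint8Array, metadata: Uint8Array): Promise<boolean>
declare function signWithKeyring(payload: Uint8Array): Promise<Uint8Array>

const { createTx, destroy } = getTxCreator(
  provider,
  async (from, callData, userSignedExtensionsName, metadata, unknownSignedExtensions, callback) => {
    // Show the decoded call to the user and let them confirm or cancel.
    const approved = await askUser(callData, metadata)
    if (!approved) return callback(null)

    callback({
      // One entry per requested signed extension, in the same order.
      userSignedExtensionsData: userSignedExtensionsName.map((name) => {
        if (name === "mortality") return true // mortal transaction
        if (name === "tip") return 0n // no tip
        return { tip: 0n } // assetTip with no tip
      }) as any, // the tuple type is enforced in real code; `as any` keeps the sketch short
      // No overrides for the signed extensions the library does not know about.
      overrides: {},
      signingType: "Sr25519",
      // The consumer must verify the payload before signing it.
      signer: (payload) => signWithKeyring(payload),
    })
  },
)

// Later, to build a signed extrinsic from the SCALE-encoded call data:
//   const signedTx = await createTx({ addressType: "Raw", publicKey }, callData)
//   destroy()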

In Closing…

Thank you for taking the time to journey through this comprehensive post. Your attention and dedication to understanding the nuances and developments of the Polkadot Provider API are sincerely appreciated. As we work towards refining and improving these proposals, your feedback is invaluable. Please share your thoughts, suggestions, and any questions you might have. Together, we can shape the future of a brighter ecosystem.


I’m somewhat skeptical of the interface that you suggest.

Here is what I’ve noted:

  • This is a detail, but send: Callback<string> instead of send: (data: string) => void and disconnect: UnsubscribeFn instead of disconnect: () => void seem to be here just to intentionally hurt readability. Unless I have a very deep misunderstanding of the interface, send isn’t a callback and disconnect isn’t an unsubscription function. Same for onMessage and onStateChange.

  • You can’t identify a chain just by its genesis hash. It’s possible for multiple different chains to use the same genesis hash. Two identical chain specifications always describe the same chain, but apart from this it’s in practice not possible to have a “key” that uniquely identifies a chain (in theory it is, but doing it is pretty complicated).

  • Why not merge ChainProvider and JsonRpcProvider into one? You seem to split them just for the sake of splitting them.

  • If the extension user removes an account then adds it back, does the corresponding Account object get invalidated? This is really unclear. You can solve this by moving createTx from Account to ChainProvider, and indicate which account to use as an extra parameter. This will turn Account into just a data structure and thus remove all ambiguity.

  • I don’t understand the idea of “suggesting a chain”. Please keep in mind that dApps can be malicious. The extension also needs to protect itself from malicious dApps, and in case they’re sharing data dApps need to protect themselves from other dApps. I’m very skeptical of dApps sharing chains, for security reasons. It’s for this reason why substrate-connect bakes in Polkadot, Kusama, Rococo and Westend. Any other chain is shared or not purely at the discretion of smoldot, which is the only one capable of truly determining whether two chains are actually identical.

  • What does the ProviderStatus mean? If this interface is light-client-first, then you can never be disconnected from a light client. The reason why PolkadotJS thinks that it’s not connected to the light client is because of a hack we have intentionally introduced in order to avoid PolkadotJS requesting older blocks.

  • You should really specify whether promises can yield errors, and if yes in which conditions.

  • I think that it wouldn’t be a bad idea to obtain the list of accounts, be notified when the list of accounts changes, etc. through JSON-RPC calls rather than actual JavaScript functions. While in theory it’s the same, doing it through JSON-RPC calls eliminates tons of corner cases (or, more precisely, makes it clear what happens in these corner cases, whereas right now it’s ambiguous).

  • I would really suggest trying to remove all promises as much as possible, as each Promise increases complexity (due to the possibility of race conditions) by an order of magnitude. I can’t really give suggestions, because it would require answers to my previous remarks.

Hi @tomaka,

Thank you for reviewing the draft. I appreciate your feedback.

I intended to update the interfaces I mentioned earlier, but I seem to have lost editing access :sweat_smile:. Nevertheless, we’ve tested the interface and identified some improvements. Below is the refined interface proposal:

type Callback<T> = (value: T) => void
type UnsubscribeFn = () => void

interface PolkadotProvider {
  // Retrieves the current list of available Chains
  // that the dApp can connect to
  getChains: () => Chains

  // Registers a callback invoked when the list
  // of available chains changes
  onChainsChange: (chains: Callback<Chains>) => UnsubscribeFn

  // Allows the dApp to request the Provider to register a Chain
  addChain: (chainspec: string) => Promise<Chain>
}

// The key is the genesis-hash of the chain
type Chains = Record<string, Chain>

interface Chain {
  genesisHash: string
  name: string

  // it pulls the current list of available accounts for this Chain
  getAccounts: () => Array<Account>

  // registers a callback that will be invoked whenever the list
  // of available accounts for this chain has changed
  onAccountsChange: (accounts: Callback<Array<Account>>) => UnsubscribeFn

  // returns a JSON RPC Provider that is compliant with the new
  // JSON-RPC API spec:
  // https://paritytech.github.io/json-rpc-interface-spec/api.html
  connect: (
    // the listener callback that the JsonRpcProvider
    // will be sending messages to
    onMessage: Callback<string>,

    // the listener that will be notified when the connectivity changes
    onStatusChange: Callback<ProviderStatus>,
  ) => Promise<JsonRpcProvider>
}

type ProviderStatus = "connected" | "disconnected"

interface JsonRpcProvider {
  // it sends messages to the JSON RPC Server
  send: (message: string) => void

  // `publicKey` is the SS58-formatted public key
  // `callData` is the scale encoded call-data
  // (module index, call index and args)
  createTx: (publicKey: string, callData: Uint8Array) => Promise<Uint8Array>

  // it disconnects from the JSON RPC Server and it de-registers
  // the `onMessage` and `onStatusChange` callbacks that
  // were previously registered
  disconnect: UnsubscribeFn
}

interface Account {
  // SS58-formatted public key
  publicKey: string

  // The provider may have captured a display name
  displayName?: string
}

Let me address your points:

  1. On the use of Callback and UnsubscribeFn types:

    • You’re right regarding send. It indeed isn’t a callback function, and this oversight has been rectified in the latest review.
    • For onMessage and onStateChange: They are genuine callback functions. They’re designed for the consumer to supply, allowing the producer to send data back, aligning with the classic callback definition.
    • As for disconnect: The intention behind naming it an “unsubscription function” is to reflect its purpose, which is to de-register the onMessage and onStatusChange callbacks that were previously set up.
  2. Chain Identification with Genesis Hash:

    • I must admit, this was a revelation! When seeking advice from some seasoned experts at Parity on how best to identify a chain (similar to Ethereum’s chain_id and its use by the Ethereum Provider as specified here), I understood that the genesis hash could be employed for such identification. Clearly, I may have misinterpreted this. I appreciate you pointing this out. It’s crucial because it might have inadvertently influenced some API decisions in another library. We’ll certainly need to reevaluate how we can uniquely identify a chain.

    • EDIT: Upon some consultation with domain experts, it appears that the most accurate method to uniquely identify a chain is by utilizing a combination of (hash_of_forked_block, block_number_of_forked_block). Meaning that: for chains that haven’t experienced any forks, the identifier would be (genesis_hash, 0). We could represent this information as a hexadecimal string, where a non-forked chain will have a 32-byte long hexadecimal string representing the genesis hash. However, for a forked-chain, its identifier will be longer than 32 bytes. This extra length will be attributable to the compact encoded block number, appended to the hash of the forked block. This mechanism ensures that every chain gets a distinct identifier. This approach not only provides a unique identifier for each chain but also allows for easy differentiation between original and forked chains based on the length of the hexadecimal string. What do you think?

  3. Merging ChainProvider and JsonRpcProvider:

    • EDIT: they’ve now been merged. Thanks @tomaka!
  4. Account Object Invalidation:

    • You’ve touched upon an important change we’ve recently integrated. The suggestion to shift createTx from Account to ChainProvider and specify the account as an additional parameter resonated with us. This not only makes Account a pure data structure but also alleviates the ambiguity you pointed out. The latest proposal encapsulates this change. Please take a moment to review it and let me know if this clears things up.
  5. Concept of “Suggesting a Chain”:

    • The idea behind “suggesting a chain” primarily serves to shield users from potentially harmful dApps. To illustrate: imagine a versatile dApp built for both Kusama and Polkadot. The dApp can allow users to select their preferred network. When a user opts for Polkadot, the dApp checks the availability of the Polkadot relay chain (which is likely present) and then the collectives parachain. If the latter is absent, it prompts the Provider to add it.
    • It’s crucial to note that proposing a new chain doesn’t automatically mean the Provider will save or distribute that chain to other dApps. User consent is paramount. If the Provider is, for instance, an extension, it would ideally solicit user approval before connecting to a new chain. Thus, users hold the final decision on chain persistence and sharing across dApps.
    • As a hypothetical example, many dApp users (myself included) would be inclined to persist the Polkadot collectives parachain in their provider. Yet, it’s entirely feasible for a Provider to neither retain nor share any user-added chains with other dApps.
  6. Regarding ProviderStatus:

    • Your interpretation is understandable. The ProviderStatus pertains not to the state of the light client but to the communication medium enabling the JsonRpcProvider. Examples range from a WebSocket connection being terminated to a Worker process being unexpectedly halted.
    • A connected status implies the JsonRpc API interface is primed for messaging. Conversely, a disconnected status signals that:
      a) No mechanism is attending to the messages dispatched via send.
      b) The onMessage callback will remain inactive.
    • Essentially, it notifies the consumer about the unavailability of the medium and perhaps prompts them to initiate a fresh connection. Also, a ‘disconnected’ state is irreversible. I do recognize the importance of clarity in the spec on this, and maybe designations like ready/halted might be more intuitive.
  7. Promise Error Specifications:

    • Absolutely! This oversight will be addressed. Comprehensive documentation detailing potential errors is forthcoming. Thank you for flagging it.
  8. Advocating for JSON-RPC Calls:

    • While I acknowledge the consistency and predictability that JSON-RPC calls offer, especially when handling known errors, could you elucidate on the additional corner cases these calls could potentially rectify?
  9. Minimizing Promises:

    • I concur with your perspective on the complexity introduced by promises, especially when considering race conditions. In response, only two functions now return promises, notably connect and createTx, given their inherent asynchronous operations. The polling of accounts and networks has been restructured to be “synchronous”, eliminating the need for promises there.

Your keen scrutiny has been pivotal in refining this interface. I’m eager to hear your thoughts on these clarifications.


Well, I think I’m a domain expert as well.

You could indeed identify a chain like that, but smoldot (and clients that support warp syncing in general) would have absolutely no way to know whether they’re on the right fork. They would have no way to make sure that the chain that you want to connect to is indeed the one you’re connected to.

At the moment there’s no way to relate old blocks to the head of the chain apart from downloading every single block.
Maybe this will change with Beefy, but Beefy has no clear timeline and it’s also not clear to me exactly what would be possible or not.

While it might require more efforts, what I would do IMO is:

  • Remove these concepts of connected/disconnected (consider that you are always connected), even for remote JSON-RPC nodes.
  • For JSON-RPC nodes, add a “local proxy” (I’m talking about just code, not a proper server) that parses JSON-RPC requests and handles the connected/disconnected states.

While this wouldn’t really be feasible with the old JSON-RPC API, the new JSON-RPC API is completely capable of doing this (otherwise there’s a design flaw and an issue should be opened).

For example, if the dApp calls chainHead_follow, the “proxy” would immediately return a subscription ID but wait until it is connected to the JSON-RPC server in order to generate the initialized event. If later the JSON-RPC server disconnects, it can generate a stop event.

It seems to me that handling being connected and disconnected would add a ton of complexity to dApps, and hiding this behind the JSON-RPC API would make things way easier.

Sorry, I actually don’t remember what I had in mind. I do think that the whole connect/addChain system has race conditions and corner cases, but when it comes to accounts I think it should be good.

The design of the connect/addChain system is highly dependent on the next point:

Since there’s no 1<->1 correspondence between chain specifications and chains, you can’t really do that.

If a dApp wants to connect to a chain, you have no way to know whether this chain is equal for example to Polkadot. You can think of a chain spec as a JavaScript NaN: they all compare different to each other.

I’m busy with other things at this very moment and I don’t really have a solution on the top of my head right now to suggest to you.

It seems we might have a communication gap, so let’s clarify.

The main purpose of introducing the “chain-id” concept, denoted as (hash_of_forked_block, block_number_of_forked_block), is to enable consumers to quickly determine if a Provider already supports the desired chain. We both agree that the current format effectively achieves this.

To delve deeper, it’s up to the Provider to associate this “chain-id” with its corresponding chain-spec. For instance, a Provider implemented using substrate-connect might “natively” support the so-called “WellKnownChains”. It achieves this by associating their genesis hashes with their respective chain-specs.

The pivotal query here is: Can a Provider deduce a “chain-id” from a provided valid chain-spec?

For non-forked chains, the answer is straightforward: “yes”. The identifier in this case would be the genesis-hash, which is accessible either directly within the chainspec or through a storage query, specifically System.BlockHash(0).

However, a more nuanced issue arises: Can we determine from a valid chain-spec if it corresponds to a forked chain? If yes, can we also extract the latest forked block hash/number from it?

While I’d like to believe that this information can be extracted directly from the chain-spec, I must admit I lack firsthand knowledge of the chainspec of a forked chain. Given that you have more expertise in this domain, perhaps you can shed light on this.

If it turns out that the Provider cannot directly deduce the “chain-id” from a chain-spec, considering the rarity of forks in substrate, I’d propose a pragmatic approach: We either:

  1. Exclude support for forked-chains entirely.
  2. Require consumers to supply the forked block number/hash as an additional, optional argument when using the addChain call with the chainspec.

This approach seeks to strike a balance between utility and practicality.

After thorough contemplation and experimentation with the proxy concept you proposed, I’ve drawn the conclusion that integrating a Proxy within the Provider is prudent. However, I advocate for a higher-level proxy rather than the granular approach you suggested. Here’s my rationale:

The connect function’s immediate return of a connected JsonRpcProvider, enabling consumers to instantaneously dispatch messages, is commendable.

Yet, I hold reservations about the Provider being intimately bound to the JSON-RPC spec’s API. This binding should be relegated to distinct libraries built atop the JSON-RPC spec that naturally specialize in creating such low-level proxies. Whenever the spec undergoes revisions, these libraries will inevitably need updates. In contrast, the Provider should be insulated from such changes.

Instead, I envision the Provider designing a proxy whose sole task is buffering the send/createTx messages until the communication channel is primed for message reception. Consequently, we can reimagine the current “onStatusChange” callback as a streamlined onHalt void-callback:

  connect: (
    onMessage: Callback<string>,
    onHalt: Callback<void>,
  ) => JsonRpcProvider

Through this revamped Proxy design, we can drop the ProviderStatus type. Plus, the connect function would synchronously yield a JsonRpcProvider, eliminating the cumbersome Promise. Furthermore, the singular onHalt callback would equip libraries (which already monitor the requests and subscriptions) with the capacity to reject or error out when necessary.
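
A minimal sketch of that buffering proxy, assuming a hypothetical openTransport function that asynchronously establishes the real channel (a WebSocket, a worker port, a smoldot chain, etc.); none of these names are part of the proposal:

type Callback<T> = (value: T) => void

// Assumed shape of the underlying, asynchronously established channel.
interface Transport {
  send: (message: string) => void
  close: () => void
}
declare function openTransport(
  onMessage: Callback<string>,
  onHalt: () => void,
): Promise<Transport>

// Synchronously returns a sender that buffers messages until the transport is ready.
function createBufferingProxy(onMessage: Callback<string>, onHalt: () => void) {
  let transport: Transport | null = null
  let halted = false
  const queue: string[] = []

  openTransport(onMessage, () => {
    halted = true
    onHalt()
  })
    .then((t) => {
      if (halted) return t.close()
      transport = t
      queue.forEach((message) => t.send(message))
      queue.length = 0
    })
    .catch(() => {
      if (!halted) {
        halted = true
        onHalt()
      }
    })

  return {
    send: (message: string) => {
      if (halted) return
      if (transport) transport.send(message)
      else queue.push(message)
    },
    disconnect: () => {
      halted = true
      transport?.close()
    },
  }
}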

I genuinely hope we can establish a reliable method to derive a unique “chain-id” from a chain-spec, as previously outlined. If the primary obstacle to this is “forked-chains”, then I’m willing to accept the previously discussed compromises.

Thank you once more, @tomaka, for your invaluable insights and feedback. Truly appreciated! :pray:

I strongly disagree with isolating the provider.

You take as assumption the idea that the spec will undergo revision and change often, and you conclude that it would be extra effort for you and thus decide this should be someone else’s problem.

The one and entire point of the JSON-RPC specification is for the API to be stable. In other words, the entire objective of the API is to be the opposite of your assumption.

To articulate it differently: It’s essential to distinguish between the transport and application layers, rather than conflating them.

Following your proposed method, the consumer of the Provider would face ambiguity. They’d be uncertain if a dropped event occurred due to an oversight in managing unpinned blocks, or due to a hiccup in the transport layer.

While the end consequence might appear identical—resulting in errors for all active requests or subscriptions—it’s crucial to pinpoint the root cause. Libraries constructed over the Provider must clearly indicate to consumers whether the error arose from a transport layer disruption or if it was genuinely a dropped event emitted by the JSON-RPC server. For reference, consider how it’s handled in the @polkadot-api/substrate-client.

I’m going to explain the problem.

Blockchains such as Polkadot, Kusama, etc. can fork due to disagreements between nodes about which block is valid or not (it happened once in the history of Polkadot and there’s a post mortem about this somewhere online).
That’s not the problem I’m talking about here. Smoldot will always consider one fork to be valid and the other to be invalid. It might not consider the same fork to be valid as the rest of the network does, but there’s nothing we can do about that, it’s just the nature of blockchains. This problem can be ignored.

The actual problem we’re dealing with here is long range attacks.
It is possible for the same validators to create two different valid chains, and keep one of the chains hidden for 7 to 28 days (depending on the chain), then reveal the hidden chain. These two different valid chains are both on completely equal footing. There’s no technical way whatsoever to prefer one chain to the other.

Fortunately, in order to suffer from these long range attacks, you need to have been disconnected for 7 to 28 days. If you are online when the hidden chain is revealed, you know that it’s an attack, and you just ignore the hidden chain. If, however, you were completely offline for a month and you reconnect, you will see two chains which, again, are on completely equal footing, with no way to choose which one is the correct one.

When it comes to chain specifications, a chain spec can point to a chain either before the fork, or after the fork. If it points to after the fork, you can determine which fork.
But of course, the chain spec doesn’t tell you that there might exist a fork. You don’t even know that there’s a fork, as this fork is hidden by the attackers. In fact, there might be thousands or tens of thousands of forks.

In the API that you propose, a dApp can add a chain named “banana” and pass a chain spec to one fork. Later, a different dApp wants to connect to “banana” but actually wanted to connect to a different fork, but because the chain spec has been saved by the extension, it connects to the fork of the first dApp.
The first dApp has effectively redirected the second dApp to a potentially malicious chain.

The chain specification of the second dApp might actually point to after the fork, but as I mention, there’s no technical way to make two blocks relate to each other. There’s no way to know if two blocks are ancestors of one another.

Now that this is explained, I’m answering you:

You simply can’t create a “chain id” as a (hash_of_forked_block, block_number_of_forked_block) because you have absolutely no way to determine what the “forked block” is. Most of the time, you don’t even know that there exists a fork! You could bypass this problem by indicating the hash of every single block of the chain, but we won’t do that for obvious size reasons.

If a dApp says “I would like the chain whose genesis hash is <genesis hash of Polkadot>” and that the provider associates this to Polkadot, there’s no problem at all.

The problems arise if a dApp is able to suggest chains. As mentioned above, a dApp can suggest a malicious chain, and there’s absolutely no way whatsoever for the provider to know that it’s malicious, even with the help of non-malicious dApps that want to connect to the same chain.

There’s no such thing as a “forked chain” and a “non-forked chain”. The two chains are on a completely 100% equal footing. The core of the problem is precisely that it’s not possible to choose between these two equal chains.

The only way to bypass the problem is for the provider to choose which chain is the correct one.
This is what we’re doing with the automatic PRs in the substrate-connect repo. Normally, in addition to these PRs, we’re supposed to release a version of substrate-connect at most every month in order to avoid these attacks; this is not done in practice, but in theory it should be.

But again, this only works for Polkadot/Kusama/Rococo/Westend. If a dApp is able to suggest chains, then this solution straight up doesn’t work.

Furthermore, this solution puts the trust in the hands of the provider. It only works because substrate-connect is created by Parity and Parity is trusted to not be malicious. If the provider can be “anything”, then this doesn’t work either.

Who says that?
It might be useful to know when an issue comes from the connection with the JSON-RPC server, but for a light client this is completely irrelevant as there’s no such thing as a transport layer.
If the interface is meant to be light-client-first, then this additional transport layer thing gets in the way and is IMO not useful.

That’s already the case. The light client might generate a stop event if it disconnects from all its peers then reconnects a few minutes later.

Another example is in the case of a load balancer or reverse proxy, the load balancer or reverse proxy might generate a stop event if the server behind it shuts down. In the case of a load balancer, this can be used in order to shut down servers when the load diminishes by forcing the JSON-RPC clients to re-subscribe and thus to connect to a different server.

stop simply means “unable to track the chain”. It intentionally doesn’t mention why in order to give more freedom to the implementation.

Right! My bad! I forgot that the stop event already serves both purposes :see_no_evil: Thanks for pointing this out! Yes, I agree, we should also get rid of the onHalt callback and the Provider should proxy the messages like you suggested. For sure! Thanks @tomaka!

Firstly, I appreciate the effort you’ve made in elaborating on the issue at hand. Your detailed explanation has been incredibly insightful.

Clarifying the ‘addChain’ Method:

It’s essential to clarify that the addChain method doesn’t equate to the Provider permanently storing the supplied chain for subsequent use. Instead, it offers a channel for a dApp to notify the Provider about its intention to connect to a new chain not currently on the Provider’s list, along with the chain’s specifications.

For the vast majority of providers, including those designed for environments like node.js or our in-progress “legacy provider,” there’s no intention to ever retain the chains requested by dApps using addChain. These providers simply facilitate the connection by returning a Chain interface, but they won’t persist or share the chain with other dApps.

EDIT: in retrospect, addChain should be renamed to getChain.

User Consent and Trust Issues:

However, certain advanced providers, such as well-crafted extensions, might present users with a choice: whether they trust a dApp’s request to connect to a new chain, and whether they permit the Provider to store this chain-spec for ongoing updates. The Provider must ensure up-to-date checkpoints and restrict access to this chain-spec, offering it only to those dApps specified by the user (identified by their Origin).

This approach requires a significant level of trust from the user, believing that the Provider is capable and reliable enough to prevent long-range attacks.

Preventing Long-Range Attacks:

Is it feasible for a Provider, such as a browser extension, to be engineered to avert long-range attacks effectively? I believe it’s not only achievable but also advantageous, and here’s how it could work:

A “browser Extension Provider” should relentlessly monitor and regularly refresh the checkpoints of the stored chain-specs. If the checkpoints become outdated, the provider must act decisively by purging these chains; this would be reflected in an empty list response from the getChains function. Furthermore, if that happened, it would be sensible for such a Provider to prompt users to designate a trusted source for obtaining reliable, current chain-specs.

In essence, the Provider bears the crucial responsibility of ensuring that any chains offered through getChains or onChainsChange are secure and possess the most recent checkpoints.

Chain-Specs Scoping by ‘Origin’:

Moreover, “Extension Providers” could enhance security by scoping their persisted chain-specs by Origin. When users are confronted with the decision to store a chain-spec, they should have the autonomy to dictate its accessibility: either restricting it to the initiating dApp’s Origin or allowing its use across different Origins.

Addressing Potential Risks:

I understand the apprehension regarding this proposal, given that a cunning dApp could still introduce a harmful chain-spec, which an unsuspecting user might authorize. To mitigate this, extensions should, by default, restrict the chain-spec to the Origin of the requesting dApp. Only with explicit user consent should the extension broaden the chain-spec’s availability to other dApps.

In the end I feel like the API is evolving to what substrate-connect proposes: one function to connect to a well-known chain given its name, and one function to connect to a chain given a chain spec.

I think it’s fair to offer as well-known chains only the “official” Polkadot relay chains (Polkadot, Kusama, Rococo, Westend) and not care about other standalone chains.

I’m a bit scared of the fact that something this complex is relying purely on documentation.

Since it’s a complex problem, my gut tells me that third parties who implement the Provider interface will simply ignore the whole thing and store chains.

FYI dApps served through IPFS or similar won’t have a proper Origin defined.

I’m a bit wary of relying on the Origin when the (future) “primary” way of serving dApps doesn’t have this header.

The only way to do this is how substrate-connect does it: connect to the chain in the background to update the checkpoint, plus periodically ship versions with an updated trusted checkpoint.

Following feedback from @tomaka, I’ve revisited and revised the PolkadotProvider interface’s initial draft. The updated version is simpler and more intuitive. Here’s the latest TypeScript interface, accompanied by a few clarifications:

type Callback<T> = (value: T) => void
type UnsubscribeFn = () => void

interface PolkadotProvider {
  // Retrieves the current list of available Chains
  // that the dApp can connect to
  getChains: () => Chains

  // Allows the dApp to request the Provider access to
  // a Chain not present in the list of available Chains
  getChain: (chainspec: string) => Promise<Chain>
}

// The key is the "chainId" of the chain.
type Chains = Record<string, Chain>

// `chainId` explanation:
// (hash_of_forked_block, block_number_of_forked_block)
// is the proper way of uniquely identifying a chain.
// Meaning that: for chains that haven't experienced any
// forks, the identifier would be (genesis_hash, 0).
// We represent this information as a hexadecimal string,
// where a non-forked chain will have a 32-byte long
// hexadecimal string representing the genesis hash.
// However, for a forked-chain, its identifier will be
// longer than 32 bytes. This extra length is attributable
// to the compact encoded block number, appended to the
// hash of the forked block.

interface Chain {
  chainId: string
  name: string

  // it pulls the current list of available accounts
  // for this Chain
  getAccounts: () => Array<Account>

  // registers a callback that will be invoked whenever
  // the list of available accounts for this chain has
  // changed. The callback will be synchronously
  // called with the current list of accounts.
  onAccountsChange: (
    accounts: Callback<Array<Account>>
  ) => UnsubscribeFn

  // returns a JSON RPC Provider that is compliant with
  // the new JSON-RPC API spec:
  // https://paritytech.github.io/json-rpc-interface-spec
  connect: (
    // the listener callback that the JsonRpcProvider
    // will be sending messages to
    onMessage: Callback<string>,
  ) => JsonRpcProvider
}

interface JsonRpcProvider {
  // it sends messages to the JSON RPC Server
  send: (message: string) => void

  // `publicKey` is the SS58-formatted public key
  // `callData` is the scale encoded call-data
  // (module index, call index and args)
  createTx: (
    publicKey: string, callData: Uint8Array
  ) => Promise<Uint8Array>

  // it disconnects from the JSON RPC Server and it
  // de-registers the `onMessage` callback that was
  // previously registered
  disconnect: UnsubscribeFn
}

interface Account {
  // SS58-formatted public key
  address: string

  // public key of the account
  publicKey: Uint8Array

  // The provider may have captured a display name
  displayName?: string
}

Latest Changes & Rationales:

  1. Removal of the onChainsChange method: The decision to remove this method was based on the fact that reacting to changes in the list of available chains is likely an infrequent need. Moreover, even if a particular chain is absent, the consumer can always connect to their own via getChain.

  2. Renaming addChain to getChain: The original name, “addChain”, implied that the Provider was responsible for saving the chain for future reference. However, that’s not the intended behavior. Although some providers might offer users the option to save a chain, this approach has inherent risks. The new name better reflects the method’s true purpose without inadvertently suggesting a potentially risky behavior.

  3. Synchronous return for connect: The connect method now returns a connected JsonRpcProvider interface synchronously.

  4. Asynchronous Nature of getChain: The getChain method can’t instantly provide the Chain interface due to tasks such as determining available accounts for a provided chain. While it’s conceivable to return an initial empty list and update it later through the onAccountsChange callback, this might complicate the developer experience. Plus, there might be situations where user consent is required, making an asynchronous approach more suitable, as that would produce a rejected promise with a known error.

  5. List of Known Errors: I’m in the process of compiling a comprehensive list of potential errors for better error handling.
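
To illustrate how a dApp would consume this revised interface end to end, here is a short, hypothetical sketch; the injected provider, the chain-id constant and the chainspec fallback are placeholders:

// Hypothetical: a Provider injected by a wallet, extension, or enhancer.
declare const provider: PolkadotProvider
// Fallback chainspec, only used if the Provider does not already expose the chain.
declare const polkadotChainspec: string

// Placeholder for the chain's chainId (its genesis hash, for a non-forked chain).
const POLKADOT_CHAIN_ID = "0x…"

async function signSomething(callData: Uint8Array) {
  const chains = provider.getChains()
  const polkadot =
    chains[POLKADOT_CHAIN_ID] ?? (await provider.getChain(polkadotChainspec))

  // Account discovery is synchronous in this revision.
  const [account] = polkadot.getAccounts()
  if (!account) throw new Error("no account available for this chain")

  // `connect` now returns a ready-to-use JsonRpcProvider synchronously.
  const jsonRpcProvider = polkadot.connect((message) =>
    console.log("JSON-RPC message:", message),
  )

  // The Provider takes care of signed extensions, signing, etc.
  const signedTx = await jsonRpcProvider.createTx(account.address, callData)
  console.log("ready to broadcast:", signedTx)

  jsonRpcProvider.disconnect()
}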

That’s fair. In the end this Interface is meant to provide the API of substrate-connect plus segregated accounts per consensus and a non-leaky createTx function that hides the complexities of how to create an extrinsic for the underlying chain.

It won’t. The purpose of @polkadot-api/light-client-extension-helpers is to provide a package that encapsulates all these complexities so that other extensions can have all these cross-cutting concerns already solved. In fact, in the coming weeks the substrate-connect extension itself will be implemented with this package.

That’s fair. Yet, again, that’s just a mitigation, not a solution. Ultimately, it would be the responsibility of the user to make sure that they don’t add malicious chains into their provider.

I agree.

I don’t know if you skipped my explanations above, but I repeat once again: you can’t do that. This (hash_of_forked_block, block_number_of_forked_block) thing is just wrong. You simply can’t even know whether a fork exists.


Aside from this, you can make the interface even simpler by turning getAccounts, onAccountsChange, and createTx into JSON-RPC functions.

It’s debatable whether this is a good idea, as there are pros and cons, but if it was up to me I would definitely do that, as I think that simplifying the interface is worth it.

I understand your concerns regarding the (hash_of_forked_block, block_number_of_forked_block) structure. But I’d like to offer a perspective based on my experiences and the potential of the Polkadot Provider API:

  1. Local Development Forks:

    • For developers working on a local fork of networks like Polkadot, Kusama, Westend, etc., the ability to test their dApp against this local fork is crucial.
    • With the Polkadot Provider API, developers can utilize an enhancer (a higher-order function) that takes the standard/injected Polkadot Provider and returns an enhanced one which integrates their local fork into the list of recognized chains (see the sketch after this list).
    • This means that while canonical chains communicate with smoldot, the local fork connects to the developer’s local node.
    • How do developers differentiate between the canonical and local fork? The canonical one uses the 32 bytes of the genesis hash. However, developers are aware of the block at which their local fork was created, hence they can identify its chain-id.
    • I’ve had to navigate the complexities of working with local Ethereum forks. This proposed method is significantly more efficient than dealing with Ethereum’s chain-id system.
  2. Sovereign Substrate Chain Integration:

    • Imagine a sovereign substrate chain. They might want to offer their developers the benefits of the Polkadot Provider API’s tools.
    • These substrate chains can offer a library consisting of a Polkadot Provider enhancer. This enhancer would modify the original Polkadot Provider to support their chain.
    • If, for some reason, such a chain faces issues and needs to fork (a problem which could have been avoided if they had been a parachain, but who am I to judge?), their enhancer can adjust and provide the correct chain-id post fork.
  3. Limitations and Opportunities:

    • I acknowledge that smoldot might not support such a “chain-id” directly, of course. Smoldot can only give me the genesis hash, I get it.
    • However, the Polkadot Provider API’s flexibility means that other providers or enhancers could incorporate forked chains when necessary.
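
As promised in point 1, here is a sketch of such an enhancer (all names are hypothetical, and the interface assumed is the latest one proposed above):

// Hypothetical enhancer: wraps an injected PolkadotProvider so that one extra
// chain (e.g. a local development fork backed by a local node) shows up in the list.
const withLocalFork =
  (localForkChain: Chain) =>
  (provider: PolkadotProvider): PolkadotProvider => ({
    getChains: () => ({
      ...provider.getChains(),
      [localForkChain.chainId]: localForkChain,
    }),
    // Everything else keeps going through the original Provider; a fuller
    // enhancer could also intercept requests for the fork's own chainspec.
    getChain: (chainspec) => provider.getChain(chainspec),
  })

// Usage (dev-only): the developer builds `localForkChain` however they see fit,
// e.g. a Chain backed by a WebSocket connection to their local node, with a
// chainId derived from the fork point:
//
//   const provider = withLocalFork(localForkChain)(injectedProvider)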

In conclusion, while the vast majority of times the chain-id will align with the genesis hash, the flexibility to accommodate local or sovereign forks is quite valuable IMO. This makes the (hash_of_forked_block, block_number_of_forked_block) structure not just relevant, but potentially transformative.

I’m telling you that doing this is just plain wrong, it’s not a matter of usability or trade-offs here.

Thanks for the info, Josep.

My team and I are developing a JS bundle that uses PolkadotJS. The major issue we’ve hit is the size of the bundle after including PJS + deps. The bundle is for web pages and apps and not related to browser extensions. Therefore, it would be nice if some of the npm packages you develop are usable in the browser and not solely focused on browser extensions.

We had planned to use the smoldot JS integration for this but I suppose what you propose will be a higher level version of this. Is that right?