Hyperfridge - ZK Web Proofs for traditional financial backends (TradFi)

Hi there!

We want to share our ongoing work on “hyperfridge” and ask for comments.

Today, blockchain integrates with the fiat and banking world through stablecoins and crypto exchanges. Hyperfridge adds a new technology layer to TradFi so that the ecosystem can use any data stored on traditional backends in a trustless manner, using zero-knowledge technology. It's possible to create proofs of humanity, nationality, and name, implement payment systems, and trigger TradFi payments directly from smart contracts (“programmable money”).

We see Polkadot as a perfect fit because of its concept of “off-chain workers”. But the proofs can be used on any chain and integrated with any wallet (or client), making Polkadot an important pillar for other ecosystems as well.

Check out these links - the first one gives a high-level description with screenshots of how the end user may use the service - but note that we want to provide all code as a public good. The other links go into more detail - and contain a link to our implementation using Risc0 ZK technology.

Thanks for comments!
Walter


How big are the proofs? Is that unwrapped or wrapped?

How long is the prover time on a phone or whatever?

What is the value of risc0 here? It’s only for privacy I guess?

It’s not really latency saving since the transaction could be aborted by the sender, right?

There is a simple but not-really-private solution: you post the camt53 messages directly onto the parachain, but only when required by the receiver. You make the receiver escalate some dispute before this happens, so the receiver loses a similar amount of on-chain tokens if the sender's camt53 messages verify on-chain, i.e. the receiver can cheat by disputing only if they believe the sender would not sacrifice privacy. It's basically a tiny state channel.

I doubt this dirty solution brings any advantages over the zkp though, because even if the zkp takes a lot of CPU time, the sender could always use an unused phone, tablet, or laptop for doing that proof. And disputes are terrible UX.


Hi - thanks for your questions!

Proof size:

For implementing REST-API proofs (e.g. Stripe) we are currently playing with TLS Notary (web proofs), where we expect different numbers, but we do not have an implementation or measurements yet.

Proof time - see the first link above. On a normal laptop it takes a couple of hours for one proof, but we expect a couple of minutes with hardware acceleration. I tested on my Lenovo Legion 5 gaming laptop, but it had too little memory to run the proof of our implementation. Our implementation consumes roughly 35 million cycles.

Why risc0: We looked into systems like Circom for quite a while to add soundness to the data, but risc0 was the first framework which could handle a complex protocol like EBICS, where you need to parse XML and unzip the payload. We already had a system that works with Ethereum and Substrate, but without ZK. So you needed to trust the backend - more precisely, the client which is fetching data from the banking backend. But this is more an integration use case, because it is easy to fake the payload (data from the banking backend). Risc0/ZK adds “proof of computation”, meaning you can verify how the data was fetched and finally trust the data. If the backends signed their data, we would not need all of this (if we ignore privacy).

Abort Transaction:
Once the camt53 is issued by the bank, the transaction is booked - you can only undo it with a reverse transaction. A daily statement is a legal document and represents “finality” on the banking ledger. Daily statements (camt53) are produced at end-of-day processing, so latency is always next (bank working) day. But I am not sure how this will be with the new “immediate settlement”, where a wire transfer takes seconds, not a day. This new protocol element is currently being rolled out throughout the EU; my bank in Switzerland will implement it in August this year, then I will have a closer look. My guess is that the bottleneck will then be proof generation time, not the banking backend any more.

Camt53 on parachain:
You can fake the camt53, because the payload in EBICS (in that case the camt53) is not signed by the bank… so there is no value in doing it that way. Also, you would see all other transactions (names, amounts, etc.), which would be a real bummer for privacy. Personal financial data is in the same category as personal medical data, so we need privacy there.

Disputes:
Yes - if we can trust the data, then disputes would be a matter of application logic. ZK makes it possible to process only sound, final data which cannot be reverted - that's the actual idea. We want to create a “wrapper” which provides exactly this property of soundness. A banking backend (transaction data, balances, buying/selling fiat or other assets) could be used in similar ways as when we bridge blockchain protocols, where we can rely on the finality of data. The CPU time is an issue, but that is just how good it can get currently, I assume.

Let me know if I have covered your questions.
Cheers, Walter


I’d missed this diagram with witness nodes. If the camt53 is not signed, then you need witness nodes that say they received packets that decrypt to the camt53 from the right IP address?

A TLS Notary node can attest to where they think packets came from, but afaik they need not even know the packets’ contents. I’d expect the attestations to be separate from the zk proof of decrypting and processing the packet.

Any idea if TEEs can attest to packet provenance like that? If so, that adds something, and the TEE alone may be useful if the value is not high.

Can you receive the camt53 over email? That’s not private, but it’s much easier to verify - just a DKIM signature by Google.

If normal users cannot generate the proofs, then there is not too much privacy here in the first place, although I guess many users could trust Amazon EC2. It’s useful for launching a small exchange or market maker or other small institution that’ll run provers for themselves, and can amortize the hardware costs.

Another crazy idea: What if the sender cares about privacy vs the public blockchain, but not about privacy vs the recipient? You could have TLS notaries sign Merkle roots of the encrypted packets, which the sender places on-chain and reveals to the recipient. The sender also encrypts the keys to the recipient on-chain, along with the indexes into the streams. It’s then trivial for the recipient to provably decrypt the keys and decrypt the payload on-chain, but if the sender sent the funds then this gains the recipient nothing, and likely costs them some fee. Anyways, disputes suck for UX like I said above, but if the proofs take so long then maybe…

Yes - a witness is needed. In the current implementation the witness sees the camt53 in cleartext, which is not ideal. That is why we are looking into TLS Notary, where this is not the case - and I am quite confident we can take the “trick” of TLS Notary to achieve the same thing with EBICS. For web proofs based on REST APIs we could just use TLS Notary - but either way a notary is needed, which is implemented as an off-chain worker in a TEE, where the address of the OCW is the witness/notary key. The sealed code for proof generation and the commitments prove that the data was generated for the right client and the right bank; the bank’s keys are public. Anyway - in the case of EBICS we probably cannot use TLS Notary out of the box, because the payload sits inside the EBICS envelope (XML), but that’s what I am looking at now.

Camt53 via Email?
That would work, but it assumes that a bank is doing that - the banks I know do not offer it. That would indeed make life much easier. There is a project (ZK Email) which does similar things - this podcast discusses it: Episode 302: ZK for web2 interop with zkLogin & ZK Email - ZK Podcast. Just fyi - hyperfridge also works with pain files (payment instructions) to create new transactions, so that you can trigger wire transfers.

TEE: That’s what risc0 provides, plus the implementation of the proof. The algo is sealed; you know exactly what it is doing. But a TEE is needed as long as we have private inputs. Then we can trust the computation, no matter where it gets executed. The “normal users” would be smart contract devs anyway, not end users. The end user would only see a credit card payment or a wire-transfer slip (QR code), which is supported out of the box already by browsers (auto-fill) or end-user e-banking clients (e.g. scan QR code).


On the crazy idea: That was the initial concept - which is still documented in the whitepaper linked in my first post. But the ZK proofs are self-contained anyways, so that’s good enough. If we need to show that we are not “missing daily statements” - meaning that the data is also complete and no transactions are missed - we would need something like that. But we can achieve the same property with proof composition, which is much easier to implement. You just add the previous proof to the new one, validate it, and check the sequence number of the camt file against the previous one - job done. If the sequence numbers are not consecutive, it would not be possible to generate a new proof. Like a hash pointer to the previous block: if the chain is not broken, you know it is correct.
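For illustration, the continuity check could look roughly like this. This is a minimal sketch with hypothetical names (`PrevClaim`, `continuity_ok`), not the actual hyperfridge guest code - in risc0 terms, the previous receipt would additionally be verified inside the guest via proof composition before this check runs:

```rust
/// A claim extracted from the previous proof's journal (hypothetical shape).
struct PrevClaim {
    account: String,
    sequence_number: u32, // sequence number of the previous camt file
}

/// The guest refuses to emit a new proof unless the new statement follows
/// the previous one directly: a gap in the sequence numbers breaks the
/// chain, like a missing hash pointer between blocks.
fn continuity_ok(prev: &PrevClaim, account: &str, new_seq: u32) -> bool {
    prev.account == account && new_seq == prev.sequence_number + 1
}

fn main() {
    let prev = PrevClaim { account: "ACCT-1".into(), sequence_number: 41 };
    assert!(continuity_ok(&prev, "ACCT-1", 42));  // consecutive: proof can be built
    assert!(!continuity_ok(&prev, "ACCT-1", 44)); // statement 43 missing: chain broken
    println!("continuity sketch ok");
}
```

Because the check runs inside the sealed guest, nobody can produce a valid proof for statement N+2 without also holding a valid proof for N+1.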

Interesting! You could prove the account was doing only the things the blockchain permitted? Nice

You should figure out if you want the proofs wrapped or unwrapped.

You’ve likely assumed you want them wrapped, but remember Polkadot itself is already a rollup. It may be cheaper to just buy the extra blockspace than to produce the wrapped proof, especially if the wrapping prover is slower than the unwrapped risc0 prover.

It looks like your proofs run off the end of the risc0 data sheet; maybe they could be optimized better, but anyways you should figure out the real sizes of unwrapped proofs, and the prover time for the wrapping.

You also care about wasm running time for the unwrapped risc0 verifier. What computations does a native unwrapped risc0 verifier do? Just hashes? We could discuss adding hostcalls depending upon all the above concerns, but zk host calls should be “generic”, not specific verifiers.

The first question is really what part of the pipeline takes what resources though.

Hi - many thx for your input - it triggered quite some thinking. In the MVP implementation everything is unwrapped, because we know it can be wrapped into Groth16 out of the box anyway, and the focus was more on making it work with EBICS and learning along the way. In particular, we were convinced that other ZK frameworks would not work for generating proofs for EBICS data, due to the complexity of the protocol and the processing of XML and zipped content.

Anyway, I will run some tests with the MVP and add the resulting proof sizes to GitHub, also adding Groth16 wrapping and checking verification times as well - it is an interesting question. I found an article which gives an idea where this will lead:

On optimization:
Most of the cycles come from one line of code:

let encrypted_recreated = rsa::hazmat::rsa_encrypt(&pub_key, &BigUint::from_bytes_be(decrypted_tx_key)).unwrap();

This encrypts the session key with RSA (decrypting instead of encrypting would add about 70 million cycles) - for encryption, the MODPOW operation accounts for 80% of the computation. That would be the spot to look at next for optimization, but we have not done so yet.

On the rollup part of your question, I’m not quite sure I understand your point. I assume you mean rollups for changes of state, e.g. of a Merkle tree which contains all transactions on an account - there this would make sense, but probably I am missing your argument. If the argument is about on-chain storage, I get the point, but I do not understand how using rollups might reduce computation on the ZK side. IMO the most crucial point is proof of computation. We need a guarantee that exactly the committed algo is running and does all the checks which are needed in order to create the proof. If the proof exists, it means it was generated by a sealed algo (and computation). I do not get how a rollup might help with this job; maybe you can help me understand your point a bit better. Note that proof generation is fully done off-chain, in an off-chain worker.

Really? An RSA decryption should be much faster using the CRT trick:

Also, where does client_key originate? Is it provided by the bank? Is it ephemeral? Can you make it yourself? RSA often takes e=65537 like in key.rs - source but you can choose much smaller if you’re careful:

It’s maybe worth asking someone who’s really an expert in RSA, not just me. It’s risky to reuse the same message like this, but if your message is an ephemeral key, and not reused across the TLS notary sessions, then likely everything is fine.

I suppose blind RSA could help here too, but it’s a messy cut & choose protocol.
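For a feel of the CRT trick: instead of one full-size exponentiation mod n, you do two half-size exponentiations mod p and mod q and recombine with Garner’s formula, which is roughly 4x faster at real key sizes. A toy sketch with the classic textbook numbers (p=61, q=53, e=17, d=2753) - obviously not real key material, and not the `rsa` crate’s actual API:

```rust
// Square-and-multiply modular exponentiation (the MODPOW hot spot).
fn modpow(mut base: u128, mut exp: u128, modulus: u128) -> u128 {
    let mut result = 1u128;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    result
}

// m = c^d mod n via two half-size exponentiations mod p and mod q,
// recombined with Garner's formula (CRT). dp = d mod (p-1), dq = d mod (q-1),
// qinv = q^-1 mod p are precomputed once per key.
fn rsa_decrypt_crt(c: u128, p: u128, q: u128, dp: u128, dq: u128, qinv: u128) -> u128 {
    let m1 = modpow(c % p, dp, p);
    let m2 = modpow(c % q, dq, q);
    let h = qinv * ((m1 + p - m2 % p) % p) % p;
    m2 + h * q
}

fn main() {
    let (p, q, e, d) = (61u128, 53u128, 17u128, 2753u128);
    let n = p * q; // 3233
    let dp = d % (p - 1);
    let dq = d % (q - 1);
    let qinv = modpow(q, p - 2, p); // inverse via Fermat, since p is prime
    let m = 65u128;
    let c = modpow(m, e, n);
    assert_eq!(rsa_decrypt_crt(c, p, q, dp, dq, qinv), m); // CRT path
    assert_eq!(modpow(c, d, n), m); // matches the plain modpow path
    println!("CRT decryption ok");
}
```

The same split applies to the encrypt direction if the guest re-encrypts with the public key: with a small e the single modpow is already cheap, which is presumably why lowering the exponent was suggested.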

Anyways…

Does some secure AEAD get used downstream from the RSA de/encryption? If so, you could likely skip the RSA entirely, just claim the symmetric key in the SNARK witness, and then decrypt using the AEAD. I’d expect your TLS notaries attest to the ciphertext x and mac t, so then AEADdecrypt(x,k1,t) and AEADdecrypt(x,k2,t) cannot both authenticate. Also, even if you’ve suppressed the mac authentication, then many ciphers would prevent them both being well-formed XML.

In essence, you’d shift more of the soundness onto the TLS notaries, but that’s likely fine since you need them anyways.
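To make the key-binding argument concrete, here is a toy sketch of the control flow. The “AEAD” below is a deliberately insecure stand-in (a splitmix64 keystream plus a keyed tag) so the example runs with std alone - a real system would use AES-GCM or ChaCha20-Poly1305. The point is only that once the notaries attest to (ciphertext, tag), a second key claimed in the witness fails authentication:

```rust
// splitmix64 is NOT a cipher or a MAC; it is an insecure stand-in so this
// compiles without external crates.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

fn keystream(key: u64, len: usize) -> Vec<u8> {
    let mut state = key;
    (0..len).map(|_| (splitmix64(&mut state) & 0xFF) as u8).collect()
}

// Keyed tag over the ciphertext: plays the role of the AEAD mac `t`.
fn toy_tag(key: u64, ct: &[u8]) -> u64 {
    let mut state = key ^ 0xA5A5_A5A5_A5A5_A5A5;
    let mut tag = 0u64;
    for &b in ct {
        tag ^= splitmix64(&mut state).wrapping_add(b as u64);
    }
    tag
}

fn toy_seal(key: u64, pt: &[u8]) -> (Vec<u8>, u64) {
    let ct: Vec<u8> = pt.iter().zip(keystream(key, pt.len())).map(|(p, k)| p ^ k).collect();
    let tag = toy_tag(key, &ct);
    (ct, tag)
}

// AEADdecrypt(x, k, t): returns None if the key does not authenticate.
fn toy_open(key: u64, ct: &[u8], tag: u64) -> Option<Vec<u8>> {
    if toy_tag(key, ct) != tag {
        return None; // wrong key claimed in the witness: mac check fails
    }
    Some(ct.iter().zip(keystream(key, ct.len())).map(|(c, k)| c ^ k).collect())
}

fn main() {
    let (k1, k2) = (42u64, 43u64);
    // The TLS notaries attest to (ct, tag); the SNARK witness claims a key.
    let (ct, tag) = toy_seal(k1, b"camt payload");
    assert!(toy_open(k1, &ct, tag).is_some()); // the real key authenticates
    assert!(toy_open(k2, &ct, tag).is_none()); // a second key cannot also pass
    println!("AEAD key-binding sketch ok");
}
```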

“roll up” is an overloaded term.

I said: you could verify wrapped or unwrapped proofs on Polkadot, so you should figure out which makes more sense. It depends upon the cost of the prover time for wrapping, vs the cost of checking the larger proof on a Polkadot parachain.

On chain storage is not a concern because the proof gets dropped immediately.

If the wrapping prover costs a million times the cycles of the on-chain verifier, then on-chain verification may be cheaper, but they’ve maybe optimized the wrapper.

Cool - thx a lot, the links are great! Going to try this soon; it should be quite straightforward. Keys are self-generated, usually with a commercial EBICS client, but we used open source and openssl for test data, where we are in control. Curious what the effect/gain of lowering the exponent will be - putting the danger of weak exponents aside for a moment. Also, using CRT should be easy to measure. Keys are not ephemeral - my productive key is valid for years to come; but as it is used for one endpoint only, it can easily be revoked anyway. There is no AEAD in EBICS :slight_smile: - it’s quite an old protocol, based on XML document-signing standards.

EBICS uses DES with no authentication? Are the banking idiots really shipping “new” protocols like ISO 20022 using DES? Jesus.

How could they prevent tampering with messages in transit?

DES is not allowed in TLS 1.2 or later, only AEADs, so maybe you could still use the AEAD trick on whatever the TLS notaries see? If so, that’d prove you have a ciphertext which DES-decrypts to the desired XML.

In theory, you’re worried about some collision attack where a symmetric ciphertext x decrypts to two meaningful messages y and z under different keys, ivs, etc.

As a rule, one should be more worried about the file format’s malleability than the encryption here, but crap like XML tends to be hyper-malleable. It may be that if one could exclude many large blobs then one could ensure only one valid decryption exists, but fuck if I know this XML mess.

In fact, there could be vulnerabilities in the underlying ISO 20022 standard, maybe only from the perspective of your use case. In particular, if I employ RSA keys with extra prime factors, then maybe multiple encryptions exist.

As an aside, there are a lot of Strings in your code, instead of Vec<u8>s, which may add some constraints in the STARK if the unicode ever gets parsed - probably not important.


Hi - they are using an XML standard for signatures - my bank’s EBICS version still uses the edition from the year 2000 :slight_smile:

Basically, the standard defines “flags” for tags which are hashed and signed in the defined format, along with some XML canonicalization rules - for those flagged parts of the XML, tampering is prevented.

Totally agree on the mess. But these standards won’t change for many years; there are way too many parties involved, and it is working “somehow”. ISO 20022 can be considered “safe” - because ISO 20022 defines just the messages (the payload); the “key ceremony”, security protocols, and how the payload is exchanged are defined e.g. in the EBICS protocol. Pain files (payment instructions) and camt files (cash management = daily statements) are ISO 20022. Like EBICS, SWIFT is another protocol used internationally which also uses the ISO message format (and there are others too). In contrast to EBICS, which is provided by thousands of banks to customers, SWIFT is used between banks and not offered to end users/bank clients.

It seems to me that banks are quite lazy about upgrading to newer TLS protocols - until some months ago I had to install legacy support for openssl to access/process the productive backend, and on the Java side I had to import outdated libs; but finally they caught up. Working a lot for banks, I see that especially for small banks it’s hard to keep up, particularly if they run their own systems in the basement. Likely this workload will go to a hyperscaler soon.

I agree with using TLS Notary in EBICS - currently I am looking into TLS Notary a bit closer and doing some experiments; I just don’t know enough yet. Also, I guess it is safe to assume that banks will use TLS 1.2 or higher in the near future. Maybe it’s good enough to use the TLS Notary library for downloading and proving the payload and to ignore the EBICS protocol - ideally TLS will use the same certs as in the EBICS file. EBICS certs are very “stable” and not only used for client communication but also for comms between banks and the national bank; so it’s quite a strong identity proof for the bank, which can be managed in a DAO on-chain (lookup of public certs). Today the origin of the EBICS envelope can be safely traced back to the bank - but not the payload, which is why we need a notary.

I also had a talk with the guys who work on the standard a couple of months ago while I was implementing the prototype. Signing the payload will not happen soon - even though it has been addressed in the standard for years already. Interestingly, the use case for signing data would be presenting bank data to the financial authority to automate VAT calculations - but this has not been pushed so far. It is the same for other data provided by the REST interfaces of Stripe & Co - nobody provides signed data, so we need a notary service and some infrastructure around it.

I’m kinda shocked this AEAD trick looks so far away from workable, but oh well, complex malleable formats strike again.

Really? That’s odd, but okay.

Yeah, that is very odd. It gets even odder if you look into the XML schema spec of EBICS, where you can see the XML tags for signing the payload are actually defined… BUT with a max-occurrence of 0 (!!!) - meaning anything added according to the spec cannot validate against the schema, and this has not been touched for several releases of EBICS.

Hello - just an fyi update - I’m switching to the tlsnotary Discord to clarify whether the use cases described here can be run in a trustless manner with tlsnotary (within a zkvm if needed) and REST-based APIs; I will post my findings here. Cheers, Walter


Meh, nothing is “trustless” really. You’ll always need one honest notary in TLS Notary, but in theory you could commit to a result, and then have notaries randomly assigned to you.

We could maybe discuss that over matrix/element if you like - I’m @ jeff:web3.foundation - because the underlying protocol vaguely resembles Polkadot itself. I’d suggest you delay anything complex like these “commit & sample TLS notaries” until later iterations.
