Learning from Audit Findings to Scout with LLMs
Abstract
Join us in our quest to gather findings from audit reports on pallets, runtime, and node code! By storing, parsing, and tagging these findings in the Scout Substrate Dataset [1], we pave the way for security researchers to explore new detection techniques against known vulnerabilities in Substrate. This dataset marks our first step in researching LLM approaches to vulnerability detection in Polkadot—an area we have been actively working on in partnership with LAFHIS [2] for Solidity and are now eager to explore for Substrate.
Introduction
Imagine being able to learn from the collective wisdom of countless audit reports. What if you could spot vulnerabilities by studying the original findings, the specific code flagged, and its eventual remediation? What types of security weaknesses could you uncover and defend against?
Our Approach
With the support of the Polkadot Alliance Legion (PAL), we are enhancing our static code analyzer, Scout [3], to support Substrate pallets, runtime code, and node code [4]. As part of our first milestone and while identifying issues in audited Substrate code, we developed the initial version of the Scout Substrate Dataset [1]. This open dataset includes audit reports and mapped Substrate issues, along with corresponding audited and remediated code. It helps us identify issues that are approachable with static analysis detectors, improve their precision and recall, and set the stage for LLM research focused on Substrate code.
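If you would like to explore the data yourself, below is a minimal sketch of loading the Milestone 1 release from Hugging Face [11] with the `datasets` library. It assumes the release is loadable as a standard Hugging Face dataset, and the `severity` column used in the tally is an illustrative assumption, not the dataset's confirmed schema.

```python
from collections import Counter
from datasets import load_dataset

# Load the Milestone 1 release of the Scout Substrate Dataset [11].
# Assumes the repository is loadable as a standard Hugging Face dataset.
ds = load_dataset("CoinFabrik/scout-substrate-m1")

# Inspect the splits and columns before relying on any particular field.
print(ds)

# Illustrative only: tally findings by severity, if such a column exists.
split = next(iter(ds.values()))
if "severity" in split.column_names:
    print(Counter(split["severity"]))
```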
In recent months, we’ve conducted research (pending publication) inspired by GPTScan [5] on vulnerability detection in smart contract code (Solidity [6]) using LLMs, in collaboration with the LAFHIS laboratory at the University of Buenos Aires. This research, supported by a grant from the Sadosky Foundation, has yielded insightful takeaways [7] that we believe could be applicable to Substrate code.
Below, we outline some of the questions guiding our research on identifying vulnerabilities with LLMs:
- What vulnerabilities are detectable with static analysis?
- For the detectable vulnerabilities, can the precision and recall of static analysis detectors be improved using LLMs trained on audit reports/findings, audited code, and remediated code? (See the triage sketch after this list.)
- For vulnerabilities not easily detectable with static analysis, is there an LLM approach that can learn from findings and audited code to detect them directly?
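To make the second question concrete, here is a minimal sketch of one way an LLM could serve as a second-pass filter over static-analysis output, which is one route to better precision. The `Finding` type and `query_llm` helper are hypothetical stand-ins for illustration, not part of Scout's API.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    detector_id: str  # e.g., the name of a static-analysis detector
    file: str         # path of the flagged source file
    snippet: str      # the flagged Substrate code

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire up your provider's client here."""
    raise NotImplementedError

def triage(finding: Finding) -> bool:
    """Ask the LLM whether a static finding looks like a true positive."""
    prompt = (
        f"A static analyzer flagged this Substrate code with detector "
        f"'{finding.detector_id}' (in {finding.file}):\n\n"
        f"{finding.snippet}\n\n"
        "Drawing on known audit findings of this kind, answer YES if this "
        "looks like a true positive and NO if it looks like a false positive."
    )
    return query_llm(prompt).strip().upper().startswith("YES")
```

In this setup, the static analyzer keeps its recall while the model, prompted with audit-style context, prunes false positives; the reverse direction, where an LLM learns from findings and audited code to detect issues directly, is the subject of the third question.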
Positive results have recently emerged from LLM research in vulnerability detection, with some cases surpassing traditional approaches and uncovering new vulnerabilities [8] [9].
This is why we believe now is the right time to conduct foundational research—with a clear focus on developer usability—on using LLMs to identify issues in Substrate code or to improve the precision and recall of detectors built with traditional methods (i.e., static analysis with Scout).
As a side quest, we’ve also been exploring other questions related to detection techniques and how they could be enhanced with LLMs. Reach out to us if your team specializes in fuzzing or formal verification and any of these questions resonate with you!
- Can LLM approaches help automate harness generation for fuzzing setups? (See the sketch after this list.)
- Can LLM approaches successfully identify invariants for formal verification based on project whitepapers?
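To illustrate the first of these, here is a hedged sketch of prompting an LLM to draft a cargo-fuzz style harness for a pallet extrinsic. The prompt template and `query_llm` parameter are hypothetical, and any generated harness would of course require human review before it is trusted.

```python
# Hypothetical prompt template for LLM-assisted harness generation.
HARNESS_PROMPT = """\
You are generating a cargo-fuzz harness for a Substrate pallet extrinsic.
Extrinsic signature:
{signature}

Produce a Rust `fuzz_target!` that decodes arbitrary bytes into the call's
arguments, dispatches the call against a mock runtime, and asserts that the
pallet's storage invariants still hold afterwards.
"""

def draft_harness(signature: str, query_llm) -> str:
    """Return a candidate harness for human review; never run it unreviewed."""
    # `query_llm` is any callable that sends a prompt to a model and
    # returns its text response; it is a stand-in, not a real API.
    return query_llm(HARNESS_PROMPT.format(signature=signature))
```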
Background
In 2023, CoinFabrik obtained a grant [10] from the Web3 Foundation to research Rust-based vulnerability detection techniques applicable to Polkadot code. We focused our efforts on ink! and, in collaboration with the LAFHIS research team [2] from the University of Buenos Aires, successfully identified initial issues and effective detection approaches using static analysis and linting techniques. This marked the birth of Scout as we know it. Since then, Scout has evolved with a focus on user experience and usability. Equipped with a CLI, VSCode Extension, and GitHub Action, it seamlessly integrates into the secure development workflows of developers in the Polkadot and Stellar ecosystems.
Two key factors were crucial for the success of this research-driven approach:
- The Partnership between Industry and Academia: CoinFabrik, with expertise in blockchain auditing, product development, and a deep understanding of client needs, partnered with LAFHIS, a specialized research team from the University of Buenos Aires working on state-of-the-art vulnerability detection techniques, particularly in Rust code.
- The Polkadot DevRel Community: Initial learning materials kindly provided by Alberto Viera and the opportunity to participate in the Polkadot Academy at Berkeley gave us abundant resources to onboard our developers and security researchers into the ecosystem. This was pivotal for the development of Scout and our subsequent participation in the Polkadot Alliance Legion (PAL) as auditors.
Now, with the support of PAL, we are embarking on a new journey: learning to Scout vulnerabilities in Substrate pallets, runtime code, and node code. To identify detectors for Substrate issues, we began by reviewing and storing dozens of audit reports and their associated findings in the Scout Substrate Dataset. This dataset, which is under construction, is available on GitHub [1] and Hugging Face [11], and is valuable not only for our work but also for other security research teams in the ecosystem.
Our motivation for building this dataset stems from research conducted this year [7], also in collaboration with LAFHIS, using LLM approaches to identify security issues in smart contracts, supported by a Sadosky Foundation Research grant. Through this research, we identified potential issues (e.g., related to fee management) that seemed approachable with LLMs and developed a similar dataset [6]. However, this research focused on Solidity smart contracts.
We are working on a joint proposal with LAFHIS titled “Learning to Scout Substrate Issues with LLMs”, where we aim to outline clear approaches for developing proof-of-concept LLM or LLM-assisted detectors for issues included in the Scout Substrate Dataset. Our goal is to lay the groundwork for security- and QA-oriented developer tooling that applies LLM techniques to Substrate code.
Stay tuned for updates on the Scout Substrate Dataset [1]; we appreciate your support and feedback as we discuss our upcoming research initiative!
References
1. Scout Substrate Dataset: https://github.com/CoinFabrik/scout-substrate-dataset
2. LAFHIS: https://lafhis.dc.uba.ar/home
3. Scout: https://www.coinfabrik.com/products/scout/
4. Scout Substrate: https://github.com/CoinFabrik/scout-substrate
5. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis: https://arxiv.org/pdf/2308.03314
6. CoinFabrik Dataset of Solidity Issues for RnD: https://github.com/CoinFabrik/solidity-rnd
7. CoinFabrik-LAFHIS Preliminary Research Notes before Publication in January: https://drive.google.com/file/d/1t7Fn_CCD4x9pIhAbYZRlAN78jD1OyrRU/view
8. From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code: https://googleprojectzero.blogspot.com/2024/10/from-naptime-to-big-sleep.html
9. Do you still need a manual smart contract audit?: https://arxiv.org/abs/2306.12338
10. Web3 Grant Delivery Repository: https://github.com/CoinFabrik/web3-grant
11. Scout Substrate Dataset Hugging Face (Milestone 1): https://huggingface.co/datasets/CoinFabrik/scout-substrate-m1