Hello Polkadot community!
We are excited to introduce Ink!jet, a VS Code plugin that aims to bring together generative AI and ink! smart contract development to improve development lifecycles in the Polkadot ecosystem.
Smart contracts are technically intricate and demand a high level of expertise. Our project aims to simplify these complexities and make contract creation accessible to more people. Our goal is to enhance the productivity of existing developers through bootstrapping and assisted code iteration, while lowering the barrier to entry for new developers.
Our team’s motivation for the project is twofold. First, we are supporters of decentralized technology and its potential to redefine the digital landscape. We believe that the future of blockchain depends on cross-chain interoperability, which places Polkadot at the center of this emerging need.
Second, we are deeply intrigued by the potential of generative AI in software engineering. Our aim is to push the boundaries of what these AI models can achieve in novel contexts, such as the ink! programming language, and to explore solutions that empower developers to build within the decentralized ecosystem.
Generative AI and RAG:
Existing generative AI models are limited by the small amount of Rust and ink! code in their training data. Through recent research, we have found that Retrieval-Augmented Generation (RAG) offers a way to supply existing Large Language Models with knowledge they were not trained on.
Our platform uses a retrieval-augmented generation pipeline with datasets of collected and generated ink! smart contracts to bridge this knowledge gap. By injecting code retrieved from the vectorstore into prompts, the system uses in-context learning to improve response quality.
Combining generative AI with RAG creates an opportunity to improve the ink! development process in two ways.
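As a rough illustration of the prompt injection step, here is a minimal sketch in Python. It is not our actual pipeline: the `SNIPPETS` store, `embed`, `retrieve`, and `build_prompt` are hypothetical names, and a toy bag-of-words similarity stands in for a real embedding model and vectorstore.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the vectorstore: a few ink! snippets keyed by id.
SNIPPETS = {
    "flipper": "#[ink(message)] pub fn flip(&mut self) { self.value = !self.value; }",
    "transfer": "#[ink(message)] pub fn transfer(&mut self, to: AccountId, value: Balance) -> Result<()>",
    "storage": "#[ink(storage)] pub struct Token { total_supply: Balance }",
}


def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, top_k=2):
    """Return the ids of the top_k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(SNIPPETS, key=lambda k: cosine(q, embed(SNIPPETS[k])), reverse=True)
    return ranked[:top_k]


def build_prompt(question, top_k=2):
    """Inject the retrieved snippets into the prompt ahead of the user's question."""
    context = "\n\n".join(SNIPPETS[k] for k in retrieve(question, top_k))
    return (
        "Use the following ink! examples as reference:\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How do I transfer balance to an AccountId?"))
```

The string returned by `build_prompt` is what would be sent to the LLM, so the model sees relevant ink! code in context even if it saw little of it during training.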
- Existing LLMs have proven to be accurate and efficient pair programming assistants in a vast number of programming languages. Bringing in ink! programming capability will extend this productivity benefit to the Polkadot ecosystem.
- ink! is an evolving programming language. As large language models are cost and time intensive to retrain with updated information, RAG provides a way to quickly supply a frozen, pre-trained model with new information. Updating the RAG vectorstore only requires database operations, as opposed to training an entire model from scratch. Outdated syntax can be removed from the vectorstore as well. This means that users will be able to use our plugin with the latest version of the language without delay.
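To make the "database operations instead of retraining" point concrete, here is a small Python sketch under stated assumptions. `ToyVectorstore` is a hypothetical in-memory stand-in, not the Milvus or Weaviate API, and the version labels and entries are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    code: str
    ink_version: str  # illustrative version label attached as metadata


@dataclass
class ToyVectorstore:
    """Hypothetical in-memory store; a real one would also hold embeddings."""
    entries: dict = field(default_factory=dict)

    def upsert(self, doc_id, code, ink_version):
        """Add a new example or overwrite an existing one."""
        self.entries[doc_id] = Entry(code, ink_version)

    def delete_version(self, ink_version):
        """Remove every example tagged with an outdated version."""
        stale = [k for k, e in self.entries.items() if e.ink_version == ink_version]
        for key in stale:
            del self.entries[key]
        return len(stale)


if __name__ == "__main__":
    store = ToyVectorstore()
    store.upsert("old-import", "use ink_lang as ink;", ink_version="3")
    store.upsert("flipper", "#[ink::contract] mod flipper {}", ink_version="4")
    removed = store.delete_version("3")
    print(removed, sorted(store.entries))
```

The model weights are never touched here: keeping up with a new language release comes down to inserting new entries and deleting stale ones.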
Other Features:
Observing the popularity and functionality of existing plugins such as GitHub Copilot, we aim to bring a similar user experience with ink!-specific features added. The plugin will be able to perform the following:
- Explaining code
- Adding code comments
- Detecting and fixing bugs (detection through CoinFabrik Scout)
- Refactoring / cleaning up code
- Step-by-step instruction breakdown
- Adding error handling
- Chunking and analysis
- Solidity to ink! translation
With the recent development of features such as @workspace in GitHub Copilot, it is now possible for a plugin's assistance to reach into the entire active codebase. We will integrate this functionality to allow live assistance within the user's current file and project.
A chat feature is also provided if the user wishes to manually paste in code to a request or ask general questions.
Prototype UI:
The extension will be displayed in the primary/left sidebar of the VS Code window, replacing the file explorer when open. It will have vertically stacked sections, featuring a chat, chat settings, analysis, and templates. These will be resizable if the user wishes to view a certain section in a larger space.
The chat will be similar to the Copilot Chat feature, where the user can converse with our model, and have their prompts enhanced through the RAG pipeline. They can ask questions about documentation or general software engineering by directly messaging the model.
To get coding suggestions, the user can select code within an open file and then ask the model a question with the @ selected keyword in the prompt. This will send both the question and the selected code to the model.
The response will be shown in the chat window. After the response is returned, the user will be asked whether they want it inserted into their file. If they answer yes, the extension will replace the selected code with the response. We ask for permission so that users can keep their previous code or write the change themselves.
For the chat settings, we are opting to have them within the extension UI instead of in a separate settings tab. The settings may need to be adjusted many times for the user to find what works best, so we are prioritizing ease of access. This includes model temperature, top-k results from the RAG retrieval, and other OpenAI parameters.
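As a sketch of how these settings might be validated before each request, consider the following Python snippet. `ChatSettings`, its default values, and the method names are hypothetical; the 0 to 2 temperature range follows the OpenAI chat API, and the model name is one of those listed in our stack.

```python
from dataclasses import dataclass


@dataclass
class ChatSettings:
    """Hypothetical container for the values exposed in the settings section."""
    temperature: float = 0.2           # assumed default, not a final choice
    top_k: int = 4                     # assumed default number of retrieved snippets
    model: str = "gpt-4-0125-preview"

    def __post_init__(self):
        # Reject values the LLM or the retriever could not use.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")

    def request_params(self):
        """Parameters forwarded to the LLM call."""
        return {"model": self.model, "temperature": self.temperature}

    def retrieval_params(self):
        """Parameters forwarded to the vectorstore query."""
        return {"top_k": self.top_k}


if __name__ == "__main__":
    settings = ChatSettings(temperature=0.7, top_k=6)
    print(settings.request_params(), settings.retrieval_params())
```

Validating at construction time means a bad slider value is caught in the UI layer rather than surfacing as a failed API call.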
Analysis will have a button and a results area. The button will take the current smart contract (which must be open in the active VS Code tab) and run it through CoinFabrik’s Scout. If Scout is not installed, the extension will notify the user and install it if they wish. After the vulnerability analysis is run, the results will be displayed in the results area of this section.
Templates will have a simple dropdown where the user can select a template to use. Clicking the “Create” button will create a new .rs file in the current file directory containing the template contract.
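The analysis flow described above could look roughly like this Python sketch. It assumes Scout is installed as the `cargo-scout-audit` cargo subcommand and invoked as `cargo scout-audit` from the contract's crate root; the function names and the returned dictionary shape are hypothetical, and the extension itself would implement this in TypeScript.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: Scout is distributed as this cargo subcommand binary.
SCOUT_BINARY = "cargo-scout-audit"


def scout_installed(which=shutil.which):
    """Check whether the Scout binary can be found on PATH."""
    return which(SCOUT_BINARY) is not None


def find_crate_root(contract_file):
    """Walk upward from the open file until a Cargo.toml is found."""
    for parent in Path(contract_file).resolve().parents:
        if (parent / "Cargo.toml").is_file():
            return parent
    return None


def run_scout(contract_file, runner=subprocess.run, which=shutil.which):
    """Run Scout on the crate containing contract_file and collect its output."""
    if not scout_installed(which):
        # The UI would ask the user before running the install command.
        return {"status": "missing", "hint": "cargo install cargo-scout-audit"}
    root = find_crate_root(contract_file)
    if root is None:
        return {"status": "error", "output": "no Cargo.toml found"}
    proc = runner(["cargo", "scout-audit"], cwd=root, capture_output=True, text=True)
    return {"status": "done", "output": proc.stdout + proc.stderr}


if __name__ == "__main__":
    print(run_scout("lib.rs")["status"])
```

The `runner` and `which` parameters exist only so the logic can be exercised without Scout or cargo installed.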
Stack:
We plan to build using the following technologies:
- TypeScript/JS and Node.js (Electron Framework) for VS Code Extension
- Python and JavaScript for RAG-LLM pipeline
- LlamaIndex and LangChain libraries for data loading, processing, embedding
- LlamaIndex and LangChain libraries for vectorstore retrieval and LLM interaction with retrieval results
- Milvus and Weaviate for Vectorstore
- OpenAI text-embedding-ada-002 Model for Embeddings
- OpenAI for LLM (GPT-4-32k, GPT-4-0125-preview, GPT-4-1106-preview)
- ink!/Rust for Smart Contracts
- CoinFabrik Scout for Vulnerability Detection
Ecosystem Fit
This project fits into the ecosystem as a developer tool. It is aimed at improving the smart contract development cycle through bootstrapping, assisted coding, and iterative feedback.
The target audience is existing smart contract developers and those looking to start writing smart contracts.
As the programming language ink! is built on top of Rust, there are barriers to entry for both Rust and ink!.
Rust is a systems programming language whose ownership and borrowing rules require developers to reason explicitly about memory and references. While the language has seen steady growth in adoption and is highly rated by developers, it remains challenging for those coming from other languages and especially for new developers. In addition to this complexity, ink! introduces a handful of macros and does not rely on the Rust standard library (std). Moreover, environment variables are managed through a separate crate, and the code is compiled into WASM instead of machine code.
Acknowledging these technical barriers, our tool aims to facilitate an easier transition towards developing in ink! through guided development. Those who already know Rust can easily step into ink! development, and have the differences in syntax explained. Those who already know ink! will be able to save time writing boilerplate and refining work-in-progress contracts. Users are continuously provided feedback on their code, which saves time spent searching documentation or posting on forums.
At this point in time, current LLMs provide limited assistance due to the scarcity of Rust and specifically ink! code in their training data, and we aim to bridge this gap through our approach.
Milestones
We will be building the features in this order:
- Creating and Curating ink! Dataset for the Vectorstore
We will scrape the source code of verified contracts on ink!/WASM enabled chains, such as Astar and Aleph Zero. We will also scrape GitHub repos of example contracts, such as paritytech, Astar WASM Showcase, Metis, and others. To complement these examples, we will scrape all available ink! documentation from the ink! docs, Substrate docs, and others. Additionally, we will use a barebones RAG-LLM pipeline with only documentation loaded to create generated examples through guided prompts. We divide our approach into categories and subcategories of smart contract purposes, including payments, transfers, lending, borrowing, vesting, escrow, NFTs, tokens, and more. Every smart contract will be analyzed with CoinFabrik Scout and manual inspection to ensure that no vulnerable contracts are included in our dataset.
- RAG - LLM Pipeline
We will build the pipeline that will connect GPT-4 with our RAG system. We will be using the latest 128k token limit model, which should provide ample room for in-context learning with retrieval results. The user prompts will be injected with the retrieval results. A conversation history will be kept in a JSON data structure to keep a running context. At this point, the pipeline will be usable through the command line.
- Barebones VS Code Plugin
We will build the VS Code plugin and implement the chat feature in the plugin window. The user will be able to interface with our RAG - LLM pipeline in this stage. We will have options to adjust temperature and top-k retrieval results from our RAG vectorstore.
- Templates
We will add template smart contracts that users can use for scaffolding. These will be available in a dropdown menu in the plugin window.
- VS Code Context
We will add the ability for the plugin to use selected code or files in the active codebase. The user will be able to ask questions by right-clicking on a selection or using @ selected in the chat.
- CoinFabrik Scout
We will next integrate CoinFabrik Scout for vulnerability scanning. The currently open ink! smart contract will be checked for vulnerabilities and a report will be provided in the plugin window. Suggestions will be provided for possible fixes.
- Refined Suggestions
We will perform live testing on the performance of the plugin at this stage, and use the results to improve accuracy. This stage will focus on code specifics (e.g. macros), code comments, error handling, and code explanations. Prompt engineering and retrieval modifications will be done to optimize performance based on the results.
- Chunking
We will add a feature where the user can break their smart contract into chunks. An analysis will be provided to the user for each chunk. We will most likely segment the smart contract into its variables and individual functions.
- Solidity to ink!
At this stage, the plugin will be capable of assisting with ink! development. We will pair this proficiency with Solidity knowledge, either from the model’s existing weights or by adding it to our RAG vectorstore, to provide Solidity to ink! translation. While one-to-one translation is not exact due to differences in the languages, this feature offers detailed explanations and near-exact translations to help Solidity users learn ink!.
- Autocomplete
We will implement a feature that allows users to autocomplete code in their IDE, similar to GitHub Copilot. This feature requires live feedback for the code being typed, so we will optimize the latency of our pipeline. Accuracy will also need to be optimized to ensure that suggestions are relevant to the surrounding code.
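For the Chunking milestone above, one possible approach to segmenting a contract into individual functions is sketched below in Python. This is a naive illustration, not our final design: `chunk_functions` is a hypothetical helper that relies on simple brace matching, so it ignores braces inside strings or comments and does not capture attributes such as `#[ink(message)]` on the preceding line.

```python
import re

# Matches the start of a function item, with an optional `pub` qualifier.
FN_START = re.compile(r"^\s*(?:pub\s+)?fn\s+(\w+)", re.MULTILINE)


def chunk_functions(source):
    """Return {function_name: function_source} using naive brace matching."""
    chunks = {}
    for match in FN_START.finditer(source):
        open_idx = source.find("{", match.end())
        if open_idx == -1:
            continue  # no body found, e.g. a bare signature at end of file
        depth = 0
        for i in range(open_idx, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    # Closing brace of the function body reached.
                    chunks[match.group(1)] = source[match.start():i + 1].strip()
                    break
    return chunks


SAMPLE = """
impl Flipper {
    #[ink(message)]
    pub fn flip(&mut self) {
        self.value = !self.value;
    }

    #[ink(message)]
    pub fn get(&self) -> bool {
        self.value
    }
}
"""

if __name__ == "__main__":
    for name, body in chunk_functions(SAMPLE).items():
        print(f"--- {name} ---\n{body}")
```

Each chunk could then be sent through the RAG pipeline on its own, which keeps prompts small and lets the analysis refer to one function at a time. A production version would more likely use a real Rust parser than regular expressions.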
Looking Ahead:
We are looking forward to maintaining this plugin long term and updating it as new ink! and LLM versions are released. Once these features are implemented, we will focus on broadening the scope of our plugin to other aspects of development.
We are looking into building a full Substrate development environment available to the user right in their IDE, including running local nodes and testing contract deployments.
We also want to lower the barriers to entry for developers of other languages. While we already plan to add Solidity translation, we will identify other ecosystems that are parallel to ink!, and improve our vectorstore to be able to onboard those developers.
What this project is NOT
We are not providing a for-profit product, and all usage will be free to any developers. The system and datasets will also be public, so developers can choose to build their own version of the system with any modifications if they please.
We are not providing full automation or a replacement for existing developers; this tool is designed to enhance the development cycle and increase efficiency.
We are not claiming that this system is without fault. Though the system is aimed at mitigating errors and vulnerabilities, there is a degree of inherent randomness when using LLMs for code generation. We will provide stringent disclaimers and advice to users to rigorously test their code before deployment, and advocate for contract auditing.
Remarks:
As users of the Polkadot ecosystem, we hope to contribute to the growth of the userbase of these decentralized technologies. We are continuously inspired by the rapid development of Generative AI and believe there is ample opportunity for ink! to expand in this field.
We look forward to your feedback on how to bring this vision to reality!
Please feel free to reach out with any questions or suggestions:
Discord: @ cheffy.
Email: jeff.yu@parallelpolis.llc