We need a VS Code plugin that could save ChatGPT questions, responses, and user corrections

I know that just posting ideas is not all that helpful, but I’m going to throw this out there anyway. I lack the skills to lead something like this myself, but I think it’s really important.

The largest impediment to the growth of Julia is the reluctance of users to (a) learn a new syntax, (b) translate their existing code, and (c) leave behind a rich ecosystem. PyCall goes a long way toward addressing (c) and partially (b). ChatGPT offers an incredibly powerful tool for translating code between languages and can greatly help with (a) and (b). LLM projects like Alpaca and Vicuna have shown that open-source projects can quickly catch up to mega-models like ChatGPT given high-quality training data.

Imagine if every “Python” to “Julia” code translation by ChatGPT were saved along with its corresponding corrections.

If Julia users had a standardized, low-barrier approach to querying ChatGPT and saving the questions, responses, and user corrections (if any) to a public database with an open license, then the community could start building a valuable training dataset for future model tuning that would benefit everyone.
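To make this concrete, here’s a rough sketch of what a single saved record might contain. Everything here is hypothetical; no such tool or package exists yet:

```julia
using Dates

# Hypothetical record for one ChatGPT interaction; field names are
# illustrative only, not an existing package's API.
struct TranslationRecord
    prompt::String                     # question sent to the model
    response::String                   # the model's raw answer
    correction::Union{String,Nothing}  # human-edited fix, `nothing` if none
    model::String                      # e.g. "gpt-4"
    timestamp::DateTime                # when the query was made
    license::String                    # e.g. "CC0-1.0" for the public database
end

rec = TranslationRecord(
    "Translate this Python to Julia: ...",
    "function f(xs) ... end",
    nothing,   # no correction was needed this time
    "gpt-4",
    now(),
    "CC0-1.0",
)
```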

Maybe someone with more background in this area has ideas about tools that might already exist, or about what would be needed to create one of our own.

What is (d)?

Is your idea a more dynamic/embedded Rosetta Code (https://rosettacode.org/wiki/Rosetta_Code)?

Apologies, d => b … now fixed.

As for Rosetta Code, no, not at all. Eventually, I believe that large language models (LLMs) like ChatGPT will be heavily used as programming assistants, especially by people who are new to a language or who need to convert code between languages. New languages that are not supported by an LLM assistant, because of a lack of training data, will have a much higher barrier to adoption than those that are. Because of this, I suspect that if Julia were introduced now, it would face an even larger hurdle than it already does.

To stay competitive, languages need to nurture the development of LLM assistants that are proficient in them. That requires loads of training data, and one of the most valuable kinds is the questions, responses, and user corrections generated from state-of-the-art LLMs like ChatGPT. ChatGPT can already take simple Python code and translate it directly into Julia, very impressively… but it still makes mistakes… and it could be greatly improved with human-informed training data.
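As a hand-written illustration (not actual ChatGPT output) of the kind of slip a translation can contain, take Julia’s 1-based indexing:

```julia
# Python source:
#   def running_sum(xs):
#       total, out = 0, []
#       for i in range(len(xs)):
#           total += xs[i]
#           out.append(total)
#       return out
#
# A too-literal translation keeps Python's 0-based loop and throws a
# BoundsError, because Julia arrays start at index 1:
#   for i in 0:length(xs)-1
#       total += xs[i]
#   end
#
# Corrected, idiomatic Julia:
function running_sum(xs)
    total = zero(eltype(xs))
    out = eltype(xs)[]
    for x in xs
        total += x
        push!(out, total)
    end
    return out
end

running_sum([1, 2, 3])  # => [1, 3, 6]
```

Before/after pairs exactly like this are what a fine-tuned model would learn from.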

Tabnine is a company doing this that I saw someone post about. They support many languages, but not Julia yet.

Ya, I’ve seen that. The key is that the models will only ever be as good as the training data they have access to, and the Julia community could start collecting that data now so it’s ready when someone takes on the challenge of tuning an LLM for Julia… obviously the docs and Discord are a huge help… but you can add a lot more skill by adding (question, response, correction) triples to the training dataset.
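For what it’s worth, such a triple could be serialized very simply, one JSON object per line. This is just a sketch; JSON3 is one possible choice of serializer, and the field names are made up:

```julia
using JSON3  # one possible serializer; the field names below are made up

record = (
    question   = "Translate this Python function to Julia: ...",
    response   = "for i in 0:length(xs)-1 ...",  # raw model output
    correction = "for x in xs ...",              # human-fixed version
)

# Append one record per line (JSONL), a format most fine-tuning
# pipelines can ingest directly.
open("dataset.jsonl", "a") do io
    println(io, JSON3.write(record))
end
```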