I know that just posting ideas is not all that helpful, but I'm going to throw this out there anyway. I lack the skills to lead something like this myself, but I think it's really important.
The largest impediment to the growth of Julia is users' reluctance to (a) learn a new syntax, (b) translate their existing code, and (c) leave behind a rich ecosystem. PyCall goes a long way toward addressing (c) and partially (b). ChatGPT offers an incredibly powerful tool for translating code between languages and can greatly help with (a) and (b). LLM projects like Alpaca and Vicuna have shown that open-source projects can quickly catch up to mega-models like ChatGPT when given high-quality training data.
Imagine if every Python-to-Julia code translation produced by ChatGPT were saved along with its corresponding corrections.
If Julia users had a standardized, low-barrier way to query ChatGPT and save the questions, responses, and user corrections (if any) to a public database with an open license, then the community could start building a valuable training dataset for future model tuning that would benefit everyone.
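Just to make the idea concrete, here is a minimal sketch of what a single saved record might look like. Everything here is hypothetical: the `TranslationRecord` struct, `save_record`, the JSONL file path, and the choice of JSON3 for serialization are illustrative assumptions, not an existing tool or API.

```julia
using JSON3, Dates

# One question/response/correction triple, as it might be stored.
struct TranslationRecord
    prompt::String                      # the question sent to the LLM
    response::String                    # the raw model output
    correction::Union{Nothing,String}   # human-corrected version, if any
    model::String                       # e.g. "gpt-4"
    timestamp::DateTime
end

# Append one record as a JSON line to a shared, openly licensed dataset file.
function save_record(rec::TranslationRecord; path = "julia_llm_dataset.jsonl")
    open(path, "a") do io
        JSON3.write(io, (prompt = rec.prompt,
                         response = rec.response,
                         correction = rec.correction,
                         model = rec.model,
                         timestamp = string(rec.timestamp)))
        println(io)
    end
end

rec = TranslationRecord(
    "Translate this Python to Julia: [x**2 for x in range(10)]",
    "[x^2 for x in 0:9]",
    nothing,        # no correction needed in this case
    "gpt-4",
    now(),
)
save_record(rec)
```

A flat JSONL file like this is just the simplest possible starting point; the real value would come from the standardized schema and the open license, not the storage format.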
Maybe someone with more background in this has ideas about tools that might already exist, or what would be needed to create one of our own.
As for Rosetta Code, no, not at all. Eventually I believe that large language models (LLMs), like ChatGPT, will be heavily used as programming assistants, especially by people who are new to a language or who need to convert code between languages. New languages that are not supported by an LLM assistant, because of a lack of training data, will have a much higher barrier to adoption than those that are… Because of this I suspect that if Julia were introduced now, it would face an even larger hurdle than it already has.
To stay competitive, languages need to nurture the development of LLM assistants that are proficient in that language. To do this a language needs loads of training data, and one of the most valuable datasets is the questions, responses, and user corrections generated from state-of-the-art LLMs like ChatGPT. ChatGPT can already take simple Python code and translate it directly into Julia very impressively… but it still makes mistakes… and it could be greatly improved with human-informed training data.
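For illustration only (this is not an actual ChatGPT transcript), here is the kind of question/response/correction triple the dataset would capture: a plausible translation mistake around Python's 0-based indexing, and the human fix.

```julia
# Prompt: "Translate this Python to Julia:
#     def first_and_last(xs):
#         return xs[0], xs[-1]"

# A plausible but wrong model response: it keeps Python's 0-based indexing.
first_and_last_wrong(xs) = (xs[0], xs[-1])   # throws BoundsError if called: Julia is 1-based

# Human correction: use 1-based indexing and `end` for the last element.
first_and_last(xs) = (xs[1], xs[end])

first_and_last([10, 20, 30])   # returns (10, 30)
```

It is exactly these small, systematic mistakes and their corrections that would make the fine-tuning data valuable.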
Ya, I’ve seen that. The key is that the models will only be as good as the training data they have access to, and the Julia community could start collecting that now so that it’s ready when someone takes on the challenge of tuning an LLM for Julia… obviously the docs and Discourse are a huge help… but you can add a ton more skill by adding questions, responses, and corrections to the training dataset.