Could a Julia fine-tuned version of Code Llama be created?

As I’m the only person on my team working in Julia, I find AI code assistants (Copilot, Copilot Chat, Code Llama) helpful, effectively like talking to a colleague.

That said, I find Copilot Chat’s results a mixed bag: they often need refactoring or are simply irrelevant. I asked it “namedtuple to dict” and got this Python snippet, which neither runs in my Julia context nor helps:

from collections import namedtuple

# Define a namedtuple
Person = namedtuple('Person', ['name', 'age', 'gender'])

# Create an instance of the namedtuple
p = Person(name='Alice', age=30, gender='female')

# Convert the namedtuple to a dict
d = p._asdict()

print(d)  # Output: {'name': 'Alice', 'age': 30, 'gender': 'female'}
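For comparison, the Julia answer I was after is a one-liner (assuming a Julia NamedTuple rather than Python’s namedtuple):

```julia
# Convert a NamedTuple to a Dict by iterating its key => value pairs
nt = (name = "Alice", age = 30, gender = "female")
d = Dict(pairs(nt))
# d now holds :name => "Alice", :age => 30, :gender => "female"
```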

Growth in Julia’s popularity will require more solo Julia users within teams; it is from there that word of the language might spread.

With Code Llama there are general models, but also Python-specific models:

Because Python is the most benchmarked language for code generation, and because Python and PyTorch play an important role in the AI community – we believe a specialized model provides additional utility.

However, I have to wonder whether Python’s ubiquity makes a Python-specific model superfluous, since Python already dominates the base code model’s training data. I’d suggest that a model which doesn’t confuse Julia and Python would be particularly valuable for getting new Julia users into teams that don’t currently use it.

How could we go about having a Julia fine-tuned LLM? I’m soon going to be doing some fine-tuning of Llama for my work so would be open to contributing to this if it ever becomes a thing. I expect the lack of high-quality existing code could be a limitation.

I believe having this model integrated into the VS Code extension would be a powerful combination.


Is there any information out there about how much data you need for a successful (by some reasonable measure) fine-tuning? I’m wondering how much modern (post-1.0) Julia code is out there available in public, and whether that’s enough to create a good enough model.

Outside the IDE coding-assistant context (which seems to be the focus of the question), there is an “Ask AI” feature on the JuliaHub website that’s supposed to be ChatGPT with a Julia focus, likely using “system prompts”, since ChatGPT doesn’t allow actual fine-tuning. It’s okay, and better than the base ChatGPT, but I generally find that Phind is significantly better.

Actually, looking at Phind’s About page just now, they claim to have a VS Code extension, so perhaps you could give that a try as well.


I’m not sure, but this is also my main concern. I hoped someone here may have some insight.

I wasn’t aware of the JuliaHub option, I’ll give it a go over the coming weeks.

I also wasn’t aware of Phind’s solution, I’ll try this out too.

It seems like the trend is toward more capable models that rely on context awareness (prompting and retrieval) rather than on fine-tuning.

I suspect that fine-tuning will give you better results in the long run, by effectively overwriting the model’s bias toward expressing itself in Python and languages other than Julia. Prompt engineering can only get you so far, and doesn’t let you efficiently provide examples of the results you really want (which may be domain-specific or not part of the public datasets that models like ChatGPT are trained on).

All this to say, I’d be interested in working on something like this, and putting together a workflow and code for gathering code samples from Julia-specific sources (GitHub, Slack/Zulip, Discourse, etc.) so that the fine-tuning can be made reproducible.

I’ve unfortunately not been blessed by Meta with the Llama 2 weights (for some reason, they never emailed me back with the download link), but if you happen to have those weights available, we could look into setting this up with the PyTorch-based training code to start with (with the hope that we’ll eventually be able to do the training in Julia).


I am interested in prompt optimization and am currently playing with Llama. I have had good experience with Transformers.jl. I am taking gradients with respect to the input and it works fine, so fine-tuning can be done within Julia. It would be a nice project for testing different things, like Dagger. I want to try model parallelism with Dagger, but I have bigger problems right now and have had to put it on a side track.


TheBloke on Hugging Face posts quantized versions of the Llama 2 models, which anyone can access, e.g. Llama 2 7B Chat.


I’d be happy to help with this! @dhairyagandhi96 and I have gotten some great changes together for DaggerFlux.jl, and have some ongoing work for adding Dagger+MPI-based DDP support for arbitrary models (currently tested to work with ResNet, but should be applicable to any Flux model), which we’ll also extend to Distributed.

Awesome, thank you! These are working great with Llama2.jl (which supports the q4_K_S format).

Speaking of which, what do we want to do about the divergence between LanguageModels.jl, Llama2.jl, and any Transformers.jl-based implementation? I personally think it’s important to combine the features of each (the REPL mode and clean implementation of LanguageModels.jl, the quantized weights and training support of Llama2.jl, and the Flux compatibility of Transformers.jl) into a single implementation.

I’m sure @chengchingwen @cafaxo @jiahao would have some input on this suggestion!


Maybe forces could be joined here? JuliaHub: Ask AI with ChatGPT


TL;DR: The docs have the answers, but those answers are not always easy to find; we could start by implementing a docs summariser.

Clearly, finding as much high-quality code as possible is going to be pivotal. But I wonder how much can be done by simply including the docs.

Say I have a date dt = DateTime(2022, 1, 1) but I want the last day of that month. For Python there is a Stack Overflow answer which I find almost immediately and can copy and paste, as we all do. For Julia this is often not the case; in fact, the top result I get is the Dates docs, all 8,000 words of them. So, in laziness, I ask GitHub Copilot, which suggests last_day = Dates.lastday(dt), a function that doesn’t exist. So now I go back to the Dates docs (which are wonderfully thorough :slight_smile: ) and have to start thinking of good find-in-page searches, cycling through them until I find it.
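For the record, the function I eventually dug out of the docs is lastdayofmonth from the Dates stdlib:

```julia
using Dates

# The real adjuster function Copilot should have suggested
dt = DateTime(2022, 1, 1)
lastdayofmonth(dt)  # 2022-01-31T00:00:00
```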

Finding the answer for Julia took 30x longer than it took for Python.

When you’re new to a language or working in new domains this is a common scenario in my experience.

Incidentally, I wonder whether, if the docs were split into more pages rather than one long page of headings, search engines could send you to the relevant section instead of to 8,000 words on Dates.

Hopefully, this little anecdote can help motivate the idea.

This appears to be closed-source and based on the ChatGPT API, which is not what we want here :slight_smile:


Yeah, this is probably a good first use case to test with a trained Llama model, as it’ll definitely be what a large majority of users will want. I suspect just training the model on scraped package documentation, together with some hand-written examples of question-answer based on those same docs, would probably get the model pretty far (since the model should internally already understand things about Julia, we just need to bias it to think more in Julia terms than in Python terms).

Transformers.jl serves as a general framework for working with transformer models. It’s important that the model is first-order differentiable on GPU, though I don’t pay much attention to optimizing inference. I’d guess there are some inference-specific optimizations in LanguageModels.jl/Llama2.jl that are not compatible with AD, but I haven’t checked. Otherwise, we could try merging some of the functionality into Transformers.jl.


Just as a tangential tip, both Phind and the JuliaHub Ask AI provided the right answer when I asked “given a Date or DateTime value in Julia, how can I get the end of the month from that?”. I prefer Phind because it provides sources for its answers, so it’s easier to confirm the answer or figure things out when it gets them wrong.

But as I said, this is tangential and meant as a suggestion for which AI tools currently work (okayishly) well for Julia. They’re both limited by the fact that ChatGPT provides only limited possibilities for data customization, and so get things wrong a significant amount of the time, especially when it comes to questions about packages in the ecosystem. So a custom tuned Llama2 or similar could potentially do much better.


Looking into this further, creating the dataset will be the real challenge. While we have lots of thorough documentation, it must be meaningfully encoded, and I’m not currently sure how to do that. I wonder whether exporting some of the Q&As from this Discourse would be possible.

I suppose the first port of call would be to use these methods to create a dataset for fine-tuning an LLM; we could add a Julia inference layer too, ensuring the responses are valid.
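As a rough sketch of what both halves could look like (the function names and prompt template here are illustrative, not an existing pipeline): scrape a module’s docstrings into prompt/completion pairs via the Docs system, and use a parse check as a minimal version of that inference/validation layer:

```julia
using Dates  # example module to scrape

# Turn a module's exported docstrings into (prompt, completion) pairs
# for fine-tuning. The question template is a placeholder.
function docstring_pairs(mod::Module)
    out = Tuple{String,String}[]
    for name in names(mod)
        doc = string(Base.Docs.doc(Base.Docs.Binding(mod, name)))
        # Skip exported names that have no docstring
        occursin("No documentation found", doc) && continue
        push!(out, ("How do I use $name from $(nameof(mod))?", doc))
    end
    return out
end

# Minimal validity check for a generated response: does the code parse?
function parses(code::AbstractString)
    ex = Meta.parseall(code)
    !any(x -> x isa Expr && x.head in (:error, :incomplete), ex.args)
end

dataset = docstring_pairs(Dates)
```

A parse check obviously doesn’t prove the response is correct, only that it’s syntactically valid Julia; actually running generated snippets in a sandbox would be the stronger (and more expensive) version of the same idea.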