An LLM fine-tuned for Julia, call for comments + help

Great, here’s the PR to format all the code. This should give us roughly a 4x sample size, though for some of the formatters the differences are going to be quite small. I also don’t filter the data down to unique samples, which maybe we want to do here.

An additional thing we could do on the formatting side is to generate permutations of all the arguments to format_text:

format_text(
    text::AbstractString;
    style::AbstractStyle = DefaultStyle(),
    indent::Int = 4,
    margin::Int = 92,
    always_for_in::Union{Bool,Nothing} = false,
    for_in_replacement::String = "in",
    whitespace_typedefs::Bool = false,
    whitespace_ops_in_indices::Bool = false,
    remove_extra_newlines::Bool = false,
    import_to_using::Bool = false,
    pipe_to_function_call::Bool = false,
    short_to_long_function_def::Bool = false,
    long_to_short_function_def::Bool = false,
    always_use_return::Bool = false,
    whitespace_in_kwargs::Bool = true,
    annotate_untyped_fields_with_any::Bool = true,
    format_docstrings::Bool = false,
    align_struct_field::Bool = false,
    align_conditional::Bool = false,
    align_assignment::Bool = false,
    align_pair_arrow::Bool = false,
    conditional_to_if = false,
    normalize_line_endings = "auto",
    align_matrix::Bool = false,
    trailing_comma::Bool = false,
    trailing_zero::Bool = true,
    indent_submodule::Bool = false,
    separate_kwargs_with_semicolon::Bool = false,
    surround_whereop_typeparameters::Bool = true,
    variable_call_indent::Vector{String} = [],
    short_circuit_to_if::Bool = false,
)::String

though I am not sure what people think about that.
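
To make that concrete, here is a minimal sketch of what sampling those keyword arguments could look like (exhaustively enumerating ~30 mostly Boolean options is 2^30+ combinations, so random sampling seems more practical). The option subset, the indent/margin choices, and the final unique() pass are all just illustrative assumptions:

using JuliaFormatter
using Random

# Sketch only: pick a random combination of (a subset of) the Boolean options,
# plus indent and margin, and return the formatted text together with the
# options that produced it.
const BOOL_OPTIONS = (:always_use_return, :import_to_using, :pipe_to_function_call,
                      :short_to_long_function_def, :whitespace_in_kwargs)

function random_format(text::AbstractString; rng = Random.default_rng())
    kwargs = Dict(opt => rand(rng, Bool) for opt in BOOL_OPTIONS)
    formatted = format_text(text;
                            indent = rand(rng, [2, 4]),
                            margin = rand(rng, [80, 92, 120]),
                            kwargs...)
    return formatted, kwargs
end

src = "foo(x)= [i^2 for i in 1:x] |> sum"
variants = unique(first(random_format(src)) for _ in 1:20)

Collecting the outputs through unique() would also take care of the duplicate-filtering point above, since many option combinations won’t change a given snippet at all.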

Regarding fine-tuning, it would be interesting to see if we could construct “backward prompts”, i.e. ask a high-quality language model to give us the questions that would generate a particular script or part of a script. Anyone have opinions there too?

Hey everyone,

Just picking this up after seeing @cpfiffer’s post on Twitter. While it isn’t useful for any alignment training with RLHF/DPO/KTO yet, we have already released a pretraining-scale dataset which includes Julia as LLVM IR; here are the paper and the HuggingFace dataset:

You can pull just that subsection if you want to use it to seed these efforts.

We are also going to release a version of Source Code → IR relatively soonish.

5 Likes

Yeah, that’s a common practice in creating evals. You can mimic the functionality in PromptingTools for that.

function: Reference for RAGTools | PromptingTools.jl

template: PromptingTools.jl

There is a blog post on Forem on how to use it for RAG. The tweaking needed to get it to work on source code etc. would be quite minimal.
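
For the “backward prompts” idea above, a minimal sketch with PromptingTools could look like the following; the prompt wording and the model alias are assumptions, not an official template:

using PromptingTools

# Hypothetical "backward prompt": ask a strong model to invent the question
# that a given Julia snippet would plausibly answer.
snippet = raw"""
function mean_and_std(xs)
    m = sum(xs) / length(xs)
    s = sqrt(sum(abs2, xs .- m) / (length(xs) - 1))
    return m, s
end
"""

msg = aigenerate("""
You are preparing instruction-tuning data for Julia.
Write the user question that the following code most plausibly answers.
Reply with the question only.

$snippet
"""; model = "gpt4t")  # "gpt4t" is an alias for GPT-4 Turbo; swap in whatever you have configured

println(msg.content)

Pairing msg.content with the original snippet then gives one (question, answer) example for the fine-tuning set.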

btw a quick tip - JSON doesn’t have multiline strings even when pretty-formatted, so it’s painful to read and edit.
BUT if you use VSCode (Cursor, …), you can install a few extensions to make it a breeze - “multiline string editor” (Multiline String Editor - Visual Studio Marketplace) and “JSON multiline viewer” (JSON multiline viewer - Visual Studio Marketplace).
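
To illustrate why that matters, here is a tiny example (using JSON3 purely as an assumed serializer) of what a saved conversation turn looks like once code is escaped into a single JSON string:

using JSON3

# Multi-line code collapses into one "\n"-escaped string in JSON, which is
# what makes saved conversations painful to read without such extensions.
turn = Dict("role" => "user", "content" => "function f(x)\n    x + 1\nend")
println(JSON3.write(turn))
# prints something like: {"content":"function f(x)\n    x + 1\nend","role":"user"}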

1 Like

Awesome!

Btw, if people are too lazy to save their REPL conversations manually, try using this GUI for LLM questions. It saves all conversations by default when you click “New chat”.

Disclaimer: I’m the author of the tool, but I do believe it can help since it requires no extra effort.

In the next few days or so, I’ll also publish a simple observability platform for LLM conversations (saved in JSON), so we can quickly review/filter/curate the saved conversations.

1 Like

That’s a very intriguing idea and paper. I’ve so far only scanned it, and it seems to me having register numbers in IR or assembly can obscure the meaning. I’m guessing LLVM IR was chosen since LLVM is a common backend.

Should @code_lowered, @code_typed, down to @code_native be run on the Julia training code and associated with it? Which of them would likely be best, if not all? It seems problematic that Julia is generic: there isn’t just one set of types to run on, and you get different answers for each different set. I guess any one (or more) valid set could be ok.
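
For reference, a minimal sketch of how the different levels could be captured as text for one method instance; the concrete signature (Tuple{Float64} below) has to be picked by hand, which is exactly the genericity problem mentioned above:

using InteractiveUtils

# Capture each representation as a string so it could be paired with the
# original source in a dataset. The function and signature are just examples.
f(x) = 2x + 1
sig = Tuple{Float64}

lowered = string(only(code_lowered(f, sig)))
typed   = string(only(code_typed(f, sig)))
llvm    = sprint(io -> code_llvm(io, f, sig))
native  = sprint(io -> code_native(io, f, sig))

println(llvm)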

I was also thinking of another language. I thought stack-based Forth might be better (no register numbers). I’m not up to speed on LLVM use for it, and there might not be lots of Forth code as training data, or many users, since it’s very old and people may not know of or care about it… but if you could compile Julia (or another language) to it, then it might be good and help the LLM see the meaning in the (Julia) code.

I looked up what’s already being done with Forth, and at least found some interesting papers:

A Neural Forth Abstract Machine

Neural Programmer-Interpreters
https://arxiv.org/pdf/1511.06279

I don’t know if or how the following observation is actionable in this project, but: we don’t want an LLM emitting code that assumes arr[1] is correct; it should be arr[begin] and so on, and zero(T) instead of 0.

The proper generic accessors are probably underutilized in a general Julia dataset; it would be good to give some prominence to code that does use them correctly.
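
As a made-up illustration of the contrast we’d want the model to learn:

# Fragile: assumes 1-based indexing and a Float64 element type.
function runsum_fragile(arr)
    total = 0.0
    for i in 1:length(arr)
        total += arr[i]
    end
    return total
end

# Generic: works for OffsetArrays and any numeric element type.
function runsum_generic(arr)
    total = zero(eltype(arr))
    for i in eachindex(arr)
        total += arr[i]
    end
    return total
end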

3 Likes

Ah, this is a good point. We would also want to add the things the linter supports, such as using for i in axes(thing). Has anyone written anything that fixes these little things?
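
As a very naive, regex-based sketch of the kind of “little fixer” meant here (a real tool should work on the parsed syntax tree, e.g. via JuliaSyntax, rather than on raw text):

# Rewrite the most common 1-based loop pattern to eachindex.
function fix_index_loops(src::AbstractString)
    return replace(src,
        r"for\s+(\w+)\s+in\s+1:length\((\w+)\)" => s"for \1 in eachindex(\2)")
end

fix_index_loops("for i in 1:length(xs)\n    s += xs[i]\nend")
# "for i in eachindex(xs)\n    s += xs[i]\nend"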

1 Like