Embedding Julia through C conversion: using LLVM IR and neural-network transformers to convert Julia to embeddable C

I would love to be able to use Julia to design embedded control systems and signal-processing architectures. Since Julia can't cross-compile to embeddable DSPs or SoCs, and may not have the stability that critical systems need, perhaps it's better to convert the Julia code to C (via Clang) or something else.

I was wondering if some kind of LLM/transformer could be trained to generatively learn how to convert Julia syntax to C syntax, using the LLVM IR as a reference of correctness.

I asked ChatGPT about this and it seemed hopeful. What do you think? Here is the response.

Yes, it’s theoretically possible to train a neural network to convert code from one LLVM-compatible language to another by going through the LLVM Intermediate Representation (IR). Here’s a basic outline of how this could work:

  1. Understanding LLVM IR: LLVM IR serves as a middle ground, allowing different languages to be compiled into a common format before being further compiled into machine code. It’s designed to be a sort of universal assembly language, which makes it a good intermediary.
  2. Training Data: To train such a neural network, you would need a large dataset of code examples in both source languages (e.g., Julia and C) that compile to the same or very similar LLVM IR. This dataset would be used to teach the neural network the patterns of how different constructs in one language translate into constructs in the other language through LLVM IR.
  3. Model Architecture: The model architecture could take several forms, but sequence-to-sequence models or transformers, which have been successful in natural language processing tasks, might be particularly well-suited. These architectures can handle the complexity of code translation by considering the entire sequence of code when making translations.
  4. Training Process: During training, the neural network would learn to predict the target language code given source language code, optimizing for translations that compile down to the same or functionally equivalent LLVM IR. This process involves a lot of trial and error, tweaking the model architecture, and hyperparameters to improve performance.
  5. Challenges: One of the main challenges is ensuring that the translated code not only compiles to the same LLVM IR but also preserves the semantic intent of the original code. Another challenge is handling the vast diversity of code patterns and libraries across languages.

While the idea is theoretically feasible and very intriguing, it’s a complex challenge that would require a significant amount of specialized data and computational resources. Additionally, the practical applications would be fascinating, potentially streamlining cross-language development and integration tasks.
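For concreteness, here is a minimal sketch of how such paired training data could be collected, assuming clang is installed and ignoring all the normalization and alignment work a real dataset would need; the file names and the tiny `saxpy` example are made up for illustration:

```julia
using InteractiveUtils

# LLVM IR for a small Julia function
saxpy(a, x, y) = a * x + y
julia_ir = sprint(io -> code_llvm(io, saxpy, Tuple{Float64,Float64,Float64}))

# LLVM IR for an equivalent hand-written C function, via clang
c_src = """
double saxpy(double a, double x, double y) { return a * x + y; }
"""
write("saxpy.c", c_src)
run(`clang -S -emit-llvm -O1 saxpy.c -o saxpy.ll`)
c_ir = read("saxpy.ll", String)

# (julia_ir, c_ir) would be one training pair; a real dataset needs many
# thousands of these, normalized and semantically aligned.
```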

:sob:


I’m not at all hopeful this will work with a neural network, but note that you can already do this differently. Or at least you have a starting point, and some work ahead of you…

Julia requires an OS (for I/O, threading, etc., even if you do not need all of that support), and it only supports some OSes, and only (some) 32- or 64-bit CPUs.

So all of this is unofficial, without an OS and on a 16-bit target:
https://seelengrab.github.io/articles/Running%20Julia%20baremetal%20on%20an%20Arduino/

LLVM is what targets different architectures, so you “just” need to ask it for the hardware you want, some target it supports such as ARM (which has some DSP functionality). But on its own it will not help you get rid of the OS dependency, or of libuv, or disable the GC, or actually get things working there.
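For what it’s worth, you can already see what Julia hands to LLVM for the host target from the REPL; retargeting that IR for, say, an ARM DSP is where tools like StaticCompiler.jl (below) have to come in. A minimal example:

```julia
using InteractiveUtils

f(x, y) = muladd(x, y, 1.0f0)

@code_llvm f(1.0f0, 2.0f0)    # the LLVM IR Julia generates for this method
@code_native f(1.0f0, 2.0f0)  # the machine code LLVM emits for the host CPU
```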

You can look into StaticCompiler.jl as a starting point. It has many limitations and still requires an OS; one of them is that it doesn’t support the GC (or threads).
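A hedged sketch of what that route looks like, based on the StaticCompiler.jl README (the exact API and its restrictions may have changed since): the compiled code must avoid the GC, dynamic dispatch, and most of the runtime, which is why StaticTools.jl’s C-style strings are used here.

```julia
using StaticCompiler, StaticTools

function hello()
    println(c"Hello from statically compiled Julia")  # StaticTools C-style string, no GC allocation
    return 0
end

# Emits a small standalone executable (it still assumes an OS and a libc underneath)
compile_executable(hello, (), "./")
```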

All mainstream LLMs use a transformer model (the T in GPT). When I understood (at a high level) what it meant, the name made sense. It comes from the “sequence-to-sequence” transformer, as in transforming a sequence of words, e.g. from English to German, NOT originally meaning next-word/token prediction.

The universal approximation theorem states that a neural network can approximate any function of a certain kind (on a bounded region) to any desired accuracy. The key words are “of a certain kind” and “approximate”, because for compiler technology you want something exact. A function is a mapping, and in math that’s not very limiting, i.e. taking a sequence of anything, English or LLVM IR, and outputting some other sequence is a mapping… But approximating such a mapping this way has not been proven possible, that I know of.
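For reference, the classical statement (Cybenko/Hornik style) is roughly: for any continuous $f : [0,1]^n \to \mathbb{R}$ and any $\varepsilon > 0$ there exist $N$, weights $w_i, v_i$ and biases $b_i$ such that

$$\hat f(x) = \sum_{i=1}^{N} v_i\,\sigma(w_i^\top x + b_i), \qquad \sup_{x \in [0,1]^n} \lvert f(x) - \hat f(x) \rvert < \varepsilon,$$

for a non-polynomial activation $\sigma$. Note that it only promises approximation of a fixed continuous function on a compact domain; it says nothing about how large $N$ must be, and nothing about exact, arbitrary-length sequence-to-sequence mappings like “LLVM IR to C”.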

Still, transformers are very neat and seem to come very close, but you want 100% correctness and no sequence-length restriction.

I understand them to transform an N-length sequence into an M-length sequence (I don’t know whether N and M need to be the same or may differ). But code comes in arbitrary lengths, not length N, nor at most N; and the output isn’t of length M either.

The development with transformers, or LLMs such as [Chat]GPT, was that you NO longer get an M-length sequence out, i.e. not M words or tokens as you might think. I believe you get a distribution out: a likelihood for each possible next token.

The neural network would, yes, ALWAYS give you the same answer for the same input (a prompt, which could be LLVM code), assuming the temperature is 0. It’s usually set higher (but it’s tunable) so you do NOT always get the same answer, since that would otherwise be boring, and it gets a bit more creative that way. It’s also not set too high, or it gets too “creative”, or “crazy” (not my terminology, nor a technical definition… but dare I say schizophrenic; I find the analogy with the brain interesting).
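As a toy illustration of the temperature knob (the numbers are made up, not from any real model): the model scores every token in the vocabulary, softmax turns the scores into probabilities, and temperature rescales the scores first. T → 0 collapses to always picking the top token; large T flattens the distribution.

```julia
logits = [2.0, 1.5, 0.2, -1.0]   # one made-up score per candidate token

softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

# T == 0 is the deterministic (argmax) case; otherwise sample from the rescaled distribution
sample_token(logits; T = 1.0) =
    T == 0 ? argmax(logits) : findfirst(cumsum(softmax(logits ./ T)) .>= rand())

sample_token(logits; T = 0.0)   # always returns 1 (the top-scoring token)
sample_token(logits; T = 0.7)   # usually 1, sometimes the others
sample_token(logits; T = 2.0)   # much flatter, more "creative"
```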

The work on language translation started out as “machine translation” of human language (text, not yet speech), with the original N-to-M (I believe) transformer. Fixed N and M were a limitation, I believe, which is why the M outputs of possibilities were used instead and one token chosen from them. That, I believe, is called autoregressive: you choose one token, then iteratively ask for one more token, again and again.

It’s not clear to me exactly what controls the length of a response now. I believe you can tune it, i.e. ask for so many words or tokens (paragraphs?), but you wouldn’t want the system to arbitrarily cut you off, e.g. mid-sentence. Very possibly the model has a stop token as one of the options?
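A sketch of that autoregressive loop with a stop token, where `fake_model` is a hypothetical stand-in for a trained model (it just returns hard-coded scores over a four-token vocabulary):

```julia
const VOCAB = ["hello", "world", "!", "<stop>"]
const STOP  = length(VOCAB)

# Hypothetical stand-in for a trained model: scores for the next token given the tokens so far
fake_model(tokens) = length(tokens) < 3 ? [1.0, 0.8, 0.3, -2.0] : [-2.0, -2.0, -2.0, 3.0]

function generate(; max_tokens = 16)
    tokens = Int[]
    for _ in 1:max_tokens               # hard cap so generation cannot run forever
        logits = fake_model(tokens)
        next = argmax(logits)           # greedy choice (temperature 0)
        next == STOP && break           # the stop token ends the response
        push!(tokens, next)
    end
    return join(VOCAB[tokens], " ")
end

generate()   # "hello hello hello"
```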

Note, you still have N inputs: you do not feed in one word at a time, nor only the full prompt, but also the system prompt (usually hidden from you). I believe the model slides over that conversation, i.e. it sees an N-length window of it, and eventually older parts of the conversation drop off (and maybe the system prompt too?). You can ask a model to summarize your conversation, and if it’s long you might be able to see that happening. I understand some models, e.g. Mixtral (or the original Mistral?), get bad after a long conversation, and I think this is the reason: they forget the past. Maybe the system prompt is made to always stay in, or maybe not always, and that’s the reason?
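A toy version of that sliding-window idea, under the simplifying assumption that the system prompt is always kept pinned and only the most recent conversation tokens fit in the remaining budget:

```julia
# Keep the system prompt, then fill whatever context budget remains with the newest tokens
function build_context(system_tokens, conversation_tokens; budget = 8)
    room = budget - length(system_tokens)            # the system prompt always stays in
    keep = conversation_tokens[max(end - room + 1, 1):end]
    return vcat(system_tokens, keep)
end

sys  = ["<sys>", "be", "helpful"]
conv = ["u1", "a1", "u2", "a2", "u3", "a3", "u4"]
build_context(sys, conv)    # the oldest turns ("u1", "a1") have dropped off
```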

It was, I believe, Yann LeCun who said that the probability of correctness from LLMs goes to 0 with longer outputs. The argument: say each token is a coin-toss, then it’s 1/2, 1/4, 1/8, etc. after three tokens, and even if the per-token probability is 99.999% the same argument holds. We feel like we get very good answers sometimes (also from humans!), so the per-token probability is surely very high, but never 100%. You could say there is never one single perfect sentence in human language, with no acceptable alternatives, and in code you can also phrase things differently, so that helps. Even compiling the same (C++ or) Julia code to assembly isn’t 1-to-1: there are many correct outputs, e.g. optimized or not, and many variants of either. But correct code is still a very exacting standard, so I’m not hopeful without changes to LLMs/transformers.
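The compounding argument in numbers (the per-token probabilities are purely illustrative):

```julia
# If each token is independently correct with probability p, an n-token answer
# is fully correct with probability p^n, which tends to 0 for any p < 1.
p_correct(p, n) = p^n

p_correct(0.5, 3)             # 0.125 — the coin-toss example: 1/2, 1/4, 1/8
p_correct(0.99999, 1_000)     # ≈ 0.990, still fine for a short answer
p_correct(0.99999, 100_000)   # ≈ 0.368, but it erodes over long outputs
```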

Hope this helps. I would like to know if I’m explaining this clearly, and if you know transformers well, please confirm (or correct) my understanding.

It is possible to run Julia on small computers

https://seelengrab.github.io/articles/Julia%20Abstractions%20for%20Arduino%20registers/


Pretty impressive things (even though declared “not suitable for mission- or safety-critical applications”) can already be done with Julia in its current state; see e.g. ReactiveToolkit.jl, and especially the video on its docs page.

You may also search for packages with the “robotics” keyword.