Community Interest Check: LLMs from Scratch in Pure Julia

Those are Hugging Face's terms, and they are mostly regular MLP/attention/GELU blocks. Hugging Face prefers to define a new class for each component of a model instead of defining and reusing compositional classes. This is also the reason we need to register a loader for each model in Julia: the class layout determines the structure of the state_dict, so we have to manually align the layouts. The layers defined in Transformers.jl are designed to be as composable as possible, to reduce the need to define new structs when registering a new loader.
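To make the layout issue concrete, here is a small sketch of what "aligning the layout" means. The checkpoint keys follow the naming Hugging Face's BERT checkpoints actually use, but the remapping rules and the target names are invented for illustration; this is not the actual Transformers.jl loader code.

```julia
# Hypothetical example: Hugging Face checkpoints store one entry per attribute
# of each Python class, so the state_dict keys mirror the Python class layout.
# (Toy array sizes, just to have something in the values.)
hf_state_dict = Dict(
    "bert.encoder.layer.0.attention.self.query.weight" => rand(Float32, 4, 4),
    "bert.encoder.layer.0.attention.self.key.weight"   => rand(Float32, 4, 4),
    "bert.encoder.layer.0.attention.self.value.weight" => rand(Float32, 4, 4),
)

# A loader has to rename/regroup these keys so they line up with whatever
# composable struct holds the parameters on the Julia side.  The target
# names below are made up for this sketch.
const KEY_RULES = [
    r"^bert\.encoder\.layer\.(\d+)\.attention\.self\.(query|key|value)\.weight$" =>
        s"encoder.block_\1.attention.\2_proj.weight",
]

function align_layout(state_dict)
    aligned = Dict{String,Any}()
    for (k, v) in state_dict
        newk = k
        for rule in KEY_RULES
            newk = replace(newk, rule)
        end
        aligned[newk] = v
    end
    return aligned
end

align_layout(hf_state_dict)
```

Every new per-model class on the Python side means another set of rules like this, which is exactly the per-model loader registration mentioned above.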

There are a few things on my priority list. The two major parts I'm currently (slowly) working on are splitting the WordPiece/SentencePiece tokenizers out into a separate package and a GPU abstraction for attention. Beyond that, I'd also like to enhance HuggingFaceApi.jl to use Hugging Face's dataset viewer API and to use DuckDB.jl to load the processed datasets. Unfortunately, my current bandwidth is mostly allocated to surviving and job hunting, so you probably won't see these in the near future.
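As a rough illustration of the dataset idea (this is not an existing HuggingFaceApi.jl feature; the `/parquet` endpoint of the dataset viewer API, the example dataset id, and the HTTP.jl/JSON3.jl/DuckDB.jl combination are all my assumptions for the sketch), one could list a dataset's auto-converted Parquet files and query them directly with DuckDB:

```julia
using HTTP, JSON3, DuckDB

# Sketch only: ask the dataset viewer API for the auto-converted Parquet files
# of a dataset, then let DuckDB query them over HTTPS.
dataset = "imdb"  # example dataset id
resp  = HTTP.get("https://datasets-server.huggingface.co/parquet?dataset=$dataset")
files = JSON3.read(resp.body).parquet_files
urls  = [f.url for f in files if f.split == "train"]

con = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(con, "INSTALL httpfs")
DBInterface.execute(con, "LOAD httpfs")

# read_parquet accepts a list of remote URLs once httpfs is loaded.
url_list = join(("'$u'" for u in urls), ", ")
result = DBInterface.execute(con, "SELECT * FROM read_parquet([$url_list]) LIMIT 5")
```

The appeal is that DuckDB only fetches the row groups it needs, so you don't have to download whole datasets up front.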

One package I would love to see, and which is surely beyond my scope, is a better data loader design/interface with distributed support. I've only roughly scanned through the existing ones, so this might not be precise, but it seems the data loaders we have are relatively naive compared to the distributed data loaders in PyTorch or Hugging Face.
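For reference, the core of PyTorch's DistributedSampler is just a deterministic per-epoch shuffle that each rank slices disjointly. A minimal Julia sketch of that idea (hypothetical code, not any existing package's API):

```julia
using Random

# Minimal sketch of a distributed-aware sampler: each worker (rank) sees a
# disjoint, roughly equal slice of the shuffled indices every epoch.
struct DistributedSampler
    nsamples::Int
    nranks::Int
    rank::Int      # 0-based rank, following the usual distributed convention
end

function epoch_indices(s::DistributedSampler, epoch::Integer)
    # Seed with the epoch so every rank agrees on the same permutation.
    perm = randperm(MersenneTwister(epoch), s.nsamples)
    # Pad so the permutation divides evenly across ranks.
    per_rank = cld(s.nsamples, s.nranks)
    padded = vcat(perm, perm[1:(per_rank * s.nranks - s.nsamples)])
    # Rank r takes every nranks-th index starting at r + 1.
    return padded[(s.rank + 1):s.nranks:end]
end

# e.g. rank 1 of 4 workers over a 10-sample dataset:
epoch_indices(DistributedSampler(10, 4, 1), 0)
```

A real package would layer prefetching, batching, and worker processes on top of this, which is where the existing Julia loaders fall short of PyTorch's.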


I will be back. I want to build the fastest Llama2.jl GPU support I can; I just couldn't finish it yet.

I am curious how fast things would get if we fused most of the kernel calls (I keep hearing that this is already done, but I want to double-check it myself).
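For context on what fusion buys you in Julia: dotted broadcasts already fuse into a single GPU kernel through CUDA.jl's broadcast machinery, so the open question is mostly the non-broadcast parts (attention, matmuls, sampling). A tiny sketch of the difference, assuming a CUDA-capable GPU:

```julia
using CUDA, BenchmarkTools

x = CUDA.rand(Float32, 2^20)
w = CUDA.rand(Float32, 2^20)
b = CUDA.rand(Float32, 2^20)

# Unfused: each statement launches its own kernel and allocates a temporary.
function unfused(x, w, b)
    t = w .* x
    t = t .+ b
    return tanh.(t)
end

# Fused: the whole dotted expression compiles to a single broadcast kernel.
fused(x, w, b) = @. tanh(w * x + b)

@btime CUDA.@sync unfused($x, $w, $b)
@btime CUDA.@sync fused($x, $w, $b)
```

Timing the two should show whether the kernel-launch overhead and temporaries actually matter at the sizes Llama2.jl works with.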
