A few additions to this today:
- Life is too short to spend on tokenizers, so we switched to a Julia wrapper of a Python wrapper of Hugging Face’s Rust tokenizers library (hat tip @AntonOresten for this!).
- Added low-rank adaptation (LoRA) finetuning (a minimal sketch of the idea appears after this list).
- Added new samplers, including the classics top-p & top-k, but also the more recent https://arxiv.org/pdf/2407.01082 and https://arxiv.org/pdf/2411.07641 (a generic top-k/top-p sketch also appears after this list).
- In response to a Bluesky conversation about structured sampling, we added an example of a sampler that lets you restrict the logits to match a custom template (the masking idea behind it is sketched below).
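
For anyone curious what the LoRA bit actually does, here is a minimal, dependency-free sketch of the idea, not the package's actual API: freeze the pretrained weight and learn a small low-rank correction on top of it. The `LoRADense` name, the init choices, and the rank/scaling defaults are just illustrative.

```julia
# LoRA sketch: keep the pretrained weight W frozen and train only a
# low-rank correction B*A, so the effective weight is W + (alpha/r) * B*A.
struct LoRADense
    W::Matrix{Float32}   # frozen pretrained weight (d_out × d_in)
    A::Matrix{Float32}   # trainable, r × d_in
    B::Matrix{Float32}   # trainable, d_out × r
    alpha::Float32       # scaling applied to the low-rank path
end

function LoRADense(W::Matrix{Float32}; r::Int = 8, alpha::Real = 16)
    d_out, d_in = size(W)
    A = 0.01f0 .* randn(Float32, r, d_in)   # small random init
    B = zeros(Float32, d_out, r)            # zero init => adapter starts as a no-op
    LoRADense(W, A, B, Float32(alpha / r))
end

# Forward pass: frozen path plus the low-rank update.
(l::LoRADense)(x::AbstractVector) = l.W * x .+ l.alpha .* (l.B * (l.A * x))

# Usage (hypothetical shapes):
# W = randn(Float32, 64, 64)          # a frozen pretrained weight
# layer = LoRADense(W; r = 4)
# y = layer(randn(Float32, 64))       # initially equals W * x, since B is zero
```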
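And a generic sketch of the classic top-k + top-p (nucleus) filtering step over a logits vector, since that is the space the newer truncation samplers also play in. This is not the package's sampler interface; `sample_topk_topp` and its defaults are made up for illustration.

```julia
# Top-k + top-p sketch: softmax the logits, keep the k most probable
# tokens, then keep the smallest prefix of those whose mass reaches p,
# renormalise, and draw one token index.
function sample_topk_topp(logits::Vector{Float32}; k::Int = 50, p::Float64 = 0.9,
                          temperature::Float64 = 1.0)
    z = Float64.(logits) ./ temperature
    z .-= maximum(z)                 # numerical stability
    ez = exp.(z)
    probs = ez ./ sum(ez)

    order = sortperm(probs; rev = true)
    keep = order[1:min(k, length(order))]          # top-k

    cum = cumsum(probs[keep])
    cutoff = something(findfirst(>=(p), cum), length(keep))
    keep = keep[1:cutoff]                          # top-p (nucleus)

    w = probs[keep] ./ sum(probs[keep])            # renormalise
    r, acc = rand(), 0.0
    for (i, idx) in enumerate(keep)
        acc += w[i]
        acc >= r && return idx
    end
    return keep[end]
end
```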
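Finally, the logit-restriction trick behind the structured-sampling example boils down to masking: before sampling, set every token the template does not currently allow to -Inf. A sketch, with `allowed_ids` standing in for whatever your template tracking produces:

```julia
# Constrained-sampling sketch: only template-allowed tokens can be drawn,
# because everything else gets probability zero after the softmax.
function mask_logits(logits::Vector{Float32}, allowed_ids::Vector{Int})
    masked = fill(-Inf32, length(logits))
    masked[allowed_ids] .= logits[allowed_ids]
    return masked
end

# e.g. if the template demands a digit next (token ids for "0"..."9"
# obtained however your tokenizer exposes them):
# digit_ids = [id_of_token(string(d)) for d in 0:9]   # hypothetical helper
# logits = mask_logits(logits, digit_ids)
# then pass the masked logits to any sampler.
```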
Switching tokenizers unlocked some fun small open models, like the SmolLM2 series (more open than Llama 3.2, which sits behind a permissions wall, so this might lower the barrier to getting started). With the LoRA addition, this is at a fairly decent point for someone wanting to tinker with LLMs. Cooking up new samplers is a fun sport (evaluating them is trickier), and you can finetune a 1.7-billion-parameter model on just your CPU (see our example where we make one much stupider).