DearDiary.jl: A lightweight but powerful machine learning experiment tracking tool for Julia

After months of planning, and some weeks of development, the package is finally here and ready for use! As a solution for tracking machine learning experiments in Julia, DearDiary.jl aims to be lightweight, easy to use, and flexible enough to adapt to different workflows.

Motivation

After the unpleasant experience of trying to maintain an interface to the REST API of Python’s MLFlow (MLFlowClient.jl, MLJFlow.jl), and after finding out that it is poorly documented, incomplete, and has some abandoned or partially implemented features (and they are still adding new ones…), an idea came to my mind: why not write the same API, but well designed, well documented, and in Julia? This package is that idea.

Core concepts

Architecture-first

Rather than following the monolithic architecture common to many Julia packages, my goal was to implement something that can be easily maintained and extended over time, focusing on developer experience and code readability (inspired by Alan Edelman’s TED talk and MLJ.jl’s “micro-package” architecture).

DearDiary.jl consists of an N-layered architecture, which makes it possible to encapsulate different functionalities, allowing better collaboration and separation of concerns.
It is currently composed of the following layers:

  • Repository layer: responsible for data storage and retrieval.
  • Service layer: handles package logic and data processing.
  • Route layer: manages RESTful API endpoints and HTTP requests.

There is also the idea of implementing a frontend layer in the future.
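
To make the layering concrete, here is a minimal sketch in plain Julia of how a repository, a service, and a route layer can be kept apart. Every name in it (Experiment, InMemoryRepository, save_experiment!, find_experiment, create_experiment, handle_create) is hypothetical and is not DearDiary.jl's API. An in-memory Dict stands in for the real storage backend, and a (status, body) tuple stands in for an HTTP response.

```julia
# Hypothetical sketch of a three-layer design; none of these names come from DearDiary.jl.

# Flat, immutable record shared by every layer.
struct Experiment
    id::Int
    name::String
end

# Repository layer: storage and retrieval only (a Dict stands in for a database).
struct InMemoryRepository
    rows::Dict{Int,Experiment}
end
InMemoryRepository() = InMemoryRepository(Dict{Int,Experiment}())

function save_experiment!(repo::InMemoryRepository, name::AbstractString)
    id = length(repo.rows) + 1
    repo.rows[id] = Experiment(id, String(name))
    return id
end

find_experiment(repo::InMemoryRepository, id::Integer) = get(repo.rows, id, nothing)

# Service layer: validation and package logic; it knows nothing about HTTP.
function create_experiment(repo::InMemoryRepository, name::AbstractString)
    cleaned = strip(name)
    isempty(cleaned) && return nothing
    return find_experiment(repo, save_experiment!(repo, cleaned))
end

# Route layer: turns request parameters into a service call and a (status, body) pair.
function handle_create(repo::InMemoryRepository, params::Dict{String,String})
    experiment = create_experiment(repo, get(params, "name", ""))
    experiment === nothing && return (400, "name must not be empty")
    return (201, "created experiment $(experiment.id): $(experiment.name)")
end

repo = InMemoryRepository()
ok = handle_create(repo, Dict("name" => "  baseline  "))
bad = handle_create(repo, Dict("name" => "   "))
```

Because each layer only calls the one directly below it, the Dict-backed repository in this sketch could be swapped for a database-backed one without touching the service or route functions.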

Simple types

One of the problems I found while working on the integration project was the overuse of complex types. Imagine a type that has a field that is another type, with a field that is yet another type, that has a field with an integer. Well, that’s real and you can find it if you are curious enough.
DearDiary.jl tries to avoid that by keeping types simple and flat, totally immutable, and as clear as possible. Never search for complexity when you don’t need it.
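
As an illustration only, the sketch below contrasts a deeply nested type with a flat, immutable one. The types (StepInfo, MetricInfo, NestedMetric, Metric) and the helper next_step are made up for this example and are not taken from DearDiary.jl or MLFlow.

```julia
# Hypothetical types for illustration only; not DearDiary.jl's actual definitions.

# Nested style: reaching the integer means walking through three wrappers.
struct StepInfo
    step::Int
end

struct MetricInfo
    info::StepInfo
end

struct NestedMetric
    metric::MetricInfo
end

# Flat, immutable style: every value is a single field access away.
struct Metric
    key::String
    value::Float64
    step::Int
end

nested = NestedMetric(MetricInfo(StepInfo(3)))
flat = Metric("accuracy", 0.93, 3)

# With immutable structs, an "update" builds a new value instead of mutating.
next_step(m::Metric) = Metric(m.key, m.value, m.step + 1)
later = next_step(flat)
```

Reading the step takes `nested.metric.info.step` in the nested version and `flat.step` in the flat one, and `flat` itself is left unchanged by `next_step`.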

Flexible by design

DearDiary.jl is flexible enough to adapt to different workflows. You can use it as a standalone package, or integrate it with other tools in your ML pipeline, or call it from the “outside world” via its RESTful API.
If something is not implemented the way you want, you can always modify or extend it, thanks to its modularity.

Portability

DearDiary.jl is designed to be portable. You can run it locally, on a server, or in the cloud. Thanks to SQLite as the default storage backend, you can easily move your projects between different environments without worrying about compatibility issues.

Note: one of the main goals for the next releases is to support more storage backends, such as SQL and NoSQL databases or cloud storage solutions.

Getting started

A tutorial is available in the documentation. It covers installation and a workflow example with MLJ.jl.

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository. Pull requests are also encouraged. Please make sure to follow the existing code style and include tests for any new features.


Hi Jose, this looks really interesting, thanks for working on it!

A Julia-native experiment tracking solution would be nice; however, it’s also a big project. As you’ve already put quite a bit of work into MLFlowClient.jl and MLJFlow.jl, I’d be interested in hearing more about what your problems with that solution were. Could you expand a bit on what you said here:

after finding out that it is poorly documented, incomplete, and has some abandoned or partially implemented features (and they are still adding new ones…)

Which features are incomplete or abandoned, are these more niche or do you think MLFlow has problems at its core?

And having the idea of implementing a frontend layer in the future.

Is a frontend not so important for your own tracking workflows? It’s probably a lot of work to replicate what MLFlow has; in my experience, user interfaces often take more time than core backend logic.


Hi @jules, thanks for your interest in the details behind the implementation.

Which features are incomplete or abandoned, are these more niche or do you think MLFlow has problems at its core?

Artifact tracking is something that is completely abandoned by the MLFlow team, and it’s a problem coming from its core design. There are no dedicated API endpoints for artifact uploading, and the existing routes related to that are buggy and hidden from the end user. I think the main reason for this is a poorly designed, inflexible solution that cannot correctly handle the different backends they advertise.

The other issue is their constantly changing data types. The explanation lies in the project’s new direction, which is mostly focused on LLM/GenAI usage, and I could confirm that this is the reason for abandoning the “core” functionality (you just need to look at their Issues page).

Is a frontend not so important for your own tracking workflows?

It’s important, and it is being considered as the next immediate step after reaching v1.0.0. I don’t have much frontend development experience, so first I must find someone to help me with that.

Hi Jose, this is a really exciting project. It is exciting for Julia as a language too, moving it outside its comfort zone of traditional scientific computing.

I share the frustration with using MLflow: logging artifacts is really useful. I tried looking into adding this to MLFlowClient.jl at one point and gave up after I couldn’t find proper documentation.
For me as a user, this is one of the reasons for picking Python over Julia in this domain.

I’m wondering what your priorities for this project are. Are you prioritizing feature parity between DearDiary and MLflow first for a few select workflows, or making it easily adaptable to a broad range of workflows, like traditional ML and deep learning?

Congratulations @pebeto on this exciting new project! I look forward to closer integration with MLJ. I think our previous work there on MLflow support should make that relatively easy.

I too have built a thin Julia API to push logs into MLflow via REST, and it is really not a nice experience. I will try out your new endeavor. Great to see some momentum here in Julia with ML logging.