[ANN] ChoosyDataLoggers.jl: choosy practices for logging data in numerical experiments

Hello! I think many people may find ChoosyDataLoggers.jl useful, especially those running reinforcement learning or machine learning experiments.

The ChoosyDataLoggers package is used to log various groups and variables in a large code base. Logged data can be sent to an array sink, which populates a dictionary keyed by the group and variable being logged. This feature is simple yet powerful, and it revolves around the @data macro. When choosing what to log, you can also define pre-processors in your user code and select them dynamically at run time.
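To make the "dictionary keyed by group and variable" idea concrete, here is a minimal sketch built only on Julia's Logging standard library. It is not the ChoosyDataLoggers API: `CollectLogger` is a hypothetical name, and the sketch simply assumes that every keyword value in a log call should be appended to a vector stored under `(group, variable)`.

```julia
using Logging

# Hypothetical sketch, NOT the ChoosyDataLoggers API: a logger built on the
# Logging standard library that collects every keyword value of a log call
# into a Dict keyed by (group, variable).
struct CollectLogger <: AbstractLogger
    data::Dict{Tuple{Symbol,Symbol},Vector{Any}}
end
CollectLogger() = CollectLogger(Dict{Tuple{Symbol,Symbol},Vector{Any}}())

# Accept every record; filtering is left out of this sketch.
Logging.min_enabled_level(::CollectLogger) = Logging.Debug
Logging.shouldlog(::CollectLogger, level, _module, group, id) = true
Logging.catch_exceptions(::CollectLogger) = false

function Logging.handle_message(logger::CollectLogger, level, message, _module,
                                group, id, file, line; kwargs...)
    # Each keyword argument (e.g. loss=0.5) becomes one variable in the sink.
    for (var, value) in kwargs
        push!(get!(Vector{Any}, logger.data, (Symbol(group), var)), value)
    end
    return nothing
end

logger = CollectLogger()
with_logger(logger) do
    for step in 1:3
        # _group is a reserved keyword of the logging macros that sets the group.
        @info "step" _group=:EXP loss=1.0/step
    end
end
```

After the loop, `logger.data[(:EXP, :loss)]` holds the three logged losses in order. A pre-processor chosen at run time could be slotted in as a function applied to `value` before the `push!`.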

Code bases can get quite complicated. If you want to find out what you can log without reading through the entire code base, ChoosyDataLoggers can automatically register every use of @data, which is handy when working with an experiment a partner has written.
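One way such registration can work in Julia is to do it at macro-expansion time, which happens when a function is defined rather than when it is called. The sketch below assumes that approach; it is not how ChoosyDataLoggers implements @data, and `@record`, `REGISTRY`, `SINK`, and `train!` are all hypothetical names.

```julia
# Hypothetical sketch, NOT how ChoosyDataLoggers implements @data: a macro
# that records (group, variable) in a registry when it is *expanded*, so the
# set of loggable variables is known before any experiment runs.
const REGISTRY = Set{Tuple{Symbol,Symbol}}()
const SINK = Dict{Tuple{Symbol,Symbol},Vector{Any}}()

macro record(group::Symbol, var::Symbol)
    key = (group, var)
    push!(REGISTRY, key)  # runs at macro-expansion time, not at call time
    # At run time, append the caller's variable to the sink under `key`.
    return :(push!(get!(Vector{Any}, SINK, $(QuoteNode(key))), $(esc(var))))
end

function train!(losses)
    for loss in losses
        @record EXP loss
    end
    return nothing
end

# train! has not been called yet, but its use of @record is already registered.
println((:EXP, :loss) in REGISTRY)  # true
```

Calling `train!([1.0, 2.0])` afterwards fills `SINK[(:EXP, :loss)]` with `[1.0, 2.0]`.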

I now use this package frequently, especially together with Reproduce.jl.

PRs, issues, and comments welcome!


This looks very interesting - is ChoosyDataLoggers thread-safe? And does it work across (remote) processes/workers?

Currently it isn’t explicitly thread-safe or multi-process-safe. It works in an experiment framework that sends jobs to self-contained processes (i.e., each process constructs its own logger internally); see mkschleg/ActionRNNs.jl on GitHub for how I use it. If you want to log across processes and threads, though, you will likely run into trouble.
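The "self-contained processes" pattern can be sketched with Julia's Distributed standard library: each job builds its own data store, and only the finished results travel back to process 1. This is an illustration under assumptions, not code from ActionRNNs.jl; `run_experiment` is a hypothetical stand-in for a job that constructs its own logger, and two local workers are assumed.

```julia
using Distributed

# Assumption: two local worker processes; skipped if workers already exist.
nprocs() == 1 && addprocs(2)

# Hypothetical experiment: each job builds its own, process-local data store
# (standing in for a logger constructed inside the job) and returns it.
@everywhere function run_experiment(seed)
    data = Dict{Symbol,Vector{Float64}}()
    for step in 1:3
        push!(get!(Vector{Float64}, data, :loss), seed / step)
    end
    return data
end

# pmap sends each seed to a worker; only the returned Dicts come back to
# process 1, so no logger state is ever shared between processes.
results = pmap(run_experiment, 1:4)
```

Because nothing is shared, no locking is needed. The cost is that merging the results is left to the caller.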

Supporting this would come down to the ArrayLogger at the root of the data loggers. Merging the data logged from other threads would likely take quite a bit of work, but it should only change how the data logger itself works; all the macro code should keep working as is.
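As a rough idea of what a thread-safe array sink might look like, the sketch below guards a shared dictionary with a `ReentrantLock`. It is not the ChoosyDataLoggers ArrayLogger; `LockedSink` and `log_value!` are hypothetical names, and a single global lock is just the simplest assumption, not necessarily the best design.

```julia
using Base.Threads

# Hypothetical sketch, NOT the ChoosyDataLoggers ArrayLogger: an array sink
# whose Dict is guarded by a lock so several threads can log into it.
struct LockedSink
    lock::ReentrantLock
    data::Dict{Tuple{Symbol,Symbol},Vector{Any}}
end
LockedSink() = LockedSink(ReentrantLock(), Dict{Tuple{Symbol,Symbol},Vector{Any}}())

function log_value!(sink::LockedSink, group::Symbol, var::Symbol, value)
    # Only one task at a time may touch the Dict and its vectors.
    lock(sink.lock) do
        push!(get!(Vector{Any}, sink.data, (group, var)), value)
    end
    return nothing
end

sink = LockedSink()
@threads for i in 1:1000
    log_value!(sink, :EXP, :step, i)
end
```

All 1000 values arrive safely, but with more than one thread their order in the vector is not deterministic. That ordering question is part of the merging work. An alternative is one sink per task, merged after the loop, which avoids lock contention.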

If you have a specific use case in mind, I would be happy to take a look or review a PR! Ideas on how this should work from an interface point of view would also be great (writing the actual code should be straightforward).

I’m not sure how to tackle this myself. The logging system does transport across distributed worker boundaries though, and sends everything to process 1, right?

I’m pretty sure the logging system is contained within each process when doing distributed compute. For threads, I’m not exactly sure how it works.