I’m running experiments with a package I’m developing, driving the training loop from Pluto.jl. So I often need to change some source code and re-run the loop to see the effect. The package also uses CUDA.jl.
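For context, the setup is roughly like this (a minimal sketch only; `MyPackage`, the paths, and the loop body are placeholders, not my actual code):

```julia
# Use an explicit environment so Pluto's built-in package manager is bypassed,
# then develop the local package and load it alongside CUDA.jl.
using Pkg
Pkg.activate("path/to/project")          # placeholder path
Pkg.develop(path="path/to/MyPackage")    # placeholder: package under development

using CUDA
# using MyPackage                        # placeholder for my actual package

# Stand-in for the training loop I keep re-running after each source edit.
function run_experiment(n)
    x = CUDA.rand(Float32, 1024, 1024)
    for _ in 1:n
        x = x .* 2f0 .+ 1f0              # placeholder for one training step
    end
    CUDA.synchronize()                   # include pending GPU work in the timing
    return x
end

@time run_experiment(1_000)
```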
The strange thing is that, sometimes after editing the source code, the training loop becomes unreasonably slow.
And all the re-runs stay slow like this, anywhere from 14 s to 40 s in this session, for the same code.
But normally the same code runs in about 2 s. Even a cold start takes only about 9 s in total, including precompilation.
A normal run looks like this:
It seems that after editing the source code, even just adding an empty line, a lot of time is spent in macro_expansion/notify/unlock, and also in task_done_hook. Even after excluding that time, the run is still slower than normal, and re-running does not make it fast again. I also notice that the extra time scales with the number of loop iterations.
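For reference, this is roughly how I look at where the time goes (a sketch; `run_experiment` is the placeholder loop from above, and I read the same frames off the flame graph in Pluto):

```julia
using Profile

Profile.clear()
@profile run_experiment(1_000)
# Text view of the hot frames; macro_expansion / notify / unlock and
# task_done_hook are the ones that show up after a source edit.
Profile.print(format=:flat, sortedby=:count)
```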
Does anyone have a clue why this is happening?