I am currently tracking progress during an iterative process in two different domains: numerical solution of nonlinear equations, and machine learning. To track the progress, I have simply been pushing rows to a DataFrame inside the iterative loop.
However, I have just come across JuliaML/ValueHistories.jl and JuliaLogging/TensorBoardLogger.jl. Such a logging approach looks like it is made specifically for what I am doing, which makes me feel like I should look into it.

On the other hand, plotting the DataFrame after training is really simple. There is also a noteworthy difference in overhead between the two ways of tracking progress:
```julia
julia> using BenchmarkTools, ValueHistories, DataFrames

julia> @btime begin
           mvh = MVHistory()
           for i in 1:100
               push!(mvh, :squared, i, i^2)
               push!(mvh, :cubed, i, i^3)
           end
       end
  11.200 μs (193 allocations: 10.92 KiB)

julia> @btime begin
           df = DataFrame(i=[], squared=[], cubed=[])
           for i in 1:100
               push!(df, [i i^2 i^3])
           end
       end
  44.500 μs (309 allocations: 17.98 KiB)

julia> @btime begin
           df = DataFrame(i=Int64[], squared=Int64[], cubed=Int64[])
           for i in 1:100
               push!(df, [i i^2 i^3])
           end
       end
  34.500 μs (309 allocations: 18.08 KiB)
```
However, this time scale is so small that it is not a problem for my application. Meanwhile, setting up TensorBoardLogger.jl, or extracting data from the types defined in ValueHistories.jl, seems like a bit of extra work. So before going down that route, I was wondering:
What are the potential benefits of logging, as opposed to pushing to a DataFrame?
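For reference, the ValueHistories.jl extraction I mean would look roughly like this (a minimal sketch based on my reading of its README, where `get` on an `MVHistory` returns an `(iterations, values)` tuple per key):

```julia
using ValueHistories

mvh = MVHistory()
for i in 1:100
    push!(mvh, :squared, i, i^2)  # (key, iteration, value)
end

# get returns a tuple of vectors: the iteration numbers and the logged values
iters, vals = get(mvh, :squared)
```

So it is not a lot of code, but it is one more step compared to a DataFrame that is already in plottable form.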
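And for completeness, my understanding of the TensorBoardLogger.jl setup I would need is something like the following sketch (the log directory name is just a placeholder; it hooks into the standard Logging machinery via `with_logger`):

```julia
using TensorBoardLogger, Logging

# Writes event files under the given directory, to be viewed with TensorBoard
lg = TBLogger("tb_logs/run")

with_logger(lg) do
    for i in 1:100
        # Each @info call logs the keyword values at the current step
        @info "progress" squared = i^2 cubed = i^3
    end
end
```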