Memory allocation of keyword arguments


#1

Hello,

I was looking into keyword function arguments. I tried using @time to check the memory allocation for the function, and found that one single keyword argument of type Bool adds half an MB of memory. Is that expected?

function read_pcm(infile, T; little_endian=true)
    y = Vector{T}(undef, filesize(infile) ÷ sizeof(T))
    read!(infile, y)
    if little_endian == true
        y .= bswap.(y)
    end
    return y
end

The argument in question is the last one “little_endian”. The memory numbers for this argument being keyword vs. non-keyword respectively are:

read_pcm:
  0.087870 seconds (14.93 k allocations: 1.445 MiB)

read_pcm:
  0.035225 seconds (5.24 k allocations: 978.750 KiB)

The number of allocations has tripled and the total memory has increased by roughly 50%. Just a data point that I have profiled the function multiple times to rule out the first run which has the higher numbers due to compilation.

Can someone help to clarify why this could be the case?

Thank you!


#2

It’s not expected. I’m guessing a benchmarking artifact. Could you use @btime (from BenchmarkTools) and/or post a minimal fully reproducible example?


#3

@bennedich

I tried it with btime from BenchmarkTools and the results are now exactly the same. The full code I’m running in Juno is as shown below:

function read_pcm(infile, T; little_endian=true)
    # f = open(infile)
    y = Vector{T}(undef, filesize(infile) ÷ sizeof(T))
    read!(infile, y)
    if little_endian == true
        y .= bswap.(y)
    end
    return y
end

function read_pcm2(infile, T, little_endian)
    # f = open(infile)
    y = Vector{T}(undef, filesize(infile) ÷ sizeof(T))
    read!(infile, y)
    if little_endian == true
        y .= bswap.(y)
    end
    return y
end

using BenchmarkTools

println("read_pcm:")
@btime y=read_pcm("test_speech.pcm",Int16, little_endian=true)

println("read_pcm2:")
@btime z=read_pcm2("test_speech.pcm",Int16, true)

The output of btime is as shown below:

read_pcm:
  603.077 μs (11 allocations: 718.61 KiB)
read_pcm2:
  584.206 μs (11 allocations: 718.61 KiB)

This makes more sense in terms of memory allocation, but I’m baffled as to the difference in the number of allocations - they are different by two orders of magnitude! And the time is also much lower. Is there something fundamentally wrong with @time?

(I noticed btime takes much longer to get the results than time)

Thanks,
Srikar


#4

There’s nothing fundamentally wrong with @time: it does exactly what it says, which is measuring the time and memory allocation of a block of code. But execution of that block of code might include (a) compilation time, (b) time spent looking up global variables and dispatching based on their types, (c) noise due to other processes also running on your computer, etc. BenchmarkTools provides a more precise estimate by running your code multiple times (that’s why it takes longer and also why it provides more accuracy) and providing a mechanism for interpolating variables into the expression (to avoid spending time looking up global variables). So, @time is a perfectly valid measurement of how much time a particular operation took, but @btime is usually a better estimate of how “fast” a piece of code is (that is, how much time you would expect that code to consume, on average, in a particular context).


#5

I’m guessing that you did this by running include on the file multiple times? That triggers at least some of the compilation/setup cost to happen again. The proper way to run the function multiple times is to do it from within the same script, or from the REPL:

julia> function read_pcm(infile, T; little_endian=true)
           ...

julia> function read_pcm2(infile, T, little_endian)
           ...

julia> @time read_pcm("test_speech.pcm",Int16, little_endian=true);
  0.161487 seconds (407.01 k allocations: 22.494 MiB, 4.19% gc time)

julia> @time read_pcm("test_speech.pcm",Int16, little_endian=true);
  0.000723 seconds (31 allocations: 719.672 KiB)

julia> @time read_pcm2("test_speech.pcm",Int16, true);
  0.010079 seconds (5.26 k allocations: 981.810 KiB)

julia> @time read_pcm2("test_speech.pcm",Int16, true);
  0.000638 seconds (22 allocations: 719.266 KiB)

See how the second runs are very similar to each other. However, compare with @btime:

julia> @btime read_pcm("test_speech.pcm",Int16, little_endian=true);
  218.818 μs (18 allocations: 719.11 KiB)

julia> @btime read_pcm2("test_speech.pcm",Int16, true);
  227.400 μs (18 allocations: 719.11 KiB)

The memory allocated is about the same, but the execution time is 3x faster with @btime. To dig a bit deeper into what’s going on, we can use @benchmark:

julia> @benchmark read_pcm("test_speech.pcm",Int16, little_endian=true)
BenchmarkTools.Trial:
  memory estimate:  719.11 KiB
  allocs estimate:  18
  --------------
  minimum time:     218.646 μs (0.00% GC)
  median time:      497.964 μs (0.00% GC)
  mean time:        466.615 μs (10.13% GC)
  maximum time:     51.317 ms (98.54% GC)
  --------------
  samples:          10000
  evals/sample:     1

Here we can see that read_pcm is called and timed 10 000 times; the fastest execution time was ~218 μs, the median was ~497 μs, and the maximum was ~51317 μs (a garbage collection occurred during this run). The @btime macro returns the fastest execution time of this benchmark.

As for what is most accurate, it depends on your application and what you’re interested in timing. The minimum time is often interesting as a theoretical ideal time not affected by noise. But it’s not a realistic measure if in practice you only run the function once or a few times. In particular if you’re doing file IO, like in this example. By running the benchmark many times, the file will be cached, and reading it will be very efficient.

The median might be more representative of an expected run time, but again, if you’re doing file IO, your file will be cached and provided unrealistically fast.

If you’re interested in non-cached run-time, you could run read_pcm for some file first to trigger compilation, and then run it again on a non-cached file, using the simple @time macro. Or even better, run it for a large number of files. Note that you’ll mostly time the performance of your disk though.

Anyway, back to your original question, the difference you saw for keyword arguments had to do with compilation/setup cost. Unless your script is very short-lived and will be re-run (and hence re-compiled) repeatedly, I doubt that it matters in practice.


#6

No, I executed the script itself multiple times. I started by clearing the Julia REPL inside Juno (CTRL+J, CTRL+K). I could not find any other way to clear the workspace and cache in Julia.

Do you mean that the input file I provide to the function gets cached?? Or that the data read from the files and other workspace entries gets cached?

I agree that the file I/O is going to be an infrequent operation, mostly a one-time thing. But my original question started because I was trying to compare two different approaches to read PCM files, and then I noticed that one of the suggestions included a semicolon in the function calls - that’s when I learnt about keyword arguments. So just out of curiosity, I tried to run the @time macro to check that difference as well.

Thanks,
Srikar