I’m guessing that you did this by running include on the file multiple times? That re-triggers at least some of the compilation/setup cost on every run. The proper way to run the function multiple times is to do it from within the same script, or from the REPL:
julia> function read_pcm(infile, T; little_endian=true)
...
julia> function read_pcm2(infile, T, little_endian)
...
julia> @time read_pcm("test_speech.pcm",Int16, little_endian=true);
0.161487 seconds (407.01 k allocations: 22.494 MiB, 4.19% gc time)
julia> @time read_pcm("test_speech.pcm",Int16, little_endian=true);
0.000723 seconds (31 allocations: 719.672 KiB)
julia> @time read_pcm2("test_speech.pcm",Int16, true);
0.010079 seconds (5.26 k allocations: 981.810 KiB)
julia> @time read_pcm2("test_speech.pcm",Int16, true);
0.000638 seconds (22 allocations: 719.266 KiB)
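(The function bodies are elided above; for anyone who wants to reproduce something similar, a minimal PCM reader could look roughly like this. This is just a sketch with my guess at the byte-order handling, not your actual code:)

julia> function read_pcm(infile, T; little_endian=true)
           raw = reinterpret(T, read(infile))              # read the whole file, reinterpret the bytes as samples of type T
           return little_endian ? ltoh.(raw) : ntoh.(raw)  # convert from the file's byte order to the host's
       end

julia> function read_pcm2(infile, T, little_endian)        # same reader, but little_endian as a positional argument
           raw = reinterpret(T, read(infile))
           return little_endian ? ltoh.(raw) : ntoh.(raw)
       end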
See how the second @time runs above are very similar to each other. However, compare with @btime:
julia> @btime read_pcm("test_speech.pcm",Int16, little_endian=true);
218.818 μs (18 allocations: 719.11 KiB)
julia> @btime read_pcm2("test_speech.pcm",Int16, true);
227.400 μs (18 allocations: 719.11 KiB)
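In case you haven’t loaded it already: both @btime and @benchmark are provided by the BenchmarkTools.jl package, so you need

julia> using BenchmarkTools

before these calls will work.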
The memory allocated is about the same, but the execution time reported by @btime is about 3x lower. To dig a bit deeper into what’s going on, we can use @benchmark:
julia> @benchmark read_pcm("test_speech.pcm",Int16, little_endian=true)
BenchmarkTools.Trial:
memory estimate: 719.11 KiB
allocs estimate: 18
--------------
minimum time: 218.646 μs (0.00% GC)
median time: 497.964 μs (0.00% GC)
mean time: 466.615 μs (10.13% GC)
maximum time: 51.317 ms (98.54% GC)
--------------
samples: 10000
evals/sample: 1
Here we can see that read_pcm is called and timed 10 000 times; the fastest execution time was ~218 μs, the median was ~497 μs, and the maximum was ~51317 μs (a garbage collection occurred during that run). The @btime macro reports the minimum, i.e. the fastest, execution time of such a benchmark.
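If you want these statistics programmatically instead of reading them off the printed report, you can store the trial object and query it. A small sketch (the variable name t is mine):

julia> using Statistics   # median/mean come from Statistics

julia> t = @benchmark read_pcm("test_speech.pcm", Int16, little_endian=true);

julia> minimum(t)   # fastest sample, what @btime reports

julia> median(t), mean(t), maximum(t)

Each of these returns a BenchmarkTools.TrialEstimate with the time, GC time and allocation figures of that estimate.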
As for what is most accurate, it depends on your application and what you’re interested in timing. The minimum time is often interesting as a theoretical ideal, unaffected by noise. But it’s not a realistic measure if in practice you only run the function once or a few times, in particular if you’re doing file IO, like in this example: when the benchmark runs many times, the file ends up in the OS cache and reading it becomes very cheap.
The median might be more representative of an expected run time, but again, if you’re doing file IO, your file will be cached and served unrealistically fast.
If you’re interested in non-cached run time, you could run read_pcm on some file first to trigger compilation, and then run it again on a non-cached file, using the simple @time macro. Or, even better, run it for a large number of files, as sketched below. Note that you’ll mostly be timing the performance of your disk, though.
Anyway, back to your original question: the difference you saw for keyword arguments had to do with compilation/setup cost. Unless your script is very short-lived and will be re-run (and hence re-compiled) repeatedly, I doubt that it matters in practice.