For loop performance vs "functional" performance

The difference is that the broadcasting version creates the slice dsi[1:end-ss] once and reuses it for every element of ts, whereas the loop version creates a new slice for every element.
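A minimal sketch of this evaluation-order difference. It uses a hypothetical `counted_prefix` helper (not from the thread) that records how often the slice is built. The towel list and design are the AoC 2024 day 19 example values, used here only for illustration:

```julia
# Hypothetical helper: builds s[1:end-k] and counts how many times it is called.
const SLICE_CALLS = Ref(0)
function counted_prefix(s::String, k::Int)
    SLICE_CALLS[] += 1
    return s[1:end-k]  # String slicing returns a new String
end

towels = ["r", "wr", "b", "g", "bwu", "rb", "gb", "br"]
design = "brwrr"

# Broadcast: the undotted argument is evaluated once, then reused for every towel.
SLICE_CALLS[] = 0
hits_broadcast = endswith.(counted_prefix(design, 1), towels)
calls_broadcast = SLICE_CALLS[]

# Loop: the slice expression sits in the loop body, so it runs once per towel.
SLICE_CALLS[] = 0
hits_loop = Bool[]
for t in towels
    push!(hits_loop, endswith(counted_prefix(design, 1), t))
end
calls_loop = SLICE_CALLS[]

println("broadcast: ", calls_broadcast, " slice(s); loop: ", calls_loop, " slice(s)")
```

Both variants give the same matches. Broadcasting treats a `String` argument as a scalar, so the single prefix is compared against every towel.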

Using AoC2024/day19/input.txt from asnoyman/AoC2024 on GitHub as the input file (apparently not the same one you are using, since the output differs), with f1 and f1a as above, and f1b identical to f1a except that the slice is created once, before the loop:

```julia
dsi_end = dsi[1:end-ss]  # create the slice once, before looping over ts
for e in ts
    if endswith(dsi_end, e)
        cm[length(e)+ss] += cm[ss]
    end
end
```
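For a self-contained picture of where the hoisted slice sits, here is a hypothetical stand-in for f1b. The real f1, f1a and f1b are defined earlier in the thread and may index `cm` differently. This sketch counts the ways a design can be assembled from towels, scanning from the end of the string, and assumes ASCII input so byte indices match character positions:

```julia
# Hypothetical f1b-style function (not the thread's code): number of ways
# `design` can be written as a concatenation of `towels`. Assumes ASCII input.
function count_ways(design::String, towels::Vector{String})
    n = length(design)
    ways = zeros(Int, n + 1)  # ways[k+1]: ways to build the last k characters
    ways[1] = 1               # the empty suffix can be built in exactly one way
    for k in 0:n-1
        iszero(ways[k+1]) && continue
        prefix = design[1:n-k]  # sliced once per k, outside the towel loop
        for t in towels
            if endswith(prefix, t)
                ways[k+length(t)+1] += ways[k+1]
            end
        end
    end
    return ways[n+1]
end

towels  = ["r", "wr", "b", "g", "bwu", "rb", "gb", "br"]
designs = ["brwrr", "bggr", "gbbr", "rrbgbr", "ubwu", "bwurrg", "brgr", "bbrgwb"]
counts  = [count_ways(d, towels) for d in designs]
println(counts, " total = ", sum(counts))
```

Moving `prefix = design[1:n-k]` into the inner `for t in towels` loop gives the f1a-style variant: the same result, but a new slice per towel.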

I get:

```julia
julia> @btime sum(f1($ds, $ts, i) for i in 1:length($ds); init=0)  # broadcasted version
  65.037 ms (250735 allocations: 17.22 MiB)
758890600222015

julia> @btime sum(f1a($ds, $ts, i) for i in 1:length($ds); init=0)  # dsi[1:end-ss] inside the loop
  101.799 ms (37804 allocations: 7.92 MiB)
758890600222015

julia> @btime sum(f1b($ds, $ts, i) for i in 1:length($ds); init=0)  # dsi[1:end-ss] outside of the loop
  61.943 ms (37804 allocations: 7.92 MiB)
758890600222015
```