I’m trying to compare a function that slices an array in C++ vs. Julia. I get different results, as I wrote in the title, and I don’t understand why. I would appreciate your help.
I’m using custom data for the vector, which I cannot attach, but you will see the code logic.
(I tried comparing it in Julia using views and in C++ by changing std::vector to std::span, which gives me similar results, both being fast, but I want to get the same speed when copying.)
Julia code:
using Printf

function custom_test(data::Vector{Float64}, i::Int64)
    if(i < 1001)
        return 0
    end
    rates = data[i-1000:i]
end

mutable struct custom_struct
    data::Vector{Float64}
end

function tests(d::custom_struct)
    t = @elapsed begin
        @inbounds for i in 1:length(d.data)
            custom_test(d.data, i)
        end
    end
    elapsed_ms = t * 1000
    @printf("elapsed time: %.6f ms\n", elapsed_ms)
end
end
I think you mean that rates = data[i-1000:i] is slow, and that putting @view in front makes it fast (to be expected). In addition, you return either this copied data or 0, so the function is not type-stable, but that is likely not the main problem(?).
Copying per se shouldn’t be slower than in C++, but you accumulate garbage. That might be the problem, and Bumper.jl might be of help. C++ will destruct/free memory early. How do you run this?
I think your Julia code is incomplete: there’s a lone end keyword at the end and you aren’t calling any function, so I’m not sure what you’re measuring exactly. For what it’s worth, note that the custom_test function has a couple of performance gotchas: it’s type-unstable (the return type isn’t exclusively determined by the types of the input arguments), and slicing an array in Julia makes a copy, so you may want to use a view instead.
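To make those two gotchas concrete, here is a minimal sketch. The helper names window_copy and window_view are made up for illustration and are not from the original code. Both helpers return the same type on every path, so the return type depends only on the argument types; the first one copies, the second one aliases the original array.

```julia
# Hypothetical helpers for illustration; not the original poster's code.

# Always returns a Vector{Float64}: an empty one when the window does not fit.
function window_copy(data::Vector{Float64}, i::Int)
    i < 1001 && return Float64[]
    return data[i-1000:i]                  # slicing allocates a new Vector (a copy)
end

# Same window, but as a view that aliases `data` and allocates no new array.
function window_view(data::Vector{Float64}, i::Int)
    i < 1001 && return @view data[1:0]     # empty view keeps the return type consistent
    return @view data[i-1000:i]
end

data = collect(1.0:2000.0)
c = window_copy(data, 1001)
v = window_view(data, 1001)

c[1] = -1.0      # modifies only the copy; data[1] is untouched
v[2] = -2.0      # writes through to data[2]
println(length(c), " ", length(v), " ", data[1], " ", data[2])
```

Running @code_warntype on the original custom_test versus these helpers should show the difference in inferred return type.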
Yes, that part is the slow one. Using a view is fast and equal in speed, but I want to compare only copying.
return 0 doesn’t affect much, but good point, I didn’t see that.
I will read about the Bumper package. For running this, I first read a CSV, which I store in a struct, but it is a big CSV, so I cannot attach it.
I removed unnecessary parts of the code; that’s why it looks incomplete.
I corrected the return, but that is not the key problem.
Using a view is good, I get the same speed in both languages, but I don’t understand why copying is slower.
This function doesn’t return any value, and doesn’t do anything observable. So a clever compiler could conceivably decide to just skip the whole thing to save time and space.
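One common way to reduce that risk in a microbenchmark is to make every copy feed into a value that the function returns. The following is only a sketch under that assumption; the name sum_of_firsts and the window parameter are hypothetical and not part of the original code.

```julia
# Illustrative sketch: each copied window contributes to the returned value,
# which makes the copy much harder for a compiler to discard.
function sum_of_firsts(data::Vector{Float64}, window::Int)
    acc = 0.0
    for i in (window + 1):length(data)
        rates = data[i-window:i]   # slicing makes a copy of window + 1 elements
        acc += rates[1]            # read from the copy so the work is observable
    end
    return acc
end

result = sum_of_firsts(collect(1.0:10.0), 3)
println(result)
```

The same idea applies on the C++ side: return or otherwise consume something derived from each copy so both versions do comparable work.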
It might be that it’s not the copying that is slow, but, for example, garbage collection.
Optimizing compilers are an active adversary, especially when trying to write microbenchmarks. This is a great talk — it’s about C++ but it’s really true for all optimizing languages:
But if it is slow because of the GC, is there any solution? My priority is performance, and I don’t think I would like to switch to C++. At first it looks cool, but as things grow larger it gets hard.
I may have found the solution: C++ was copying the data as a reference, while Julia was not.
if(i > 1001)
    copy!(rates, @view d.data[i-1000:i]) # 300ms
    #rates = copy(@views d.c[i-1000:i]) this is 4000ms, maybe because of reallocation or gc, in cpp this doesn't slow down the code
end
The C++ version doesn’t return, which plays a role in Clang removing the entire copy at compile-time (earlier comment inspecting the two functions). The Julia version returns 0 or the vector, so that’s much harder to remove. Julia methods do have to return, but the closest thing to not returning is actually an unconditional return nothing, which despite its name is actually a value. You would need some reflection methods to really see what the compilers do, but you could make the versions closer first, probably leaning towards actions that would require the copy to survive compilation. Returning the vector would be such an action, and the if statement is getting in the way of that; if you want to benchmark array slicing specifically, then a simpler benchmark without any of the extra indices processing and custom_struct would be better.
data[i-1000:i] for a Julia Vector is equivalent to data[begin-1+(i-1000):begin-1+i], so that is only equivalent to C++'s endpoints (data.begin() + (i - 1000), data.begin() + i) if Julia’s i range is greater by 1, which is consistent with the for loops but doesn’t adjust for C++'s exclusive endpoint versus Julia’s inclusive endpoint. The Julia version’s (i < 1001) check branches to a copy if (i >= 1001), which is only equivalent to the C++ version’s (i > 1001) check if Julia’s i range is instead lower by 1. I would usually check some function calls before I conclude there are off-by-1 errors, but this really seems to be the case. For example, the lowest i = 1001 in Julia’s version would slice 1001 elements from begin to begin+1000, but the lowest i = 1002 in C++'s version would slice 1000 elements from data.begin() + 2 through data.begin() + 1001, because the endpoint data.begin() + 1002 is exclusive. If that also seems inconsistent to you, I’d suggest keeping the i ranges the same, using begin in Julia too, and adjusting for inclusive vs. exclusive endpoints.
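A small sketch of that endpoint arithmetic, written in Julia only. The variable names are made up for illustration, and the C++ range is emulated with Julia indices based on the endpoints described above rather than on the actual C++ source.

```julia
# Illustrative only: window lengths for inclusive (Julia) vs. half-open (C++-style) ranges.
data = collect(1.0:3000.0)

# Julia version, lowest i that takes the copy branch.
i = 1001
julia_slice = data[i-1000:i]                   # inclusive on both ends
println(length(julia_slice))

# A C++ range [begin + (j - 1000), begin + j) excludes its endpoint, so it
# covers 0-based offsets (j - 1000) through (j - 1). In Julia that is
# begin+(j-1000) : begin+(j-1).
j = 1002
cpp_like = data[begin+(j-1000):begin+(j-1)]
println(length(cpp_like))
```

The two windows differ in both length and starting element, so the two benchmarks are not copying the same amount of data per iteration.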
What does this mean exactly? I looked up std::vector and the call looks like a copy constructor, which sounds like a shallow copy that Julia’s slices do and was corroborated by another earlier comment.
If you mean for benchmarking, you can measure the GC and take it into account. BenchmarkTools is useful because it measures multiple runs. Taking 1 start time and 1 end time doesn’t take into account random performance variation, especially if the GC needs to clean up sometimes.
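Without installing anything, Base’s @timed macro already reports how much of the elapsed time went to GC. This is a rough sketch with a made-up function name (copy_windows) and random data standing in for the CSV; it is a single measurement, so BenchmarkTools remains the better tool for repeated runs.

```julia
# Illustrative sketch: report how much of the elapsed time was spent in GC.
function copy_windows(data::Vector{Float64})
    total = 0.0
    for i in 1001:length(data)
        rates = data[i-1000:i]     # allocates a fresh 1001-element Vector each time
        total += rates[end]        # use the copy so the loop has an observable result
    end
    return total
end

data = rand(20_000)                # stand-in data; the real input is not available
copy_windows(data)                 # warm-up call so compilation is not timed

stats = @timed copy_windows(data)
println("total:   ", stats.time * 1000, " ms")
println("GC time: ", stats.gctime * 1000, " ms")
println("bytes:   ", stats.bytes)
```

If the GC time is a large share of the total, the allocations rather than the copying itself are the likely cost.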
If you mean for performance, GC only kicks in after enough heap allocations. You could write values to a reused preallocated vector (something like your copy! line, though in-place broadcasting .= is more idiomatic) instead of freshly allocating a vector for each shallow copy. Of course, if you need separate copies, then you do need to allocate for each one.
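A minimal sketch of the preallocated-buffer idea, assuming a fixed window of 1001 elements as in the original slice. The function name reuse_buffer is hypothetical.

```julia
# Illustrative sketch: reuse one preallocated buffer instead of allocating per window.
function reuse_buffer(data::Vector{Float64})
    rates = Vector{Float64}(undef, 1001)    # allocated once, outside the loop
    total = 0.0
    for i in 1001:length(data)
        # In-place broadcast into the existing buffer; copyto!(rates, ...) would also work.
        rates .= @view data[i-1000:i]
        total += rates[end]
    end
    return total
end

result = reuse_buffer(collect(1.0:1005.0))
println(result)
```

Each iteration overwrites the previous window, so this only fits when the windows are not needed afterwards.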
I have corrected that in both languages, but there is not a big difference. I have also tried @btime, but it gives me similar results.
I get the same ms with that
I mean that C++ takes the data argument as a reference (std::vector& data) and then makes the copy, but in Julia I think this was not the case. I’ve tried an in-place copy, which gives me 300ms.