Slicing array on julia 4000ms vs c++ 400ms

tradingtsm · November 7, 2024, 6:06pm

Hi everyone,

I’ve been trying to compare a function that slices an array in cpp vs julia. I get very different timings, as I wrote in the title, and I don’t understand why. I would appreciate your help.
I’m using custom data for the vector which I cannot attach, but you will see the code logic.
(I tried comparing it in julia using views and in cpp changing std::vector to std::span, which gives me similar results, both being fast, but I want to get the same speed when copying.)

julia code

using Printf

function custom_test(data::Vector{Float64}, i::Int64)
    if(i < 1001)
        return 0
    end
    rates = data[i-1000:i]
end

mutable struct custom_struct
     data::Vector{Float64}
end

function tests(d::custom_struct)
    t = @elapsed begin
        @inbounds for i in 1:length(d.data)
            custom_test(d.data, i)
        end
    end
    elapsed_ms = t * 1000
    @printf("elapsed time: %.6f ms\n", elapsed_ms)
end


end

cpp

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <chrono>

struct custom_struct {
    std::vector<double> data;
};

void custom_test(const std::vector<double>& data, int i) {
    if(i > 1001)
    {
    	std::vector rates = std::vector<double>(data.begin() + (i - 1000), data.begin() + i);
    }
}

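void tests(custom_struct& d) {
    // Time one full pass over the data.
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < static_cast<int>(d.data.size()); ++i) {
        custom_test(d.data, i);
    }
    auto end = std::chrono::high_resolution_clock::now();
    // Convert the interval to fractional milliseconds before printing.
    std::chrono::duration<double, std::milli> elapsed_ms = end - start;
    std::cout << "elapsed time: " << elapsed_ms.count() << " milliseconds\n";
}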

Best regards,
Oskar

Palli · November 7, 2024, 6:24pm

I think you mean that this line is slow: rates = data[i-1000:i], and that if you put @view in front it’s fast (to be expected). In addition, you return either this copied data or 0, so the function is not type-stable, but that is likely not the main problem(?).

Copying per se shouldn’t be slower than in C++, but you accumulate garbage. That might be the problem, and Bumper.jl might be of help? C++ will destruct/free memory early. How do you run this?

giordano · November 7, 2024, 6:24pm

I think your Julia code is incomplete: there’s a lone end keyword at the end and you aren’t calling any function, so I’m not sure what you’re measuring exactly. For what it’s worth, note that the custom_test function has a couple of performance gotchas: it’s type-unstable (the return type isn’t exclusively determined by the types of the input arguments), and slicing an array in Julia makes a copy, so you may want to use a view instead.

Elrod · November 7, 2024, 6:54pm

They’re also making a copy in C++, so I assume it is deliberate.

std::vector rates = std::vector<double>(data.begin() + (i - 1000), data.begin() + i);

Otherwise, they’d need std::span.

mbauman · November 7, 2024, 6:57pm

Given that you’re not doing anything with the result, it’s possible that C++ (or Julia or both) is optimizing this in a non-representative manner.

tradingtsm · November 7, 2024, 6:59pm

Yes, that part is the slow one. Using a view is fast and equal in speed, but I want to compare only copying.
return 0 doesn’t affect much, but good point, I didn’t see that.
I will read about the Bumper package. For running this, I first read a csv which I store in a struct, but it is a big csv, so I cannot attach it.

tradingtsm · November 7, 2024, 7:02pm

I removed unnecessary parts of the code, that’s why it looks incomplete.
I corrected the return, but that is not the key problem.
Using a view is good, I get the same speed in both languages, but I don’t understand why copying is slower.

tradingtsm · November 7, 2024, 7:03pm

Yes, I used span (in julia @views) which gives me good results, but now I want to match the same speed using a copy.

tradingtsm · November 7, 2024, 7:08pm

Hi, do you know how to test that ?
I don’t understand this concept.

DNF · November 7, 2024, 7:31pm

tradingtsm:

void custom_test(const std::vector<double>& data, int i) {
    if(i > 1001)
    {
    	std::vector rates = std::vector<double>(data.begin() + (i - 1000), data.begin() + i);
    }
}

This function doesn’t return any value, and doesn’t do anything observable. So a clever compiler could conceivably decide to just skip the whole thing to save time and space.

It might be that it’s not the copying that is slow, but, for example, garbage collection.
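
To make the "nothing observable" point concrete, here is a minimal C++ sketch of one way to give each copy a visible use. The helper names copy_window and run_copies are hypothetical (they are not from the thread), and the window length is passed in rather than fixed at 1000. Folding one element of every copy into a returned checksum gives the compiler a reason to keep the work, although a sufficiently clever optimizer may still simplify it, so inspecting the generated assembly remains the more reliable check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the `window` elements that end just before index i (exclusive end),
// using the same range constructor as the thread's C++ code.
std::vector<double> copy_window(const std::vector<double>& data,
                                std::size_t i, std::size_t window) {
    assert(i >= window && i <= data.size());
    return std::vector<double>(
        data.begin() + static_cast<std::ptrdiff_t>(i - window),
        data.begin() + static_cast<std::ptrdiff_t>(i));
}

// Make one copy per valid i and fold the last element of each copy into a
// checksum. Returning the checksum (and printing it in the caller) means the
// copies feed into an observable result.
double run_copies(const std::vector<double>& data, std::size_t window) {
    assert(window > 0);
    double checksum = 0.0;
    for (std::size_t i = window; i <= data.size(); ++i) {
        std::vector<double> rates = copy_window(data, i, window);
        checksum += rates.back();
    }
    return checksum;
}
```

A caller would time run_copies(data, 1000) and then print the returned checksum after the timing, so the result is actually used.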

Zentrik · November 7, 2024, 7:34pm

Looks like clang optimizes out the copy, see Compiler Explorer. Note that custom_test has very little assembly that is just checking the length.

EDIT: Though if they measured 400ms that seems like it wasn’t optimized out.

mbauman · November 7, 2024, 7:37pm

Optimizing compilers are an active adversary, especially when trying to write microbenchmarks. This is a great talk — it’s about C++ but it’s really true for all optimizing languages:

tradingtsm · November 7, 2024, 7:42pm

even returning in cpp, it is around 400ms

std::vector<double> rates_test(const std::vector<double>& data, int i) {
    if(i > 1001)
    {
        std::vector rates = std::vector<double>(data.begin() + (i - 1000), data.begin() + i);
        return rates;
    }

    return std::vector<double>();
}

But if it is slow because of the GC, is there any solution ? Because my priority is performance, and I don’t think I would like to switch to cpp, at first it looks cool but as things grow larger it is hard.

tradingtsm · November 7, 2024, 7:42pm

data is about 370000 rows.

tradingtsm · November 7, 2024, 7:48pm

thanks for that video, looks interesting,
so maybe if I add more random code, the benchmark will get closer to reality?

Vasily_Pisarev · November 7, 2024, 7:54pm

Unless you actually use the returned value, there is still no observable effect.
You need to do something like

void custom_test(const std::vector<double>& data, std::vector<double>& rates, int i) {
    if(i > 1001)
    {
    	std::swap(rates, std::vector<double>(data.begin() + (i - 1000), data.begin() + i));
    }
}

void tests(custom_struct& d) {
    std::vector<double> rates(1);
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < static_cast<int>(d.data.size()); ++i) {
       custom_test(d.data, rates, i);
    }
    auto end = std::chrono::high_resolution_clock::now();
    // Convert the interval to fractional milliseconds before printing.
    std::chrono::duration<double, std::milli> elapsed_ms = end - start;
    std::cout << "elapsed time: " << elapsed_ms.count() << " milliseconds\n";
    std::cout << "rates[0] is: " << rates[0] << "\n"; // to use the value
}

tradingtsm · November 7, 2024, 8:13pm

Still the same, 570 ms, I changed the code because it gives me errors :

if(i > 1001)
    {
        std::vector val = std::vector<double>(data.begin() + (i - 1000), data.begin() + i);
    	std::swap(rates, val);
    }

tradingtsm · November 7, 2024, 9:53pm

Hi everyone,

I may have found the solution: C++ was copying the data from a reference, while Julia was not.

if(i > 1001)
       copy!(rates, @view d.data[i-1000:i]) # 300ms
       #rates = copy(@views d.c[i-1000:i]) this is 4000ms, maybe because of reallocation or gc, in cpp this doesn't slow down the code
end

what do you think about this ?

Benny · November 8, 2024, 8:23am

tradingtsm:

void custom_test(const std::vector<double>& data, int i) {
    if(i > 1001)
    {
    	std::vector rates = std::vector<double>(data.begin() + (i - 1000), data.begin() + i);
    }
}

I’m a bit concerned that these aren’t equivalent.

The C++ version doesn’t return, which plays a role in Clang removing the entire copy at compile-time (earlier comment inspecting the two functions). The Julia version returns 0 or the vector, so that’s much harder to remove. Julia methods do have to return, but the closest thing to not returning is actually an unconditional return nothing, which despite its name is actually a value. You would need some reflection methods to really see what the compilers do, but you could make the versions closer first, probably leaning towards actions that would require the copy to survive compilation. Returning the vector would be such an action, and the if statement is getting in the way of that; if you want to benchmark array slicing specifically, then a simpler benchmark without any of the extra indices processing and custom_struct would be better.
data[i-1000:i] for a Julia Vector is equivalent to data[begin-1+(i-1000):begin-1+i], so that is only equivalent to C++'s endpoints (data.begin() + (i - 1000), data.begin() + i) if Julia’s i range is greater by 1, which is consistent with the for loops but doesn’t adjust for C++'s exclusive endpoint versus Julia’s inclusive endpoint. The Julia version’s (i < 1001) check branches to a copy if (i >= 1001), which is only equivalent to the C++ version’s (i > 1001) check if Julia’s i range is instead lower by 1. I would usually check some function calls before I conclude off-by-1 errors, but this really seems to be the case; for example, the lowest i = 1001 in Julia’s version would slice 1001 elements from begin to begin+1000, but the lowest i = 1002 in C++'s version would slice 1000 elements from data.begin() + 2 to data.begin() + 1001 (exclusive endpoint). If that also seems inconsistent to you, I’d suggest keeping the i ranges the same, use begin in Julia too, and adjust for inclusive vs exclusive endpoints.
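
The endpoint mismatch described above can be sketched in C++ as two small helpers. Both function names are hypothetical and the lookback is a parameter instead of the fixed 1000. The first helper mimics Julia's inclusive, 1-based data[i-lookback:i]; the second is the half-open, 0-based range from the thread's C++ code. For the same lookback they copy different numbers of elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Julia-style slice for a 1-based index i: inclusive on both ends,
// so it copies lookback + 1 elements.
std::vector<double> slice_inclusive_1based(const std::vector<double>& data,
                                           std::size_t i, std::size_t lookback) {
    assert(i > lookback && i <= data.size());
    // 1-based [i - lookback, i] maps to 0-based half-open [i - lookback - 1, i).
    auto first = data.begin() + static_cast<std::ptrdiff_t>(i - lookback - 1);
    auto last = data.begin() + static_cast<std::ptrdiff_t>(i);
    return std::vector<double>(first, last);
}

// C++-style slice for a 0-based index i: half-open [i - lookback, i),
// so it copies exactly lookback elements.
std::vector<double> slice_half_open_0based(const std::vector<double>& data,
                                           std::size_t i, std::size_t lookback) {
    assert(i >= lookback && i <= data.size());
    auto first = data.begin() + static_cast<std::ptrdiff_t>(i - lookback);
    auto last = data.begin() + static_cast<std::ptrdiff_t>(i);
    return std::vector<double>(first, last);
}
```

With lookback = 1000, the first helper copies 1001 elements and the second copies 1000, which is the size difference between the two benchmarks as posted.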

What does this mean exactly? I looked up std::vector and the call looks like a copy constructor, which sounds like a shallow copy that Julia’s slices do and was corroborated by another earlier comment.

If you mean for benchmarking, you can measure and take into account GC. BenchmarkTools is useful because it measures multiple runs. Taking 1 start time and 1 end time doesn’t take into account random performance variation, especially if the GC needs to clean up sometimes.
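
On the C++ side, a rough analogue of sampling several runs could look like the sketch below. The helper min_elapsed_ms is hypothetical, and reporting the minimum is just one reasonable choice of summary (it is the figure @btime prints in Julia); this is not a replacement for a real benchmarking library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Run `work` `repeats` times and return the fastest run in milliseconds.
// The minimum is less sensitive to one-off pauses than a single measurement.
template <typename F>
double min_elapsed_ms(F&& work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

It would be called with a lambda wrapping the loop under test, for example min_elapsed_ms([&] { tests(d); }, 10), assuming tests and d are defined as in the original post.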

If you mean for performance, GC only kicks in after enough heap allocations. You could write values to a reused preallocated vector (something like your copy! line, though in-place broadcasting .= is more idiomatic) instead of freshly allocating a vector for each shallow copy. Of course, if you need separate copies, then you do need to allocate for each one.
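
The same "reuse a preallocated vector" idea can be sketched in C++ for comparison. The function names are hypothetical and the window length is a parameter. The first version allocates a new vector for every window, as in the thread's code; the second keeps one buffer and overwrites it with assign, which typically reuses the existing storage once the capacity is large enough, much like copy! into an existing Julia vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocates a fresh vector for every window.
double checksum_fresh(const std::vector<double>& data, std::size_t window) {
    assert(window > 0);
    double checksum = 0.0;
    for (std::size_t i = window; i <= data.size(); ++i) {
        std::vector<double> rates(
            data.begin() + static_cast<std::ptrdiff_t>(i - window),
            data.begin() + static_cast<std::ptrdiff_t>(i));
        checksum += rates.back();
    }
    return checksum;
}

// Reuses one buffer: after reserve(), assign() only overwrites the elements.
double checksum_reused(const std::vector<double>& data, std::size_t window) {
    assert(window > 0);
    std::vector<double> rates;
    rates.reserve(window);
    double checksum = 0.0;
    for (std::size_t i = window; i <= data.size(); ++i) {
        rates.assign(data.begin() + static_cast<std::ptrdiff_t>(i - window),
                     data.begin() + static_cast<std::ptrdiff_t>(i));
        checksum += rates.back();
    }
    return checksum;
}
```

Both functions compute the same checksum, so any timing difference between them comes from the per-iteration allocation and deallocation rather than from the element copy itself.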

tradingtsm · November 9, 2024, 5:20pm

I have corrected that in both languages, but there is not a big difference. I have also tried @btime, but it gives me similar results.

I get the same ms with that.

I mean that cpp takes the data argument as a reference (std::vector& data) and then makes the copy, but in julia I think this was not the case. I’ve tried an in-place copy, which gives me 300ms:

copy!(rates, @view d.data[i-1000:i]) # 300ms
