Thanks a lot! I’ll look into that and let you know my solution, or post if I find further problems.
Could you edit your post to use triple backquotes to quote the code?
It’s rather hard to read the way it is! Thanks!
Can you list what operations you need to perform?
Are there any of the fields that don’t change, once set at a particular index?
For example, you might have code with the following structure:

    using Distributions  # provides Normal and LogNormal

    struct AdultVal
        strain::String
        hind::Float64
        gind::Float64
        tradeoff::Float64
    end

    struct AdultVec
        ids::Vector{Int64}
        values::Vector{AdultVal}
    end

    function InitAdults(ft::Int64, fhind_mean::Float64, fhind_std::Float64,
                        fgind_mean::Float64, fgind_std::Float64, tcost::Float64)
        ids = Vector{Int64}(undef, ft)
        adults = Vector{AdultVal}(undef, ft)
        for i = 1:ft
            ids[i] = i  # Why is this really needed? Do you really have another id for this?
            hind = rand(Normal(fhind_mean, fhind_std))
            gind = rand(LogNormal(fgind_mean, fgind_std))
            # The tradeoff depends on gind, so gind is drawn before the value is constructed.
            adults[i] = AdultVal(string(i),  # Why do you want a string form of the id?
                                 hind,
                                 gind,
                                 exp(-gind^2 / (2 * tcost^2)))
        end
        AdultVec(ids, adults)
    end

    idvector(fAdults::AdultVec) = fAdults.ids
    hindvector(fAdults::AdultVec) = [a.hind for a in fAdults.values]

I usually accomplish this by broadcasting getfield:

    getfield.(array, :fieldname)

What you see there is the difference between running in a function and running in global scope. The comprehension is faster than your function but it can be matched if you pre-allocate the output and eliminate bounds checking. Here is a simplified benchmark between some of the discussed approaches for mutable and immutable structs.

    using BenchmarkTools

    mutable struct A
        id::Int
    end

    struct B
        id::Int
    end

    N = 10000
    a = [A(i) for i = 1:N];
    b = [B(i) for i = 1:N];

    # Grow the output with push!.
    function f1(x)
        y = Int[]
        for i = 1:length(x)
            push!(y, x[i].id)
        end
        return y
    end

    # Comprehension.
    f2(x) = [el.id for el in x]

    # Pre-allocate the output and skip bounds checks.
    function f3(x)
        y = Vector{Int}(undef, length(x))
        for i = 1:length(x)
            @inbounds y[i] = x[i].id
        end
        return y
    end

    # Broadcast getfield.
    f4(x) = getfield.(x, :id)

    # Check consistency.
    f1(a) == f2(a) == f3(a) == f4(a) == f1(b) == f2(b) == f3(b) == f4(b)

    @btime f1($a);
    @btime f2($a);
    @btime f3($a);
    @btime f4($a);
    @btime f1($b);
    @btime f2($b);
    @btime f3($b);
    @btime f4($b);

I get the results (in microseconds):

          mutable   immutable
    f1         45          42
    f2         11           5
    f3         13           5
    f4        117         138

For this specific purpose the comprehension is both the simplest and fastest solution.
That said, I strongly doubt field accesses or loops are your real problem. If those seem slow, it’s often an effect of type instability, which can degrade performance drastically. The performance tips section of the Julia manual is highly recommended, as is profiling to see where time is actually spent.
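
To make the type-instability point concrete, here is a minimal sketch. The LooseAdult and TightAdult types and the total_hind function are hypothetical names, not taken from the code in this thread. The only difference between the two structs is whether the field type is abstract (Real) or concrete (Float64).

```julia
# Hypothetical types for illustration only.
struct LooseAdult
    hind::Real      # abstract field type: the concrete type is only known at run time
end

struct TightAdult
    hind::Float64   # concrete field type: the compiler knows exactly what it loads
end

# Sum one field over a vector; the source is identical for both element types.
function total_hind(adults)
    s = 0.0
    for a in adults
        s += a.hind
    end
    return s
end

loose = [LooseAdult(Float64(i)) for i in 1:1000]
tight = [TightAdult(Float64(i)) for i in 1:1000]

# Both versions compute the same value; only the generated code differs.
total_hind(loose) == total_hind(tight)

# A quick check that does not require reading compiler output:
isconcretetype(fieldtype(LooseAdult, :hind))  # false
isconcretetype(fieldtype(TightAdult, :hind))  # true
```

Running @code_warntype total_hind(loose) should flag the abstractly typed values, and comparing @btime on both vectors shows how much the abstract field costs on your machine.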
Just a couple of small comments:
- push! is kinda slow (does a ccall instead of a single store).
- Data structure layout: whether to use an array-of-structs or struct-of-arrays depends on your memory access patterns. However, you want inline storage, hence most of the time a hybrid array-of-bitstypes plus arrays-of-non-bitstypes is best if you care about random access. Hence:
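
    struct TAdult_bits
        id::Int64
        hind::Float64
        gind::Float64
        tradeoff::Float64
    end

    adult_bits = TAdult_bits[]   # Vector{TAdult_bits}: bitstype elements are stored inline
    adult_strain = String[]      # Vector{String}: the non-bitstype field gets its own array
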
In order to modify, you overwrite with the new value; most of the time the compiler will remove the loads and stores of unmodified fields. Unfortunately, it is currently a real pain if you want multiple threads to simultaneously modify different fields of the same struct; then you need to either use struct-of-array, play pointer games or rely on the undefined behavior that julia does the right thing most of the time.
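
A minimal sketch of the overwrite pattern described above. AdultBits mirrors the TAdult_bits layout, and set_hind! is a hypothetical helper name introduced only for this example.

```julia
# Same field layout as TAdult_bits; renamed so the sketch is self-contained.
struct AdultBits
    id::Int64
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

# Hypothetical helper: change only `hind` of element i by storing a whole new value.
function set_hind!(v::Vector{AdultBits}, i::Integer, hind::Float64)
    old = v[i]
    v[i] = AdultBits(old.id, hind, old.gind, old.tradeoff)
    return v
end

adult_bits = [AdultBits(i, 0.0, 1.0, 0.5) for i in 1:3]
set_hind!(adult_bits, 2, 42.0)
adult_bits[2].hind   # the element now carries the new value; other fields are unchanged
```

The struct itself stays immutable; only the array slot is rewritten.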
The problem with array-of-non-bitstype is that every access has indirection, and cache-friendliness depends on allocation order. This is unavoidable for the string, but not unavoidable for the bitstypes. Consider pulling a deepcopy of the strings (i.e. adult_strain = deepcopy(adult_strain)) if you construct once and sequentially iterate often; this gives you a pretty OK layout (I tested this with bigints, not strings).
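Struct-of-array, on the other hand, sucks if you need several of the fields at the same time for random i. With the array-of-bitstypes, adult_bits[i].id and adult_bits[i].hind are always on the same cache line (sizeof(TAdult_bits) divides 64), whereas with struct-of-array each field lives in a different array and so costs a separate memory access.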
If you only iterate sequentially then struct-of-array is totally fine, even if you need multiple fields. If you always need only one field at a time and iterate sequentially (or your array is small compared to the size of your cache), then struct-of-array is best.
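
For comparison, a minimal struct-of-arrays sketch. The AdultsSoA container, its one-argument constructor and mean_hind are hypothetical names introduced only for this illustration.

```julia
# Hypothetical struct-of-arrays container: one vector per field.
struct AdultsSoA
    id::Vector{Int64}
    hind::Vector{Float64}
    gind::Vector{Float64}
    tradeoff::Vector{Float64}
end

# Convenience constructor (an assumption of this sketch): n adults with zeroed values.
AdultsSoA(n::Integer) = AdultsSoA(collect(1:n), zeros(n), zeros(n), zeros(n))

# A sequential pass over a single field only touches that field's memory.
function mean_hind(adults::AdultsSoA)
    s = 0.0
    for h in adults.hind
        s += h
    end
    return s / length(adults.hind)
end

adults = AdultsSoA(4)
adults.hind .= [1.0, 2.0, 3.0, 4.0]   # the fields are vectors, so they can be updated in place
mean_hind(adults)
```

Because each field is its own vector, modifying a single field needs no reconstruction of a struct value.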
Also, you could in principle define a view that allows you to lazily consider the array-of-bitstypes as struct-of-abstract-arrays (you just do the necessary pointer arithmetic and implement getindex/setindex! via unsafe_load/unsafe_store!).
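
A rough sketch of such a lazy view. Note that this is a simplified, read-only variant: it uses plain getfield on the loaded element instead of the pointer arithmetic with unsafe_load/unsafe_store! mentioned above. FieldView and fieldview are hypothetical names, and AdultBits mirrors the TAdult_bits layout.

```julia
# Same field layout as TAdult_bits; renamed so the sketch is self-contained.
struct AdultBits
    id::Int64
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

# Hypothetical lazy view of one field of a vector of structs.
# T is the field type, S the struct type, F the field name (a Symbol).
struct FieldView{T, S, F} <: AbstractVector{T}
    parent::Vector{S}
end

fieldview(v::Vector{S}, f::Symbol) where {S} = FieldView{fieldtype(S, f), S, f}(v)

# Minimal read-only AbstractVector interface.
Base.size(fv::FieldView) = size(fv.parent)
Base.IndexStyle(::Type{<:FieldView}) = IndexLinear()
Base.getindex(fv::FieldView{T, S, F}, i::Int) where {T, S, F} = getfield(fv.parent[i], F)

adult_bits = [AdultBits(i, 10.0 * i, 1.0, 0.5) for i in 1:3]
hinds = fieldview(adult_bits, :hind)   # no copy is made
sum(hinds)

# The view is lazy: changes to the parent array are visible through it.
adult_bits[1] = AdultBits(1, 5.0, 1.0, 0.5)
hinds[1]
```

Supporting writes would additionally need a setindex! method, which this sketch deliberately leaves out.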
Thank you all for your incredible help! I did some profiling over the weekend and came to the conclusion that (for my purposes) it’s probably best to rewrite my simulation so that it does not depend on the mutable struct TAdult, because most of my subsequent code depends on it and the problem is more basic than just accessing the fields (as you all pointed out). That said, I learned a lot from all your replies and can hopefully continue my performance work more systematically.
Thanks again!
Adults…x syntax would be very convenient if it existed.
Please don’t necropost without substantial contribution.