How to access all field values for one field of a composite type array

question

#1

Hi there! I’ve created a composite type with:

mutable struct TAdult
    id::Int64
    strain::String
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

Then I create an array of that type with:

using Distributions  # provides Normal and LogNormal

function InitAdults(ft::Int64, fhind_mean::Float64, fhind_std::Float64, fgind_mean::Float64, fgind_std::Float64, tcost::Float64)
    Adults = TAdult[]  # empty vector of TAdult
    for i in 1:ft
        id = i
        strain = string(id)
        hind = rand(Normal(fhind_mean, fhind_std))
        gind = rand(LogNormal(fgind_mean, fgind_std))
        tradeoff = exp((-gind^2) / (2 * tcost^2))
        nAdult = TAdult(id, strain, hind, gind, tradeoff)
        push!(Adults, nAdult)
    end
    return Adults
end

In a new function, I need to access certain fields of all TAdult instances in that array. I can achieve this by looping over the array (e.g. in a function):

function idvector(fAdults::Array{TAdult,1})
    ids = Int64[]  # id is an Int64 field, so collect into an Int64 vector
    for i in 1:length(fAdults)
        push!(ids, fAdults[i].id)
    end
    return ids
end

However, I think this takes too long and I want to improve performance. So I was wondering whether there is a way to do something similar to
idvector = Adults[1:end].id
I thought about using a tuple type, but if I understand it correctly, tuple types are immutable? They would also only be accessible by index, not by field name. Does anybody have an idea how to solve this?
Thanks a lot!


#2

Did you measure and find out that the loop is slow? It looks perfectly fine to me, and any alternative way of writing this will, in the end, have to do the same as what you have written: loop over all elements and extract the id field.


#3

Try a comprehension.

idvector = [x.id for x in Adults]
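
A minimal sketch of the one-line alternatives, assuming Julia 1.x and a small stand-in population so the snippet runs on its own. The comprehension, `map`, and the broadcast `getfield.` call all do the same per-element work as the explicit loop; they are shorter to write, not a different algorithm.

```julia
mutable struct TAdult
    id::Int64
    strain::String
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

# Stand-in population (assumption: the field values are arbitrary).
Adults = [TAdult(i, string(i), 0.0, 1.0, 0.5) for i in 1:5]

ids_comprehension = [x.id for x in Adults]   # comprehension
ids_map           = map(x -> x.id, Adults)   # same thing via map
ids_broadcast     = getfield.(Adults, :id)   # getfield broadcast over the array
```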

#4

The loop itself isn’t slow, but I need to access different fields of several thousand instances repeatedly, and my whole simulation is going to take ten hours if I do not find a faster way. So there is definitely a need for performance improvement. I was just wondering whether there is a fast track to access an indexed value of an indexed array in an array… Apparently there is none (?), so I’ll probably have to change the simulation at a more basic level. Thanks for your input! :slight_smile:


#5

@GunnarFarneback I’ll try this and see if it’s faster. Thank you :slight_smile:


#6

Perhaps using a “Struct of Arrays” would work better, e.g.:

struct TAdults
    ids::Vector{Int64}
    strains::Vector{String}
    hinds::Vector{Float64}
    ginds::Vector{Float64}
    tradeoffs::Vector{Float64}
end
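
A sketch of how this layout might be used, assuming Julia 1.x. The zero-argument constructor and the helpers `add_adult!` and `set_hind!` are hypothetical names, not part of any package. The struct itself is immutable, but the vectors it holds can still be changed in place, and a whole field is available as one contiguous vector without a loop.

```julia
struct TAdults
    ids::Vector{Int64}
    strains::Vector{String}
    hinds::Vector{Float64}
    ginds::Vector{Float64}
    tradeoffs::Vector{Float64}
end

# Empty population (hypothetical convenience constructor).
TAdults() = TAdults(Int64[], String[], Float64[], Float64[], Float64[])

# Append one individual by pushing onto every field vector (hypothetical helper).
function add_adult!(pop::TAdults, id, strain, hind, gind, tradeoff)
    push!(pop.ids, id)
    push!(pop.strains, strain)
    push!(pop.hinds, hind)
    push!(pop.ginds, gind)
    push!(pop.tradeoffs, tradeoff)
    return pop
end

# Change one field of individual i in place (hypothetical helper).
set_hind!(pop::TAdults, i, newhind) = (pop.hinds[i] = newhind; pop)

pop = TAdults()
for i in 1:3
    add_adult!(pop, i, string(i), 0.1 * i, 1.0, 0.5)
end
set_hind!(pop, 2, 9.0)

all_ids = pop.ids   # the whole id column, no loop needed
```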

#7

Did you in fact profile your code and ascertain that this is the bottleneck? When the types are known to the compiler, value.field accessors are very fast.


#8

That’s a good idea, but it would not represent what I want to achieve. Every instance of TAdult needs to be accessed again later on, replicated, and sometimes even changed.


#9

No, I haven’t found the actual bottleneck yet.
But I have an idea about what could take so long (not the access to the field, but changing it later on), which I thought I could shorten with a different way of accessing the fields. I’m rather new to all this (performance improvement in particular hasn’t been an issue yet), but before I expand my simulation, I want to make sure it runs as fast as possible… It’s good to know that value.field is usually fast… Thank you so much!


#10

You can automate that process with: https://github.com/simonster/StructsOfArrays.jl

Btw, iterating a Vector{<: mutable struct} is quite a bit slower than a Vector{<:struct}, since the array then holds pointers to separately allocated objects and every access has to follow one. So trying out the struct approach and, if it’s a lot faster, figuring out how to work with the immutability could be the best way to deal with this.


#11

Thanks! I know that it would be faster to have an immutable struct, but I think that’s not really a possibility here. I need to be able to change everything rather often (with a probability of 0.03% of ~10 000 *100 instances). Or do you have a good idea how I could deal with immutability in this case?
I’ll have a look into StructsOfArrays. This sounds interesting. Thank you!


#12

So far, this didn’t improve the performance. If I call the above mentioned function I get:

@time fooidvector(Testpop)
0.000002 seconds (8 allocations: 512 bytes)

and if I use the comprehension it’s

@time idvector = [x.id for x in Testpop]
0.039866 seconds (8.72 k allocations: 488.114 KiB)

Is this something you’d expect? Thanks a lot!
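
One thing worth ruling out before comparing these two numbers: `@time` on the first run of a piece of code also counts compilation, and a comprehension typed at top level typically pays that cost while an already-called function does not. A sketch of a fairer comparison, assuming Julia 1.x; `Testpop` here is a stand-in population and `ids_loop` and `ids_comprehension` are hypothetical wrapper functions. Each is called once to compile it before it is timed.

```julia
mutable struct TAdult
    id::Int64
    strain::String
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

# Stand-in for Testpop (assumption: the field values do not matter for timing).
Testpop = [TAdult(i, string(i), 0.0, 1.0, 0.5) for i in 1:10_000]

function ids_loop(adults::Vector{TAdult})
    ids = Int64[]
    for a in adults
        push!(ids, a.id)
    end
    return ids
end

ids_comprehension(adults::Vector{TAdult}) = [a.id for a in adults]

# Warm-up calls: trigger compilation so it is not included in the timings.
ids_loop(Testpop)
ids_comprehension(Testpop)

t_loop = @elapsed ids_loop(Testpop)
t_comp = @elapsed ids_comprehension(Testpop)
println("loop: ", t_loop, " s, comprehension: ", t_comp, " s")
```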


#13

The compiler is very good at optimizing immutables. Try it out and benchmark :wink: If the approach works, you might want to have some helpers to “modify” the immutables. Hopefully, this should be a lot nicer on 0.7!
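
A sketch of what such a helper could look like, assuming Julia 1.x. `TAdultImm` and `with_hind` are hypothetical names. The helper builds a new instance with one field replaced; writing it back into the array slot makes the change visible to everything that reads the array afterwards.

```julia
struct TAdultImm
    id::Int64
    strain::String
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

# Return a copy of `a` with only `hind` replaced (hypothetical helper).
with_hind(a::TAdultImm, hind::Float64) =
    TAdultImm(a.id, a.strain, hind, a.gind, a.tradeoff)

adults = [TAdultImm(i, string(i), 0.0, 1.0, 0.5) for i in 1:3]

# "Modify" individual 2 by overwriting its slot in the array.
adults[2] = with_hind(adults[2], 0.25)
```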


#14

Thanks a lot. I’ll try to figure out if there is a way for me to use immutables. By helpers, do you mean finding or creating functions to change the values? I need them permanently changed (because of evolution :wink: ) and can’t have them show up changed for certain parts of the simulation and then not be changed in the initial instance.
Or did you already have something in mind to help achieve this? Thank you so much!


#15

Iterating an isbits is faster. Even with immutable this wouldn’t be isbits due to the string field.

Finding the bottleneck is the first thing to do, though. How can you try to optimize code if you don’t know what is slow?


#16

I would have thought the array is the shared state :wink:
It won’t work if you pass around the instances outside the array and mutate them at multiple stages and expect everything to update.


#17

Iterating an isbits is faster. Even with immutable this wouldn’t be isbits due to the string field.

Good point, I overlooked the String… Would be interesting if that really needs to be a string though


#18

I think the string really needs to be a string. It contains a mixture of characters and numbers and can be rather long (like 1h23m45h68h79m50 or something similar). Is there a possibility to do this without a string?


#19

True, I should have done that first. Do you have any tips on doing this correctly? Or a nice link? There is the BenchmarkTools.jl package, for benchmarking. What exactly did you mean before with profiling? Can you recommend a homepage to read? Thanks again!


#20

Well, @kristoffer.carlsson is right, you should first find your actual bottleneck, before you dive into further optimizations :wink:
But if the string has a fixed size, you could use something like NTuple{N, UInt8} or NTuple{N, Char} or FixedSizeStrings.jl.
Or just save an index into a string array - but that will just get you down further the premature optimization rabbit hole, and you should really first pin down the actual bottlenecks :slight_smile:
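
A sketch of the index-into-a-string-array idea, assuming Julia 1.x (where the check is called `isbitstype`). `TAdultBits`, `strain_idx`, `strain_names`, and `strain_of` are hypothetical names, and the labels are made up. With the String replaced by an integer index, the struct holds only plain bits and can be stored inline in a Vector.

```julia
struct TAdultBits
    id::Int64
    strain_idx::Int64   # position in the lookup table instead of the String itself
    hind::Float64
    gind::Float64
    tradeoff::Float64
end

# Lookup table holding each distinct strain label once (made-up labels).
strain_names = ["1h23m45", "2h10m05"]

adults = [TAdultBits(1, 1, 0.0, 1.0, 0.5), TAdultBits(2, 2, 0.1, 1.2, 0.4)]

# Recover the label of an individual through the table.
strain_of(a::TAdultBits, names::Vector{String}) = names[a.strain_idx]

println(isbitstype(TAdultBits))            # true: no String field any more
println(strain_of(adults[2], strain_names))
```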

About the profiling: https://docs.julialang.org/en/stable/manual/profile/
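
A minimal sketch of the built-in sampling profiler described at that link, assuming Julia 1.x, where it lives in the `Profile` standard library. `simulate_step` is a hypothetical stand-in for the real simulation code. The report shows how many samples landed on each line, which is what points at the actual bottleneck.

```julia
using Profile

# Hypothetical stand-in for one expensive part of the simulation.
function simulate_step(n)
    total = 0.0
    for i in 1:n
        total += exp(-(i % 7)^2 / 2.0)
    end
    return total
end

simulate_step(10)                   # warm-up so compilation is not profiled

Profile.clear()                     # discard samples from earlier runs
@profile simulate_step(10_000_000)  # run the code while collecting samples
Profile.print(format = :flat)       # per-line sample counts
```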