Long parameter for type seems very slow

I construct millions of objects with some particular associated data (essentially, a transition table). I’d like to avoid storing this data in each object (even if it’s an extra pointer), since I can assume it’s constant throughout a computation. I’d also rather avoid making it a global, since (1) it’s known at compile time so could lead to optimizations and (2) I was taught to avoid globals. Here is a stripped-down version of my structure:

using StaticArrays
struct Data{T,Junk} x::T end

where Junk should hold my transition table. Now

julia> Data{Int,[1]}
ERROR: TypeError: in Type, in parameter, expected Type, got a value of type Vector{Int64}


julia> using StaticArrays
julia> Data{Int,SVector{1}([1])}

works fine. However,

julia> Data{Int,SVector{10000}(ones(10000))}

is unbelievably slow.

Is there some better way of doing what I want?

Do you have a minimal example to do benchmarks on?

Depending on your setting, it might actually be smart to store it with the data, since that makes it easier for the compiler to do the corresponding optimizations. If your data is very small, you can use Val to pass compile-time constants.
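A minimal sketch of the Val pattern (the function name is illustrative):

```julia
# Pass a small value in the type domain via Val, so the compiler
# sees it as a compile-time constant when it specializes the method.
f(::Val{N}) where N = N + 1

f(Val(3))   # the compiler specializes f for N = 3, returning 4
```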

edit: sorry, somehow I deleted half of my post while editing:

You could consider using a global constant const data = ... which doesn’t suffer from the points you mention.
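For instance, a minimal sketch of that pattern (names are illustrative):

```julia
# A module-level constant: its type (though not its contents) is known
# to the compiler, so lookups through it compile to fast code.
const TRANSITION = UInt8[1 2; 2 1]

transition(i, j) = TRANSITION[i, j]   # no untyped-global instability here
```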

Just store a pointer inside the objects to a standard Julia vector.
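A sketch of that layout (names are illustrative): each object stores a reference to the one shared table, so the table itself is never copied.

```julia
# Each element carries a reference to the shared table; the cost is
# one pointer per object, not a copy of the table.
struct Elem
    x::UInt8
    table::Matrix{UInt8}
end

table = UInt8[1 2; 2 1]
a = Elem(1, table)
b = Elem(2, table)
a.table === b.table   # true: both reference the same array
```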


That’s a possibility. However, it multiplies memory usage by a factor of 2. Are you saying there’s no other solution?

Storing data as a constant is fine if it never changes. However, changes to a constant do not force a recompile so it’s definitely a suboptimal solution (and it makes it much more complicated to nicely package the code in a module).

The main problem is that the compiler cannot really optimize your code for 10000 data points from the transition table (it doesn’t fit into the cache, etc.), so the performance boost from telling the compiler it is a constant will not be that large.

I would bet that for many cases one more pointer doesn’t matter much. If your struct contains more than just an Int then it is not that dramatic anymore :wink:

Anyway, if you really don’t want the extra pointer per data entry, then why don’t you pass the transition table just as a separate object?

your_data = [ YourData(i) for i = 1:1000 ]
transition_table = rand(10^6)

function your_simulation( your_data, transition_table )
    # ... here you can access transition_table and your_data at full speed (function barrier)
end

Of course, that means that you have to pass transition_table forward everywhere, but on the other hand that’s a clear design which also makes it obvious to others which data is used by the function. Maybe you have parameters anyway, to which you can bundle the transition_table?


@SteffenPL Here is an extreme case of what I could be interested in doing: semigroups represented by their multiplication table. Let’s say the semigroup has 100 elements, so they nicely fit in a UInt8. The multiplication table, OTOH, has 100^2 elements. Now the * command needs access to the multiplication table. The options:

  1. Store a pointer to the table. This multiplies the footprint by 16 (1 byte to 1 byte + 1 Ptr, aligned)
  2. Make the table constant. This means only one kind of semigroup can exist at the same time
  3. Pass the table to each call. This means I can’t implement a standard * for my objects, and have to rewrite many things downstream

are not appealing to me. Is there any other one?
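For what it’s worth, the factor-16 claim in option 1 can be checked directly with sizeof (struct names are illustrative):

```julia
struct Bare
    x::UInt8
end

struct WithTable
    x::UInt8
    table::Matrix{UInt8}   # stored as a reference (pointer), not inline
end

sizeof(Bare)        # 1 byte
sizeof(WithTable)   # 16 bytes on 64-bit: 1 byte + 8-byte pointer, padded for alignment
```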

Of course this is slow; this is ~ a C struct with 10000 individual fields:

julia> a = SVector{5}(ones(5))
5-element SVector{5, Float64} with indices SOneTo(5):

julia> dump(a)
SVector{5, Float64}
  data: NTuple{5, Float64}
    1: Float64 1.0
    2: Float64 1.0
    3: Float64 1.0
    4: Float64 1.0
    5: Float64 1.0

I would still argue that option 1. makes sense since the group table is logically part of the data and simplifies things a lot.

Sure, 16 times more memory consumption sounds like a lot, but look at the absolute value: for storing n group elements, we need, say, n * 16 additional bytes. If you aren’t going to have millions of group elements in your workspace then that extra memory footprint is no big deal on computers nowadays.

(Also, option 1 is by far the easiest solution. No world-age issues, no mutable global state in your module…)

A few other ideas (if you still don’t like option 1):

Option 4:
You could use a macro to insert the group table into computations,

G = SemiGroup()
a, b = G(1), G(2)
x = @sg G a * b
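Such a macro could look roughly like this (a sketch; SemiGroup, its field, and the sg_mul helper are assumptions, not an existing API):

```julia
struct SemiGroup
    table::Matrix{Int}
end

# hypothetical helper the macro expands to
sg_mul(G::SemiGroup, a::Int, b::Int) = G.table[a, b]

# rewrite `a * b` into `sg_mul(G, a, b)`; other expressions pass through unchanged
macro sg(G, ex)
    if ex isa Expr && ex.head == :call && ex.args[1] == :*
        return esc(:(sg_mul($G, $(ex.args[2]), $(ex.args[3]))))
    end
    return esc(ex)
end

G = SemiGroup([1 2; 2 1])
@sg G 1 * 2   # expands to sg_mul(G, 1, 2)
```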

Option 5:
You multiply tuples:

i = 1
j = 10
x = (G, i) * (G, j)

By defining function (G::SemiGroup)(i) = (G,i) you could also write

x = G(i) * G(j)

This allows you to swap between both representations on the fly:
If you have millions of elements, just store the indices.
If you don’t care or if you need to compute stuff, switch to the tuples.
Creating a tuple is also very fast, probably faster than “somehow” finding the group table.
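Sketched out, option 5 could look like this (defining * on plain tuples is arguably type piracy, so real code would probably use a small wrapper struct instead):

```julia
struct SemiGroup
    table::Matrix{Int}
end

# wrap an index together with its group
(G::SemiGroup)(i::Int) = (G, i)

# multiply (group, index) pairs through the table
Base.:*(a::Tuple{SemiGroup,Int}, b::Tuple{SemiGroup,Int}) = (a[1], a[1].table[a[2], b[2]])

G = SemiGroup([1 2; 2 1])
x = G(1) * G(2)   # == (G, 2)
```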

Option 6:
(I don’t like it too much, since it mutates global state, which can cause trouble like world-age issues. But maybe it could be improved.)

struct SemiGroupElement{GroupName}
    i::UInt8
end

# define a group:
G = :F2
const F2_grouptable = UInt8[1 2; 2 1]
@inline group_table(::SemiGroupElement{G}) = F2_grouptable

function Base.:*(a::T, b::T) where {G, T <: SemiGroupElement{G}}
    return SemiGroupElement{G}( group_table(a)[a.i, b.i] )
end

e = SemiGroupElement{:F2}(1)
b = SemiGroupElement{:F2}(2)

e * b == b   # true
b * b == e   # true

The names are not chosen well etc.


Thanks @SteffenPL!

Sure, 16 times more memory consumption sounds like a lot, but look at the absolute value: for storing n group elements, we need, say, n * 16 additional bytes. If you aren’t going to have millions of group elements in your workspace then that extra memory footprint is no big deal on computers nowadays.

Yes, that was my point: I DO need to store millions (actually, tens of millions) of group elements. They’re small building blocks into some larger structures, so I’m afraid the *16 penalty doesn’t get amortized.

I like option 6 :slight_smile:

@jling: thanks for the remark, indeed the slowdown already occurs when I create the 10k-sized SMatrix.

I had made it an SMatrix so that I could pass it as a parameter to Data; somehow Julia lets me use static arrays as parameters but not regular arrays. I do understand it’s an off-label use of SMatrix’es.
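The underlying rule, as far as I understand it, is that type parameters must be immutable plain-data (isbits) values, Symbols, or similar; a regular Array is mutable and heap-allocated, so it is rejected, while an SVector is isbits. A quick demonstration:

```julia
struct Foo{P} end

Foo{(1, 2, 3)}   # fine: a tuple of isbits values is a legal type parameter

err = try
    Foo{[1, 2, 3]}   # a Vector is mutable, hence not a legal parameter
catch e
    e
end
err isa TypeError   # true
```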


may I ask why Junk's *content* needs to be inside the type information?

@jling I could pass a name, reference or pointer. However, the content has to be used by the * command, so why not pass it?

@SteffenPL I came up with a 7th solution, which is maybe not to your taste, but does solve all my efficiency issues: rather than pass a usual matrix as parameter (forbidden, since it’s not POD) or an SMatrix (too slow for a 100x100 grid), I can pass as parameter a pair of (the matrix’s linear indices, the matrix entries as a tuple). For demonstration, here are five options; they all generate almost-optimal code (5 or 6 instructions):

using StaticArrays

struct Elt{X} x::UInt8 end

const F2_grouptable = UInt8[1 2; 2 1]
const SF2_grouptable = SMatrix{2,2}(F2_grouptable)

# option 1: the SMatrix itself as the type parameter
mul1(a::Elt{T}, b::Elt{T}) where T = Elt{T}(@inbounds T[a.x, b.x])
b1 = Elt{SF2_grouptable}(2)

# option 2: a Symbol parameter plus a lookup function
@inline group_table(::Elt{:F2}) = F2_grouptable
mul2(a::T, b::T) where {G, T <: Elt{G}} = Elt{G}(@inbounds group_table(a)[a.x, b.x])
b2 = Elt{:F2}(2)

# option 3: a global constant Matrix
mul3(a::T, b::T) where T = T(@inbounds F2_grouptable[a.x, b.x])

# option 4: a global constant SMatrix
mul4(a::T, b::T) where T = T(@inbounds SF2_grouptable[a.x, b.x])

# option 5: a (LinearIndices => entries tuple) pair as the type parameter
mul5(a::Elt{T}, b::Elt{T}) where T = Elt{T}(@inbounds T.second[T.first[a.x, b.x]])
b5 = Elt{LinearIndices(F2_grouptable) => tuple(F2_grouptable...)}(2)


why can’t you pass the data as data, rather than passing it as if it were a type?

Do I understand it correctly that the advantage of option 7 is that it does not rely on a global constant anymore? I see that this has its advantages for you :slight_smile: (btw: I used SizedMatrix instead of SMatrix. That works as well in your variant and doesn’t crash for large group tables.)

A small disadvantage might be that whenever someone wants to see the type, one gets a very lengthy type description, and I don’t know whether it has a negative impact on compilation time when types suddenly have tons of parameters. But at runtime, both options (6 and 7) seem to be equally fast in my benchmark (group with 100 elements).

@jling You are probably more experienced than me, please feel free to correct me if I say stupid things :slight_smile:

I guess from a math point of view, it is neat if group elements from different groups also have different Julia types? It forbids a * b if they are not from the same group, etc., and one maybe gets more optimized code. [and for @nabla another reason is the memory efficiency he detailed in the 6th post]
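To illustrate (a sketch with illustrative names and a placeholder multiplication):

```julia
struct GElt{X}
    x::UInt8
end

# only elements with the same group parameter X can be multiplied
Base.:*(a::GElt{X}, b::GElt{X}) where X = GElt{X}(max(a.x, b.x))   # placeholder body

GElt{:G1}(1) * GElt{:G1}(2)   # fine: same group

err = try
    GElt{:G1}(1) * GElt{:G2}(2)   # different groups: no matching method
catch e
    e
end
err isa MethodError   # true
```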

Yes, I think that even if it’s a bit unconventional it’s the most elegant way to have multiplicative objects without polluting the global environment.

However, a SizedMatrix wraps sizes around usual matrices, so it isn’t POD and can’t be passed as a parameter.

Side question: do you know a way of pretty-printing types to avoid the very long strings? I don’t seem to be able to define show(::IO, ::Type{Elt{X}}) where X.

Oh, yes, you are right. I was using the mol5 variant (which unwraps the SizedMatrix).

About the show: it looks correct. I always forget to do import Base.show;
this works for me

import Base.show
show(io::IO, ::Type{Elt{X}}) where X = print(io, "Blub")

Note that you still need to ensure the vector is rooted somewhere, because otherwise Julia’s GC will just clean it up.


Nice! Where can I find this variant? I guessed it was a branch on StaticArrays.jl but was wrong.

Sorry, I think my last post was unclear (and had a typo: I wanted to write mul5, referring to your code, not mol5). For testing your options, I just did

const SR1_grouptable = SizedMatrix{100,100}(rand(1:100, 100,100))
Elt1 = Elt{LinearIndices(SR1_grouptable)=>tuple(SR1_grouptable...)};

like you wrote in your code. But I guess it doesn’t really matter whether normal matrices or sized ones are used, since everything is transformed into the type anyway.