Memory allocation usage high with power (^) compared with x*x


#1

I’m running v0.5 memory allocation analysis of the following code and cannot understand the allocations occurring on the line with ^ compared with using *

function myTest(n)
  a=zeros(Float64,100)
  for i=1:100
    a[i]=2.5^2.
  end
end

function myTest1(n)
  a=zeros(Float64,100)
  for i=1:100
    a[i]=2.5*2.5
  end
end

myTest(1)
myTest1(1)
Profile.clear_malloc_data()

myTest(1)
myTest1(1)

Memory allocation output as follows:

    - function myTest(n)
    -   a=zeros(Float64,100)
 1440   for i=1:100
 6400     a[i]=2.5^2.
    -   end
    - end
    - 
    - function myTest1(n)
    -   a=zeros(Float64,100)
 1440   for i=1:100
    0     a[i]=2.5*2.5 
    -   end
    - end
    - 
    - myTest(1)
    - myTest1(1)
    - Profile.clear_malloc_data()
    - 
    - myTest(1)
    - myTest1(1)
    - 
    -

#2

Further information. Output given from memory allocation was with --inline=no. When not including the inline flag there is no allocation on the ^ line. So interested to know what is the correct answer?

Using v0.5.


#3

I looked at the @code_llvm, and in the first function I see

%17 = call double @"julia_^_70720"(double 2.500000e+00, double 2.000000e+00)

and in the second

store double 6.250000e+00, double* %23, align 8

For some reason the product operation gets optimized into a constant number, but the power doesn’t here.

I would note if you left out the dot on the 2., so it was just 2.5^2, then they become the same. The optimization seems to work for integer powers but not float powers.


#4

This is due to the fact that constants inline, and function calls whose values can be determined by type information will inline the values as well (since then the values are determined not at runtime, but at compile time), but function calls whose output depends on the input values cannot inline. So Julia actually will compile away:

a=zeros(Float64,100)
for i=1:100
  a[i]=2.5*2.5 
end

to be the value since that value can be computed during compilation, whereas

for i=1:100
  a[i]=2.5^2.
end

is not computable at compilation time.

So yes, what you’re actually measuring here isn’t * vs ^, rather it’s compiler optimizations on constant values. This is why you should always double check microbenchmarks to make sure you don’t have everything constant in a way the compiler can just “solve” it. For example, square using a random number instead of 2.5.

As an extra little note, ^ actually has a branch for low powers to do iteration. Here’s the source for it: https://github.com/JuliaLang/julia/blob/master/base/intfuncs.jl#L159

So other than the branch (the conditional: in very very performant code you can sometimes measure a difference due to branch predictions), the code exactly just multiplying when the power is 2.


#5

Thanks. My main concern is the memory allocation. I’m running more complex calculation with powers and the memory allocation is strangely large on the line - when I expected no memory allocation.


#6

That sounds like a type-instability.


#7

Heres the equation

 s.Sf[i]= g^2*(2.*pi)^-4* s.f[i]^-5* exp(-5/4.*(s.f[i]/fp).^-4)*
          Gamma^(exp(-1 * (((s.f[i]-fp)^2) /
          (2*(SigmaA*(s.f[i]<fp) + SigmaB*(s.f[i]>=fp))^2*fp^2)) ))

What do you think?


#9
s.Sf[i]= g^2*(2.*pi)^-4* s.f[i]^-5* exp(-5/4.*(s.f[i]/fp).^-4)*
          Gamma^(exp(-1 * (((s.f[i]-fp)^2) /
          (2*(SigmaA*(s.f[i]<fp) + SigmaB*(s.f[i]>=fp))^2*fp^2)) ))

Are any of these arrays? e.g. why are you using .^ rather than ^? “Vectorized” operations on arrays generally allocate new arrays for their results.

It could also be a type instability, as someone mentioned, e.g. maybe you’ve changed the type of fp or Gamma somewhere (or they are non-const globals, etc.). @code_warntype is useful here to detect whether type inference has succeeded.


#10

Only s.f is any array which I’m looping over. I’ve run code_warntype - see below on the full function. It seems to be ok. I’ve also pulled out the boolean selectors and used an if statement on the SigmaA and SigmaB, but that didn’t change anything.

Should a line like this have any memory allocations?

Full loop

 for i=1:nf
    if s.f[i]<fp
      sigma=SigmaA
    else
      sigma=SigmaB
    end
    s.Sf[i]= g^2*(2.*pi)^-4* s.f[i]^-5* exp(-5/4.*(s.f[i]/fp).^-4)*
          Gamma^(exp(-1 * (((s.f[i]-fp)^2) /
          (2*sigma^2*fp^2)) ))
  end
Variables:
  #self#::MetoceanTools.#makeSpecJONSWAP
  inParm::Array{Float64,1}
  Hs::Float64
  Tp::Float64
  Gamma::Float64
  SigmaA::Float64
  SigmaB::Float64
  #temp#@_8::Int64
  s::MetoceanTools.waveSpec
  g::Float64
  fp::Float64
  nf::Int64
  sumS::Float64
  alpha::Float64
  #temp#@_15::Int64
  i@_16::Int64
  #temp#@_17::Int64
  i@_18::Int64
  #temp#@_19::Int64
  sigma::Float64
  i@_21::Int64

.mem output of loop

        0   nf=length(s.f)
266831936   s.Sf=zeros(nf)
        0   for i=1:nf
        0     if s.f[i]<fp
        0       sigma=SigmaA
        -     else
        0       sigma=SigmaB
        -     end
2050585152     s.Sf[i]= g^2*(2.*pi)^-4* s.f[i]^-5* exp(-5/4.*(s.f[i]/fp).^-4)*
        -           Gamma^(exp(-1 * (((s.f[i]-fp)^2) /
        -         (2*sigma^2*fp^2)) ))
        -   end

NB. All happening in functions - not in global scope


#11

What’s this type? Could you please share its definition? Is there a reason why it’s not parameteric? I would assume it would be parameterized in order to be strictly typed, and the typing of its fields might be the issue.


#12

You should . broadcast this as much as possible, like exp.() instead of exp so that way it will fuse (and it will work on v0.6). On v0.5, not all of these will fuse though, so this is a case where moving to v0.6 might give a performance boost (unless you use things like (^).(x,y) a bunch of places, which just looks nasty)


#13

Type defintion

type waveSpec
 f::Array{Float64,1}
 df::Array{Float64,1}
 th::Array{Float64,1}
 dth::Array{Float64,1}
 Sf::Array{Float64,1}
 Sth::Array{Float64,1}
 S::Array{Float64,2}
 function waveSpec()
   t=new()
   t.f=makeWaveSpecFreq()
   t.df=makeWaveSpecDelFreq(t.f)
   return t
 end
end

Constructors

function makeWaveSpecFreq()
  f=[1/x for x=30:-0.05:2]
  return f
end
function makeWaveSpecDelFreq(f::Array{Float64,1})
  df=copy(f)
  nf=length(f)
  df[1]=f[2]-f[1]
  df[2:(nf-1)]=( f[3:nf]-f[2:(nf-1)]     )/2 +
  ( f[2:(nf-1)]-f[1:(nf-2)] )/2
  df[nf]=f[nf]-f[nf-1]
  return df
end

#14

I see. For future reference I would make this a little less strict:

type waveSpec{T<:Number}
 f::Vector{T}
 df::Vector{T}
 th::Vector{T}
 dth::Vector{T}
 Sf::Vector{T}
 Sth::Vector{T}
 S::Matrix{T}
 function waveSpec()
   t=new()
   t.f=makeWaveSpecFreq()
   t.df=makeWaveSpecDelFreq(t.f)
   return t
 end
end

since if you don’t restrict to Float64 everywhere, then your code will naturally work with other numbers (arbitrary precision, etc). That’s why I found it odd there was no type parameter on it when it was holding numbers.

But okay, this is strictly typed so there is no instability here. It must be the temporary allocations due to broadcasting then.


#15

It sounds like this is not an array operation (the computations of s.Sf[i] are purely scalar), in which case admonitions about broadcast are inapplicable.


#17

^ between scalars does not cause any allocations in normal use. It happens to allocate when you turn off inlining but why do you worry about that?


#18

Oh, didn’t even realize this was part of it.

Compiler inlining is an optimization which will pretty much always be done to stop this exact problem. So you should profile with inlining on because that’s what users will see.


#19

Ok. That is clear. I’m running with version v0.5 and with inlining turned on its causes memory allocations to pop up on lines that are not correct (this came up in one of my previous posts). So I guess I have to wait until I can get a version 0.6 working for which I understand this issue has been corrected.