I’m running a v0.5 memory-allocation analysis of the following code, and I cannot understand the allocations occurring on the line with ^ compared with the line using *:
function myTest(n)
    a=zeros(Float64,100)
    for i=1:100
        a[i]=2.5^2.
    end
end

function myTest1(n)
    a=zeros(Float64,100)
    for i=1:100
        a[i]=2.5*2.5
    end
end
myTest(1)
myTest1(1)
Profile.clear_malloc_data()
myTest(1)
myTest1(1)
The memory-allocation output (the .mem file) is as follows:
        - function myTest(n)
        -     a=zeros(Float64,100)
     1440     for i=1:100
     6400         a[i]=2.5^2.
        -     end
        - end

        - function myTest1(n)
        -     a=zeros(Float64,100)
     1440     for i=1:100
        0         a[i]=2.5*2.5
        -     end
        - end

        - myTest(1)
        - myTest1(1)
        - Profile.clear_malloc_data()

        - myTest(1)
        - myTest1(1)


Further information: the output above was generated with inlining disabled (julia --track-allocation=user --inline=no). Without the --inline=no flag there is no allocation reported on the ^ line. So I’m interested to know: which answer is correct?
For some reason the product operation gets optimized into a constant, but here the power doesn’t.
I would note that if you leave out the dot on the 2., so it is just 2.5^2, the two become the same. The optimization seems to work for integer powers but not for float powers.
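A quick way to see the difference is to compare the generated code for the two forms (a sketch; the codegen described in the comments is what I’d expect, worth verifying locally):

```julia
# Integer literal power: the compiler can rewrite x^2 as x*x and inline it.
f_int(x)   = x^2
# Float64 power: a runtime call to the libm pow routine.
f_float(x) = x^2.0

# Compare the generated code:
# @code_llvm f_int(2.5)     # should show a single multiplication
# @code_llvm f_float(2.5)   # should show a call to pow
f_int(2.5), f_float(2.5)    # both give 6.25; only the codegen differs
```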
This is because constants are inlined, and function calls whose values can be determined from type information alone are inlined as well (since their values are then computed at compile time rather than at run time), but function calls whose output depends on runtime input values cannot be inlined. So Julia actually will compile away
a=zeros(Float64,100)
for i=1:100
    a[i]=2.5*2.5
end
to just the value, since that value can be computed during compilation, whereas
for i=1:100
    a[i]=2.5^2.
end
is not computable at compilation time.
So yes, what you’re actually measuring here isn’t * vs ^; rather, it’s compiler optimization of constant values. This is why you should always double-check microbenchmarks to make sure you haven’t made everything constant in a way the compiler can just “solve”. For example, square a random number instead of 2.5.
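For example, a variant of the benchmark where the base is a runtime value (a sketch; myTestRand is a made-up name), so the compiler can no longer fold the power into a constant:

```julia
function myTestRand(n)
    a = zeros(Float64, 100)
    x = 2.5 + rand()       # runtime value: the power cannot be constant-folded
    for i = 1:100
        a[i] = x^2.0       # now actually measures the cost of ^
    end
    return a
end
```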
So, other than the branch (the conditional: in very performance-sensitive code you can sometimes measure a difference due to branch prediction), the code is doing exactly a multiplication when the power is 2.
Thanks. My main concern is the memory allocation. I’m running a more complex calculation with powers, and the memory allocation is strangely large on that line when I expected no allocation at all.
Are any of these arrays? e.g. why are you using .^ rather than ^? “Vectorized” operations on arrays generally allocate new arrays for their results.
It could also be a type instability, as someone mentioned, e.g. maybe you’ve changed the type of fp or Gamma somewhere (or they are nonconst globals, etc.). @code_warntype is useful here to detect whether type inference has succeeded.
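As a sketch of what a type instability looks like (made-up functions; the @code_warntype calls are left commented):

```julia
# The return type of `unstable` depends on the *value* of x, so inference
# can only conclude a union of Int and Float64:
unstable(x) = x < 0.5 ? 1 : 2.0
# `stable` always returns Float64:
stable(x)   = x < 0.5 ? 1.0 : 2.0

# @code_warntype unstable(0.3)   # return type flagged (shown as ANY/Union)
# @code_warntype stable(0.3)     # clean: Float64
```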
Only s.f is an array, which I’m looping over. I’ve run @code_warntype on the full function (see below); it seems to be OK. I’ve also pulled out the boolean selectors and used an if statement for SigmaA and SigmaB, but that didn’t change anything.
Should a line like this have any memory allocations?
Full loop
for i=1:nf
    if s.f[i]<fp
        sigma=SigmaA
    else
        sigma=SigmaB
    end
    s.Sf[i]= g^2*(2.*pi)^-4* s.f[i]^-5* exp(-5/4.*(s.f[i]/fp).^-4)*
             Gamma^(exp(-1 * (((s.f[i]-fp)^2) /
             (2*sigma^2*fp^2)) ))
end
What’s this type? Could you please share its definition? Is there a reason why it’s not parametric? I would assume it would be parameterized in order to be strictly typed, and the typing of its fields might be the issue.
You should . broadcast this as much as possible, like exp.() instead of exp, so that it will fuse (and it will work on v0.6). On v0.5 not all of these will fuse, though, so this is a case where moving to v0.6 might give a performance boost (unless you use things like (^).(x,y) in a bunch of places, which just looks nasty).
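For instance, on v0.6 the whole loop body could be written with dots so everything fuses into a single loop writing directly into the output (a sketch using the thread’s variable names; all parameter values here are made up, and the negative exponents assume the JONSWAP-style formula above):

```julia
# Made-up parameters for illustration:
g, fp, Gamma, SigmaA, SigmaB = 9.81, 0.1, 3.3, 0.07, 0.09
f  = collect(1/30:0.001:0.5)
Sf = similar(f)

sigma = ifelse.(f .< fp, SigmaA, SigmaB)   # replaces the if/else in the loop
# One fused kernel: no temporaries for the intermediate arrays.
Sf .= g^2 .* (2pi)^(-4) .* f.^(-5) .* exp.(-5/4 .* (f ./ fp).^(-4)) .*
      Gamma.^exp.(-1 .* (f .- fp).^2 ./ (2 .* sigma.^2 .* fp^2))
```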
type waveSpec
    f::Array{Float64,1}
    df::Array{Float64,1}
    th::Array{Float64,1}
    dth::Array{Float64,1}
    Sf::Array{Float64,1}
    Sth::Array{Float64,1}
    S::Array{Float64,2}
    function waveSpec()
        t=new()
        t.f=makeWaveSpecFreq()
        t.df=makeWaveSpecDelFreq(t.f)
        return t
    end
end
Constructors
function makeWaveSpecFreq()
    f=[1/x for x=30:-0.05:2]
    return f
end
function makeWaveSpecDelFreq(f::Array{Float64,1})
    df=copy(f)
    nf=length(f)
    df[1]=f[2]-f[1]
    df[2:(nf-1)]=( f[3:nf]-f[2:(nf-1)] )/2 +
                 ( f[2:(nf-1)]-f[1:(nf-2)] )/2
    df[nf]=f[nf]-f[nf-1]
    return df
end
I see. For future reference I would make this a little less strict:
type waveSpec{T<:Number}
    f::Vector{T}
    df::Vector{T}
    th::Vector{T}
    dth::Vector{T}
    Sf::Vector{T}
    Sth::Vector{T}
    S::Matrix{T}
    function waveSpec()
        t=new()
        t.f=makeWaveSpecFreq()
        t.df=makeWaveSpecDelFreq(t.f)
        return t
    end
end
since if you don’t restrict to Float64 everywhere, then your code will naturally work with other numbers (arbitrary precision, etc). That’s why I found it odd there was no type parameter on it when it was holding numbers.
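A small sketch of the benefit (Spec is a made-up stand-in for waveSpec, using the v0.5 type keyword): with a type parameter, the same definition works for other element types without any code change:

```julia
type Spec{T<:Number}        # `mutable struct` from v0.6/v1.0 onwards
    f::Vector{T}
end

Spec([1.0, 2.0])            # Spec{Float64}
Spec([1.0f0, 2.0f0])        # Spec{Float32}
Spec([big(1.0), big(2.0)])  # Spec{BigFloat}: arbitrary precision for free
```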
But okay, this is strictly typed so there is no instability here. It must be the temporary allocations due to broadcasting then.
It sounds like this is not an array operation (the computations of s.Sf[i] are purely scalar), in which case admonitions about broadcast are inapplicable.
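To make the distinction concrete, here is a sketch (made-up arrays) of scalar indexing versus a vectorized operation:

```julia
f = rand(100)

# Scalar path: arithmetic on one element at a time allocates nothing.
function sumsq(f)
    s = 0.0
    for i = 1:length(f)
        s += f[i]^2
    end
    return s
end
sumsq(f)

# Vectorized path: f.^2 allocates a fresh 100-element result array.
g2 = f.^2
```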
Compiler inlining is an optimization that will pretty much always be performed, precisely to prevent this problem. So you should profile with inlining on, because that’s what users will see.
Ok, that is clear. I’m running v0.5, and with inlining turned on it causes memory allocations to pop up on lines that are not correct (this came up in one of my previous posts). So I guess I have to wait until I can get a version 0.6 working, for which I understand this issue has been corrected.