@allocated in REPL: different results, same code

function ha(x)
  return (x, x)
end

function hs(x::StridedMatrix{Float64})
  return (x, x)
end

In a global scope and/or REPL:

x = eye(1);

@show @allocated ha(x)
@show @allocated ha(x)
@show @allocated hs(x)
@show @allocated hs(x)

produces

#= /Users/goretkin/tmp.jl:11 =# @allocated(ha(x)) = 0
#= /Users/goretkin/tmp.jl:12 =# @allocated(ha(x)) = 0
#= /Users/goretkin/tmp.jl:13 =# @allocated(hs(x)) = 9373
#= /Users/goretkin/tmp.jl:14 =# @allocated(hs(x)) = 32

(This is on master, 0.7.0, but basically the same holds for 0.6.0, though @show applied to a macro call prints differently there, I guess.)

code_native and code_llvm for ha(x) and hs(x) look identical, but the reported allocation is different.

If you wrap the code in a function, then all four @allocated calls report zero:

function doit()
  x = eye(1);
  @show(@allocated ha(x))
  @show(@allocated ha(x))
  @show(@allocated hs(x))
  @show(@allocated hs(x))
end

doit()

So I guess the difference is whether the argument x is a local or a global variable. Accessing x when it is global must be causing some allocation (since the code_llvm/code_native output is the same), though I don’t understand why it only affects the more specific method.

Second, I don’t understand why running doit before ha and hs are defined, and then running it again once they are defined, produces different results. Possibly ha and hs don’t get inlined?

function doit()
  x = eye(1);
  @show(@allocated ha(x))
  @show(@allocated ha(x))
  @show(@allocated hs(x))
  @show(@allocated hs(x))
end

try
  doit()
catch E
  @show E
end

function ha(x)
  return (x, x)
end

function hs(x::StridedMatrix{Float64})
  return (x, x)
end

doit()

(Came across this while investigating allocations of potrf, expecting it to be 0 in 0.7.0 (which it is). See: Storage allocation in LAPACK.potrf! - #3 by dmbates)

Don’t do it in global scope. None of these are measuring the allocation of the function.

No, accessing a global does not cause any allocation, but its type can’t be inferred.

Because a valid but inefficient version of doit is compiled.
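To make the "don't do it in global scope" advice concrete, here is a rough sketch of measuring from inside a function instead. It assumes a recent Julia, so `ones(1, 1)` stands in for `eye(1)`, which was later removed from Base. The helper name `measure` is made up for illustration, and the exact numbers printed may vary by Julia version.

```julia
ha(x) = (x, x)
hs(x::StridedMatrix{Float64}) = (x, x)

# Hypothetical helper: `f` and `x` are local arguments here, so their
# concrete types are known when `measure` is compiled.
function measure(f, x)
    f(x)                      # warm-up call, so one-time compilation cost is not counted
    return @allocated f(x)    # bytes allocated by the second call
end

x = ones(1, 1)                # 1×1 Matrix{Float64}
@show measure(ha, x)
@show measure(hs, x)
```

The point of the wrapper is only that the measured call sees a local, typed `x`, the same situation as inside `doit`.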

Where does the allocation come from, then? And why is it different between the two methods?

Thanks!

The function itself in this case.

Because “the type can’t be inferred”.
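One way to see the inference difference for yourself, as a sketch only: compare a non-const global with a const one. This assumes Julia 0.7 or later (for `InteractiveUtils`); the names `x_global`, `x_const`, `call_global` and `call_const` are made up for illustration.

```julia
using InteractiveUtils   # provides @code_warntype outside the REPL

hs(x::StridedMatrix{Float64}) = (x, x)

x_global = ones(1, 1)          # non-const global: its type could change at any time
const x_const = ones(1, 1)     # const global: its type is fixed

# Hypothetical wrappers that read each global from inside a function.
call_global() = hs(x_global)
call_const()  = hs(x_const)

# Print the inferred types for each wrapper and compare them.
@code_warntype call_global()
@code_warntype call_const()
```

One would expect the const version to show concrete types throughout, while the non-const version shows the global as not concretely typed, which is what forces the call to be resolved at run time.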