Why doesn't Julia throw an error when a pre-declared array is reassigned to a differently sized array?

Another confusing thing about Julia: why does Julia not throw an error if I declare an array as

x::Vector{Float64} = zeros(100);

and then reassign as

x = rand(50);

How can I ensure bound-checking with this kind of freedom in reassignment? Is the memory reference also being reassigned if I instead do

x = rand(100);

This can create so many problems when trying to write rock-solid, memory-leak-proof code. Am I missing something? Thank you so much.

A few things are going on:

  1. You assign to variables, not instances like arrays. This isn’t like the languages where variables are tied to data of a certain type, they’re more like labels; you can assign the same instance to multiple variables, just like putting multiple labels on something.
  2. When you do x::Vector{Float64} on the left side of an assignment, you force every assignment to that variable to semantically convert the right hand value to the annotated type and typeassert to let the compiler assume the type in following code (if the conversion fails, an error stops you from reaching this code anyway). If the compiler can infer the value must already have that type, it will remove these steps in optimization.
  3. The concrete type Vector{Float64} does not distinguish size. [1.0] and [1.2, 3.4] are both instances of this type. If you want a type that does distinguish size and would throw an error upon a mismatched assignment, use MVector from StaticArrays.jl, though that could be less performant at larger sizes. Either way, you would have to assign an instance with a matching size manually, there’s no way around that.
  4. Instead of assignment, you could mutate the vector (via a variable) at its indices. If you don’t push!, append!, or anything else that changes the size, then the size won’t change.
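The points above can be sketched in a few lines. This is only an illustration: the names a, b, v and declared_demo are made up for this example, and the declared variable is placed inside a function so that the sketch does not depend on typed globals being available.

```julia
# Two labels on one array: mutating through either label is visible through both.
a = zeros(3)
b = a
b[1] = 1.0
same_object = a === b      # true: a and b refer to the same instance
a_first = a[1]             # 1.0: the change made through b shows up in a

# Rebinding b attaches the label to a different array; a is untouched.
b = rand(5)
still_three = length(a)    # 3

# A declared variable converts the right-hand side on every assignment.
# (hypothetical helper, only for illustration)
function declared_demo()
    local v::Vector{Float64} = zeros(100)
    v = 1:50               # the range is converted to a Vector{Float64} of length 50
    return v
end

v = declared_demo()
```

Note that the declaration constrains the type of v, not its length, which is why the second assignment succeeds.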

Basically because “reassigning” just gives a name to data and the type of vectors does not know its length.

Well, Julia has a garbage collector, thus there should be no memory leaks. Am I missing something?


Arrays are stored on the heap, so they need to be freed by the garbage collector. Once you run x = rand(100), nothing refers to the old rand(50) array any more, so the garbage collector can reclaim its memory the next time it runs. Some Julia developers explicitly write x = nothing to signal that x is no longer used at that point in the code (so the memory x referred to can be reclaimed on the next collection), but I don’t know if this is really necessary most of the time unless you’re doing distributed computing and have garbage-collection-related bugs. This video might help if you want a review of garbage collection: Three garbage collectors: Java, Python, and Julia.

You can do pointer(x) to see the memory address that x (if it is of an Array type) is using.

This might help you with this topic: JuliaNotes.jl: assignment

Excepting extreme circumstances and bugs, the garbage collector will reclaim instances with no references (variables, fields, elements) left in the code, yes. But memory leaks can still happen when uncontrolled amounts of memory are allocated and kept alive by possibly unwanted references. For example, someone could make a memoization cache for recent results, but they forget to refresh it so it just builds up obsolete entries.
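A minimal sketch of that kind of leak, assuming a hypothetical global cache (RESULT_CACHE and cached_table are invented names, not from any package): every entry stays reachable through the dictionary, so the GC cannot reclaim any of them until the references are dropped.

```julia
# Hypothetical cache that is never pruned: each entry keeps its array alive.
const RESULT_CACHE = Dict{Int, Vector{Float64}}()

function cached_table(n::Int)
    # get! stores the freshly built array under key n if it is not cached yet
    get!(RESULT_CACHE, n) do
        collect(range(0.0, 1.0; length = n))
    end
end

for n in 2:200
    cached_table(n)
end
entries_kept_alive = length(RESULT_CACHE)   # 199 arrays are still referenced

# Dropping the references makes the arrays eligible for collection again.
empty!(RESULT_CACHE)
```

A real cache would normally bound its size or evict old entries instead of being emptied wholesale.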

I ran into this issue in the context of distributed computing. And sorry, I meant if I do the following:

x::Vector{Float64} = zeros(100);
x = rand(100);

is the memory reference being reassigned or memory reallocated? I am wondering what is the internal difference from the memory point of view between running x = rand(100) and x = rand(50) after pre-declaration x::Vector{Float64} = zeros(100). And what can I do to ensure I am not reallocating memory?

Yes, I had some issues with distributed workloads in Julia v1.9.x. I haven’t tried that workload with the latest version of Julia. I was in a rush, so I manually invoked the GC via GC.gc() instead of changing my software architecture or investigating further. I have never had a problem with the garbage collector for non-distributed workloads so far.

@singularity If you run into heap memory management-related bugs, file a bug report. If you really need to get code running without changing the code, try setting the variable you are ready to free to something like nothing (or something that doesn’t allocate heap memory) then call GC.gc() in the following line in your code.
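As a rough sketch of that workaround (sum_chunks is a hypothetical function, and calling GC.gc() in a loop is slow, so this is a stopgap rather than a recommended pattern):

```julia
# Hypothetical example: process data in chunks and release each chunk eagerly.
function sum_chunks(nchunks::Int, chunk_len::Int)
    total = 0.0
    for _ in 1:nchunks
        buf = rand(chunk_len)   # fresh heap allocation in every iteration
        total += sum(buf)
        buf = nothing           # drop the only reference to this chunk
        GC.gc()                 # request a full collection right away
    end
    return total
end

total = sum_chunks(3, 10_000)
```

Reusing one preallocated buffer would avoid both the allocations and the explicit GC calls.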


Yes the reference x is being reassigned. New heap memory was created in the line x=rand(100), you can check it yourself via pointer. E.g.

julia> x::Vector{Float64} = zeros(100);
julia> @show pointer(x)
pointer(x) = Ptr{Float64} @0x00007ffbd0f47840
Ptr{Float64} @0x00007ffbd0f47840
julia> x = rand(100);
julia> @show pointer(x)
pointer(x) = Ptr{Float64} @0x00007ffbd0a3e6c0
Ptr{Float64} @0x00007ffbd0a3e6c0

The theory is that the garbage collector (GC) will deallocate the heap memory used to store the contents of zeros(100) IF nothing else still refers to it. For example:

julia> x = randn(3)

3-element Vector{Float64}:
  0.02116688818771611
 -2.496066399533173
  1.6321445374322279

julia> @show pointer(x)
pointer(x) = Ptr{Float64} @0x00007f555e4a3ec0
Ptr{Float64} @0x00007f555e4a3ec0

julia> y = x
3-element Vector{Float64}:
  0.02116688818771611
 -2.496066399533173
  1.6321445374322279

julia> x = nothing

julia> @show pointer(y)
pointer(y) = Ptr{Float64} @0x00007f555e4a3ec0
Ptr{Float64} @0x00007f555e4a3ec0

julia> @show pointer(x) # throws error
ERROR: MethodError: no method matching pointer(::Nothing)

The GC will not deallocate this array the next time it runs because y still points to the location 0x00007f555e4a3ec0.

Stack-allocated values like SVector from StaticArrays.jl do not need the GC to free them. If you really want a C-like experience without using Array types for some special cases, there is always the C library interface in Julia to allocate and free memory, e.g. Libc.malloc() and Libc.free(). An example use case for coding like C, back in 2022 (whichever Julia version that was), is static compilation, e.g. see Successful Static Compilation of Julia Code for use in Production.
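A small sketch of the manual route, assuming you accept full responsibility for the lifetime of the buffer (the variable names are only illustrative). Here unsafe_wrap with own = false gives an Array view of the malloc'd memory without handing ownership to the GC:

```julia
n = 4

# Allocate room for n Float64 values outside the GC's control.
p = Ptr{Float64}(Libc.malloc(n * sizeof(Float64)))

# View the raw memory as a Vector; own = false means Julia will not free it.
buf = unsafe_wrap(Array, p, n; own = false)
fill!(buf, 1.5)
total = sum(buf)    # 6.0

# Manual free: buf must not be used after this line.
Libc.free(p)
```

Forgetting the Libc.free call here is exactly the kind of leak the GC normally prevents.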


Why is this important in your application? Would it really be a problem if you allocated a new array, and then the old array was garbage collected?

If it is important for some reason, then you must avoid creating any new arrays, and only work with the x you already have. This may be quite tricky for someone with little experience. In this case it’s simply

for i in eachindex(x)
    x[i] = rand()
end

@tomerarnon, I wrote a fully parallel physics code in Julia a year ago. But it was full of memory leaks and very slow, causing computers to crash. I didn’t notice any problems myself right away because I have a large desktop and access to a big supercomputer, but other people couldn’t run the code on their personal computers. I assumed Julia was taking care of my naive array assignments like Fortran (which is what I used to code in a long time ago). Now I am learning the hard way about in-place functions, views, and .= or [:] notation to avoid re-allocating memory. The performance gain is quite significant, as garbage collection was slowing everything down earlier. In the context of this post, I just wanted to check whether array re-assignments are causing memory leaks or not.
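For readers coming from the same background, here is a rough sketch of those techniques. The function scale_add! and all variable names are hypothetical, chosen only for this illustration.

```julia
# In-place function: the fused broadcast writes directly into `out`,
# so no temporary array is created for the right-hand side.
function scale_add!(out, a, x, y)
    out .= a .* x .+ y
    return out
end

x = collect(1.0:4.0)
y = ones(4)
out = similar(x)            # allocate the output buffer once, reuse it afterwards
scale_add!(out, 2.0, x, y)  # out is now [3.0, 5.0, 7.0, 9.0]

# A view refers to part of x without copying; writing to it modifies x itself.
head = view(x, 1:2)
head .= 0.0                 # x is now [0.0, 0.0, 3.0, 4.0]
```

By contrast, x[1:2] on the right-hand side of an expression would make a copy of those elements.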

This will not work. Here you allocate a new array on the right-hand-side, and then afterwards copy it into x.

Instead do

using Random  # randn! lives in the Random standard library
randn!(x)

or

x .= randn.()

The OP is on the wrong track: allocation has nothing to do with assignment. The allocation happens on the right-hand side, when you call zeros(50) or randn(50). Which variable you assign to is irrelevant.


Thanks, I didn’t know that. I’ll edit my previous post.

If you want to put dimensions in the type domain, you can use SizedArray or SizedVector:

julia> using StaticArrays

julia> x::SizedVector{100, Float64, Vector{Float64}} = zeros(100);

julia> x = rand(50)
ERROR: DimensionMismatch: expected input array of length 100, got length 50
Stacktrace:
 [1] dimension_mismatch_fail(::Type{SizedVector{100, Float64, Vector{Float64}}}, a::Vector{Float64})
   @ StaticArrays ~/.julia/packages/StaticArrays/85pEu/src/convert.jl:196
 [2] convert(::Type{SizedVector{100, Float64, Vector{Float64}}}, a::Vector{Float64})
   @ StaticArrays ~/.julia/packages/StaticArrays/85pEu/src/convert.jl:201
 [3] top-level scope
   @ REPL[18]:1

julia> x = rand(100)
100-element Vector{Float64}:
 0.20602787252356625
 0.5326407584784596
 0.6315752664438412
...

Regarding allocations, you probably want functions that end with a ! to reuse memory that you have already allocated.

julia> using Random

julia> rand!(x)
100-element SizedVector{100, Float64, Vector{Float64}} with indices SOneTo(100):
 0.6011787356901267
 0.7390623881486247
 0.9652497761882068
...

In the context of Julia, it is quite hard to leak memory. The garbage collector keeps track of memory allocations and can still reclaim them. To leak memory, you would need to manually allocate memory via Libc.malloc or call out to C or Fortran.

Perhaps what you mean is if you are needlessly allocating memory?

As the @time and @allocated macros show, the calls to rand allocate new memory.

julia> @time x = rand(100);
  0.000008 seconds (3 allocations: 1.766 KiB)

julia> @allocated x = rand(100)
1808

Rather, if you want to reuse already allocated memory, use rand! as I suggested above.

julia> @time rand!(x);
  0.000005 seconds

julia> @allocated rand!(x)
0

Also compare the following approaches to zeroing out memory.

julia> @time x = zeros(100);
  0.000007 seconds (3 allocations: 1.766 KiB)

julia> @time x .= 0;
  0.000004 seconds

julia> @time fill!(x, 0);
  0.000004 seconds

@singularity If you need to swap the “references” x and y without heap allocation, you could try x, y = y, x and other variants. It could be useful for swapping (instead of explicitly overwriting) bookkeeping buffers inside a loop.

julia> x = randn(1)
1-element Vector{Float64}:
 -0.8474211782430282

julia> pointer(x)
Ptr{Float64} @0x00007f506d328900

julia> y = randn(1)
1-element Vector{Float64}:
 -1.042655641355737

julia> pointer(y)
Ptr{Float64} @0x00007f506ca2f500

julia> @allocated x, y = y, x # @allocated shows the heap allocation amount for this line of code, and also evaluates that line of code.
0

julia> pointer(x), pointer(y)
(Ptr{Float64} @0x00007f506ca2f500, Ptr{Float64} @0x00007f506d328900)

julia> x, y
([-1.042655641355737], [-0.8474211782430282])
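As a sketch of the buffer-swapping idea inside a loop (smooth! is a hypothetical function made up for this illustration): two preallocated arrays alternate between "current" and "next" roles, and the swap only exchanges labels.

```julia
# Hypothetical double-buffering example: a 3-point moving average applied repeatedly.
function smooth!(cur, nxt, nsteps)
    for _ in 1:nsteps
        nxt[1] = cur[1]         # keep the end points fixed
        nxt[end] = cur[end]
        for i in 2:length(cur)-1
            nxt[i] = (cur[i-1] + cur[i] + cur[i+1]) / 3
        end
        cur, nxt = nxt, cur     # swap the labels; no data is copied
    end
    return cur                  # holds the most recent result
end

u = [0.0, 3.0, 0.0]
scratch = similar(u)
result = smooth!(u, scratch, 1)  # [0.0, 1.0, 0.0]
```

Which of the two buffers ends up holding the result depends on the number of steps, so use the returned array rather than assuming it is u.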

Thanks, was simplifying a bit here. You can surely run into memory leaks by holding references too long – have been there in Haskell. Was just assuming that the OP comes from Fortran, C++ etc and might be worrying about memory too much – after my first language with a GC, I never looked back and just wish I had a GC for my home as well :wink:

This is a nice convenient syntax, that I like to use, but under the hood it’s no different from

temp = x
x = y
y = temp

As mentioned, assignment is free; it is object creation, like zeros(100), that causes allocations.

Not necessarily. There is a potential implicit conversion if one is not careful with type assertions and binding.

julia> a::Vector{Float64} = zeros(100);

# This converts a `UnitRange` into a `Vector{Float64}`, allocating memory
julia> @time a = 1:100
  0.000009 seconds (1 allocation: 896 bytes)
1:100

# This stores the values 1.0 to 100.0 into an existing allocation
julia> @time a .= 1:100;
  0.000004 seconds