What does Ref on one hand and "aliasing a vector" on the other mean?

Hi there,

I started following the first IJulia notebook, constructors, of Kamiński’s nice tutorial on dataframes and found that, right after the construction of an example DataFrame:

x = Dict("A" => [1,2], "B" => [true, false], "C" => ['a', 'b'], "fixed" => Ref([1,1]))
DataFrame(x)

he states: “This time we used Ref to protect a vector from being treated as a column and forcing broadcasting it into every row of :fixed column (note that the [1,1] vector is aliased in each row)”. I have two questions (maybe related), both about the use of "fixed" => Ref([1,1]) as the last key => value pair of the dictionary, as opposed to fixed=1 in a previous example:
(1) What role exactly does Ref play here? I get the gist from the explanation quoted above, but I would like a bit more detail on what happens under the hood, perhaps in terms of memory allocation/addresses, if that makes sense at all.
(2) I do not understand at all what the phrase “the [1,1] vector is aliased in each row” means. I guess this is again related to memory representation, but could someone make it a bit clearer? Perhaps by contrasting it with code where this so-called aliasing does not happen, and showing what concrete consequences follow.
Thanks

I guess, the answer to (1) is that the resulting DataFrame is

 A  B      C    fixed
 1  true   'a'  [1, 1]
 2  false  'b'  [1, 1]

and without Ref it’d be

 A  B      C    fixed
 1  true   'a'  1
 2  false  'b'  1

“Aliasing” of arrays (or any mutable objects, really) just means having different names (aliases) to refer to the same actual object in memory. I.e., if you mutate the array in any row in the fixed column, you’ll see the change everywhere in the column.

Compare to the following:

julia> aliased = fill([], 3)
3-element Vector{Vector{Any}}:
 []
 []
 []
 
julia> push!(aliased[1], :foo)
1-element Vector{Any}:
 :foo

julia> aliased
3-element Vector{Vector{Any}}:
 [:foo]
 [:foo]
 [:foo]

julia> nonaliased = [[] for _ in 1:3]
3-element Vector{Vector{Any}}:
 []
 []
 []
 
julia> push!(nonaliased[1], :foo)
1-element Vector{Any}:
 :foo

julia> nonaliased
3-element Vector{Vector{Any}}:
 [:foo]
 []
 []
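
To check directly whether two entries are aliases rather than equal copies, one can compare them with `===`. The following is a minimal sketch in Base Julia only; the variable names are made up for illustration and it does not involve DataFrames at all:

```julia
# Illustrative names; Base Julia only.
v = [1, 1]
aliased = fill(v, 3)                 # three references to the same vector
copied  = [copy(v) for _ in 1:3]     # three independent vectors with equal contents

same_object = aliased[1] === aliased[2]   # identity check on the aliased entries
distinct    = copied[1] === copied[2]     # identity check on the copied entries

aliased[1][1] = 99   # mutate through one alias
# v and every entry of `aliased` see the change; `copied` does not
```

Here `same_object` is true and `distinct` is false, and after the assignment `v` itself has changed while every entry of `copied` is still `[1, 1]`.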

The help entry of Ref is quite complete, maybe it answers some of your doubts:

help?> Ref
search: Ref WeakRef prevfloat UndefRefError GlobalRef uppercasefirst lowercasefirst ProcessFailedException searchsortedfirst

  Ref{T}

  An object that safely references data of type T. This type is guaranteed to point to valid, Julia-allocated memory of the correct
  type. The underlying data is protected from freeing by the garbage collector as long as the Ref itself is referenced.

  In Julia, Ref objects are dereferenced (loaded or stored) with [].

  Creation of a Ref to a value x of type T is usually written Ref(x). Additionally, for creating interior pointers to containers
  (such as Array or Ptr), it can be written Ref(a, i) for creating a reference to the i-th element of a.

  Ref{T}() creates a reference to a value of type T without initialization. For a bitstype T, the value will be whatever currently
  resides in the memory allocated. For a non-bitstype T, the reference will be undefined and attempting to dereference it will result
  in an error, "UndefRefError: access to undefined reference".

  To check if a Ref is an undefined reference, use isassigned(ref::RefValue). For example, isassigned(Ref{T}()) is false if T is not
  a bitstype. If T is a bitstype, isassigned(Ref{T}()) will always be true.

  When passed as a ccall argument (either as a Ptr or Ref type), a Ref object will be converted to a native pointer to the data it
  references. For most T, or when converted to a Ptr{Cvoid}, this is a pointer to the object data. When T is an isbits type, this
  value may be safely mutated, otherwise mutation is strictly undefined behavior.

  As a special case, setting T = Any will instead cause the creation of a pointer to the reference itself when converted to a
  Ptr{Any} (a jl_value_t const* const* if T is immutable, else a jl_value_t *const *). When converted to a Ptr{Cvoid}, it will still
  return a pointer to the data region as for any other T.

  A C_NULL instance of Ptr can be passed to a ccall Ref argument to initialize it.

  Use in broadcasting
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  Ref is sometimes used in broadcasting in order to treat the referenced values as a scalar.

...
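
As a small sketch of the parts of that help entry that matter for the question (dereferencing with `[]`, and the fact that a Ref holds a reference rather than a copy), assuming only Base Julia and made-up variable names:

```julia
v = [1, 1]
r = Ref(v)                # r refers to v; no copy is made
holds_same = r[] === v    # dereference with [] and compare identity

push!(r[], 2)             # mutating through the Ref mutates v itself

n = Ref(0)                # a Ref can also act as a mutable box for a value
n[] = 5                   # store through the Ref

empty_ref = Ref{Vector{Int}}()       # no value assigned yet (non-bitstype)
assigned  = isassigned(empty_ref)    # checks whether a value has been assigned
```

After this runs, `v` has three elements, which is the same aliasing effect described for the :fixed column.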

I think the most important thing here is that, as far as broadcasting goes, Ref([1,1]) is in no way special: you could obtain the same effect in a broadcast using ([1, 1],) (i.e., wrapping it in a single-element tuple). What you are doing is this: instead of giving a vector of two elements that would be interpreted as a column, you are giving a container that is treated as a scalar, so instead of being iterated so that each row receives one element, its whole content is assigned to every row. Numbers actually have the same property as single-element Tuples and Refs: each is a scalar container of its own value:

julia> for number in 100; println(number); end
100

julia> for number in Ref(100); println(number); end
100

julia> for number in (100,); println(number); end
100
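
The equivalence can be checked in a plain broadcast. This is a sketch using only Base Julia, with illustrative variable names, and it says nothing about how DataFrames itself treats tuples:

```julia
x = [1, 2]
y = [1, 2]

via_ref    = x .* Ref(y)   # y is treated as a single scalar-like value
via_tuple  = x .* (y,)     # a one-element tuple has the same effect
via_number = x .* 10       # a plain number is already scalar-like
```

Both `via_ref` and `via_tuple` come out as `[[1, 2], [2, 4]]`, while `via_number` is `[10, 20]`.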

That is true, but its various uses (broadcast prevention, single-element container, ccall interface) are almost totally unrelated or at best weakly related from a user perspective, so it can be kind of overwhelming.


I had already looked through it, but, as @Tamas_Papp pointed out, it is somewhat intimidating… I am, however, digesting it little by little and coming back to it as needed.


Don’t feel bad about this, it is an artifact of some planned syntax that has not happened as planned (yet). Cf

I mentioned that because you said you wanted to understand what is going on “under the hood”.

Anyway, I am not sure if that is clear already, but the idea is to pass to a broadcasting function something that could be broadcast over, but should not be. For example:

julia> x = [1,2]
2-element Vector{Int64}:
 1
 2

julia> y = [1,2]
2-element Vector{Int64}:
 1
 2

julia> x .* y  # multiply element-by-element
2-element Vector{Int64}:
 1
 4

julia> x .* Ref(y)  # multiply each element of x by vector y
2-element Vector{Vector{Int64}}:
 [1, 2]
 [2, 4]
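
Another common use of the same pattern is a membership test against a whole collection. The sketch below uses Base Julia only, with made-up names, and also shows that the protected object is passed along as-is, so the results alias it:

```julia
allowed    = [1, 2, 3]
candidates = [3, 5]

# Test each candidate against the whole `allowed` vector
hits = in.(candidates, Ref(allowed))

shared = [0]
tagged = tuple.(1:3, Ref(shared))    # each tuple carries the very same vector
all_same = all(t -> t[2] === shared, tagged)
```

Here `hits` is `[true, false]`, and `all_same` is true because broadcasting hands the same `shared` object to every call instead of copying it.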