Initialize multiple arrays from a set of names with a comprehension

I’m trying to understand how to initialize multiple arrays from a set of names. To initialize a new vector I would do something like

j = Vector{Float64}(undef,10)
typeof(j)
Vector{Float64} (alias for Array{Float64, 1})

but suppose I want to initialize multiple vectors from a set of names.

entries = ("a", "b","c")

something like

[x = Vector{Float64}(undef,10) for x in entries]

creates the vectors but does not assign them to the variables in entries.

3-element Vector{Vector{Float64}}:
 [0.0, 6.52171459e-315, 5.43316358e-314, 0.0, 1.5e-323, 5.43316358e-314, 2.0e-323, 1.0e-323, 5.43316358e-314, 3.5e-323]
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.52171459e-315]
 [1.0e-323, 5.43316358e-314, 5.0e-323, 3.0e-323, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

but then

typeof(b)
ERROR: UndefVarError: b not defined

likewise

[x => Vector{Float64}(undef,10) for x in entries]

3-element Vector{Pair{String, Vector{Float64}}}:
 "a" => [5.4e-323, 3.0e-323, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 "b" => [2.0e-323, 5.433169509e-314, 2.5e-323, 1.0e-323, 5.433169509e-314, 4.0e-323, 1.0e-323, 5.433169509e-314, 5.4e-323, 3.0e-323]
 "c" => [0.0, 0.0, 0.0, 0.0, 0.0, 6.52171459e-315, 5.433169509e-314, 0.0, 2.0e-323, 5.433169509e-314]

returns the same UndefVarError when I try to use the variables. I know I am missing something elementary here, because I see the same behavior when I try map.

map(x -> Vector{Float64}(undef, 10), entries)

Thanks for your patience! Any pointers would be greatly appreciated!

Awesome, thanks so much, this is really helpful! I’m not sure if it’s because I’m using version 1.8.3, but when I enter

dict = [x => Vector{Float64}(undef, 10) for x in entries]

I don’t get a dictionary but a vector

julia> dict = [x => Vector{Float64}(undef, 10) for x in entries]
3-element Vector{Pair{String, Vector{Float64}}}:
 "a" => [5.3696034695e-314, 0.0, 1.0e-323, 5.3696034695e-314, 1.5e-323, 1.0e-323, 5.3696034695e-314, 3.0e-323, 1.0e-323, 5.3696034695e-314]
 "b" => [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.52171459e-315, 5.3696034695e-314, 0.0]
 "c" => [4.4e-323, 3.0e-323, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

julia> typeof(dict)
Vector{Pair{String, Vector{Float64}}} (alias for Array{Pair{String, Array{Float64, 1}}, 1})

Then when I try to access the entries I get an error

julia> dict["a"]
ERROR: ArgumentError: invalid index: "a" of type String
Stacktrace:
 [1] to_index(i::String)
   @ Base ./indices.jl:300
 [2] to_index(A::Vector{Pair{String, Vector{Float64}}}, i::String)
   @ Base ./indices.jl:277
 [3] to_indices
   @ ./indices.jl:333 [inlined]
 [4] to_indices
   @ ./indices.jl:325 [inlined]
 [5] getindex(A::Vector{Pair{String, Vector{Float64}}}, I::String)
   @ Base ./abstractarray.jl:1241
 [6] top-level scope
   @ REPL[161]:1

But if I try

dict2 =Dict([x => Vector{Float64}(undef, 10) for x in entries])

it seems to work. Also, is there a way to do this without using a dictionary? My ultimate goal is to initialize these vectors inside a function so that I can populate them and then add them to a DataFrame, as in bkamins’ post “How to initialize empty dataframe of specified size”:

julia> function test2()
           nt = (a=Vector{Int}(undef, 10^6),
                 b=Vector{String}(undef, 10^6),
                 c=Vector{Int}(undef, 10^6))
           for i in 1:10^6
               nt.a[i] = 1
               nt.b[i] = "1"
               nt.c[i] = 1.0
           end
           return DataFrame(nt, copycols=false)
       end
test2 (generic function with 1 method)

Will using a dictionary as opposed to just vectors affect performance?

This is not valid syntax for creating a dictionary, and I don’t think it ever has been. Has this code been run on an actual Julia instance, or has it been ‘generated’ somehow?

Remove the [] inside Dict(...) for better performance.

In general, you should never dynamically create variables from strings. Using a dictionary is a much better way, though if you ultimately need this for a Dataframe, there may be other approaches. But don’t dynamically create variables.
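As a minimal sketch of the dictionary approach, assuming the same `entries` tuple as in the question (the names `vectors` and `first_b` are only illustrative), each vector can be reached through its key rather than through a variable named after it:

```julia
entries = ("a", "b", "c")

# One uninitialized vector per name, keyed by the name itself.
# The generator form (no square brackets) avoids building an intermediate array.
vectors = Dict(name => Vector{Float64}(undef, 10) for name in entries)

# Populate and read a vector through its key instead of a variable called `b`.
fill!(vectors["b"], 0.0)
vectors["b"][1] = 3.5

first_b = vectors["b"][1]
```

Here `vectors["b"]` plays the role that the variable `b` was meant to play, and the set of names can change at run time without any code being generated.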

Seems very likely that @devvinish’s post is chatGPT nonsense.

If you know the names of the variables a priori, you can do

(a, b, c) = [Vector{Float64}(undef, 10) for _ in 1:3]

But if you just have entries (and if the elements of entries can be arbitrary), then creating a dictionary is probably the way to go.
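If the goal is a container with named fields rather than a dictionary, one possible alternative (a sketch, not something proposed in this thread) is to build a NamedTuple from the names. The identifiers `colnames`, `columns`, and `nt` below are hypothetical:

```julia
entries = ("a", "b", "c")

# Convert the string names to symbols; broadcasting over a tuple yields a tuple.
colnames = Symbol.(entries)

# One independent uninitialized vector per name.
columns = ntuple(_ -> Vector{Float64}(undef, 10), length(entries))

# Pair the names with the vectors.
nt = NamedTuple{colnames}(columns)

# Fields are accessed like the hand-written named tuple in `test2`.
fill!(nt.a, 1.0)
```

A named tuple of vectors has the same shape as the `nt` in the `test2` function quoted earlier, so it should be usable in the same way there, though that has not been checked against DataFrames here.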

Thanks so much for the help clearing that up, this is super helpful. If you don’t mind a quick follow-up, why does

dict2 =Dict(x => Vector{Float64}(undef, 10) for x in entries)

julia> @benchmark dict3 =Dict(x => Vector{Float64}(undef, 10) for x in entries)
BenchmarkTools.Trial: 10000 samples with 199 evaluations.
 Range (min … max):  440.744 ns …  1.789 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     495.603 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   530.629 ns ± 83.013 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▂▅███▇▇▆▅▄▄▄▃▃▃▃▃▃▃▃▂▂▂▁▁▁▁▁▁▁        ▁▁▁▁▁▁▁▁▁▁ ▁▁       ▃
  ▄▂▆███████████████████████████████▇████▇████████████████▇▇▆▇ █
  441 ns        Histogram: log(frequency) by time       813 ns <

 Memory estimate: 976 bytes, allocs estimate: 8.

work faster than

dict2 =Dict([x => Vector{Float64}(undef, 10) for x in entries])

julia> @benchmark dict3 =Dict([x => Vector{Float64}(undef, 10) for x in entries])
BenchmarkTools.Trial: 10000 samples with 194 evaluations.
 Range (min … max):  511.387 ns …  1.157 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     560.778 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   599.912 ns ± 89.956 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▅▇█▇▇▆▅▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁ ▁ ▁  ▁▁ ▁▁▁▁▁▁▁▁▁ ▁          ▂
  ▅██████████████████████████████████▇████████████████▇██▇▇▇▇▆ █
  511 ns        Histogram: log(frequency) by time       910 ns <

 Memory estimate: 1.06 KiB, allocs estimate: 9.

Is it because the Dict() function doesn’t have to go through an additional layer of comprehension?

awesome thanks! This works great!

It’s because Dict([...]) first creates a vector of pairs, which is then converted to a dictionary. Dict(... for ...) takes a generator and creates the dictionary directly, without the intermediate vector.
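The difference can be seen by building the two arguments separately. This is a small sketch; the names `pairs_vec`, `pairs_gen`, `d_from_vec`, and `d_from_gen` are made up for illustration, and `zeros(2)` is used instead of `undef` so the contents are comparable:

```julia
entries = ("a", "b", "c")

# Comprehension: eagerly allocates a Vector of Pairs.
pairs_vec = [x => zeros(2) for x in entries]

# Generator: a lazy iterator; the pairs are produced only when Dict consumes it.
pairs_gen = (x => zeros(2) for x in entries)

d_from_vec = Dict(pairs_vec)
d_from_gen = Dict(pairs_gen)
```

Both calls give a dictionary with the same keys and values; the bracketed version simply pays for one extra array along the way, which is consistent with the one additional allocation reported in the benchmarks above.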
