How to create an empty dataframe from a vector of row names and column names

Hi,

I would like to create a large empty DataFrame of Float64 values from a vector of row names and a vector of column names. For example:

rn=["row1", "row2", "row3"]
cn=["col1", "col2", "col3"]

Thanks !

It is not quite clear to me how a DataFrame with many columns, where the first column contains many row names, can be described as "empty".

Do you want this?

julia> df = DataFrame()
0×0 DataFrame

julia> df[:rowname] = rn
3-element Array{String,1}:
 "row1"
 "row2"
 "row3"

julia> for c in cn
           df[Symbol(c)] = 0.0
       end

julia> df
3×4 DataFrame
│ Row │ rowname │ col1    │ col2    │ col3    │
│     │ String  │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ row1    │ 0.0     │ 0.0     │ 0.0     │
│ 2   │ row2    │ 0.0     │ 0.0     │ 0.0     │
│ 3   │ row3    │ 0.0     │ 0.0     │ 0.0     │
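
For readers on a more recent DataFrames.jl, here is a minimal sketch of the same idea. It assumes a version that supports the `df[!, col]` indexing form; the old `df[:col] = value` form shown above may no longer work there. The variable names are the ones from the question.

```julia
using DataFrames

rn = ["row1", "row2", "row3"]
cn = ["col1", "col2", "col3"]

# Start from the row-name column, then add one zero-filled Float64 column per name.
df = DataFrame(rowname = rn)
for c in cn
    df[!, Symbol(c)] .= 0.0   # broadcasting a scalar creates the column if it is missing
end
```

The loop keeps the column order of `cn`, and every new column takes its length from the existing `rowname` column.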

Thanks @johann.spies, your solution is very good!

In your last example, why is the call to Symbol necessary? I'm sure it is; I just don't understand why.

DataFrames don't index columns by strings, but rather by Symbols. There's no super deep reason why it is that way, other than that Symbols are more lightweight than strings and have some nice features.

Using a Symbol, for example, makes it easier for df.a to refer to column :a.


Most importantly, symbols are faster to look up than strings since they are interned.

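
To make the Symbol discussion concrete, here is a small sketch in plain Julia (no packages needed). It only illustrates the conversion and the interning property mentioned above; the variable names `syms` and `same` are made up for the example.

```julia
cn = ["col1", "col2", "col3"]

# Broadcast the conversion over the whole vector of names.
syms = Symbol.(cn)

# Symbols are interned: two Symbols with the same name are the very same object,
# so comparing them with === is an identity check rather than a character-by-character one.
same = Symbol("col1") === :col1
```

As far as I know, recent DataFrames.jl versions also accept strings as column names, so the explicit conversion may be optional there.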

Got it. Thanks guys.

An alternative way to do it is:

julia> df = DataFrame(fill(Float64, length(cn)), Symbol.(cn), length(rn))
3×3 DataFrame
│ Row │ col1         │ col2         │ col3         │
│     │ Float64      │ Float64      │ Float64      │
├─────┼──────────────┼──────────────┼──────────────┤
│ 1   │ 1.0735e-313  │ 1.07357e-313 │ 1.0735e-313  │
│ 2   │ 6.61729e-316 │ 7.6592e-316  │ 3.53922e-316 │
│ 3   │ 7.6592e-316  │ 7.6592e-316  │ 3.53922e-316 │

julia> df.rowname = rn
3-element Array{String,1}:
 "row1"
 "row2"
 "row3"

julia> df
3×4 DataFrame
│ Row │ col1         │ col2         │ col3         │ rowname │
│     │ Float64      │ Float64      │ Float64      │ String  │
├─────┼──────────────┼──────────────┼──────────────┼─────────┤
│ 1   │ 1.0735e-313  │ 1.07357e-313 │ 1.0735e-313  │ row1    │
│ 2   │ 6.61729e-316 │ 7.6592e-316  │ 3.53922e-316 │ row2    │
│ 3   │ 7.6592e-316  │ 7.6592e-316  │ 3.53922e-316 │ row3    │

A side note on the other solution: the df[:x] = 0.0 syntax will soon be discouraged, and the recommended form will become df[:x] .= 0.0 (using broadcasting).

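
Note that the constructor above leaves the Float64 cells uninitialized, which is why the table shows arbitrary tiny numbers. If zeros are preferred, one possible sketch is to build the frame from a zero matrix. This assumes a DataFrames.jl version that provides the matrix-plus-names constructor and `insertcols!` with a position argument; I have not checked it against the version used in this thread.

```julia
using DataFrames

rn = ["row1", "row2", "row3"]
cn = ["col1", "col2", "col3"]

# zeros(...) gives a Float64 matrix; each matrix column becomes a DataFrame column.
df = DataFrame(zeros(length(rn), length(cn)), Symbol.(cn))

# Put the row names in front as the first column.
insertcols!(df, 1, :rowname => rn)
```

Building from a matrix avoids the explicit loop over column names, at the cost of allocating the matrix first.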

Thank you @bkamins, your solution is great too!

I have a question: what is the best way to change some values in the DataFrame using the column names and the row names? I tried something like:
df[[:rowname == "row1"], :col3] = 5

I assume the row name is unique. In that case, what you should do is:

df[findfirst(=="row1", df.rowname), :col3] = 5

An alternative way to write it is, e.g.:

df.col3[df.rowname .== "row1"] .= 5

This will also work if the row names are not unique, since it updates every matching row.

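
Here is a small self-contained sketch of both approaches side by side. The toy frame, with a deliberately repeated "row2", is made up for illustration, and the `nothing` check is an extra precaution of mine rather than something from the answer above.

```julia
using DataFrames

df = DataFrame(rowname = ["row1", "row2", "row2"], col3 = zeros(3))

# Unique row name: locate the single row, then assign one cell.
# findfirst returns nothing when there is no match, so guard against that.
i = findfirst(==("row1"), df.rowname)
i === nothing || (df[i, :col3] = 5)

# Possibly repeated row name: a Boolean mask updates every matching row.
df.col3[df.rowname .== "row2"] .= 7
```

The first form touches exactly one cell; the second is the safer choice when duplicates are possible.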

@bkamins very interesting ! :wink: thanks !

It is also possible to change a whole column at once :nerd_face:

julia> df.col3 .= [1,2,3]
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

julia> df
3×3 DataFrame
│ Row │ col1         │ col2         │ col3    │
│     │ Float64      │ Float64      │ Float64 │
├─────┼──────────────┼──────────────┼─────────┤
│ 1   │ 9.88131e-324 │ 6.92857e-310 │ 1.0     │
│ 2   │ 4.94066e-324 │ 6.92857e-310 │ 2.0     │
│ 3   │ 6.92859e-310 │ 6.92857e-310 │ 3.0     │