Hi,
I would like to create a large empty dataframe of type Floats from a vector of row names and column names. For example
rn=["row1", "row2", "row3"]
cn=["col1", "col2", "col3"]
Thanks !
It is not quite clear to me how a DataFrame with many columns, whose first column contains many row names, can be described as "empty".
Do you want this?
julia> df = DataFrame()
0×0 DataFrame
julia> df[:rowname] = rn
3-element Array{String,1}:
"row1"
"row2"
"row3"
julia> for c in cn
df[Symbol(c)] = 0.0
end
julia> df
3×4 DataFrame
│ Row │ rowname │ col1    │ col2    │ col3    │
│     │ String  │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ row1    │ 0.0     │ 0.0     │ 0.0     │
│ 2   │ row2    │ 0.0     │ 0.0     │ 0.0     │
│ 3   │ row3    │ 0.0     │ 0.0     │ 0.0     │
Thanks @johann.spies your solution is very good !
In your last example: why is the call to Symbol necessary? I'm sure it is; I just don't understand why.
DataFrames don't index columns by strings, but rather by Symbols. There's no super deep reason why it is that way, other than that Symbols are more lightweight than strings and have some nice features.
Using a Symbol, for example, makes it easier for df.a to refer to column :a.
Most importantly, symbols are faster to look up than strings since they are interned.
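To make the String-to-Symbol conversion concrete, here is a minimal sketch that uses only Base Julia (no DataFrames needed). The variable names s, same and syms are just illustrative.

```julia
cn = ["col1", "col2", "col3"]

# Symbol(...) converts a String into the corresponding Symbol.
s = Symbol("col1")

# Symbols are interned: two Symbols with the same name are the same object,
# so the identity comparison === holds.
same = (s === :col1)

# Broadcasting (the dot) converts a whole vector of names at once.
syms = Symbol.(cn)
```

This is why the loop above calls Symbol(c) on each column name before using it as a column index.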
Got it. Thanks guys.
An alternative way to do it is:
julia> df = DataFrame(fill(Float64, length(cn)), Symbol.(cn), length(rn))
3×3 DataFrame
│ Row │ col1         │ col2         │ col3         │
│     │ Float64      │ Float64      │ Float64      │
├─────┼──────────────┼──────────────┼──────────────┤
│ 1   │ 1.0735e-313  │ 1.07357e-313 │ 1.0735e-313  │
│ 2   │ 6.61729e-316 │ 7.6592e-316  │ 3.53922e-316 │
│ 3   │ 7.6592e-316  │ 7.6592e-316  │ 3.53922e-316 │
julia> df.rowname = rn
3-element Array{String,1}:
"row1"
"row2"
"row3"
julia> df
3×4 DataFrame
│ Row │ col1         │ col2         │ col3         │ rowname │
│     │ Float64      │ Float64      │ Float64      │ String  │
├─────┼──────────────┼──────────────┼──────────────┼─────────┤
│ 1   │ 1.0735e-313  │ 1.07357e-313 │ 1.0735e-313  │ row1    │
│ 2   │ 6.61729e-316 │ 7.6592e-316  │ 3.53922e-316 │ row2    │
│ 3   │ 7.6592e-316  │ 7.6592e-316  │ 3.53922e-316 │ row3    │
A side note on the other solution: the df[:x] = 0.0 syntax will soon be discouraged, and it will become df[:x] .= 0.0 (using broadcasting).
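For readers on a newer release, here is a hedged sketch of the same construction, assuming DataFrames.jl 1.x. There, columns are addressed as df[!, name], and string names are accepted alongside Symbols. Unlike the uninitialized values shown above, this version fills every column with zeros.

```julia
using DataFrames

rn = ["row1", "row2", "row3"]
cn = ["col1", "col2", "col3"]

# Start from the row-name column, then add one zero-filled Float64 column per name.
df = DataFrame(rowname = rn)
for c in cn
    # Assumes DataFrames.jl 1.x: `!` selects the column itself, and `c` may be a String.
    df[!, c] = zeros(length(rn))
end
```

After the loop, df has 3 rows and 4 columns, with rowname first.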
Thank you @bkamins your solution is great too !
I have a question : what is the best way to change some values in the dataframe using the colnames and the rownames ? I tried something like
df[[:rowname == "row1"], :col3] = 5
I assume the row-name is unique. In this case what you should do is:
df[findfirst(==("row1"), df.rowname), :col3] = 5
An alternative way to write it is, e.g.:
df.col3[df.rowname .== "row1"] .= 5
This will also work if the row names are not unique; in that case every matching row is updated.
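The two approaches can be tried side by side in a small self-contained sketch, assuming DataFrames.jl 1.x. The zero-filled frame built here is only a stand-in for the one in this thread.

```julia
using DataFrames

# Stand-in data: three named rows, three zero-filled Float64 columns.
df = DataFrame(rowname = ["row1", "row2", "row3"],
               col1 = zeros(3), col2 = zeros(3), col3 = zeros(3))

# Unique row name: find the row index first, then assign a single cell.
i = findfirst(==("row1"), df.rowname)
df[i, :col3] = 5          # stored as 5.0 because the column is Float64

# Possibly repeated row names: a Boolean mask updates every matching row.
df.col2[df.rowname .== "row2"] .= 7
```

Note that findfirst returns nothing when no row matches, so the first form fails on a missing name, while the mask form simply changes nothing.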
@bkamins very interesting ! thanks !
It is also possible to change a whole column at once:
julia> df.col3 .= [1,2,3]
3-element Array{Float64,1}:
1.0
2.0
3.0
julia> df
3×3 DataFrame
│ Row │ col1         │ col2         │ col3    │
│     │ Float64      │ Float64      │ Float64 │
├─────┼──────────────┼──────────────┼─────────┤
│ 1   │ 9.88131e-324 │ 6.92857e-310 │ 1.0     │
│ 2   │ 4.94066e-324 │ 6.92857e-310 │ 2.0     │
│ 3   │ 6.92859e-310 │ 6.92857e-310 │ 3.0     │