Hmm, I'm not sure that ever worked. What did you expect df.col1 to return after you ran that code?
Do you want a for loop?
using DataFrames
df = DataFrame()
for c in [:col1, :col2, :col3]
    df[:, c] = []  # adds an empty column named c
end
Note the :. In more recent versions of DataFrames you need to specify both dimensions when indexing a data frame. df[:col] and df[[:col1, :col2]] are both deprecated.
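To make the two-dimension indexing concrete, here is a minimal sketch of how the `!` and `:` row selectors differ, assuming a recent DataFrames.jl release (1.x). The data frame and the column names `a`, `b`, `c` are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# Reading: `!` returns the stored column itself, `:` returns a copy.
no_copy = df[!, :a]
copied  = df[:, :a]

# Writing: `!` stores the vector as-is, `:` copies it into the data frame.
v = [10, 20, 30]
df[!, :b] = v   # column :b is the same object as v
df[:, :c] = v   # :c did not exist, so it is created as a copy of v
```

So the old df[:col] corresponds to df[!, :col] if you want the column without copying, and to df[:, :col] if you want an independent copy.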
I'm very certain it worked prior to the new pkg update.
Right now, I also have to deal with changing from df[:col] and df[[:col1, :col2]] to df[!, :col] and df[!, [:col1, :col2]] in my code.
I expect it to return an empty dataframe with the columns I specified:
0×3 DataFrame
I intend to populate that dataframe row by row in the next step, hence why I need an empty dataframe with specified columns.
For the record, your for-loop solution works, but I prefer @bkamins's one-liner.
Currently, I'm pushing a 1×n Array{Any, 2} row by row, but I'm going to switch to a Dict instead, which I think is a much safer approach now that I know cols=:union exists.
I actually have both cases, one where DataFrame([:c1, :c2, :c3] .=> Ref([])) is most useful and another where insertcols!(df, ([:col1, :col2, :col3] .=> Ref([]))...) is most suitable.
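For anyone reading along, here is a rough sketch of the row-by-row workflow described above, assuming DataFrames.jl 1.x. The column names and row values are invented for illustration; only the constructor, insertcols!, and cols=:union come from this thread.

```julia
using DataFrames

# Case 1: build the empty data frame in one go.
df = DataFrame([:c1, :c2, :c3] .=> Ref([]))

# Case 2: add empty columns to a data frame that already exists.
df2 = DataFrame()
insertcols!(df2, ([:col1, :col2, :col3] .=> Ref([]))...)

# Push rows as Dicts; keys are matched to column names, not positions.
push!(df, Dict(:c1 => 1, :c2 => "a", :c3 => 2.0))

# With cols=:union a row may omit columns or bring new ones;
# the gaps are filled with missing.
push!(df, Dict(:c1 => 2, :c4 => true); cols=:union)
```

Because the Dict keys are matched by name, a row cannot silently land in the wrong column, which is the main reason it is safer than pushing a positional array.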
What does the "=>" operator do? Is that DataFrames-specific or a Julia operator?
P.S. Is Ref() necessary to make sure that each new column doesn't get the same empty list?
is Ref() necessary to make sure that each new column doesn't get the same empty list?
Ref is necessary for the broadcasting to work. DataFrames.jl automatically takes care that the column [] is copied and not reused (you could turn this off with copycols=false, but in your case do not do this).
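A small sketch of both points, assuming DataFrames.jl 1.x with the default copycols=true. The variable names `cols` and `ps` are just for illustration.

```julia
using DataFrames

cols = [:c1, :c2, :c3]

# Without Ref, broadcasting tries to pair a 3-element vector with a
# 0-element vector and fails with a DimensionMismatch.
failed = try
    cols .=> []
    false
catch err
    err isa DimensionMismatch
end

# Ref makes [] behave as a scalar, so every name is paired with it.
ps = cols .=> Ref([])

# At this point all pairs share one and the same empty vector...
shared_before = ps[1].second === ps[2].second

# ...but the constructor copies it, so each column is independent.
df = DataFrame(ps)
shared_after = df.c1 === df.c2
```

Here `shared_before` is true and `shared_after` is false, which is why the shared [] is harmless as long as copycols is left at its default.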
Great, thank you for taking the time to write that, it's really useful (and thank you also for, you know, taking the time to write DataFrames!)
Also, shame on me. Julia's new and improved help mode can tell me what "=>" is (the online documentation doesn't work well for operators).
help?> =>
search: =>
Pair(x, y)
x => y
Construct a Pair object with type Pair{typeof(x), typeof(y)}. The elements
are stored in the fields first and second. They can also be accessed via
iteration (but a Pair is treated as a single "scalar" for broadcasting
operations).
...
If you need any ideas for future blog posts, may I suggest writing about how to optimize performance when using DataFrames?
For example, in the previous pkg version I noticed a very significant difference in computation time between filter(row -> row.col1 == x, df) and df[df[:col1] .== x, :]. For a particular DataFrame, I measured the average time over 1000 runs, and the results were: