I'm learning from the wonderful Julia Data Science book. In chapter 4.7 Variable Transformations we have the following DataFrame
example:
function grades_2020()
    name = ["Sally", "Bob", "Alice", "Hank"]
    grade_2020 = [1, 5, 8.5, 4]
    DataFrame(; name, grade_2020)
end
A transformation function is provided as:
plus_one(grades) = grades .+ 1
and several examples of the transformations have been provided as well:
1.
transform(grades_2020(), :grade_2020 => plus_one => :grade_2020)
2.
select(grades_2020(), :, :grade_2020 => plus_one => :grade_2020)
3.
df = grades_2020()
df.grade_2020 = plus_one.(df.grade_2020)
As you can see, in all three examples the column/vector is selected by explicitly naming the column of interest. That is manageable here because there is only one Float64 column. But what if I had 50, 100, or 1000 columns?
The third example allows me to do something like:
3a.
df = grades_2020_to_2050()
df[!, 2:50] = plus_one.(df[!, 2:50])
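To make 3a reproducible, here is a minimal sketch. The book does not define `grades_2020_to_2050()`, so `grades_wide` below is a hypothetical stand-in, and the column names `grade_1` to `grade_49` are made up. It assumes 49 grade columns so that the range `2:50` is valid.

```julia
using DataFrames

plus_one(grades) = grades .+ 1

# Hypothetical stand-in for grades_2020_to_2050(): a name column followed by
# n identical Float64 grade columns named grade_1 ... grade_n.
function grades_wide(n = 49)
    df = DataFrame(name = ["Sally", "Bob", "Alice", "Hank"])
    for i in 1:n
        df[!, "grade_$i"] = [1, 5, 8.5, 4]
    end
    return df
end

df = grades_wide()

# Broadcasting plus_one over the sub-data-frame applies it element by element.
# The result keeps the same column names, so it can be assigned back.
df[!, 2:50] = plus_one.(df[!, 2:50])
```

With this setup every grade column is incremented while `name` is left untouched.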
However, the book advises against this:
"But, although the last example is easier since it builds on more native Julia operations, we strongly advise to use the functions provided by DataFrames.jl in most cases because they are more capable and easier to work with."
Now, I've been reading the docs on how to take a subset of a DataFrame, and I'm trying to apply that to examples 1 and 2 above, but I cannot make it work. The typical error is a MethodError: no method matching getindex.
Which method should I use to achieve the selection from my example 3a if I'm using DataFrames.jl functions such as transform or select? Also, is there a way to specify arbitrary, i.e. non-contiguous, columns when doing this? Say, instead of columns 2 to 50, columns 3 to 5, then 7 to 12, then 24 to 37?
Thanks in advance.
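For reference, a minimal sketch of one way 3a might be expressed with `transform`. It assumes DataFrames.jl 1.x, where `names(df, cols)` returns the selected column names, `.=>` broadcasts the function over them, and `renamecols=false` keeps the original names. The table `wide` and its column names `grade_1` to `grade_49` are hypothetical. This is an illustration, not the book's recommended approach.

```julia
using DataFrames

plus_one(grades) = grades .+ 1

# Hypothetical wide table: a name column plus 49 Float64 grade columns.
wide = DataFrame(name = ["Sally", "Bob", "Alice", "Hank"])
for i in 1:49
    wide[!, "grade_$i"] = [1, 5, 8.5, 4]
end

# Contiguous block: columns 2 to 50, kept under their original names.
all_grades = transform(wide, names(wide, 2:50) .=> plus_one; renamecols=false)

# Non-contiguous selection: only columns 3:5, 7:12 and 24:37.
picked = names(wide, Cols(3:5, 7:12, 24:37))
some_grades = transform(wide, picked .=> plus_one; renamecols=false)

# Selecting by element type instead of by position.
by_type = transform(wide, names(wide, Float64) .=> plus_one; renamecols=false)
```

`transform` returns a new data frame and leaves `wide` unchanged. The same pair vectors should also work with `select`, which would keep only the listed columns unless `:` is passed as well.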