I love broadcasting over arrays. I love that in Julia I can define func(x) where x is e.g. a namedtuple, and then call it as func.(x) where x is a vector of namedtuples.
However… I’m having difficulty extending this to DataFrames.jl. I expected this to be straightforward, as DataFrames have the same “shape” as vectors of namedtuples. But how exactly do I do this?
Assume that the implied interface of this function is that the input is always guaranteed to have properties a and b. Here’s an example of one such function:
function func(x)
    x.a + x.b^2 + 5
end
using DataFrames
df = DataFrame(
    a=[1, 2, 3],
    b=[1., 3., 4.],
    c=[10., 20., 30.],
)
func(df) errors because it expects the properties to be things that it can ^2, but if the input is a DataFrame that property is a Vector.
func.(df) also errors because it broadcasts over all cells, so it expects the first row of the first column (which is an Int64 in this case) to have properties a and b, and it fails.
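For reference, here is a minimal sketch of the two failure modes described above, next to a single row, which does expose a and b. It assumes DataFrames.jl is installed. Using eachrow to get at a row is my own choice for illustration; the flag names are hypothetical.

```julia
using DataFrames

func(x) = x.a + x.b^2 + 5

df = DataFrame(a=[1, 2, 3], b=[1.0, 3.0, 4.0], c=[10.0, 20.0, 30.0])

# Whole-table call: x.b is a Vector here, and ^ has no method for a Vector.
whole_table_fails = try
    func(df)
    false
catch
    true
end

# Cell-wise broadcast: every cell is a plain scalar with no properties a or b.
cellwise_fails = try
    func.(df)
    false
catch
    true
end

# A single row, by contrast, has properties a and b, so the generic func applies.
row = first(eachrow(df))
row_result = func(row)   # 1 + 1.0^2 + 5
```

So the generic func already works on the row-shaped objects that a DataFrame can hand out; what is missing is a way to broadcast over rows rather than over cells.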
I’m aware of the Tables.jl interface, but that would require me to either add an if Tables.istable check or define a new trait-like method for the Tables case. I don’t like either option. Ideally I don’t want to write a vector-like func; I want some broadcast-like mechanism that would work on DataFrames as well. Defining a func method for DataFrames would be acceptable as long as it contained no logic, i.e. it should pass back to the generic func. So, for example, writing
function func(x::DataFrame)
    x.a .+ x.b .^ 2 .+ 5
end
would also not be a good solution.
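By contrast, one candidate that might satisfy the "no logic" constraint is a method that only forwards to the generic func row by row. This is a sketch under assumptions, not a confirmed answer: it relies on eachrow from DataFrames.jl returning a vector-like collection of rows that support x.a and x.b, and on copy turning a row into a NamedTuple. Dispatching on AbstractDataFrame rather than DataFrame is my own choice.

```julia
using DataFrames

# Generic, scalar-style definition; it knows nothing about tables.
func(x) = x.a + x.b^2 + 5

# Forwarding method: no arithmetic here, it only re-dispatches to the
# generic func once per row.
func(df::AbstractDataFrame) = func.(eachrow(df))

df = DataFrame(a=[1, 2, 3], b=[1.0, 3.0, 4.0], c=[10.0, 20.0, 30.0])

# Row-wise results collected into a Vector.
result = func(df)

# Same computation over actual NamedTuples, for comparison.
via_namedtuples = func.(copy.(eachrow(df)))
```

Whether a one-line forwarding method counts as containing "no logic" is a judgment call. Calling func.(eachrow(df)) directly at the call site avoids defining the extra method at all.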
EDIT: I wonder if there’s some lower-level broadcast call that would allow me to control this. Checking.
EDIT2: Oh look, a blog post about broadcasting by Bogumił, the DataFrames.jl guy: “Broadcast fusion in Julia: all you need to know to avoid pitfalls” (Blog by Bogumił Kamiński). Unfortunately it doesn’t mention DataFrames.