I did notice that the column names must match in Julia (which seems sensible), and so must the lengths, or you get:
ERROR: DimensionMismatch(“arrays could not be broadcast to a common size; got a dimension with lengths 5 and 4”)
So I checked in R, and it behaves similarly (with a warning rather than an error) unless the longer length is a multiple of the shorter one:
A <- 1:2
B <- 1:3
A + B
[1] 2 4 4
Warning message:
In A + B : longer object length is not a multiple of shorter object length
B <- 1:4
A + B
[1] 2 4 4 6
I was maybe expecting NA or NaN for the extra rows (is there an easy way to get that?). More importantly, any idea what the rationale is behind R’s repeating/multiple behavior, and whether Julia’s DataFrames should support it (maybe optionally)?
This is called recycling in R and imho it is a terrible footgun - Julia is quite consistent in asking users to be explicit about their intentions rather than trying to guess and rely on DWIM, which I think is one of the strengths of the language. It does mean more verbosity/less convenience in some situations, but I think it’s a tradeoff well worth making.
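If recycling really is what you want, it can be spelled out explicitly. A minimal sketch with plain vectors rather than DataFrames (the helper name `recycle` is made up for illustration and is not part of Base or DataFrames):

```julia
# Hypothetical helper: repeat `v` cyclically until it has length `n`
recycle(v, n) = collect(Iterators.take(Iterators.cycle(v), n))

A = 1:2
B = 1:4

# `A .+ B` would throw a DimensionMismatch here, because broadcasting
# only expands dimensions of length 1, never lengths 2 vs. 4.
C = recycle(A, length(B)) .+ B   # same values as R's `A + B`
```

The recycling is now visible at the call site, so a reader can tell it was intended.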
Incidentally I think this is an excellent example of where the Julia behaviour makes life easier: the broadcasted dot makes it clear that addition happens elementwise, and it doesn’t make sense to do this for shapes that don’t match - rather than coming up with a “solution” to this “problem” for the user, DataFrames asks people to be explicit about what they think should happen in these cases. Your missing suggestion would mean padding out the smaller DataFrame with missing (where, btw - at the end? At the start? Randomly in between?), but my guess is that in 9x% of all cases where this happens, the fact that someone tries to add different-sized DataFrames is actually a bug in their code, and it’s helpful that an error is raised rather than a silent workaround being performed in the background.
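For the cases where padding is what you want, being explicit is only a line or two. A rough sketch, again on plain vectors, which assumes the padding belongs at the end (`pad_end` is a hypothetical helper, not an existing function):

```julia
# Hypothetical helper: pad `v` with `missing` at the end up to length `n`
pad_end(v, n) = vcat(v, fill(missing, n - length(v)))

short = [1, 2, 3, 4]
long  = [10, 20, 30, 40, 50]

# After padding, both operands have length 5, so broadcasting works.
# `missing + 50` is `missing`, so the extra row propagates as missing.
total = pad_end(short, length(long)) .+ long
```

Padding at the start instead would be `vcat(fill(missing, k), v)`, which is exactly the kind of choice the library cannot make for you.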
Here’s an alternative which may be useful.
The advantage is that any stat function can be applied, not just addition.
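using DataFrames, Statistics
# Create a DF to store results into
# There are more efficient ways of doing this but here I just copy one of the existing DFs
mean_df = copy(df1)
# Stack the DataFrames as matrices along a third dimension (note the splat `...`)
stacked = cat(map(Matrix, [df1, df2, df3])..., dims=3)
# Calculate the elementwise mean and drop the now-singleton third dimension
mean_df[:, :] = dropdims(mean(stacked, dims=3), dims=3)
The call to map() converts each DataFrame into a matrix.
cat(..., dims=3) then concatenates these along the third dimension, creating a 3-dimensional array. The splat (...) is needed so that cat receives the matrices as separate arguments rather than as one vector of matrices.
We can then simply apply the stat function along that dimension (mean(stacked, dims=3) in this case, but it could be any function that accepts dims). Without dims=3, mean would collapse everything into a single number. dropdims turns the resulting n×m×1 array back into an n×m matrix.
mean_df[:, :] = assigns the values into the target DataFrame in place (note the [:, :]). This only works if the columns of mean_df can hold the results: with integer columns, a non-integer mean would throw an InexactError, so the columns should be Float64.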
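To make the "any stat function" point concrete, here is a small self-contained sketch with made-up data. The column names and the helper `elementwise` are assumptions for illustration only. It builds each result with the `DataFrame(matrix, names)` constructor instead of assigning in place, which sidesteps any column-type issue:

```julia
using DataFrames, Statistics

# Made-up example data: three DataFrames with identical shape and column names
df1 = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0])
df2 = DataFrame(a = [3.0, 4.0], b = [5.0, 6.0])
df3 = DataFrame(a = [5.0, 6.0], b = [7.0, 8.0])

# Stack the three 2×2 matrices into one 2×2×3 array
stacked = cat(map(Matrix, [df1, df2, df3])...; dims = 3)

# Hypothetical helper: reduce along the third dimension with `f`,
# then drop that dimension so a 2×2 matrix comes back
elementwise(f) = dropdims(f(stacked; dims = 3); dims = 3)

mean_df = DataFrame(elementwise(mean), names(df1))
max_df  = DataFrame(elementwise(maximum), names(df1))
```

Any reduction that accepts a `dims` keyword (`sum`, `minimum`, `std`, ...) should slot in the same way.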