DataFrames Complexity

Sorry for being blunt, but there is a question I have had for a while now that has been bothering me:
Is there anything DataFrames.jl does that can’t be done with arrays and matrices more efficiently, with less complexity, and with more readable code?
Aside from giving us a slow array of type {Any}, what real added value does one get from using DataFrames?

I think there are a lot of things that are easier with data frames! It’s hard to comment on this without a specific example.

What do you mean by a slow array of type {Any}? The vectors in a DataFrame have specific eltypes; they are normal Julia vectors and carry all the usual performance benefits with them.
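
To make the eltype point concrete, here is a minimal sketch. The toy data and the column names `x` and `name` are made up for illustration; it only assumes DataFrames.jl is installed.

```julia
using DataFrames

# Hypothetical toy data: one numeric column and one string column.
df = DataFrame(x = [1.0, 2.0, 3.0], name = ["a", "b", "c"])

# Each column is an ordinary, concretely typed Julia vector.
@show eltype(df.x)      # Float64
@show eltype(df.name)   # String
@show typeof(df.x)      # Vector{Float64}

# The same data packed into one matrix has to widen to Any.
m = Any[1.0 "a"; 2.0 "b"; 3.0 "c"]
@show eltype(m)         # Any
@show eltype(m[:, 1])   # Any, even though every entry is a Float64
```

So the numeric column keeps its concrete type inside the data frame, while a column sliced out of the mixed matrix does not.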


Working with heterogeneous tables. I work almost entirely with datasets that are part string, part categorical, and part numeric, and without DataFrames it’d be nearly impossible to do this efficiently.


Maybe it’s helpful to flip your question: can you provide any example in which a Matrix{Any} is substantially more efficient than a DataFrame with one Float64 column and one String column? You seem to be assuming you almost always can use Matrix{Any}, but you haven’t given any examples yet.

using DataFrames, Pipe, Statistics

# mean of col1 within each group of grp
@pipe df |>
  groupby(_, :grp) |>
  combine(_, :col1 => mean => :mean_col1)

is pretty readable to me. How do you do that with a Matrix?
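
For comparison, one possible way to get the same grouped mean from a plain `Matrix{Any}` might look like the sketch below. The column layout, the constants `GRP` and `COL1`, and the toy values are assumptions for illustration, and there are certainly other ways to write it.

```julia
using Statistics

# Hypothetical layout: column 1 holds the group label, column 2 holds col1.
GRP, COL1 = 1, 2
m = Any["a" 1.0; "b" 2.0; "a" 3.0; "b" 6.0]

# Collect the values of each group by hand...
groups = Dict{Any, Vector{Float64}}()
for i in axes(m, 1)
    push!(get!(groups, m[i, GRP], Float64[]), m[i, COL1])
end

# ...then average each group.
means = Dict(g => mean(v) for (g, v) in groups)
```

It works, but the grouping logic and the column positions now live in user code rather than in the table itself.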


For one thing, keeping track of column indexes instead of names would make almost all code less readable, and more brittle.
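
A small sketch of the brittleness, assuming DataFrames.jl and made-up columns `price`, `qty`, and `id`: a positional lookup silently changes meaning once a column is inserted, while a lookup by name does not.

```julia
using DataFrames

# Hypothetical table with two columns.
df = DataFrame(price = [10.0, 20.0], qty = [1, 2])
by_index_before = df[:, 1]        # price, found by position

# Someone later adds an id column at the front.
insertcols!(df, 1, :id => [101, 102])

by_index_after = df[:, 1]         # now returns the id column
by_name_after = df.price          # still returns the price column
```

With a bare matrix, every index-based access downstream would need to be updated by hand after a change like this.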

Being blunt is acceptable (if not ideal), but please do a bit of research before asking a question like this. No, DataFrames are not a
