Best way to handle DataFrames and write legible code


#1

What is the best way to get data out of DataFrames, or save data in them, and still keep the code legible? I quickly end up with code that looks something like this:

df_special_dataframe[(df_special_dataframe[:Date].>=Year).&(df_special_dataframe[:variable1].>=Threshold),:variable1] = An expression of similar length

That makes my code hard to read and work on. I see that part of the solution is to use names that are as short as possible for variables and DataFrames, but is there a better way to handle DataFrames in general?

I know from VBA that there is a “With … End With” block that looks like this:

With theCustomer.Comments
        .Add("First comment.")
        .Add("Second comment.")
End With

It basically saves you from typing “theCustomer.Comments” again. I can imagine that similar functionality could be implemented for DataFrames, or that it already exists.


#2

I’m sure someone else has a better solution, but I would just make a short alias for the DataFrame and put the filter in a helper variable. I think the alias is free (it only binds a second name to the same object, so nothing is copied), and the result is much more readable.

df = df_special_dataframe  # alias: a second name for the same object, not a copy
rows = (df[:Date] .>= Year) .& (df[:variable1] .>= Threshold)  # Boolean row mask
df[rows, :variable1] = expression
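
For readers on a current DataFrames.jl release, where the `df[:Date]` column syntax has been replaced by `df.Date` (or `df[!, :Date]`), the same pattern might look like the sketch below. The data, `year`, `threshold`, and the doubling on the right-hand side are made up for illustration; they stand in for the original `Year`, `Threshold`, and `expression`.

```julia
using DataFrames

# Hypothetical example data (not from the original post).
df_special_dataframe = DataFrame(Date = [2015, 2016, 2017, 2018],
                                 variable1 = [1.0, 5.0, 2.0, 8.0])
year = 2016       # stands in for `Year`
threshold = 4.0   # stands in for `Threshold`

# Alias: `df` is a second name for the same object, nothing is copied.
df = df_special_dataframe

# Boolean row mask, computed once and reused on both sides of the assignment.
rows = (df.Date .>= year) .& (df.variable1 .>= threshold)

# Placeholder for `expression`: double the selected values in place.
df[rows, :variable1] = 2 .* df[rows, :variable1]
```

Because `df` is only an alias, the assignment on the last line also changes `df_special_dataframe`.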

Just a side note about my pet peeve, the precedence of &: it binds more tightly than comparisons such as >=, which is why each comparison needs its own parentheses. Hopefully the parentheses will no longer be required in Julia 1.0. The relevant issue has been reopened and Jeff Bezanson is having a look at it. :slight_smile:
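
To make the precedence point concrete, here is a small sketch in plain Julia (no packages needed). The values of `x`, `y`, `a`, and `b` are arbitrary and chosen only so that the two spellings give different answers.

```julia
x, y = 5, 3

# Without parentheses, `&` is applied first, so this is read as the
# chained comparison  x >= (4 & y) >= 2,  i.e.  5 >= 0 >= 2.
unparenthesized = x >= 4 & y >= 2

# With parentheses, each comparison is evaluated on its own and then combined.
parenthesized = (x >= 4) & (y >= 2)

# The same rule applies to the broadcast form used for row masks.
a = [1, 5, 3]
b = [4, 2, 6]
mask = (a .>= 3) .& (b .>= 3)
```

Here `unparenthesized` is `false` while `parenthesized` is `true`, which is exactly the kind of silent difference the parentheses guard against.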


#3

One suggestion is to use Query.jl. It’s legible once you get the hang of it.
DataFramesMeta.jl has a @with macro that does what you describe.
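
As a rough sketch of what that could look like: inside `@with`, a symbol such as `:Date` refers to the column of that name, so the DataFrame's name does not have to be repeated for every column. The data, `year`, and `threshold` below are made up, and the final assignment of `0.0` is only a placeholder for the real right-hand side.

```julia
using DataFrames, DataFramesMeta

# Hypothetical example data (not from the original post).
df = DataFrame(Date = [2015, 2016, 2017, 2018],
               variable1 = [1.0, 5.0, 2.0, 8.0])
year = 2016
threshold = 4.0

# Build the row mask; :Date and :variable1 are looked up as columns of df,
# while year and threshold are ordinary variables from the enclosing scope.
rows = @with(df, (:Date .>= year) .& (:variable1 .>= threshold))

# Use the mask for an in-place update of the selected rows.
df[rows, :variable1] .= 0.0
```

This keeps the long filter expression free of repeated `df_special_dataframe[...]` prefixes, which is close to what the VBA `With` block does.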


#4

Thanks, Query and DataFramesMeta look very interesting! I will have a look at them at the next opportunity :slight_smile: