Does passing a DataFrame declared outside a function as an argument improve performance?

Hi there,

I have a general question about the performance of functions that act on or use DataFrames. Let's say I read a CSV file into a DataFrame in my main program. Then I declare a function which uses that DataFrame. Does it matter performance-wise whether I pass the DataFrame as an argument or just reference it directly inside the function (it's in global scope)?

thanks a lot,
Miguel.

It is not fully clear what you mean by “I pass the DataFrame as a function”. Do you mean that you pass just the DataFrame constructor and create a data frame inside a function?

Having said that, whichever option you choose, it should not impact the performance significantly.

I think the question is whether it is better to pass the dataframe as an argument into the function, or to reference it as a global variable inside the function.

It is definitely better practice to pass it as an argument; global variables (except constants) should be avoided in general. Whether it impacts performance in your particular case is hard to say without more detail, but passing it as an argument is at least as fast, and possibly much faster.
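To make the two options concrete, here is a minimal sketch. The table contents and the function names `sum_x_global` and `sum_x_arg` are made up for illustration; only the DataFrames.jl package is assumed.

```julia
using DataFrames

# Non-constant global: Julia cannot assume its type stays fixed.
df = DataFrame(x = 1:1_000, y = rand(1_000))

# Option 1: reference the global directly inside the function.
sum_x_global() = sum(df.x)

# Option 2: pass the table as an argument; the type of `data` is known
# when this method is compiled.
sum_x_arg(data::AbstractDataFrame) = sum(data.x)

# Both return the same result; only how `df` is looked up differs.
sum_x_global() == sum_x_arg(df)
```

Timing both versions on your own data (for example with BenchmarkTools.jl) is the only way to know whether the difference matters in your case.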


A DataFrame is type unstable (the types of its columns are not part of its own type), so, as I said above, performance should not be noticeably affected either way.
(Passing it to a function might save time on the order of nanoseconds, since it avoids dynamic dispatch in the function scope, but that will not be noticeable in practice.)

However, what @DNF says is indeed a valid general recommendation in Julia and should be followed.
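As a rough illustration of the type-instability point: because the column types are not known from the `DataFrame` type alone, a common pattern is to pull the column out and hand it to a separate function (a "function barrier"), which is then compiled for the concrete column type. The names `sum_squares` and `column_sum_squares` below are hypothetical, and this is only a sketch of the pattern, not something from the original question.

```julia
using DataFrames

# Kernel: compiled for the concrete vector type it actually receives.
function sum_squares(v::AbstractVector{<:Real})
    s = zero(eltype(v))
    for x in v
        s += x * x
    end
    return s
end

# The type of `data.x` is only known at run time, so the call to
# `sum_squares` is dispatched dynamically once; the loop itself is fast.
column_sum_squares(data::AbstractDataFrame) = sum_squares(data.x)

df = DataFrame(x = [1.0, 2.0, 3.0])
column_sum_squares(df)  # 14.0
```

The single dynamic dispatch at the barrier is the nanosecond-scale cost mentioned above; it is paid once per call rather than once per element.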


Thanks a lot. I know that avoiding global variables is good practice, but I was having trouble sending DataFrames to different workers with pmap(). I got it working by instead declaring the DataFrames everywhere and having the functions access them on each worker.
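For comparison, one way to keep the DataFrame as an argument with pmap() is to let a closure capture it, so that it is serialized to the workers together with the function. This is a sketch under assumptions: two local workers, a toy table, and hypothetical names `row_value` and `doubled_values`.

```julia
using Distributed
addprocs(2)                   # assumption: two local worker processes
@everywhere using DataFrames  # workers need the package to deserialize the table

# Defined on every process; takes the table as an argument instead of
# reading a global.
@everywhere row_value(data, i) = data.value[i] * 2

# The anonymous function captures the local `data`, so pmap ships it to
# the workers along with the function.
function doubled_values(data)
    return pmap(i -> row_value(data, i), 1:nrow(data))
end

df = DataFrame(id = 1:4, value = [10.0, 20.0, 30.0, 40.0])
results = doubled_values(df)  # [20.0, 40.0, 60.0, 80.0]
```

Copying a large table to the workers this way can be costly, so defining the data once on each worker, as described above, may still be the more practical choice.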