How to save the name of a DataFrame to a dictionary?

Hello!

I am writing a function that takes a DataFrame, does some analysis with the data, and saves the results together with the name of the DataFrame in a dictionary. I am almost done, except for saving the DataFrame name. I tried various combinations of :, Expr(), and QuoteNode(), but the best I could achieve was saving the “entire” dataset in the dictionary. I only wish to save the name of the DataFrame.

Any suggestion will be appreciated! The following is a MWE.

using DataFrames

testdata = DataFrame(A = 1:2, B=["M", "F"]);

function test1(df::DataFrame)

    # doing some analysis here ...

    z = Dict{Symbol, Any}()  
    z[:dataname] = :($df)      # wish to save name of the DataFrame
    return z
end    

dict1 = test1(testdata);
display(dict1[:dataname])     # returns the entire dataset, not what I want

Unfortunately, that is impossible in Julia. A function only receives the value it is called with; inside the function, you can’t see the name of the variable that the caller used for it.

You will have to pass the string "testdata" to the function to give it a name.
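
A minimal sketch of passing the name explicitly, based on the MWE above. The second argument `name` and the `:total` entry are assumptions added for illustration; they are not part of the original function.

```julia
using DataFrames

testdata = DataFrame(A = 1:2, B = ["M", "F"])

# `name` is a hypothetical extra argument supplied by the caller.
function test1(df::DataFrame, name::AbstractString)
    z = Dict{Symbol, Any}()
    z[:dataname] = name       # the label passed in by the caller
    z[:total] = sum(df.A)     # stand-in for the real analysis
    return z
end

dict1 = test1(testdata, "testdata")
dict1[:dataname]              # "testdata"
```

The cost is that the name is typed twice at the call site, which is what some of the later replies try to avoid.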


Or associate the name with the DataFrame itself, for example as an extra column:

testdata = DataFrame(A = 1:2, B=["M", "F"], name = "testdata");
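
A related option that avoids adding a column is table-level metadata. This is only a sketch, and it assumes DataFrames.jl 1.4 or later, where `metadata!` and `metadata` are available; the key `"name"` is an arbitrary choice made here.

```julia
using DataFrames

testdata = DataFrame(A = 1:2, B = ["M", "F"])

# Attach a label to the table itself (assumes DataFrames.jl >= 1.4).
metadata!(testdata, "name", "testdata", style = :note)

function test1(df::DataFrame)
    z = Dict{Symbol, Any}()
    z[:dataname] = metadata(df, "name")   # read the label back
    return z
end

dict1 = test1(testdata)
dict1[:dataname]                          # "testdata"
```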

An option to avoid repeating the variable name is to (ab)use keyword argument splatting:

using DataFrames
testdata = DataFrame(A = 1:2, B=["M", "F"]);

function test1(; kwargs...)
    df_name, df = first(kwargs)

    z = Dict{Symbol,Any}()
    z[:dataname] = df_name
    return z
end

julia> dict1 = test1(; testdata)  # Semicolon required here!
Dict{Symbol, Any} with 1 entry:
  :dataname => :testdata

Not sure it’s a good idea, though :)


I would suggest restructuring your code so that

  1. you read in the dataframe and keep the name (e.g. in a name => df pair),
  2. when doing analysis, work on the df part and carry the name, which you can automate easily with e.g.
    lift(named_df, f, args...) = named_df[1] => f(named_df[2], args...)
    
  3. just use the name when you are done.
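
The three steps above can be sketched end to end as follows. The analysis functions `total` and `double` are hypothetical placeholders; only `lift` comes from the suggestion itself.

```julia
using DataFrames

# Apply `f` to the data half of a `name => data` pair and keep the name attached.
lift(named_df, f, args...) = named_df[1] => f(named_df[2], args...)

# Hypothetical analysis steps, for illustration only.
total(df, col) = sum(df[!, col])
double(x) = 2x

# Step 1: read in the data and keep the name next to it.
named = "testdata" => DataFrame(A = 1:2, B = ["M", "F"])

# Step 2: every transformation goes through `lift`, so the name is carried along.
step1 = lift(named, total, :A)   # "testdata" => 3
step2 = lift(step1, double)      # "testdata" => 6

# Step 3: use the name when you are done.
name, result = step2
```

Because `lift` never looks at the name, any existing function that works on the data can be reused unchanged.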

I thank @pdeffebach, @tomerarnon, @sijo, and @Tamas_Papp for spending time responding to my question. Really appreciated. They are all informative to me. The suggestion of @sijo is quite good, though if possible I’d prefer not to have the semicolon there so as to keep the API clean.

@Tamas_Papp’s suggestion is … deep to me! I am still new to Julia, and after playing with the lines for a while I still couldn’t fathom how to use them in my workflow. The following shows my silly, useless effort in trying to understand the code. Could @Tamas_Papp be kind enough to elaborate? Many thanks.

using DataFrames
testdata = DataFrame(A = 1:2, B=["M", "F"]);

function f(data::DataFrame)
  res = sum(data[:, 1])
  return res
end    

mypair = ("my", testdata);
lift(named_df, f) = named_df[1] => f(named_df[2])
lift(mypair, f)

julia> lift(mypair, f)
"my" => 3

From the code you included I think you got the essence of my solution perfectly, so I am not sure what to elaborate on. Just write your transformation functions f and lift them.


OP this might be a bit of an XY problem.

I think the real solution is to keep your data frames in Dicts so that each one has a name attached to it, and that name is actually stored someplace instead of just being the variable name you use in global scope.
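
A rough sketch of that idea, assuming a hypothetical registry called `datasets` and a made-up second dataset; the `:total` entry stands in for the real analysis.

```julia
using DataFrames

# Hypothetical registry mapping a name to each dataset.
datasets = Dict(
    "testdata"  => DataFrame(A = 1:2, B = ["M", "F"]),
    "otherdata" => DataFrame(A = 3:5, B = ["F", "M", "F"]),
)

# Iterating a Dict yields (name, df) pairs, so the name travels with the data.
results = Dict(
    name => Dict{Symbol, Any}(:dataname => name, :total => sum(df.A))
    for (name, df) in datasets
)

results["testdata"][:dataname]   # "testdata"
results["testdata"][:total]      # 3
```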


If I understand @Tamas_Papp correctly, this may suit your needs

using DataFrames

mydata = ("testdata" => DataFrame(A = 1:2, B=["M", "F"]))
# use mydata.first and mydata.second to refer to the Pair elements

function test(named_df::Pair{String, DataFrame})   # avoid `in`, which shadows Base.in
    df = named_df.second
    df.C = 3:4     # do stuff with df (this mutates the caller's DataFrame)
    z = Dict{Symbol, Any}()
    z[:dataname] = named_df.first     # save name
    z[:dataframe] = df                # save dataframe
    z[:pair] = named_df               # save the pair if you like
    return z
end

out = test(mydata)
out[:dataname]
out[:dataframe]
out[:pair]
mydata.second                   # changed as well

In dynamic languages, names are symbols that are bound to values or actual data objects: names are pointers to objects, and objects don’t have names. In your example, the DataFrame is an object; Tamas suggested using a Pair to keep a name (String) and the DataFrame object together.


With destructuring you can also extract the name and the data from the pair, directly in the function signature:

using DataFrames

function test1((name, df))
    z = Dict{Symbol, Any}()
    z[:dataname] = name
    z[:result] = df[1,1]  # or whatever
    return z
end

mydata = "testdata" => DataFrame(A = 1:2, B=["M", "F"])

julia> test1(mydata)
Dict{Symbol, Any} with 2 entries:
  :result   => 1
  :dataname => "testdata"

Or of course you could write a macro @test1 instead of a function (quite overkill I think).
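
For completeness, here is one way the macro route might look. This is only a sketch: `@named` is a hypothetical helper, not something provided by Julia or DataFrames. It builds the `name => value` pair at the call site, so the function itself stays an ordinary function.

```julia
using DataFrames

# Hypothetical helper macro: `@named x` expands to `:x => x`.
macro named(var)
    var isa Symbol || error("@named expects a plain variable name")
    return :($(QuoteNode(var)) => $(esc(var)))
end

# Same destructuring signature as in the example above.
function test1((name, df))
    z = Dict{Symbol, Any}()
    z[:dataname] = name
    z[:result] = df[1, 1]   # or whatever
    return z
end

testdata = DataFrame(A = 1:2, B = ["M", "F"])
dict1 = test1(@named testdata)
dict1[:dataname]            # :testdata
```

Only the macro sees the variable name, because it runs on the expression before evaluation; the function still receives just values.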


Thank you all again for the suggestions and advice; I learned a lot. The examples of @DaymondLing and @sijo are particularly illustrative! I think I can figure out my solution now.

That, and also reusing existing transformations with lift. But that’s just a style issue; it can be coded either way.


Very neat, I forgot about destructuring right in the parameter list. Thank you.