I am writing a function that reads in a DataFrame, does some analysis on the data, and saves the results together with the name of the DataFrame in a dictionary. I am almost done, except for saving the DataFrame name. I tried various combinations of :, Expr(), and QuoteNode(), but the best I could achieve was saving the entire dataset in the dictionary. I only wish to save the name of the DataFrame.
Any suggestion will be appreciated! The following is an MWE.
using DataFrames
testdata = DataFrame(A = 1:2, B=["M", "F"]);
function test1(df::DataFrame)
    # doing some analysis here ...
    z = Dict{Symbol, Any}()
    z[:dataname] = :($df) # wish to save name of the DataFrame
    return z
end
dict1 = test1(testdata);
display(dict1[:dataname]) # returns the entire dataset, not what I want
I thank @pdeffebach, @tomerarnon, @sijo, and @Tamas_Papp for taking the time to respond to my question; it is really appreciated. All of the replies were informative. The suggestion of @sijo is quite good, though if possible I’d prefer not to have the semicolon there, to keep the API clean.
@Tamas_Papp's suggestion is … deep to me! I am still new to Julia, and after playing with the lines for a while I still couldn’t figure out how to use them in my workflow. The following shows my clumsy attempt at understanding the code. Could @Tamas_Papp be kind enough to elaborate? Many thanks.
using DataFrames
testdata = DataFrame(A = 1:2, B=["M", "F"]);
function f(data::DataFrame)
    res = sum(data[:, 1])
    return res
end
mypair = ("my", testdata);
lift(named_df, f) = named_df[1] => f(named_df[2])
lift(mypair, f)
julia> lift(mypair, f)
"my" => 3
From the code you included I think you got the essence of my solution perfectly, so I am not sure what to elaborate on. Just write your transformation functions f and lift them.
I think the real solution is to keep your data frames in Dicts so that each one has a name attached to it, and that name is actually stored someplace instead of just being the variable name you use in global scope.
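A minimal sketch of that idea, assuming a small registry keyed by name. The names `datasets`, `analyze`, and `results`, and the second data frame, are made up for illustration and are not from the thread:

```julia
using DataFrames

# Hypothetical registry: each data frame is stored under an explicit name.
datasets = Dict(
    "testdata" => DataFrame(A = 1:2, B = ["M", "F"]),
    "other"    => DataFrame(A = 3:5, B = ["F", "M", "F"]),
)

# Hypothetical analysis step; it sums the first column, like `f` above.
analyze(df::DataFrame) = sum(df[:, 1])

# Run the analysis on every entry, keeping the name alongside the result.
results = Dict(name => analyze(df) for (name, df) in datasets)

results["testdata"]  # 3
```

Because the name travels with the data as a Dict key, the analysis function never has to recover it from a variable name.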
If I understand @Tamas_Papp correctly, this may suit your needs
using DataFrames
mydata = ("testdata" => DataFrame(A = 1:2, B=["M", "F"]))
# use mydata.first and mydata.second to refer to the Pair elements
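# the argument is called named_df rather than `in`, which would shadow Base.in
function test(named_df::Pair{String, DataFrame})
    df = named_df.second
    df.C = 3:4 # do stuff with df (this mutates the original DataFrame)
    z = Dict{Symbol, Any}()
    z[:dataname] = named_df.first # save name
    z[:dataframe] = df # save dataframe
    z[:pair] = named_df # save the pair if you like
    return z
end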
out = test(mydata)
out[:dataname]
out[:dataframe]
out[:pair]
mydata.second # changed as well
In dynamic languages, names are symbols bound to values (the actual data objects). A name points to an object; the object itself does not have a name. In your example, the DataFrame is an object, and Tamas suggested using a Pair to bundle a name (a String) with the DataFrame object.
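To make the point about names and objects concrete, here is a small sketch using a plain vector instead of a DataFrame. The `@named` macro is a hypothetical helper, not something proposed in the thread; it only illustrates that a macro sees the expression written at the call site, whereas a function receives just the object:

```julia
a = [1, 2]
b = a          # a second name bound to the same object
a === b        # true: both names refer to one object
push!(b, 3)    # mutating through `b` is visible through `a`

# Hypothetical helper: builds a Pair from the variable's name and its value.
macro named(x)
    return :($(string(x)) => $(esc(x)))
end

p = @named a   # "a" => [1, 2, 3]
```

The vector itself has no idea whether it is called `a` or `b`, which is why the name has to be captured at the call site or stored explicitly.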
Thank you all again for the suggestions and advice; I learned a lot. The examples of @DaymondLing and @sijo are particularly illustrative! I think I can figure out my solution now.