Convert dataframe to runnable Julia code?

How would I convert a dataframe to Julia code? I have the dataframe shown below in a notebook, and I want Julia code that, when run, produces a dataframe with the same contents.

10 rows × 12 columns

Row	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
	Int64	Int64	Int64	String	String	Float64?	Int64	Int64	String	Float64	String?	String?
1	30	0	3	Todoroff, Mr. Lalio	male	missing	0	0	349216	7.8958	missing	S
2	205	1	3	Cohen, Mr. Gurshon "Gus"	male	18.0	0	0	A/5 3540	8.05	missing	S

Sorry I’m not sure I understand the question - how did you create this DataFrame in the first place?

If you’ve got the DataFrame in a notebook, you can just run using CSV; CSV.write("mydata.csv", df), and then in another file using CSV, DataFrames; CSV.read("mydata.csv", DataFrame) will give you the same DataFrame back.

The DataFrame was obtained using CSV.read().

The thing is, I can easily re-create it any time, but doing that requires, well, reading a CSV. I want something that is just Julia code, no external files.

An external file makes some things more difficult. For instance, asking a question about code is hard because I can’t post an external file alongside the code in the forums.

Blindly copy-pasting lines from the original CSV into a string literal and then reading that as CSV doesn’t work, because the following is not a valid string: the """ that follows ""Gus closes the triple-quoted literal early.

thecsv = """
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S
205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S
"""

In any case, it is a lot more work to do it manually.

In R I would use dump(), which produces a file of R code you can run to produce the original object, but Julia’s dump() does a different thing.

If you just want it to be pastable so you can post it on forums and the like, you could print it in a copy-paste-friendly form. Here is one way to do that:

function to_pastable(df)
    print("DataFrame(")
    for n in names(df)
        print(n, " = ", df[!, n], ", ")
    end
    print(")")
end

The above will only work for values whose printed form is valid Julia code that reconstructs them. If the dataframe contains types with overloaded show methods, this won’t work.
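A variation on the same idea, as a minimal sketch: use repr on the whole set of columns and check the round trip before posting. The NamedTuple below is a hypothetical stand-in for the columns of the data frame; with the real data, something like Tables.columntable(df) from Tables.jl should give a value of this shape, and DataFrame accepts a NamedTuple of vectors. The sketch itself only uses Base.

```julia
# Hypothetical stand-in for the data frame's columns (a NamedTuple of vectors).
cols = (
    PassengerId = [30, 205],
    Name = ["Todoroff, Mr. Lalio", "Cohen, Mr. Gurshon \"Gus\""],
    Age = [missing, 18.0],
)

# repr goes through show, which for these basic types emits parseable Julia code.
code = repr(cols)
println("DataFrame(", code, ")")

# Parse and evaluate the generated text to confirm it rebuilds the same values.
roundtrip = eval(Meta.parse(code))
println(isequal(roundtrip, cols))
```

The round-trip check is the useful part: if a column holds a type whose show output is not valid code, the parse or the comparison fails, and you find out before posting.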


That does work for the data I have, so it’s a solution, but in general it’s not that great to have code that works ‘most of the time’.

I would just dump it to a string in CSV format, and read that back. E.g.:

using CSV, DataFrames
df = DataFrame(a = 1:5, b = 6:10)

# write
CSV.write(stdout, df) # copy-paste the output between """s below

# read
table = """
a,b
1,6
2,7
3,8
4,9
5,10
"""
df2 = DataFrame(CSV.File(IOBuffer(table)))

That doesn’t work. The output is


PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
104,0,3,"Johansson, Mr. Gustaf Joel",male,33.0,0,0,7540,8.6542,,S
117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q
631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0,A23,S
645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C
655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18.0,0,0,365226,6.75,,Q
664,0,3,"Coleff, Mr. Peju",male,36.0,0,0,349210,7.4958,,S
672,0,1,"Davidson, Mr. Thornton",male,31.0,1,0,F.C. 12750,52.0,B71,S
779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q
796,0,2,"Otter, Mr. Richard",male,39.0,0,0,28213,13.0,,S
[6]:
IJulia.IJuliaStdio{Base.PipeEndpoint}(IOContext(Base.PipeEndpoint(RawFD(0x0000002c) open, 0 bytes waiting)))

If you insist on keeping the doubled quotes (which become a run of three at the end of the field) in "Hegarty, Miss. Hanora ""Nora""", you need to escape them in the string literal.
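For illustration, here is a sketch of what the escaped literal could look like for that one row, with a shortened, hypothetical two-column header. Strictly, only enough quotes to break up the run of three need escaping; all of the inner ones are escaped here for uniformity.

```julia
# Inside a triple-quoted literal, an unescaped run of three quotes ends the string,
# so the quotes around Nora are written as \" instead.
thecsv = """
Name,Sex
"Hegarty, Miss. Hanora \"\"Nora\"\"\",female
"""

# The resulting string holds the original CSV text, doubled quotes included.
print(thecsv)
```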

I don’t want to manually edit the contents. Sorry.

Why would you need to edit it manually? Use something like

let io = IOBuffer()
    CSV.write(io, df)
    Base.print_quoted(stdout, String(take!(io)))
end
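To see what Base.print_quoted does without any packages, here is a small sketch on a hand-written CSV string (the two-column data is made up for illustration). Note that print_quoted is unexported, so it is an internal of Base rather than documented API; repr on a String should give an equivalent literal.

```julia
# A hand-written CSV snippet containing the problematic doubled quotes.
csv = "Name,Sex\n\"Hegarty, Miss. Hanora \"\"Nora\"\"\",female\n"

# print_quoted writes the string as a double-quoted literal with escapes applied.
quoted = sprint(Base.print_quoted, csv)
println(quoted)

# Parsing that literal gives back the original text, so it is safe to paste.
restored = Meta.parse(quoted)
println(restored == csv)
```

The printed literal is a single line with \n and \" escapes, so it can be pasted directly after something like thecsv = in a post.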

To add on to what @Tamas_Papp wrote,

I do CSV.write(stdout, df) in the REPL

julia> using CSV, DataFrames

julia> df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

julia> CSV.write(stdout, df)
a,b
1,x
2,y
3,z
Base.TTY(RawFD(0x0000000d) open, 0 bytes waiting)

Then I copy and paste that into a multiline string

s = """
a,b
1,x
2,y
3,z
"""

CSV.read(IOBuffer(s), DataFrame)

This is pretty inelegant, I agree. But it’s feasible nonetheless.

Alternatively, a Vector of NamedTuples could also work fine and require no parsing.

julia> print(map(NamedTuple, Tables.rows(df)))
NamedTuple{(:a, :b),Tuple{Int64,String}}[(a = 1, b = "x"), (a = 2, b = "y"), (a = 3, b = "z")]

data = [(a = 1, b = "x"), (a = 2, b = "y"), (a = 3, b = "z")]
DataFrame(data)

etc.

This (IOBuffer) works, and I can make a string variable with it. The reason I couldn’t do that myself is that I’ve only been using Julia for one week. I looked at the documentation of CSV, and I couldn’t figure out how to make it read from a string; it seemed to want a file.
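On why IOBuffer helps here, a minimal Base-only sketch: IOBuffer wraps a string in an in-memory stream, so functions that read from an IO can consume it as if it were an open file. The manual line splitting below is only for illustration and is not a substitute for a real CSV parser (it ignores quoting, for one).

```julia
# The same small table as a string; IOBuffer turns it into a readable stream.
s = "a,b\n1,x\n2,y\n3,z\n"
io = IOBuffer(s)

# Read it the way one would read an open file: first the header, then the rows.
header = readline(io)
rows = [split(line, ',') for line in eachline(io)]

println(header)
println(rows)
```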

Also, to_pastable(df) works.

Maybe my SO answer here is helpful: How to provide reproducible Sample Data in Julia - Stack Overflow


This StackOverflow solution seems to be the shortest one that works, so arguably it’s the best one.