I am wondering whether it is possible to embed a CSV file in a Julia program. It almost works with CSV.read(), except I do not see a way to signal “EOF” (like with ^D on the REPL).
julia> using DataFrames, CSV
julia> d= CSV.read( stdin )
yyyymmdd,days,rate
19960102,9,5.763067
19960102,15,5.745902
19960103,8,5.763067
19960103,14,5.747397
^D ## please no unprintable control characters
Hmmm…I’m not quite clear on the question here; is it really to read a CSV file from stdin or do you just want to “hard-code” a csv file in a script? For the latter, you could do something as simple as:
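A minimal sketch of the hard-coding approach, using the sample rows from the opening post. It assumes the CSV.jl and DataFrames.jl packages are installed and a recent CSV.jl, where `CSV.read` takes the sink type (`DataFrame`) as its second argument; the variable names are only illustrative.

```julia
using CSV, DataFrames

# The CSV text lives directly in the script as a multi-line string literal.
csv_text = """
yyyymmdd,days,rate
19960102,9,5.763067
19960102,15,5.745902
19960103,8,5.763067
19960103,14,5.747397
"""

# Wrap the string in an in-memory IO object and parse it as if it were a file.
d = CSV.read(IOBuffer(csv_text), DataFrame)
```

The closing `"""` plays the role of the end-of-file marker, so no control character is needed.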
This is not an answer to your original question, but if I wanted to include nontrivial data in a package/project, I would just Serialization.serialize, and read it at runtime from a path I determined with @__DIR__.
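A rough sketch of that serialization workflow, using only the `Serialization` standard library. The file name `rates.jls` and the named tuple holding the data are hypothetical placeholders, not something from the thread.

```julia
using Serialization

# Hypothetical file name; the data file sits next to this source file.
const DATA_PATH = joinpath(@__DIR__, "rates.jls")

# One-time step (for example, run by the package author): store the data on disk.
rates = (yyyymmdd = [19960102, 19960102, 19960103, 19960103],
         days     = [9, 15, 8, 14],
         rate     = [5.763067, 5.745902, 5.763067, 5.747397])
serialize(DATA_PATH, rates)

# At runtime: load the data back from the path resolved relative to this file.
loaded = deserialize(DATA_PATH)
```

One caveat worth keeping in mind: the serialized format is not guaranteed to be readable across different Julia versions, so this fits data that can be regenerated when needed.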
Agreed. The principal use of __END__ would be for small data frames, such as illustrative data sets.
Perl also has a useful __DATA__ feature that one can stick at the end of illustrative programs, but this is not a package feature that would be easy to implement. Then again, Perl, with its quick-and-dirty uses, is not very good at dealing with multiple files that are packaged together and reside elsewhere.
Definitely not as good visually for the reader. See, our illustrative finance data sets are often not calculated but real data (think interest rates and stock returns in different months), and can be up to, say, 24 months long. Just aligning them this way is a pain.
The CSV you display in the opening post is not aligned either.
I am now not really sure what you want. For small datasets, you can just use code and align (and comment!) as you prefer. For larger datasets, this is presumably not a concern as they would not be eyeballed directly.
Perhaps you can also include an IJulia notebook that visualizes certain features of the data (e.g., distribution, lag-1 scatter plot).
An __END__ keyword won’t work the way you suggested, since the script is NOT read through stdin. You’ll never be able to read anything. What you are asking for is just a way to embed a string in a script, and I don’t see what’s missing from a normal multi-line string constant, as @quinnj suggests.
The """ construct is a reasonable alternative. Not as nice, but close enough. Listing the data on long lines, as in DataFrame( yyyymmdd= ..., ...), is also feasible, but again not as nice for this purpose.
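For comparison, a sketch of the column-wise form with the sample rows from the opening post, assuming DataFrames.jl is installed. Each column becomes one long line, which is why this layout gets harder to read and align as the number of rows grows.

```julia
using DataFrames

# Each keyword argument defines one column; padding keeps the values aligned.
d = DataFrame(
    yyyymmdd = [19960102, 19960102, 19960103, 19960103],
    days     = [       9,       15,        8,       14],
    rate     = [5.763067, 5.745902, 5.763067, 5.747397],
)
```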
Still, are you talking about typing in data at the REPL or embedding data in a script? You keep talking about a script and then give the REPL as the example.
What I don’t understand is what’s not nice about it, or really, what are you looking for. AFAICT it’s just the difference between
d = read()
<data>
__END__
and
str = """
<data>
""" # __END__
d = read(IOBuffer(str))
That’s roughly the same number of lines. You get your __END__ in the comment if you want. Syntax highlighting already works. The read call comes after the data, but if you really want it to come first you can create a string macro and do
data = csv"""
<data>
"""
if it hasn’t been done already. (Should be as simple as macro csv_str(str) :(read(IOBuffer($(esc(str))))) end)
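A possible runnable version of that string macro, assuming CSV.jl and DataFrames.jl are installed and that the generic `read` above stands for `CSV.read` with a `DataFrame` sink. The macro name `csv_str` follows the post; the rest is only a sketch.

```julia
using CSV, DataFrames

# Defining `csv_str` makes the csv"..." and csv"""...""" literal syntax available.
macro csv_str(text)
    # `text` arrives as a plain String at macro-expansion time; the returned
    # expression parses it when the surrounding code runs.
    return :(CSV.read(IOBuffer($text), DataFrame))
end

data = csv"""
yyyymmdd,days,rate
19960102,9,5.763067
19960102,15,5.745902
"""
```

Because the macro receives a string literal, no `esc` is needed here, and `CSV`, `IOBuffer`, and `DataFrame` are resolved in the module that defines the macro.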
This still won’t work. You’ll still need to convince the parser to ignore your data, which is almost certainly invalid code.