Delimited files

#Say I have data in the following format

v = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3,4,5])]

#I can write it to a delimited file using

using DelimitedFiles

open("testdlm.csv", "w") do io

     writedlm(io, v)

end

# The content of testdlm.csv is

# 1 0.1 [1, 2, 3]

# 2 0.2 [3, 4, 5]

I wonder why the writedlm() function treats the commas inside the square brackets differently from the commas at the outer levels.

Because that's the default, and it's documented that way. If you want CSV, pass a comma as the delimiter:

writedlm(io, v, ',')

The default delimiter is a tab, which is a very common default on Unix/Linux.

I noticed:

writedlm(io, v, delim=',')
ERROR: ArgumentError: unknown option delim

even though I read the docs as saying that should work… CSV.jl does have a delim keyword, and understandably its default is the comma for both reading and writing. It can be much faster for reading (reportedly among the fastest CSV readers in any language); I'm not sure writing is any faster.

They are not treated differently. You would get exactly the same results from [[1, 0.1, [1, 2, 3]], [2, 0.2, [3,4,5]]].

The elements of v are the rows of the output. Your first row has the elements (columns) 1, 0.1, [1, 2, 3] and your second row has the elements 2, 0.2, [3,4,5], and this is exactly how they are printed (with a tab delimiter by default).
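A minimal sketch of that claim, writing to an in-memory string with sprint instead of a file so the two outputs can be compared directly. The variable names (nested, out_tuples, out_vectors) are only illustrative.

```julia
using DelimitedFiles

# Rows given as tuples (the original data).
v = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3, 4, 5])]

# The same rows given as vectors instead of tuples.
nested = [[1, 0.1, [1, 2, 3]], [2, 0.2, [3, 4, 5]]]

# Capture what writedlm would write, using the default tab delimiter.
out_tuples  = sprint(io -> writedlm(io, v))
out_vectors = sprint(io -> writedlm(io, nested))

# Both containers are just "an iterable of iterable rows" to writedlm.
println(out_tuples == out_vectors)
print(out_tuples)
```

Each element of the outer collection becomes one line, each element of a row becomes one column, and the inner vector is simply printed as text inside its column.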

That’s because delim is (documented as) a positional argument, not a keyword argument:

julia> writedlm(stdout, v, ',')
1,0.1,[1, 2, 3]
2,0.2,[3, 4, 5]

Unlike Python, there is a distinction between the two types of arguments in Julia: you never give names when passing positional arguments, and you always give names when passing keyword arguments.
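A small sketch of the difference, again writing to an in-memory buffer. The keyword call is wrapped in try/catch only so the script keeps running; the exact error type is whatever DelimitedFiles raises (an ArgumentError in the session quoted above).

```julia
using DelimitedFiles

v = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3, 4, 5])]

# Positional form: the third argument is the delimiter.
csv_text = sprint(io -> writedlm(io, v, ','))
print(csv_text)

# Keyword form: delim is not a keyword option of writedlm, so this throws.
err = try
    writedlm(IOBuffer(), v; delim=',')
    nothing
catch e
    e
end
println(typeof(err))
```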


So when writedlm composes the lines of the file, it somehow uses information about the data structure it is saving (the semantics, not only the syntax)?
By contrast, when reading, readdlm can rely only on the form of the text and possibly on the delim argument supplied by the user.

w = open("testdlm.csv", "r") do io
    readdlm(io)
end


# 2×5 Matrix{Any}:
#  1  0.1  "[1,"  "2,"  "3]"
#  2  0.2  "[3,"  "4,"  "5]"

w = open("testdlm.csv", "r") do io
    readdlm(io, '\t')
end

# 2×3 Matrix{Any}:
#  1  0.1  "[1, 2, 3]"
#  2  0.2  "[3, 4, 5]"


w = open("testdlm.csv", "r") do io
    readdlm(io, ',')
end
# 2×3 Matrix{Any}:
#  "1\t0.1\t[1"  2  " 3]"
#  "2\t0.2\t[3"  4  " 5]"

It just writes the elements as strings. There’s nothing special about that. Your reading examples look perfectly reasonable. I wouldn’t expect that a data file reader would parse Julia code syntax. I’d be really annoyed if every CSV file I read that contained parentheses caused Julia to create tuples, for example.
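If the nested column does need to come back as a vector, the conversion has to be done by hand after reading. The following is only a sketch under that assumption: parse_bracketed is a hypothetical helper, not part of DelimitedFiles, and it only handles flat lists of integers such as "[1, 2, 3]".

```julia
using DelimitedFiles

# Hypothetical helper: turn the text "[1, 2, 3]" back into a Vector{Int}.
function parse_bracketed(s::AbstractString)
    inner = strip(s, ['[', ']'])                      # drop the brackets
    return [parse(Int, strip(p)) for p in split(inner, ',')]
end

# Same content as the tab-delimited file written earlier.
text = "1\t0.1\t[1, 2, 3]\n2\t0.2\t[3, 4, 5]\n"

# Reading with the tab delimiter keeps each bracketed list in one cell.
m = readdlm(IOBuffer(text), '\t')

# Convert the third column from strings to integer vectors.
third = [parse_bracketed(s) for s in m[:, 3]]
println(third)
```

For anything more complex than this, flattening the data into plain columns before writing, or using a serialization format meant for nested data, is probably the more robust choice.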

About the delim parameter, this is what the REPL help shows:


help?> writedlm
search: writedlm

  writedlm(f, A, delim='\t'; opts)

  Write A (a vector, matrix, or an iterable collection of
  iterable rows) as text to f (either a filename string or an  
  IO stream) using the given delimiter delim (which defaults   
  to tab, but can be any printable Julia object, typically a   
  Char or AbstractString).

As for how the two functions writedlm and readdlm operate: since I cannot follow the source code, I try to deduce from tests how they work, at least in some simple cases.
To this end, I ran the following tests.

v = [(1, 0.1, [1 , 2 , 3]), (2 , 0.2 , [3 , 4 , 5])]

# content of dlmv file
# 1	0.1	[1, 2, 3]
# 2	0.2	[3, 4, 5]

julia> rv = open("dlmv.csv", "r") do io
    readdlm(io)
end
# 2×5 Matrix{Any}:
#  1  0.1  "[1,"  "2,"  "3]"
#  2  0.2  "[3,"  "4,"  "5]"




v1=[v]

# content of dlmv1 file
# (1, 0.1, [1, 2, 3])	(2, 0.2, [3, 4, 5])
julia> rv1 = open("dlmv1.csv", "r") do io
    readdlm(io,',')
end
# 1×9 Matrix{Any}:
# "(1"  0.1  " [1"  2  " 3])\t(2"  0.2  " [3"  4  " 5])"

v2=[v,v]
# content of dlmv2 file
# (1, 0.1, [1, 2, 3])	(2, 0.2, [3, 4, 5])
# (1, 0.1, [1, 2, 3])	(2, 0.2, [3, 4, 5])

julia> open("dlmv2.csv", "r") do io
    readdlm(io,',')
end
# 2×9 Matrix{Any}:
# "(1"  0.1  " [1"  2  " 3])\t(2"  0.2  " [3"  4  " 5])"
# "(1"  0.1  " [1"  2  " 3])\t(2"  0.2  " [3"  4  " 5])"     

From these tests I deduce that, when writing, the comma at the level of the outermost square brackets (call it level 0) is translated into a newline, and the comma at level 1 is translated into the delimiter (a tab by default).

When reading, on the other hand, a comma given as the delimiter applies at every level.
This is what I meant by the different behaviour of the two functions: writedlm uses information about the structure of the vector, while for readdlm the content is "flat".
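One way to check this reading of the tests is to rebuild the output by hand. The sketch below is not how DelimitedFiles is implemented; manual_dlm is a hypothetical helper that assumes only what the tests show: one line per element of the outer collection, one column per element of a row, and each column printed as plain text with string. It ignores the quoting that writedlm applies to string cells.

```julia
using DelimitedFiles

# Hypothetical re-implementation for illustration only:
# outer elements -> lines, row elements -> delimiter-separated text.
function manual_dlm(rows; delim='\t')
    lines = [join((string(x) for x in row), delim) for row in rows]
    return join(lines, '\n') * "\n"
end

v  = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3, 4, 5])]
v1 = [v]   # one row whose two columns are tuples

# Compare against what writedlm actually produces.
println(manual_dlm(v)  == sprint(io -> writedlm(io, v)))
println(manual_dlm(v1) == sprint(io -> writedlm(io, v1)))
```

Seen this way, no comma is translated into anything: the commas in the file come from printing a nested vector or tuple as text, which is why readdlm later sees them as ordinary characters.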