Delimited files

rocco_sprmnt21 · May 16, 2022, 9:17am

#Say I have data in the following format

v = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3,4,5])]

#I can write it to a delimited file using

using DelimitedFiles

open("testdlm.csv", "w") do io

     writedlm(io, v)

end

# The content of testdlm.csv is

# 1 0.1 [1, 2, 3]

# 2 0.2 [3, 4, 5]

I wonder why the writedlm() function treats the commas inside the square brackets differently from the commas at the higher (outer) levels.

Palli · May 16, 2022, 12:25pm

Because that’s the default, and it’s documented that way. If you want CSV, pass the comma as the delimiter:

writedlm(io, v, ',')

TAB is the default delimiter, very common as default on Unix/Linux.

I noticed:

writedlm(io, v, delim=',')
ERROR: ArgumentError: unknown option delim

even though I read the docs as saying that should work… CSV.jl does have a delim keyword, and understandably its default is the comma for both reading and writing. It can be much faster for reading (fastest of all languages); I’m not sure writing is any faster.

stevengj · May 16, 2022, 2:30pm

They are not treated differently. You would get exactly the same results from [[1, 0.1, [1, 2, 3]], [2, 0.2, [3,4,5]]].

The elements of v are the rows of the output. Your first row has the elements (columns) 1, 0.1, [1, 2, 3] and your second row has the elements 2, 0.2, [3,4,5], and this is exactly how they are printed (with a tab delimiter by default).
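A minimal sketch of that equivalence, assuming only the DelimitedFiles standard library. The helper name dlm_string is made up for this illustration; it captures the output in an IOBuffer so the two results can be compared directly:

```julia
using DelimitedFiles

# The same two rows, once as tuples (as in the original post) and once as vectors
v_tuples  = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3, 4, 5])]
v_vectors = [[1, 0.1, [1, 2, 3]], [2, 0.2, [3, 4, 5]]]

# Hypothetical helper: run writedlm into a buffer and return the text
function dlm_string(rows, args...)
    io = IOBuffer()
    writedlm(io, rows, args...)
    return String(take!(io))
end

out_tuples  = dlm_string(v_tuples)    # default tab delimiter
out_vectors = dlm_string(v_vectors)

# Both collections yield the same rows and cells, so the text is identical
println(out_tuples == out_vectors)
```

Each element of the outer vector becomes one line, and each element of a row becomes one cell, whether the row is a tuple or a vector.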

The writedlm(io, v, delim=',') call fails because delim is (documented as) a positional argument, not a keyword argument:

julia> writedlm(stdout, v, ',')
1,0.1,[1, 2, 3]
2,0.2,[3, 4, 5]

Unlike Python, there is a distinction between the two types of arguments in Julia: you never give names when passing positional arguments, and you always give names when passing keyword arguments.
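To make the distinction concrete, here is a small sketch. The function join_row is hypothetical and only mimics the argument layout shown in the writedlm docstring (an optional positional delim, then keyword options after the semicolon); it is not part of DelimitedFiles:

```julia
using DelimitedFiles

v = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3, 4, 5])]

# delim passed positionally (third argument): accepted
io = IOBuffer()
writedlm(io, v, ',')
csv_text = String(take!(io))

# delim passed by name: rejected, as reported earlier in the thread
keyword_failed = try
    writedlm(IOBuffer(), v, delim=',')
    false
catch err
    err isa ArgumentError
end

# Hypothetical function with the same layout:
# positional arguments before the semicolon, keyword arguments after it
join_row(row, delim='\t'; prefix="") = prefix * join(row, delim)

a = join_row((1, 0.1), ',')               # delim given by position
b = join_row((1, 0.1), ','; prefix="> ")  # prefix must be given by name
```

Everything before the semicolon in a signature is matched by position only, and everything after it is matched by name only.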

rocco_sprmnt21 · May 16, 2022, 3:48pm

So when writedlm composes the lines of the file, it somehow uses information about the structure of the data it saves (the semantics, not only the syntax)!?
By contrast, when reading, readdlm can rely only on the form of the text and possibly on the delim argument supplied by the user.

w = open("testdlm.csv", "r") do io
    readdlm(io)
end


# 2×5 Matrix{Any}:
#  1  0.1  "[1,"  "2,"  "3]"
#  2  0.2  "[3,"  "4,"  "5]"

w = open("testdlm.csv", "r") do io
    readdlm(io, '\t')
end

# 2×3 Matrix{Any}:
#  1  0.1  "[1, 2, 3]"
#  2  0.2  "[3, 4, 5]"


w = open("testdlm.csv", "r") do io
    readdlm(io, ',')
end
# 2×3 Matrix{Any}:
#  "1\t0.1\t[1"  2  " 3]"
#  "2\t0.2\t[3"  4  " 5]"

Jeff_Emanuel · May 16, 2022, 4:11pm

It just writes the elements as strings. There’s nothing special about that. Your reading examples look perfectly reasonable. I wouldn’t expect that a data file reader would parse Julia code syntax. I’d be really annoyed if every CSV file I read that contained parentheses caused Julia to create tuples, for example.
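If you do want the bracketed cells back as arrays, that conversion has to be done explicitly after reading. A rough sketch, assuming the cells only ever contain integer lists in the "[1, 2, 3]" form; parse_int_list is a made-up helper, not a DelimitedFiles function:

```julia
using DelimitedFiles

# The text that writedlm produced with the default tab delimiter
text = "1\t0.1\t[1, 2, 3]\n2\t0.2\t[3, 4, 5]\n"

# Read with the same delimiter that was used for writing
w = readdlm(IOBuffer(text), '\t')

# Hypothetical helper: strip the brackets, split on commas, parse each piece
parse_int_list(s) = parse.(Int, strip.(split(strip(s, ['[', ']']), ',')))

# The third column holds plain strings until we convert them ourselves
third_col = [parse_int_list(s) for s in w[:, 3]]
```

The reader only splits text on the delimiter; any further interpretation of a cell is up to the caller.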

rocco_sprmnt21 · May 17, 2022, 4:09pm

About the delim parameter, this is what the REPL help shows:


help?> writedlm
search: writedlm

  writedlm(f, A, delim='\t'; opts)

  Write A (a vector, matrix, or an iterable collection of
  iterable rows) as text to f (either a filename string or an  
  IO stream) using the given delimiter delim (which defaults   
  to tab, but can be any printable Julia object, typically a   
  Char or AbstractString).

As for the operating logic of writedlm and readdlm: since I cannot follow the source code, I am trying to deduce how they work from tests, at least in some simple cases.
To this end, I ran the following tests:

v = [(1, 0.1, [1 , 2 , 3]), (2 , 0.2 , [3 , 4 , 5])]

# content of dlmv file
# 1	0.1	[1, 2, 3]
# 2	0.2	[3, 4, 5]

julia> rv = open("dlmv.csv", "r") do io
    readdlm(io)
end
# 2×5 Matrix{Any}:
#  1  0.1  "[1,"  "2,"  "3]"
#  2  0.2  "[3,"  "4,"  "5]"




v1=[v]

# content of dlmv1 file
# (1, 0.1, [1, 2, 3])	(2, 0.2, [3, 4, 5])
julia> rv1 = open("dlmv1.csv", "r") do io
    readdlm(io,',')
end
# 1×9 Matrix{Any}:
# "(1"  0.1  " [1"  2  " 3])\t(2"  0.2  " [3"  4  " 5])"

v2=[v,v]
# content of dlmv2 file
# (1, 0.1, [1, 2, 3])	(2, 0.2, [3, 4, 5])
# (1, 0.1, [1, 2, 3])	(2, 0.2, [3, 4, 5])

julia> open("dlmv2.csv", "r") do io
    readdlm(io,',')
end
# 2×9 Matrix{Any}:
# "(1"  0.1  " [1"  2  " 3])\t(2"  0.2  " [3"  4  " 5])"
# "(1"  0.1  " [1"  2  " 3])\t(2"  0.2  " [3"  4  " 5])"

From these tests I deduce that, when writing, the commas at the level of the outermost square brackets (call it level 0) are translated into newlines, and the commas at level 1 are translated into tabs. Commas at deeper levels are written out unchanged.

When reading, on the other hand, a comma given as the delimiter splits the text at every level.
This is what I meant by the difference between the two functions: writedlm uses information about the structure of the vector, while for readdlm the content is “flat”.
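The behaviour seen in these tests can be reproduced with a few lines. This is a simplified sketch, not the DelimitedFiles source: sketch_writedlm is a made-up name, and it ignores the quoting options that the real writedlm supports:

```julia
using DelimitedFiles

# Simplified model: the outer collection gives the rows,
# each row is iterated to give the cells, and each cell is print()ed as-is.
function sketch_writedlm(io::IO, rows, delim='\t')
    for row in rows
        join(io, row, delim)   # cells separated by the delimiter
        print(io, '\n')        # one line per row
    end
end

v  = [(1, 0.1, [1, 2, 3]), (2, 0.2, [3, 4, 5])]
v1 = [v]   # one row whose two cells are the tuples

# Run both writers into buffers and compare the text
function capture(writer, data)
    io = IOBuffer()
    writer(io, data)
    return String(take!(io))
end

same_v  = capture(sketch_writedlm, v)  == capture(writedlm, v)
same_v1 = capture(sketch_writedlm, v1) == capture(writedlm, v1)
```

In this model there is no comma translation at all. The writer never sees the commas of the outer two levels; it only sees rows and cells, and the commas that survive in the file are the ones print() emits when displaying a cell such as [1, 2, 3].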
