How to specify thousand separator with CSV.read data

My Excel spread uses apostrophes to separate thousands and even worse it uses" ’ " instead of straight apostroph e.g. 1’000. Is there a way to specify this in CSV.read()? Because when I import data it interprets 1’000 as 1\x92000.

Excel Data comes in the form of:
Amount;Date;Time
1’000;10.01.2022;13:22:17
6’000;10.01.2022;13:20:12
3’000;10.01.2022;13:23:08

then I use
df=CSV.read(“filepath.csv”, delim =“;”, dateformat= “dd.mm.yyyy”, normalizenames= true, DataFrame);

println(df) shows the Amount as String7 which prohibits me from doing any calculation with the data.

image

1 Like

Welcome to the community! Can you post an example of the data that you’re trying to read so that others can easily copy/paste it as they attempt to help? Thanks!

There is not:

As a workaround, you can just remove the separator and parse the resulting string as a number. In the DataFrames.jl mini-language:

julia> transform!(df, :Amount =>  ByRow(str -> parse(Int, replace(str, '’' => ""))) => :Amount)
3×3 DataFrame
 Row │ Amount  Date        Time
     │ Int64   Date        String15
─────┼──────────────────────────────
   1 │   1000  2022-01-10  13:22:17
   2 │   6000  2022-01-10  13:20:12
   3 │   3000  2022-01-10  13:23:08

or more simply

df.Amount = parse.(Int, replace.(df.Amount, '’' => ""))
1 Like

You should also be able to use CSV.read: read the file into a string, delete the delimiter, and do CSV.read on an in-memory buffer (IOBuffer):

s = filter(!=('’'), read("filepath.csv", String))
CSV.read(CSV.File(IOBuffer(s)), delim=';', ....)

Thanks a lot for the suggestions.

@drvi just added support for this in the Parsers.jl package, so now we just need to plumb support for this in the CSV.jl package if someone wanted a pretty easy first issue to try out.

2 Likes

Just so whoever stumbles across this problem knows. The issue is fixed by Support groupmark by LilithHafner · Pull Request #1093 · JuliaData/CSV.jl · GitHub and according to this example, you can now do this.

using CSV

# In many places in the world, digits to the left of the decimal place are broken into
# groups by a thousands separator. We can ignore those separators by passing the `groupmark`
# keyword argument.
data = """
x y
1 2
2 1,729
3 87,539,319
"""

file = CSV.File(IOBuffer(data); groupmark=',')
3 Likes