How to specify thousand separator with CSV.read data

Nikx · September 7, 2022, 1:37pm

My Excel spreadsheet uses apostrophes to separate thousands, and even worse it uses the typographic " ’ " instead of a straight apostrophe, e.g. 1’000. Is there a way to specify this in CSV.read()? When I import the data, it reads 1’000 as 1\x92000.

Excel Data comes in the form of:
Amount;Date;Time
1’000;10.01.2022;13:22:17
6’000;10.01.2022;13:20:12
3’000;10.01.2022;13:23:08

then I use
df = CSV.read("filepath.csv", DataFrame; delim=";", dateformat="dd.mm.yyyy", normalizenames=true);

println(df) shows the Amount column as String7, which prevents me from doing any calculation with the data.

mthelm85 · September 7, 2022, 1:41pm

Welcome to the community! Can you post an example of the data that you’re trying to read so that others can easily copy/paste it as they attempt to help? Thanks!

nilshg · September 7, 2022, 3:31pm

There is not:

github.com/JuliaData/CSV.jl

Ability to specify a thousands separator character

opened 06:24PM - 04 May 20 UTC

closed 05:15AM - 05 Jun 23 UTC

jaakkor2

new feature

This feature was listed in https://github.com/JuliaData/CSV.jl/issues/3 , but not implemented. https://github.com/JuliaData/CSV.jl/issues/207 has a complicated workaround (but the keyword transforms does not exist anymore?), and says "there isn't a "thousands" separator option, but we could potentially add on".

Practical example, data is the first two rows from table 2 on page 10 in https://www.istat.it/it/files//2020/05/Rapporto_Istat_ISS.pdf

```
a=CSV.File(IOBuffer("""
Alessandria 95,7 98,2 -12,8 91,0 1.199 693 222 18,5
Ancona 76,6 84,3 -10,7 49,4 704 528 86 12,2
"""), delim=" ", header = false, decimal = ',')
```

Row 1, column 6 would ideally be Int 1199. Here comma is a decimal separator and period is a thousands separator.

stillyslalom · September 7, 2022, 3:47pm

As a workaround, you can just remove the separator and parse the resulting string as a number. In the DataFrames.jl mini-language:

julia> transform!(df, :Amount =>  ByRow(str -> parse(Int, replace(str, '’' => ""))) => :Amount)
3×3 DataFrame
 Row │ Amount  Date        Time
     │ Int64   Date        String15
─────┼──────────────────────────────
   1 │   1000  2022-01-10  13:22:17
   2 │   6000  2022-01-10  13:20:12
   3 │   3000  2022-01-10  13:23:08

or more simply

df.Amount = parse.(Int, replace.(df.Amount, '’' => ""))

stevengj · September 7, 2022, 4:15pm

You should also be able to use CSV.read: read the file into a string, delete the thousands separator, and do CSV.read on an in-memory buffer (IOBuffer):

s = filter(!=('’'), read("filepath.csv", String))
CSV.read(IOBuffer(s), DataFrame; delim=';', ....)
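
A self-contained sketch of the in-memory approach, with the sample rows from the first post standing in for the file contents. The variable names `raw` and `cleaned` are only illustrative, and the sketch assumes the text is valid UTF-8 so that the separator really is the character ’:

```julia
using CSV, DataFrames

# Stand-in for read("filepath.csv", String); rows copied from the first post.
raw = """
Amount;Date;Time
1’000;10.01.2022;13:22:17
6’000;10.01.2022;13:20:12
3’000;10.01.2022;13:23:08
"""

# Drop every typographic apostrophe before the text reaches the CSV parser.
cleaned = filter(!=('’'), raw)

# Parse the cleaned text from an in-memory buffer instead of a file.
df = CSV.read(IOBuffer(cleaned), DataFrame; delim=';', dateformat="dd.mm.yyyy")
```

Note that this removes the character everywhere in the file, including inside text columns, so it is only safe when ’ never appears in data that should be kept. The `\x92` in the first post also suggests the file may be Windows-1252 rather than UTF-8, in which case the separator would be a raw byte and would not match the character ’ as written here.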

Nikx · September 7, 2022, 4:33pm

Thanks a lot for the suggestions.

quinnj · September 8, 2022, 3:49pm

@drvi just added support for this in the Parsers.jl package, so now we just need to plumb support for this in the CSV.jl package if someone wanted a pretty easy first issue to try out.

Yuan-Ru-Lin · June 16, 2024, 1:10am

Just so whoever stumbles across this problem knows: the issue is fixed by the pull request "Support groupmark" by LilithHafner (Pull Request #1093, JuliaData/CSV.jl), and according to this example, you can now do this:

using CSV

# In many places in the world, digits to the left of the decimal place are broken into
# groups by a thousands separator. We can ignore those separators by passing the `groupmark`
# keyword argument.
data = """
x y
1 2
2 1,729
3 87,539,319
"""

file = CSV.File(IOBuffer(data); groupmark=',')
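
Applied to the semicolon-delimited layout from the first post, the same keyword might look like the sketch below. It assumes a CSV.jl version that includes the pull request above, and it assumes the separator has already been normalized to a straight apostrophe; whether the typographic ’ is accepted directly as a `groupmark` is not confirmed in this thread.

```julia
using CSV, DataFrames

# Sample rows from the first post, with a straight apostrophe as the
# thousands separator (an assumption made for this sketch).
data = """
Amount;Date;Time
1'000;10.01.2022;13:22:17
6'000;10.01.2022;13:20:12
3'000;10.01.2022;13:23:08
"""

# groupmark tells the parser to ignore the apostrophe inside numbers.
df = CSV.read(IOBuffer(data), DataFrame;
              delim=';', groupmark='\'', dateformat="dd.mm.yyyy")
```

With this, no post-processing of the Amount column should be needed.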
