Formatting Float64 output with CSV.write()

I am using the CSV package to export a DataFrame. One of the columns has DataType Float64, so it get exported using scientific notation format and the software that reads it (SQL Server bcp) cannot read it properly. I have tried to use the typemap options, and I don’t think it is the right way.

Ideally, I would like to limit the number of decimal to 8 and write to CSV the number like in the following example:

instead of

Here a simple test case:

df = DataFrame(Test =1e6*rand():1e6:4e6 )

This will print something similar to the following


Ideally I would like to have


Any idea?


Could you give more information? The Microsoft SQL bcp documentation says that scientific notation is supported, but it looks like it might need a sign after e, i.e. 1.23e+6 and not 1.23e6. If that’s the only problem, it would be easy to add the + via post-processing.

1 Like

@stevengj, you are indeed right, BCP does not have issues with the scientific notation (regardless of ± sign)
The issue I am facing is caused by the target field which is a DECIMAL(26,8) in the database. In the CSV file generated it is much longer as it comes from a Float64.
Something like this will fail:

#@ Row 1, Column 1: String data, right truncation @#

Something like this would succeed


From this, the need to truncate the 8th decimal

One option would be to just loop over rows and columns and output CSV yourself — it’s not a complicated format to write.

Using the Printf package, you can output in whatever format you want, e.g.

julia> @printf("%0.8f", 1234567.1234567890123456)
1 Like

That is what we have in place right now and I am trying to replace it with something more performant.
The CSV package is ~y times faster than the rows/columns loop :sunny:

This package should be more performant than Printf.

Could you share your benchmark?

I edited the existing code to use the package you recommended.
Before it was using @sprintf (see the commented code).

fmt = "%.8f"
open(file_path1,"w") do f
  for irow in 1:size(df)[1]
    for icol in 1:size(df)[2]
      if typeof(df[irow,icol]) == Float64
        #write(f, @sprintf "%0.8f" df[irow,icol])
        write(f, cfmt(fmt, df[irow,icol]))
        write(f, string(df[irow,icol]))
        write(f, ",")

The results are not what I expected as the execution time went from

17.255884 seconds


423.600634 seconds 

So please let me know if I did something wrong in the above code.

ps With the CSV library it takes just ~2.5 seconds

Rather than @sprintf, which allocates a string that you then write to f, you should do

@printf(f, "%0.8f" df[irow,icol])

to write directly to the stream f.

Similarly, don’t do write(f, string(df[irow,icol])), do

print(f, df[irow,icol])

In general, use print for writing text, and write for raw binary representations.

Also, don’t benchmark in global scope. Put all performance-critical code into a function.

1 Like

Thanks for the advice - I implemented the changes you recommended, unfortunately the impact is very low, in the range of a second.
I tried to remove the @printf(f, "%0.8f", ...) and the execution time went down to ~7 seconds! Which is still twice slower than the CSV… and not valid for my BCP job…

I ended up changing the data type of the DataFrame to comply the database requirements.
I preprocess the Float64 columns this way:

df[:Test] = map( (x) -> string(round(x; digits=8)) , df[:Test])

This way the column contents are already truncated to fit the database DECIMAL(26, 8) data type we have in the database, so the BCP will not complain about the length anymore. The CSV.write() does not provide formatting flexibility, so adapting the input is, at least for now I think, the way to go.

Please shoot if you have a better way to do this!


Good question. I haven’t tested in months but now Printf is a lot faster than before! There must have been huge improvements recently.

Here’s my test result:

julia> x = rand(100_000);

julia> foo(x) = open("/tmp/test.csv", "w") do f 
           for v in x
               @printf(f, "%16.8f\n",v)
foo (generic function with 1 method)

julia> goo(x) = open("/tmp/test2.csv", "w") do f 
           fspec = FormatExpr("{:16.8f}")
           for v in x
               printfmtln(f, fspec, v)
goo (generic function with 1 method)

julia> @btime foo($x)
  36.167 ms (11 allocations: 688 bytes)

julia> @btime goo($x)
  52.566 ms (500032 allocations: 9.92 MiB)
1 Like