CSV.write: save DataFrame boolean columns as 0 or 1, not as the strings "false" and "true"

When I save a DataFrame with boolean columns, CSV.write writes them as the strings “false” and “true”, which increases the file size.

How can I save the boolean columns as 0 and 1 instead?

Thank you in advance.

Would using something like Int8.() before writing the csv work?


I tried to convert the column:

df[:,:ART]= Int8.(df[:,:ART])

But for some reason, it doesn’t help =((

I can’t convert Bool to Int =((

using DataFrames
dff = DataFrame(id=[1,2,3], b=[true,false,true])
dff[:, :b] = Int8.(dff[:, :b])
typeof(dff.b)  # still a Bool vector

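If you don’t mind changing some internal CSV.jl behaviour in your module, you can override the method that writes Bool cells. Note that this relies on a non-public function and affects every CSV.write call in the session: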

CSV.writecell(buf,pos,len,io, x::Bool, opts) = CSV.writecell(buf,pos,len,io, x ? "1" : "0",opts)

Thank you. It works. I don’t think it is the best solution, but I will use it.
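A less invasive option than overriding an internal method may be the `transform` keyword of `CSV.write`, which is applied to every cell as `(col, val)`. The sketch below assumes a CSV.jl version that supports this keyword; the helper name `bool_to_int` is made up for illustration, and the DataFrame itself keeps its Bool column.

```julia
using CSV, DataFrames

dff = DataFrame(id = [1, 2, 3], b = [true, false, true])

# Hypothetical helper: map Bool cells to Int while writing;
# every other value passes through unchanged.
bool_to_int(col, val) = val isa Bool ? Int(val) : val

# Write to an in-memory buffer so the result can be inspected directly.
io = IOBuffer()
CSV.write(io, dff; transform = bool_to_int)
csv_text = String(take!(io))
print(csv_text)
```

Passing a file name instead of `io` writes the same content to disk.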

If you do dff.b = Int8.(dff.b) instead, I think it will work. The difference is that this replaces the entire column (which changes its type), whereas dff[:,:b]=Int8.(dff[:,:b]) only overwrites the values in the existing column, which leaves the type as it is.

Anyway, let me know if it works.

yes, you are right

dff.b = Int8.(dff.b) can alternatively be written as dff[!, :b] = Int8.(dff.b). Using : instead of ! requests an in-place operation, while ! indicates that you want to replace the old vector with a new one.
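The difference between the two row selectors can be checked directly. This is a minimal sketch using the example DataFrame from earlier in the thread; the variable names `eltype_inplace` and `eltype_replaced` are only there for illustration.

```julia
using DataFrames

dff = DataFrame(id = [1, 2, 3], b = [true, false, true])

# `:` assigns in place: the Int8 values are converted back to the
# element type of the existing column, so it stays a Bool vector.
dff[:, :b] = Int8.(dff[:, :b])
eltype_inplace = eltype(dff.b)

# `!` replaces the stored column with the new vector, so the
# element type changes to Int8.
dff[!, :b] = Int8.(dff.b)
eltype_replaced = eltype(dff.b)

println(eltype_inplace, " -> ", eltype_replaced)
```

After the second assignment the column holds integers, so CSV.write should emit 0 and 1 for it without any further changes.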
