If it’s only 0s and 1s, those will likely be parsed as `Int64`, so it’s no wonder the size in memory blows up.
Such a row in your CSV looks like this (if I understood correctly):
0,1,0,1,1,0,...
That means each value takes ~2 bytes in the file (one digit plus a comma). An `Int64` has a size of 8 bytes, so in RAM the data will occupy roughly 4x the space it does on disk.
I’d recommend converting the data to `Bool` or `UInt8` instead. You need to help CSV/DataFrames by telling them the exact column types up front: they cannot guess the smallest suitable type without reading the whole file, and by then it’s already too late…
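A minimal sketch of what that could look like, using CSV.jl's `types` keyword (the column names and the `IOBuffer` stand in for your real file; I'm assuming all columns are 0/1):

```julia
using CSV, DataFrames

# Hypothetical miniature version of your file, just for illustration.
data = IOBuffer("a,b,c\n0,1,0\n1,1,0\n0,0,1\n")

# `types=UInt8` forces every column to UInt8 (1 byte per value)
# instead of the default Int64 (8 bytes per value).
df = CSV.read(data, DataFrame; types=UInt8)

eltype(df.a)  # UInt8
```

You could also try `types=Bool` if you want true/false semantics, or pass a `Dict` mapping column names to types if only some columns are 0/1.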