Hello everyone,
I am completely new to Julia, so please accept my apologies if my question is trivial. I searched briefly for an answer but did not find one.
I need to import a large matrix (40 000 x 40 000, approx. 6 GB on disk) into Julia, keeping all rows and columns in their original order. The file is a CSV containing only numerical values, so nothing particularly complex. I have 32 GB of RAM, and I have already done this import with Python, so I assume my hardware is not the issue. That said, the Python import takes a couple of minutes and is at the limit: it fails if several other RAM-hungry applications are open.
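For reference, here is a rough sketch of how much memory a dense matrix of this size occupies once loaded. The helper name `dense_matrix_gib` is made up for illustration, and the estimate counts only the final array, not any temporary memory a CSV parser may need on top of it.

```julia
# Size in GiB of a dense n_rows x n_cols matrix with element type T.
# Only the final array is counted; parser overhead is not included.
dense_matrix_gib(n_rows, n_cols, ::Type{T}) where {T} =
    n_rows * n_cols * sizeof(T) / 2^30

for T in (Float64, Float32)
    gib = dense_matrix_gib(40_000, 40_000, T)
    println(T, ": ", round(gib; digits=2), " GiB")
end
```

Under these assumptions the matrix needs roughly 12 GiB as Float64 and roughly 6 GiB as Float32, so the final array alone should fit in 32 GB of RAM.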
I installed VS Code to run Julia (for Python I mainly use Anaconda) and imported a small test CSV (11 MB) with Julia without any particular issue.
But when I try with my big matrix,
CSV.read(csv_path, DataFrame),
it runs for several hours (!!), and then VS Code freezes.
Then I tried to import it directly as a matrix (rather than as a data frame) and in Float32, to see if this could help. I wrote the following function:
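using CSV

function load_csv_to_matrix(file_path, num_rows, num_columns)
    # Preallocate the full matrix once, in Float32 to halve memory use
    data_matrix = Array{Float32}(undef, num_rows, num_columns)
    row_index = 1
    # header=false: the file holds only numbers, so the first line is data
    # (without it, CSV.Rows would consume the first line as column names)
    for row in CSV.Rows(file_path; header=false)
        col_index = 1
        for value in row
            # Check to ensure we don't exceed the column bounds of the matrix
            if col_index > num_columns
                break
            end
            # CSV.Rows yields each field as a string by default, so parse it
            data_matrix[row_index, col_index] = parse(Float32, value)
            col_index += 1
        end
        row_index += 1
        # Stop once the matrix is full, even if the file has more rows
        if row_index > num_rows
            break
        end
    end
    return data_matrix
end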
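For comparison, here is a minimal sketch of the same row-by-row idea using only Base functions, without CSV.jl. The name `load_numeric_csv` is hypothetical, and the sketch assumes a comma-delimited file with no header line, no quoted fields, and no missing values. It also requires Julia 1.8 or later for `eachsplit`, and it has not been timed on a file of the size described here.

```julia
# Fill a preallocated Float32 matrix line by line using only Base.
# Assumption: comma-delimited, header-less, purely numeric file.
function load_numeric_csv(file_path, num_rows, num_columns)
    data = Matrix{Float32}(undef, num_rows, num_columns)
    open(file_path, "r") do io
        for (i, line) in enumerate(eachline(io))
            i > num_rows && break            # ignore extra rows
            for (j, field) in enumerate(eachsplit(line, ','))
                j > num_columns && break     # ignore extra columns
                data[i, j] = parse(Float32, field)
            end
        end
    end
    return data
end

# Usage on a tiny temporary file
path = tempname()
write(path, "1,2,3\n4,5,6\n")
m = load_numeric_csv(path, 2, 3)
```

If the file has fewer rows or columns than requested, the remaining entries of the matrix stay uninitialized, so the dimensions passed in should match the file.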
Here again, it works with my small CSV test, but not with my big matrix.
Lastly, maybe the issue comes from how I use VS Code. So far, I have written my code in the editor area and then run it with "Julia: Execute code in REPL" (but I do not think this is the issue, as it works for my small CSV).
Thank you in advance for any help.