Loading .csv into a DataFrame with missing insertion and column-type correction (one-linerish)

Hi,
forgive me if the question is trivial or if there have been similar or identical questions already answered, but I’ve not been able to find them.
I have to load a .csv file into a DataFrame, but the file contains “NA” for missing values, so a lot of Int or Float columns are detected as String.
I then used allowmissing() and replaced the “NA” values with the recode() function, but now I’m left with having to re-type all the mistyped columns from String to the correct numerical type.
My solution seems ugly and cumbersome for such a trivial task (allowmissing + recode + go through the DataFrame to find the mistyped columns + manually retype each of them), so I was wondering if there is a standard solution, since this is a very common (if notorious) task.

I ran into this problem while converting an old script from Python. With Pandas, “NA” values are automatically treated as NaN, so the columns are still interpreted as numerical columns.

I finally found the missingstring and missingstrings parameters for CSV.File.

It completely solves the problem with:

df = CSV.File("filename.csv", missingstring="NA") |> DataFrame

If more than one string is used to mark missing values, it’s:

df = CSV.File("filename.csv", missingstrings=["string1", "string2"]) |> DataFrame
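
To check that the columns really come out numeric, here is a minimal sketch that uses a small in-memory CSV instead of "filename.csv". The column names and values are made up for illustration, and it assumes the CSV and DataFrames packages are installed.

```julia
using CSV, DataFrames

# Made-up sample data standing in for "filename.csv"; "NA" marks missing values.
data = """
id,height,weight
1,1.80,NA
2,NA,72
3,1.65,58
"""

# Same call as above, reading from an IOBuffer instead of a file on disk.
df = CSV.File(IOBuffer(data); missingstring="NA") |> DataFrame

# Print the element type of each column to see how it was detected.
for name in names(df)
    println(name, " => ", eltype(df[!, name]))
end
```

The height and weight columns should show up as numeric types that allow missing, rather than String. Depending on the CSV.jl version, the missingstring keyword may also accept a vector of strings directly, so it is worth checking the documentation for the installed version.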

Hope this helps someone else coming from Python.
