DataFrames: Removing values from a column

Hey everyone,

I would like to remove values that are equal to datetime2unix(DateTime(0)) from my column:

replace!(df[:,:col2], datetime2unix(DateTime(0)) => nothing)

OR

df[:,:col2] = replace(df[:,:col2], datetime2unix(DateTime(0)) => nothing)

I get this error :

MethodError: Cannot `convert` an object of type Nothing to an object of type Float64

Do you have any ideas?

Thank you

I can think of a few different things here:

Firstly, it sounds like you are reading in data where you know that DateTime(0) isn’t actually a valid observation. If that is the case, your best bet is probably to handle it when you read in the dataset. Most file-reading packages expose this as a keyword argument. For example, with CSV.jl (which needs a sink such as DataFrame, and whose keyword is missingstring):

using CSV, DataFrames
df = CSV.read("file.csv", DataFrame; missingstring = "00:00:00")
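A minimal, self-contained sketch of the same idea. It assumes the placeholder is written in the file as the literal text `-62167219200` (the value of `datetime2unix(DateTime(0))`); the column names and values are made up, and an `IOBuffer` stands in for the hypothetical `file.csv`:

```julia
using CSV, DataFrames

# Hypothetical file contents: col2 holds Unix timestamps, and the placeholder
# datetime2unix(DateTime(0)) is written out as -62167219200
data = """
col1,col2
1,1636391872.754
2,-62167219200
"""

# missingstring accepts a String or a Vector{String}; every listed string
# (here: empty fields and the placeholder) is parsed as `missing`
df = CSV.read(IOBuffer(data), DataFrame; missingstring = ["", "-62167219200"])

println(eltype(df.col2))  # the column type now allows missing values
```

Listing the sentinel at read time means the column is created with a missing-capable element type from the start, so no later conversion or reallocation is needed.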

If that isn’t the issue, you could try the following:

Try doing this with an operation that is not in place. My guess is that if you ran typeof(df[:, :col2]) you would see Vector{Float64} (datetime2unix returns a Float64, which matches the type in your error message). Since a Vector{Float64} cannot hold values of type Nothing, the error makes sense. Creating a new object, whose element type can be widened, would solve your problem:

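col3 = replace(df[:, :col2], datetime2unix(DateTime(0)) => nothing)
typeof(col3)  # Vector{Union{Nothing, Float64}}: replace widened the element type
df2 = hcat(df, DataFrame(col3 = col3))  # hcat expects data frames, so wrap the vector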

or

df = transform(df, :col2 => ByRow(
    x -> if x == datetime2unix(DateTime(0))
        nothing
    else
        x
    end
) => :col2)
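To see why the in-place version fails while the non-mutating versions work, here is a small sketch in plain Julia (only the Dates standard library). The vector `col` is a hypothetical stand-in for `df[:, :col2]`:

```julia
using Dates

sentinel = datetime2unix(DateTime(0))   # a Float64
col = [1.636391872754e9, sentinel]      # stand-in for df[:, :col2]: a Vector{Float64}

# replace! has to store `nothing` inside the existing Vector{Float64},
# which requires convert(Float64, nothing) and therefore throws
err = try
    replace!(copy(col), sentinel => nothing)
    nothing
catch e
    e
end

# replace allocates a new vector whose element type is widened
# to Union{Nothing, Float64}, so the replacement succeeds
col3 = replace(col, sentinel => nothing)
println(typeof(err), " / ", eltype(col3))
```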
     

If you don’t need those rows at all, you could instead use an in-place operation to drop the rows containing the placeholder:

filter!(:col2 => c -> c != datetime2unix(DateTime(0)), df)
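Because filter! keeps the rows for which the predicate returns true, the predicate has to test for inequality to remove the placeholder rows. A small sketch on a hypothetical data frame (column names and values made up for illustration):

```julia
using DataFrames, Dates

sentinel = datetime2unix(DateTime(0))
df = DataFrame(col1 = [1, 2, 3], col2 = [1.636391872754e9, sentinel, 1.6e9])

# Keep only the rows whose col2 differs from the placeholder;
# the data frame is modified in place
filter!(:col2 => c -> c != sentinel, df)
```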

Your issue comes from using df[:, :col2] indexing, which attempts to write into the existing column; as Derek says, that column can’t hold values of type Nothing.

You can do:

df[!, :col2] = replace(df[!, :col2], datetime2unix(DateTime(0)) => nothing)

or

df.col2 = replace(df[:,:col2], datetime2unix(DateTime(0)) => nothing)

Note, however, that creating a new vector re-allocates the whole column, which is not the most performant way of going about this.

Note also that nothing is not usually meant to signify missing data; for that there is missing.
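A short illustration in plain Julia of how the two differ in practice: `missing` propagates through arithmetic and can be skipped with `skipmissing`, while `nothing` simply raises an error when used as a value. The vectors are made-up examples:

```julia
v_missing = [1.0, missing, 3.0]   # Vector{Union{Missing, Float64}}
v_nothing = [1.0, nothing, 3.0]   # Vector{Union{Nothing, Float64}}

total = sum(skipmissing(v_missing))   # absent values are skipped explicitly
propagated = v_missing[2] + 1         # missing + 1 is missing: unknown stays unknown

# nothing has no arithmetic methods, so using it as a value throws
nothing_err = try
    v_nothing[2] + 1
catch e
    e
end
```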

In summary, I would do the following:

julia> df = DataFrame(x = [now(), DateTime(0)]) # example data
2×1 DataFrame
 Row │ x
     │ DateTime
─────┼─────────────────────────
   1 │ 2021-11-08T17:17:52.754
   2 │ 0000-01-01T00:00:00

julia> allowmissing!(df, :x) # Change type of column x to allow missing values
2×1 DataFrame
 Row │ x
     │ DateTime?
─────┼─────────────────────────
   1 │ 2021-11-08T17:17:52.754
   2 │ 0000-01-01T00:00:00

julia> replace!(df.x, DateTime(0) => missing); df # use mutating version of replace
2×1 DataFrame
 Row │ x
     │ DateTime?
─────┼─────────────────────────
   1 │ 2021-11-08T17:17:52.754
   2 │ missing
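The same recipe applied to the Float64 column from the question might look like the sketch below; the data frame is a made-up stand-in for the real one. Note that replace! has to act on the stored column (df.col2 or df[!, :col2]), because df[:, :col2] returns a copy:

```julia
using DataFrames, Dates

sentinel = datetime2unix(DateTime(0))       # placeholder value from the question
df = DataFrame(col2 = [1.636391872754e9, sentinel])

allowmissing!(df, :col2)                    # element type becomes Union{Missing, Float64}
replace!(df.col2, sentinel => missing)      # mutates the column stored in df
```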

Thank you both for these answers! I replaced my DateTime(0) values with missing as you suggested, and it worked perfectly!