Remove / replace undef values in DataFrame

gitboy16 · October 5, 2021, 7:38am

Hi,
Is there a way to replace all undef values in a DataFrame column with NA or another value?
And how do I get rid of all the rows that have at least 1 undef value?
Thank you

Karajan · October 5, 2021, 7:59am

I think the first question should be how the undefs got there in the first place because that should never happen.

DataFrames also seems to prevent this:

julia> o = Vector{String}(undef, 10)
10-element Vector{String}:
 #undef
[...]

julia> p = DataFrame(o)
ERROR: UndefRefError: access to undefined reference
[...]

Could you provide a MWE (minimal working example)?

gitboy16 · October 5, 2021, 8:05am

Here is a DataFrame that contains undef values:

df = DataFrame(
  A = Vector{String}(undef, 5),
  B = [5,1,2,undef,4]
)

Karajan · October 5, 2021, 8:13am

Ah I see, thank you.

If you want to fill an array with values like

a = zeros(10)
for i in eachindex(a)
    a[i] = 5
end

you can avoid filling the array with zeros (zeros(10)) since you know you are going to overwrite the values anyway. So here using

a = Vector{Float64}(undef, 10)
for i in eachindex(a)
    a[i] = 5
end

would save you a tiny amount of time.

However, this is pretty much the only time you should use undef. If you have code where you can access undefs, something went wrong. Since you seem to have control over the arrays themselves I would advise you to overwrite the undefs immediately or not use them in the first place.

What are you using them for? Maybe something like missing would be more fitting (and way easier and safer to work with)?
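To illustrate what I mean by missing being easier to work with, here is a minimal sketch using only Base. The variable names are made up for the example. With a DataFrame you would do the same per column, and DataFrames also provides dropmissing for removing rows.

```julia
# A column that allows missing values, initialised with `missing` instead of `undef`.
s = Vector{Union{Missing, String}}(missing, 5)
s[1] = "hi"
s[3] = "there"

# Unlike #undef slots, missing entries can be read, counted and replaced safely.
n_missing = count(ismissing, s)       # number of missing entries
filled    = coalesce.(s, "NA")        # replace every missing with "NA"
present   = collect(skipmissing(s))   # keep only the non-missing entries
```

Every slot holds a real value here (either a String or missing), so none of these calls can throw an UndefRefError.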

gitboy16 · October 5, 2021, 8:17am

I did not build the DataFrame, I got it like this in a .jls file and I loaded the DataFrame with the following code:

using Serialization

df = Serialization.deserialize("data.jls")

Karajan · October 5, 2021, 8:32am

Well, it sounds like something went wrong with the deserialization (possibly due to “In general, this process will not work if the reading and writing are done by different versions of Julia, or an instance of Julia with a different system image.”?). For anything but short-term storage, CSV.jl, JDF.jl, etc. would probably be a better choice in the future.

If you have any way to access the original data, that would probably be the easiest option, but if not, I appreciate that all this advice is not going to help you.

I hacked something together that can check if a single element is undef. Maybe someone else knows of a better way.

df = DataFrame(
  A = Vector{String}(undef, 5),
  B = [5,1,2,undef,4]
)
a = df.A
isassigned(a)  # false
a[1] = "hi"
isassigned(a)  # true
isassigned(Ref(@view a[1]).x)  # true
isassigned(Ref(@view a[2]).x)  # false
b = df.B
b[3] === UndefInitializer()  # false
b[4] === UndefInitializer()  # true

Using this I would go through all the data to replace all the undefs.
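A possibly simpler route for String columns is the two-argument form isassigned(a, i), which reports whether slot i holds a value. Here is a rough sketch on a plain vector. replace_undef is a hypothetical helper name I made up, not something from Base or DataFrames. I am also assuming the same loop can be applied to each column of the DataFrame, and that the Bool mask can be used as df[keep, :] to drop rows. I have not tried that on a deserialized DataFrame.

```julia
# Hypothetical helper: copy `col`, putting `replacement` wherever a slot is #undef.
function replace_undef(col::Vector{T}, replacement=missing) where {T}
    out = Vector{Union{T, typeof(replacement)}}(undef, length(col))
    for i in eachindex(col)
        # isassigned(col, i) is false exactly for the slots that print as #undef
        out[i] = isassigned(col, i) ? col[i] : replacement
    end
    return out
end

a = Vector{String}(undef, 5)
a[1] = "hi"
a[4] = "there"

cleaned = replace_undef(a)                        # #undef slots become missing
keep    = [isassigned(a, i) for i in eachindex(a)] # true where a value exists
kept    = a[keep]                                 # only the assigned entries
```

For several columns, the row mask would be the elementwise "and" of the per-column masks, so a row is kept only if every column has a value there.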

(PS: make extra sure the rest of your data is intact. It seems unlikely to me that something blew a few holes in your dataset but left the rest untouched.)

Henrique_Becker · October 5, 2021, 12:27pm

This does not seem like a realistic example to me. This is not what a Vector{Int} with undefined positions would look like at all. [5,1,2,undef,4] is a Vector{Any} with four Int values and an UndefInitializer object that would never exist there unless something very wrong was done. If you create a Vector{Int} with Vector{Int}(undef, 4) it will look like:

4-element Array{Int64,1}:
 140534710323696
 140534774110064
 140534710323728
               0

It will never have an undef inside it, because Int returns true for isbitstype and, therefore, each undefined position is just the Int value of the dirty bits from the memory allocated for the array. There is no way to represent undef with an Int.
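A quick sketch that checks these claims in the REPL (the variable names are just for illustration):

```julia
# Mixing Ints with `undef` in a literal yields a Vector{Any} holding the
# UndefInitializer singleton, not an Int vector with a hole in it.
v = [5, 1, 2, undef, 4]
mixed_eltype = eltype(v)          # Any
holds_undef  = v[4] === undef     # true

# Int is a bits type, so every slot of an uninitialised Int vector is
# "assigned"; it just contains whatever bits were in that memory.
w = Vector{Int}(undef, 4)
ints_assigned = all(isassigned(w, i) for i in eachindex(w))

# String is not a bits type, so an uninitialised String vector really
# has unassigned (#undef) slots.
s = Vector{String}(undef, 2)
strings_assigned = any(isassigned(s, i) for i in eachindex(s))
```

So real #undef slots can only show up in columns whose element type is not a bits type, such as String.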

gitboy16 · October 5, 2021, 1:08pm

I made up a DataFrame with undef. The DataFrame I use only has String columns and I can’t share it here.
