How to replace 'Nothing' data type in an array

shrdlu · July 22, 2024, 10:59pm

Hi everyone,

I’m currently trying to untangle a pretty messy multi-nested json file by flattening them out manually and curating them into a dataframe (mostly following the method described here: https://julia.school/julia/json/).

Some of the sub-nested elements are giving me some issues - I ended up with arrays like below, let’s say named ‘nest1_array’:

 Dict{String, Any}("sub" => "K2")
 Dict{String, Any}("sub" => "A5")
 Dict{String, Any}("sub" => "G1")
 Dict{String, Any}("sub" => "G1")
 Dict{String, Any}("sub" => "B1")
 nothing
 Dict{String, Any}("sub" => "GD2")
 nothing
 nothing
 nothing
 Dict{String, Any}("sub" => "A6")
 Dict{String, Any}("sub" => "C1")

Now my goal is do below to isolate ‘sub’ elements from above into its own array and push it into a manually curated dataframe, via:

clean_array=Any[]

for i in eachindex(nest1_array)
       push!(clean_array, nest1_array[i]["sub"])
end

curated_df[!, :sub]=clean_array

Unfortunately, trying to isolate ‘sub’ entries from the nest1_array does not work, giving me below error message:

ERROR: MethodError: no method matching getindex(::Nothing, ::String)

Now, I can vaguely understand the error is because ‘nothing’ is its own data type, and indexing it along with the rest of the strings might be problematic (please do feel free to point out if this is wrong, or if there are better elaborations for the beginner!).

But what would be a solution to this issue? Is there a preexisting method or function that can be used to convert ‘::Nothing’ data rows into a text, like ‘N/A’?

So far only solution I found was to export the array to a text file and just use shell utilities + regex to clean the output, but I would far, far prefer a pure Julia solution. Especially since I have dozens more of these nested columns to go through…

As usual, any advice would be appreciated. Thank you!

Satvik · July 22, 2024, 11:29pm

The most straightforward way is to use an if statement:

for i in eachindex(nest1_array)
       if isnothing(nest1_array[i])
           push!(clean_array, "N/A")
       else
           push!(clean_array, nest1_array[i]["sub"])
       end
end

You can also use replace:

nest1_array_replaced = replace(nest1_array, nothing => Dict("sub" => "N/A")
curated_df[!, :sub] = [x["sub"] for x in nest1_array_replaced]

GHTaarn · July 23, 2024, 2:12am

The if solution by @Satvik is closest to your original code, but a oneliner alternative to the replace solution could be:

curated_df[!, :sub] = [isnothing(x) ? "N/A" : x["sub"] for x in nest1_array]

shrdlu · July 23, 2024, 2:48am

Hi,

Thank you for responding.

I’m a bit befuddled right now - the control flow + ‘isnothing’ based method you described sometimes works and sometimes doesn’t.

For example, here’s a version of your script I executed on a JSON-to-dictionary array (generated by JSON.jl).

for i in eachindex(results_col)
        if isnothing(results_col[i])
                push!(enriched_sample_array, "N/A")
        else
                push!(enriched_sample_array, results_col[i]["enriched_sample"])
        end
end

One would assume ‘nothing’ entries (not present in original data, but present in the JSON.jl generated array) should be replaced with “N/A” - but it’s not the case. Here’s the output of the array after processing via above method:

      nothing
      nothing
      nothing
      nothing
      nothing
      nothing
 false
  true
  true
 false
 false
  true
     ⋮
  true
  true
  true
  true
  true
  true
      nothing
  true
 false
 false
      nothing

And so and so forth for a few of thousand rows.

Which is a problem, since the ‘nothing’ on the column is probably a type of data, not just string, as this error message suggests when I try to export the final dataframe as an external file.

ERROR: LoadError: A `nothing` value was found in column 9 and it is not a printable value.
There are several ways to handle this situation:
1) fix the data, perhaps replace `nothing` with `missing`,
2) use `transform` option with a function to replace `nothing` with whatever value (including `missing`), like `CSV.write(...; transform=(col, val) -> something(val, missing))` or
3) use `TableOperations.transform` option to transform specific columns

Granted, I was able to get around above error by using the transform option as suggested by the error message (like this:)

CSV.write("dataframe.tsv", df; delim='\t', transform=(col,val)->something(val,"N/A"))

But I’m a little bothered by the fact that there’s still ‘Nothing’ data on some of the columns even after initial loop, and even after the CSV.write with transform I see blank spaces on some of the columns rather than N/A.

So - I feel like there’s a weird predictability issue I’m fighting with. I’m sure I’ve done something wrong, but Julia script used to do what I’ve done are so simple I’m not sure what it could be.

Any ideas would be appreciated!

shrdlu · July 23, 2024, 2:50am

This looks great!

I think I can generally understand what’s going on, but would you mind adding a light explanation on what each of the symbols and commands do, just to make sure?

GHTaarn · July 23, 2024, 3:35am

Looking at

curated_df[!, :sub] = [isnothing(x) ? "N/A" : x["sub"] for x in nest1_array]

The left hand side of the = is the same as what you have already used, so hopefully you understand it already, it is just a way of assigning values to a full column of the Dataframe.

The right hand side is what is called an “array comprehension”, it is a way of applying a function to each member of an array, so this would be equivalent to

function f(x)
    return isnothing(x) ? "N/A" : x["sub"]
end

curated_df[!, :sub] = [f(x) for x in nest1_array]

The ? : combination is an operator known from C and many other languages, it is just a shorthand for:

function f(x)
    if isnothing(x)
        return "N/A"
    else
        return x["sub"]
    end
end

curated_df[!, :sub] = [f(x) for x in nest1_array]

Unlike some other languages however, Julia requires whitespace around both the ? and the :

shrdlu · July 23, 2024, 5:50am

Ah, I understand now. Thank you for the very thorough explanation!

Topic		Replies	Views
Can't Convert 'Nothing' to 'Missing' General Usage question	4	2009	June 30, 2020
Dealing with different concepts of "missingness" New to Julia	13	602	February 10, 2021
How to convert all nothings in dataframe to missing? General Usage question	14	1570	January 26, 2021
DataFrame can't get filter to get rid of rows where a column contains a blank " " New to Julia dataframes	8	1078	February 19, 2022
Which should I use, nothing or NA for DataFrames? New to Julia	4	1021	December 30, 2018

How to replace 'Nothing' data type in an array

Related topics