How to replace 'Nothing' data type in an array

Hi everyone,

I’m currently trying to untangle a pretty messy multi-nested json file by flattening them out manually and curating them into a dataframe (mostly following the method described here: https://julia.school/julia/json/).

Some of the sub-nested elements are giving me some issues - I ended up with arrays like below, let’s say named ‘nest1_array’:

 Dict{String, Any}("sub" => "K2")
 Dict{String, Any}("sub" => "A5")
 Dict{String, Any}("sub" => "G1")
 Dict{String, Any}("sub" => "G1")
 Dict{String, Any}("sub" => "B1")
 nothing
 Dict{String, Any}("sub" => "GD2")
 nothing
 nothing
 nothing
 Dict{String, Any}("sub" => "A6")
 Dict{String, Any}("sub" => "C1")

Now my goal is do below to isolate ‘sub’ elements from above into its own array and push it into a manually curated dataframe, via:

clean_array=Any[]

for i in eachindex(nest1_array)
       push!(clean_array, nest1_array[i]["sub"])
end

curated_df[!, :sub]=clean_array

Unfortunately, trying to isolate ‘sub’ entries from the nest1_array does not work, giving me below error message:

ERROR: MethodError: no method matching getindex(::Nothing, ::String)

Now, I can vaguely understand the error is because ‘nothing’ is its own data type, and indexing it along with the rest of the strings might be problematic (please do feel free to point out if this is wrong, or if there are better elaborations for the beginner!).

But what would be a solution to this issue? Is there a preexisting method or function that can be used to convert ‘::Nothing’ data rows into a text, like ‘N/A’?

So far only solution I found was to export the array to a text file and just use shell utilities + regex to clean the output, but I would far, far prefer a pure Julia solution. Especially since I have dozens more of these nested columns to go through…

As usual, any advice would be appreciated. Thank you!

The most straightforward way is to use an if statement:

for i in eachindex(nest1_array)
       if isnothing(nest1_array[i])
           push!(clean_array, "N/A")
       else
           push!(clean_array, nest1_array[i]["sub"])
       end
end

You can also use replace:

nest1_array_replaced = replace(nest1_array, nothing => Dict("sub" => "N/A")
curated_df[!, :sub] = [x["sub"] for x in nest1_array_replaced]
1 Like

The if solution by @Satvik is closest to your original code, but a oneliner alternative to the replace solution could be:

curated_df[!, :sub] = [isnothing(x) ? "N/A" : x["sub"] for x in nest1_array]
1 Like

Hi,

Thank you for responding.

I’m a bit befuddled right now - the control flow + ‘isnothing’ based method you described sometimes works and sometimes doesn’t.

For example, here’s a version of your script I executed on a JSON-to-dictionary array (generated by JSON.jl).

for i in eachindex(results_col)
        if isnothing(results_col[i])
                push!(enriched_sample_array, "N/A")
        else
                push!(enriched_sample_array, results_col[i]["enriched_sample"])
        end
end

One would assume ‘nothing’ entries (not present in original data, but present in the JSON.jl generated array) should be replaced with “N/A” - but it’s not the case. Here’s the output of the array after processing via above method:

      nothing
      nothing
      nothing
      nothing
      nothing
      nothing
 false
  true
  true
 false
 false
  true
     ⋮
  true
  true
  true
  true
  true
  true
      nothing
  true
 false
 false
      nothing

And so and so forth for a few of thousand rows.

Which is a problem, since the ‘nothing’ on the column is probably a type of data, not just string, as this error message suggests when I try to export the final dataframe as an external file.

ERROR: LoadError: A `nothing` value was found in column 9 and it is not a printable value.
There are several ways to handle this situation:
1) fix the data, perhaps replace `nothing` with `missing`,
2) use `transform` option with a function to replace `nothing` with whatever value (including `missing`), like `CSV.write(...; transform=(col, val) -> something(val, missing))` or
3) use `TableOperations.transform` option to transform specific columns

Granted, I was able to get around above error by using the transform option as suggested by the error message (like this:)

CSV.write("dataframe.tsv", df; delim='\t', transform=(col,val)->something(val,"N/A"))

But I’m a little bothered by the fact that there’s still ‘Nothing’ data on some of the columns even after initial loop, and even after the CSV.write with transform I see blank spaces on some of the columns rather than N/A.

So - I feel like there’s a weird predictability issue I’m fighting with. I’m sure I’ve done something wrong, but Julia script used to do what I’ve done are so simple I’m not sure what it could be.

Any ideas would be appreciated!

This looks great!

I think I can generally understand what’s going on, but would you mind adding a light explanation on what each of the symbols and commands do, just to make sure?

Looking at

curated_df[!, :sub] = [isnothing(x) ? "N/A" : x["sub"] for x in nest1_array]

The left hand side of the = is the same as what you have already used, so hopefully you understand it already, it is just a way of assigning values to a full column of the Dataframe.

The right hand side is what is called an “array comprehension”, it is a way of applying a function to each member of an array, so this would be equivalent to

function f(x)
    return isnothing(x) ? "N/A" : x["sub"]
end

curated_df[!, :sub] = [f(x) for x in nest1_array]

The ? : combination is an operator known from C and many other languages, it is just a shorthand for:

function f(x)
    if isnothing(x)
        return "N/A"
    else
        return x["sub"]
    end
end

curated_df[!, :sub] = [f(x) for x in nest1_array]

Unlike some other languages however, Julia requires whitespace around both the ? and the :

1 Like

Ah, I understand now. Thank you for the very thorough explanation!

1 Like