Convert String to Number while maintaining 'missing's

I have a database, Microsoft Access (Yes I know but I don’t know SQL well enough yet) format, of woody plant stem growth. There are 5 plants with 5 shoots each measured for a total of 25. The measurement of the same shoot was repeated every 4 days for 7 periods. These shoots were measured starting before the bud broke so there are string indicators in the database to signify the stage the bud was at the time of measuring. BB=Bud Break, BS=Bud Swell, LO=Leaf out and N/A means the shoot was missing for what ever reason (usually eaten). These codes are there to signify what stage of growth the bud is in and will be used in later analysis so I cannot just change them to 0 in the database.

When I use ODBC.jl to import the dataset by querying the database with an SQL query, It returns with the length measured in a string format, and places it into a DataFrame as a string. Not surprising given all those LO and BB and the BS but for the the purposes of this program, each one of them is a zero.

In order to convert the LO’s, BB’s, BS’s and N/A’s to zero I have used this block of code which seems terribly inelegent and redundant but I cannot get the β€˜|’ or β€˜||’ operator to work. If anyone has a better more elegant way please share.

replace!(df.Length,"lo"=>"0")
replace!(df.Length,"bb"=>"0")
replace!(df.Length,"bs"=>"0")
replace!(df.Length,"n/a"=>"0")

However, as it want to happen sometimes the shoot didn’t get measured. It either got knocked off, or was not visible to or was just plain missed that trip. And this is the rub.

I want to convert the string values in the DataFrame column to Float64 but the missings that are there throws Julia off which causes her to throw an error. I want to maintain the missings in the Length column so I can use regression to estimate what that length would have been has the measurement actually been and replace the missing with a calculated value. I cannot figure out how to do the conversion from String to Float64 while maintaining the β€œmissings”

Thank you
Mike Sergeant

You should not get an error with relplace! here is an example if I understand your data correctly:

julia> x = [1.5, missing, "lo", "bb", "bs", "n/a"]
6-element Vector{Any}:
 1.5
  missing
  "lo"
  "bb"
  "bs"
  "n/a"

julia> replace!(x, "lo" => 0.0, "bb" => 0.0, "bs" => 0.0, "n/a" => 0.0)
6-element Vector{Any}:
 1.5
  missing
 0.0
 0.0
 0.0
 0.0

an alternative way to do it would be e.g.:

julia> map(v -> v in Set(["lo", "bb", "bs", "n/a"]) ? 0.0 : v, x)
6-element Vector{Union{Missing, Float64}}:
 1.5
  missing
 0.0
 0.0
 0.0
 0.0

which creates a new vector with a narrowed down element type.

If you are using parse at any point, you can also benefit from passmissing. The syntax would be passmissing(parse)(Float64, x). This will propagate missing if x is missing. It’s in Missings.jl.

Oh, I did not see the passmissing command in the documents. Thank you.

Correct me if I am wrong but wasn’t Missings.jl incorporated into the base Julia?

I’m not getting an error. It works just seemed redundant and rather inelegant. The data actually looks like this:

 Row β”‚ Farm                   Variety  Period  Date                 Plant  Shoot   Length  
      β”‚ String?                String?  Int16?  DateTime…?           Int8   Int16?  String? 
──────┼─────────────────────────────────────────────────────────────────────────────────────
    1 β”‚ Talsma                 Aurora        1  2019-05-15T00:00:00      1       1  7.01
    2 β”‚ Talsma                 Aurora        1  2019-05-15T00:00:00      1       2  8.59
    3 β”‚ Talsma                 Aurora        1  2019-05-15T00:00:00      1       3  LO
    4 β”‚ Talsma                 Aurora        1  2019-05-15T00:00:00      1       4  LO

No, the definition of missing was, as well as a few helper functions like skipmissing, but most of the helper functions remained in Missings.jl.

Oh, thank you for clearing that up.

When I try various interations of passmissing(parse)(Float64,df) in the REPL, I get the following error.

 MethodError: no method matching parse(::Type{Float64}, ::DataFrame)
Closest candidates are:
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at parse.jl:379
Stacktrace:
 [1] macro expansion
   @ ~\.julia\packages\Missings\hn4Ye\src\Missings.jl:0 [inlined]
 [2] (::Missings.PassMissing{typeof(parse)})(::Type{Float64}, ::DataFrame)
   @ Missings ~\.julia\packages\Missings\hn4Ye\src\Missings.jl:195
 [3] top-level scope
   @ REPL[37]:1

I have tried

parse.(Float64,passmissing(df[!,:Length]))
passmissing(parse.(Float64,df[!,:Length]))
parse(Float64,passmissing(df.Length))
passmissing(parse(Float64,df.Length))

All of whom gave a long error stack

Closest candidates are:
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at parse.jl:379
Stacktrace:
  [1] _broadcast_getindex_evalf
    @ .\broadcast.jl:648 [inlined]
  [2] _broadcast_getindex
    @ .\broadcast.jl:631 [inlined]
  [3] getindex
    @ .\broadcast.jl:575 [inlined]
  [4] macro expansion
    @ .\broadcast.jl:984 [inlined]
  [5] macro expansion
    @ .\simdloop.jl:77 [inlined]
  [6] copyto!
    @ .\broadcast.jl:983 [inlined]
  [7] copyto!
    @ .\broadcast.jl:936 [inlined]
  [8] copy
    @ .\broadcast.jl:908 [inlined]
  [9] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(parse), Tuple{Base.RefValue{Type{Float64}}, Vector{Union{Missing, String}}}})
    @ Base.Broadcast .\broadcast.jl:883
 [10] top-level scope
    @ REPL[40]:1

Note quite! passmissing actually wraps functions.

julia> using Missings;

julia> x = ["1.09", "4.6", missing];

julia> passmissing(parse).(Float64, x)
3-element Vector{Union{Missing, Float64}}:
 1.09
 4.6
  missing
1 Like

Ooooooohh there is the dot operator in there. Missed that also.

Works fantastic! Thank you!