I have a database, Microsoft Access (Yes I know but I donβt know SQL well enough yet) format, of woody plant stem growth. There are 5 plants with 5 shoots each measured for a total of 25. The measurement of the same shoot was repeated every 4 days for 7 periods. These shoots were measured starting before the bud broke so there are string indicators in the database to signify the stage the bud was at the time of measuring. BB=Bud Break, BS=Bud Swell, LO=Leaf out and N/A means the shoot was missing for what ever reason (usually eaten). These codes are there to signify what stage of growth the bud is in and will be used in later analysis so I cannot just change them to 0 in the database.
When I use ODBC.jl to import the dataset by querying the database with an SQL query, It returns with the length measured in a string format, and places it into a DataFrame as a string. Not surprising given all those LO and BB and the BS but for the the purposes of this program, each one of them is a zero.
In order to convert the LOβs, BBβs, BSβs and N/Aβs to zero I have used this block of code which seems terribly inelegent and redundant but I cannot get the β|β or β||β operator to work. If anyone has a better more elegant way please share.
replace!(df.Length,"lo"=>"0")
replace!(df.Length,"bb"=>"0")
replace!(df.Length,"bs"=>"0")
replace!(df.Length,"n/a"=>"0")
However, as it want to happen sometimes the shoot didnβt get measured. It either got knocked off, or was not visible to or was just plain missed that trip. And this is the rub.
I want to convert the string values in the DataFrame column to Float64 but the missings that are there throws Julia off which causes her to throw an error. I want to maintain the missings in the Length column so I can use regression to estimate what that length would have been has the measurement actually been and replace the missing with a calculated value. I cannot figure out how to do the conversion from String to Float64 while maintaining the βmissingsβ
Thank you
Mike Sergeant
You should not get an error with relplace!
here is an example if I understand your data correctly:
julia> x = [1.5, missing, "lo", "bb", "bs", "n/a"]
6-element Vector{Any}:
1.5
missing
"lo"
"bb"
"bs"
"n/a"
julia> replace!(x, "lo" => 0.0, "bb" => 0.0, "bs" => 0.0, "n/a" => 0.0)
6-element Vector{Any}:
1.5
missing
0.0
0.0
0.0
0.0
an alternative way to do it would be e.g.:
julia> map(v -> v in Set(["lo", "bb", "bs", "n/a"]) ? 0.0 : v, x)
6-element Vector{Union{Missing, Float64}}:
1.5
missing
0.0
0.0
0.0
0.0
which creates a new vector with a narrowed down element type.
If you are using parse
at any point, you can also benefit from passmissing
. The syntax would be passmissing(parse)(Float64, x)
. This will propagate missing
if x
is missing
. Itβs in Missings.jl.
Oh, I did not see the passmissing command in the documents. Thank you.
Correct me if I am wrong but wasnβt Missings.jl incorporated into the base Julia?
Iβm not getting an error. It works just seemed redundant and rather inelegant. The data actually looks like this:
Row β Farm Variety Period Date Plant Shoot Length
β String? String? Int16? DateTimeβ¦? Int8 Int16? String?
βββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β Talsma Aurora 1 2019-05-15T00:00:00 1 1 7.01
2 β Talsma Aurora 1 2019-05-15T00:00:00 1 2 8.59
3 β Talsma Aurora 1 2019-05-15T00:00:00 1 3 LO
4 β Talsma Aurora 1 2019-05-15T00:00:00 1 4 LO
No, the definition of missing
was, as well as a few helper functions like skipmissing
, but most of the helper functions remained in Missings.jl
.
Oh, thank you for clearing that up.
When I try various interations of passmissing(parse)(Float64,df) in the REPL, I get the following error.
MethodError: no method matching parse(::Type{Float64}, ::DataFrame)
Closest candidates are:
parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at parse.jl:379
Stacktrace:
[1] macro expansion
@ ~\.julia\packages\Missings\hn4Ye\src\Missings.jl:0 [inlined]
[2] (::Missings.PassMissing{typeof(parse)})(::Type{Float64}, ::DataFrame)
@ Missings ~\.julia\packages\Missings\hn4Ye\src\Missings.jl:195
[3] top-level scope
@ REPL[37]:1
I have tried
parse.(Float64,passmissing(df[!,:Length]))
passmissing(parse.(Float64,df[!,:Length]))
parse(Float64,passmissing(df.Length))
passmissing(parse(Float64,df.Length))
All of whom gave a long error stack
Closest candidates are:
parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at parse.jl:379
Stacktrace:
[1] _broadcast_getindex_evalf
@ .\broadcast.jl:648 [inlined]
[2] _broadcast_getindex
@ .\broadcast.jl:631 [inlined]
[3] getindex
@ .\broadcast.jl:575 [inlined]
[4] macro expansion
@ .\broadcast.jl:984 [inlined]
[5] macro expansion
@ .\simdloop.jl:77 [inlined]
[6] copyto!
@ .\broadcast.jl:983 [inlined]
[7] copyto!
@ .\broadcast.jl:936 [inlined]
[8] copy
@ .\broadcast.jl:908 [inlined]
[9] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(parse), Tuple{Base.RefValue{Type{Float64}}, Vector{Union{Missing, String}}}})
@ Base.Broadcast .\broadcast.jl:883
[10] top-level scope
@ REPL[40]:1
Note quite! passmissing
actually wraps functions.
julia> using Missings;
julia> x = ["1.09", "4.6", missing];
julia> passmissing(parse).(Float64, x)
3-element Vector{Union{Missing, Float64}}:
1.09
4.6
missing
1 Like
Ooooooohh there is the dot operator in there. Missed that also.
Works fantastic! Thank you!