Is there any way to inspect a DataFrame to see why this error is returned when running a function on it? The error message is too vague. I got this error when running:
unique!(df,[col1,col2,col3])
I’ve only received this error on 1 dataset, and I clean out missing values beforehand and don’t use undef.
StackTrace
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex
@ .\essentials.jl:917 [inlined]
[2] _broadcast_getindex
@ .\broadcast.jl:644 [inlined]
[3] _getindex
@ .\broadcast.jl:674 [inlined]
[4] _broadcast_getindex
@ .\broadcast.jl:650 [inlined]
[5] getindex
@ .\broadcast.jl:610 [inlined]
[6] copyto_widen!(res::Vector{…}, bc::Base.Broadcast.Broadcasted{…}, pos::Int64, col::Int64)
@ DataFrames C:\Users\programmer8\.julia\packages\DataFrames\kcA9R\src\other\broadcasting.jl:27
[7] copy(bc::Base.Broadcast.Broadcasted{DataFrames.DataFrameStyle, Tuple{…}, typeof(coalesce), Tuple{…}})
@ DataFrames C:\Users\programmer8\.julia\packages\DataFrames\kcA9R\src\other\broadcasting.jl:77
[8] materialize(bc::Base.Broadcast.Broadcasted{DataFrames.DataFrameStyle, Nothing, typeof(coalesce), Tuple{…}})
@ Base.Broadcast .\broadcast.jl:872
[9] top-level scope
@ c:\data\process.jl:111
Some type information was truncated. Use `show(err)` to see complete types.
If I guessed right, it’ll show for each column whether its elements can have a reference and if so, whether they are all assigned. If you see 0, 0 then that’s a column that can throw the error. If that’s the case, you’ll have to reevaluate how you’re instantiating the DataFrame, and you’ll also have to consider the columns that can’t have references because those elements just silently hold garbage values if not assigned.
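A minimal sketch of such a per-column check, in case it helps. The helper name column_assignment_report is made up for illustration, and it runs on plain vectors here so it needs no packages; with a DataFrame you would presumably pass eachcol(df) instead. For each column it reports whether the element type is a bits type (so it cannot hold references) and whether every index is assigned.

```julia
# Hypothetical helper (not part of DataFrames): for each column, return
# (isbitstype(eltype(col)), all indices assigned).
function column_assignment_report(cols)
    return [(isbitstype(eltype(col)),
             all(i -> isassigned(col, i), eachindex(col))) for col in cols]
end

a = [1, 2, 3]                   # bits type: elements are stored inline, no references
b = Vector{String}(undef, 3)    # reference type: slots start out unassigned
b[1] = "x"                      # only partially assigned
c = ["p", "q", "r"]             # reference type, fully assigned

# A tuple is used so the columns keep their own element types.
report = column_assignment_report((a, b, c))
println(report)
```

A (false, false) entry, like column b here, is the kind of column that can throw UndefRefError on access.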
You’ll need to explain the context, paste exact code, and paste the full stack trace. I could imagine it happening with eachindex(::DataFrame), but that’s not what is being called here. It’s evidently happening but shouldn’t.
Yeah that could be a problem, hence the recommendations to paste the code and stacktrace. If you can’t reproduce the DataFrame object directly, could you at least show typeof(df), eltype.(eachcol(df)) as well?
This is not how you do it. You want to use isassigned(col, idx) to check whether a given index in a column is assigned before trying to access it (accessing it is what ismissing.(col) would do).
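To illustrate the difference, here is a small sketch on a plain vector (the column contents are made up). isassigned only inspects the slot, while broadcasting ismissing has to read every element and so hits the undefined reference.

```julia
# A column that can hold references, with two slots left unassigned.
col = Vector{Union{Missing,String}}(undef, 4)
col[1] = "a"
col[3] = missing

# Safe: collect the indices that were never assigned.
unassigned = [i for i in eachindex(col) if !isassigned(col, i)]

# Unsafe: ismissing.(col) reads each element and throws on an unassigned one.
threw = try
    ismissing.(col)
    false
catch e
    e isa UndefRefError
end

println(unassigned)
println(threw)
```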
I defined a String Matrix with some undefined values and then tried to build a DataFrame, but the constructor fails with an undefined reference error.
I wonder whether, and how, it is possible to have an undefined value inside a table in DataFrames?
Not really. In general it’s not a good idea to keep undefined entries in Julia arrays. Better use missing or nothing for entries where there’s no value.
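As a rough sketch of that advice (the variable names and the "Tom" value are only for illustration), a column can be initialised with missing instead of undef, so every slot is assigned from the start and coalesce works:

```julia
# Initialise with missing rather than undef: every slot holds a defined value.
labels = Vector{Union{Missing,String}}(missing, 3)
labels[2] = "Tom"

# Every index is assigned, so reading elements cannot throw UndefRefError.
all_assigned = all(i -> isassigned(labels, i), eachindex(labels))

# coalesce replaces the missing entries with a default value.
filled = coalesce.(labels, "")

println(all_assigned)
println(filled)
```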
I need to revisit that dataset from when I originally raised the question. I got this same error on a dataset yesterday. I had to write a quick function to drill down using a try-catch. Yesterday, the error was from having double quotes inside of a double quoted csv field. The error comes when trying to coalesce() after the data was read in with CSV.jl
"John "Jonathan" Smith","","",""
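using DataFrames

# Scan every cell and record the (row, col) positions whose access throws
# an UndefRefError, i.e. cells that hold an undefined reference.
function find_undef(df::DataFrame)
    undef_indices = Tuple{Int,Int}[]
    for col in 1:ncol(df)
        for row in 1:nrow(df)
            try
                df[row, col]  # throws UndefRefError if the cell is unassigned
            catch e
                e isa UndefRefError || rethrow()
                println("Error at ", row, ",", col)
                push!(undef_indices, (row, col))
            end
        end
    end
    return undef_indices
end

undef_positions = find_undef(nonm)
println(undef_positions)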
If you put the data below in a CSV file and then try to read it in, it gives the error: Cannot `convert` an object of type Missing to an object of type String. Removing types=String truncates the cell and places only Tom in row 1, col 1.