So for either column, I would like to get the maximal ranges of contiguous valid data, e.g. rows 6 to 10 (as indexes or the actual rows) for column 1.
Or, for the missing values, rows 2 to 6 and 8 to 11 for column 2.
I started to think about running through the lines etc. but I guess that there’s a more native way to do that.
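using DataFrames  # provides mapcols

# Length of the longest run of consecutive entries that are neither missing nor NaN.
function max_valid_run(v)
    max_len = 0
    curr_len = 0
    for x in v
        # Test for missing first: isnan(missing) returns missing, not a Bool.
        if x !== missing && !isnan(x)
            curr_len += 1
            max_len = max(max_len, curr_len)
        else
            curr_len = 0  # an invalid entry ends the current run
        end
    end
    return max_len
end

# Apply to every column of df; the result is a one-row DataFrame.
mapcols(max_valid_run, df)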
There is probably a more efficient way of checking the columns, but some care is needed: isnan(missing) returns missing rather than false, so writing the test the other way round, !isnan(x) && x !== missing, would throw a TypeError whenever x is missing.
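The function above only reports the length of the longest run, while the question asks for the row ranges themselves. Here is a minimal sketch of how the same loop could collect the ranges instead. The names `isvalid_value`, `valid_runs` and `longest_valid_run` are made up for illustration, and it assumes an ordinary 1-based vector of numbers and/or `missing`:

```julia
# Hypothetical helpers, not from the original post.
# An entry counts as valid when it is neither missing nor NaN.
isvalid_value(x) = !ismissing(x) && !isnan(x)

# Collect every maximal run of valid entries as an index range.
function valid_runs(v)
    runs = UnitRange{Int}[]
    start = 0  # 0 means "not currently inside a run"
    for (i, x) in enumerate(v)
        if isvalid_value(x)
            start == 0 && (start = i)  # a new run begins here
        elseif start != 0
            push!(runs, start:i-1)     # an invalid entry closes the run
            start = 0
        end
    end
    start != 0 && push!(runs, start:length(v))  # run reaching the end
    return runs
end

# Longest run, or nothing if the vector has no valid entries.
function longest_valid_run(v)
    runs = valid_runs(v)
    return isempty(runs) ? nothing : runs[argmax(length.(runs))]
end

v = [missing, 1.0, 2.0, NaN, 3.0, 4.0, 5.0, missing]
valid_runs(v)         # → [2:3, 5:7]
longest_valid_run(v)  # → 5:7
```

Negating the predicate (a run of entries where `isvalid_value` is false) should give the ranges of missing data in the same way, and `v[longest_valid_run(v)]` returns the actual values rather than the indexes.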
(1 ∉ ix) reads pretty nicely and is by far my preferred syntax for this, but note that you can instead write it as !in(1, ix) if you prefer or need to avoid non-ASCII characters in code.
Any Unicode operator or function name in the Julia base language will always have an equivalent pure-ASCII version that works the same. So the mathematical syntax is really nice to have, but always optional.
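As a quick check of the equivalence, here is a small sketch with a made-up `ix`; all three spellings should give the same result:

```julia
ix = [2, 3, 5]     # example index collection, chosen for illustration

a = 1 ∉ ix         # Unicode operator (typed as \notin<TAB> in the REPL)
b = !in(1, ix)     # ASCII function-call form
c = !(1 in ix)     # ASCII infix form

a == b == c        # → true
```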