Iterating through the columns of a data frame and checking whether they contain String values

I am still new to Julia and I am a bit confused about how to loop through the columns of a data frame and check their types. I have looked through the documentation and couldn’t find any clarification, so I am posting this here.

For the sake of discussion, let’s say I want to check the type of each column in a data frame and perform an operation on those that are not String (perhaps I should say that don’t contain elements of type String). Would the following snippet be correct (in Julia v1.4.1 with DataFrames v0.20.2)?

using DataFrames

df = DataFrame(text = ["a", "b", "c"], num1 = [1, 3, 5], num2 = [2.0, 4.0, 6.0])

for nm in names(df)
    if eltype(df[!, nm]) != String
        df[!, nm] = df[!, nm]./100
    end
end

What would be the best practice to accomplish this?

Incidentally, and I apologize in advance if this is the wrong category: in VS Code v1.44.2, the word “df” in names(df), eltype(df[!, nm]) and df[!, nm]./100 has a yellow/orange wavy underline (the editor reports “Missing reference: df Julia(Julia)”). There is also a reddish wavy underline running from String to the equals sign after df[!, nm] on the next line (here the message is “Parsing error Julia(Julia)”). Why is that? Are these warnings or errors? The code runs just fine without any message from Julia though…

versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

I think you’ve already got the key best practice: writing the data in place.

In VS Code, all you may need to do is install the Julia Language Support extension.

Yep that’s it. There is an iterator eachcol and also a function mapcols that can help you do this. Unfortunately, there is no mapnumericcols, so you always need to check column types as in your example (so it basically isn’t really any easier with those functions).
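For reference, here is a minimal sketch of both approaches, reusing the example data from the question. The `nonmissingtype` check is an addition of mine, not part of the original snippet: it is there so that columns allowing `missing` (element type `Union{Missing, Int}`, say) would still count as numeric. Note that an `Int` column cannot be divided in place with `.=`, because the results are not integers, so the loop replaces the column instead. `mapcols` returns a new data frame rather than modifying the original.

```julia
using DataFrames

df = DataFrame(text = ["a", "b", "c"], num1 = [1, 3, 5], num2 = [2.0, 4.0, 6.0])

# Helper (hypothetical name): true when the column holds numbers,
# ignoring a possible Missing in the element type.
isnumericcol(col) = nonmissingtype(eltype(col)) <: Number

# Option 1: build a new data frame with mapcols, leaving df untouched.
df_scaled = mapcols(col -> isnumericcol(col) ? col ./ 100 : col, df)

# Option 2: loop over the column names and replace numeric columns in df.
for nm in names(df)
    col = df[!, nm]
    if isnumericcol(col)
        # Replace the column: an Int column becomes Float64 here.
        df[!, nm] = col ./ 100
    end
end
```

Testing for `<: Number` rather than `!= String` is a design choice: it also skips other non-numeric columns (dates, symbols, and so on), which a plain "not String" test would try to divide.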

Thank you @oliver and @tbeason so much for your answers. I assume I can/should close this post then (how?).

You can just mark one of our replies as the “Solution”.