On iterating through the columns of a data frame and checking whether they contain String values

abianco · April 23, 2020, 4:11pm

I am still new to Julia and I am a bit confused about how to loop through the columns of a data frame and check their types. I have looked through the documentation but couldn’t find a clear answer yet, so I am posting this here.

For the sake of discussion, let’s say I want to check the type of each column in a data frame and perform an operation on those that are not String (or, more precisely, on those whose elements are not of type String). Would the following snippet be correct (in Julia v1.4.1 with DataFrames v0.20.2)?

```julia
using DataFrames

df = DataFrame(text = ["a", "b", "c"], num1 = [1, 3, 5], num2 = [2.0, 4.0, 6.0])

for nm in names(df)
    if eltype(df[!, nm]) != String
        df[!, nm] = df[!, nm]./100
    end
end
```
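To see exactly what the `eltype` test in this loop compares against, one can list the element type of every column. A minimal sketch, assuming a recent DataFrames release; `propertynames` is used here so that the column names come back as `Symbol`s, and `coltypes` is just an illustrative variable name:

```julia
using DataFrames

df = DataFrame(text = ["a", "b", "c"], num1 = [1, 3, 5], num2 = [2.0, 4.0, 6.0])

# Pair each column name with the element type of that column's vector
coltypes = [nm => eltype(df[!, nm]) for nm in propertynames(df)]
# contains :text => String, :num1 => Int64, :num2 => Float64
# (Int is Int64 on a 64-bit machine)
```

For this data frame the loop's `!= String` test is true for `num1` and `num2` only, so only those two columns get divided by 100.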

What would be the best practice to accomplish this?

Incidentally, and I beg your pardon in advance for asking this in the wrong category: in VS Code v1.44.2, the word “df” in names(df), eltype(df[!, nm]) and df[!, nm]./100 is underlined with a yellow/orange wavy line (the editor reports “Missing reference: df Julia(Julia)”), and a wavy reddish line runs from String to the equals sign after df[!, nm] on the next line (with the message “Parsing error Julia(Julia)”). Why is that? Are these warnings or errors? The code runs just fine, without any message from Julia…

```
versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

oliver · April 28, 2020, 6:17pm

I think you’ve already got the key best practice: writing the transformed data back into the data frame in place.
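One caveat with the `!= String` test, offered as a hedged suggestion rather than something stated in the thread: it lets through every column whose element type is not exactly `String`, including a text column that contains `missing` (element type `Union{Missing, String}`), and `./ 100` on such a column throws a `MethodError`. A sketch that selects columns with a positive `<: Number` test instead, using Base's `nonmissingtype` (available since Julia 1.3) and a hypothetical data frame with a missing entry:

```julia
using DataFrames

# Hypothetical example data: the text column contains a missing value, so its
# eltype is Union{Missing, String} rather than String
df = DataFrame(text = ["a", missing, "c"], num1 = [1, 3, 5], num2 = [2.0, 4.0, 6.0])

for nm in propertynames(df)
    col = df[!, nm]
    # nonmissingtype strips Missing from the element type, so a column of
    # Union{Missing, Int} also counts as numeric; text columns are skipped
    if nonmissingtype(eltype(col)) <: Number
        # `df[!, nm] = ...` replaces the whole column, so the Int column
        # num1 becomes a Float64 column here
        df[!, nm] = col ./ 100
    end
end
```

With the original `!= String` test, the `text` column above would be selected and the division would fail on `"a"`.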

In VS Code, all you may need to do is install the Julia Language Support extension.

tbeason · April 28, 2020, 6:32pm

Yep, that’s it. There is an iterator, eachcol, and also a function, mapcols, that can help you do this. Unfortunately, there is no mapnumericcols, so you still need to check column types as in your example (so it isn’t really any easier with those functions).
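A minimal sketch of both helpers, assuming DataFrames 1.x semantics: `mapcols` returns a new data frame, while iterating `eachcol` yields the column vectors themselves. As tbeason says, the column-type check is still written by hand; the helper name `scale` is purely illustrative.

```julia
using DataFrames

df = DataFrame(text = ["a", "b", "c"], num1 = [1, 3, 5], num2 = [2.0, 4.0, 6.0])

# mapcols applies the function to every column and returns a NEW data frame;
# non-numeric columns are passed through unchanged
scale(col) = eltype(col) <: Number ? col ./ 100 : col
df2 = mapcols(scale, df)

# eachcol yields the stored column vectors, so `./=` mutates them in place.
# That only works when the result fits the existing element type, so this
# loop touches floating-point columns only (an Int column cannot store 0.01
# and would throw an InexactError)
for col in eachcol(df)
    if eltype(col) <: AbstractFloat
        col ./= 100
    end
end
```

The trade-off: `mapcols` (like `df[!, nm] = ...`) may change a column's type, whereas a true in-place update through `eachcol` keeps the original vectors and their element types.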

abianco · April 28, 2020, 8:21pm

Thank you @oliver and @tbeason so much for your answers. I assume I can/should close this post then (how?).

tbeason · April 28, 2020, 8:49pm

You can just mark one of our replies as the “Solution”.
