Conditional selection of columns in for loop

Hey guys,

Just another beginner question; I think there is an easy way of doing this:
I have a DataFrame with 50 columns that each contain 10 million rows of sensor values. I want to resample those, but want to apply different resampling algorithms to some of those columns. Since the column names are the names of the sensors, I was planning to do this:

for i in df.names
    if occursin("Temp", i) == true
        # …do this
    end
end

But I found out that df.names returns elements of type Symbol, which, as far as I understand, can't be compared with a string. Any help would be greatly appreciated. If you have advice on how to do this differently and with better performance, I am very happy for suggestions!
Thanks in advance.

EDIT: Just found out that I can convert the column names vector to a string vector by doing:
colnames_string = [string(col) for col in colnames_symbol]

Still if there is a better and faster way, I would appreciate hints!

Best
Merit

You could do the string conversion directly in the comparison and save yourself the extra allocation, because [...] actually allocates a new array. Depending on how often you have to iterate over your DataFrame, and at which level you do the allocation, this might help.

Also, in my Julia version (v1.2) I access the names via names(df); see also the documentation.

So this worked for me:

for i in names(df)
    if occursin("Temp", string(i)) == true
        @show i
    end
end

Of course, this does not answer the question about a faster alternative :wink:
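
A minimal sketch of the same idea without a DataFrame, in case it helps: a plain vector of Symbols stands in for the column names (the names below are made up for illustration). The conversion happens inside the predicate, so no separate vector of strings is built first.

```julia
# Hypothetical column names, standing in for what names(df) returned as Symbols.
colnames = [:Temp_in, :Pressure, :Temp_out, :Flow]

# Convert each Symbol to a String only for the comparison itself.
temp_cols = filter(c -> occursin("Temp", string(c)), colnames)

for c in temp_cols
    println(c)
end
```

Each call to string(c) still creates a small temporary string, so this mainly avoids the intermediate array rather than all allocation.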

You can also apply “string” with dot notation, which is very readable:

for i in string.(names(df))
    if occursin("Temp", i) == true
        # …do this
    end
end

Great, that worked perfectly! Thanks a lot!

In such a case I would normally write:

for col in eachcol(df[!, r"Temp"])
    # do something with col
end
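
If the column name is also needed inside the loop (for example to pick a resampling method per sensor), a sketch along these lines might work. It assumes a recent DataFrames.jl (1.x), where names accepts a regex and returns strings; the table and its column names are hypothetical.

```julia
using DataFrames  # assumption: DataFrames.jl 1.x is installed

# Hypothetical sensor table; names and values are made up for illustration.
df = DataFrame(Temp_in = [20.1, 20.4], Pressure = [1.01, 1.02], Temp_out = [35.0, 35.2])

# Select the names of all columns whose name matches the regex.
temp_names = names(df, r"Temp")

for name in temp_names
    col = df[!, name]  # the stored column vector, not a copy
    println(name, " => ", length(col), " values")
end
```

On older versions, where names(df) returns Symbols, the string conversion shown earlier in the thread would still be needed.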
