Print dataframe column names from within a function

I have a simple question about printing dataframe names. I apologize if this has already been asked - I searched but could not find an answer.

I sometimes work with DataFrames of 20+ columns, and I cannot remember the names of all the columns. In addition, I like to subset DataFrames for analysis. Thus, I like to print each column name with a leading colon and a trailing comma, so that I can easily copy and paste the names to subset a DataFrame.

Currently, I use a comprehension for this purpose. But for ease of use, I would like to print from within a function. And when I try to print from within a function, I get an extra array of “Nothing” data that fills up the REPL. Is there a way to suppress the extra “Nothing” data from printing within a function?

Here is a MWE:

# generate 20-column dataframe
using DataFrames
df = DataFrame(col1 = rand(2), col2 = rand(2), col3 = rand(2), col4 = rand(2), col5 = rand(2), col6 = rand(2), col7 = rand(2), col8 = rand(2), col9 = rand(2), col10 = rand(2), col11 = rand(2), col12 = rand(2), col13 = rand(2), col14 = rand(2), col15 = rand(2), col16 = rand(2), col17 = rand(2), col18 = rand(2), col19 = rand(2), col20 = rand(2));

# Method 1 - print names without semicolon (produces printed names and 20-element Array{Nothing, 1})
[print(string(":", x, ", ")) for x in names(df)]

# Method 2 - print names with semicolon (produces printed names only)
[print(string(":", x, ", ")) for x in names(df)];

# Method 3 - use function (produces printed names and 20-element Array{Nothing, 1})
function dfnames(df::DataFrame)
    [print(string(":", x, ", ")) for x in names(df)];
end

dfnames(df)

Thanks for any thoughts.

The print function returns nothing, since all it does is print something and not actually return an object.

You are using an array comprehension ([x for x in items]) instead of a for loop. It sounds like you want

function dfnames(df::DataFrame)
   for name in names(df)
       print(":", name, ", ")
   end
end

You could use join as in

julia> df = DataFrame(a = ones(3), b = string.('a':'c'), c = rand(3)) 
3×3 DataFrame
│ Row │ a       │ b      │ c        │
│     │ Float64 │ String │ Float64  │
├─────┼─────────┼────────┼──────────┤
│ 1   │ 1.0     │ a      │ 0.849903 │
│ 2   │ 1.0     │ b      │ 0.61542  │
│ 3   │ 1.0     │ c      │ 0.663782 │

julia> join(string.(':', names(df)), ", ")
":a, :b, :c"
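
If it helps, the join approach can be wrapped in a small helper. This is only a sketch: the name colnames_literal is made up for illustration, and it accepts any collection of column names so that it runs without DataFrames loaded. With a DataFrame you would pass names(df).

```julia
# Hypothetical helper (not part of DataFrames): build a ":a, :b, :c" string
# from any collection of column names (Strings or Symbols).
colnames_literal(cols) = join(string.(':', cols), ", ")

# Stand-in for names(df) so the snippet runs without a DataFrame.
cols = ["a", "b", "c"]

# The helper returns a String; println shows it without the surrounding quotes.
println(colnames_literal(cols))
```

Because the helper returns a string rather than printing element by element, there is no array of nothing values to suppress.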

Thank you @pdeffebach and @dmbates - both excellent answers and solutions that I will use.

Just for my own education, do you know why the following:

[print(string(":", x, ", ")) for x in names(df)];

does not print the array of “Nothing”, yet the following does print the array of nothing?

function dfnames(df::DataFrame)
    [print(string(":", x, ", ")) for x in names(df)];
end

Is it because the semicolon is ignored inside a function?

Thanks again.

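Your function returns the array, because a function returns the value of its last expression, and the REPL then displays that return value. The semicolon inside the function body does not change what is returned; only a semicolon at the end of the REPL input suppresses the display. If you call dfnames(df); you will get the same behavior as Method 2 (no array of nothings printed).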
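
To make the difference concrete, here is a minimal sketch in plain Julia. The function names and the cols vector are hypothetical stand-ins (cols plays the role of names(df)), and the optional io argument is an assumption added only so the output can be redirected.

```julia
# Hypothetical stand-in for names(df).
cols = ["col1", "col2", "col3"]

# The comprehension is the last expression, so its value (a Vector{Nothing},
# one entry per print call) is returned. The trailing semicolon inside the
# body does not discard it.
function names_comprehension(cols, io::IO = stdout)
    [print(io, ":", x, ", ") for x in cols];
end

# foreach runs print for its side effect only and returns nothing,
# so the REPL has nothing to display after the call.
function names_foreach(cols, io::IO = stdout)
    foreach(x -> print(io, ":", x, ", "), cols)
end

r1 = names_comprehension(cols, devnull)  # output discarded, value kept
r2 = names_foreach(cols, devnull)

println(typeof(r1))      # element type is Nothing
println(r2 === nothing)
```

An explicit for loop, or ending the function with return nothing, has the same effect as foreach here.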

Got it, thank you.