Why does Julia prefix column names and colors with a colon :?

Why does Julia prefix variables and colors with a colon : ?

For example:
iris[:SepalWidth]

plot!(collect(1:10), rand(10), color=:red, label="red")

That’s the representation for symbols. See https://docs.julialang.org/en/v1/manual/metaprogramming/#Symbols-1

Just writing red would mean the variable red.
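A small sketch of the distinction, using only Base Julia. The variable names `c`, `kind`, `same` and `name` are made up for illustration:

```julia
# A leading colon creates a Symbol, an interned, identifier-like value.
c = :red
kind = typeof(c)

# Symbols can also be built from strings and converted back.
same = Symbol("red") == :red
name = String(:red)
```

Here `kind` is `Symbol`, `same` is `true`, and `name` is the string "red".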


And what would this mean:

iris[SepalWidth]

That depends on whether SepalWidth is defined or not. It would work if it was a variable with value

SepalWidth = :SepalWidth

You can think of symbols as convenient values that have a special role in Julia programs. Presumably here iris is a dictionary with keys that are symbols.
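To make that concrete, here is a sketch that assumes, as the post does, that the container behaves like a dictionary with symbol keys. `iris_like` is a hypothetical stand-in, not the real iris dataset:

```julia
# Hypothetical stand-in for `iris`: a Dict whose keys are Symbols.
iris_like = Dict(:SepalWidth => [3.5, 3.0, 3.2], :SepalLength => [5.1, 4.9, 4.7])

# Indexing with the symbol literal.
direct = iris_like[:SepalWidth]

# Indexing through a variable that happens to hold the same symbol.
SepalWidth = :SepalWidth
via_variable = iris_like[SepalWidth]
```

Both lookups return the same vector, because the index expression is evaluated first and yields `:SepalWidth` either way.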


See also: https://en.wikipedia.org/wiki/Symbol_(programming)


I meant if it’s not a variable but a column name.

If the item (SepalWidth) isn’t recognized as a value (such as integer, string, or a symbol), it will be looked up as a variable name. If the variable doesn’t exist, an error results.
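The failure case can be sketched like this, again with a plain Dict as a hypothetical stand-in for the table. The name `UndefinedColumn` is deliberately never defined:

```julia
table = Dict(:SepalWidth => [3.5, 3.0, 3.2])

# `UndefinedColumn` is not a literal value, so Julia tries to resolve it
# as a variable and throws before the lookup in `table` ever happens.
err = try
    table[UndefinedColumn]
catch e
    e
end

is_undef = err isa UndefVarError
```

The error is an `UndefVarError` about the name, not a missing-key error from the container.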

Note that you can also write iris.SepalWidth:

julia> iris.SepalWidth
150-element Array{Float64,1}:
 3.5
 3.0
 3.2
 3.1

They are actually a nice feature. It’s probably a bit unfortunate that DataFrames doesn’t also let you use the more ordinary indexing style with Ints just as easily. I’m pretty sure that is still the case, so it’s tempting to guess it was a design decision.

using Plots
# Accept the color and label as either a String or a Symbol and convert as needed.
xplot(x, y, c, l) = plot!(x, y, color=Symbol(c), label=String(l))
xplot(collect(1:10), rand(10), "red", :red)

Since column positions can be pretty accidental, I don’t think it is robust practice to use them for indexing in a dataframe.

That is true, but you could still easily opt in to symbols and have Ints as a fallback.
Edit: ?DataFrame does have a few examples with Integer indexing.

Note that

is not correct — it works just fine, see the relevant methods in the source.

I was merely pointing out that while it works for DataFrames, I don’t think it is good practice for working with data. Merely regenerating data with a slightly different column order or an extra column can break code very easily.
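A sketch of how that breakage can look, using NamedTuples of column vectors from Base Julia as a hypothetical stand-in for a DataFrame (the column values are made up):

```julia
# Hypothetical stand-ins for a table: NamedTuples of column vectors.
original = (SepalLength = [5.1, 4.9], SepalWidth = [3.5, 3.0])
# The same data regenerated with an extra leading column.
regenerated = (Id = [1, 2], SepalLength = [5.1, 4.9], SepalWidth = [3.5, 3.0])

# Name-based access returns the same column in both cases.
by_name_stable = original[:SepalWidth] == regenerated[:SepalWidth]

# Position-based access silently picks up a different column.
by_position_stable = original[2] == regenerated[2]
```

Here `by_name_stable` is `true` while `by_position_stable` is `false`, and nothing raises an error to flag the change.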


Yes, I just spotted that, and it’s mostly normal too. df[1] and df[:,1] are still a little different (or subject to change?) compared to operations on df2 = reduce(hcat, getfields(df, :columns))

I don’t quite understand what you mean here.

I don’t know what getfields is. Did you misspell getfield? In any case, there is a columns accessor.

That would be correct

There was a df.columns; I’m not sure if there are others.

df[:,1] gives a copy warning, and df2[1] is scalar

So how do you do matrix multiplication?

I mean columns(::DataFrame), which is exported and part of the API. You should avoid accessing fields of a composite type unless that is explicitly declared as the way to work with them.

Generally, e.g. for OLS, you form a design matrix. That is not a direct hcat of the columns though, e.g. for categorical variables.
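As a rough illustration of why a plain hcat is not enough, here is a hand-rolled sketch in Base Julia. It is not how StatsModels builds its matrices; the data, the names `x`, `group` and `lvls`, and the choice of dropping the first level as the reference category are all assumptions made for the example:

```julia
x = [1.0, 2.0, 3.0, 4.0]            # a numeric predictor
group = ["a", "b", "a", "c"]        # a categorical predictor

# Sorted distinct levels; the first one is treated as the reference level.
lvls = sort(unique(group))

# One 0/1 indicator column per non-reference level.
dummies = [Float64(g == l) for g in group, l in lvls[2:end]]

# Intercept, numeric column, then the indicator columns.
X = hcat(ones(length(x)), x, dummies)
```

The single categorical column turns into two indicator columns, so `X` is 4×4 even though there are only two predictors.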


This one? https://juliastats.github.io/StatsModels.jl/latest/formula.html

I think I can more or less sum up my view as ‘Documentation is great when you need it, but not needing it is still kind of preferable’, fwiw.
I’m quite probably terrible at API design myself though =) I’m singling out DataFrames because it’s commonly one of the first packages people use, and that is when user friendliness really either shines or hurts.

columns() is among the unexported names in DataFrames v0.14.1.