Why does Julia prefix column names and colors with a colon (:)?


#1

Why does Julia prefix variables and colors with a colon (:)?

For example:
iris[:SepalWidth]

plot!(collect(1:10), rand(10), color=:red, label="red")


#2

That’s the literal syntax for symbols. See https://docs.julialang.org/en/v1/manual/metaprogramming/#Symbols-1


#3

Writing just red (without the colon) would refer to a variable named red.
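A minimal Base-Julia sketch of the difference: a colon-prefixed name is a value of type Symbol, while a bare name is looked up as a variable. The variable red and the value it holds here are hypothetical, chosen only to contrast the two.

```julia
c = :red                      # Symbol literal: the value *is* the name "red"
println(typeof(c))            # Symbol
println(c === Symbol("red"))  # true: the same symbol can be built from a string

red = :blue                   # hypothetical variable that happens to be named red
println(red)                  # blue: a bare name evaluates to the variable's value
println(red === :red)         # false
```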


#4

And what would this mean:

iris[SepalWidth]


#5

That depends on whether SepalWidth is defined. It would work if SepalWidth were a variable with the value

SepalWidth = :SepalWidth

You can think of symbols as convenient values that have a special role in Julia programs. Presumably here iris is a dictionary with keys that are symbols.
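To make this concrete, a sketch assuming iris behaves like a dictionary with Symbol keys. A plain Dict stands in for the real table here, and the values are the four SepalWidth entries shown further down the thread.

```julia
# Stand-in for iris: a Dict keyed by Symbols
iris_like = Dict(:SepalWidth => [3.5, 3.0, 3.2, 3.1])

SepalWidth = :SepalWidth        # a variable whose value is the symbol :SepalWidth
a = iris_like[:SepalWidth]      # index with the symbol literal
b = iris_like[SepalWidth]       # index with the variable; it evaluates to the same key
println(a === b)                # true: both look up the same vector
```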


#6

See also: https://en.wikipedia.org/wiki/Symbol_(programming)


#7

I meant if it’s not a variable but a column name.


#8

If the item (SepalWidth) isn’t a literal value (such as an integer, a string, or a symbol), Julia treats it as a variable name and looks it up. If no such variable exists, an UndefVarError is thrown.
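A short sketch of that failure mode, using a name (NotDefinedAnywhere, hypothetical) that is deliberately never assigned:

```julia
iris_like = Dict(:SepalWidth => [3.5, 3.0, 3.2, 3.1])

# A bare name is treated as a variable; if none exists, Julia throws UndefVarError
err = try
    iris_like[NotDefinedAnywhere]
catch e
    e
end
println(err isa UndefVarError)   # true

# The symbol literal is a value, so it is used directly as the key
println(iris_like[:SepalWidth])
```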


#9

Note that you can also write iris.SepalWidth:

julia> iris.SepalWidth
150-element Array{Float64,1}:
 3.5
 3.0
 3.2
 3.1
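The dot syntax still passes the column name as a Symbol: x.name is lowered to getproperty(x, :name). A toy sketch of how a table type could support both forms; ToyTable is hypothetical and is not how DataFrames is actually implemented.

```julia
# Hypothetical toy table: column name => column vector
struct ToyTable
    cols::Dict{Symbol,Vector{Float64}}
end

# `t.name` becomes `getproperty(t, :name)`, so forward it to a column lookup.
# getfield is used to reach the real field, since getproperty is now overridden.
Base.getproperty(t::ToyTable, name::Symbol) = getfield(t, :cols)[name]

# Indexing with a Symbol, in the style of iris[:SepalWidth]
Base.getindex(t::ToyTable, name::Symbol) = getfield(t, :cols)[name]

t = ToyTable(Dict(:SepalWidth => [3.5, 3.0, 3.2, 3.1]))
println(t.SepalWidth === t[:SepalWidth])   # true: both look up the key :SepalWidth
```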

#10

They are actually a nice feature. It’s probably a bit unfortunate that DataFrames doesn’t also let you use the more ordinary indexing style with Ints just as easily. I’m pretty sure that is still the case, so it’s tempting to guess it was a design decision.

using Plots
# Accept the color as a String and the label as a Symbol, converting each as needed
xplot(x, y, c, l) = plot!(x, y, color=Symbol(c), label=String(l));
xplot(collect(1:10), rand(10), "red", :red)
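xplot relies on the conversions Symbol(c) and String(l), and both also accept a value that is already of the target type. A sketch of that normalization without Plots; the helper names to_color and to_label are hypothetical.

```julia
to_color(c) = Symbol(c)   # "red" -> :red, and :red stays :red
to_label(l) = String(l)   # :red -> "red", and "red" stays "red"

println(to_color("red") === to_color(:red))   # true
println(to_label(:red) == to_label("red"))    # true
```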

#11

Since column positions can be pretty accidental, I don’t think it is robust practice to use them for indexing in a dataframe.


#12

That is true, but you could still easily opt in to symbols and keep Ints as a fallback.
Edit: ?DataFrame does show a few examples with integer indexing.
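A sketch of "symbols, with Ints as a fallback" using multiple dispatch on the key type. The Cols type and getcol function are hypothetical, not the DataFrames API.

```julia
# Hypothetical column store: column names in order, plus name => data
struct Cols
    names::Vector{Symbol}
    data::Dict{Symbol,Vector{Float64}}
end

getcol(c::Cols, key::Symbol) = c.data[key]           # preferred: look up by name
getcol(c::Cols, i::Integer) = getcol(c, c.names[i])  # fallback: position -> name -> data

c = Cols([:A, :B], Dict(:A => [1.0, 2.0], :B => [3.0, 4.0]))
println(getcol(c, :B) === getcol(c, 2))   # true: both resolve to column :B
```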


#13

Note that the claim that DataFrames doesn’t let you index with Ints is not correct: it works just fine, see the relevant methods in the source.

I was only pointing out that while it works for DataFrames, I don’t think it is good practice for working with data. Merely regenerating data with a slightly different column order or an extra column can easily break code.
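A small illustration of that fragility, using NamedTuples of column vectors as a stand-in for a table (column names and values are made up):

```julia
# Original data: column 1 is :Width
before = (Width = [3.5, 3.0], Species = ["setosa", "setosa"])
# Regenerated data: same columns, but an extra one now comes first
after = (Id = [1, 2], Width = [3.5, 3.0], Species = ["setosa", "setosa"])

println(before[1] == after[1])             # false: position 1 now means :Id
println(before.Width == after.Width)       # true: lookup by name is unaffected
println(before[:Width] == after[:Width])   # true: same, via a Symbol index
```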


#14

Yes, I just spotted that, and it’s mostly normal too. df[1] and df[:,1] still behave a little differently (or is that subject to change?) compared to the same operations on df2 = reduce(hcat, getfields(df, :columns))


#15

I don’t quite understand what you mean here.

I don’t know what getfields is. Did you misspell getfield? In any case, there is a columns accessor.


#16

That would be correct

There was a df.columns, I’m not sure if there are others.

df[:,1] gives a copy warning, and df2[1] is a scalar.
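For the comparison in #14 and #16, a Base-only sketch: once the columns are glued into a Matrix with reduce(hcat, ...), an integer index is a linear index into the elements, so M[1] is a scalar while M[:, 1] is a copy of the first column. The column values are illustrative.

```julia
cols = [[3.5, 3.0, 3.2], [1.4, 1.4, 1.3]]   # two column vectors
M = reduce(hcat, cols)                      # 3×2 Matrix{Float64}

println(M[1])      # 3.5: linear indexing returns a single element
println(M[:, 1])   # [3.5, 3.0, 3.2]: the first column, copied
println(size(M))   # (3, 2)
```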


#17

So how do you do matrix multiplication?


#18

I mean columns(::DataFrame), which is exported and part of the API. You should avoid accessing fields of a composite type unless that is explicitly declared as the way to work with them.

Generally, for e.g. OLS, you form a design matrix. That is not simply an hcat of the columns, though: categorical variables, for example, have to be encoded as dummy (indicator) columns.
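A Base-Julia sketch of what "not a direct hcat" means: an intercept column and a 0/1 dummy for a categorical variable are built explicitly, and the OLS coefficients come from the least-squares solve X \ y. The data are synthetic and the coding (level :a as the baseline) is an assumption; StatsModels’ formula interface automates this kind of construction.

```julia
# Synthetic data: one numeric predictor and one categorical predictor with levels :a and :b
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group = [:a, :b, :a, :b, :a, :b]
y = 1.0 .+ 2.0 .* x .+ 0.5 .* (group .== :b)   # exact linear relationship, no noise

# Design matrix: intercept, the numeric column, and a dummy for level :b (:a is the baseline)
X = hcat(ones(length(x)), x, Float64.(group .== :b))

# For a tall matrix, `\` returns the least-squares solution (via QR)
β = X \ y
println(β)   # approximately [1.0, 2.0, 0.5]
```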


#19

This one? https://juliastats.github.io/StatsModels.jl/latest/formula.html

I think I can more or less sum up my view as: ‘Documentation is great when you need it, but not needing it is still preferable’, fwiw.
I’m quite probably terrible at API design myself, though =) I’m singling out DataFrames because it is commonly one of the first packages people use, and that is when user friendliness really either shines or hurts.


#20

columns() is among the unexported names in DataFrames v0.14.1.