Why does Julia prefix variables and colors with a colon (:)?
For example:
iris[:SepalWidth]
plot!(collect(1:10), rand(10), color=:red, label="red")
That’s the literal syntax for symbols. See the “Metaprogramming” chapter of the Julia Language manual.
Just writing red would mean the variable red.
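A minimal sketch of that distinction: `:red` is an ordinary value of type `Symbol`, just like `1` or `"red"`, so it needs no prior definition and can be stored and passed around. The names `c` and `opts` are only for illustration.

```julia
# `:red` is a literal of type Symbol: a value, like 1 or "red".
c = :red
println(typeof(c))               # Symbol

# Symbol(...) builds the same symbol from a string at run time,
# and equal symbols are one and the same object.
println(c === Symbol("red"))     # true

# Being an ordinary value, a symbol can be stored and passed around,
# e.g. inside a NamedTuple of keyword-style options (illustrative only).
opts = (color = :red, label = "red")
println(opts.color === :red)     # true
```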
And what would this mean:
iris[SepalWidth]
That depends on whether SepalWidth is defined or not. It would work if SepalWidth were a variable holding the symbol:
SepalWidth = :SepalWidth
You can think of symbols as convenient values that have a special role in Julia programs. Presumably here iris is a dictionary with keys that are symbols.
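To make the two spellings concrete, here is a sketch that assumes iris is a plain `Dict` keyed by symbols (the real iris in this thread is a DataFrame, and the four values are just the first ones from the REPL output later in the thread). Indexing with the literal `:SepalWidth` and indexing with a variable that holds that symbol give the same result.

```julia
# Hypothetical stand-in for `iris`: a Dict with Symbol keys.
iris = Dict(:SepalWidth => [3.5, 3.0, 3.2, 3.1])

# Indexing with a symbol literal.
w1 = iris[:SepalWidth]

# Indexing with a variable whose value is that symbol.
SepalWidth = :SepalWidth
w2 = iris[SepalWidth]

println(w1 == w2)   # true
```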
See also: Symbol (programming) - Wikipedia
I meant if it’s not a variable but a column name.
If the item (SepalWidth) isn’t recognized as a value (such as an integer, string, or symbol), it will be looked up as a variable name. If the variable doesn’t exist, an error results.
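One way to see this rule is to ask Julia’s own parser, via `Meta.parse` from Base, how it classifies each spelling: literals parse to values, `:SepalWidth` parses to a quoted symbol, and a bare name parses to a plain `Symbol` that is looked up when evaluated. `NoSuchColumn` below is a hypothetical name chosen only because it is not defined anywhere.

```julia
# How the parser classifies each spelling of "SepalWidth".
for src in ["1", "\"SepalWidth\"", ":SepalWidth", "SepalWidth"]
    println(rpad(src, 14), "parses to a ", typeof(Meta.parse(src)))
end

# A quoted symbol evaluates to the symbol itself...
v = eval(Meta.parse(":SepalWidth"))

# ...while a bare name is a variable lookup, which errors if the name is undefined.
err = try
    eval(Meta.parse("NoSuchColumn"))
catch e
    e
end
println(err isa UndefVarError)   # true
```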
Note that you can also write iris.SepalWidth:
julia> iris.SepalWidth
150-element Array{Float64,1}:
3.5
3.0
3.2
3.1
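The dot syntax still uses a symbol behind the scenes: in Julia, `x.name` is lowered to `getproperty(x, :name)`, and DataFrames overloads `getproperty` to return the column. A sketch using a `NamedTuple` as a stand-in for the data frame (the stand-in and its values are illustrative, not the DataFrames API):

```julia
# Toy stand-in: a NamedTuple holding one column (first values from the output above).
cols = (SepalWidth = [3.5, 3.0, 3.2, 3.1],)

a = cols.SepalWidth                 # property syntax
b = getproperty(cols, :SepalWidth)  # what `cols.SepalWidth` is lowered to
c = cols[:SepalWidth]               # NamedTuples also accept Symbol indexing

# All three return the very same array object.
println((a === b) && (b === c))     # true
```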
Symbols are actually a nice feature, though it’s probably a bit unfortunate that DataFrames doesn’t also let you use the more ordinary indexing style with Ints just as easily. I’m pretty sure that is still the case, so it’s tempting to guess it was a design decision.
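using Plots
# Wrapper that accepts either strings or symbols: color is converted to a Symbol, label to a String.
xplot(x, y, c, l) = plot!(x, y, color=Symbol(c), label=String(l))
xplot(collect(1:10), rand(10), "red", :red)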
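The wrapper above relies on converting between strings and symbols in both directions. A short sketch of those conversions in plain Base Julia (no plotting needed):

```julia
c = "red"
s = Symbol(c)                    # string -> symbol
back = String(s)                 # symbol -> string
sw = Symbol("Sepal", "Width")    # Symbol(...) concatenates its arguments

println(s, " ", back, " ", sw)   # red red SepalWidth
```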
Since column positions can be pretty accidental, I don’t think it is robust practice to use them for indexing in a dataframe.
That is true, but you could easily still opt in by using symbols, and have Ints as a fallback.
Addendum: the help text for ?DataFrame does have a few examples with integer indexing.
Note that the claim about integer indexing is not correct: it works just fine; see the relevant methods in the source.
I was merely pointing out that while it works for DataFrames, I don’t think it is good practice for working with data. Merely regenerating data with a slightly different column order or an extra column can break code very easily.
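To illustrate both points (symbols for robustness, Ints as a fallback), here is a sketch with a hypothetical `ToyTable` type and a hypothetical `col` function; neither is part of DataFrames. Dispatch picks the positional method for an `Integer` and the name-based method for a `Symbol`, and only the name-based lookup survives regenerating the data with a different column order.

```julia
# Hypothetical minimal table type: column names plus column vectors.
struct ToyTable
    names::Vector{Symbol}
    cols::Vector{Vector{Float64}}
end

# Positional access: depends on the current column order.
col(t::ToyTable, i::Integer) = t.cols[i]

# Name-based access: looks the symbol up, so column order doesn't matter.
function col(t::ToyTable, name::Symbol)
    i = findfirst(==(name), t.names)
    i === nothing && throw(KeyError(name))
    return t.cols[i]
end

sw = [3.5, 3.0, 3.2, 3.1]   # toy values
pw = [0.2, 0.2, 0.2, 0.2]   # toy values

t1 = ToyTable([:SepalWidth, :PetalWidth], [sw, pw])
# Same data "regenerated" with the columns in a different order.
t2 = ToyTable([:PetalWidth, :SepalWidth], [pw, sw])

println(col(t1, :SepalWidth) == col(t2, :SepalWidth))  # true: names are robust
println(col(t1, 1) == col(t2, 1))                      # false: position 1 changed
```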
Yes, I just spotted that, and it’s mostly what you’d expect too. df[1] and df[:,1] still behave a little differently (or is that subject to change?) from the same operations on df2 = reduce(hcat, getfields(df, :columns))
I don’t quite understand what you mean here.
I don’t know what getfields is. Did you misspell getfield? In any case, there is a columns accessor.
That would be correct.
There was a df.columns; I’m not sure if there are others.
df[:,1] gives a copy warning, and df2[1] is a scalar.
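The scalar result is expected once the columns are glued into a plain `Matrix`: a single integer index on a matrix is linear indexing, so it returns one element, while `[:, 1]` returns a copied column. A sketch with two toy columns standing in for the data frame’s columns:

```julia
cols = [[3.5, 3.0, 3.2, 3.1], [0.2, 0.2, 0.2, 0.2]]  # toy columns
M = reduce(hcat, cols)     # 4×2 Matrix{Float64}

x1 = M[1]                  # linear indexing: the first element, a scalar
c1 = M[:, 1]               # the first column, copied into a new Vector
v1 = view(M, :, 1)         # the first column without copying

println(x1, " ", c1)
```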
So how do you do matrix multiplication?
I mean the columns(::DataFrame) accessor, which is exported and part of the API. You should avoid accessing fields of a composite type unless that is explicitly declared as the way to work with them.
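The general pattern behind that advice, sketched with a hypothetical `ColumnStore` type and a hypothetical `getcolumns` accessor (not DataFrames names): `getfield` takes the field name as a symbol and works, but it ties your code to an internal layout the package author is free to change, whereas the accessor is the stable contract.

```julia
# Hypothetical type whose field layout is an implementation detail.
struct ColumnStore
    _data::Vector{Vector{Float64}}
end

# Hypothetical public accessor: the documented way to reach the columns.
getcolumns(s::ColumnStore) = s._data

s = ColumnStore([[3.5, 3.0], [0.2, 0.2]])

via_api   = getcolumns(s)
via_field = getfield(s, :_data)   # works, but couples code to the internal field name

println(fieldnames(ColumnStore))  # (:_data,)
```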
Generally, e.g. for OLS, you form a design matrix. That is not a direct hcat of the columns, though, e.g. for categorical variables.
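A hand-rolled sketch of why a design matrix differs from a plain hcat: it adds an intercept column and replaces a categorical column with a 0/1 dummy column (treatment coding with "a" as the reference level, which is my choice here). The data are made up, and this is not how StatsModels.jl builds its matrices; it only shows the idea, followed by a least-squares fit with `\`.

```julia
using LinearAlgebra  # stdlib; provides least-squares `\` for matrices

# Toy data (hypothetical): one numeric predictor and one categorical one.
x   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
grp = ["a", "b", "a", "b", "a", "b"]
y   = [2.0, 4.1, 6.0, 8.1, 10.0, 12.1]

# Design matrix: intercept, the numeric column, and a dummy for level "b".
X = hcat(ones(length(x)), x, Float64.(grp .== "b"))

# For a tall matrix, `\` solves the least-squares problem (via QR).
β = X \ y
println(round.(β; digits = 6))
```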
This one? https://juliastats.github.io/StatsModels.jl/latest/formula.html
I think I can more or less sum up my view as ‘Documentation is great when you need it, but not needing it is still kind of preferable’, fwiw.
I’m quite probably terrible at API design myself, though =) I’m singling out DataFrames because it’s commonly one of the first packages people use, and that is when user friendliness really either shines or hurts.
columns() is among the unexported names in DataFrames v0.14.1.