Hi all,
This is a total noob question, but I can’t find an answer. I’m trying to run a linear regression on some data I have:
```
julia> df_under_test[:, [instruction_sym, measure_power_sym, binary_weight_sym]]
1165×3 DataFrame
  Row │ Instruction          Base power (W)     Binary weight
      │ String               Quantity…          Int64
──────┼───────────────────────────────────────────────────────
    1 │ add_r0_r0_r0_ror_23  0.108844±7.4e-5 W             12
    2 │ add_r0_r0_r0_ror_21  0.108616±7.4e-5 W             11
    3 │ add_r0_r0_r0_ror_22  0.108496±7.3e-5 W             11
    4 │ add_r0_r0_r0_ror_25  0.108466±7.5e-5 W             11
    5 │ add_r0_r0_r0_ror_27  0.108389±7.4e-5 W             12
    6 │ add_r0_r0_r0_ror_18  0.108376±7.6e-5 W             10
    7 │ add_r0_r0_r0_ror_24  0.108354±7.5e-5 W             10
    8 │ add_r0_r0_r0_ror_7   0.108344±7.5e-5 W             11
    9 │ add_r0_r0_r0_ror_9   0.108283±7.6e-5 W             10
   10 │ add_r0_r0_r0_ror_26  0.108269±7.4e-5 W             11
   11 │ add_r0_r0_r0_ror_6   0.108252±7.4e-5 W             10
   12 │ add_r0_r0_r0_ror_20  0.108202±7.5e-5 W             10
   13 │ add_r0_r0_r0_ror_19  0.108196±7.5e-5 W             11
   14 │ add_r0_r0_r0_ror_14   0.10818±7.6e-5 W             11
    ⋮ │ ⋮                                    ⋮              ⋮
 1152 │ add_r6_r6_r1         0.073308±6.3e-5 W              5
 1153 │ add_r2_r5_r2         0.073302±6.0e-5 W              5
 1154 │ add_r1_r5_r1         0.073237±6.2e-5 W              5
 1155 │ add_r3_r3_r1         0.073213±6.3e-5 W              5
 1156 │ add_r4_r1_r4         0.073161±6.0e-5 W              4
 1157 │ add_r0_r0_r0          0.07305±6.2e-5 W              2
 1158 │ add_r4_r4_r1         0.072969±6.1e-5 W              4
 1159 │ add_r2_r1_r2         0.072941±5.9e-5 W              4
 1160 │ add_r0_r0_r5         0.072929±5.9e-5 W              4
 1161 │ add_r1_r1_r1         0.072881±6.1e-5 W              4
 1162 │ add_r2_r2_r1         0.072863±6.2e-5 W              4
 1163 │ add_r0_r5_r0         0.072772±6.0e-5 W              4
 1164 │ add_r0_r0_r1         0.072522±6.1e-5 W              3
 1165 │ add_r0_r1_r0         0.072425±6.1e-5 W              3
1137 rows omitted
```
I want to fit the power as a linear function of the binary weight, like this:

```julia
ols = lm(@formula(measure_power_sym ~ 1 + binary_weight_sym), df_under_test)
```

where `measure_power_sym` and `binary_weight_sym` are the `Symbol`s corresponding to the columns, as you can see in the first listing.
However, this throws:

```
julia> ols = lm(@formula(measure_power_sym ~ 1 + binary_weight_sym), df_under_test)
ERROR: ArgumentError: There isn't a variable called 'measure_power_sym' in your data; the nearest names appear to be:
Stacktrace:
 [1] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:Instruction, Symbol("Base power (W)"), Symbol("Is conditional"), Symbol("Barrel shift amount"), Symbol("Has immediate operand"), Symbol("Immediate amount"), Symbol("Dest reg == source reg"), Symbol("Binary encoding"), Symbol("Binary weight"), :mnemonic), Tuple{SubArray{String, 1, Vector{String}, Tuple{Vector{Int64}}, false}, SubArray{Quantity{Measurement{Float64}, 𝐋^2 𝐌 𝐓^-3, Unitful.FreeUnits{(W,), 𝐋^2 𝐌 𝐓^-3, nothing}}, 1, Vector{Quantity{Measurement{Float64}, 𝐋^2 𝐌 𝐓^-3, Unitful.FreeUnits{(W,), 𝐋^2 𝐌 𝐓^-3, nothing}}}, Tuple{Vector{Int64}}, false}, SubArray{Bool, 1, BitVector, Tuple{Vector{Int64}}, false}, SubArray{Signed, 1, Vector{Signed}, Tuple{Vector{Int64}}, false}, SubArray{Bool, 1, BitVector, Tuple{Vector{Int64}}, false}, SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int64}}, false}, SubArray{Bool, 1, BitVector, Tuple{Vector{Int64}}, false}, SubArray{String15, 1, Vector{String15}, Tuple{Vector{Int64}}, false}, SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int64}}, false}, SubArray{String, 1, Vector{String}, Tuple{Vector{Int64}}, false}}}; model::Type{LinearModel}, contrasts::Dict{Symbol, Any})
   @ StatsModels ~/.julia/packages/StatsModels/G1ClG/src/modelframe.jl:78
 [2] fit(::Type{LinearModel}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::SubDataFrame{DataFrame, DataFrames.Index, Vector{Int64}}, args::Nothing; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels ~/.julia/packages/StatsModels/G1ClG/src/statsmodel.jl:85
 [3] fit
   @ ~/.julia/packages/StatsModels/G1ClG/src/statsmodel.jl:78 [inlined]
 [4] #lm#5
   @ ~/.julia/packages/GLM/4A2DM/src/lm.jl:157 [inlined]
 [5] lm (repeats 2 times)
   @ ~/.julia/packages/GLM/4A2DM/src/lm.jl:157 [inlined]
 [6] top-level scope
   @ REPL[34]:1
```
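If I read the error correctly, `@formula` is a macro, so it works on the identifiers written inside it rather than on the values stored in those variables. A quick check of that idea, as a sketch: it assumes GLM re-exports StatsModels' `@formula`, and that the resulting `FormulaTerm` keeps the response in its `lhs` field, with the column name in the `Term`'s `sym` field. The two `Symbol` definitions are stand-ins matching the column names in the listing.

```julia
using GLM  # assumption: GLM re-exports StatsModels, which provides @formula

# Stand-ins for the column Symbols used in the post
measure_power_sym = Symbol("Base power (W)")
binary_weight_sym = Symbol("Binary weight")

# The macro builds the formula from the source text, not from variable values,
# so the response refers to a column literally named `measure_power_sym`.
f_literal = @formula(measure_power_sym ~ 1 + binary_weight_sym)

println(f_literal.lhs.sym)                      # prints measure_power_sym
println(f_literal.lhs.sym == measure_power_sym) # prints false
```

That would match the `There isn't a variable called 'measure_power_sym'` message: the lookup uses the identifier's name, not `Symbol("Base power (W)")`.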
So, how should I run a regression when the DataFrame’s column names have spaces in them?
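One workaround that might apply, sketched under assumptions: StatsModels (re-exported by GLM, as far as I know) has a `term` function that builds a formula at run time from a `Symbol` value, so a name containing spaces is just another `Symbol`. The toy `DataFrame` below is hypothetical. It copies six power/weight pairs from the listing above as plain `Float64` watts, without units or uncertainties.

```julia
using DataFrames, GLM  # assumption: GLM re-exports StatsModels, which provides `term`

# Hypothetical toy data: six (power, weight) pairs copied from the listing,
# with units and uncertainties dropped so the power is a plain Float64 in watts.
df = DataFrame(
    Symbol("Base power (W)") => [0.108844, 0.108616, 0.108376, 0.07305, 0.072969, 0.072425],
    Symbol("Binary weight")  => [12, 11, 10, 2, 4, 3],
)

measure_power_sym = Symbol("Base power (W)")
binary_weight_sym = Symbol("Binary weight")

# Build the formula from the Symbol values instead of writing it inside @formula.
# term(1) is the intercept; term(sym) refers to a column by name, spaces included.
f = term(measure_power_sym) ~ term(1) + term(binary_weight_sym)

ols = lm(f, df)
println(coef(ols))  # fitted intercept, then the slope for "Binary weight"
```

The real power column holds `Quantity{Measurement{Float64}}` values (see the stack trace), which GLM may not accept as-is. If that turns out to be a problem, something like `Measurements.value.(ustrip.(df_under_test[!, measure_power_sym]))`, using Unitful's `ustrip` and Measurements' `value`, should give a plain `Vector{Float64}` to regress on instead.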
Thanks!