Hi all,
This is probably a very basic question, but I can't find an answer. I'm trying to run a linear regression on some data I have:
```
julia> df_under_test[:, [instruction_sym, measure_power_sym, binary_weight_sym]]
1165×3 DataFrame
  Row │ Instruction          Base power (W)     Binary weight 
      │ String               Quantity…          Int64         
──────┼───────────────────────────────────────────────────────
    1 │ add_r0_r0_r0_ror_23  0.108844±7.4e-5 W             12
    2 │ add_r0_r0_r0_ror_21  0.108616±7.4e-5 W             11
    3 │ add_r0_r0_r0_ror_22  0.108496±7.3e-5 W             11
    4 │ add_r0_r0_r0_ror_25  0.108466±7.5e-5 W             11
    5 │ add_r0_r0_r0_ror_27  0.108389±7.4e-5 W             12
    6 │ add_r0_r0_r0_ror_18  0.108376±7.6e-5 W             10
    7 │ add_r0_r0_r0_ror_24  0.108354±7.5e-5 W             10
    8 │ add_r0_r0_r0_ror_7   0.108344±7.5e-5 W             11
    9 │ add_r0_r0_r0_ror_9   0.108283±7.6e-5 W             10
   10 │ add_r0_r0_r0_ror_26  0.108269±7.4e-5 W             11
   11 │ add_r0_r0_r0_ror_6   0.108252±7.4e-5 W             10
   12 │ add_r0_r0_r0_ror_20  0.108202±7.5e-5 W             10
   13 │ add_r0_r0_r0_ror_19  0.108196±7.5e-5 W             11
   14 │ add_r0_r0_r0_ror_14   0.10818±7.6e-5 W             11
  ⋮   │          ⋮                   ⋮                ⋮
 1152 │ add_r6_r6_r1         0.073308±6.3e-5 W              5
 1153 │ add_r2_r5_r2         0.073302±6.0e-5 W              5
 1154 │ add_r1_r5_r1         0.073237±6.2e-5 W              5
 1155 │ add_r3_r3_r1         0.073213±6.3e-5 W              5
 1156 │ add_r4_r1_r4         0.073161±6.0e-5 W              4
 1157 │ add_r0_r0_r0          0.07305±6.2e-5 W              2
 1158 │ add_r4_r4_r1         0.072969±6.1e-5 W              4
 1159 │ add_r2_r1_r2         0.072941±5.9e-5 W              4
 1160 │ add_r0_r0_r5         0.072929±5.9e-5 W              4
 1161 │ add_r1_r1_r1         0.072881±6.1e-5 W              4
 1162 │ add_r2_r2_r1         0.072863±6.2e-5 W              4
 1163 │ add_r0_r5_r0         0.072772±6.0e-5 W              4
 1164 │ add_r0_r0_r1         0.072522±6.1e-5 W              3
 1165 │ add_r0_r1_r0         0.072425±6.1e-5 W              3
                                             1137 rows omitted
```
I want to fit the power as a linear function of the binary weight, like this:
```
ols = lm(@formula(measure_power_sym ~ 1 + binary_weight_sym), df_under_test)
```
where `measure_power_sym` and `binary_weight_sym` are variables holding the column names as Symbols, the same ones used in the first listing.
However, this throws:
```
julia> ols = lm(@formula(measure_power_sym ~ 1 + binary_weight_sym), df_under_test)
ERROR: ArgumentError: There isn't a variable called 'measure_power_sym' in your data; the nearest names appear to be: 
Stacktrace:
 [1] ModelFrame(f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::NamedTuple{(:Instruction, Symbol("Base power (W)"), Symbol("Is conditional"), Symbol("Barrel shift amount"), Symbol("Has immediate operand"), Symbol("Immediate amount"), Symbol("Dest reg == source reg"), Symbol("Binary encoding"), Symbol("Binary weight"), :mnemonic), Tuple{SubArray{String, 1, Vector{String}, Tuple{Vector{Int64}}, false}, SubArray{Quantity{Measurement{Float64}, 𝐋^2 𝐌 𝐓^-3, Unitful.FreeUnits{(W,), 𝐋^2 𝐌 𝐓^-3, nothing}}, 1, Vector{Quantity{Measurement{Float64}, 𝐋^2 𝐌 𝐓^-3, Unitful.FreeUnits{(W,), 𝐋^2 𝐌 𝐓^-3, nothing}}}, Tuple{Vector{Int64}}, false}, SubArray{Bool, 1, BitVector, Tuple{Vector{Int64}}, false}, SubArray{Signed, 1, Vector{Signed}, Tuple{Vector{Int64}}, false}, SubArray{Bool, 1, BitVector, Tuple{Vector{Int64}}, false}, SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int64}}, false}, SubArray{Bool, 1, BitVector, Tuple{Vector{Int64}}, false}, SubArray{String15, 1, Vector{String15}, Tuple{Vector{Int64}}, false}, SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int64}}, false}, SubArray{String, 1, Vector{String}, Tuple{Vector{Int64}}, false}}}; model::Type{LinearModel}, contrasts::Dict{Symbol, Any})
   @ StatsModels ~/.julia/packages/StatsModels/G1ClG/src/modelframe.jl:78
 [2] fit(::Type{LinearModel}, f::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term}}, data::SubDataFrame{DataFrame, DataFrames.Index, Vector{Int64}}, args::Nothing; contrasts::Dict{Symbol, Any}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ StatsModels ~/.julia/packages/StatsModels/G1ClG/src/statsmodel.jl:85
 [3] fit
   @ ~/.julia/packages/StatsModels/G1ClG/src/statsmodel.jl:78 [inlined]
 [4] #lm#5
   @ ~/.julia/packages/GLM/4A2DM/src/lm.jl:157 [inlined]
 [5] lm (repeats 2 times)
   @ ~/.julia/packages/GLM/4A2DM/src/lm.jl:157 [inlined]
 [6] top-level scope
   @ REPL[34]:1
```
So, how should I run a regression when the DataFrame's column names contain spaces?
Thanks!
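A minimal sketch of one possible approach, assuming DataFrames.jl and GLM.jl (which re-exports StatsModels, including its `term` function). The error arises because `@formula` is a macro and sees the literal name `measure_power_sym` rather than the Symbol stored in it. Building the formula from `term(...)` calls evaluates the variables instead. Writing the names with `var"..."` inside `@formula` should also work on Julia 1.3+. The toy DataFrame and its values below are hypothetical, not the measurements above, and use plain numbers. Whether `lm` accepts the `Quantity{Measurement}` column directly is not checked here.

```julia
using DataFrames, GLM  # GLM re-exports StatsModels (term, @formula) and coef

# Hypothetical toy data whose column names contain spaces, like the real table
df = DataFrame("Base power (W)" => [0.5, 0.8, 1.1, 1.4],
               "Binary weight"  => [2, 3, 4, 5])

# Column names held in variables, as in the original post
measure_power_sym = Symbol("Base power (W)")
binary_weight_sym = Symbol("Binary weight")

# Option 1: build the formula programmatically so the variables are evaluated
f = term(measure_power_sym) ~ term(1) + term(binary_weight_sym)
ols = lm(f, df)

# Option 2: spell the names out with var"..." inside the @formula macro
ols2 = lm(@formula(var"Base power (W)" ~ 1 + var"Binary weight"), df)

println(coef(ols))  # intercept and slope
```

Both fits should give the same coefficients. The `term` route is the more convenient one when the column names are only known at run time.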
