Spaces in query.jl


#21

@davidanthoff How will this work for a situation like this:

@from i in df begin
	@where i.Parking_Tax == true
	@select i
	@collect DataFrame
end

where the name of the column is “Parking Tax”. Previous versions of DataFrames converted the name to “Parking_Tax”, but the latest version does not. So now I’m getting: type NamedTuple has no field Parking_Tax

The following does work, but it’s part of a tutorial for beginners, and it really makes things ugly and complicated:

@from i in df begin
	@where getproperty(i, Symbol("Parking Tax")) == true
	@select i
	@collect DataFrame
end

Update 1

This is acceptable, but if there’s a better way, I’d be grateful to learn about it:

@from i in df begin
	@where i[Symbol("Parking Tax")] == true
	@select i
	@collect DataFrame
end
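
For reference, the access patterns above can be tried on a plain named tuple, without Query.jl or DataFrames. This is a minimal sketch: `row` is a made-up stand-in for the `i` in the queries, and the `var"..."` form assumes Julia 1.3 or later. Whether that last form is accepted inside Query.jl's macros is not checked here.

```julia
# A hand-built named tuple whose first field name contains a space.
# `row` is a hypothetical stand-in for the `i` in the queries above.
row = NamedTuple{(Symbol("Parking Tax"), :col2)}((true, 2))

# Explicit getproperty call with a Symbol.
a = getproperty(row, Symbol("Parking Tax"))

# Indexing a named tuple with a Symbol.
b = row[Symbol("Parking Tax")]

# Non-standard identifier syntax (assumes Julia 1.3 or later).
c = row.var"Parking Tax"
```

All three expressions read the same field; they differ only in how the Symbol is spelled.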

Thanks


#22

Not sure it applies here, but foo."bar" is valid Julia syntax and calls getproperty with a string.


#23

Thanks, yes, I tried that but it errors out because there is no getproperty defined which accepts a string as its second argument.

I ended up renaming the columns

rename!(df, [n => replace(string(n), " "=>"_") |> Symbol for n in names(df)])

#24

If you create a macro S_str, then you could use @where i[S"Parking Tax"] == true. Is that unacceptable for you too?

I played with an MWE; maybe it could be useful for somebody:

julia> using DataFrames, Query, CSV

julia> macro S_str(a) :(Symbol($a)) end

julia> io = IOBuffer("""Parking Tax,col2
       true,2
       false,6""");

julia> df = CSV.File(io) |> DataFrame
2×2 DataFrame
│ Row │ Parking Tax │ col2   │
│     │ Bool⍰       │ Int64⍰ │
├─────┼─────────────┼────────┤
│ 1   │ true        │ 2      │
│ 2   │ false       │ 6      │

julia> @from i in df begin
               @where i[S"Parking Tax"]==true
               @select i
               @collect DataFrame
       end
1×2 DataFrame
│ Row │ Parking Tax │ col2   │
│     │ Bool⍰       │ Int64⍰ │
├─────┼─────────────┼────────┤
│ 1   │ true        │ 2      │
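
As a side note on the macro itself, here is a hedged variant that needs only Base. It is a sketch, not the definition used in the session above: it builds the Symbol once at macro-expansion time and splices it in as a literal, instead of emitting a call to `Symbol`. The named tuple `row` is made up for illustration.

```julia
# Hypothetical variant of the S_str macro above: the Symbol is built
# at macro-expansion time and returned as a quoted literal.
macro S_str(s)
    return QuoteNode(Symbol(s))
end

# Hand-built named tuple standing in for one row of the table.
row = NamedTuple{(Symbol("Parking Tax"), :col2)}((true, 2))

sym = S"Parking Tax"        # the Symbol with a space in its name
val = row[S"Parking Tax"]   # index the named tuple with it
```

Either definition should behave the same in `i[S"Parking Tax"]`; the difference is only in when the Symbol is constructed.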


#25

I think that this looks quite nice - but I’d rather stay away from it in a beginners tutorial.


#26

Oh sorry! I missed that it is for beginners. But do you plan to put this line there?


#27

Comprehensions have been introduced previously, but maybe you’re right and it should be done with an iteration. :thinking:

Edit 1:
Yes, definitely, good point! At least I can show the iteration and add that it can be done with a comprehension. Having the two versions side by side should clarify the comprehension syntax too.
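
A side-by-side of the two versions might look like the sketch below. It uses only Base and a made-up vector `colnames` in place of `names(df)`, assuming the names are Symbols as in the rename! line earlier in the thread; the resulting vector of pairs is the kind of argument that line passes to rename!.

```julia
# Hypothetical column names standing in for names(df).
colnames = [Symbol("Parking Tax"), :col2, Symbol("Street Name")]

# Version 1: explicit iteration, building the old => new pairs one by one.
pairs_loop = Pair{Symbol,Symbol}[]
for n in colnames
    new_name = Symbol(replace(string(n), " " => "_"))
    push!(pairs_loop, n => new_name)
end

# Version 2: the same pairs written as a comprehension.
pairs_comp = [n => Symbol(replace(string(n), " " => "_")) for n in colnames]
```

Both vectors hold the same old => new pairs, so a reader can match each part of the comprehension to a line of the loop.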


#28

rename! can also take a function if you want to apply the same transformation to all names:

julia> rename!(df) do n
           s = replace(string(n), ' ' => '_')
           Symbol(s)
       end

which is not a one liner but is probably a bit easier to parse visually.

Overall it feels like renaming columns is the best solution as it also offers this nice extra exercise.


#29

I don’t have a better idea than what was posted here already… At the end of the day this boils down to how one can interact with named tuples in Julia.

I do think a macro à la s"foo bar" that is equivalent to Symbol("foo bar") would be nice, but that should probably be done (if at all) in Base…


#30

FWIW, you can also do CSV.File(..., normalizenames=true) on import to avoid such names.


#31

What about adding a getproperty method for strings, so that foo."bar baz" Just Works?


#32

I like that idea, but it would have to be implemented in Base for NamedTuple; otherwise it would be a bad case of type piracy.
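
To illustrate the piracy point: defining the string method on a type you own is unproblematic, while defining it on NamedTuple from outside Base would not be. The sketch below uses a hypothetical wrapper type `Row` (not part of Query.jl, DataFrames, or Base) to show foo."bar baz" working once such a method exists.

```julia
# Hypothetical wrapper type that we own, so adding methods for it
# is not type piracy.
struct Row{T<:NamedTuple}
    data::T
end

# Forward Symbol access to the wrapped named tuple.
# getfield is used to reach the real field, since getproperty is overridden.
Base.getproperty(r::Row, name::Symbol) = getproperty(getfield(r, :data), name)

# String access: r."some name" lowers to getproperty(r, "some name").
Base.getproperty(r::Row, name::String) = getproperty(getfield(r, :data), Symbol(name))

row = Row(NamedTuple{(Symbol("Parking Tax"), :col2)}((true, 2)))

tax = row."Parking Tax"   # dispatches to the String method
col = row.col2            # dispatches to the Symbol method
```

The same two-line method written for NamedTuple itself would change behaviour for every package that uses named tuples, which is why it would belong in Base.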