How to select all columns except one in a DataFrame?


#1

Hello,
Is there a way to select all columns of a DataFrame except a single one?
Below is my DataFrame:

10×5 DataFrames.DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species  │
├─────┼─────────────┼────────────┼─────────────┼────────────┼──────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ "setosa" │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ "setosa" │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ "setosa" │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ "setosa" │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ "setosa" │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ "setosa" │
│ 7   │ 4.6         │ 3.4        │ 1.4         │ 0.3        │ "setosa" │
│ 8   │ 5.0         │ 3.4        │ 1.5         │ 0.2        │ "setosa" │
│ 9   │ 4.4         │ 2.9        │ 1.4         │ 0.2        │ "setosa" │
│ 10  │ 4.9         │ 3.1        │ 1.5         │ 0.1        │ "setosa" │

Say I want all columns except :PetalWidth. Is there a way to do this without writing out all the column names other than :PetalWidth, as below?

iris_1 = iris[[:SepalLength,:SepalWidth,:PetalLength,:Species]]


#2
julia> using DataFrames, RDatasets

julia> df = dataset("datasets", "iris");

julia> df[filter(x -> x != :PetalWidth, names(df))]
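The indexing above comes from an older DataFrames.jl release, where `names(df)` returned `Symbol`s and `df[cols]` selected columns. A minimal sketch of the same selection, assuming a recent DataFrames.jl (1.x), where `names(df)` returns `String`s and columns are selected with `df[:, cols]`. The small hand-built table is an illustrative stand-in for the RDatasets iris data:

```julia
using DataFrames

# Illustrative stand-in for the iris data (first two rows only).
df = DataFrame(SepalLength = [5.1, 4.9], SepalWidth = [3.5, 3.0],
               PetalLength = [1.4, 1.4], PetalWidth = [0.2, 0.2],
               Species = ["setosa", "setosa"])

# In recent versions names(df) returns Strings, so compare against a String.
keep = filter(x -> x != "PetalWidth", names(df))
iris_1 = df[:, keep]

# The Not selector expresses "everything except" without building the list.
iris_2 = df[:, Not(:PetalWidth)]

names(iris_1)  # ["SepalLength", "SepalWidth", "PetalLength", "Species"]
```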

#3

Thank you.


#4

Another, slightly shorter solution:

df[setdiff(names(df), [:PetalWidth])]
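For unique column names, `setdiff` and the `filter` version from #2 give the same list in the same order, because `setdiff` keeps the order of its first array argument. A base-Julia sketch on a plain vector of `Symbol`s (the vector is illustrative, standing in for `names(df)` in the older API):

```julia
# Illustrative column list, standing in for names(df) in the older API.
cols = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth, :Species]

via_filter  = filter(x -> x != :PetalWidth, cols)
via_setdiff = setdiff(cols, [:PetalWidth])

# setdiff also drops several names at once, still in the original order.
drop_two = setdiff(cols, [:PetalWidth, :PetalLength])
# [:SepalLength, :SepalWidth, :Species]
```

One difference is that `setdiff` also removes duplicate entries, which does not matter here since DataFrame column names are unique.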

#5

Good to have an alternative method. Thank you.


#6

Is there a way to do this using Regex with a DataFrame?

It’s a bit difficult since, as far as I can tell, you can’t apply a regex to a Symbol; you have to convert all the column names to strings first. DataFramesMeta doesn’t have that functionality, but it should probably live there. Something like:

select(df, :x1, :x2, r"^y")

would check whether each argument is a Regex, find all the names in df matching that regex, collect them together with the other arguments, and then call df[args].

The mix of Symbol and Regex is meant to emulate Stata, where you can write

keep id_variable x1 x2 y*
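A minimal sketch of the expansion described above, in base Julia and operating on a vector of `Symbol` column names. `expand_selectors` is a hypothetical helper name, not part of DataFramesMeta or DataFrames. Symbols are kept as-is, and each `Regex` is matched against the names converted to strings:

```julia
# Hypothetical helper: turn a mix of Symbols and Regexes into a list of column names.
function expand_selectors(colnames::Vector{Symbol}, args::Union{Symbol,Regex}...)
    selected = Symbol[]
    for arg in args
        if arg isa Regex
            # A Regex cannot match a Symbol directly, so match on String(name).
            append!(selected, filter(n -> occursin(arg, String(n)), colnames))
        else
            push!(selected, arg)
        end
    end
    # Drop repeats, keeping each name at its first position.
    return unique(selected)
end

colnames = [:id_variable, :x1, :x2, :x3, :y1, :y2]
expand_selectors(colnames, :id_variable, :x1, :x2, r"^y")
# [:id_variable, :x1, :x2, :y1, :y2]
```

The returned vector is what would then be passed on as `df[args]`.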

#7

If anybody is up for implementing this, it would probably be good to be consistent with the JuliaDB syntax for these operations:

http://juliadb.org/latest/api/selection.html#Column-special-selection-1

For regexes it is select(t, r"^y"); I’ve only now noticed that it isn’t documented.
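For comparison, recent DataFrames.jl versions (1.x) provide a `select` that accepts a mix of `Symbol` and `Regex` column selectors, and `Cols` combines selectors for indexing. A sketch assuming DataFrames.jl 1.x, with made-up data using the column names from the Stata example in #6:

```julia
using DataFrames

# Made-up data with the column names from the Stata example.
df = DataFrame(id_variable = 1:3, x1 = rand(3), x2 = rand(3), x3 = rand(3),
               y1 = rand(3), y2 = rand(3))

# Rough equivalent of Stata's `keep id_variable x1 x2 y*`.
kept = select(df, :id_variable, :x1, :x2, r"^y")

# The same selection through indexing, combining selectors with Cols.
kept2 = df[:, Cols(:id_variable, :x1, :x2, r"^y")]

names(kept)  # ["id_variable", "x1", "x2", "y1", "y2"]
```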


#8

Thanks, I’ll look at this.

BTW, I’m still confused: is there a manifesto somewhere that delineates the use cases of JuliaDB and DataFrames? It feels like JuliaDB is uniformly upstream of DataFrames, but it’s not clear how.