Question about comprehensions

I have found Julia’s comprehensions to be one of its most useful features, but I am running into a problem which I hope someone can help me out with.

Let’s say I have an array of size N x 10 in a variable named data, where N is very large. It looks like this:

No.  Time        dt          Fx         Fy
0    0           0.00454545  0          0
1    0.00454545  0.00454545  -0.889309  0.016332

Then, using the following one-liner, I can extract those values of Fx and Fy for which the time obeys certain conditions, e.g., that it be greater than 100:

[data[i,4:5] for i in 1:length(data[:,1]) if data[i,2] > 100]

However, this gives me an array of arrays instead of an n x 2 array:

[1.49301, 1.12155]
 [1.49276, 1.12132]
 [1.49267, 1.12127]
...

The problem with this is that I would then like to average both Fx and Fy along the time dimension. However, the formulation mean(A,dims=1) requires A to be a 2D array, not a 1D array of arrays. So what I’d like is to be able to do something like this:

[mean(data[i,4:5],dims=1) for i in 1:length(data[:,1]) if data[i,2] > 100]
and to then get a single 1 x 2 array containing the averages.

I understand that I can easily do this sort of calculation by writing a few more lines of code. But Julia is so close to giving me a single, elegant line of code that will do this calculation in one go! Can anyone help me figure out how to make the output of a comprehension like this be an n x 2 array instead of an array of arrays?

It’s very possible there’s a better way to get what you want, but here is one way that uses a generator rather than an array comprehension per se:


julia> using StatsBase

julia> a = rand(100, 2);

julia> mean(a[i, :] for i in axes(a, 1) if a[i, 2] > 0.5)
2-element Array{Float64,1}:
 0.5259827860575703
 0.7700552962146116
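
For the literal question of getting an n x 2 array rather than an array of arrays, here is a minimal sketch. It assumes the column layout from the question (column 2 is Time, columns 4 and 5 are Fx and Fy); the small `data` matrix is made up purely for illustration.

```julia
using Statistics

# Made-up stand-in for the real data: columns are No, Time, dt, Fx, Fy
data = [0.0  50.0 0.5 1.0 10.0;
        1.0 150.0 0.5 2.0 20.0;
        2.0 250.0 0.5 4.0 40.0]

# Logical indexing keeps the rows with Time > 100 and returns a Matrix directly
sel = data[data[:, 2] .> 100, 4:5]

# Column-wise average, returned as a 1 x 2 Matrix
avg = mean(sel, dims=1)

# An array of row vectors (as produced by the comprehension) can also be
# stacked into a matrix after the fact
rows = [data[i, 4:5] for i in axes(data, 1) if data[i, 2] > 100]
stacked = permutedims(reduce(hcat, rows))
```

Here `sel` and `stacked` hold the same 2 x 2 values, and `avg` is the 1 x 2 result that `mean(A, dims=1)` would give. Whether the mask or the generator reads better is probably a matter of taste.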

Works like a charm, thank you!

Shouldn’t a simple mean(A) work (without the dims argument)?

You could also consider using a DataFrame for this task, which will let you refer to columns by name and chain operations together:

using DataFrames, Pipe, Statistics
data = DataFrame(randn(100, 5), [:No, :Time, :dt, :Fx, :Fy])

@pipe data |>
    filter(:Time => t -> t>1, _) |> 
    select(_, [:Fx, :Fy]) |>
    mean.(eachcol(_))

or using DataFramesMeta:

using DataFramesMeta
data1 = @linq data |> where(:Time .> 100) |> select(:Fx, :Fy)
mean.(eachcol(data1))

Depending on your use case and taste, one of these might be clearer to read and/or easier to maintain.

The way I was doing it, mean was being applied to the individual rows of data one by one, and that’s why something like @tomerarnon’s method was needed. But you’re right that the dims=1 I had in there was unnecessary.
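
To illustrate the difference on a small made-up example (the `rows` values are hypothetical, not taken from the real data): inside the comprehension, `mean` only ever sees one row at a time, whereas handing the whole generator to `mean` averages across rows.

```julia
using Statistics

# Hypothetical (Fx, Fy) rows that passed the time filter
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

# mean inside the comprehension: one number per row (Fx averaged with Fy)
per_row = [mean(r) for r in rows]

# mean over the whole collection: element-wise average across rows
across = mean(r for r in rows)

# For a vector of vectors, calling mean without dims gives the same result
direct = mean(rows)
```

With these values, `per_row` has three entries (one per row), while `across` and `direct` are both the 2-element vector of column averages.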

I didn’t know that’s what DataFrames are for! Thanks, this is probably a better way to manipulate my data. I also don’t really know how pipes work, so I’ll have to look into that…
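
For reference on pipes: in plain Julia, `x |> f` is another way of writing `f(x)`, so a chain of steps reads left to right. A small sketch with made-up values follows. Judging from the example earlier in the thread, the `@pipe` macro builds on this by letting `_` mark where the piped value goes in a call that takes several arguments.

```julia
using Statistics

times = [50.0, 150.0, 250.0]    # made-up values

# x |> f is the same as f(x)
a = mean(times)
b = times |> mean

# Anonymous functions allow steps that need extra arguments
c = times |> (t -> filter(>(100), t)) |> mean
```

Here `a` and `b` are identical, and `c` is the mean of only those entries greater than 100.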