I have found Julia’s comprehensions to be one of its most useful features, but I am running into a problem which I hope someone can help me out with.

Let’s say I have an array of size N x 10 in a variable named `data`, where N is very large. It looks like this:

``````
No.  Time        dt          Fx         Fy
0    0           0.00454545  0          0
1    0.00454545  0.00454545  -0.889309  0.016332
``````

Then, using the following one-liner, I can extract those values of `Fx` and `Fy` for which the time obeys certain conditions, e.g., that it be greater than 100:

`[data[i,4:5] for i in 1:length(data[:,1]) if data[i,2] > 100]`

However, this gives me an array of arrays instead of an n x 2 array:

``````
[1.49301, 1.12155]
[1.49276, 1.12132]
[1.49267, 1.12127]
...
``````

The problem with this is that I would then like to average both `Fx` and `Fy` along the time dimension, and the formulation `mean(A, dims=1)` requires `A` to be a 2D array, not a 1D array of arrays. So what I'd like is to be able to do something like this:

`[mean(data[i,4:5],dims=1) for i in 1:length(data[:,1]) if data[i,2] > 100]`
and then get a single 1 x 2 array containing the averages.
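One way to turn the comprehension's output into an n x 2 matrix is to make each row vector a 1 x 2 row with `permutedims` and stack the rows with `reduce(vcat, ...)`; after that, `mean(..., dims=1)` gives the 1 x 2 result. This is a minimal sketch: the small `data` matrix below is a made-up stand-in with the same column layout, not real output.

```julia
using Statistics

# Made-up stand-in for `data`; columns are No., Time, dt, Fx, Fy
data = [0.0    0.0  0.1  0.0  0.0
        1.0  150.0  0.1  1.0  4.0
        2.0  200.0  0.1  3.0  6.0]

# The comprehension from the question: a Vector of 2-element vectors
rows = [data[i, 4:5] for i in axes(data, 1) if data[i, 2] > 100]

# permutedims turns each 2-element vector into a 1x2 row; vcat stacks the rows into n x 2
F = reduce(vcat, permutedims.(rows))

# Column means: a 1x2 matrix holding the mean Fx and the mean Fy
Fmean = mean(F, dims=1)
```

On Julia 1.9 or later, `stack(rows; dims=1)` should build the same n x 2 matrix.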

I understand that I can easily do this sort of calculation by writing a few more lines of code. But Julia is so close to giving me a single, elegant line of code that will do this calculation in one go! Can anyone help me figure out how to make the output of a comprehension like this be an n x 2 array instead of an array of arrays?

It's very possible there's a better way to get what you want, but here is one way that uses a generator rather than an array comprehension per se:

``````
julia> using StatsBase

julia> a = rand(100, 2);

julia> mean(a[i, :] for i in axes(a, 1) if a[i, 2] > 0.5)
2-element Array{Float64,1}:
0.5259827860575703
0.7700552962146116
``````
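Another option, not mentioned in the thread, is logical (Boolean-mask) indexing: indexing the rows with a `BitVector` returns an ordinary matrix, so `mean(..., dims=1)` applies directly. A minimal sketch, with `a` built the same way as in the reply above, comparing the two results:

```julia
using Statistics

a = rand(100, 2)

# Boolean mask over rows: true where the second column exceeds 0.5
mask = a[:, 2] .> 0.5

# Indexing rows with the mask yields an n x 2 Matrix, so dims=1 gives a 1x2 row of column means
m_matrix = mean(a[mask, :], dims=1)

# The generator version, for comparison; it returns a 2-element Vector
m_gen = mean(a[i, :] for i in axes(a, 1) if a[i, 2] > 0.5)
```

Applied to the question's layout, the same idea reads `mean(data[data[:, 2] .> 100, 4:5], dims=1)`, which returns the 1 x 2 array of averages in one line.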

Works like a charm, thank you!

Shouldn’t a simple `mean(A)` work (without the `dims` argument)?
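Without `dims`, `mean` adds up the elements and divides by their count, and because `+` and `/` are defined for vectors, a vector of 2-element vectors is averaged element by element. A small sketch with made-up values:

```julia
using Statistics

# A Vector of 2-element vectors, like the comprehension's output (values made up)
rows = [[1.0, 4.0], [3.0, 6.0], [5.0, 8.0]]

# No dims argument: sums the three vectors and divides by 3, giving a 2-element Vector
m = mean(rows)
```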


You could also consider using a `DataFrame` for this task, which will let you refer to columns by name and chain operations together:

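``````
using DataFrames, Pipe, Statistics  # Statistics provides `mean`

# Random example data with named columns
data = DataFrame(randn(100, 5), [:No, :Time, :dt, :Fx, :Fy])

@pipe data |>
    filter(:Time => (t -> t > 1), _) |>  # keep rows where Time > 1
    select(_, [:Fx, :Fy]) |>             # keep only the force columns
    mean.(eachcol(_))                    # [mean of Fx, mean of Fy]
``````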

or using `DataFramesMeta`:

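``````
using DataFramesMeta, Statistics  # Statistics provides `mean`

# Keep rows with Time > 100, then keep only the Fx and Fy columns
data1 = @linq data |> where(:Time .> 100) |> select(:Fx, :Fy)
mean.(eachcol(data1))  # [mean of Fx, mean of Fy]
``````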

Depending on your use case and taste, one of these might be clearer to read and/or easier to maintain.


The way I was doing it, `mean` was being applied to the individual rows of `data` one by one, and that's why something like @tomerarnon's method was needed. But you're right that the `dims=1` I had in there was unnecessary.
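The difference shows up on a small made-up matrix: putting `mean` inside the comprehension averages `Fx` and `Fy` together within each row, whereas averaging the selected rows as a whole gives one mean per column. A minimal sketch with hypothetical values:

```julia
using Statistics

# Made-up stand-in for `data`; columns 4 and 5 play the role of Fx and Fy
data = [1.0  150.0  0.1  1.0  4.0
        2.0  200.0  0.1  3.0  6.0]

# mean inside the comprehension: one number per row, mixing Fx with Fy
per_row = [mean(data[i, 4:5]) for i in axes(data, 1) if data[i, 2] > 100]

# mean over the selected rows: one number per column, i.e. mean Fx and mean Fy
per_column = mean(data[i, 4:5] for i in axes(data, 1) if data[i, 2] > 100)
```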

I didn’t know that’s what `DataFrames` are for! Thanks, this is probably a better way to manipulate my data. I also don’t really know how pipes work, so I’ll have to look into that…
