Question about comprehensions

EmadMasroor · September 14, 2020, 4:31pm

I have found Julia’s comprehensions to be one of its most useful features, but I am running into a problem which I hope someone can help me out with.

Let’s say I have an array of size N x 10 in a variable named data, where N is very large. It looks like this:

No.	Time	dt	Fx	Fy
0	0	0.00454545	0	0
1	0.00454545	0.00454545	-0.889309	0.016332
…	…	…	…	…

Then, using the following one-liner, I can extract those values of Fx and Fy for which the time obeys certain conditions, e.g., that it be greater than 100:

[data[i,4:5] for i in 1:length(data[:,1]) if data[i,2] > 100]

However, this gives me an array of arrays instead of an n x 2 array:

[1.49301, 1.12155]
 [1.49276, 1.12132]
 [1.49267, 1.12127]
...

The problem with this is that I would then like to average both Fx and Fy along the time dimension, and the formulation mean(A,dims=1) requires A to be a 2D array, not a 1D array of arrays. So what I’d like is to be able to do something like this:

[mean(data[i,4:5],dims=1) for i in 1:length(data[:,1]) if data[i,2] > 100]
and then get a single 1 x 2 array containing the averages.

I understand that I can easily do this sort of calculation by writing a few more lines of code. But Julia is so close to giving me a single, elegant line of code that will do this calculation in one go! Can anyone help me figure out how to make the output of a comprehension like this be an n x 2 array instead of an array of arrays?

tomerarnon · September 14, 2020, 4:45pm

It’s very possible there’s a better way to get what you want, but here is one way. It uses a generator rather than an array comprehension per se:


julia> using StatsBase

julia> a = rand(100, 2);

julia> mean(a[i, :] for i in axes(a, 1) if a[i, 2] > 0.5)
2-element Array{Float64,1}:
 0.5259827860575703
 0.7700552962146116
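
For anyone who does want the n x 2 matrix asked for in the question, here is a minimal sketch using only Base and the Statistics standard library. The small `data` matrix is a made-up stand-in for the real file, and the names `keep`, `sub`, `rows` and `stacked` are hypothetical, chosen just for illustration.

```julia
using Statistics

# Hypothetical stand-in for the real data: columns are No., Time, dt, Fx, Fy.
data = [0.0   0.0 0.5 0.0 0.0;
        1.0  50.0 0.5 1.0 2.0;
        2.0 150.0 0.5 3.0 4.0;
        3.0 200.0 0.5 5.0 8.0]

# Logical indexing keeps the result a matrix: one row per match, two columns.
keep = data[:, 2] .> 100
sub = data[keep, 4:5]
col_means = mean(sub, dims=1)    # 1 x 2 matrix of column means

# The comprehension from the question yields a vector of vectors instead;
# it can be stacked into the same matrix afterwards.
rows = [data[i, 4:5] for i in axes(data, 1) if data[i, 2] > 100]
stacked = permutedims(reduce(hcat, rows))

# mean over a vector (or generator) of vectors averages elementwise.
vec_means = mean(rows)           # 2-element vector
```

The logical-indexing form avoids building the intermediate array of arrays at all, which is probably the simpler route when the filter depends on a single column.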

EmadMasroor · September 14, 2020, 4:48pm

Works like a charm, thank you!

lhnguyen-vn · September 14, 2020, 4:51pm

Shouldn’t a simple mean(A) work (without the dims argument)?
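
Which call fits depends on what A holds. As a quick check of what each form returns, here is a small sketch; the matrix `A` and the vector `rows` are invented for illustration only.

```julia
using Statistics

A = [1.0 10.0;
     3.0 30.0]

# No dims on a matrix: a single scalar over all entries.
overall = mean(A)

# dims=1 on a matrix: one mean per column, returned as a 1 x 2 matrix.
per_column = mean(A, dims=1)

# A vector of row vectors, like the comprehension in the question produces.
rows = [A[i, :] for i in axes(A, 1)]

# No dims on a vector of vectors: elementwise mean, a 2-element vector.
elementwise = mean(rows)

# mean of one row collapses both columns of that row into one number.
single_row = mean(A[1, :])
```

So dropping dims works on the array of arrays, while on a plain 2D array it would merge both columns into one number.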

ElOceanografo · September 14, 2020, 5:07pm

You could also consider using a DataFrame for this task, which will let you refer to columns by name and chain operations together:

using DataFrames, Pipe, Statistics  # Statistics provides mean
data = DataFrame(randn(100, 5), [:No, :Time, :dt, :Fx, :Fy])

@pipe data |>
    filter(:Time => t -> t>1, _) |> 
    select(_, [:Fx, :Fy]) |>
    mean.(eachcol(_))

or using DataFramesMeta:

using DataFramesMeta
data1 = @linq data |> where(:Time .> 100) |> select(:Fx, :Fy)
mean.(eachcol(data1))

Depending on your use case and taste, one of these might be clearer to read and/or easier to maintain.

EmadMasroor · September 14, 2020, 5:20pm

The way I was doing it, mean was being applied to the individual rows of data one by one, and that’s why something like @tomerarnon’s method was needed. But you’re right that the dims=1 I had in there was unnecessary.

EmadMasroor · September 14, 2020, 5:22pm

I didn’t know that’s what DataFrames are for! Thanks, this is probably a better way to manipulate my data. I also don’t really know how pipes work, so I’ll have to look into that…
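
On pipes, a minimal sketch using only Base Julia: `x |> f` is another way of writing `f(x)`, so a chain of `|>` passes each result to the next function. The `data` matrix below is a made-up stand-in. As far as I understand it, the `@pipe` macro from Pipe.jl adds the `_` placeholder on top of this, so the previous result can go into any argument position; that part is not shown here.

```julia
using Statistics

# Hypothetical stand-in for the real data: columns are No., Time, dt, Fx, Fy.
data = [0.0   0.0 0.5 0.0 0.0;
        1.0  50.0 0.5 1.0 2.0;
        2.0 150.0 0.5 3.0 4.0;
        3.0 200.0 0.5 5.0 8.0]

# `x |> f` calls f(x).
root = 16.0 |> sqrt

# Each stage is a one-argument function; parentheses keep the stages separate.
result = data |>
    (d -> d[d[:, 2] .> 100, 4:5]) |>   # keep rows with Time > 100, columns Fx and Fy
    (m -> mean(m, dims=1))             # column means as a 1 x 2 matrix
```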
