Hello I’m new to Julia and
`randn(N)` returns a vector of
N standard normal random numbers. (By the way, you can quote code by putting it between backticks ```).
For the second part, have you seen the `count` function in Julia? `count(p, xs)` takes a list `xs` of elements, applies a predicate function `p` to each element, and returns the number of elements for which `p` returns `true`. The function `p` could be written something like `p = x -> (x > threshold)`, where `threshold` is a variable. The comparison `x > threshold` returns a boolean, which could be what you need.
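To make the `count(p, xs)` pattern concrete, here is a minimal sketch. The data in `xs` and the value of `threshold` are made up purely for illustration.

```julia
# Illustrative data; any collection of numbers works.
xs = [-2.5, -0.3, 0.0, 1.2, 2.1, 3.0]
threshold = 1.96

# Predicate: returns true for elements above the threshold.
p = x -> x > threshold

# count applies p to every element and returns how many times it was true.
n_above = count(p, xs)
println(n_above)
```

Only `2.1` and `3.0` exceed the threshold here, so the predicate is true for two elements.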
I’m sorry, I didn’t specify that a hint suggests using an element-wise logical comparison by adding a . (dot) to the logical operator. Any idea how to do that?
You might want to look at section 5.4.1 of the Julia manual
It doesn’t say anything about how to compute the values above and below 1.96 in a standard normal distribution using the dot to access the array of random numbers
This has a dot in it:
    v0.5.0> a = randn(1000);

    v0.5.0> length(a[a .< -1.96])
    24
But that’s not a good hint. You should do as @felix suggests:
Faster, and more Julian.
I guess the answer is either
count(x -> (x > 1.96) || (x < -1.96), x) or
countnz((x .> 1.96) | (x .< -1.96)) if you like the dot notation. But as @felix and @DNF said, the first version is more Julian.
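For anyone reading this on a newer Julia: as far as I know, `countnz` was removed in Julia 1.0, and `|` between arrays has to be written `.|`. The sketch below assumes Julia 1.x and shows both the predicate form and the dot form side by side. The seed and the sample size are arbitrary choices, there only to make the run repeatable.

```julia
using Random

Random.seed!(1)          # arbitrary seed, only so the run is repeatable
x = randn(1000)

# Predicate form: tests each element in turn, no temporary array.
n_pred = count(v -> v > 1.96 || v < -1.96, x)

# Dot form: .> and .< compare element-wise, .| combines the results.
# This builds a temporary Bool array, and count tallies its true entries.
n_dot = count((x .> 1.96) .| (x .< -1.96))

# Same test written with abs, since the two thresholds are symmetric.
n_abs = count(v -> abs(v) > 1.96, x)

println((n_pred, n_dot, n_abs))
```

All three should give the same count for the same `x`. The predicate form is the one that avoids the temporary array.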
I am not sure whether this
creates an additional array.
As far as I can tell, it creates two extra arrays(!)
You can also do
sum(randn() > 1.96 for i in 1:N)
(on Julia 0.5 or later).
This uses a generator that does not actually ever create the array of random numbers, and so is more efficient for large
    julia> f1(N) = count(x->x>1.96, randn(N))
    f1 (generic function with 2 methods)

    julia> f2(N) = sum(randn() > 1.96 for i in 1:N)
    f2 (generic function with 2 methods)

    julia> f3(N) = sum(randn(N) .> 1.96)
    f3 (generic function with 1 method)

    julia> f4(N) = (a = randn(N); length(a[a .< -1.96]))
    f4 (generic function with 1 method)

    julia> @time f1(10^8)
      1.742257 seconds (8 allocations: 762.940 MB, 5.10% gc time)
    2499801

    julia> @time f2(10^8)
      0.879822 seconds (8 allocations: 256 bytes)
    2502178

    julia> @time f3(10^8)
      1.301290 seconds (73.26 k allocations: 778.590 MB, 7.58% gc time)
    2501985

    julia> @time f4(10^8)
      1.533186 seconds (73.26 k allocations: 797.675 MB, 1.40% gc time)
    2501473
Note that, on my machine at least, the original suggestion (my
f4) is faster than
A reminder that for these kinds of performance tests, everything should be in a function, and timed only on the second run. Also, I should really be using BenchmarkTools.jl for this.
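A rough sketch of that timing advice, using only `@elapsed` from Base rather than BenchmarkTools.jl. The function names `count_array` and `count_gen` are placeholders I made up, and `N` is kept much smaller than in the runs above so it finishes quickly. Treat the numbers it prints as machine-dependent.

```julia
# Two of the variants from the thread, each wrapped in a function.
count_array(N) = count(x -> x > 1.96, randn(N))      # allocates an N-element array
count_gen(N)   = sum(randn() > 1.96 for i in 1:N)    # generator, no array of samples

N = 10^6   # smaller than in the thread, to keep the run short

# The first call includes compilation time, so run each once and discard it.
count_array(N)
count_gen(N)

# The second call measures only the run time.
t_array = @elapsed count_array(N)
t_gen   = @elapsed count_gen(N)
println("array: ", t_array, " s, generator: ", t_gen, " s")
```

The two functions draw different random samples, so their counts will differ slightly from run to run. Only the timings are being compared.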
It’s even slightly faster to do
f4(N) = sum(i -> randn() > 1.96, 1:N)
on my machine.
Keep in mind that suggestion
f1 assumed that the array
a already existed.
Edit: still that is an extremely weird result.
True. Here is the original version:
    g1(r) = count(x->x>1.96, r)
    g2(r) = length(r[r .> 1.96])
    g3(r) = sum(r .> 1.96)
    g4(r) = sum(i > 1.96 for i in r)
    g5(r) = sum(x->x>1.96, r)

    function run_bench(N)
        r = randn(N)
        @time g1(r)
        @time g2(r)
        @time g3(r)
        @time g4(r)
        @time g5(r)
    end
0.003731 seconds (745 allocations: 361.594 KB)
0.001243 seconds (741 allocations: 164.734 KB)
0.000444 seconds (2 allocations: 32 bytes)
0.388841 seconds (73.25 k allocations: 34.730 MB)
0.195034 seconds (73.25 k allocations: 15.650 MB, 35.30% gc time)
0.078795 seconds (2 allocations: 32 bytes)
I have opened an issue to deprecate `count` in favour of `sum`: https://github.com/JuliaLang/julia/issues/20663
Yeah, there’s clearly something wrong with the implementation of
count. I do think that it is semantically different from
sum, and should not be deprecated.
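One way to see the semantic difference, as a small sketch assuming Julia 1.x behaviour. The arrays `flags` and `nums` are made-up examples. As far as I understand it, `count` only accepts boolean values, while `sum` adds whatever numbers it is given.

```julia
flags = [true, false, true, true]
nums  = [1, 0, 2, 1]

println(count(flags))   # number of true entries
println(sum(flags))     # same number here, because true is treated as 1
println(sum(nums))      # adds the numbers themselves

# count expects Bool values, so calling it on integers is expected to
# throw an error instead of returning a total.
count_accepts_ints = try
    count(nums)
    true
catch
    false
end
println(count_accepts_ints)
```

So `sum` happens to agree with `count` on boolean input, but the two are not interchangeable in general.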