Hello, I have these simple functions, one with @simd and one without:

    @noinline function msum(x)
        s=zero(eltype(x))
        for i in 1:length(x)
            @inbounds s+=x[i]
        end
        s
    end

    @noinline function vsum(x)
        s=zero(eltype(x))
        @simd for i in 1:length(x) …

It seems that your msum IS being vectorized automatically for vector inputs, which is why you don’t see a speed-up compared to vsum: both are vectorized. The timings you got also suggest that. EDIT: I misread the results; the timings do not suggest what I thought.

That is incorrect. Notice that the two examples sum arrays of the same length (one is a vector masquerading as an N×1 array), and you get a speedup factor of two on the array but not on the vector.
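To make the "same length" point concrete, here is a minimal sketch (not taken from the thread). `plain_sum` is a hypothetical stand-in for the poster's msum, and `reshape` is used so that the vector and the N×1 matrix hold exactly the same data. It assumes Julia 0.7 or later.

```julia
# Hypothetical stand-in for the thread's msum: a plain loop over linear indices.
function plain_sum(x)
    s = zero(eltype(x))
    for i in 1:length(x)
        @inbounds s += x[i]
    end
    return s
end

N  = 10_000
v  = rand(N)            # Vector{Float64} with size (N,)
vm = reshape(v, N, 1)   # Matrix{Float64} with size (N, 1), sharing v's data

println(typeof(v), " with size ", size(v))
println(typeof(vm), " with size ", size(vm))

# Both containers hold the same N elements, and the loop visits them in the same order.
println("same length: ", length(v) == length(vm))
println("same sum:    ", plain_sum(v) == plain_sum(vm))
```

Since the work is identical in both cases, any timing difference between the two inputs would have to come from how the loop is compiled for each array type, not from the amount of data.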

That does not change the results, I tried it.

I cannot reproduce (neither on master nor 0.6).

    julia> N=10_000;v=rand(N); vm=rand(N,1); v32=rand(Float32,N); v32m=rand(Float32,N,1);

    julia> begin
           println("F64 vec")
           @btime msum($v); @btime vsum($v);
           println("F64 mat")
           @btime msum($vm); @btime vsum($vm); …

On my system (0.7-alpha):

    F64 vec
      9.114 μs (0 allocations: 0 bytes)
      9.120 μs (0 allocations: 0 bytes)
    F64 mat
      9.113 μs (0 allocations: 0 bytes)
      1.801 μs (0 allocations: 0 bytes)
    F32vec
      9.118 μs (0 allocations: 0 bytes)
      9.115 μs (0 allocations: 0 bytes)
    F32mat
      9.112 μs (0 alloca…

raminammour: That does not change the results, I tried it.

Are you sure?

    @noinline function msum(x)
        s = zero(eltype(x))
        for i in 1:length(x)
            @inbounds s += x[i]
        end
        return s
    end

    @noinline function vsum(x)
        s = zero(eltype(x))
        @simd for i in 1:length(x)…

DNF: Edit: How strange.

Strange indeed! I get different results… Does anyone have a hint on how this can be system dependent?

The best way to check for vectorization is to look at the output of @code_llvm. This is also system dependent, because you need to be compiling for an architecture that has the necessary instruction set. So for reference, here’s my versioninfo():

    julia> versioninfo()
    Julia Version 0.6.2
    Commit d38…
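The @code_llvm check described above can also be done programmatically. What follows is a minimal sketch rather than anything posted in the thread: `plain_sum` and `simd_sum` are hypothetical stand-ins for msum and vsum (the originals are truncated in the posts), and the helper `has_simd_ir` with its regex is an assumed heuristic, not an official API. It captures the optimized LLVM IR as a string and looks for vector types such as `<4 x double>`. It assumes Julia 0.7 or later, where `code_llvm` lives in InteractiveUtils. What it prints depends on the Julia version and on the CPU being compiled for.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical stand-ins for the thread's msum and vsum.
function plain_sum(x)
    s = zero(eltype(x))
    for i in 1:length(x)
        @inbounds s += x[i]
    end
    return s
end

function simd_sum(x)
    s = zero(eltype(x))
    @simd for i in 1:length(x)
        @inbounds s += x[i]
    end
    return s
end

# Assumed heuristic: optimized IR that uses SIMD contains LLVM vector
# types such as `<4 x double>` or `<8 x float>`.
function has_simd_ir(f, argtypes)
    ir = sprint(code_llvm, f, argtypes)  # calls code_llvm(io, f, argtypes)
    return occursin(r"<\d+ x (double|float)>", ir)
end

# Compare the vector and matrix methods, as in the thread.
for T in (Vector{Float64}, Matrix{Float64})
    println(T, ": plain loop = ", has_simd_ir(plain_sum, (T,)),
               ", @simd loop = ", has_simd_ir(simd_sum, (T,)))
end
```

A `false` here only means that no floating-point vector type appeared in the IR for that method on this machine. Reading the full @code_llvm output is still the more reliable check.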

I see the vectorization instructions for vsum(Array) but no instructions for vsum(Vector). For reference:

    julia> @code_llvm vsum(rand(10))
    ; Function vsum
    ; Location: REPL[32]:2
    define double @julia_vsum_36461(%jl_value_t addrspace(10)* nonnull dereferenceable(40)) {
    top:
    ; Location: REPL[32]:3
    ;…

Help understanding vectorization (or lack thereof)


dpsanders June 7, 2018, 8:00pm 2

You seem to be benchmarking in global scope. Use BenchmarkTools.jl for these kinds of microbenchmarks.


