Memory allocations in for loop variable

Hm. Well, I guess the takeaway is that there’s still more performance to gain then - the only question is how and where we might best do that. As far as I remember, --math-mode=fast does not go for fully aggressive optimization though, so maybe that’s a place to start. There might be some notes and comments on this in various issues on GitHub, but I don’t remember which off the top of my head.

In any case, I think we’ve derailed the original thread quite a bit :sweat_smile:

If I am not mistaken, --math-mode=ieee/fast only controls whether the places where @fastmath is used should really do fast math or not. If you don’t use the macro, even with the fast option, it will still use IEEE arithmetic.

But I’m not entirely sure of that.

Nope, it overrides @fastmath:

--math-mode={ieee,fast} | Disallow or enable unsafe floating point optimizations (overrides @fastmath declaration)

https://docs.julialang.org/en/stable/manual/getting-started/


I’m still not sure whether that means what I thought it meant.
I’ll have to fiddle with Julia to test it.
My understanding of that sentence is that if you use the ieee option, Julia will ignore @fastmath; otherwise it will abide by it, which does not necessarily mean it will use fast math everywhere.

This is my understanding because I believe either ieee or fast is the default. Or is it that the default is neither? --math-mode={ieee/fast/macroonly}, where macroonly is the default?

The default is IEEE, except when annotated with @fastmath, which enables fast math in that part of the code.

When using --math-mode=ieee, @fastmath is ignored and everything is done according to the IEEE Floating Point standard.

When using --math-mode=fast, IEEE is ignored and it’s like @fastmath is put before everything you do, i.e. fastmath everywhere without the need for special annotation via @fastmath. That’s my understanding at least. Otherwise, why annotate with @fastmath in the first place when you don’t use any flags?
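To make the three cases a bit more concrete, here is a minimal sketch of how one could check this from the REPL. The function names plain_madd and fast_madd are made up for illustration. Base.JLOptions() is an internal, unexported API, and the meaning I attach to its fast_math field (0 for no flag, 1 for fast, 2 for ieee) is my assumption from reading the sources, so treat it as a hint rather than a documented interface.

```julia
# Hypothetical helper names, used only for illustration.
plain_madd(x, y, z) = x * y + z            # no annotation: follows the session's math mode
fast_madd(x, y, z)  = @fastmath x * y + z  # opts in to fast math for this expression only

# @fastmath works by rewriting operators into their Base.FastMath counterparts,
# so the expanded expression should contain calls to mul_fast and add_fast.
ex = @macroexpand @fastmath x * y + z
println(ex)

# Internal API (assumption): which --math-mode flag the session was started with.
# Assumed encoding: 0 = no flag given, 1 = fast, 2 = ieee.
mode = Base.JLOptions().fast_math
println("fast_math option: ", mode)
```

Starting Julia with each of the flags and comparing the printed option with the @code_native output of the two functions should show which of them is affected by the flag.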

For a little more info, check here.


That’s exactly my understanding too.

@fastmath is meant to be used when you don’t specify --math-mode=fast but still you want some portions of the code to follow --math-mode=fast while the remaining parts are kept as default, --math-mode=ieee.
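As a rough sketch of that use case (sum_ieee and sum_fast are hypothetical names, not anything from Base), you can annotate just one loop and leave the rest of the program on the default IEEE semantics:

```julia
# Plain reduction: additions happen strictly in order, per IEEE rules.
function sum_ieee(v)
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end

# Same reduction, but only this loop opts in to fast math.
# @fastmath rewrites `s += x` into a Base.FastMath addition, which
# permits the compiler to reassociate (and possibly vectorize) the sum.
function sum_fast(v)
    s = zero(eltype(v))
    @fastmath for x in v
        s += x
    end
    return s
end

v = collect(1.0:100.0)
println(sum_ieee(v), " ", sum_fast(v))
```

For inputs like these, where every partial sum is exactly representable, both versions agree. For general floating-point data the results can differ in the last few bits, because the fast version is free to add in a different order.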

Yep, that’s what I think too - it was more of a rhetorical question :slight_smile:

I don’t know much assembly, but I’m still confused by these results :sweat_smile:

Using no flag:

> julia

julia> iee(x) = sqrt(x)
iee (generic function with 1 method)

julia> fast(x) = @fastmath sqrt(x)
fast (generic function with 1 method)

julia> @code_native iee(2.)
	.section	__TEXT,__text,regular,pure_instructions
; Function iee {
; Location: REPL[1]:1
; Function sqrt; {
; Location: math.jl:479
; Function <; {
; Location: REPL[1]:1
	pushl	%eax
	vxorps	%xmm1, %xmm1, %xmm1
	vucomisd	%xmm0, %xmm1
;}
	ja	L17
; Location: math.jl:480
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	popl	%eax
	retl
; Function sqrt; {
; Location: math.jl:479
L17:
	decl	%eax
	movl	$276700656, %eax        ## imm = 0x107E1DF0
	addl	%eax, (%eax)
	addb	%al, (%eax)
	decl	%eax
	movl	$260683632, %edi        ## imm = 0xF89B770
	addl	%eax, (%eax)
	addb	%al, (%eax)
	calll	*%eax
	nopw	(%eax,%eax)
;}}

julia> @code_native fast(2.)
	.section	__TEXT,__text,regular,pure_instructions
; Function fast {
; Location: REPL[2]:1
; Function sqrt_fast; {
; Location: REPL[2]:1
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	retl
	nopw	%cs:(%eax,%eax)
;}

With the fast flag:

> julia --math-mode=fast

julia> iee(x) = sqrt(x)
iee (generic function with 1 method)

julia> fast(x) = @fastmath sqrt(x)
fast (generic function with 1 method)

julia> @code_native iee(2.)
	.section	__TEXT,__text,regular,pure_instructions
; Function iee {
; Location: REPL[1]:1
; Function sqrt; {
; Location: math.jl:479
; Function <; {
; Location: REPL[1]:1
	pushl	%eax
	vxorps	%xmm1, %xmm1, %xmm1
	vucomisd	%xmm0, %xmm1
;}
	ja	L17
; Location: math.jl:480
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	popl	%eax
	retl
; Function sqrt; {
; Location: math.jl:479
L17:
	decl	%eax
	movl	$291515888, %eax        ## imm = 0x11602DF0
	addl	%eax, (%eax)
	addb	%al, (%eax)
	decl	%eax
	movl	$52098928, %edi         ## imm = 0x31AF770
	addl	%eax, (%eax)
	addb	%al, (%eax)
	calll	*%eax
	nopw	(%eax,%eax)
;}}

julia> @code_native fast(2.)
	.section	__TEXT,__text,regular,pure_instructions
; Function fast {
; Location: REPL[2]:1
; Function sqrt_fast; {
; Location: REPL[2]:1
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	retl
	nopw	%cs:(%eax,%eax)
;}

Finally, with the ieee flag:

> julia --math-mode=ieee

julia> iee(x) = sqrt(x)
iee (generic function with 1 method)

julia> fast(x) = @fastmath sqrt(x)
fast (generic function with 1 method)

julia> @code_native iee(2.)
	.section	__TEXT,__text,regular,pure_instructions
; Function iee {
; Location: REPL[1]:1
; Function sqrt; {
; Location: math.jl:479
; Function <; {
; Location: REPL[1]:1
	pushl	%eax
	vxorps	%xmm1, %xmm1, %xmm1
	vucomisd	%xmm0, %xmm1
;}
	ja	L17
; Location: math.jl:480
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	popl	%eax
	retl
; Function sqrt; {
; Location: math.jl:479
L17:
	decl	%eax
	movl	$511352304, %eax        ## imm = 0x1E7A9DF0
	addl	%eax, (%eax)
	addb	%al, (%eax)
	decl	%eax
	movl	$238929776, %edi        ## imm = 0xE3DC770
	addl	%eax, (%eax)
	addb	%al, (%eax)
	calll	*%eax
	nopw	(%eax,%eax)
;}}

julia> @code_native fast(2.)
	.section	__TEXT,__text,regular,pure_instructions
; Function fast {
; Location: REPL[2]:1
; Function sqrt_fast; {
; Location: REPL[2]:1
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	retl
	nopw	%cs:(%eax,%eax)
;}


Edit: An admin should probably split this derailed discussion to a new thread…

Ok, I guess that was a bad example (still odd, though). The following shows that my understanding was indeed wrong and it is as you said:

No flag:

julia> iee(x,y,z) = x*y + z
iee (generic function with 1 method)

julia> fast(x,y,z) = @fastmath x*y + z
fast (generic function with 1 method)

julia> @code_native iee(2.,3.,1.)
	.section	__TEXT,__text,regular,pure_instructions
; Function iee {
; Location: REPL[1]:1
; Function *; {
; Location: REPL[1]:1
	vmulsd	%xmm1, %xmm0, %xmm0
;}
; Function +; {
; Location: float.jl:395
	vaddsd	%xmm2, %xmm0, %xmm0
;}
	retl
	nopl	(%eax)
;}

julia> @code_native fast(2.,3.,1.)
	.section	__TEXT,__text,regular,pure_instructions
; Function fast {
; Location: REPL[2]:1
; Function add_fast; {
; Location: REPL[2]:1
	vfmadd213sd	%xmm2, %xmm1, %xmm0
;}
	retl
	nopw	%cs:(%eax,%eax)
;}

fast flag:

julia> iee(x,y,z) = x*y + z
iee (generic function with 1 method)

julia> fast(x,y,z) = @fastmath x*y + z
fast (generic function with 1 method)

julia> @code_native iee(2.,3.,1.)
	.section	__TEXT,__text,regular,pure_instructions
; Function iee {
; Location: REPL[1]:1
; Function +; {
; Location: REPL[1]:1
	vfmadd213sd	%xmm2, %xmm1, %xmm0
;}
	retl
	nopw	%cs:(%eax,%eax)
;}

julia> @code_native fast(2.,3.,1.)
	.section	__TEXT,__text,regular,pure_instructions
; Function fast {
; Location: REPL[2]:1
; Function add_fast; {
; Location: REPL[2]:1
	vfmadd213sd	%xmm2, %xmm1, %xmm0
;}
	retl
	nopw	%cs:(%eax,%eax)
;}

ieee flag:

julia> iee(x,y,z) = x*y + z
iee (generic function with 1 method)

julia> fast(x,y,z) = @fastmath x*y + z
fast (generic function with 1 method)

julia> @code_native iee(2.,3.,1.)
	.section	__TEXT,__text,regular,pure_instructions
; Function iee {
; Location: REPL[1]:1
; Function *; {
; Location: REPL[1]:1
	vmulsd	%xmm1, %xmm0, %xmm0
;}
; Function +; {
; Location: float.jl:395
	vaddsd	%xmm2, %xmm0, %xmm0
;}
	retl
	nopl	(%eax)
;}

julia> @code_native fast(2.,3.,1.)
	.section	__TEXT,__text,regular,pure_instructions
; Function fast {
; Location: REPL[2]:1
; Function mul_fast; {
; Location: REPL[2]:1
	vmulsd	%xmm1, %xmm0, %xmm0
;}
; Function add_fast; {
; Location: fastmath.jl:161
	vaddsd	%xmm2, %xmm0, %xmm0
;}
	retl
	nopl	(%eax)
;}
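
For anyone wondering why vfmadd213sd versus vmulsd followed by vaddsd matters numerically: the fused instruction rounds once, the separate ones round twice. Here is a small sketch with hand-picked inputs (my own choice, not from the runs above) where the two differ. It assumes the session runs in the default math mode, so that the plain x*y + z is not contracted.

```julia
# Inputs chosen so that the exact product 1 - 2^-60 is not representable
# in Float64 and rounds to 1.0 when the multiplication is done on its own.
x = 1.0 + 2.0^-30
y = 1.0 - 2.0^-30
z = -1.0

# Separate multiply and add: the product is rounded to 1.0 first,
# then adding z gives 0.0.
unfused = x * y + z

# Fused multiply-add: the product is kept exact internally and only
# the final result is rounded, so the tiny remainder survives.
fused = fma(x, y, z)

println("unfused = ", unfused)
println("fused   = ", fused)
```

So when fast math allows the compiler to contract x*y + z into a single fused instruction, the result can change slightly, which is exactly the kind of deviation from the written IEEE operations that the ieee mode forbids.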
