Example C vs Julia vs Python

Wait, how did you get your original code to run? I get an error when I try to run it because the arrays that aren’t declared as constant are out of scope of the functions that try to use them.

Since they are not declared inside a function are they not inherently in the global scope?

That’s probably good enough, but there are some other style guides out there that people like, such as GitHub - invenia/BlueStyle: A Julia style guide that lives in a blue world. I’m not much one for “style guides”, but they’re at least helpful in the beginning before you absorb what the rest of the community is doing.

Generally, I would strongly recommend against all these global variables, and writing your loops either in the form

for pp = 1:SizeZ-1, nn = 1:SizeY-1, mm = 1:SizeX
    # ... 
end

or

for pp = 1:SizeZ-1
    for nn = 1:SizeY-1
        for mm = 1:SizeX
            # ...
        end
    end
end

I’d also never write a for loop in the global scope.

Are you sure you’re running the revised code? I got the same timings you got on the original code and then my edits brought that down to 0.54 seconds. I suspect you might be accidentally running the old code.

I ran the code on a macbook pro 13 inch 2016 with a 2.3 GHz i5 processor and 8GB of 2133 MHz LPDDR3 ram.

2 Likes

Ah, good call. It wasn’t the original code, but it was a block of code that had another issue. Fixed that and I now get the same as you. Thanks. Comparable machines. Is there a way to globally block bound checking without having to do individual @inbound macros on each of the outer for-loops?

You can always startup julia like julia --check-bounds=no to globally turn off bounds checking. I usually prefer to do things like

@inbounds begin
    for pp = 1:SizeZ-1, nn = 1:SizeY-1, mm = 1:SizeX
        Hx[mm,nn,pp] = (Chxh[mm,nn,pp] * Hx[mm,nn,pp] + 
                        Chxe[mm,nn,pp] * ((Ey[mm,nn,pp+1] - Ey[mm,nn,pp]) - (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])))
    end

    for pp = 1:SizeZ-1, nn = 1:SizeY, mm = 1:SizeX-1
        Hy[mm,nn,pp] = (Chyh[mm,nn,pp] * Hy[mm,nn,pp] + 
                        Chye[mm,nn,pp] * ((Ez[mm+1,nn,pp] - Ez[mm,nn,pp]) - (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])))
    end
    
    for pp = 1:SizeZ, nn = 1:SizeY-1, mm = 1:SizeX-1
        Hz[mm,nn,pp] = (Chzh[mm,nn,pp] * Hz[mm,nn,pp] + 
                        Chze[mm,nn,pp] * ((Ex[mm,nn+1,pp] - Ex[mm,nn,pp]) - (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])))
    end
end

if I’m going to be turning off bounds checks on a big chunk of code.

6 Likes

Summary thus far:
Thanks all for the input. I did some experimentation with the various suggestions and my results are posted below. Further input is welcome. However, this progress got me more firmly in the Julia camp. I was beginning to doubt the “runs like C” bit and was starting to develop a C program based on my Julia prototype for another project - glad I can give up on that approach. I suspect my Python code in the original comparison is probably in need of optimization as well, but no time for that.

FDTD Code CPU Time (seconds)								
	    Original	Case A	Case B	Case C	Case D	Case E	Case F	Case G
Compile	    43.6	  33.9	  24.0	   3.7	   2.4	  1.31	  1.20	 0.828
Run 1	    40.9	  33.2	  21.8	   2.9	   2.1	  1.11	  1.09	 0.609
Run 2	    41.7	  32.9	  23.3	   2.8	   2.2	  1.14	  1.11	 0.595
Run 3	    42.6	  32.3	  22.3	   2.5	   2.0	  1.13	  1.08	 0.657
Run 4	    40.1	  32.2	  23.7	   2.8	   2.1	  1.16	  1.11	 0.609
Run 5	    40.5	  33.1	  22.7	   2.9	   2.1	  1.14	  1.12	 0.625
Avg (1-5)	41.2	  32.7	  22.8	   2.8	   2.1	  1.14	  1.10	 0.620
Ratio to C 	96.6	  76.9	  53.4	   6.5	   4.9	  2.7	  2.6	 1.5

  • Original: Written by an EE relatively new to Julia.
  • Case A: =Nested for-loops reordered to process array columns before rows.
  • Case B: =Case A plus define global constants with “const”
  • Case C: =Case B plus move Ce…, Ch… global fill terms within their respective functions.
  • Case D: =Case C plus wrap entire block (other than ‘using…’) in ‘let…end’. Removed ‘const’ from global constants. Ce…, Ch… terms still in respective functions.
  • Case E: =Case A plus ‘let…end’ wrapping all but the ‘using…’ line. Ce… And Ch… terms are not in their respective functions.
  • Case F: =Case E plus ‘@inbound begin…end’ wrapping all but ‘using…’.
  • Case G: =Case E plus ‘@ inbound’ individually on for-loops used to operate on 3D arrays.

The full code block in the first post corresponds to Case B
The full code block in the post by Mason corresponds to Case G

17 Likes

I tried wrapping the whole code block (minus the package import) in ‘@inbounds begin…end’ and didn’t see the same improvement as was evident with application of the macro to individual for-loops. See Case E, F, and G in the summary post.

I’m not sure if the @inbounds macro handles top-level blocks like that. It may need to operate on a function body.

4 Likes

So the pattern I usually use when writing stuff like this is somewhat like this.

function user_facing_function(args...)
    checkbounds(ags...)
    return _nobody_call_this_function_but_me(args...)
end
@inbounds function _nobody_call_this_function_but_me(args...)
   ...
end

But I’m I’m sure there are other (and better) ways of doing this.

3 Likes

Have you tried with the @avx macro? [ANN] LoopVectorization - #37 by Elrod

@inbounds doesn’t work an a function definition. It needs to be inside the function as

function  _nobody_call_this_function_but_me(args...)
    @inbounds begin
        ...
    end
end

What about this?

Sorry, I meant in a function body.

Would it be relatively feasible (and relatively easy) to have a macro that goes through and does standard tricks? e.g.

@makefaster function f(x)
  #   ... macro adds inbounds to all loops
  #  ... also does standard passes that are weakly better
end

And it could potentially have some extra arguments for various passes. For example

@makefaster function f(x), settings = (inbounds=true, avx=true, views=true)
  #   ... macro adds inbounds to all loops
  #  ... also does standard passes that are weakly better
end

Or whatever, which would add in @avx macros, @view, etc. everywhere. Of course, these things are not always faster so being able to toggle is worth the trouble.

This sort of thing would not be a replacement for doing things right, but might be a good way to tell people with minimal julia experience how to get started - and a decent heuristic? Can’t fix issues such as bad type inference problems, but it could add in the mindless annotations.

As discussed with the global scoping, it seems like a way to declare a “script” that tells the compiler it can compile the whole file as a big let would alleviate this common performance issue. Then the heuristic solution to the problem could be to tell people when running as a .jl file to flag it as a script… Remembering to write scripts without globals is pretty tough for people coming from scripting languages.

2 Likes

If such a thing could exist trivially/those optimizations were always doable, don’t you think they would be done already and such a macro wouldn’t be necessary?

The “ugly” truth of @inbounds and friends is that they are not trivial and always allowed - in fact, the docs even discourage it since you’re basically telling the compiler “don’t check for safety here, I know better than you”. The toggle you’re asking for is here, in the form of asking for the dragons and not in the form of making them vanish.

3 Likes

Exactly, the name of such a macro might as well be @makecrashy.

3 Likes

Even in it’s present form, macros like @inbounds are convenient for the inexperienced like myself as they provide a quick way to get some performance benefits from scrappy code. I suppose the trick is to avoid the temptation of letting that be a crutch rather than learning the proper approach.

You might have been joking, but I think that is a great name for that kind of macro. Makes people think twice before trying it. The old @fastmath gave the wrong message entirely. Or @makeunsafe.

No, I wouldn’t think that at all. The compiler could never know when to take off the safety wheels on its own. Those sorts of decisions can never be done automatically.

In effect, though, people are comparing code with safety wheels in Julia with code without it other languages (either directly in C, or sometimes using unsafe C behind a Python interface) … We tell people to look at the performance guide but scripters have trouble doing it for some reason.

So is it really that bad to have a macro called @makecrashy which we can tell people to use to get a sense of whether the safety wheels are slowing them down? Potentially trying a few toggles to see what helps? The end users are lower skilled programmers people with big scripts (rarely organized in functions) , where they are unlikely to be able to scan the code looking for appropriate function and vector annotations.

2 Likes

There already are flags to turn off various things globally, --check-bounds for example.

However, lower skilled programmers don’t need these macros. Their performance problems will come from type instabilities, unnecessary allocations, bad usage of CPU cache, suboptimal algorithms etc. When they are at the point where they write code where @inbounds would matter (basically only when a bounds check would prevent SIMD) they are at the level where they can be properly taught about these macros.

There is no programmer level where a @random_unsafe_operations is valuable. If anything, such a macro would give a wrong impression on how to program.

5 Likes

I fail to see why this couldn’t be done in a cases where you’re iterating over the indexes of an collection that is not modified at all during the loop, e.g.

function compileme(x)
    q = 0
    for i in eachindex(x)
        q += x[i]
    end
end

compileme( (1,2,3,4) )

Since x is never reassigned in the loop and has fixed length, the compiler should know that x[i] will always be inbounds. I’m pretty sure it already does the optimization