Here are things that I’ve done:
- Devectorized my code. I’ve even converted elementwise array multiplications written with `.*` into explicit loops.
- Reduced temporary array creation in the middle of the loops.
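For comparison, here is a minimal sketch of the non-loop alternative I’ve seen recommended, assuming Julia ≥ 1.0. Chained dotted operations fuse into a single loop, and `.=` writes the result into a preallocated array, so no temporary arrays are created. `axpy_fused!` is a hypothetical name, not a function from my code.

```julia
# Computes out[i] = a * x[i] + y[i] for every i, in place.
# The dotted operators fuse into one loop and `.=` writes directly into
# the preallocated `out`, so no intermediate arrays are allocated.
function axpy_fused!(out::AbstractVector, a::Real, x::AbstractVector, y::AbstractVector)
    out .= a .* x .+ y
    return out
end

out = zeros(3)
axpy_fused!(out, 2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # out is now [3.0, 5.0, 7.0]
```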
I’d like to see how much more performance I can squeeze out of my code, and I have a few questions:
- Coming from Python, my code traverses arrays row by row, i.e. in my nested `for` loops the rightmost index varies fastest. Since Julia stores arrays in column-major order, I’m considering changing this. Would it suffice to transpose the arrays I’m using, perform the computation, and then transpose back, or would you recommend rewriting the code for column-wise access?
- My code estimates certain quantities by Markov sampling. Since each iteration is independent, I spread the jobs across several cores when I did similar computations in Python. How would I do the same here? I’m not sure the `@parallel` macro does what I want.
- I’ve added `@inbounds` to skip bounds checks, but it doesn’t seem to have much of an effect. I’m hesitant to use `@fastmath`, as I don’t want to lose precision.
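Regarding loop order, this is roughly what I understand column-major traversal to look like: the inner loop runs over the first (row) index, so consecutive iterations read adjacent memory. `colmajor_dot` is a hypothetical example, not part of my code.

```julia
# Sum of elementwise products of two equally shaped matrices.
# Julia stores matrices column by column, so the inner loop runs over the
# first (row) index and each column is read contiguously.
function colmajor_dot(A::AbstractMatrix, B::AbstractMatrix)
    axes(A) == axes(B) || throw(DimensionMismatch("A and B must have the same axes"))
    s = zero(promote_type(eltype(A), eltype(B)))
    for j in axes(A, 2)          # outer loop: columns
        for i in axes(A, 1)      # inner loop: rows, contiguous in memory
            s += A[i, j] * B[i, j]
        end
    end
    return s
end

colmajor_dot([1 2; 3 4], [5 6; 7 8])  # 1*5 + 3*7 + 2*6 + 4*8 = 70
```

One thing I noticed while reading: `transpose(A)` returns a lazy `Transpose` wrapper rather than rearranging memory, whereas `permutedims(A)` builds a new array with the dimensions swapped.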
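For the sampling question, this is the thread-based pattern I’ve been looking at, as a sketch assuming Julia ≥ 1.7 (for `Xoshiro`) and a session started with several threads, e.g. `julia --threads=4`. Each independent run gets its own RNG seeded by its index, so results don’t depend on which thread executes it. `run_chain` is a hypothetical placeholder (a ±1 random walk), not my actual sampler. As far as I know, `@parallel` was renamed `@distributed` in the `Distributed` standard library as of Julia 1.0; that macro and `pmap` are the multi-process alternatives.

```julia
using Random

# Hypothetical placeholder for one independent Markov-chain run:
# a ±1 random walk of `nsteps` steps, returning the final position.
function run_chain(rng::AbstractRNG, nsteps::Int)
    x = 0
    for _ in 1:nsteps
        x += rand(rng, Bool) ? 1 : -1
    end
    return x
end

# Runs `nchains` independent chains, split across the available threads.
# Each chain writes only to its own slot of `results`, so there is no data race.
function run_all(nchains::Int, nsteps::Int)
    results = Vector{Int}(undef, nchains)
    Threads.@threads for c in 1:nchains
        rng = Xoshiro(c)  # per-chain RNG, deterministic regardless of thread scheduling
        results[c] = run_chain(rng, nsteps)
    end
    return results
end

println("threads available: ", Threads.nthreads())
results = run_all(8, 1_000)
```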
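On `@inbounds`, a minimal sketch of a case where it is safe: the indices come from `eachindex(x)`, which only yields valid indices for `x`. I’ve left `@fastmath` out. `sumsq` is a hypothetical example function.

```julia
# Sum of squares of a vector. Because `eachindex(x)` yields only valid
# indices, skipping bounds checks with @inbounds cannot cause an
# out-of-bounds access here.
function sumsq(x::AbstractVector{<:Real})
    s = zero(eltype(x))
    @inbounds for i in eachindex(x)
        s += x[i]^2
    end
    return s
end

sumsq([1.0, 2.0, 3.0])  # 14.0
```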
I could go ahead and post my code here, but it’s ~500 lines and I’m not sure that would be appropriate.
If you have any other recommendations (such as blog posts explaining how to speed up Julia code, for dummies), they’re more than welcome.