Hey everyone! A while back, when I was learning Python, I read the book “Think Python”, which also has a Julia port. I read the port recently and was wondering about the approach the book describes for writing functions.
Basically, the strategy is that you first write your analysis/program as a script, so you have a nice linear idea of what you’re doing. Then, when you see repetitive code or parts that can be generalized, you turn them into functions. Back with Python (I don’t know if this was the best approach) I used to work with scripts that had a nice linear set of operations in the main body and the functions at the top of the script.
Now in Julia I was doing pretty much the same, except that the functions go into a module while the analysis runs from a script in a separate location. Today I was applying the same strategy (working script -> generalize to function) when I saw a piece of code and thought “hmm, these lines seem a bit clunky, maybe I should also turn them into a function”:
# Getting all the different conditions and their counts so they can be nested (under a subject)
# by name. The count is so that we can preallocate the number of trials
uniqueConditions = unique(stackedConditions)
uniqueCounts = Dict(condition => count(==(condition), stackedConditions)
                    for condition in uniqueConditions)
But if this was going to end up as a function, does it make sense to make subfunctions that will only be used in this one function? And if I do make a subfunction, do I add it to this function the same way as stated?:
That’s a matter of personal preference, but writing small kernel functions is a technique very well suited to Julia (because of parametric multiple dispatch, the way the compiler works, and unit testing when applicable).
I would say yes, it does make sense to extract out subfunctions when they do something well-defined and meaningful to you. And if you choose to do so, then I prefer to make them top-level functions in the module, rather than hiding them inside the definition of another function. Making your functions available in the module makes them easy to test independently, and it makes them easier to reuse later on. Even if a given subfunction is only used in one place right now, there’s a pretty good chance it will become useful somewhere else as your code grows. Having it defined and tested in an accessible place makes that reuse more likely to succeed.
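As a rough sketch of what that could look like for the counting snippet in the question: the helper lives at the top level of a module, so it can be tested on its own. The module name `ConditionTools`, the function name `condition_counts`, and the sample data are made up for illustration; they are not from the original code.

```julia
module ConditionTools  # hypothetical module name

export condition_counts

# Count how often each distinct value occurs in `conditions`.
# A single pass over the data, instead of one `count` call per unique value.
function condition_counts(conditions)
    counts = Dict{eltype(conditions), Int}()
    for c in conditions
        counts[c] = get(counts, c, 0) + 1
    end
    return counts
end

end # module

using .ConditionTools
using Test

# Made-up sample data standing in for `stackedConditions`.
stackedConditions = ["rest", "task", "rest", "rest"]
@test condition_counts(stackedConditions) == Dict("rest" => 3, "task" => 1)
```

Because `condition_counts` is a normal top-level function, the analysis script only needs `using` on the module, and the same test can move into the package's test suite later.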
Personally, I have found that keeping functions short enough to fit entirely on the screen reduces the chance of logic errors. Basically, if I can see the whole function (and its logic) at once, I screw up the logic less. The exception is a function with no (or very small) if blocks and loops: with little to no logic, the function can be as long as I like and I won’t screw up the (lack of) logic…
This means that if I have a large block of code inside an “if” statement or a loop, I will often break it out into its own function. This makes the main function easier to read, and I can verify the logic in the main function and the logic in the broken-out function separately. Unit tests help immensely here to prove to myself that the logic is sound in both the subfunctions and the parent.
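A minimal sketch of that idea, with invented names (`classify_trial`, `summarize_trials`) and an arbitrary threshold: the branching that would otherwise sit inside the loop body moves into its own function, and each piece gets its own small test.

```julia
using Test

# Hypothetical helper: the branching logic that used to live inside the loop.
function classify_trial(reaction_time; threshold = 0.5)
    if reaction_time < 0
        return :invalid
    elseif reaction_time <= threshold
        return :fast
    else
        return :slow
    end
end

# The parent function now contains only the loop and the bookkeeping.
function summarize_trials(reaction_times)
    tally = Dict(:invalid => 0, :fast => 0, :slow => 0)
    for rt in reaction_times
        tally[classify_trial(rt)] += 1
    end
    return tally
end

@testset "classify_trial" begin
    @test classify_trial(-1.0) == :invalid
    @test classify_trial(0.5) == :fast
    @test classify_trial(0.51) == :slow
end

@testset "summarize_trials" begin
    @test summarize_trials([0.2, 0.9, -1.0, 0.4]) ==
          Dict(:invalid => 1, :fast => 2, :slow => 1)
end
```

Each function fits on one screen, and a failing test points directly at either the branching or the loop.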
Compilers these days are good enough to optimize this style, for example by inlining small functions so the extra calls often cost little or nothing. So you should really write the code to be as readable and error-free as you can, and let the compiler figure out how to make it run fast. Optimizations (if needed) can come later.
As for whether these subfunctions should go before or after the main function: coming from a C background, I always put them first, because otherwise I needed a forward declaration. Lately (it takes a while to change habits), because compilers are getting better, or maybe just because computers are faster so the compiler can do more of the grunt work, I’ve been putting them after the main function, since the “primary” reason for opening a file is usually to see that main function. In the end I think you should do whatever makes the code easier for you to read.
Less time spent on debugging means producing more code faster, and that makes everyone happy. So do whatever helps you produce bug-free code first.
I don’t think it matters either way in Julia. A function is compiled the first time you call it, so the inner function only has to exist by the time the outer function runs. If both are included in the same module, that is always the case once the module is loaded. If the inner function is still missing (e.g. when working in the REPL), the call throws an error, and it will compile and run once you define the inner function and call the outer one again.
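A small sketch of that behavior, meant to be run as a top-level script or pasted into the REPL. The names `outer` and `helper` are placeholders.

```julia
# `outer` is defined first; `helper` does not exist yet.
# Defining `outer` is fine, only calling it requires `helper`.
outer(x) = helper(x) + 1

# Calling `outer` now fails at run time because `helper` is undefined.
result_before = try
    outer(1)
catch err
    err
end
println(result_before isa UndefVarError)  # true

# Define the helper afterwards, as you might in the REPL.
helper(x) = 2x

# The same call now succeeds.
result_after = outer(1)
println(result_after)  # 3
```

So the order of definitions in a file is a readability choice, as long as everything is defined before the first call.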
Incidentally, it is worthwhile to learn about code navigation features of Julia and the IDE one is using (fast text/regex search in a package etc). Complex code rarely reads linearly.