Hi, I have extensive experience in Python and C++ and am trying out Julia. There’s a lot that I like from what I’ve seen so far, really nice language, packaging, documentation, introspection of generated code (cool!). I might even get used to the 1-based indexing.
Some basic questions:
Modules and packages in Python are fairly straightforward to use. You actually can’t get around them once you split your sources into multiple files, as module names match source file names. With Julia there seem to be at least two ways to structure larger projects: includes and modules, where modules only have a loose correlation to files. Since modules seem to be the basis for precompilation I’d like to take advantage of them, but I’m searching for a good reference on common ways to structure larger projects in Julia. For example, the docs show an example of constructing a module by including several Julia files. Is that really the idiomatic way to do it? By the way, this relates mostly to structuring during development, not to deploying a completed package.
What’s the best way to have a struct member that is an editable fixed-size 1-dimensional array of simple values (e.g. bool), where the size of the array is only known at run-time (and of small size, say less than 100 items)? In C++ I would use a std::vector<bool> and it seems in Julia Vector{Bool} would be more-or-less the equivalent. But I see SVector and friends being used for performance reasons. I can easily create one of those at run-time as soon as the size is known, but I’m unclear as to specifying the field type in that case. E.g. n = <some length>; v = SVector{n,Bool}(fill(false,n)) works, but how then to add a field b of the correct (and specific) type to a struct S?
Why are certain text outputs being produced so slowly and in chunks? For example, I usually see ERROR: LoadError: being produced, followed by the exception message noticeably later, followed by the first part of the stack trace, etc. It doesn’t help with the perception of Julia being sluggish for interactive work, although it seems to have been noticed by the devs that something might be up (e.g. https://github.com/JuliaLang/julia/issues/36639).
Is there a formal grammar of Julia somewhere? I can’t seem to find one. I occasionally get surprised by what is/isn’t accepted by the compiler and would like to understand why.
Regarding
“I’m searching for a good reference on common ways to structure larger projects in Julia.”
I find it helpful to look at existing packages for guidance. I think that the authors/maintainers of heavily used packages generally follow best practices.
Well, I indeed looked at a few packages already, but got more questions out of it. For example, https://github.com/JuliaArrays/StaticArrays.jl contains a src dir with a bunch of Julia files and a top-level Project.toml file. But when using the package a using StaticArrays is enough to make the contents of the Julia files (e.g. SVector from SVector.jl) available but I don’t understand why. Are all Julia files in the src dir implicitly read when using the using statement?
Edit: Ah! There’s a src/StaticArrays.jl that declares the module StaticArrays and that imports and includes the other stuff. Didn’t notice that earlier.
This happens the first time you have an error because the JIT is compiling the relevant printing and stacktrace functions. But after that it should be faster.
The easiest way to organize your code during development is to turn your code into a package. Some information on creating packages can be found in the Pkg.jl documentation:
If you would have used std::vector<bool> then yes, Vector{Bool} is a good replacement (and it has the advantage that it’s not annoyingly broken).
If you want a fixed-size container, then StaticArrays is a good choice (check out MVector for a fixed-size but mutable vector), but the size of the array is part of its type (just like std::array in C++). It sounds like you don’t know the size of the array ahead of time, so this may not be useful in your particular case.
However, one of the biggest advantages of Julia over C++ is that you can allow a little bit of dynamic behavior at the top-level, while keeping all of your inner code fast and type-stable. For example, let’s say you have a type holding a fixed (but unknown) number of items, and you want to do a bunch of work on that type. You could write something like:
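using StaticArrays # provides SVector (and the mutable MVector)
struct MyType{N} # the number of items is a property of the type
    items::SVector{N, Bool}
end
function outer(n::Integer)
    # n is a run-time value, so the concrete type MyType{n} is only known at run-time
    t = MyType{n}(zeros(Bool, n))
    inner_loop(t)
end
function inner_loop(t::MyType{N}) where {N}
    # Expensive code goes here
end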
When you call outer(n), the value of n is not known to the compiler, so the {N} parameter in MyType is also unknown. But that’s totally fine. The compiler will do a little bit of dynamic work to construct the right MyType{N} at run-time, then dispatch to the specialized implementation of inner_loop() for that specific type. That means that within inner_loop(), where all the actual work happens, the value of N (and thus the number of items) is fully known by the compiler. The term for this in Julia is a “function barrier” and it’s a common technique for separating dynamic code (where the types aren’t known at compile-time) from the inner loop that does the actual work.
You can’t do this in C++ because there’s no compiler available at run-time to handle any new values of n (which result in new MyType{N} types).
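For anyone who wants to try the function-barrier pattern without installing StaticArrays, here is a minimal sketch that uses only Base. It is an illustration rather than the code above: it swaps SVector for an NTuple, and the names Flags, make_flags, count_set and run_flags are made up for the example.

```julia
# Hypothetical stand-in for MyType: the length N is part of the type.
struct Flags{N}
    items::NTuple{N, Bool}
end

# Dynamic part: n is a run-time value, so Flags{n} is only known at run time.
make_flags(n::Int) = Flags{n}(ntuple(i -> isodd(i), n))

# Function barrier: Julia compiles one specialization of this method per N,
# so N is a compile-time constant inside the loop.
function count_set(f::Flags{N}) where {N}
    total = 0
    for i in 1:N
        total += f.items[i]
    end
    return total
end

# Outer function: one dynamic dispatch, then the specialized code runs.
run_flags(n::Int) = count_set(make_flags(n))

run_flags(5)  # the flags at positions 1, 3 and 5 are set
```

The only dynamic step is the call from run_flags into count_set; everything inside count_set is compiled for a concrete N.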
Not really, it is always modules. But if you want to break up the code into multiple files, the common practice is to include code within the module body, which is equivalent to just putting it there but easier to organize. Almost all Julia packages have this structure.
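To make the "include within the module body" layout concrete, here is a small sketch that writes three files into a temporary directory and loads them. The module name Shapes and the file names types.jl and geometry.jl are hypothetical; in a real package the top-level file would live at src/Shapes.jl next to a Project.toml.

```julia
dir = mktempdir()

# Two "implementation" files that only contain definitions.
write(joinpath(dir, "types.jl"), "struct Point\n    x::Float64\n    y::Float64\nend\n")
write(joinpath(dir, "geometry.jl"), "norm2(p::Point) = p.x^2 + p.y^2\n")

# The top-level file declares the module and includes the others.
main_src = """
module Shapes
export Point, norm2
include("types.jl")      # evaluated inside the module body at this point
include("geometry.jl")
end
"""
write(joinpath(dir, "Shapes.jl"), main_src)

# Load the module from its top-level file and bring the exported names in.
include(joinpath(dir, "Shapes.jl"))
using .Shapes

norm2(Point(3.0, 4.0))
```

Paths passed to include are resolved relative to the file that contains the include call, which is why the module file can refer to its siblings by bare file name.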
Right, I re-read relevant parts of the manual and it starts to make more sense now. I figured you could use include to fully structure a complex set of files into a single top-level file you include (with all the related downsides), but modules and packages make a lot more sense. Also, the Code Loading chapter contained missing pieces of information, as the Modules chapter does not actually tell you how imported/used modules get resolved to actual files. Perhaps it would make sense to link to the former chapter from the latter.
Maybe you are already aware of this, but in general, for people coming from Python, I think that it is useful to think in these terms:
Julia modules are more or less like Python modules, although with other mechanisms for importing functions etc. The most important difference in my opinion: in Julia you can explicitly define what will be automatically exported (i.e. available without qualifying the module) via using; this does not exist in Python.
In Julia include is roughly like exec in Python, but it takes a file name instead of a code string or object. But there is a tricky difference: The scope of the code executed by Python’s exec is (unless modified by optional arguments) the same scope where exec was called. In Julia, the scope of included code is the global scope of the module where it was called. This makes a difference if include is called inside a function, a loop or something else that introduces a local scope, because local variables won’t be accessible by the included code, and the variables created at the top level of that code will be left on the global scope.
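The scoping difference can be checked with a short experiment. This is only a sketch: the file snippet.jl and the variable names are invented for the example, and it assumes the code is run at the top level of a script or the REPL.

```julia
# Write a tiny file that checks whether it can see a local variable
# of the calling function, and that defines a variable of its own.
path = joinpath(mktempdir(), "snippet.jl")
write(path, "saw_local = @isdefined(secret_local)\nfrom_file = 42\n")

function load_snippet()
    secret_local = 10   # local to this function
    include(path)       # evaluated in the module's global scope, not here
    return secret_local
end

load_snippet()

saw_local   # false: the included code could not see secret_local
from_file   # 42: the variable ended up in the global scope
```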
Well, that was actually one thing I was wondering about, mostly for 2 things.
First, my frame of reference is mostly the OOP way (which I find quite natural), so it might not be the best fit for Julia, but consider writing a “class” (i.e. struct) together with a set of methods that operate on that class and making that available as a module. In C++ the class definition contains all the methods (bar friend functions), so wherever the class is available you’re guaranteed the methods are defined as well. But with Julia a struct definition and the methods that operate on that struct are decoupled, even when writing them in the same module. So it would seem that you manually need to export every method you want to make available as public interface for the class? There doesn’t seem to be a marker (like public:) in a file to denote that everything from that point on would need to be exported.
It’s also somewhat confusing that using Module does not give guarantees on what is actually available or how to use it, as it depends on what types and methods the package provides. Although this is the same in Python, the fact that Python is an object-oriented language almost always guarantees that a set of classes gets imported (which are predictable in the way you use them, i.e. by calling methods or operators on them). With Julia there’s a bit more variation due to multiple dispatch, field access, macro calls, etc. Some examples of the different forms I think I’ve seen so far for Julia package equivalents of Python modules I frequently use (with very limited experience with any of the Julia versions, btw):
With using DataFrames you get both a new struct type (DataFrame) and a set of extra methods that operate on it, e.g. push!(df, [...]). But also operators acting on a DataFrame, like df[:, "A"]. So a DataFrame almost looks like the Python class-based equivalent, except for the methods not being called on the “class”.
With using Plots you get mostly extra methods, such as plot and plot!, that operate on regular Arrays, but no new package-specific data types, so there’s no OO feel to it.
With HTTP.jl, after using HTTP you need to call methods hanging off the module in a sort of OOP-stylish way (but not really), such as
HTTP.listen() do http::HTTP.Stream
    ...
    HTTP.setstatus(http, 404)
end
I’m not sure why the HTTP.<method> form was chosen, perhaps to not pollute the namespace where the module is being used? There are actually hardly any useful things exported here, so using HTTP might not make much sense and import HTTP would be just as good.
using HDF5
h5open("test.h5", "w") do file
    g = g_create(file, "mygroup") # create a group
    g["dset1"] = 3.2 # create a scalar dataset inside the group
    attrs(g)["Description"] = "This group contains only a single dataset" # an attribute
end
I would say there is far more variation here than you would see in the Python world when it comes to the way these APIs are structured. Some of the things used above are a direct consequence of not having an OO-style method syntax obj.method(), e.g. HTTP.setstatus(http, ...) and attrs(g)["Description"].
Actually, if the equivalent of using Module in Julia is from module import * in Python then you can use the __all__ variable in the module file to limit what gets exported (see here)
Regarding the rest of your answer: yes, that’s right. Julia is not object-oriented, so if you create a type and specific methods for it in a module, they have to be exported separately. This is reasonable, though, since methods can dispatch on more than one specific type, you can also define methods for a type outside the module where the type was defined, etc. So it’s not as clear as in OO-programming when methods should be implicitly exported because a type is.
That can be seen as a disadvantage if you are coming from OO-programming, but it has its own advantages, which I won’t discuss here because you may have already read about them, and I don’t want to turn this thread into yet another debate about “Julia is better/worse than Python or C++ because of this and that feature”. (There are already plenty of them.)
I actually see the benefits of multiple dispatch versus OO. It’s also something I just accept as being the basis of Julia (versus OO as the basis for Python). It’s just that certain things that are easy in an OO language become quirky in Julia, but I’m sure that’s also something to get used to.
you can generally (always?) get around the do syntax (comes down to preference and ‘automatic’ closing of stuff)
you can always import names/methods/symbols (if you do not want to prefix HTTP all the time)
see below
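someFile = mktemp()[1]
# with do syntax: the file is closed automatically when the block ends
open(someFile, "w") do io
    write(io, "Hello world!")
end
# without do syntax: you have to close the file yourself
someFile = mktemp()[1]
fio = open(someFile, "w")
write(fio, "miha")
close(fio)
# you can always import methods/symbols
using HTTP
import HTTP.listen
# now you can use listen without the HTTP. prefix
listen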
Note also that in many cases a large fraction of the methods you want to make available are actually extensions of functions defined in Base (or other modules), in which case you don’t need to export them (the caller is already using Base).
So, for example, if you define a new numeric type and define methods for Base.+, Base.sqrt, Base.show, Base./, etcetera, you don’t need to export these.
But yes, you need to explicitly export new generic functions foo (i.e., new function names, not just new methods of existing functions) that you define, unless you want callers to access them via MyModule.foo.
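A small sketch of the distinction between extending Base functions and exporting new names. The module Units, the type Meters and the functions double and halve are all hypothetical, chosen only to show which names need an export.

```julia
module Units

export Meters, double   # only new names need exporting

struct Meters
    value::Float64
end

# Methods added to existing Base functions: no export needed,
# because callers already have + and show available.
Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)
Base.show(io::IO, m::Meters) = print(io, m.value, " m")

# A new generic function, exported above.
double(m::Meters) = Meters(2 * m.value)

# A new generic function that is not exported: reachable as Units.halve.
halve(m::Meters) = Meters(m.value / 2)

end

using .Units

total = Meters(1.5) + Meters(2.5)   # uses the Base.:+ method defined above
double(total)                       # exported, so no prefix is needed
Units.halve(total)                  # not exported, so it must be qualified
```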
I’ll never understand the fuss about indexing. Most of the time I just don’t have to care about indices: you can iterate over the elements of a container without explicitly referencing the indices, you can use stuff like eachindex, enumerate, CartesianIndices to abstract away the underlying indices, and you can use the keywords begin/end or the functions firstindex/lastindex to get the first and the last indices of an indexable collection. There are very few cases where I need to think about what the index is, and in most of those 1-based indexing is perfectly fine (0-based indexing is great when you work with offsets, but how often do you need them?).
Additionally, Julia has native support for arbitrary indexing, implemented in packages like OffsetArrays.jl (to set your favourite starting index), RandomBasedArrays.jl (arrays with a random starting index), StarWarsArrays.jl (arrays with indices following the order of Star Wars movies).
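As a rough illustration of the index-free style described above, here is a sketch that uses only Base functions. The request string and the variable names are made up for the example.

```julia
# Split a line of text into fields on a delimiter.
fields = split("GET /index.html HTTP/1.1", ' ')

# Iterate over the elements without touching indices at all.
lengths = Int[]
for f in fields
    push!(lengths, length(f))
end

# enumerate yields a counter (starting at 1) together with each element.
numbered = [string(i, ": ", f) for (i, f) in enumerate(fields)]

# eachindex yields the valid indices, whatever the array's first index is.
upper = similar(fields, String)
for i in eachindex(fields)
    upper[i] = uppercase(fields[i])
end

# begin/end (or firstindex/lastindex) avoid hard-coding the first and last index.
method = fields[begin]
version = fields[end]
same_first = fields[firstindex(fields)] == method
```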
Then I’ll bite for the reply. The point (for me) is more that it adds a form of indexing that is slightly different from what I’ve been used to working with over the past decades, meaning indexing that I would do correctly subconsciously in C++/Python doesn’t translate 1:1 (pun intended) to Julia, where I need to be careful to write the correct thing. Especially (say) when you’re debugging and actually get confronted with the indices used in loops, or parsing a piece of text that is split into parts based on some delimiter and you want to test the first field to know what to do with the second and third. You’re right that there are ways to not actually have to deal with indices explicitly, but in C++ I still find using iterators much less convenient than just indexing with an int in a for-loop. The iterator style usually adds too much complexity for too little benefit for me, so then indices will be there explicitly.