What are the uses of some of the numerous "types" in Julia, e.g., structs and modules?

As a non-computer-scientist, I don’t understand when I’d use, e.g., a struct or a module (which seem to be fundamental in the essentials) beyond “properness” or when designing a package. I find that the examples presented are so simple that they don’t really show why you would use them.

I seem to get by using just functions, arrays, and tuples, but looking at the Base documentation makes me wonder whether I’m working suboptimally or untidily.

From conversations with others, I gather that there comes a point when you may be supplying very numerous or complicated arguments, and defining a struct is a better/safer way to manage things. But I don’t have a clear idea of what such a situation would look like, or where the “tipping point” is at which a struct becomes the better choice.

This isn’t specific to Julia but applies to all languages that have object-orientation or user-defined data structures. There are two typical uses for structs/objects: (1) abstraction, and (2) encapsulation.

Abstraction is for when you have a domain-specific “object” that you want to reason about or operate on. Since you probably have a math background, consider that mathematics behaves similarly. You might say that numbers are the “built-in” data types of mathematics. Not using structs in Julia is the equivalent of doing “just algebra” in math. But then you start defining abstractions once you move to a more specific domain, e.g., geometry. You then ask questions like “At what point do two lines cross?” after you’ve defined “line” and “point” as something you want to reason about. As such, the Point struct in the documentation is not just a trivial example. If you’re writing geometry-related code, you actually want to define such structs to simplify your code.
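A minimal sketch of that geometry example, assuming 2D points with `Float64` coordinates; the names `Point`, `Line`, and `intersection` are chosen here for illustration and don’t come from any package:

```julia
# A point in the plane
struct Point
    x::Float64
    y::Float64
end

# The (infinite) line through two distinct points
struct Line
    a::Point
    b::Point
end

# Intersection of two lines, or `nothing` if they are parallel
function intersection(l1::Line, l2::Line)
    # direction vectors of both lines
    d1x, d1y = l1.b.x - l1.a.x, l1.b.y - l1.a.y
    d2x, d2y = l2.b.x - l2.a.x, l2.b.y - l2.a.y
    denom = d1x * d2y - d1y * d2x      # 2D cross product of the directions
    iszero(denom) && return nothing    # parallel (or identical) lines
    # solve l1.a + t*d1 == l2.a + s*d2 for t
    wx, wy = l2.a.x - l1.a.x, l2.a.y - l1.a.y
    t = (wx * d2y - wy * d2x) / denom
    return Point(l1.a.x + t * d1x, l1.a.y + t * d1y)
end

diag1 = Line(Point(0, 0), Point(1, 1))
diag2 = Line(Point(0, 1), Point(1, 0))
intersection(diag1, diag2)   # Point(0.5, 0.5)
```

Once `Line` exists as a type, the question “where do two lines cross?” becomes `intersection(l1, l2)` instead of a function taking eight loose numbers.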

Another aspect of abstractions is that they are often the key to generalization: consider that you could define the abstract behavior of points (via methods) which would then still apply if you wanted to formulate a Point3D or PointSphericalCoordinates struct later on.
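A sketch of that kind of generalization, with hypothetical names: each concrete point type only says how to produce its Cartesian coordinates, and `distance` is written once for all of them. `PointSpherical` assumes the physics convention (polar angle `theta` measured from the z-axis, azimuth `phi`, both in radians):

```julia
abstract type AbstractPoint end

struct Point2D <: AbstractPoint
    x::Float64
    y::Float64
end

struct Point3D <: AbstractPoint
    x::Float64
    y::Float64
    z::Float64
end

# Spherical coordinates: radius, polar angle from the z-axis, azimuth (radians)
struct PointSpherical <: AbstractPoint
    r::Float64
    theta::Float64
    phi::Float64
end

# Each type only says how to produce its Cartesian coordinates as a tuple...
cartesian(p::Point2D) = (p.x, p.y)
cartesian(p::Point3D) = (p.x, p.y, p.z)
cartesian(p::PointSpherical) = (p.r * sin(p.theta) * cos(p.phi),
                                p.r * sin(p.theta) * sin(p.phi),
                                p.r * cos(p.theta))

# ...and the generic behavior is written once, for every AbstractPoint
distance(p::AbstractPoint, q::AbstractPoint) = sqrt(sum(abs2, cartesian(p) .- cartesian(q)))

distance(Point2D(0, 0), Point2D(3, 4))                      # 5.0
distance(Point3D(0, 0, 0), PointSpherical(2.0, 0.0, 0.0))   # 2.0
```

Adding another coordinate system later means adding one struct and one `cartesian` method; `distance`, and anything else written against `AbstractPoint`, keeps working.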

The second reason to use structs is for the encapsulation and management of state. Consider what happens internally when you use DifferentialEquations to solve the ODE that you set up with ModelingToolkit:

integrator = init(prob, alg; kwargs...)

The prob here is again an example of “abstraction”, but the integrator is an object that holds the state of the solver between steps (the state vector, the current time, etc.).
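To make “holds the state” concrete, here is a toy fixed-step Euler integrator. It is a hypothetical sketch, not how the DifferentialEquations.jl integrator is implemented; the names `EulerIntegrator` and `step!` are illustrative only:

```julia
# Toy integrator for du/dt = f(u, t); mutable because its fields change every step
mutable struct EulerIntegrator{F}
    f::F                 # right-hand side function f(u, t)
    u::Vector{Float64}   # current state vector
    t::Float64           # current time
    dt::Float64          # fixed step size
end

# Advance the stored state by one explicit Euler step
function step!(integ::EulerIntegrator)
    integ.u .+= integ.dt .* integ.f(integ.u, integ.t)
    integ.t += integ.dt
    return integ
end

decay(u, t) = -u   # exponential decay
integ = EulerIntegrator(decay, [1.0], 0.0, 0.1)
for _ in 1:10
    step!(integ)
end
```

Each call to `step!` advances the stored `u` and `t`, so the caller never has to pass the current state back in by hand.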

Another example of “state management” is the situation where your function has 20 input parameters (which are then perhaps passed on to another function). To simplify the code, you can wrap all of these parameters into a struct so that they can be passed around as a single object.
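A sketch of the parameter-bundling pattern using `Base.@kwdef`, which gives the fields default values and a keyword constructor. The model (a discrete logistic growth step) and all field names are hypothetical:

```julia
# Hypothetical parameter set; in real code this might have 20 fields
Base.@kwdef struct ModelParams
    growth_rate::Float64 = 0.1   # per-step growth rate
    capacity::Float64 = 100.0    # carrying capacity
    n_steps::Int = 50            # number of iterations
end

# Functions take one `p` instead of a long list of positional arguments
function simulate(p::ModelParams, x0::Float64)
    x = x0
    for _ in 1:p.n_steps
        x += p.growth_rate * x * (1 - x / p.capacity)   # discrete logistic growth
    end
    return x
end

p = ModelParams(growth_rate = 0.2)   # override one field, keep the other defaults
simulate(p, 5.0)
```

Adding a 21st parameter then means adding one field with a default, not editing every function signature along the call chain.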

State management is a much trickier use of structs than abstraction, simply because management of state is basically the problem of programming (even more so once you get to questions of concurrency). Managing state is hard, and virtually all non-trivial bugs stem from the mismanagement of state, which is probably why the example in the documentation deals with the easier “abstraction” use case.

An important part of encapsulation is that it can hide complexity by having private fields. For example, the integrator only documents “useful” fields. This is especially true in Julia, where it is quite common for the fields of structs to be considered completely private, maybe accessible as properties or more commonly via accessor methods. In many cases, interfaces are defined purely in terms of methods.
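A small sketch of the “interface via methods” style. The type and function names are hypothetical, and the leading underscore on the field is just a naming hint chosen for this example, since Julia does not enforce private fields:

```julia
# Hypothetical type: `_kelvin` is meant as an internal detail
struct Temperature
    _kelvin::Float64
end

# The public interface is a set of functions, not the field itself
kelvin(t::Temperature) = t._kelvin
celsius(t::Temperature) = t._kelvin - 273.15
temperature_from_celsius(c::Real) = Temperature(c + 273.15)

t = temperature_from_celsius(20)
celsius(t)
```

If the internal representation changes later (say, to store Celsius instead), only these few methods need updating; code that calls `celsius(t)` is unaffected.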

Generally, “abstraction” structs should be immutable, while “state management” structs are mutable (or at least have mutable components), so Julia actually makes the distinction somewhat more clear than other languages by having the mutable keyword.
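A sketch of how the two kinds often combine (names hypothetical): an immutable `Point` as the abstraction, and a mutable `Walker` whose state changes over time:

```julia
# Immutable: a value used for reasoning ("abstraction")
struct Point
    x::Float64
    y::Float64
end

# Mutable: holds state that changes over time ("state management")
mutable struct Walker
    position::Point
    steps::Int
end

function move!(w::Walker, dx, dy)
    # The Point itself is never modified; a new Point replaces the old one
    w.position = Point(w.position.x + dx, w.position.y + dy)
    w.steps += 1
    return w
end

w = Walker(Point(0, 0), 0)
move!(w, 1, 2)
w.position   # Point(1.0, 2.0)
# w.position.x = 5.0 would throw an error, because Point is immutable
```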

I’m not sure how you use or want to use Julia, but if you get by with tuples, functions, and arrays, there isn’t necessarily anything wrong with what you are doing. It may be possible, though, to express your ideas more abstractly (and more maintainably) with structs.

Before I made an effort to write idiomatic code, I often found that when I came back to it later, I couldn’t read it anymore.

EDIT: the link is to a ChatGPT-generated definition of “idiomatic” on Quora.

It’s perfectly ok if you use only functions, arrays and tuples. Maybe the structure of the data and routines that you use is sufficiently simple and that’s all you need.

That’s right. And an additional advantage of structs is that they can be used to define particular methods of any function for those types. (See a gentle explanation by Emma Boudreau about this.)
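A small sketch of both halves of that idea (the shape types and `area` are hypothetical): one function with a method per type, plus a new method for a function that lives in `Base`:

```julia
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Rectangle <: Shape
    w::Float64
    h::Float64
end

# One function, one method per type; Julia picks the method from the argument's type
area(c::Circle) = pi * c.r^2
area(r::Rectangle) = r.w * r.h

# Methods can also be added to functions you did not write. Here Base.isless
# orders shapes by area, which makes `sort` work on them.
Base.isless(a::Shape, b::Shape) = isless(area(a), area(b))

shapes = [Rectangle(2, 3), Circle(1), Rectangle(1, 1)]
sort(shapes)   # Rectangle(1.0, 1.0), Circle(1.0), Rectangle(2.0, 3.0)
```

Ordering shapes by area is just a choice made for this example.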

I forgot that you also mentioned modules. They have different uses, the most notable being that they are the basis of packages. But they can also be used for something as simple as creating separate namespaces for functions, variables, etc.
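A minimal sketch of modules as plain namespaces; `Lab`, `Field`, and `describe` are hypothetical names:

```julia
# Two modules that both define a function with the same name
module Lab
    describe(id) = "lab sample $id"
end

module Field
    describe(id) = "field site $id"
end

# No clash: each name lives in its own module's namespace
Lab.describe(3)     # "lab sample 3"
Field.describe(3)   # "field site 3"
```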

Hopefully I can provide a simple cliff notes version with no links or much theory.

Modules are not types. A module creates an independent global scope that encapsulates names, i.e., a namespace. Use imports to allow modules to share select names with each other.
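A small sketch of sharing select names between modules (the `Geometry` module and its functions are hypothetical). `export` lists what a plain `using .Geometry` would bring in, while `using .Geometry: area` picks names explicitly:

```julia
module Geometry
    export area               # offered to anyone who does `using .Geometry`
    area(w, h) = w * h
    helper() = "internal"     # not exported
end

using .Geometry: area         # bring just this one name into the current module

area(2, 3)                    # 6
Geometry.helper()             # non-exported names are still reachable when qualified
```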

Structs are not just a Julia thing; they’re in every language I know of. People already mentioned Julia-specific details, so I’ll stick to the universal basics. A struct is actually very close to a tuple: both package multiple fields of data (or pointers to data) into a fixed size of memory. The differences are that (1) the struct has a name, and (2) its fields have names. Besides names being important for description and subtyping, it can be necessary to distinguish two types with identical internal structure that share functions. For a simple example, let’s say I want to implement a plotting library, nothing fancy, just 2D Cartesian and polar plots with Float64 values. Both kinds of coordinates consist of 2 numbers, so if I go with tuples, I must use the function name to distinguish them:

function polarplot(coordinates::Vector{Tuple{Float64, Float64}}) end      # plotting code omitted
function cartesianplot(coordinates::Vector{Tuple{Float64, Float64}}) end  # plotting code omitted

But wouldn’t it be nice if that information were carried by the Vector itself, so I could just plot either without explicitly checking whether it’s polar or Cartesian? That’s even necessary if I receive a Vector with no external indication of which it is. Well, with structs we can handle this:

struct Cartesian
  x::Float64
  y::Float64
end

struct Polar
  r::Float64
  p::Float64
end

function plot(coordinates::Vector{Cartesian}) end  # Cartesian plotting code omitted
function plot(coordinates::Vector{Polar}) end      # polar plotting code omitted
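A runnable variant of the idea, with a hypothetical `xy` function standing in for real plotting code (it just returns the x and y vectors a plotting library would need). Reading `p` as an angle in radians is an assumption. The polar method converts its points and then reuses the Cartesian method:

```julia
struct Cartesian
    x::Float64
    y::Float64
end

struct Polar
    r::Float64
    p::Float64   # angle in radians (assumed)
end

# Convert one polar point to Cartesian coordinates
Cartesian(c::Polar) = Cartesian(c.r * cos(c.p), c.r * sin(c.p))

# Stand-in for plotting: return the x and y vectors
xy(coordinates::Vector{Cartesian}) = ([c.x for c in coordinates], [c.y for c in coordinates])
# Dispatch on the element type: convert, then reuse the Cartesian method
xy(coordinates::Vector{Polar}) = xy(Cartesian.(coordinates))

xy([Polar(2.0, 0.0)])   # ([2.0], [0.0])
```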

If you don’t ever need to distinguish structurally identical types like this, you could probably pull off everything with tuples, but even then, you may appreciate the clarity that names give here. You can also name fields without naming the overall type with NamedTuples, the (2) without the (1).
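A quick sketch of NamedTuples; `kind` is a hypothetical function showing that the field names are part of the type, so dispatch can still tell `(x, y)` data from `(r, p)` data:

```julia
# Named fields, but no type name: the (2) without the (1)
c = (x = 1.0, y = 2.0)
p = (r = 1.0, p = 2.0)

c.x   # 1.0

# The field names are part of the type, so methods can still tell these apart
kind(::NamedTuple{(:x, :y)}) = "Cartesian"
kind(::NamedTuple{(:r, :p)}) = "polar"
kind(c), kind(p)   # ("Cartesian", "polar")
```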

Maybe you gathered this from the other comments, but don’t worry about modules. If you don’t know what they do, you don’t need them.

Structs, on the other hand, are really useful; it’s probably worth familiarizing yourself with them. A struct is like a class in OOP languages: it defines a type.

If you don’t develop packages of your own, that’s sufficient for most purposes. I’d only add NamedTuple to the mix.

That can be one of the uses for NamedTuple

Blown away by all the help and advice given here, thanks everyone! I’ll try to digest it all.

Another fundamental use of structs in Julia is to extend existing functions with new methods, for example:

module MyModule
     export MyData, plot
     import Plots: plot
     struct MyData
        x::Vector{Float64}
        y::Vector{Float64}
     end
     plot(data::MyData) = plot(data.x, data.y, title="MyData Plot")
end

And then, you can do:

julia> module MyModule
            export MyData, plot
            import Plots: plot
            struct MyData
               x::Vector{Float64}
               y::Vector{Float64}
            end
            plot(data::MyData) = plot(data.x, data.y, title="MyData Plot")
       end
Main.MyModule

julia> using .MyModule

julia> data = MyData(rand(10),rand(10));

julia> plot(data)

and get the customized plot for your data.

(I added a module here to help show what modules can be useful for, organization-wise.)

I agree with this.

And partly with this too, but I wouldn’t put those two suggestions together. For people who don’t need the advantages of modules, and feel more comfortable leaving aside the complexities of working with them, the situation will probably be the same with respect to structs/types, and they can use named tuples instead, as @Eben60 suggests.

It should be taken into account that modules and types can cause some headaches while you are working on the code, renaming or rewriting parts of it. And if the code you are changing includes type definitions, the best way to work around those complications is to encapsulate them in modules!
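A sketch of that workaround (module and type names hypothetical). In Julia versions before 1.12, re-running a top-level `struct` definition with changed fields in the same session fails with an “invalid redefinition” error; re-evaluating the enclosing module instead replaces the whole module, types included. Objects created from the old module’s types keep their old type.

```julia
# First version, wrapped in a module
module Shapes
    struct Square
        side::Int
    end
end

# Later, after changing the field type, re-evaluate the whole module.
# Julia prints a "replacing module" warning instead of failing.
module Shapes
    struct Square
        side::Float64
    end
end

Shapes.Square(1.5).side   # 1.5
```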

I, too, am no computer scientist. Just a lowly data scientist with a degree in philosophy. That said, I’ve found structs and types very useful. Just a small example that I hit recently:

I work with health care data where National Provider Identifiers (NPI) come up a lot. NPIs are 10 digits and always start with a ‘1’. But depending on the data source I’m reading, it might come to me as an integer, float, or string. There might be missing values. Ah, the wonders of data cleaning.

Also, strings and integers are bad representations of NPI because they allow all sorts of silly operations that don’t make sense for an ID. What does npi * 10 mean?

But, here is where types come in:

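struct Npi
    x::Int64
    function Npi(x::Int64)
        # NPIs are 10 digits long and always start with 1
        999_999_999 < x < 2_000_000_000 || throw(ArgumentError("invalid NPI: $x"))
        return new(x)
    end
end
Npi(x::Float64) = Npi(convert(Int64, x))  # throws InexactError for non-integer values
function Npi(x::AbstractString)
    digits_only = replace(x, r"\D" => "")  # strip every non-digit character
    int_npi = parse(Int64, digits_only)
    return Npi(int_npi)
end
Npi(::Missing) = missing  # let missing values pass through unchanged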

Now, I can take the NPI column of any dataframe and do df[!, :npi] = Npi.(df[:, :npi]) to convert it. I know that any time I interact with a column of type Npi, I have some guarantees about its data quality.

Furthermore, I can no longer do silly things like add two NPIs together. I only have the functions available that I explicitly define. This means that at the level of the type system, I am protected from careless errors that can turn into really hairy bugs.
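A quick check of that protection, using a condensed version of the `Npi` type above; the sample number is arbitrary, not a real provider ID:

```julia
# Condensed version of the Npi type
struct Npi
    x::Int64
    function Npi(x::Int64)
        999_999_999 < x < 2_000_000_000 || throw(ArgumentError("not a valid NPI: $x"))
        return new(x)
    end
end
Npi(x::AbstractString) = Npi(parse(Int64, replace(x, r"\D" => "")))

a = Npi("123-456-7893")   # non-digits are stripped before parsing
b = Npi(1234567893)

a == b   # true: default == on immutable structs compares the field values
# a + b would throw a MethodError, since no + method exists for Npi
```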

That‘s a great and simple example of how the right abstractions make your code less error-prone.

To add to the simple examples: I recently converted someone’s code from another language to Julia. The hardest part was that, although that language does support its own version of structs, a lot of the code you see in the wild seems to encode “meaning by position” within an array. For example, when reading or modifying the code, you just have to know that experiment[1] is the date on which an experiment was performed, experiment[2] is the animal ID, etc. The code becomes much easier to read if you can instead use experiment.date and experiment.animal to extract specific bits of information. It’s also easier to keep working if at some point you decide to change the data stored in an experiment: you don’t have the horrific task of changing all the relevant 2s in your code to 3s and so on.
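A sketch of the named-field version using the `Dates` standard library; the field names follow the example above, and the values are made up:

```julia
using Dates

# One record per experiment
struct Experiment
    date::Date
    animal::String
end

e = Experiment(Date(2024, 3, 15), "mouse-07")

e.date     # instead of e[1]
e.animal   # instead of e[2]
```

If a field is added later, code that reads `e.date` or `e.animal` keeps working unchanged.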
