I need to implement a struct that represents a vector of vectors (a.k.a. a ragged/jagged array) in compressed form:
struct Foo{T}
    data::Vector{T}
    ptrs::Vector{Int}
end
where the field data contains the flattened values of the jagged array and ptrs contains the start position in the data vector of each sub-vector, followed by one final entry equal to length(data) + 1. Just to clarify, the vector of vectors v = [[1,2],[2,5,1],[7]] would be represented as f = Foo([1,2,2,5,1,7],[1,3,6,7])
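To make the layout concrete, here is a minimal sketch of how such a struct could be built from a plain vector of vectors. The helper name compress is hypothetical (it is not part of any package), and this is only one possible way to fill ptrs:

```julia
struct Foo{T}
    data::Vector{T}
    ptrs::Vector{Int}
end

# Hypothetical helper: build the compressed form from a vector of vectors.
function compress(v::Vector{Vector{T}}) where {T}
    ptrs = ones(Int, length(v) + 1)
    for (i, sub) in enumerate(v)
        # The next sub-vector starts right after the current one ends.
        ptrs[i + 1] = ptrs[i] + length(sub)
    end
    data = reduce(vcat, v; init = T[])   # concatenate all sub-vectors
    return Foo{T}(data, ptrs)
end

v = [[1, 2], [2, 5, 1], [7]]
f = compress(v)
```

With this convention, sub-vector i occupies data[ptrs[i]:ptrs[i+1]-1], which is why ptrs has one more entry than there are sub-vectors.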
My question is this: is it good style to access the fields of Foo directly in user code, namely writing f.data and f.ptrs?
Or is it better to define the getter functions data(f::Foo) = f.data and ptrs(f::Foo) = f.ptrs and use them instead?
The problem with the latter option is that data is likely to be already defined in some package, leading to name collisions. I could use longer names like get_foo_data, but I don’t find that elegant.
Here I am missing the namespace provided by classes in object-oriented languages. What is the way to go in Julia in this context?
Julia lets you define accessors later, e.g. if you rename the fields, you can still provide old names using getproperty():
struct Foo{T}
    new_data::Vector{T}
    new_ptrs::Vector{Int}
end

function Base.getproperty(foo::Foo, p::Symbol)
    if p == :data
        return getfield(foo, :new_data) # getfield() reads fields directly and it cannot be overloaded
    elseif p == :ptrs
        return getfield(foo, :new_ptrs)
    else
        return getfield(foo, p)
    end
end

f = Foo([1,2,2,5,1,7],[1,3,6,7])
println(f.data)
println(f.ptrs)
Note that the Julia compiler is smart enough to still translate simple getproperty() calls into direct field accesses.
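One way to check this yourself is to compare the code generated for a property access with the code generated for a direct getfield() call. The sketch below assumes the renamed struct from above; via_property and via_field are hypothetical helper names introduced only for the comparison, and the exact listing depends on your Julia version:

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

struct Foo{T}
    new_data::Vector{T}
    new_ptrs::Vector{Int}
end

function Base.getproperty(foo::Foo, p::Symbol)
    if p == :data
        return getfield(foo, :new_data)
    elseif p == :ptrs
        return getfield(foo, :new_ptrs)
    else
        return getfield(foo, p)
    end
end

# Hypothetical helpers, used only to compare the generated code.
via_property(foo) = foo.data                  # goes through getproperty
via_field(foo) = getfield(foo, :new_data)     # bypasses getproperty

f = Foo([1, 2, 2, 5, 1, 7], [1, 3, 6, 7])

@code_llvm via_property(f)
@code_llvm via_field(f)
```

Since the property name is a constant at each call site, the branches in getproperty can usually be folded away, so the two listings should look essentially the same.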
While true, I think defining functions is more idiomatic. The upside is that you can alter the underlying representation without needing to change the API.
At the same time, I don’t know that defining data() as a function is a good idea… Seems like way too generic a name. I suppose as long as you’re sure you’re only defining it on your own types it’s ok.
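For what it's worth, modules give you the namespace the original question is asking about. A minimal sketch, assuming a hypothetical module name Foos and accessors that are deliberately not exported:

```julia
module Foos   # hypothetical module name

struct Foo{T}
    data::Vector{T}
    ptrs::Vector{Int}
end

# Not exported on purpose: these names stay inside the module and
# cannot clash with a `data` function exported by another package.
data(f::Foo) = f.data
ptrs(f::Foo) = f.ptrs

end # module

f = Foos.Foo([1, 2, 2, 5, 1, 7], [1, 3, 6, 7])
Foos.data(f)   # qualified call, no `using` of the names required
Foos.ptrs(f)
```

User code then pays for the generic name with a module prefix, which is roughly what a class namespace would give you in an object-oriented language.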
There have been some discussions about this here on Discourse. See, for example, the thread "Mutable struct vs closure".
I’m under the impression that there is a weak consensus that defining accessor functions in the public API of your module is rather better style. But there is also a lot of code (including some parts of the standard library) which exposes internal fields in the user documentation, without it causing much trouble to anybody (and indeed, clever uses of getproperty/setproperty! make this rather future-proof as demonstrated above). So I don’t think there is (yet) a strong and universally shared opinion on this.
If it were me, I think I would go for accessor methods. Or even better, since the type in question is not POD, I would perhaps not worry much about accessing the internal fields of the structure (data and ptrs), and rather try to expose a meaningful API, which allows manipulating this structure in the most relevant way: maybe define getindex/setindex! methods for it to be indexable like an array, maybe in conjunction with eachindex in order to get a list of valid indices. Or an iterate method to iterate over the values in it. In short, with the correct API, maybe the client code does not need to access the data field (or even know it exists!)
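As a rough sketch of what such an API could look like (this is just one possible design, not a complete AbstractArray implementation), f[i] could return the i-th sub-vector as a view, with length, eachindex and iterate defined on top of that:

```julia
struct Foo{T}
    data::Vector{T}
    ptrs::Vector{Int}
end

# Number of sub-vectors: ptrs has one more entry than there are sub-vectors.
Base.length(f::Foo) = length(f.ptrs) - 1

# f[i] returns the i-th sub-vector as a view into data (no copy).
function Base.getindex(f::Foo, i::Integer)
    1 <= i <= length(f) || throw(BoundsError(f, i))
    return view(f.data, f.ptrs[i]:(f.ptrs[i + 1] - 1))
end

# Valid indices of the outer vector.
Base.eachindex(f::Foo) = Base.OneTo(length(f))

# Iterate over the sub-vectors in order.
function Base.iterate(f::Foo, i::Int = 1)
    return i > length(f) ? nothing : (f[i], i + 1)
end

f = Foo([1, 2, 2, 5, 1, 7], [1, 3, 6, 7])
f[2]                      # the second sub-vector: 2, 5, 1
[sum(sub) for sub in f]   # one sum per sub-vector: 3, 8, 7
```

With something like this in place, client code can loop over sub-vectors or index them without ever touching data or ptrs.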
Before getproperty & friends, I would have said that accessor functions are the cleanest, but now I think that it is OK to expose an API with property accessors (x.something instead of getsomething(x)). That said, property names live in a single namespace (equivalent to Symbol), so one does not need to export/import them etc. This has advantages and disadvantages, and should be taken into account in the API design.
Whatever the choice is, the key part is documenting the API very clearly, at least in the docstring of the type or an abstract type, so that the user does not have to wonder whether property accessors are supported, or just happen to work and may break.
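A minimal sketch of what that documentation might look like for the type in this thread. The wording of the docstring is only an illustration, and the propertynames method is redundant here (it matches the default, which returns the field names), but it makes the supported properties explicit if the fields change later:

```julia
"""
    Foo{T}

Compressed representation of a vector of vectors.

# Public properties
- `data`: flattened values of all sub-vectors.
- `ptrs`: start position in `data` of each sub-vector, followed by one
  final entry equal to `length(data) + 1`.

Both properties are part of the public API and will keep working even if
the underlying fields are renamed.
"""
struct Foo{T}
    data::Vector{T}
    ptrs::Vector{Int}
end

# Advertise the supported properties explicitly.
Base.propertynames(::Foo) = (:data, :ptrs)

f = Foo([1, 2, 2, 5, 1, 7], [1, 3, 6, 7])
propertynames(f)
```

Users can then read the contract with ?Foo in the REPL instead of guessing from the source.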