Custom iterators over custom types


#1

I am exploring the type system of Julia seriously for the first time and I got stuck with a practical problem I have in one of my packages. Suppose I have various grid types following an interface:

@doc doc"""
  A data point in an `N`-dimensional space represented by
  `N` scalars (a.k.a. coordinates) of type `T`, holding a
  value of type `V`.
""" ->
abstract AbstractDataPoint{N,T,V}

@doc doc"""
  An iterable collection of points of type `P`.
""" ->
abstract AbstractPointCollection{P}

@doc doc"""
  A grid of points of type `P`.
""" ->
abstract AbstractGrid{P}

@doc doc"""
  Valid representations of a spatial domain.
""" ->
AbstractDomain = Union{AbstractPointCollection,AbstractGrid}

@doc doc"""
  A path in a spatial domain of type `D`.
""" ->
abstract AbstractPath{D}

In this interface, I have various representations of a spatial domain AbstractDomain, either grids types or point collections. I would like to traverse such domains with different types of paths. For achieving this goal I tried to introduce the AbstractPath type.

Now consider one possible implementation of the interface:

immutable DataPoint{N} <: AbstractDataPoint{N,Real,Real}
  coords::NTuple{N,Real}
  value::Real
end
coords(p::DataPoint) = p.coords
value(p::DataPoint) = p.value

type PointCollection{N} <: AbstractPointCollection{DataPoint{N}}
  points::AbstractVector{DataPoint{N}}
end
npoints(pc::PointCollection) = length(pc.points)

type ImageGrid{N} <: AbstractGrid{DataPoint{N}}
  data::AbstractArray{Real,N}
  dx::Real
  dy::Real
  dz::Real

  ImageGrid(data) = ImageGrid(data,1.,1.,1.)
end
npoints(g::ImageGrid) = prod(size(g.data))

Finally, the question. Suppose there is a natural order for traversing spatial domains. For example, an image grid is naturally traversed in column-major order, a point collection is traversed in forward order, etc. I would like to define a DefaultPath for all AbstractDomain:

type DefaultPath{AbstractDomain} <: AbstractPath{AbstractDomain}
  # TODO
end

How would you approach this problem? Should I define a start, next and done for all new types, make them part of the interface specification and then call these methods inside of the DefaultPath? How to do that in a Julian way?

Besides the question, please feel free to give feedback on the interface, if there is anything that can be done better in terms of performance, etc.


#2

There’s one issue here for performance, you are using abstract types (i.e. Real) in your concrete types, which is a “real” performance killer. Also, does ImageGrid need to be a type instead of an immutable? (do you need to modify the dx/dy/dz values, or replace the data value (which is another case of having an abstract type in a concrete type).


#3

Thank you @ScottPJones I was thinking about this Real issue too, I will probably replace it by Float64 in the concrete types. Regarding the type versus immutable issue, I don’t plan to change the array to which the type points to, but the content of the array instead (data[i,j,k] = foo), can I use immutable safely in this case?


#4

Normally you would do parameterize the type, so that it can be both generic and concrete:

immutable DataPoint{N,T<:Real,S<:Real} <: AbstractDataPoint{N,T,S}
  coords::NTuple{N,T}
  value::S
end

If you have an array of DataPoint and only need data[...] = foo, then it being immutable is not a problem.


#5

Yes, Steven is correct, parameterization of types is your friend in Julia, will let you avoid the performance problems of run-time dispatching, and allow the structures themselves to be much more efficient as far as memory and avoiding double indirection to get to the values.


#6

Thank you @stevengj , in that case the type is concrete? Could you please elaborate? I am still trying to grasp the syntax for parametric types in Julia, could you please explain the meaning of the expressions below?

abstract Base{T}

# expression 1
type Foo{T} <: Base{T} end

# expression 2
type Foo{T<:Real} <: Base{T} end

# expression 3
type Foo <: Base{Real} end

What are the implications of using each?


#7
  1. T is able to take any type it got constructed from.
  2. T restrict to subtypes of Real
  3. Base{Real} will alway be Base{Real} independent of what Foo is doing.
    More about parametric types:
    http://docs.julialang.org/en/release-0.5/manual/types/#man-parametric-types

#8

Thank you @sdanisch, that is very clear.


#9

@stevengj how can the type be concrete? Also, in my case I want to have DataPoint depending only on the dimension N. I want to fix T and V to be Float64 for example.


#10

Does anyone have a suggestion for the implementation of DefaultPath? What I have in mind is something like:

immutable DefaultPath{D<:AbstractDomain} <: AbstractPath{D} # is this the best way to write it?
  domain::D # pointer to domain object?
end
function start{D<:AbstractDomain}(p::DefaultPath{D}) # do we need this boilerplate?
  if isa(p.domain, PointCollection) # any cleaner way of doing this?
    return start(p.domain.points)
  elseif isa(p.domain, ImageGrid)
    return start(p.domain.data)
  else
    error("not implemented")
  end
end

As you can see, this code isn’t very elegant and I actually don’t know if it works, just brainstorming and learning the syntax.


#11

You can do

start{D <: PointCollection}(p::DefaultPath{D}) = start(p.domain.points)
start{D <: ImageGrid}(p::DefaultPath{D})       = start(p.domain.data)
start(p::DefaultPath)                          = error("not implemented")

instead of type checking inside a function. I would argue this is typically more “Julian” way of programming.


#12

If you declare e.g. immutable Foo{T<:Real}; x::T; end, then it effectively defines an abstract type Foo and a family of concrete types Foo{T} for different types T. (Foo{Int}, Foo{Float64}, etcetera).

If, in turn, you use a concrete type T, e.g. Foo{Int}, then the resulting Foo{Int} type can have all of the benefits of a type where you just declared the fields explicitly to be Int (i.e. it is a “pointer-free” type and x is stored “inline” in the structure rather than as a pointer to some “boxed” heap object, and Array{Foo{Int}} is as efficient as Array{Int}). That’s why parameterized types allow you to write type-generic data-structures that still have the efficiency of fully concrete types.


#13

Also, you would normally not bother with this method; just omit it, and it will throw a MethodError if you try to call it.


#14

Thank you @kristoffer.carlsson, what do you think of doing:

start(p::DefaultPath{PointCollection}) = start(p.domain.points)
start(p::DefaultPath{ImageGrid}) = start(p.domain.data)
start(p::DefaultPath) = error("not implemented")

Is there an advantage in restricting D <: PointCollection or D <: ImageGrid?


#15

It depends if ImageGrid and PointCollection are abstract types or not. If they are concrete type, then doing your way works just well. If it is abstract (or parameterized) then this will happen:

julia> immutable A{T}
       x::T
       end

julia> f(x::Vector{A}) = print("a")
f (generic function with 1 method)

julia> f([A(1), A(2)])
ERROR: MethodError: no method matching f(::Array{A{Int64},1})
Closest candidates are:
  f(::Array{A,1}) at REPL[19]:1

julia> f{T <: A}(x::Vector{T}) = print("b")
f (generic function with 2 methods)

julia> f([A(1), A(2)])
b

#16

@kristoffer.carlsson that is tricky, why the first attempt fails? Shouldn’t it work just fine?


#17

See the section about “invariant types” in the Julia manual. This is very important to understand when working with parametric types.


#18

@stevengj I understand the “invariant types” property of Julia, but I don’t understand how defining a method for the abstract type A is not enough for catching the case with A{Int64}. I understood that A{T} <: A always.


#19

Yes but Vector{A} is not <: Vector{A{Int64}}


#20

you mean the other way around for this particular situation (?)