Custom image loader in Julia using MLUtils.jl

I am new to Julia and a bit confused about how to implement a custom image loader. First I generated 10 sample images and saved them in a temp folder, then I defined “ImageFolder”, but I got stuck on how to use numobs and getobs, since the documentation is not very clear to me. This is what I have so far:

using FileIO, ImageIO, ImageCore, MLUtils

struct ImageFolder
    files::Vector{String}
    function ImageFolder(dir::String)
        return new(readdir(dir; join = true))
    end
end


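# numobs must return the number of observations, not the file list itself
MLUtils.numobs(data::ImageFolder) = length(data.files)
# Placeholder: returns random data instead of loading the images at idx
MLUtils.getobs(data::ImageFolder, idx::AbstractVector{<:Integer}) = rand(3, 4, length(idx))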

dir = tempname()
mkpath(dir)

@info "Writing random images to $(dir)"

for i in 1:10
    save(joinpath(dir, "$i.jpeg"), colorview(RGB, rand(3, 10, 10)))
end

I would appreciate any help. Thank you in advance.

I tried to implement numobs and getobs so that MLUtils.getobs(data::ImageFolder, idx::AbstractVector{<:Integer}) returns an Array{Float32,4} of size d1×d2×d3×n, where d1 and d2 are the dimensions of the images and n is equal to length(idx).

For using a custom DataLoader with images saved on disk, the transfer learning tutorial provides an example: model-zoo/tutorials/transfer_learning at master in FluxML/model-zoo on GitHub

Thank you for the answer. Nevertheless, I would like to use numobs and getobs but could not find any comprehensive tutorials. I will see if I can come up with something.

What’s the issue with the length and getindex approach referenced above?
It is aligned with the MLUtils DataLoader extension methodology and provides performance as good as it gets, AFAIK.

Note that the docs for getobs say:

getobs should only be implemented for types where there is a difference between getobs and Base.getindex (such as multi-dimensional arrays).

Same story for numobs. So the usage above is correct. You are of course free to override it instead, but you’ll miss out on the nice [...] indexing syntax and other features Julia provides for types with getindex/length defined.
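As a minimal sketch of the length/getindex route described above: the container below is hypothetical (the name PathFolder and the file names are made up for illustration) and returns file paths rather than decoded images, to keep it self-contained. It relies on the documented fallbacks, where numobs uses length and getobs uses indexing when no specialized methods are defined.

```julia
using MLUtils

# Hypothetical container holding only file paths.
# A real loader would decode the image inside getindex.
struct PathFolder
    files::Vector{String}
end

# Define the standard Base interface instead of numobs/getobs.
Base.length(data::PathFolder) = length(data.files)
Base.getindex(data::PathFolder, i::Integer) = data.files[i]
Base.getindex(data::PathFolder, idxs::AbstractVector{<:Integer}) = [data[i] for i in idxs]

data = PathFolder(["a.jpeg", "b.jpeg", "c.jpeg"])

n = numobs(data)          # falls back to length(data)
one = getobs(data, 2)     # falls back to data[2]
some = data[1:2]          # plain indexing syntax works as well
```

Because the type now supports length and indexing, it should also be usable with other Julia code that expects those methods, not only with MLUtils.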

Hi all, yes, I am aware that getobs should only be implemented where there is a difference. The thing is that in my case I am required to use getobs and numobs. Anyway, I will try to implement the code with both. Thanks for the input :)

If you implement getobs and numobs like you would implement getindex and length, then you’re likely 90% of the way there. If you feel like the docs are missing some important detail on either, feel free to file an issue or PR.
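For the original question, one possible sketch of that idea follows. It assumes all images in the folder have the same size, that an image I/O backend such as ImageIO is installed so that FileIO.load can decode the files, and that a height × width × channel × batch layout is wanted. The sample images are written as PNG purely for illustration.

```julia
using FileIO, ImageCore, MLUtils

struct ImageFolder
    files::Vector{String}
end
ImageFolder(dir::AbstractString) = ImageFolder(readdir(dir; join = true))

# numobs plays the role of length
MLUtils.numobs(data::ImageFolder) = length(data.files)

# Single observation: load one image as a height × width × 3 Float32 array
function MLUtils.getobs(data::ImageFolder, i::Integer)
    img = RGB.(load(data.files[i]))                # make sure pixels are RGB
    chw = Float32.(channelview(img))               # 3 × height × width
    return permutedims(chw, (2, 3, 1))             # height × width × 3
end

# Batch of observations: concatenate along a 4th (batch) dimension
function MLUtils.getobs(data::ImageFolder, idxs::AbstractVector{<:Integer})
    return cat((getobs(data, i) for i in idxs)...; dims = 4)
end

# Usage: write a few random 10×10 test images, then load them in batches
dir = mktempdir()
for i in 1:4
    save(joinpath(dir, "$i.png"), rand(RGB{N0f8}, 10, 10))
end

data = ImageFolder(dir)
batch = getobs(data, 1:2)                          # Array{Float32,4}
loader = DataLoader(data; batchsize = 2)
```

Iterating over loader then yields one 4-dimensional array per batch, since DataLoader calls getobs with a vector of indices.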

Hi ATR, given a list of files, you can also use mapobs to turn those into a data container of images:

using MLUtils, FileIO

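# DIR is a placeholder for the image directory. readdir has no `recursive` keyword; use walkdir to descend into subfolders.
files = [file for file in readdir(DIR; join=true) if endswith(file, ".jpeg")]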
images = MLUtils.mapobs(FileIO.load, files)

Then you can get individual images like this:

MLUtils.getobs(images, 1)

One more note: where you’re creating some test images, you’re creating arrays with size (3, 10, 10), like one would have in Python to represent a 2D image. In Julia, however, images are usually represented as 2-dimensional arrays with an element type like RGB that holds a complete pixel value (e.g. 3 colors).

To convert such a 2D image to a 3D array with the color channels expanded, you can use Images.channelview:

using Images
imagetensors = MLUtils.mapobs(Images.channelview, images)
getobs(imagetensors, 1)
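To make the layout difference concrete, here is a small sketch using a random in-memory image instead of a file. It uses channelview from ImageCore (the same function that Images re-exports); the final permutedims step is an extra assumption, for the case where a height × width × channel layout is preferred.

```julia
using ImageCore

# A 2D image: each element is a complete RGB pixel
img = rand(RGB{Float32}, 10, 10)
size(img)                           # (10, 10)

# Expand the color channels into a leading dimension
chw = channelview(img)
size(chw)                           # (3, 10, 10)

# Optionally move the channels last: height × width × channel
hwc = permutedims(chw, (2, 3, 1))
size(hwc)                           # (10, 10, 3)
```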