Efficiently Loading and Processing a Large Number of Images

Hello, I’m very new to Julia and as such am still familiarizing myself with its quirks. I’m working with a dataset from Kaggle containing 40,000 images that I would like to load in and process. The data can be found here: Surface Crack Detection | Kaggle

However, I’m unsure how to perform this task efficiently. The below code accomplishes two things:

  1. Finds the path to all image files. (Not very interesting, but included for reproducibility.)
using Images
using ImageIO

#Assumes current directory contains both image-containing folders.
base_path = pwd()

#Finds all file names in given path
file_pos = readdir(base_path*"\\Positive\\")
file_neg = readdir(base_path*"\\Negative\\")

#Joins the paths with the file names
path_pos = base_path*"\\Positive\\".*file_pos
path_neg = base_path*"\\Negative\\".*file_neg

img_paths = [path_pos; path_neg]

  2. Loads in each image and performs some very basic operations on it.

#Loads an image from a given path and performs some basic transformations to it.
function process_image(path)
    img = load(path)
    img = Gray.(img)
    img = imresize(img,(80,80))
    img = vec(img)
    img = convert(Array{Float64,1},img)
    return img
end

#Processes all images
processed_imgs =  process_image.(img_paths)

The below statistics were produced by a second run of the @time and @code_warntype commands.

394.428738 seconds (4.28 M allocations: 23.492 GiB, 5.85% gc time)
Variables
  #self#::Core.Const(var"##dotfunction#257#7"())
  x1::Vector{String}

Body::AbstractVector{var"#s831"} where var"#s831"
1 ─ %1 = Base.broadcasted(Main.process_image, x1)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(process_image), Tuple{Vector{String}}}
│   %2 = Base.materialize(%1)::AbstractVector{var"#s831"} where var"#s831"

I’m not quite sure how to interpret this, other than that there are some types somewhere in the function that Julia had trouble inferring, and that these are likely at least partially responsible for the slow run time.

The process_image function currently takes around 6 minutes to work through the 40,000 images and I’m certain the function could be completely rewritten, but I’m not sure what a “Julian” way to go about that is.

I tried chaining with |> in process_image but can’t seem to figure out how to chain when the current result is not the left-most argument of the next function. In R you would be able to write something mid-chain like
convert(Array{Float64,1},.) (assuming there was an equivalent convert function)
where the . tells the chain to pass the value of the chained object as the second argument. Obviously . is much more important to Julia’s ecosystem than R’s, so I wouldn’t expect the syntax to be identical, but I’m wondering if there’s an alternative way to accomplish this.

I’m also assuming that the CUDA library could be used to speed things up? I tried (admittedly not very hard) to put the loaded images on my GPU with img = load(path) |> device(), but that threw an "objects of type CuDevice are not callable" error. Any advice on how this could be done?

readdir(base_path*"\\Positive\\")

A little off topic, this is fine if everyone running the code is only on Windows, but joinpath is a cross platform function for joining paths.

julia> joinpath(pwd(), "Positive")
"/home/chriselrod/.julia/dev/Positive"

julia> joinpath(pwd(), "Octavian", "test")
"/home/chriselrod/.julia/dev/Octavian/test"

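Applied to the folder layout from the question, the path-building step might look like the sketch below. It assumes the same Positive and Negative folder names; list_image_paths is a hypothetical helper name, and the join=true keyword of readdir needs Julia 1.4 or later. The demonstration runs on a throwaway directory so it works anywhere.

```julia
# Hypothetical helper: collects the full paths of every file in the
# "Positive" and "Negative" subfolders of `base_path`.
function list_image_paths(base_path::AbstractString)
    # join=true makes readdir return full paths instead of bare file names
    path_pos = readdir(joinpath(base_path, "Positive"); join=true)
    path_neg = readdir(joinpath(base_path, "Negative"); join=true)
    return [path_pos; path_neg]
end

# Demonstration on a temporary directory with one empty file per folder
demo_dir = mktempdir()
mkpath(joinpath(demo_dir, "Positive"))
mkpath(joinpath(demo_dir, "Negative"))
touch(joinpath(demo_dir, "Positive", "a.jpg"))
touch(joinpath(demo_dir, "Negative", "b.jpg"))

demo_paths = list_image_paths(demo_dir)
```

With the real dataset, list_image_paths(pwd()) would replace the readdir and string-concatenation lines from the question.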
I’d also suggest

@code_warntype process_image(img_paths[1])

to look into process_image and see where the problem is, or using Cthulhu.jl to be able to dig deeper as needed as you search for the type instability.
The problem may be with load itself if the contents of the file can determine the result type.
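To illustrate what such an instability looks like and how a type assertion can contain it, here is a minimal sketch. It does not use Images at all: fake_load and FAKE_FILES are hypothetical stand-ins for a loader whose result type the compiler cannot know in advance, and Base.return_types is used only as a quick way to see what inference concluded.

```julia
# Hypothetical stand-in for `load`: the values live in a Dict{String,Any},
# so the compiler can only infer `Any` for the result, much like a loader
# whose return type depends on the file contents.
const FAKE_FILES = Dict{String,Any}("a.png" => rand(Float32, 4, 4))
fake_load(path) = FAKE_FILES[path]

# Unstable: everything downstream of fake_load is inferred as Any.
unstable_sum(path) = sum(fake_load(path))

# Stable: the type assertion tells the compiler what to expect
# (and throws a TypeError at run time if the assumption is wrong).
stable_sum(path) = sum(fake_load(path)::Matrix{Float32})

# Base.return_types reports the inferred return type of each method.
unstable_type = only(Base.return_types(unstable_sum, (String,)))
stable_type = only(Base.return_types(stable_sum, (String,)))
```

The assertion only helps if every file really does produce the asserted type, so it is worth checking a few files from the dataset first.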

I tried chaining with |> in process_image but can’t seem to figure out how to chain when the current result is not the left-most argument of the next function. In R you would be able to write something mid-chain like

You can use anonymous functions

julia> 3 |>
           x -> 2+4x |>
           x -> 3x^2-8 |>
           log
6.363028103540465
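For the convert case from the question, the same idea might look like the sketch below, which uses a plain integer matrix in place of an image so that it runs without any packages. The function names are hypothetical. Base.Fix1 is shown as an alternative to the anonymous function.

```julia
# Parentheses keep each anonymous function self-contained in the chain.
to_float_vector(m) = m |> vec |> (x -> convert(Vector{Float64}, x))

# Base.Fix1 fixes the first argument of a two-argument function, so
# Base.Fix1(convert, Vector{Float64})(x) is convert(Vector{Float64}, x).
to_float_vector_fix(m) = m |> vec |> Base.Fix1(convert, Vector{Float64})

# vec flattens in column-major order, giving [1.0, 3.0, 2.0, 4.0]
result = to_float_vector([1 2; 3 4])
result_fix = to_float_vector_fix([1 2; 3 4])
```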

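Also, you must pipe to functions. Note I ended with log and not log(). Writing device() calls the function first and then tries to use its return value (a CuDevice) as the function, hence the error. Syntactically, the pipe would have to look like the line below, although for actually copying an array to the GPU you most likely want cu or CuArray from CUDA.jl in place of device: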

img = load(path) |> device

Depending on the GPU, it’s likely that you should prefer Float32 over Float64. Consumer GPUs tend to have drastically worse Float64 performance than Float32 performance. If you’re using a cloud service, the GPU is more likely to have a Float64-to-Float32 performance ratio similar to that of CPUs.
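Separately from arithmetic speed, the element type also sets the memory footprint, which matters when GPU memory is limited. A rough sketch using the 80×80 size and the 40,000 image count from the question (storage_bytes is a hypothetical helper, and the figures ignore any overhead beyond the raw pixel data):

```julia
# Bytes needed to hold n images of h-by-w pixels with element type T.
storage_bytes(T::Type, h, w, n) = sizeof(T) * h * w * n

bytes32 = storage_bytes(Float32, 80, 80, 40_000)  # 1_024_000_000 bytes
bytes64 = storage_bytes(Float64, 80, 80, 40_000)  # 2_048_000_000 bytes
```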


Thanks for the response!

I’m happy to learn about the joinpath function. It definitely made the first part much cleaner.

I was able to get the runtime from 394 seconds all the way down to 145 seconds by simply declaring the input types in the function header and all of its variables. This was definitely a valuable lesson in the importance of doing so. I’m still not quite satisfied with the run time, especially since I may want to apply some more computationally-demanding filters to each image at some point down the line. Maybe I’ll spend some time trying to figure out Cthulhu when that time comes.

Have you tried looping over all the paths inside the function instead of broadcasting the function?
That is, pass the array of paths to the function as a parameter and loop over it there?
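A minimal sketch of the two structures, to make the suggestion concrete. fake_process is a hypothetical stand-in for loading and processing one image, so the example runs without any image files. Whether the loop is actually faster depends on the workload; this only shows the shape of the code.

```julia
# Hypothetical stand-in for loading and processing one image:
# returns a vector of n Float32 values derived from the path.
fake_process(path::String, n::Int) = fill(Float32(length(path)), n)

# Broadcasting style: one freshly allocated vector per path.
process_broadcast(paths, n) = fake_process.(paths, n)

# Loop style: allocate the output once, then fill one column per path.
function process_loop(paths::Vector{String}, n::Int)
    result = zeros(Float32, n, length(paths))
    for (i, path) in enumerate(paths)
        result[:, i] = fake_process(path, n)
    end
    return result
end

paths = ["a.jpg", "bb.jpg", "ccc.jpg"]
vectors = process_broadcast(paths, 4)  # Vector holding 3 vectors
matrix = process_loop(paths, 4)        # 4×3 Matrix{Float32}
```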

Ya know, R had me so conditioned to avoid for loops that I hadn’t even considered that that might be faster here. Turns out it is! Takes around 2 minutes to complete.

The function’s currently looking like:

function process_image(path_vec::Vector{String}, h::Int, w::Int)
    # One column per image; Float32 matches the converted pixel data below
    result = zeros(Float32, h * w, length(path_vec))

    for (i, path) in enumerate(path_vec)
        # The assertions give the compiler concrete types after the untyped load
        img = load(path)::Array{RGB{N0f8},2}
        img = Gray.(img)::Array{Gray{N0f8},2}
        img = imresize(img, (h, w))::Array{Gray{N0f8},2}
        img = vec(img)::Vector{Gray{N0f8}}
        img = convert(Array{Float32,1}, img)::Vector{Float32}
        result[:, i] = img
    end
    return result
end

Thanks!
