A faster way of getting image size from jpg files?

I was trying to figure out why my Python script was well over 1000 times faster than my Julia translation of it. The profiler narrowed it down to a single line: loading each image to get its dimensions, which I need for clipping bounding boxes.

The slowness is probably due to the way I’m using the Images.load() function, or to not cleaning up afterwards.

The question is: what am I doing wrong here?


Slow version

function init_subjects(by_subject, image_dir)::Vector{Subject}
    # <<snip>>

    for old_sub in ProgressBar(by_subject)
        image_file = old_sub.subject_Filename

        image = try
            load("$image_dir/$image_file")
        catch
            @warn "Could not load: $image_file"
            continue
        end

        image_size = size(image)
        # <<snip>>

Fast version

function init_subjects(by_subject, image_dir)::Vector{Subject}
    PILImage = pyimport("PIL.Image")

    # <<snip>>

    for old_sub in ProgressBar(by_subject)
        image_file = old_sub.subject_Filename

        image = try
            PILImage.open("$image_dir/$image_file")
        catch
            @warn "Could not load: $image_file"
            continue
        end

        image_size = reverse(image.size)
        image.close()
        # <<snip>>

I suppose I can keep the PyCall version, but I’d prefer not to have to juggle virtual environments in Julia.

While still learning myself, I’ve found that try/catch/finally/end blocks are slow. It looks like you’re using try/catch because you’re not sure whether the image file is there or whether PILImage.open will fail. You could use isfile() as a quick test to confirm the file is there, and filesize() to get the file’s size as an Int.

Sorry, you wanted the image size, not the file size. Sounds like Images.jl does what you want.

I think PIL is loading the images lazily (ref):

This is a lazy operation; this function identifies the file, but the file remains open and the actual image data is not read from the file until you try to process the data (or call the load() method).

PIL is probably identifying the image size from the jpeg header, avoiding the need to load the entire image. You could probably do the same thing via LibJpeg along the lines of this post: Get DCT coefficients of jpeg image - #5 by stevengj
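To illustrate the header-only idea, here is a minimal pure-Julia sketch that walks the JPEG segment headers until it reaches a frame header (an SOFn marker) and reads the height and width stored there. It is not how PIL or Images.jl do it internally, and the names `jpeg_size`, `is_sof` and `read_be16` are made up for this example. It ignores EXIF orientation and returns `nothing` for anything it cannot parse.

```julia
# Frame-header markers are SOF0..SOF15 (0xC0-0xCF), except DHT (0xC4),
# JPG (0xC8) and DAC (0xCC), which share that range but are not frames.
is_sof(marker::UInt8) = 0xC0 <= marker <= 0xCF && !(marker in (0xC4, 0xC8, 0xCC))

# JPEG stores multi-byte integers big-endian (network byte order).
read_be16(io::IO) = ntoh(read(io, UInt16))

# Return (height, width) from the first frame header in the stream,
# or `nothing` if the stream is not a JPEG or no frame header is found.
function jpeg_size(io::IO)
    try
        # Every JPEG starts with the SOI marker FF D8.
        (read(io, UInt8) == 0xFF && read(io, UInt8) == 0xD8) || return nothing
        while !eof(io)
            read(io, UInt8) == 0xFF || return nothing
            marker = read(io, UInt8)
            while marker == 0xFF            # optional fill bytes before a marker
                marker = read(io, UInt8)
            end
            # Standalone markers carry no length field.
            (marker == 0x01 || 0xD0 <= marker <= 0xD8) && continue
            # EOI or start of scan data: the frame header would have come earlier.
            (marker == 0xD9 || marker == 0xDA) && return nothing
            len = read_be16(io)             # segment length, includes these 2 bytes
            len >= 2 || return nothing
            if is_sof(marker)
                skip(io, 1)                 # sample precision
                height = Int(read_be16(io))
                width = Int(read_be16(io))
                return (height, width)
            end
            skip(io, len - 2)               # jump over the rest of this segment
        end
        return nothing
    catch err
        err isa EOFError || rethrow()
        return nothing                      # truncated file
    end
end

# Convenience method for a file path.
jpeg_size(path::AbstractString) = open(jpeg_size, path)
```

Calling `jpeg_size("$image_dir/$image_file")` would then give `(height, width)`, the same order as `size(image)` on a loaded image, and the `nothing` return can stand in for the try/catch in the loop.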

I think your analysis about why the speed difference is occurring is right.

But Images must be reading the header info too to properly render the image. I guess that I should look into short-circuiting the full load either with the library you mention or in the Images package itself.

Thanks.

If you are willing to use ImageMagick.jl, this seems substantially faster on my machine:

using ImageMagick

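function imagesize(filename)
    wand = ImageMagick.MagickWand()
    # MagickPingImage reads only the image metadata, not the pixel data.
    ccall((:MagickPingImage, ImageMagick.libwand), Bool, (Ptr{Cvoid}, Ptr{UInt8}), wand, filename)
    # size(wand) is (width, height).
    size(wand)
end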

I’m already using ImageMagick.jl, so that’s easy. The code runs as fast as the PIL.Image solution. I just needed to use reverse(size(wand)) for my particular case.

Thank you very much.

Edit: This points to a very useful strategy of calling into libraries directly. TIL.

That’s great, glad it helped. It is probably worth checking the return value of :MagickPingImage if you use this in real code. It is just a boolean indicating success or failure.

Good catch about reversing the order of size; I didn’t notice that.
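For anyone landing here later, a sketch of the imagesize function from earlier in the thread with both follow-ups folded in: it checks the status returned by MagickPingImage and returns the size as (height, width). Two things here are my own assumptions rather than anything from the posts above: the name `imagesize_checked`, and declaring the return type as `Cint` (MagickBooleanType is a C enum) instead of `Bool`. Returning `nothing` on failure is just one option; throwing an error would work equally well.

```julia
using ImageMagick

# Hypothetical variant of `imagesize` that reports failure instead of ignoring it.
function imagesize_checked(filename)
    wand = ImageMagick.MagickWand()
    # MagickPingImage reads only the metadata; a zero status means failure.
    status = ccall((:MagickPingImage, ImageMagick.libwand), Cint,
                   (Ptr{Cvoid}, Ptr{UInt8}), wand, filename)
    status == 0 && return nothing
    # size(wand) is (width, height); reverse it to match size(load(...)).
    return reverse(size(wand))
end
```

In the original loop this would replace the try/catch: call `imagesize_checked("$image_dir/$image_file")`, and if the result is `nothing`, emit the warning and `continue`.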