I was trying to figure out why my Python script was well over 1000 times faster than my Julia translation, and the profiler narrowed it down to a single line: loading each image just to get its dimensions for clipping bounding boxes.
The slowness is probably due to the way I'm using the Images.load() function, or to not cleaning up afterwards.
So the question is: what am I doing wrong here?
Slow version:
```julia
using Images, ProgressBars   # load() (re-exported from FileIO), ProgressBar()

function init_subjects(by_subject, image_dir)::Vector{Subject}
    # <<snip>>
    for old_sub in ProgressBar(by_subject)
        image_file = old_sub.subject_Filename
        image = try
            load("$image_dir/$image_file")   # decodes the full image
        catch
            @warn "Could not load: $image_file"
            continue
        end
        image_size = size(image)             # (height, width)
        # <<snip>>
```
Fast version:
```julia
using PyCall, ProgressBars   # pyimport(), ProgressBar()

function init_subjects(by_subject, image_dir)::Vector{Subject}
    PILImage = pyimport("PIL.Image")
    # <<snip>>
    for old_sub in ProgressBar(by_subject)
        image_file = old_sub.subject_Filename
        image = try
            PILImage.open("$image_dir/$image_file")   # lazy: pixel data not decoded yet
        catch
            @warn "Could not load: $image_file"
            continue
        end
        image_size = reverse(image.size)   # PIL reports (width, height)
        image.close()
        # <<snip>>
```
I suppose I could keep the PyCall version, but I'd prefer not to have to juggle Python virtual environments from Julia.
I'm still learning myself, but I've found try/catch/finally blocks to be slow. It looks like you're using try/catch because you're not sure whether the image file exists or whether PILImage.open will fail. You could use isfile() as a quick check that the file is there, and filesize() to get the file's size in bytes as an Int.
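A minimal sketch of the isfile() suggestion, assuming the goal is only to skip missing files before calling the loader. `checked_paths` is a hypothetical helper, not part of any package. Note that isfile() only tells you the path exists; a corrupt image can still make the loader throw, so a try/catch around the actual load may still be worth keeping.

```julia
# Hypothetical helper: split candidate image names into paths that exist
# and names that are missing, so the loader is only called on real files.
function checked_paths(dir::AbstractString, names)
    found = String[]
    missing_names = String[]
    for name in names
        path = joinpath(dir, name)
        if isfile(path)
            push!(found, path)
        else
            push!(missing_names, name)
        end
    end
    return found, missing_names
end
```

For example, with a temporary directory containing one 4-byte file, `checked_paths(dir, ["a.jpg", "b.jpg"])` returns the full path of `a.jpg` in the first list and `"b.jpg"` in the second, and `filesize` on that path gives `4`.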
Sorry, you wanted the image size, not the file size. It sounds like Images.jl does what you want.
I think PIL is loading the images lazily (from the Pillow documentation for Image.open()):
This is a lazy operation; this function identifies the file, but the file remains open and the actual image data is not read from the file until you try to process the data (or call the load() method).
PIL is probably reading the image size from the JPEG header, which avoids decoding the entire image. You could probably do the same thing via LibJpeg along the lines of this post: Get DCT coefficients of jpeg image - #5 by stevengj
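The header-only idea can also be sketched in pure Julia, as an alternative to going through LibJpeg: walk the JPEG marker segments until the Start-Of-Frame (SOF) segment, which stores the height and width, and stop there. This is a minimal sketch assuming a well-formed JPEG; `jpeg_size` is a hypothetical helper, not part of Images.jl or LibJpeg, and it does not handle other formats or truncated files.

```julia
# Hypothetical helper: read (height, width) of a JPEG from its SOF segment
# without decoding any pixel data. Assumes a well-formed file; returns
# `nothing` if the stream is not a JPEG or no SOF marker is found.
function jpeg_size(io::IO)
    # Every JPEG starts with the SOI marker 0xFF 0xD8.
    read(io, UInt8) == 0xFF && read(io, UInt8) == 0xD8 || return nothing
    while !eof(io)
        # Each segment starts with 0xFF; skip any extra 0xFF fill bytes.
        read(io, UInt8) == 0xFF || return nothing
        marker = read(io, UInt8)
        while marker == 0xFF
            marker = read(io, UInt8)
        end
        # Segment length is big-endian and includes its own two bytes.
        len = Int(ntoh(read(io, UInt16)))
        len >= 2 || return nothing
        # SOF0..SOF15 carry the frame size; 0xC4 (DHT), 0xC8 (JPG) and
        # 0xCC (DAC) share that range but are not frame headers.
        if 0xC0 <= marker <= 0xCF && marker ∉ (0xC4, 0xC8, 0xCC)
            read(io, UInt8)                       # sample precision
            height = Int(ntoh(read(io, UInt16)))
            width = Int(ntoh(read(io, UInt16)))
            return (height, width)                # Julia's size() order
        end
        skip(io, len - 2)                         # jump over the payload
    end
    return nothing
end

# Convenience method: open the file, parse the header, close it again.
jpeg_size(path::AbstractString) = open(jpeg_size, path)
```

Returning `(height, width)` matches the order of `size(image)` for an image loaded with Images.jl, so no `reverse` is needed when switching between the two.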
I think your analysis about why the speed difference is occurring is right.
But Images must be reading the header too in order to decode the image properly. I guess I should look into short-circuiting the full load, either with the library you mention or in the Images package itself.
Thanks.
If you are willing to use ImageMagick.jl, this seems substantially faster on my machine:
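```julia
using ImageMagick

function imagesize(filename)
    wand = ImageMagick.MagickWand()
    # MagickPingImage reads only the image attributes (width, height, format),
    # not the pixel data, which is what makes this fast.
    ccall((:MagickPingImage, ImageMagick.libwand), Bool,
          (Ptr{Cvoid}, Ptr{UInt8}), wand, filename)
    size(wand)
end
```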
I'm already using ImageMagick.jl, so that's easy. The code is as fast as the PIL.Image solution; I just needed reverse(size(wand)) for my particular case.
Thank you very much.
Edit: This points to a very useful strategy: calling into C libraries directly. TIL.
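The same ccall pattern works for any C function already loaded into the Julia process. A minimal sketch using libc's `strlen`, shown both with `ccall` and with the `@ccall` macro (Julia 1.5 and later); `c_strlen` and `c_strlen_macro` are hypothetical names for illustration. The pieces mirror the MagickPingImage call: the function (and library), the return type, a tuple of argument types, then the arguments. Declaring the argument as `Cstring` rather than `Ptr{UInt8}` makes Julia check for embedded NUL characters. Along the same lines, ImageMagick's `MagickBooleanType` is a C enum, so `Cint` is arguably a more exact return type for MagickPingImage than `Bool`.

```julia
# strlen(const char *) -> size_t, resolved from the libraries already
# loaded into the Julia process (no library name given).
c_strlen(s::String) = Int(ccall(:strlen, Csize_t, (Cstring,), s))

# Same call using the @ccall macro syntax: name(arg::Type)::ReturnType.
c_strlen_macro(s::String) = Int(@ccall strlen(s::Cstring)::Csize_t)
```

Since `strlen` counts bytes, `c_strlen("hello")` is `5` while `c_strlen("héllo")` is `6` (the `é` takes two bytes in UTF-8), matching `sizeof` rather than `length`.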
That's great, glad it helped. If you use this in real code, it is probably worth checking the return value of :MagickPingImage; it is just a boolean indicating success or failure.
Good catch about reversing the order of size; I didn't notice that.