Related to this topic, I’d like to read the modification time stored in the metadata of images. Right now I do this by “shelling out” to exiftool. While this works, it’s painfully slow (for instance, when I need to process hundreds of images). So I was wondering which of these two alternatives would be easiest for me to learn, and whether anyone has pointers on how to get started on either:
1. Figure out how to avoid “shelling out” and use the exiftool libraries directly?
2. Rely on existing mechanisms in ImageMagick.jl and somehow write a function that extracts that specific field (modification time)?
I have to admit that both are beyond me, but I would love to be able to rapidly access that metadata. Any help is appreciated!
Other ideas:
If you are processing hundreds of thousands of images and are currently calling run serially, switch to asynchronous tasks (e.g. asyncmap) so that the calls overlap.
Tasks are not normally truly parallel, but I think they should be for run, since each task only waits on an external process.
If not, async + spawn + wait certainly will be.
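A rough sketch of the asyncmap idea, in case it helps. The helper names (`modify_date`, `dates_for`) are made up for illustration, and I am assuming that the field you want is the EXIF `ModifyDate` tag and that exiftool is on your PATH. `-s3` asks exiftool to print only the tag value.

```julia
# Hypothetical helper: starts one exiftool process per image and returns the
# raw value of the (assumed) ModifyDate tag as a string.
modify_date(path) = strip(read(`exiftool -s3 -ModifyDate $path`, String))

# asyncmap keeps up to `ntasks` calls in flight at once. Each task spends its
# time waiting on its own child process, so the processes overlap.
# `extract` is a keyword so that a different extractor can be swapped in.
dates_for(paths; extract = modify_date, ntasks = 8) =
    asyncmap(extract, paths; ntasks = ntasks)
```

You would call it as `dates_for(readdir("photos"; join = true))`. The value of `ntasks` is a guess; it is worth timing a few settings on your own machine.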
If the problem is the fixed cost of the shell startup (don’t know, did not benchmark), then you could use a single shell run to extract data from multiple images using exiftool, which can write to CSV or text files in custom formats.
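Something along these lines might work for the single-call approach. It is only a sketch: `batch_cmd` and `parse_batch` are names I made up, I am assuming the tag of interest is `ModifyDate`, and the result is keyed by bare file name, so it assumes names are unique across directories. `-T` prints one tab-separated row per file (with `-` for a missing tag), and `-d` sets the date format so that Julia's default `DateTime` parser can read it.

```julia
using Dates

# Date format handed to exiftool's -d option (ISO 8601 without time zone).
const DATE_FMT = "%Y-%m-%dT%H:%M:%S"

# One exiftool command for the whole list; the vector expands into one argument per file.
batch_cmd(paths) = `exiftool -T -d $DATE_FMT -FileName -ModifyDate $paths`

# Turn the tab-separated output into a Dict from file name to timestamp.
# Files without the tag map to `nothing`.
function parse_batch(output::AbstractString)
    times = Dict{String,Union{DateTime,Nothing}}()
    for line in split(output, '\n'; keepempty = false)
        name, stamp = split(rstrip(line, '\r'), '\t'; limit = 2)
        times[String(name)] = stamp == "-" ? nothing : DateTime(stamp)
    end
    return times
end
```

Usage would be `parse_batch(read(batch_cmd(paths), String))`. For very long file lists the command line can hit the operating system's length limit; exiftool can also read its arguments from a file via `-@`, which I have not used in this sketch.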
Wow, ok, so now it takes just ~0.3 seconds for 100 images. To summarize up to now:
@oxinabox’s asyncmap improves the time by a factor of 3.
@Tamas_Papp’s passing exiftool the whole list of images improves it by a factor of 30.
I might need to change the title of this topic
So now my question is, had I been smart enough to BinDeps the libraries of exiftool and figure out their API and use it from within my package, how much better would it really be? Is it even worth it? I tried @Tamas_Papp’s trick with 1000 images and it only took ~2 seconds, so it seems like most of that time is overhead…
I would still take the lazy way out: e.g., when each image is created, write the output of exiftool into a sidecar file, in JLD2 format, CSV, or anything else you can parse quickly. This assumes that you read the images more often than you create them.
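A minimal sketch of the sidecar idea, with a plain text file instead of JLD2 or CSV to keep it short. The `.meta` suffix and the function names are hypothetical, and this variant fills the sidecar lazily on first read rather than at image creation time.

```julia
# Hypothetical naming convention: metadata for "img.jpg" lives in "img.jpg.meta".
sidecar_path(image) = image * ".meta"

# Return the cached metadata text, calling `extract(image)` only when the
# sidecar is missing or older than the image itself.
function cached_metadata(extract, image)
    side = sidecar_path(image)
    if !isfile(side) || mtime(side) < mtime(image)
        write(side, extract(image))
    end
    return read(side, String)
end

# Example extractor; assumes exiftool is on the PATH and that the EXIF
# ModifyDate tag is the field of interest.
exif_modify_date(image) = read(`exiftool -s3 -ModifyDate $image`, String)
```

After the first call, `cached_metadata(exif_modify_date, "photo.jpg")` only reads a small text file, so exiftool is not started again unless the image changes.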
Yeah, I was talking to @oxinabox about using DataDeps as a means to create intermediate files (like your sidecar files), eliminating the need to reprocess the images every time.