Programmatic Image Tagging

Here is my quick test of programmatic image tagging. My approach uses exiftool. The end goal is to be able to save useful metadata pertaining to plots into the saved images themselves.

Definite Includes

  • Specific keywords, plot specific
  • versioninfo()
  • Pkg.status()
  • Executing code header comments
  • Executing code path

Other Possible Includes

  • Executing script in its entirety, anything more is impractical
  • Author
  • DOI/publication reference, when image will be published
  • Executing script hash/commit number

How can the output of versioninfo() and Pkg.status() be saved? Both print to STDOUT and return Void.
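
One possible way to capture that output, sketched here for Julia 1.x (an assumption: there versioninfo lives in InteractiveUtils and Pkg.status accepts an io keyword), is to print into a string with sprint instead of to the terminal:

```julia
using InteractiveUtils  # provides versioninfo on Julia 1.x
using Pkg

# sprint calls the given function with an IOBuffer and returns what was written.
vinfo = sprint(versioninfo)

# Pkg.status writes to the stream passed through its `io` keyword.
pstatus = sprint(io -> Pkg.status(; io = io))
```

Both results are ordinary strings, so they can be written into a tag like any other value.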

This is probably not the perfect place to ask, but does anybody know a better image organization program than Google’s old Picasa? It’s great for embedded tagging and for searching through tagged images, but it’s showing its age. The best option would be cross-platform, but I would still like to hear of options specific to Linux, Windows, or macOS.


using PyPlot

#################
##  Constants  ##
#################
src = "/home/user/Desktop/IMG_0042.jpg" # Existing image
src2 = "/home/user/Desktop/test.jpg" # Will be generated
new_keywords1 = ["julia","Hello","cruel world"]

################################
##  Plot and Save an Example  ##
################################
if isfile(src2)
	rm(src2)
end
fig = figure("TaggingExample",figsize=(4,4))
p = plot(rand(10,1))
savefig(src2)
close(fig)

#########################
##  Load All Metadata  ##
#########################
mdata = readstring(`exiftool $src`)

f = findin(mdata,'\n')
tags = Array{String}(length(f))
values = Array{String}(length(f))
dict_input = Array{Tuple{String,String}}(length(f))
searchstart = 1
sep = 0
for i = 1:length(f)
	sep = findfirst(mdata[searchstart:end],':')
	ending = findfirst(mdata[searchstart:end],'\n')
	tags[i] = strip(mdata[searchstart:sep + searchstart - 2])
	values[i] = mdata[sep + searchstart + 1:searchstart + ending - 2]
	dict_input[i] = (tags[i],values[i])
	searchstart = searchstart + ending
end
settings = Dict(dict_input)
keywords = String.(strip.(readcsv(IOBuffer(settings["Keywords"]))))
keywords = reshape(keywords,length(keywords),1)

##########################
##  Load ONLY Keywords  ##
##########################
mdata2 = readstring(`exiftool -Keywords $src`)
f = findfirst(mdata2,':')
keywords2 = mdata2[f+1:end-1]

####################
##  Add Keywords  ##
####################
del_arg = `-overwrite_original`
arg = `exiftool $del_arg`
for i in new_keywords1
	temp = "-Keywords+=$i"
	arg = `$arg $temp`
end
arg = `$arg $src2`
cmd = readstring(arg)
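
The calls used above (readstring, findin, readcsv) were removed in Julia 1.0. A minimal sketch of the same read-and-tag steps for Julia 1.x follows. parse_exiftool, split_keywords, and keyword_cmd are hypothetical helper names, and the parser assumes exiftool’s default "Tag : value" output layout.

```julia
# Parse exiftool's default output ("Tag Name   : value", one tag per line)
# into a Dict. Only the first colon splits the line, since values such as
# timestamps contain colons themselves.
function parse_exiftool(text::AbstractString)
    tags = Dict{String,String}()
    for line in split(text, '\n'; keepempty = false)
        parts = split(line, ':'; limit = 2)
        length(parts) == 2 || continue
        tags[String(strip(parts[1]))] = String(strip(parts[2]))
    end
    return tags
end

# Split a comma-separated Keywords value into individual keywords.
split_keywords(value::AbstractString) = String.(strip.(split(value, ',')))

# Build (but do not run) the exiftool command that appends keywords.
# Interpolating a vector into a Cmd passes each element as its own argument,
# so keywords containing spaces need no extra quoting.
function keyword_cmd(keywords, path::AbstractString)
    args = ["-Keywords+=$k" for k in keywords]
    return `exiftool -overwrite_original $args $path`
end

# Usage, assuming exiftool is on the PATH:
# tags = parse_exiftool(read(`exiftool $src`, String))
# keywords = split_keywords(tags["Keywords"])
# run(keyword_cmd(new_keywords1, src2))
```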

@yakir12

Cool. There is @tim.holy’s ImageMetadata.jl, but not sure how that would work with this.
Another thought is that most publication-ready images, especially plots, will be in some vector graphics format, not bitmaps. The metadata of those files might be managed differently than how you were planning this, or how exiftool works.
Ideally, a user would only need to call using SaveTags (just a made-up name), and all subsequent saves would include all the metadata you mentioned (users should be allowed to change the default tags). Not sure where and how this should happen: as a plot recipe or in FileIO… Basically, it’ll be an enhancement to all saves, so that people can trace back what was going on, and it therefore should not be limited to images alone. Having said that, I have no idea if it’s even possible to attach such metadata to non-image files… But it’d be nice…
Sorry, I don’t have any input on the image software. I managed without, but I’d love to know if you find one.

ImageMetadata.jl looks like it’s for images already loaded into Julia but the documentation is pretty sparse so I can’t be sure about that.

This method is limited only by what exiftool itself supports. exiftool works on a variety of image formats (including vector formats), so depending on the format you want to use, it is possible. See this list of formats that exiftool can deal with. I successfully tested writing tags to and reading them from an EPS image.

I hadn’t thought about overloading the standard save functions, but it’s a good idea; anything to overcome the laziness and forgetfulness that limit how much metadata is saved. I’m a bit torn on whether it should be the default because of the added save time, and I haven’t dealt with Plots.jl yet, so I’m not sure how it would be implemented.

Picasa is still quite functional if you deal with lots of images, but it might not handle tags in formats other than JPG. That’s part of why I’m looking for something better.

Maybe we can connect ImageMetadata to FileIO so that metadata written there carries through to the resulting file. This should somehow be done for Plots too. Theoretically, it might be useful to add something similar to a tagging recipe so that any package that uses FileIO can let the tags follow through and be saved to the file. In practice, this might only be relevant to images (but HDF5 might like this as well). I’d never suggest baking this in as a default to FileIO (or anything else), but the default fields you’d save with the file should be sane to start with and easy to omit (for example, if I’d rather not save the author’s name for some reason).
This is an exciting idea!
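
To make the idea a bit more concrete, here is a rough sketch of what "sane defaults that are easy to omit" could look like. default_tags and save_tagged are made-up names, not part of FileIO, Plots, or ImageMetadata, and the wrapper assumes exiftool is on the PATH.

```julia
# Hypothetical defaults: a small set of provenance fields.
# Any field can be dropped by name, e.g. omit = [:author].
function default_tags(; author::AbstractString = get(ENV, "USER", ""),
                        script::AbstractString = PROGRAM_FILE,
                        omit = Symbol[])
    tags = Dict{Symbol,String}(
        :author => author,
        :script => script,
        :julia  => string(VERSION),
    )
    foreach(k -> delete!(tags, k), omit)
    return tags
end

# Hypothetical wrapper: `savefn` is whatever writes the file (for example a
# closure around savefig); the tags are then embedded as "name: value" keywords.
function save_tagged(savefn, path::AbstractString; tags = default_tags())
    savefn(path)
    args = ["-Keywords+=$k: $v" for (k, v) in tags]
    run(`exiftool -overwrite_original $args $path`)
    return path
end
```

Something like save_tagged(savefig, "plot.png"; tags = default_tags(omit = [:author])) would then save the plot without the author field.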

ImageMetadata.jl looks like it’s for images already loaded into Julia but the documentation is pretty sparse so I can’t be sure about that.

Yep, you got it. I haven’t put any thought into saving metadata (except “use JLD2” or formats like NRRD), so you’re way ahead of me here. Seems like a great direction to explore. Ideally one would like direct access to the library functions rather than relying on shelling out (it would be much faster).

I never could figure out how to use (for example) exiftool’s libraries, so I’ve only “shelled out”…

ImageMagick.jl uses EXIF properties to get the orientation of an image, so you can check it for an example. Not sure whether ImageMagick provides access to all EXIF properties, though.

If someone is planning to put effort in attaching metadata to plots (and other images) then might I suggest going with XMP, rather than EXIF? It is probably an easier place to start.

I don’t know much about either, but I remember trying to write code to insert some simple exif data (decades ago) and finding that it was nearly impossible, because you have to parse every single exif tag, even the ones you don’t care about. For the same reason it is also basically impossible to define your own tags.
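
For what it’s worth, exiftool can write XMP as well, so the shell-out approach from earlier in the thread carries over. A minimal sketch, assuming exiftool is on the PATH and using the Dublin Core tags in exiftool’s XMP-dc group (xmp_cmd is a made-up helper name):

```julia
# Build (but do not run) an exiftool command that writes Dublin Core XMP tags.
function xmp_cmd(path::AbstractString; creator::AbstractString,
                 description::AbstractString, subjects = String[])
    args = ["-XMP-dc:Creator=$creator", "-XMP-dc:Description=$description"]
    # "+=" appends to the Subject list instead of replacing it.
    append!(args, ["-XMP-dc:Subject+=$s" for s in subjects])
    return `exiftool -overwrite_original $args $path`
end

# Usage:
# run(xmp_cmd("plot.png"; creator = "A. Author",
#             description = "Example plot", subjects = ["julia"]))
```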

Returning to this a bit:
To start modestly, say by getting just the modification date and time of an image, which would be easiest:

  1. figure out how to not “shell out” and use the exiftool libraries directly?
  2. add a function that gets that specific field to ImageMagick.jl?

I have to admit that both are beyond me, but I would love to be able to rapidly access that metadata. Right now I shell out to exiftool…
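
For reference, the shell-out version can be sketched as follows. It assumes exiftool is on the PATH and that the image carries a ModifyDate tag in the usual "YYYY:MM:DD HH:MM:SS" form; the function names are made up.

```julia
using Dates

# exiftool prints EXIF timestamps as "YYYY:MM:DD HH:MM:SS".
parse_exif_date(s::AbstractString) =
    DateTime(strip(s), dateformat"yyyy:mm:dd HH:MM:SS")

# Embedded timestamp: -s3 makes exiftool print only the value of the tag.
exif_modify_date(path) =
    parse_exif_date(read(`exiftool -s3 -ModifyDate $path`, String))

# Filesystem timestamp (UTC): needs no external tool, but it describes the
# file on disk rather than the metadata embedded in the image.
file_modify_date(path) = unix2datetime(mtime(path))
```

The filesystem variant is the fastest of the two, but it changes whenever the file is copied or rewritten, so it is only a stand-in for the embedded date.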
