Hello! I am new here.
I have been trying to use Metalhead.jl/Flux.jl to train custom image classifiers, but I can't seem to load data directly from a folder (my data is not one of the standard datasets).
Say I have a folder with 10 subfolders, each of them containing 1000 images.
How can I load this up to perform a simple classification task?
Thank you!
Great to see such an active community
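A minimal sketch of the first step for the layout in the question (`root/<class>/<image file>`): walk the class subfolders with `readdir` and collect every image path together with an integer label. The name `list_image_files` and the extension filter are illustrative assumptions, not part of Flux or Metalhead; the images themselves would still be loaded later, e.g. with `FileIO.load`.

```julia
# Hypothetical helper: collect (path, label) pairs from a folder-per-class
# layout, i.e. root/<class>/<image file>. Uses only Base.
function list_image_files(root::AbstractString;
                          exts = (".jpg", ".jpeg", ".png", ".bmp"))
    # Each subdirectory of root is one class; sort for a stable label order
    classes = sort(filter(d -> isdir(joinpath(root, d)), readdir(root)))
    paths = String[]
    labels = Int[]
    for (label, class) in enumerate(classes)
        for file in readdir(joinpath(root, class); join = true)
            # Keep only regular files with an image extension (case-insensitive)
            if isfile(file) && any(endswith(lowercase(file), ext) for ext in exts)
                push!(paths, file)
                push!(labels, label)
            end
        end
    end
    return paths, labels, classes
end
```

For example, `paths, labels, classes = list_image_files("data/")` would give 10 000 paths and labels `1:10` for the folder described above. Only file names are read at this point, so it is fast even for large folders.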
Welcome, @SubhadityaMukherjee!
It is great that you want to use Flux.
It is true that there is no DataLoader that can read directly from a folder.
I ran into the same problem, so I had to implement it myself. Here is my code to help you:
using Metalhead
using FileIO
using Images
using Serialization
using Random

# Return the absolute path of dir, expanding `~` first
function absolute(dir::AbstractString)
    return abspath(expanduser(dir))
end

"""
    apply_model(f::Function, dir::AbstractString, diroutput::AbstractString)

Read the images in the directory `dir`, apply `f` to each image file and save
the results to a new directory `diroutput`.
- `dir` should have the structure `partitionX/test/<category>` and
  `partitionX/train/<category>`, with the same categories everywhere.
- `diroutput` is a new directory that stores the results with the structure
  `partitionX/test_<category>.bin` and `partitionX/train_<category>.bin`; each
  file contains the serialized matrix whose columns are the outputs of `f`
  for the images of that category.
"""
function apply_model(f::Function, dir::AbstractString, diroutput::AbstractString)
    # Put in absolute path
    dir = absolute(dir)
    diroutput = absolute(diroutput)
    partitions = readdir(dir)
    subdirs = ["test", "train"]
    categories = String[]
    mkpath(diroutput)
    # Check that all partitions and subdirs have the same categories
    for subdir in subdirs, partition in partitions
        cats = readdir(joinpath(dir, partition, subdir))
        if isempty(categories)
            categories = cats
        else
            @assert cats == categories "Error, categories '$categories' and '$cats' are not the same"
        end
    end
    for subdir in subdirs
        for partition in partitions
            outputdir = joinpath(diroutput, partition)
            mkpath(outputdir)
            for category in categories
                outputfile = joinpath(outputdir, "$(subdir)_$(category).bin")
                # Skip results already computed in a previous run
                if isfile(outputfile)
                    println("Ignore '$outputfile'")
                    continue
                end
                files = readdir(joinpath(dir, partition, subdir, category), join=true)
                # One column per image
                output = reduce(hcat, [f(file)::Matrix{Float32} for file in files])
                open(outputfile, "w") do io
                    serialize(io, output)
                end
                println("Written '$outputfile'")
            end
        end
    end
end

function main_apply_resnet()
    Random.seed!(42)
    model = ResNet()
    # Load one image, convert it to RGB, preprocess it and run it through the network
    function apply_resnet(file)
        img = RGB.(Images.load(file))
        output = model.layers(Metalhead.preprocess(img))
        return output
    end
    apply_model(apply_resnet, "data/", "resnet_data/")
end

# Run only when executed as a script, not from the REPL
isinteractive() || main_apply_resnet()
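Once the features have been written, they can be read back to train a simple classifier on top of them. A minimal sketch, assuming the output layout produced by the code above (`<diroutput>/<partition>/<subdir>_<category>.bin`, each file holding one `Matrix{Float32}` with one column per image); `load_features` is a hypothetical name, and `categories` must be passed in a fixed order so the labels are consistent.

```julia
using Serialization

# Hypothetical helper: rebuild a feature matrix and a label vector for one
# partition and one subdir ("train" or "test") from the serialized .bin files.
function load_features(diroutput::AbstractString, partition::AbstractString,
                       subdir::AbstractString, categories::Vector{String})
    blocks = Matrix{Float32}[]
    labels = Int[]
    for (label, category) in enumerate(categories)
        file = joinpath(diroutput, partition, "$(subdir)_$(category).bin")
        # Each file holds one matrix: one column of features per image
        X = open(deserialize, file)::Matrix{Float32}
        push!(blocks, X)
        append!(labels, fill(label, size(X, 2)))
    end
    # Join the columns of all categories into one features × samples matrix
    return reduce(hcat, blocks), labels
end
```

Keeping samples as columns matches the `hcat` used when writing and the features × batch convention that Flux layers expect.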
There is currently an effort to create a DataLoader that supports this kind of thing.
I hope this helps.
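The core idea of such a folder DataLoader can be sketched in a few lines: shuffle the indices once per epoch, split them into batches, and load only the files of the current batch. This is a sketch under assumptions, not an existing Flux API: `batches` is a hypothetical name, and `loadfn` stands in for whatever loads and preprocesses one file.

```julia
using Random

# Hypothetical minibatch iterator over (path, label) pairs. Files are loaded
# lazily, one batch at a time, by the user-supplied `loadfn(path)`.
function batches(loadfn, paths::Vector{String}, labels::Vector{Int};
                 batchsize::Int = 32, shuffle::Bool = true,
                 rng::AbstractRNG = Random.default_rng())
    idx = shuffle ? randperm(rng, length(paths)) : collect(1:length(paths))
    # Each element is an (inputs, labels) tuple for one batch; nothing is
    # loaded until the generator is iterated
    return ((map(loadfn, paths[b]), labels[b])
            for b in Iterators.partition(idx, batchsize))
end
```

Calling `batches` again at the start of each epoch gives a fresh shuffle, and memory use stays at one batch of images regardless of the dataset size.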
Thank you so much, @dmolina! This really helps a lot.
I was actually trying to make something like fastai's data loader (from the PyTorch world) for Julia. I thought it would be really helpful, as this part of Flux isn't very developed yet, and Flux is really awesome, so why not contribute a bit, haha. But I couldn't figure out how to get the files into an array in a reasonable amount of time.
Now hopefully I will be able to.
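On getting the files into an array quickly: decoding each image is independent of the others, so one option is to fill a preallocated array with `Threads.@threads` (Julia must be started with several threads, e.g. `julia --threads 4`). A minimal sketch; `load_all` is a hypothetical name and `loadfn` stands in for the per-file loading function, for instance `f -> RGB.(load(f))`.

```julia
# Hypothetical helper: apply `loadfn` to every path, spreading the work over
# the available threads. Each iteration writes only to its own slot, so no
# locking is needed.
function load_all(loadfn, paths::Vector{String})
    out = Vector{Any}(undef, length(paths))
    Threads.@threads for i in eachindex(paths)
        out[i] = loadfn(paths[i])
    end
    # Narrow the element type (e.g. to a vector of image matrices) once all
    # slots are filled
    return map(identity, out)
end
```

With a single thread this behaves like a plain `map`, so it is safe to use either way.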
Have a great day!