Best way to import the dogscats dataset

natema · June 8, 2020, 5:33pm

I would like to reproduce in Julia some PyTorch transfer-learning experiments that use ResNet18 on Kaggle’s Dogs vs. Cats dataset (direct download from fast.ai).
I would like some feedback on how I’m importing the data.

I’m using the Images package to load the images as CHW arrays (this should match what the code I’m trying to reproduce does by calling torchvision.datasets.ImageFolder() with torchvision.transforms.ToTensor() as the transform).
ResNet18 takes 224×224 images as input, so I’m also using Images.PaddedView to crop them (or pad them, when an image is smaller).
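
For context, ToTensor does two things to an 8-bit image: it moves the channel axis to the front (HWC to CHW) and rescales values from [0, 255] to floats in [0, 1]. Below is a minimal sketch of that conversion on a raw UInt8 array, using only Base. The helper name to_chw_float is hypothetical and belongs to neither Images nor torchvision.

```julia
# Hypothetical helper: mimic a ToTensor-style conversion on a raw H×W×C UInt8 array.
function to_chw_float(hwc::Array{UInt8,3})
    chw = permutedims(hwc, (3, 1, 2))   # reorder axes: H×W×C -> C×H×W
    return Float32.(chw) ./ 255f0       # rescale 0..255 to 0..1 as Float32
end

hwc = rand(UInt8, 4, 6, 3)   # stand-in for a 4×6 RGB image
chw = to_chw_float(hwc)
size(chw)                    # (3, 4, 6)
```

With Images, channelview on an RGB{N0f8} image already gives a channel-first array whose N0f8 elements represent values in [0, 1], so only the element type would remain to be converted.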

Here’s my code:

using Images

# Julia does not expand "~" in paths, so expand it explicitly
data_path = expanduser("~/data/dogscats")
cd(data_path)

# Storing data as an array of tuples
img_example = load("train/dogs/dog.1933.jpg")
data_elem_example = (img = copy(channelview(img_example)), class = "dog", filename = "dog.1933.jpg")
data_elem_type = typeof(data_elem_example)

train_set = Array{data_elem_type}(undef,0)
valid_set = Array{data_elem_type}(undef,0)

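# Center-crop (or zero-pad, if the image is smaller) to new_size × new_size
function crop_center(new_size::Integer, img::Array{RGB{Normed{UInt8,8}},2})
   n_rows, n_cols = size(img)   # size returns (height, width)
   # Output index at which the first row/column of img lands:
   # values below 1 crop, values above 1 pad. The +1 accounts for 1-based indexing.
   row_first = fld(new_size - n_rows, 2) + 1
   col_first = fld(new_size - n_cols, 2) + 1

   out_dims = (new_size, new_size)
   return copy(PaddedView(0, img, out_dims, (row_first, col_first)))
end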

for s in ["train", "valid"]
   for (root, dirs, files) in walkdir(joinpath(data_path, s))
      for file in files
         img_path = joinpath(root, file)
         img_cropped = crop_center(224, load(img_path))
         CHW_img = copy(channelview(img_cropped))
         class = splitpath(root)[end]
         target_set = s == "train" ? train_set : valid_set
         push!(target_set, (img = CHW_img, class = class, filename = file))
      end
   end
end
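
The centering arithmetic of a crop-or-pad step is easy to get off by one, so it can help to check it on small matrices without any image packages. Below is a minimal sketch using only Base. The helper center_crop_or_pad is hypothetical (not part of Images or PaddedViews), assumes an ordinary 1-based Matrix, and fills padding with zero(T).

```julia
# Hypothetical helper: center-crop or zero-pad a plain matrix to new_size × new_size.
function center_crop_or_pad(img::Matrix{T}, new_size::Integer) where {T}
    out = zeros(T, new_size, new_size)
    rows, cols = size(img)
    # Source index i maps to output index i + offset (negative offset = cropping).
    r_off = fld(new_size - rows, 2)
    c_off = fld(new_size - cols, 2)
    for j in 1:new_size, i in 1:new_size
        si, sj = i - r_off, j - c_off
        # Copy only where the output pixel falls inside the source image.
        if 1 <= si <= rows && 1 <= sj <= cols
            out[i, j] = img[si, sj]
        end
    end
    return out
end

a = collect(reshape(1:25, 5, 5))
center_crop_or_pad(a, 3) == a[2:4, 2:4]        # true: the central 3×3 block
center_crop_or_pad(a, 7)[2:6, 2:6] == a        # true: one row/column of zeros on each side
```

When the size difference is odd, fld puts the extra row or column at the end; other conventions are possible and may differ by one pixel.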

Thanks in advance for your time and suggestions.
