Help with Malaria dataset

Hello, just starting out with Julia here.

I’ve downloaded the Malaria dataset from Kaggle and I have two folders of images: one of parasitized cells and the other of uninfected cells. How do I go about loading all these images into an array?

I could use a for loop, but I don’t know the command to get the number of image files in a folder, and the file names aren’t consistent either.

Many thanks

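You could get all the files and directories in a folder with readdir(dir_path). Note that this returns bare names, so join each one to the folder (joinpath(dir_path, name), or call readdir(dir_path; join=true)) before testing it.
Then check whether each entry is a file or a directory with isfile(file_path).
Then, for each file, check its extension with endswith(file_name, suffix).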
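A minimal sketch of those three steps in one helper (list_images and its default extension list are hypothetical names chosen for illustration). It passes join=true to readdir, available since Julia 1.4, so every entry is a full path and isfile checks the right location rather than the current directory:

```julia
# Hypothetical helper: return the full paths of image files in `dir`.
# The default extensions are an assumption; adjust them to your dataset.
function list_images(dir::AbstractString; exts = (".png", ".jpg", ".jpeg"))
    # join=true makes readdir return full paths instead of bare names
    entries = readdir(dir; join = true)
    # keep regular files only (drops subdirectories)
    files = filter(isfile, entries)
    # compare in lowercase so ".PNG" matches ".png" as well
    return filter(f -> any(ext -> endswith(lowercase(f), ext), exts), files)
end
```

For example, length(list_images("path/to/Parasitized")) gives the number of image files in that folder regardless of how they are named.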

Deano, have you looked at Metalhead.jl? https://github.com/FluxML/Metalhead.jl

Thanks! I’m getting the error no method matching joinpath() when I try to filter the files with

filter(isfile(), readdir())

I’m guessing I have to format the strings somehow?

Use

filter(isfile, readdir())

instead. You want to pass the function isfile itself to filter, which then calls it on each name. In your version, you’re calling isfile with no arguments (that call is what raises the joinpath() error) and then passing the result to filter.
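The same function-versus-call distinction applies to count, which takes a predicate just like filter and answers the original question about the number of files. Keep in mind that readdir() with no argument lists the current working directory and returns bare names; for any other folder, readdir(dir; join=true) returns full paths so isfile tests the right files. A small sketch (files_in and count_files are hypothetical helper names):

```julia
# Pass the function `isfile` itself (no parentheses); filter calls it on every path.
files_in(dir::AbstractString) = filter(isfile, readdir(dir; join = true))

# count takes the same kind of predicate and returns how many entries satisfy it.
count_files(dir::AbstractString) = count(isfile, readdir(dir; join = true))
```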


I should have seen that. Thank you!


Glob.jl is useful for this sort of thing:

using Glob
readdir(glob"*.jpg", "/path/to/image/dir")

Brilliant responses, thank you!

I now have

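using Glob, FileIO  # load() comes from FileIO
parapath = "../Kaggle Datasets/cell-images-for-detecting-malaria/cell_images/Parasitized/"
parafiles = readdir(glob"*.png", parapath)           # paths of all .png files in parapath
paraimages = hcat(reshape(map(load, parafiles), :))  # N×1 matrix of loaded images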

and the same for the uninfected images. I then use

X = vcat(paraimages,unimages)

to get my input array. Is this the standard input type for Flux?
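As far as I know, Flux's convolutional layers expect one 4-D array with the two spatial dimensions first, then channels, then the batch (the layout the Flux docs call WHCN), preferably with Float32 elements, rather than a matrix of image objects like X above. A minimal stdlib-only sketch, assuming each image has already been resized to a common size and converted to an H×W×C numeric array (for example with imresize and permutedims(channelview(img), (2, 3, 1)) from Images.jl, which this sketch does not call). to_batch and make_labels are hypothetical helper names:

```julia
# Hypothetical helper: stack equal-sized H×W×C images into an H×W×C×N Float32 array.
function to_batch(imgs::AbstractVector{<:AbstractArray{<:Real,3}})
    isempty(imgs) && throw(ArgumentError("no images given"))
    sz = size(first(imgs))
    # every sample in a batch must have the same size, so resize beforehand
    all(img -> size(img) == sz, imgs) ||
        throw(DimensionMismatch("all images must have size $sz; resize them first"))
    batch = Array{Float32}(undef, sz..., length(imgs))
    for (i, img) in enumerate(imgs)
        batch[:, :, :, i] = img  # elements are converted to Float32 on assignment
    end
    return batch
end

# Hypothetical labelling: 1 = parasitized, 0 = uninfected,
# in the same order the two image groups were stacked.
make_labels(npara::Integer, nuninf::Integer) =
    vcat(ones(Float32, npara), zeros(Float32, nuninf))
```

Usage, with para and uninf as vectors of converted images: X = to_batch(vcat(para, uninf)) and y = make_labels(length(para), length(uninf)). Keeping X and y in the same order is what lets each label line up with its image along the last dimension.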