How to convert a JPEG image (from mobile) to a standard 60k Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8, 8}}, 2} (from Flux.Data.MNIST)?

For an exercise I want to let students draw their own digits, scan them with a mobile phone, and let an MNIST-trained Flux NN classify them.

I see that the individual MNIST images are of type Matrix{ColorTypes.Gray{FixedPointNumbers.N0f8}} (an alias for Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8, 8}}, 2}).

How do I transform my NxN greyscale mynumber.jpg into the same form as the MNIST images?

I solved it (I think; I still need to try running the actual classifier) with:

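using Images, FileIO, ImageTransformations
img_path = "./data/test5.jpg"
img  = load(img_path)            # load the photo/scan
img2 = Gray.(img)                # convert to greyscale
img3 = imresize(img2, (28,28))   # same size as the MNIST images
img4 = 1.0 .- img3               # invert: MNIST digits are white on black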
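
To feed a single image like img4 to a Flux convolutional model, it presumably also has to become a 28×28×1×1 Float32 array (width, height, channels, batch), as done with reshape in the full script further down. Here is a minimal sketch of that step, using a plain matrix of grey values in [0, 1] as a stand-in for the loaded image. The helper name `to_model_input` is hypothetical; with a real Gray image, `Float32.(img3)` should give the same kind of matrix.

```julia
# Stand-in for a loaded, resized greyscale scan: dark ink (low values) on white paper.
# Assumption: values are grey levels in [0, 1].
scan = ones(Float32, 28, 28)
scan[10:18, 14] .= 0.1f0          # a dark vertical stroke

# Hypothetical helper: invert (MNIST is white-on-black) and add
# the channel and batch dimensions expected by a Conv layer.
function to_model_input(img::AbstractMatrix{<:Real})
    inverted = 1f0 .- Float32.(img)
    return reshape(inverted, size(img, 1), size(img, 2), 1, 1)
end

x = to_model_input(scan)
println(size(x))      # (28, 28, 1, 1)
println(eltype(x))    # Float32
```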

A bit OT: why are the images loaded using MLDatasets mirrored and rotated compared to those loaded using Flux.Data.MNIST?

using Flux, Flux.Data.MNIST
imgs = MNIST.images()
firstImg = imgs[1]


using MLDatasets
train_x, train_y = MLDatasets.MNIST.traindata()
firstimg_MLD   = convert(Matrix{Gray{N0f8}},train_x[:,:,1])


How do I get them back to the “normal” orientation (although I think this is not a big issue for classification with a NN, as the rotation/mirroring should be learned)?
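
Mirroring plus a 90° rotation amounts to a transpose, which is what one would expect if MLDatasets returns each image as width × height while the image type expects height × width (that is my reading of the layout, so treat it as an assumption). Under that assumption, swapping the first two dimensions brings the images back. A sketch on a tiny stand-in array rather than the real data:

```julia
# Tiny stand-in for `train_x`: 2 "images" stored as width × height × N
# (assumption: this is the layout MLDatasets returns).
W, H, N = 3, 2, 2
train_x = reshape(Float32.(1:W*H*N), W, H, N)

# Swap the first two dimensions of every image at once: height × width × N.
fixed = permutedims(train_x, (2, 1, 3))

# For a single image this is just a transpose.
first_img = permutedims(train_x[:, :, 1])

println(size(fixed))                     # (2, 3, 2)
println(fixed[:, :, 1] == first_img)     # true
```

Note that the orientation only stops mattering if the training images and the scanned images are oriented the same way.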

For those interested, this is the complete script, together with a grid for letting students write their own digits… works quite well :slight_smile:

(the actual classification script is from various tutorials, mainly this one)

using Pkg

using DelimitedFiles
using Statistics
using Flux
using Flux: Data.DataLoader
using Flux: onehotbatch, onecold, crossentropy
using Flux: @epochs
using MLDatasets # For loading the training data
using Images, FileIO, ImageTransformations # For loading the actual images

# Training of the model

x_train, y_train = MLDatasets.MNIST.traindata()
x_train          = permutedims(x_train,(2,1,3)) # For correct img axis
x_train_imgs     = convert(Array{Gray{N0f8},3},deepcopy(x_train))
x_train          = convert(Array{Float32,3},x_train)
x_train          = reshape(x_train,(28,28,1,60000))

y_train          = onehotbatch(y_train, 0:9)
train_data       = DataLoader((x_train, y_train), batchsize=128)
model = Chain(
    # 28x28 => 14x14
    Conv((5, 5), 1=>8, pad=2, stride=2, relu),
    # 14x14 => 7x7
    Conv((3, 3), 8=>16, pad=1, stride=2, relu),
    # 7x7 => 4x4
    Conv((3, 3), 16=>32, pad=1, stride=2, relu),
    # 4x4 => 2x2
    Conv((3, 3), 32=>32, pad=1, stride=2, relu),
    # Average pooling on each width x height feature map
    GlobalMeanPool(),
    # 1x1x32xN => 32xN, as expected by the Dense layer
    Flux.flatten,
    Dense(32, 10),
    # crossentropy (used below) expects probabilities
    softmax)
accuracy(ŷ, y) =  (mean(onecold(ŷ) .== onecold(y)))
loss(x, y)     = Flux.crossentropy(model(x), y)
# learning rate
opt = Descent(0.1)
#opt = Flux.ADAM()
ps = Flux.params(model)

number_epochs = 10
@epochs number_epochs Flux.train!(loss, ps, train_data, opt)

accuracy(model(x_train), y_train) # 0.981

# Loading imgs
# Set to black every pixel that is at or below `threshold` and whose neighbours
# (within `radius` pixels) are all at or below `threshold` too
function cleanImg!(img,threshold=0.3,radius=0)
    (R,C) = size(img)
    for c in 1:C
        for r in 1:R
            if img[r,c] <= threshold
                allneighboursunderthreshold = true
                for c2 in max(1,c-radius):min(C,c+radius)
                    for r2 in max(1,r-radius):min(R,r+radius)
                        if img[r2,c2] > threshold
                            allneighboursunderthreshold = false
                        end
                    end
                end
                if allneighboursunderthreshold
                    img[r,c] = Gray(0.0)
                end
            end
        end
    end
    return img
end
imgs_y = convert(Array{Int64,1},dropdims(readdlm("./data/img_labels.txt"),dims=2))
imgs_path = ["./data/test$(i).png" for i in 1:24]
imgs = load.(imgs_path)
imgs = [Gray.(i) for i in imgs]
imgs = [imresize(i, (28,28)) for i in imgs]
imgs = [1.0 .- i for i in imgs]
imgs = cleanImg!.(imgs, 0.3,1)
imgs = cat(imgs...,dims=3)
imgs = reshape(imgs,(28,28,1,size(imgs,3)))

# Doing the actual classification

imgs_est = model(imgs)

imgs_ŷ = onecold(imgs_est, 0:9)

probs = maximum(imgs_est,dims=1)

mean(imgs_ŷ .== imgs_y)
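
To see where the misclassifications come from, it can help to break the accuracy down per digit instead of looking only at the overall mean. A minimal sketch with made-up labels: the vectors below are purely illustrative, not results from the script, and `per_digit_accuracy` is a hypothetical helper.

```julia
using Statistics

# Hypothetical helper: accuracy per true digit; digits that never occur in `y` are skipped.
function per_digit_accuracy(ŷ::AbstractVector{<:Integer}, y::AbstractVector{<:Integer})
    acc = Dict{Int,Float64}()
    for d in 0:9
        idx = findall(==(d), y)
        isempty(idx) && continue
        acc[d] = mean(ŷ[idx] .== d)
    end
    return acc
end

# Made-up example data, only to show the call.
y_true = [0, 0, 1, 1, 7, 7, 7, 9]
y_pred = [0, 0, 1, 7, 7, 1, 7, 4]

acc = per_digit_accuracy(y_pred, y_true)
for d in sort(collect(keys(acc)))
    println(d, " => ", acc[d])
end
```

In the script above the call would be `per_digit_accuracy(imgs_ŷ, imgs_y)`.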



FWIW, you can avoid almost all of that conversion logic by using MNIST · MLDatasets.jl.

…hmmm… even with “basic” preprocessing (GIMP → Levels → reducing the max input level) and the “cleaning” in the script I can’t get above 60% accuracy. OK, it’s better than 10%, but it remains quite unsatisfactory. I don’t think the problem is in the NN model itself, but rather in the preprocessing needed to make the handwritten digits look more like the ones the model was trained on. Any ideas?
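
One thing that might matter: according to the description of the original MNIST dataset, the digits were size-normalised to fit a 20×20 box and then centred in the 28×28 frame by their centre of mass. Scanned digits that fill the whole frame or sit off-centre would then look quite different to the network. Below is a rough sketch of the centring step only, on a plain Float32 matrix of an already inverted (white-on-black) digit. The function name is hypothetical, and shifting by whole pixels is a simplification.

```julia
# Hypothetical helper: shift a white-on-black digit so that its centre of mass
# lands on the centre of the frame. Pixels shifted outside the frame are dropped.
function center_by_mass(img::AbstractMatrix{<:Real})
    R, C = size(img)
    total = sum(img)
    total == 0 && return copy(img)                 # empty image: nothing to centre
    # Intensity-weighted mean row and column.
    row_mean = sum(r * img[r, c] for r in 1:R, c in 1:C) / total
    col_mean = sum(c * img[r, c] for r in 1:R, c in 1:C) / total
    dr = round(Int, (R + 1) / 2 - row_mean)
    dc = round(Int, (C + 1) / 2 - col_mean)
    out = zeros(eltype(img), R, C)
    for c in 1:C, r in 1:R
        r2, c2 = r + dr, c + dc
        if 1 <= r2 <= R && 1 <= c2 <= C
            out[r2, c2] = img[r, c]
        end
    end
    return out
end

# Toy example: a 2×2 blob near the top-left corner of a 28×28 frame.
digit = zeros(Float32, 28, 28)
digit[3:4, 5:6] .= 1f0
centered = center_by_mass(digit)
println(findall(>(0), centered))   # the blob now sits around the middle of the frame
```

Cropping the digit to its bounding box and resizing it to roughly 20×20 before centring would be the other half of that normalisation; it is not shown here.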