CNN for MNIST

Hello guys,

I am trying to make my own CNN for MNIST classification, but I am getting a weird error.

LoadError: DimensionMismatch("Rank of x and w must match! (2 vs. 4)")
in expression starting at C:\Users\tomic\OneDrive\Plocha\BP\julia\3_3.jl:28
DenseConvDims(::Array{Float64,2}, ::Array{Float32,4}; kwargs::Base.Iterators.Pairs{Symbol,Tuple{Int64,Int64},Tuple{Symbol,Symbol,Symbol},NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64},Tuple{Int64,Int64}}}}) at DenseConvDims.jl:50
(::Core.var"#Type##kw")(::NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64},Tuple{Int64,Int64}}}, ::Type{DenseConvDims}, ::Array{Float64,2}, ::Array{Float32,4}) at DenseConvDims.jl:49
#adjoint#1133 at nnlib.jl:37 [inlined]
(::ZygoteRules.var"#adjoint##kw")(::NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64},Tuple{Int64,Int64}}}, ::typeof(ZygoteRules.adjoint), ::Zygote.Context, ::Type{DenseConvDims}, ::Array{Float64,2}, ::Array{Float32,4}) at none:0
_pullback at adjoint.jl:53 [inlined]
Conv at conv.jl:146 [inlined]
_pullback(::Zygote.Context, ::Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}}, ::Array{Float64,2}) at interface2.jl:0
applychain at basic.jl:36 [inlined]
_pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}},MaxPool{2,4},typeof(flatten),Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}, ::Array{Float64,2}) at interface2.jl:0
Chain at basic.jl:38 [inlined]
_pullback(::Zygote.Context, ::Chain{Tuple{Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),Array{Float32,4},Array{Float32,1}},MaxPool{2,4},typeof(flatten),Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}}, ::Array{Float64,2}) at interface2.jl:0
L at 3_3.jl:25 [inlined]
_pullback(::Zygote.Context, ::typeof(L), ::Array{Float64,2}, ::Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}) at interface2.jl:0
adjoint at lib.jl:188 [inlined]
_pullback at adjoint.jl:47 [inlined]
#14 at train.jl:103 [inlined]
_pullback(::Zygote.Context, ::Flux.Optimise.var"#14#20"{typeof(L),Tuple{Array{Float64,2},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}) at interface2.jl:0
pullback(::Function, ::Zygote.Params) at interface.jl:167
gradient(::Function, ::Zygote.Params) at interface.jl:48
macro expansion at train.jl:102 [inlined]
macro expansion at progress.jl:119 [inlined]
train!(::Function, ::Zygote.Params, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{Array{Float64,2},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}}, ::Descent; cb::Flux.var"#throttled#42"{Flux.var"#throttled#38#43"{Bool,Bool,var"#88#89",Int64}}) at train.jl:100
(::Flux.Optimise.var"#train!##kw")(::NamedTuple{(:cb,),Tuple{Flux.var"#throttled#42"{Flux.var"#throttled#38#43"{Bool,Bool,var"#88#89",Int64}}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Zygote.Params, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{Array{Float64,2},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}}, ::Descent) at train.jl:98
top-level scope at 3_3.jl:28
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088

And this is my code… I can't figure out which array has 4 dimensions; I'm really confused…

using Pkg
Pkg.add("Flux")
Pkg.add("Images")
Pkg.add("Plots")
using Flux, Flux.Data.MNIST, Images, Plots

labels = MNIST.labels();
images = MNIST.images();

xs = [vec(Float64.(img)) for img in images[1:5000]]
ys = [Flux.onehot(label, 0:9) for label in labels[1:5000]]

imgsize = (28,28,1)
model = Chain(Conv((3, 3), imgsize[3]=>16, pad=(1,1), relu),
        MaxPool((2,2)),
        Conv((3, 3), 16=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        Conv((3, 3), 32=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        flatten,
        Dense(prod(Int.(floor.([imgsize[1]/8,imgsize[2]/8,32]))), 10))



L(x, y) = Flux.crossentropy(model(x), y)
opt = Descent(0.1)
databatch = (Flux.batch(xs), Flux.batch(ys))
Flux.train!(L, params(model), Iterators.repeated(databatch, 1000), opt,
cb = Flux.throttle(() -> println("Training in progress"), 5))

test(i) = findmax(model(vec(Float64.(images[i]))))[2]-1
sum(test(i) == labels[i] for i in 1:60000)/60000
julia> xs[1] |> size
(784,)

julia> Flux.batch(xs) |> size
(784, 5000)

The inputs are being flattened to 1D (28 × 28 = 784) by vec instead of being reshaped to imgsize. Float32.(img) already returns a 2D array, so all that needs to be done is adding a third unit dimension for the channels:

...
xs = [Flux.unsqueeze(Float32.(img), 3) for img in images[1:5000]]
...

julia> xs[1] |> size
(28, 28, 1)

julia> Flux.batch(xs) |> size
(28, 28, 1, 5000)

I did that, and my error changed to the following…

LoadError: DomainError with -0.16674832:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
in expression starting at C:\Users\tomic\OneDrive\Plocha\BP\julia\test3_3.jl:28
throw_complex_domainerror(::Symbol, ::Float32) at math.jl:33
log(::Float32) at log.jl:321
xlogy at utils.jl:22 [inlined]
_broadcast_getindex_evalf at broadcast.jl:648 [inlined]
_broadcast_getindex at broadcast.jl:621 [inlined]
getindex at broadcast.jl:575 [inlined]
macro expansion at broadcast.jl:932 [inlined]
macro expansion at simdloop.jl:77 [inlined]
copyto! at broadcast.jl:931 [inlined]
copyto! at broadcast.jl:886 [inlined]
copy at broadcast.jl:862 [inlined]
materialize at broadcast.jl:837 [inlined]
adjoint at utils.jl:32 [inlined]
_pullback at adjoint.jl:47 [inlined]
#crossentropy#9 at functions.jl:69 [inlined]
_pullback(::Zygote.Context, ::Flux.Losses.var"##crossentropy#9", ::Int64, ::typeof(Statistics.mean), ::Float32, ::typeof(Flux.Losses.crossentropy), ::Array{Float32,2}, ::Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}) at interface2.jl:0
crossentropy at functions.jl:69 [inlined]
_pullback(::Zygote.Context, ::typeof(Flux.Losses.crossentropy), ::Array{Float32,2}, ::Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}) at interface2.jl:0
L at test3_3.jl:25 [inlined]
_pullback(::Zygote.Context, ::typeof(L), ::Array{Float32,4}, ::Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}) at interface2.jl:0
adjoint at lib.jl:188 [inlined]
_pullback at adjoint.jl:47 [inlined]
#14 at train.jl:103 [inlined]
_pullback(::Zygote.Context, ::Flux.Optimise.var"#14#20"{typeof(L),Tuple{Array{Float32,4},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}) at interface2.jl:0
pullback(::Function, ::Zygote.Params) at interface.jl:167
gradient(::Function, ::Zygote.Params) at interface.jl:48
macro expansion at train.jl:102 [inlined]
macro expansion at progress.jl:119 [inlined]
train!(::Function, ::Zygote.Params, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{Array{Float32,4},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}}, ::Descent; cb::Flux.var"#throttled#42"{Flux.var"#throttled#38#43"{Bool,Bool,var"#19#20",Int64}}) at train.jl:100
(::Flux.Optimise.var"#train!##kw")(::NamedTuple{(:cb,),Tuple{Flux.var"#throttled#42"{Flux.var"#throttled#38#43"{Bool,Bool,var"#19#20",Int64}}}}, ::typeof(Flux.Optimise.train!), ::Function, ::Zygote.Params, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{Array{Float32,4},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}}}}, ::Descent) at train.jl:98
top-level scope at test3_3.jl:28
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088
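For what it's worth, this DomainError can be reproduced without Flux: crossentropy takes the log of the model's output, but the Chain ends in a plain Dense layer, so the outputs are raw logits that can be negative, and log(-0.16674832) is exactly what the trace shows. A minimal sketch of the two standard fixes, using nothing beyond Base Julia:

```julia
# crossentropy needs probabilities in (0, 1]; a Dense layer without softmax
# can emit negative values, and log of a negative Float throws DomainError.
logits = [-0.16674832, 1.3, 0.5]    # raw Dense outputs (the first is from the trace)

# Fix 1: push the logits through softmax first (i.e. append softmax to the Chain):
probs = exp.(logits) ./ sum(exp.(logits))
@assert all(0 .< probs .< 1)        # now log.(probs) is safe

# Fix 2: keep raw logits and use Flux.logitcrossentropy instead, which folds
# the softmax and the log together and is also more numerically stable.
println(probs)
```

The final version of the code later in this thread does switch to logitcrossentropy.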

I made some other changes, like moving to the GPU, and now training works and is fast. But I have problems viewing my results on the last two lines. The error is shown below the code.

using Pkg
Pkg.add("Flux")
Pkg.add("Images")
Pkg.add("Plots")
Pkg.add("CUDA")
using Flux, Flux.Data.MNIST, Images, Plots, CUDA

labels = MNIST.labels();
images = MNIST.images();

xs = [Flux.unsqueeze(Float64.(img), 3) for img in images[1:5000]]
ys = [Flux.onehot(label, 0:9) for label in labels[1:5000]]

imgsize = (28,28,1)
model = Chain(Conv((3, 3), imgsize[3]=>16, pad=(1,1), relu),
        MaxPool((2,2)),
        Conv((3, 3), 16=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        Conv((3, 3), 32=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        flatten,
        Dense(prod([3,3,32]), 10)) |> gpu

L(x, y) = Flux.crossentropy(model(x), y)
opt = Descent(0.1)
databatch = (Flux.batch(xs), Flux.batch(ys)) |> gpu
Flux.train!(L, params(model), Iterators.repeated(databatch, 500), opt,
cb = Flux.throttle(() -> println("Training in progress"), 5))

test(i) = findmax(model(vec(Float64.(images[i]))))[2]-1
sum(test(i) == labels[i] for i in 1:60000)/60000
LoadError: DimensionMismatch("Rank of x and w must match! (1 vs. 4)")
in expression starting at C:\Users\tomic\OneDrive\Plocha\BP\julia\3_3.jl:37
DenseConvDims(::Array{Float64,1}, ::CuArray{Float32,4}; kwargs::Base.Iterators.Pairs{Symbol,Tuple{Int64,Int64},Tuple{Symbol,Symbol,Symbol},NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64},Tuple{Int64,Int64}}}}) at DenseConvDims.jl:50
(::Core.var"#Type##kw")(::NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64},Tuple{Int64,Int64}}}, ::Type{DenseConvDims}, ::Array{Float64,1}, ::CuArray{Float32,4}) at DenseConvDims.jl:49
(::Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}})(::Array{Float64,1}) at conv.jl:146
applychain(::Tuple{Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}},MaxPool{2,4},typeof(flatten),Dense{typeof(identity),CuArray{Float32,2},CuArray{Float32,1}}}, ::Array{Float64,1}) at basic.jl:36
(::Chain{Tuple{Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}},MaxPool{2,4},Conv{2,2,typeof(relu),CuArray{Float32,4},CuArray{Float32,1}},MaxPool{2,4},typeof(flatten),Dense{typeof(identity),CuArray{Float32,2},CuArray{Float32,1}}}})(::Array{Float64,1}) at basic.jl:38
test(::Int64) at 3_3.jl:36
(::var"#39#40")(::Int64) at none:0
MappingRF at reduce.jl:93 [inlined]
_foldl_impl(::Base.MappingRF{var"#39#40",Base.BottomRF{typeof(Base.add_sum)}}, ::Base._InitialValue, ::UnitRange{Int64}) at reduce.jl:58
foldl_impl at reduce.jl:48 [inlined]
mapfoldl_impl at reduce.jl:44 [inlined]
#mapfoldl#204 at reduce.jl:160 [inlined]
mapfoldl at reduce.jl:160 [inlined]
#mapreduce#208 at reduce.jl:287 [inlined]
mapreduce at reduce.jl:287 [inlined]
sum at reduce.jl:494 [inlined]
sum(::Base.Generator{UnitRange{Int64},var"#39#40"}) at reduce.jl:511
top-level scope at 3_3.jl:37
include_string(::Function, ::Module, ::String, ::String) at loading.jl:1088

This line is still flattening images out into a 1D array instead of reshaping them into the 4D (W×H×C×B) array the model is expecting. Stack traces can be a little noisy, but if you ignore all the library code, the error pops out.
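The rank mismatch is easy to see in plain Base Julia (a sketch; rand stands in for an actual MNIST image):

```julia
img = rand(Float32, 28, 28)         # stand-in for Float64.(images[i])
size(vec(img))                      # (784,) -- rank 1, hence "(1 vs. 4)" in the error
size(reshape(img, 28, 28, 1, 1))    # (28, 28, 1, 1) -- the rank-4 W×H×C×B shape Conv wants
```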

I found that later on, but I still cannot figure out how to make it 4D in the test phase… Can you help?

I tried changing it to this. Now I am getting no errors, but it cannot calculate the test even for one image.

test(i) = findmax(model(Flux.batch(xs)))[2]-1
sum(test(i) == labels[i] for i in 1:1)/1

Can you post the full code snippet again and elaborate on what “cannot calculate the test even for one image” means?

I changed it to this, and now it works fine. I originally wanted to test it on its training data, because I need it as short and simple as possible, but with a CNN it seems impossible to use the training data for testing.

using Pkg
Pkg.add("Flux")
Pkg.add("CUDA")
Pkg.add("Statistics")
using Flux, Flux.Data.MNIST, CUDA, Statistics

labels = MNIST.labels();
images = MNIST.images();

xs = [Flux.unsqueeze(Float64.(img), 3) for img in images[1:5000]]
ys = [Flux.onehot(label, 0:9) for label in labels[1:5000]]

model = Chain(Conv((3, 3), 1=>16, pad=(1,1), relu),
        MaxPool((2,2)),
        Conv((3, 3), 16=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        Conv((3, 3), 32=>32, pad=(1,1), relu),
        MaxPool((2,2)),
        flatten,
        Dense(288,10),
        softmax) |> gpu

L(x, y) = Flux.logitcrossentropy(model(x), y)
opt = Descent(0.1)
databatch = (Flux.batch(xs), Flux.batch(ys)) |> gpu
Flux.train!(L, params(model), Iterators.repeated(databatch, 1000), opt,
cb = Flux.throttle(() -> println("Training in progress"), 5))

test_images = MNIST.images(:test)
test_labels = MNIST.labels(:test)
txs = [Flux.unsqueeze(Float64.(img), 3) for img in test_images[1:1000]]
tys = [Flux.onehot(label, 0:9) for label in test_labels[1:1000]]

test_set = (Flux.batch(txs), Flux.batch(tys)) |> gpu

accuracy(x, y, model) = mean(Flux.onecold(cpu(model(x))) .== Flux.onecold(cpu(y)))
acc = accuracy(test_set..., model)

The thing to remember about CNNs is that their input is 3d (instead of the “typical” 1d vector input): X + Y + channel. Thus, when you batch lots of them together, you get a 4d input (instead of the “typical” matrix).

I think what might be tripping you up is that the MNIST dataset is implicitly 1-channel, so you’ve used unsqueeze to add in that third dimension. The batching is what adds that fourth dimension. You can of course test single images from either the test or train set — but you just need to either batch them together or make a single one 4d (again with unsqueeze).
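The whole bookkeeping can be sketched in Base Julia — Flux.unsqueeze(x, d) is essentially a reshape that inserts a singleton dimension at position d, and Flux.batch stacks same-shaped arrays along one new trailing dimension:

```julia
img = rand(Float32, 28, 28)                 # one grayscale image: rank 2
x3  = reshape(img, 28, 28, 1)               # ~ Flux.unsqueeze(img, 3): add the channel dim
x4  = reshape(img, 28, 28, 1, 1)            # ~ unsqueeze again: a "batch of one" a Conv accepts
xs  = [reshape(rand(Float32, 28, 28), 28, 28, 1) for _ in 1:5]
b   = cat(xs...; dims = 4)                  # ~ Flux.batch(xs): stack along a new 4th dim
size(b)                                     # (28, 28, 1, 5)
```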
