Lux tutorial: AMDGPU 20x slower than CPU

So far, in my quest to get deep learning working in Julia, I have found FastAI to have multiple dependency issues (see my previous posts), and Flux itself currently requires a downgrade of AMDGPU to v0.6.x, so I ran into issues there as well.

But progress is being made!

In the case of Lux, I only needed to downgrade from v0.8 to v0.7.6 and was able to use the GPU to train a model.* However, it is much slower than on the CPU. In the tutorial (presumably run on NVIDIA/CUDA), they report ~6 s for the first epoch and then ~0.4 s for each of the rest:

I ran that code in two fresh REPL sessions, only commenting out the "using LuxAMDGPU" line for the CPU trial. In case someone wants to try it, here is the actual script used: using Zygote, ComponentArrays, Lux, SciMLSensitivity, Optimisers, Ordinar - Pastebin.com

CPU:

julia> train(NeuralODE)
β”Œ Warning: No functional GPU backend found! Defaulting to CPU.
β”‚ 
β”‚ 1. If no GPU is available, nothing needs to be done.
β”‚ 2. If GPU is available, load the corresponding trigger package.
β”‚     a. LuxCUDA.jl for NVIDIA CUDA Support!
β”‚     b. LuxAMDGPU.jl for AMD GPU ROCM Support!
β”‚     c. Metal.jl for Apple Metal GPU Support!
β”” @ LuxDeviceUtils ~/.julia/packages/LuxDeviceUtils/Dee3d/src/LuxDeviceUtils.jl:158
[1/9] 	 Time 1.33s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.1s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.08s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.08s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.08s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.09s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.08s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.08s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.09s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

GPU:

julia> train(NeuralODE)
[1/9] 	 Time 2.88s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 1.38s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 2.12s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 1.87s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 2.14s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 2.21s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 4.31s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 2.83s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 2.78s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

Can anyone verify whether this is simply a poor model for benchmarking GPU vs CPU?

When monitoring GPU usage, I saw that very little time was spent actually doing computations (usage was very "spiky"). Also, when I reran the GPU model in the same Julia session, VRAM usage kept growing and training got slower and slower.

From looking at the repos, I’m starting to suspect I decided to try Julia just a few months before the ML ecosystem worked out the AMDGPU kinks.

*Also, I decided to try the 1.10 release candidate since, for some reason, it ships a newer LLVM than the 1.9.4 I got from juliaup. Here is the current versioninfo:


julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 Γ— AMD Ryzen Threadripper 2990WX 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
  Threads: 11 on 64 virtual cores

julia> AMDGPU.versioninfo()
ROCm provided by: system
[+] HSA Runtime v1.1.0
    @ /opt/rocm-5.7.1/lib/libhsa-runtime64.so
[+] ld.lld
    @ /opt/rocm/llvm/bin/ld.lld
[+] ROCm-Device-Libs
    @ /home/user1/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode
[+] HIP Runtime v5.7.31921
    @ /opt/rocm-5.7.1/lib/libamdhip64.so
[+] rocBLAS v3.1.0
    @ /opt/rocm-5.7.1/lib/librocblas.so
[+] rocSOLVER v3.23.0
    @ /opt/rocm-5.7.1/lib/librocsolver.so
[+] rocALUTION
    @ /opt/rocm-5.7.1/lib/librocalution.so
[+] rocSPARSE
    @ /opt/rocm-5.7.1/lib/librocsparse.so.0
[+] rocRAND v2.10.5
    @ /opt/rocm-5.7.1/lib/librocrand.so
[+] rocFFT v1.0.21
    @ /opt/rocm-5.7.1/lib/librocfft.so
[+] MIOpen v2.20.0
    @ /opt/rocm-5.7.1/lib/libMIOpen.so

HIP Devices [2]
    1. HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc+:xnack-)
    2. HIPDevice(name="Radeon RX 580 Series", id=2, gcn_arch=gfx803)


Which CPU are you using?
Which GPU are you using?
Which datatype are you using?

If you use FP64, this is to be expected with most consumer GPUs; they are usually only fast when using FP32.

The versioninfo() is at the bottom of my post. The CPU is a Threadripper 2990WX and the GPU is an AMD Radeon VII, basically state-of-the-art HEDT from ~5 years ago.

As for the datatype, it looks like Float32:

julia> train_dataloader, test_dataloader = loadmnist(128, 0.9)
(DataLoader(::Tuple{Array{Float32, 4}, Matrix{Bool}}, shuffle=true, batchsize=128), DataLoader(::Tuple{Array{Float32, 4}, Matrix{Bool}}, batchsize=128))

And here is what happens if I repeat the training in the same REPL session:

julia> train(NeuralODE)
[1/9] 	 Time 2.98s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 1.45s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 1.22s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 1.52s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 1.45s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 1.5s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 1.81s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 1.81s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 1.76s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

julia> train(NeuralODE)
[1/9] 	 Time 2.74s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 1.91s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 2.99s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 4.42s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 4.64s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 5.02s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 4.97s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 4.9s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 5.02s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

julia> train(NeuralODE)
[1/9] 	 Time 7.79s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 4.57s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 6.9s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 4.85s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 4.98s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 5.22s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 5.24s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 5.61s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 5.45s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

julia> train(NeuralODE)
[1/9] 	 Time 7.84s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 5.03s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 8.43s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 5.3s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 5.41s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 5.74s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 5.72s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 6.17s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 5.98s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

julia> train(NeuralODE)
[1/9] 	 Time 8.78s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 5.49s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 14.37s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 8.77s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 10.41s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 7.08s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 10.43s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 9.05s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 10.07s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

julia> train(NeuralODE)
[1/9] 	 Time 19.6s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 14.36s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 59.38s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 6.28s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 6.51s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 6.8s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 7.02s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 7.16s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 7.96s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%

VRAM usage gradually grew from ~300 MB after the first trial to ~1000 MB after the last. There are 16 GB of VRAM, so it isn't hitting any limits, but you can see it clearly gets slower.

The Radeon VII should be much faster for highly parallel tasks… the question is whether your test case is highly parallel.

There is clearly something going wrong there. The growing VRAM and processing time hint at some accumulation and state entanglement… I am really not an expert here, but I am sure it's not the test case. Even for a bad (not GPU-friendly) test case, the times and memory consumption should be similar across multiple runs.


As a fellow Radeon VII user who hasn’t tried it out with AMDGPU yet, I’d be very interested in hearing what the issue was if you manage to resolve it.

Can reproduce this with my RX6700 XT, not sure what’s causing it yet.
I don’t see such behavior with other compute intensive projects though, will try to figure out.
Would be good to create a smaller MWE as this one is quite involved.

Thanks, I will also try to come up with an MWE. This was the closest thing I found:

The problem is: after running sufficiently long, buffers do not get reclaimed by CuArrays quickly enough, which causes the GPU to run out of memory, and performance to slow to a crawl.

However, I wasn’t close to running out of memory.

Just noticed that the stream-ordered (async) allocator is ~300× slower than the non-async one:

#include <hip/hip_runtime.h>
#include <iostream>

// Report any non-success HIP return code.
void check(hipError_t res) {
    if (res != hipSuccess) {
        std::cerr << "Fail" << std::endl;
    }
}

int main(int argc, char* argv[]) {
    hipStream_t s;
    check(hipStreamCreateWithPriority(&s, 0, 0));

    /*
    // Baseline: synchronous allocator (uncomment for the "Regular" run).
    std::cout << "Regular" << std::endl;
    for (int i = 1; i < 100000; i++) {
        float *x;
        check(hipMalloc((void**)&x, 4));
        check(hipFree(x));
    }
    */

    // Stream-ordered allocator: the same number of tiny alloc/free pairs.
    std::cout << "Async" << std::endl;
    for (int i = 1; i < 100000; i++) {
        float *x;
        check(hipMallocAsync((void**)&x, 4, s));
        check(hipFreeAsync(x, s));
    }

    return 0;
}
pxl-th@Leleka:~/code$ time ./a.out 
Regular

real	0m0,256s
user	0m0,206s
sys	0m0,033s

pxl-th@Leleka:~/code$ time ./a.out 
Async

real	1m15,237s
user	1m47,751s
sys	0m0,828s

That is consistent with profiling results that show that most of the time is spent in either hipMallocAsync or hipFreeAsync.

Indeed, moving to non-async allocator improves performance significantly:

julia> train(NeuralODE)
[1/9] 	Time 1.71
[2/9] 	Time 0.23
[3/9] 	Time 0.25
[4/9] 	Time 0.18
[5/9] 	Time 0.25
[6/9] 	Time 0.18
[7/9] 	Time 0.26
[8/9] 	Time 0.27
[9/9] 	Time 0.18

julia> train(NeuralODE)
[1/9] 	Time 0.17
[2/9] 	Time 0.27
[3/9] 	Time 0.16
[4/9] 	Time 0.18
[5/9] 	Time 0.2
[6/9] 	Time 0.37
[7/9] 	Time 0.18
[8/9] 	Time 0.3
[9/9] 	Time 0.18

julia> train(NeuralODE)
[1/9] 	Time 0.18
[2/9] 	Time 0.25
[3/9] 	Time 0.17
[4/9] 	Time 0.19
[5/9] 	Time 0.21
[6/9] 	Time 0.32
[7/9] 	Time 0.24
[8/9] 	Time 0.25
[9/9] 	Time 0.18

I’ll create a PR to do that.


But after this change, is AMDGPU still slower than the CPU?

Yes, because I think this problem is bound by memory transfer, not compute:
you have lots of small allocations + memory transfers while performing little computation.

Just tried on Nvidia RTX 2060 and it gives similar performance to AMD GPU.


Thanks, is there anything else I need in order to try it out?

I think I'd have to modify my local copy of LuxAMDGPU's Project.toml so it's compatible with v0.8, right?

Probably also the master branches of NNlib and Flux (if Lux relies on them).

To clarify, this example is going to be quite slow on a GPU given that the model size is quite small. I also left a note in the tutorial, β€œFor a model this size, you will notice that training time is significantly lower for training on CPU than on GPU,” but for some reason the CPU timings on CI are really bad. Timings on all other servers / local machines I have suggest the CPU is faster.

Just NNlib and LuxAMDGPU need to be updated.

Yes, that solved the speed and VRAM usage. Thanks.

=====
1
[1/9] 	 Time 2.78s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.46s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.32s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.54s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.32s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.54s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.34s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.55s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.34s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
2
[1/9] 	 Time 0.32s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.31s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.6s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.33s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.79s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.34s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.35s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.33s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.46s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
3
[1/9] 	 Time 0.32s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.44s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.3s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.31s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.59s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.54s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.64s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.35s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.32s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
4
[1/9] 	 Time 0.34s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.44s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.32s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.36s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.34s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.52s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.68s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.62s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.32s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
5
[1/9] 	 Time 0.31s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.43s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.31s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.35s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.31s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.5s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.64s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.6s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.47s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
6
[1/9] 	 Time 0.3s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.43s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.31s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.34s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.35s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.52s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.66s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.61s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.45s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
7
[1/9] 	 Time 0.32s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.43s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.31s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.34s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.33s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.52s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.66s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.59s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.47s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
8
[1/9] 	 Time 0.32s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.42s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.3s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.34s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.32s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.51s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.69s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.6s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.48s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
9
[1/9] 	 Time 0.32s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.43s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.3s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.34s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.32s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.52s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.66s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.6s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.45s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%
=====
10
[1/9] 	 Time 0.32s 	 Training Accuracy: 50.96% 	 Test Accuracy: 43.33%
[2/9] 	 Time 0.42s 	 Training Accuracy: 69.63% 	 Test Accuracy: 66.0%
[3/9] 	 Time 0.3s 	 Training Accuracy: 77.93% 	 Test Accuracy: 71.33%
[4/9] 	 Time 0.34s 	 Training Accuracy: 80.74% 	 Test Accuracy: 76.67%
[5/9] 	 Time 0.32s 	 Training Accuracy: 82.52% 	 Test Accuracy: 78.0%
[6/9] 	 Time 0.52s 	 Training Accuracy: 84.07% 	 Test Accuracy: 78.67%
[7/9] 	 Time 0.67s 	 Training Accuracy: 85.33% 	 Test Accuracy: 80.67%
[8/9] 	 Time 0.6s 	 Training Accuracy: 86.59% 	 Test Accuracy: 81.33%
[9/9] 	 Time 0.46s 	 Training Accuracy: 87.7% 	 Test Accuracy: 82.0%