Matrix multiplication yields slightly different results on the CPU and on the GPU with CUDA. Is this normal behaviour? In the example below I normalise the columns of a random Float32 matrix and compute the Gram matrix Xn'Xn on both devices; the results differ in roughly the fourth significant digit (for instance, the diagonal entries, shown as exactly 1.0 on the CPU, come out as 0.999695, 1.00005, etc. on the GPU).
julia> using CUDA
julia> X = randn(Float32, 8, 16)
8×16 Array{Float32,2}:
0.933839 0.535854 0.054124 … -0.397162 0.795055 -0.797737
0.0457853 -0.130413 -1.96217 -0.883076 -0.382235 0.349118
1.09144 1.56768 -0.924807 -0.386719 1.31583 1.12212
0.936833 0.598771 0.278092 0.184339 0.598756 -0.130373
-0.379447 -0.326397 0.374349 0.258037 -1.08479 0.0759822
-1.32479 -1.19413 0.124332 … 0.836484 -1.80901 0.28113
-0.665023 -1.7046 -0.609449 -0.298331 2.9212 0.554513
0.29803 -1.30962 -0.980602 0.469455 -0.740527 -0.967735
julia> Xn = X ./ sqrt.(sum(abs2, X; dims=1))
8×16 Array{Float32,2}:
0.402862 0.175964 0.0216077 … -0.267963 0.19633 -0.434993
0.019752 -0.042825 -0.783351 -0.595807 -0.0943884 0.190368
0.470852 0.514796 -0.369207 -0.260918 0.324928 0.611874
0.404154 0.196625 0.111021 0.124373 0.147856 -0.0710901
-0.163695 -0.107182 0.14945 0.174096 -0.267877 0.0414318
-0.57152 -0.39213 0.0496365 … 0.564372 -0.446714 0.153296
-0.286894 -0.559757 -0.243308 -0.201282 0.721356 0.302367
0.128571 -0.430052 -0.391482 0.316739 -0.182864 -0.52769
julia> Xngpu = CuArray(Xn)
8×16 CuArray{Float32,2}:
0.402862 0.175964 0.0216077 … -0.267963 0.19633 -0.434993
0.019752 -0.042825 -0.783351 -0.595807 -0.0943884 0.190368
0.470852 0.514796 -0.369207 -0.260918 0.324928 0.611874
0.404154 0.196625 0.111021 0.124373 0.147856 -0.0710901
-0.163695 -0.107182 0.14945 0.174096 -0.267877 0.0414318
-0.57152 -0.39213 0.0496365 … 0.564372 -0.446714 0.153296
-0.286894 -0.559757 -0.243308 -0.201282 0.721356 0.302367
0.128571 -0.430052 -0.391482 0.316739 -0.182864 -0.52769
julia> Xn'Xn
16×16 Array{Float32,2}:
1.0 0.738857 -0.169102 … -0.444887 0.358672 -0.161098
0.738857 1.0 0.138181 -0.395013 0.113672 0.209446
-0.169102 0.138181 1.0 0.550084 -0.1915 -0.245511
-0.282271 0.0657576 -0.530405 -0.304412 -0.216795 0.710804
-0.256974 -0.0361658 0.0866007 -0.194733 -0.218947 -0.0136854
-0.201131 -0.43224 0.136756 … 0.558831 -0.287434 -0.610079
0.1918 -0.0818162 -0.593462 -0.666823 0.617977 0.376696
-0.0702096 0.261633 -0.013336 -0.299576 -0.0797219 0.234997
0.155916 0.133923 0.613856 0.238527 -0.228438 -0.694853
0.0748265 0.186705 -0.670178 -0.576346 0.117481 0.563981
-0.369942 -0.404038 0.590167 … 0.522179 0.135644 -0.249161
0.285386 0.463637 -0.246384 -0.584528 0.133876 0.339589
0.201663 0.228354 -0.0933126 -0.332328 0.0289431 0.0490236
-0.444887 -0.395013 0.550084 1.0 -0.564628 -0.299623
0.358672 0.113672 -0.1915 -0.564628 1.0 0.319965
-0.161098 0.209446 -0.245511 … -0.299623 0.319965 1.0
julia> Xngpu'Xngpu
16×16 CuArray{Float32,2}:
0.999695 0.738645 -0.1691 … -0.444913 0.358664 -0.161021
0.738645 0.999493 0.138218 -0.394975 0.113817 0.209457
-0.1691 0.138218 0.999792 0.549904 -0.191429 -0.245223
-0.282113 0.0656881 -0.530367 -0.304383 -0.216708 0.710804
-0.257117 -0.0362208 0.086632 -0.194721 -0.218912 -0.0135966
-0.201098 -0.432148 0.136684 … 0.558802 -0.287445 -0.610156
0.191717 -0.0817629 -0.593494 -0.666916 0.617858 0.376594
-0.0701617 0.261592 -0.0133105 -0.299631 -0.0795134 0.235098
0.155724 0.133949 0.613852 0.238599 -0.228488 -0.69481
0.0749329 0.186722 -0.669925 -0.576274 0.117621 0.563793
-0.369988 -0.403988 0.590042 … 0.522104 0.135522 -0.249156
0.285362 0.463679 -0.24624 -0.584576 0.134029 0.339626
0.201541 0.228333 -0.0931169 -0.332243 0.0289078 0.0490151
-0.444913 -0.394975 0.549904 1.0 -0.564707 -0.299485
0.358664 0.113817 -0.191429 -0.564707 0.999773 0.319842
-0.161021 0.209457 -0.245223 … -0.299485 0.319842 1.00005
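For reference, the size of the discrepancy can be quantified with something like the following (a sketch, not output from my session; it assumes Xn and Xngpu from above are still defined):

# Largest elementwise difference between the CPU and the GPU Gram matrices
G_cpu = Xn'Xn                        # Float32 result from the CPU BLAS
G_gpu = Array(Xngpu'Xngpu)           # Float32 result from CUBLAS, copied back to the host
maximum(abs, G_cpu .- G_gpu)

# Compare both against a Float64 reference computed on the CPU
G_ref = Float64.(Xn)' * Float64.(Xn)
maximum(abs, Float64.(G_cpu) .- G_ref)
maximum(abs, Float64.(G_gpu) .- G_ref)

eps(Float32)                         # ≈ 1.19f-7, the Float32 unit roundoff, for scale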
Julia and package versions:
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
(jl_2M2wMs) pkg> st
Status `/tmp/jl_2M2wMs/Project.toml`
[052768ef] CUDA v2.4.1