Flux failing on GPU


#1

Hello all,

I’ve been trying to get up and running with Flux to create some DNNs in Julia, but I’ve been running into some issues with getting CUDA working with Flux.

I have some preliminary work in this notebook. I tried to adapt the VAE in the model zoo to a GPU implementation, to no avail (my last attempt at coercing the code to use the GPU is near the top of the notebook). I was able to run the autoencoder in the model zoo successfully (again, see the notebook).

Lastly, I tried to load a custom dataset into a CNN defined in Flux (see the end of the notebook). This works on the CPU, but with the GPU I get the message:

conversion to pointer not defined for CuArray{Float32,2}

I’m thinking my problems are mostly due to me being new to the language and package, so any help in how to get any of this working would be greatly appreciated. I provide versioning details in the notebook.

Thank you for the help!


#2

Maybe https://github.com/FluxML/Flux.jl/issues/286? Hard to tell without a backtrace, package version numbers, etc.


#3

Hello @maleadt, thanks for the response. The version numbers are in the second block of the notebook I posted previously (hosted as a gist on GitHub). The stack traces for all failures are there too, along with the code that caused them, if you scroll down (see blocks 8 and 18).

In response to your connection to the issue, I had seen that issue and checked to make sure that CuDNN was installed (which is why I printed the version number at the top of the notebook). I believe it is properly installed since I am able to use CuArrays with the autoencoder model and I have been able to use this machine for training models on the GPU with pytorch >=v1 and tensorflow v1.12.

Any other ideas based on this information? Please let me know if you cannot see the notebook.

Thanks for the help


#4

Any chance you figured this out? I think I’m hitting the same issue. The autoencoder example runs fine on the GPU, but the MNIST convolutional classifier example only runs on the CPU and gives me a “conversion to pointer not defined for CuArray{Float32,2}” error on the GPU.


#5

I’ve not yet figured it out. I tried to test this on another GPU-enabled machine I have access to, but the install of CuArrays failed with this issue.

The reason I wanted to test this on different machines was to see if a different version of CUDA/CuDNN (or perhaps a correctly installed version of CUDA/CuDNN) made a difference. If you have root access, perhaps try re-installing CuDNN? As @maleadt previously brought up in this issue, the problem was associated with CuDNN. I unfortunately do not have root access to either machine, so I am waiting for those system admins to respond and/or some helpful individual to respond on here (I’d be extremely grateful for any assistance).

If you happen to figure out what the problem was, I’d greatly appreciate it if you could post your solution. Good luck :+1:


#6

I do have root access, and I’m happy to try anything you suggest. As my name suggests, I’m a noob, so I really have very little idea what I’m doing though. On the other hand, when I run the cuDNN MNIST example, it ends with this:
“Test passed!”.

Other possibly useful info:
cudnnGetVersion() : 7402 , CUDNN_VERSION from cudnn.h : 7402 (7.4.2)
and, in case it matters, nvidia-smi tells me I have 410.48.
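
For reference, the 7402 above is the packed form of 7.4.2, as the cudnn.h output shows: for cuDNN 7.x the integer is major*1000 + minor*100 + patch level. A minimal sketch of decoding it in Julia, assuming that encoding (decode_cudnn_version is a made-up helper name, not part of CuArrays or Flux):

```julia
# Decode the integer reported by cudnnGetVersion() into major.minor.patch.
# Assumes the cuDNN 7.x packing: major*1000 + minor*100 + patch.
function decode_cudnn_version(v::Integer)
    major, rest = divrem(v, 1000)    # 7402 -> (7, 402)
    minor, patch = divrem(rest, 100) # 402  -> (4, 2)
    return VersionNumber(major, minor, patch)
end

println(decode_cudnn_version(7402))
```

This makes it easier to compare the number a program reports with the version in a library file name such as libcudnn.so.7.2.1.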


#7

Update: When I run Pkg.test("Flux"), I get the same error as in https://github.com/FluxML/Flux.jl/issues/417,
which is supposed to come from a missing cuDNN. But the cuDNN MNIST example test runs just fine.
So maybe Julia/Flux can’t see my cuDNN installation? Or Flux doesn’t like my cuDNN installation? Do I need a particular cuDNN version?

@jcreinhold - what happens when you run Pkg.test("Flux")?


#8

@jcreinhold Fixed mine.

Basically, I just had to run Pkg.build("CuArrays") and Pkg.build("Flux") after installing cuDNN. This should have been obvious, but I’m a noob.

This doesn’t sound like it applies to you though. One thing that might be relevant to you from my experience:
“I believe it is properly installed since I am able to use CuArrays with the autoencoder model” - it appears that the autoencoder model works fine without cuDNN, so maybe your cuDNN is just missing? Are you able to run any of the cuDNN tests from the cuDNN samples, like “mnistCUDNN”?


#9

Thanks for the feedback! It is really helpful. I ran test Flux and received the same error as in the issue you posted earlier.

For completeness, this is the specific unit test report I receive, along with the stack trace.

Test Summary:    | Pass  Error  Total
Flux             |  359      1    360
  Throttle       |   11            11
  Jacobian       |    1             1
  Initialization |   14            14
  Params         |    2             2
  onecold        |    4             4
  Optimise       |   10            10
  Training Loop  |    1             1
  basic          |   17            17
  Dropout        |    8             8
  BatchNorm      |   13            13
  losses         |   12            12
  Pooling        |    2             2
  CNN            |    1             1
  Tracker        |  248           248
  CuArrays       |    7      1      8
ERROR: LoadError: Some tests did not pass: 359 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:24
ERROR: LoadError: failed process: Process(`/home/jacobr/code/julia-1.0.3/bin/julia -Cnative -J/home/jacobr/code/julia-1.0.3/lib/julia/sys.so --compile=yes --depwarn=yes --color=yes --compiled-modules=yes --startup-file=no --code-coverage=none /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl`, ProcessExited(1)) [1]
Stacktrace:
 [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at ./error.jl:42
 [2] pipeline_error at ./process.jl:705 [inlined]
 [3] #run#503(::Bool, ::Function, ::Cmd) at ./process.jl:663
 [4] run(::Cmd) at ./process.jl:661
 [5] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:5
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1044
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] include(::String) at ./client.jl:392
 [10] top-level scope at none:0
in expression starting at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:3
ERROR: Package Flux errored during testing

So clearly this is a problem with my cuDNN installation, but it is not clear to me what the problem is. I got the above error after removing my .julia directory, exporting LD_LIBRARY_PATH=/usr/lib64 (which is where my libcudnn.so lives), and re-adding CuArrays and Flux. The commands below show a little more info.

(synthnn) 04:40:22 ~$ env | grep -i LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/lib64
(synthnn) 04:40:28 ~$ ls /usr/lib64 | grep -i cudnn
libcudnn.so.7
libcudnn.so.7.2.1

This seems to have worked for the issue that @maleadt referenced, but unfortunately this does not seem to have solved my problem. Unless anyone sees that I am clearly doing something wrong, I’ll either talk to my sysadmin about the cudnn installation or try to do a local install of CUDA and see if that works (which would allow me to test mnistCUDNN). Thanks for the help!
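
A quick sanity check that can be done from Julia itself is to ask the dynamic loader whether it can open the library files listed above. The sketch below uses the Libdl standard library; first_loadable is a hypothetical helper name. It only shows that the file can be dlopen'ed and says nothing about how CuArrays goes about detecting cuDNN.

```julia
using Libdl

# Hypothetical helper: return the first candidate path that the dynamic
# loader can actually open, or nothing if none of them can be opened.
function first_loadable(candidates)
    for path in candidates
        handle = Libdl.dlopen_e(path)  # returns C_NULL instead of throwing
        if handle != C_NULL
            Libdl.dlclose(handle)
            return path
        end
    end
    return nothing
end

# The two file names shown by `ls /usr/lib64 | grep -i cudnn` above.
println(first_loadable(["/usr/lib64/libcudnn.so.7",
                        "/usr/lib64/libcudnn.so.7.2.1"]))
```

If this prints nothing, the library is either missing or cannot be loaded (for example because one of its own dependencies is not found).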


#10

Please post your CuArrays/deps/ext.jl, it should show whether cudnn got detected properly.


#11

Thanks for the response. As expected, the file shows that cuDNN is not being detected. Here is the ext.jl contents:

const libcufft = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcufft.so"
const libcublas = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcublas.so"
const configured = true
const libcusolver = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusolver.so"
const libcurand = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcurand.so"
const libcudnn = nothing

Any thoughts on how to force the installation to see cuDNN? Or is there just something wrong with this installation of CUDA/cuDNN?
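
For anyone who wants to script this check, here is a minimal sketch that reads an ext.jl-style file and reports what libcudnn is set to. It assumes the simple one-assignment-per-line layout shown above; detected_cudnn is a made-up helper name and the file path is a placeholder.

```julia
# Hypothetical helper: scan an ext.jl-style file for the `const libcudnn = ...`
# line. Returns the configured path as a String, or nothing when the value is
# the literal `nothing` or the line is absent.
function detected_cudnn(ext_path::AbstractString)
    for line in eachline(ext_path)
        m = match(r"^\s*const\s+libcudnn\s*=\s*(.+?)\s*$", line)
        m === nothing && continue
        value = m.captures[1]
        return value == "nothing" ? nothing : String(strip(value, '"'))
    end
    return nothing
end

# Usage (placeholder path, adjust to your installation):
# detected_cudnn("path/to/CuArrays/deps/ext.jl")
```

With the file contents above, this would return nothing, matching the "cuDNN not detected" reading.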

For what it’s worth, I realized that my previous comment about tensorflow and pytorch working was sort of meaningless. I installed them via conda which provided the runtime environments for both CUDA and cuDNN in the conda environment, so the CUDA and cuDNN libraries installed on the system weren’t used (from what I can tell reading this).


#12

CUDAapi is responsible for detecting cuDNN, so you can go ahead and open an issue there. Please attach the output of Pkg.build("CuArrays") with JULIA_DEBUG=CUDAapi in your environment.

As a workaround, you can just hack the ext.jl to contain the path to libcudnn.so :slight_smile:
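
If you would rather script the workaround than edit the file by hand, a rough sketch follows. It assumes ext.jl has one `const name = value` assignment per line; set_cudnn_path is a hypothetical helper, not something CuArrays or CUDAapi provides, and both paths are placeholders. Since the file is generated at build time, a later Pkg.build("CuArrays") will presumably overwrite the change.

```julia
# Hypothetical helper: rewrite the `const libcudnn = ...` line of an
# ext.jl-style file so that it points at `cudnn_path`. All other lines are
# written back unchanged.
function set_cudnn_path(ext_path::AbstractString, cudnn_path::AbstractString)
    lines = readlines(ext_path)
    patched = map(lines) do line
        # repr() turns the path into a properly quoted Julia string literal.
        startswith(line, "const libcudnn") ?
            "const libcudnn = $(repr(cudnn_path))" : line
    end
    write(ext_path, join(patched, "\n") * "\n")
    return patched
end

# Usage (placeholder paths, adjust to your installation):
# set_cudnn_path("path/to/CuArrays/deps/ext.jl", "/path/to/libcudnn.so")
```

Restart Julia afterwards so the edited file is picked up when CuArrays is loaded.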


#13

I opened an issue on the CUDAapi repo here. In the meantime, I modified the ext.jl to the following:

# autogenerated file, do not edit
const libcufft = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcufft.so"
const libcublas = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcublas.so"
const configured = true
const libcusolver = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcusolver.so"
const libcurand = "/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcurand.so"
const libcudnn = "/usr/lib64/libcudnn.so.7.2.1"

I then reran test Flux and received the following (truncated) output:

[ Info: Testing Flux/CUDNN
batch_size = 1: Error During Test at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:9
  Got exception outside of a @test
  CUDNNError(code 3, CUDNN_STATUS_BAD_PARAM)
  Stacktrace:
   [1] macro expansion at /home/jacobr/.julia/packages/CuArrays/f4Eke/src/dnn/error.jl:19 [inlined]
   [2] cudnnRNNBackwardData(::Flux.CUDA.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::Ptr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::Ptr{Nothing}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::Ptr{Nothing}, ::CuArray{UInt8,1}, ::CuArray{UInt8,1}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/cuda/cudnn.jl:193
   [3] backwardData(::Flux.CUDA.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Nothing, ::CuArray{Float32,1}, ::Nothing, ::CuArray{UInt8,1}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/cuda/cudnn.jl:210
   [4] backwardData(::Flux.CUDA.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::CuArray{UInt8,1}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/cuda/cudnn.jl:218
   [5] (::getfield(Flux.CUDA, Symbol("##11#12")){Flux.GRUCell{TrackedArray{…,CuArray{Float32,2}},TrackedArray{…,CuArray{Float32,1}}},TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}},CuArray{UInt8,1},Tuple{CuArray{Float32,1},CuArray{Float32,1}}})(::Tuple{CuArray{Float32,1},CuArray{Float32,1}}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/cuda/cudnn.jl:329
   [6] back_(::Flux.Tracker.Call{getfield(Flux.CUDA, Symbol("##11#12")){Flux.GRUCell{TrackedArray{…,CuArray{Float32,2}},TrackedArray{…,CuArray{Float32,1}}},TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}},CuArray{UInt8,1},Tuple{CuArray{Float32,1},CuArray{Float32,1}}},Tuple{Flux.Tracker.Tracked{CuArray{Float32,1}},Flux.Tracker.Tracked{CuArray{Float32,1}},Flux.Tracker.Tracked{CuArray{Float32,2}},Flux.Tracker.Tracked{CuArray{Float32,2}},Flux.Tracker.Tracked{CuArray{Float32,1}}}}, ::Tuple{CuArray{Float32,1},CuArray{Float32,1}}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/tracker/back.jl:23
   [7] back(::Flux.Tracker.Tracked{Tuple{CuArray{Float32,1},CuArray{Float32,1}}}, ::Tuple{CuArray{Float32,1},Int64}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/tracker/back.jl:43
   [8] foreach(::Function, ::Tuple{Flux.Tracker.Tracked{Tuple{CuArray{Float32,1},CuArray{Float32,1}}},Nothing}, ::Tuple{Tuple{CuArray{Float32,1},Int64},Nothing}) at ./abstractarray.jl:1836
   [9] back_(::Flux.Tracker.Call{getfield(Flux.Tracker, Symbol("##328#330")){Flux.Tracker.TrackedTuple{Tuple{CuArray{Float32,1},CuArray{Float32,1}}},Int64},Tuple{Flux.Tracker.Tracked{Tuple{CuArray{Float32,1},CuArray{Float32,1}}},Nothing}}, ::CuArray{Float32,1}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/tracker/back.jl:26
   [10] back(::Flux.Tracker.Tracked{CuArray{Float32,1}}, ::CuArray{Float32,1}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/tracker/back.jl:45
   [11] back!(::TrackedArray{…,CuArray{Float32,1}}, ::CuArray{Float32,1}) at /home/jacobr/.julia/packages/Flux/jsf3Y/src/tracker/back.jl:62
   [12] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:25 [inlined]
   [13] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
   [14] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:9 [inlined]
   [15] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
   [16] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6 [inlined]
   [17] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined]
   [18] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6
   [19] include at ./boot.jl:317 [inlined]
   [20] include_relative(::Module, ::String) at ./loading.jl:1044
   [21] include(::Module, ::String) at ./sysimg.jl:29
   [22] include(::String) at ./client.jl:392
   [23] top-level scope at none:0
   [24] include at ./boot.jl:317 [inlined]
   [25] include_relative(::Module, ::String) at ./loading.jl:1044
   [26] include(::Module, ::String) at ./sysimg.jl:29
   [27] include(::String) at ./client.jl:392
   [28] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:45 [inlined]
   [29] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined]
   [30] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:26
   [31] include at ./boot.jl:317 [inlined]
   [32] include_relative(::Module, ::String) at ./loading.jl:1044
   [33] include(::Module, ::String) at ./sysimg.jl:29
   [34] exec_options(::Base.JLOptions) at ./client.jl:266
   [35] _start() at ./client.jl:425
batch_size = 5: Test Failed at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:28
  Expression: ((rnn.cell).Wi).grad ≈ collect(((curnn.cell).Wi).grad)
   Evaluated: [-0.00367632 -0.00351105 … -0.00199258 -0.00363324; 0.0218888 0.0180698 … 0.0237639 0.0238772; … ; -1.5432 -1.33318 … -1.58462 -1.74017; -1.05911 -1.06296 … -1.05237 -1.35042] ≈ Float32[-0.00157259 -0.00118586 … -0.00106842 -0.0011475; 0.0174023 0.0131109 … 0.021793 0.018576; … ; -1.82545 -1.64515 … -1.70861 -2.07368; -0.937059 -0.928057 … -0.998756 -1.2062]
Stacktrace:
 [1] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:28 [inlined]
 [2] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [3] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:9 [inlined]
 [4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [5] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6 [inlined]
 [6] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined]
 [7] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6
batch_size = 5: Test Failed at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:29
  Expression: ((rnn.cell).Wh).grad ≈ collect(((curnn.cell).Wh).grad)
   Evaluated: [-0.00167179 -0.000571634 … -0.00737623 -0.00230114; 0.00472922 -0.000953073 … -0.00443951 0.0117585; … ; -0.0160675 0.0515638 … 0.208142 -0.385053; 0.0282091 0.044212 … 0.154527 -0.288217] ≈ Float32[-0.00680693 -0.00125408 … -0.0138485 0.00544874; 0.00551455 -0.000585967 … -0.0031398 0.0100578; … ; -0.00432964 0.0610827 … 0.232323 -0.418382; 0.0216776 0.0394996 … 0.14176 -0.270818]
Stacktrace:
 [1] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:29 [inlined]
 [2] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [3] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:9 [inlined]
 [4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [5] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6 [inlined]
 [6] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined]
 [7] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6
batch_size = 5: Test Failed at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:30
  Expression: ((rnn.cell).b).grad ≈ collect(((curnn.cell).b).grad)
   Evaluated: [-0.00485622, 0.0356765, 0.0566717, -0.0967002, -0.0736521, 0.415462, 0.169964, -0.183906, 1.05687, -0.911806, 0.00403009, -1.50994, -0.834565, -2.4546, -1.56953] ≈ Float32[-0.00230207, 0.0302294, 0.0633711, -0.100256, -0.0690399, 0.270476, 0.153396, -0.207993, 1.27673, -0.803211, 0.344462, -0.880549, -1.12154, -2.79729, -1.42135]
Stacktrace:
 [1] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:30 [inlined]
 [2] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [3] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:9 [inlined]
 [4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [5] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6 [inlined]
 [6] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined]
 [7] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6
batch_size = 5: Test Failed at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:31
  Expression: ((rnn.cell).h).grad ≈ collect(((curnn.cell).h).grad)
   Evaluated: [-0.236697, -0.686411, -0.373723, -0.637998, -1.26944] ≈ Float32[-0.0674128, -0.592256, -0.351478, -0.822845, -1.20205]
Stacktrace:
 [1] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:31 [inlined]
 [2] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [3] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:9 [inlined]
 [4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1156 [inlined]
 [5] macro expansion at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6 [inlined]
 [6] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined]
 [7] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/cuda/cudnn.jl:6
Test Summary:        | Pass  Fail  Error  Total
Flux                 |  399     4      1    404
  Throttle           |   11                  11
  Jacobian           |    1                   1
  Initialization     |   14                  14
  Params             |    2                   2
  onecold            |    4                   4
  Optimise           |   10                  10
  Training Loop      |    1                   1
  basic              |   17                  17
  Dropout            |    8                   8
  BatchNorm          |   13                  13
  losses             |   12                  12
  Pooling            |    2                   2
  CNN                |    1                   1
  Tracker            |  248                 248
  CuArrays           |    7                   7
  RNN                |   40     4      1     45
    R = Flux.RNN     |   16                  16
    R = Flux.GRU     |    6     4      1     11
      batch_size = 1 |    2            1      3
      batch_size = 5 |    4     4             8
    R = Flux.LSTM    |   18                  18
ERROR: LoadError: Some tests did not pass: 399 passed, 4 failed, 1 errored, 0 broken.
in expression starting at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:24
ERROR: LoadError: failed process: Process(`/home/jacobr/code/julia-1.0.3/bin/julia -Cnative -J/home/jacobr/code/julia-1.0.3/lib/julia/sys.so --compile=yes --depwarn=yes --color=yes --compiled-modules=yes --startup-file=no --code-coverage=none /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl`, ProcessExited(1)) [1]
Stacktrace:
 [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at ./error.jl:42
 [2] pipeline_error at ./process.jl:705 [inlined]
 [3] #run#503(::Bool, ::Function, ::Cmd) at ./process.jl:663
 [4] run(::Cmd) at ./process.jl:661
 [5] top-level scope at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:5
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1044
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] include(::String) at ./client.jl:392
 [10] top-level scope at none:0
in expression starting at /home/jacobr/.julia/packages/Flux/jsf3Y/test/runtests.jl:3
ERROR: Package Flux errored during testing

Does this indicate that there is a problem with my cuDNN library? Should I just open an issue on Flux?

For what it’s worth, the error message I originally reported receiving is gone after modifying the ext.jl file with the location of libcudnn.so. That is, the experiment that resulted in

conversion to pointer not defined for CuArray{Float32,2}

(as described in the first post of this thread), now completes without throwing an error :+1:.

Thank you very much for all the help, really appreciate it and great work!


#14

Are you sure your cuDNN copy is compatible with your CUDA version? Try downloading it from nvidia.com and using that path instead, preferably the same version as you have on your system now to confirm or rule out that possibility.


#15

Just to close the loop on what was discussed here. The issue of not finding cuDNN was resolved in the raised issue.

However, even with this fix and a fresh install, I am still getting the same failures when running the Flux unit tests. This is most likely a problem with the CUDA/cuDNN installation.

I downloaded the cuDNN version associated with CUDA 9.0 (which is the CUDA version installed on my system), replaced the libcudnn entry in CuArrays/deps/ext.jl, and ran test Flux, but the same errors occur. I’ll talk to the administrator of the system and see if we can reinstall CUDA and cuDNN. I’ll update when that is done and see if the issue is resolved.

Thanks for all the help!


#16

Actually, it looks like my installation might be fine. According to this open issue, these test failures are expected and seem to be a problem with the cuDNN RNN API. @Noob, if you are still around, do you receive a similar error when you test Flux now that you have an installation that sees cuDNN?


#17

Yes, I get that error when my installation is “working”. It doesn’t seem to prevent it from actually running models on the GPU though. Does your conv example run yet?


#18

Yup, my conv example does work now :+1:. Just to check, do you happen to also receive this error?

CUDNNError(code 3, CUDNN_STATUS_BAD_PARAM)

I just want to check to see if this is specific to my case. Thanks for the feedback!


#19

Yup! Same error, but only on the test. Haven’t run into it with any actual NNs yet.