I am experiencing different behavior running the same command:
include(raw"/home/james/Julia_projs/calling_pytorch_proj/main.jl")
in the REPL started by the VSCode extension versus in a Julia REPL launched from the command line (Ubuntu 18.04). My program works correctly in the VSCode setting (it has never crashed in 10+ trials), but when run from the terminal it crashes every time within a couple of minutes.
I haven’t been able to produce a minimal reproducible example: the program is complex, uses PyCall to call PyTorch, CUDA, etc., and the error messages vary non-deterministically, usually having something to do with the GPU or multiprocessing (examples below). However, I have ruled out the obvious differences between the two settings: the Julia versions are the same (1.4.2), the Python versions and conda environments are the same, and the working directories are the same (checked both from Julia and from PyCall/Python code in each setting). I also confirmed that the CUDA and torch versions are the same, and that the same GPU device is used.
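In case it helps, this is roughly the kind of check I ran in both settings (a minimal sketch; the specific fields printed are just the ones I compared, and PyCall is assumed to already point at the same conda environment as my project):

# Rough parity check; run once in the VSCode REPL and once in the terminal REPL.
using PyCall

println("Julia version:   ", VERSION)
println("Working dir:     ", pwd())
println("PyCall python:   ", PyCall.python)

sys = pyimport("sys")
println("Python version:  ", sys.version)
println("Python exe:      ", sys.executable)

torch = pyimport("torch")
println("torch version:   ", torch.__version__)
println("CUDA (torch):    ", torch.version.cuda)
println("CUDA available:  ", torch.cuda.is_available())
println("GPU device:      ", torch.cuda.get_device_name(torch.cuda.current_device()))

The output of this was identical in both settings.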
So what are the differences between running something in the VSCode extension's REPL and running it from the command line? What could be causing the difference in behavior? I apologize if this comes down to a trivial misunderstanding about how the REPL works (or some other similarly basic mistake). Thank you in advance for the help.
Some examples of errors from the terminal setting (I don’t think these are necessary to answer my question, but they may be helpful):
ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/james/.julia/packages/PyCall/BcTLp/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'RuntimeError'>
RuntimeError('CUDA error: out of memory')
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 230, in _apply
    param_applied = fn(param)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 430, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
or
ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/james/.julia/packages/PyCall/BcTLp/src/pyiterator.jl:9 =# @pysym(:PyObject_GetIter), PyPtr, (PyPtr,), po))))) <class 'OSError'>
OSError(24, 'Too many open files')
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 667, in __init__
    index_queue = multiprocessing_context.Queue()
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/context.py", line 102, in Queue
    return Queue(maxsize, ctx=self.get_context())
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/queues.py", line 42, in __init__
    self._rlock = ctx.Lock()
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/synchronize.py", line 59, in __init__
    unlink_now)
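Given the "Too many open files" error above, one more thing I plan to compare between the two settings is the per-process file-descriptor limit and the environment. A rough sketch of what I mean (Linux-specific, relying on the /proc interface; the env_dump.txt filename is just for illustration):

# Compare the open-file limit and environment between the VSCode REPL
# and the terminal REPL; run this snippet once in each setting.
println("Open file descriptors: ", length(readdir("/proc/self/fd")))
for line in readlines("/proc/self/limits")
    occursin("open files", line) && println(line)   # soft/hard "Max open files" limits
end

# Dump the environment to a file so the two runs can be diffed afterwards.
open("env_dump.txt", "w") do io
    for (k, v) in sort(collect(ENV); by = first)
        println(io, k, "=", v)
    end
end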