Why Might One Get Different Behavior in the VSCode Extension vs the Terminal?

I am experiencing different behavior running the same command:
include(raw"/home/james/Julia_projs/calling_pytorch_proj/main.jl")
using the VSCode extension versus using the Julia REPL launched from the command line (Ubuntu 18.04). My program crashes in the command-line setting but works correctly in the VSCode extension setting: it has never crashed in 10+ trials under VSCode, while it crashes every time within a couple of minutes in the terminal.

I haven’t been able to produce a minimal reproducible example: the program is complex and uses PyCall to call PyTorch, CUDA, etc., and the error messages vary non-deterministically, usually having something to do with the GPU or multiprocessing (examples below). However, I have ruled out the obvious things to check in each setting: the Julia versions are the same (1.4.2), the Python versions and conda environments are the same, and the working directories are the same (checked both in Julia and in PyCall/Python code in each setting). I even verified that the CUDA and torch versions are the same, and that the same GPU device is used.
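For anyone wanting to run the same checks, the Python-side facts mentioned above can be printed from code called through PyCall and compared between the two settings. A minimal sketch (which fields are worth comparing will vary; torch and CUDA versions can be added with an `import torch` in an environment that has it):

```python
import os
import sys

# Interpreter/environment facts worth diffing between the VSCode and terminal settings.
print("python executable:", sys.executable)
print("python version:", sys.version.split()[0])
print("conda env:", os.environ.get("CONDA_DEFAULT_ENV", "<unset>"))
print("working dir:", os.getcwd())
```

Running this once in each setting and comparing the output makes the "same Python, same conda env, same cwd" claim concrete.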

So what are the differences between running something in the VSCode extension's terminal and running it in the command line? What could be causing the difference in behavior? I apologize if this is a trivial misunderstanding about the basics of how the REPL works (or some other similarly trivial mistake). Thank you in advance for the help.

Examples of some errors in the terminal setting (I don’t think these are necessary to answer my question, but they may be helpful):

ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/james/.julia/packages/PyCall/BcTLp/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'RuntimeError'>
RuntimeError('CUDA error: out of memory')
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 230, in _apply
    param_applied = fn(param)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 430, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
or
ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/james/.julia/packages/PyCall/BcTLp/src/pyiterator.jl:9 =# @pysym(:PyObject_GetIter), PyPtr, (PyPtr,), po))))) <class 'OSError'>
OSError(24, 'Too many open files')
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 667, in __init__
    index_queue = multiprocessing_context.Queue()
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/context.py", line 102, in Queue
    return Queue(maxsize, ctx=self.get_context())
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/queues.py", line 42, in __init__
    self._rlock = ctx.Lock()
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/home/james/anaconda3/envs/myenv/lib/python3.7/multiprocessing/synchronize.py", line 59, in __init__
    unlink_now)

Is the number of threads the same in both settings? And how would one change the number of threads?
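For reference, on Julia 1.4 the thread count is read from the JULIA_NUM_THREADS environment variable at startup, and `Threads.nthreads()` reports it, so running `julia -e 'println(Threads.nthreads())'` in each setting is one way to compare. The related thread-count variables used by native libraries can also be checked from the Python side; a small sketch:

```python
import os

# Thread-related environment variables that commonly differ between terminals.
# JULIA_NUM_THREADS sets Julia's thread count (on 1.4 it must be set before startup);
# OMP_NUM_THREADS / MKL_NUM_THREADS affect BLAS, PyTorch, and other native libraries.
for var in ("JULIA_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    print(var, "=", os.environ.get(var, "<unset>"))
```

Printing these in both the VSCode terminal and a login shell makes any difference immediately visible.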

It might also be worthwhile to check your whole ENV (especially the PATH entry) for differences between the two environments…
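One convenient way to do that comparison is to dump the environment to a file in each setting and then diff the two files. A minimal sketch (the filename is arbitrary):

```python
import os

def dump_env(path):
    """Write every environment variable, sorted, one per line, so two dumps diff cleanly."""
    with open(path, "w") as f:
        for key in sorted(os.environ):
            f.write(f"{key}={os.environ[key]}\n")

# Run with a different filename in each setting, then compare,
# e.g. `diff env_vscode.txt env_terminal.txt`.
dump_env("env_dump.txt")
```

The same idea works directly in Julia with `ENV`, but since PyCall is already in the picture this runs unchanged in both settings.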


Thank you for the suggestions, those were good things to check.

I fixed the problem, but was not able to find exactly why the difference in behavior was occurring. As user pfitzseb suggested, I checked and found many differences in the ENV, and a few in the PATH, but after spending about an hour investigating I was unable to pin down the exact cause (all the most obviously relevant variables, such as the conda- and Python-related ones, appeared to be the same in both settings).

I was, however, able to stop the crashes in the terminal setting (without breaking anything in the VSCode setting) by adding, in Python called through PyCall,

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

as suggested here. That fixes all of the error messages in my case. While this is not a complete answer to my original question, perhaps it will help a future reader.
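A note for readers hitting the same OSError(24, 'Too many open files'): PyTorch's default file_descriptor sharing strategy on Linux keeps a file descriptor open per tensor shared between DataLoader workers, so a low per-process descriptor limit can trigger this error, and that limit (the `ulimit -n` value) can differ between a VSCode-spawned terminal and a login shell. A hedged sketch for inspecting and, where allowed, raising it:

```python
import resource

# RLIMIT_NOFILE is the per-process open-file-descriptor limit (the `ulimit -n` value).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft =", soft, "hard =", hard)

# Raising the soft limit up to the hard limit requires no extra privileges.
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Comparing the printed soft limit in both settings would confirm or rule out this explanation; switching to the file_system sharing strategy, as above, sidesteps it either way.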
