Could non-@view slicing be made faster? I.e. could @view be implicit, using read-only arrays?

I often see the suggestion “use @view/@views” to fix a performance problem, possibly aimed at people coming from Python (is copying also the default in MATLAB?).

I know the difference, and I support having an (at least conceptual) copy by default, since it seems safer: no overwriting the original array by accident (it was also done for historical reasons, to keep compatibility). But you then need to explain this difference to newcomers. So could views still be made automatic?

I’ll start with a simplified/non-multithreaded picture:

If you do not use @view (e.g. when slicing in a loop), then you make a copy, which is O(n) and of course slow; on top of that, you make an allocation for that copy and rely on the GC to clean it up.
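To make the difference concrete, a small sketch (made-up array and slice sizes, and assuming BenchmarkTools is available just to show the allocation):

using BenchmarkTools

A = rand(10_000)

# Without @view: A[1:1000] materializes a new Vector (O(n) copy plus an allocation).
f_copy(A) = sum(A[1:1000])

# With @view: only a small SubArray wrapper is created; the data is not copied.
f_view(A) = sum(@view A[1:1000])

@btime f_copy($A)   # allocates roughly 8 KB per call for the copied slice
@btime f_view($A)   # typically reports zero heap allocations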

What could seemingly happen, since you allocated inside a loop (assuming you don’t pass a reference to the copy to other functions, or return the array from your function), is that the compiler could free it at the end of the loop iteration (effectively a Libc.free), since there is only one owner of the copy. That would at least reduce the GC pressure. This would also work in the multi-threaded picture.
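In the meantime you can do by hand roughly what such an optimization would do: hoist a single buffer out of the loop and copy into it, so there is one allocation total instead of one per iteration (a sketch with made-up function names):

A = rand(1000, 1000)

# One allocation per iteration: each A[:, j] is a fresh copy handed to the GC.
function colsums_naive(A)
    s = zeros(size(A, 2))
    for j in axes(A, 2)
        col = A[:, j]          # allocates a new Vector every iteration
        s[j] = sum(col)
    end
    return s
end

# What an "owned copy, freed at end of iteration" optimization amounts to by hand:
# one buffer, reused each iteration, so the GC pressure is gone.
function colsums_buffered(A)
    s = zeros(size(A, 2))
    buf = similar(A, size(A, 1))   # single allocation, reused below
    for j in axes(A, 2)
        copyto!(buf, @view A[:, j])
        s[j] = sum(buf)
    end
    return s
end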

But I’m thinking: why do the copy at all? If slicing returned a read-only view, then things would be as fast as in Python. That would break compatibility, since you were promised a copy you could write to, but writing to the copy is very often not done anyway, so this seems appealing. You could copy-on-write to an actual copy, on demand, if you wanted to support both behaviours; or should the compiler notice there is no write to the slice, and only use the read-only view when you never try to write to it? Otherwise just keep the status quo.
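For concreteness, here is a minimal sketch of what such a read-only wrapper could look like (hypothetical type and names, not an existing API; a real design would need more thought about aliasing and conversion):

# Minimal read-only array wrapper: forwards reads, refuses writes.
struct ReadOnly{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end

Base.size(r::ReadOnly) = size(r.parent)
Base.getindex(r::ReadOnly, i::Int...) = getindex(r.parent, i...)
Base.setindex!(::ReadOnly, v, i::Int...) =
    error("array is read-only; make an explicit copy to mutate")

# A slice could then return ReadOnly(view(A, ...)) instead of a copy,
# and only fall back to an actual copy (or an error) if the caller writes.
A = rand(5)
r = ReadOnly(view(A, 2:4))
sum(r)        # works, no copy of the data
# r[1] = 0.0  # would error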

Now for the multi-threaded picture.

When you take a @view of an array, you are taking a risk: other threads could be changing the array it is based on while you use the view, i.e. a race condition. A copy seems safer, but it actually has the same problem, only that you are exposed to it just while the copy is being made (i.e. during the copy done for you).
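A contrived sketch of the kind of race I mean (made-up example; requires starting Julia with more than one thread, e.g. julia -t 2):

A = collect(1.0:1000.0)
v = @view A[1:500]                # aliases A, no copy is made

t = Threads.@spawn fill!(A, 0.0)  # another task (possibly another thread) mutating the parent...
s = sum(v)                        # ...so this read races with those writes; the result is timing-dependent
wait(t)

# With a copy, c = A[1:500], the exposure is confined to the copy itself:
# once the copy has finished, later writes to A no longer affect c.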

So having an implicit @view is slightly less safe. I’m not sure that should be considered a breaking change, since it’s a matter of degree: it adds to a risk that already exists rather than introducing one where there was none. It might still be likely to break code, though, and not be tested enough, since multi-threading is not yet the default… and it would remain a potential problem thereafter, or whenever people enable threading.

So I’m wondering: should not just views, but arrays too, be read-only from the start? I think this has been suggested already. Here, right now, I’m focusing on what the compiler could do (with no help from the user, or possibly with such a flag on each array). Could it maintain a read-only flag that users would never have to worry about?

btw, I think this is very similar to Rust’s Slice type

1 Like

I think everything you describe can be addressed by escape analysis, and be non-breaking (unless one considers not performing an unnecessary allocation to be a breaking change).

1 Like


To quote @StefanKarpinski from Hacker News:

No, there is no copy-on-write. We try to keep the behavior and performance as transparent as possible and cow is distinctly non-transparent. If you want a copy, you have to explicitly make a copy.

1 Like

My understanding is that Rust’s Slice type is simply a view, analogous to Julia’s SubArray. This is orthogonal to whether slices create a view or a copy by default, or whether there are copy-on-write optimizations for the latter.
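For reference, a small example of what getindex versus @view gives you in Julia (illustrative only):

A = collect(1:10)

s = A[2:5]          # getindex: a new Vector{Int}, an independent copy
v = @view A[2:5]    # a SubArray wrapping A, sharing A's memory

typeof(s)           # Vector{Int64}
typeof(v)           # SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}

v[1] = 99
A[2] == 99          # true: writing through the view mutates the parent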

2 Likes

If you don’t use @view, you are asking for a copy, and I was proposing not making that copy, as an optimization. That only works if you don’t try to write to the copy (and I would be OK with not supporting writes via COW, just keeping the status quo there), but even if you did write, and COW were used, the performance would be the same as today.

1 Like

Funnily enough, I recently had to change some allocation tests of a package because they started to fail: I was asserting that the allocations were approximately equal to some value instead of less than it, and in 1.10 the number of allocations decreased.
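For reference, the general shape of such a test (not the package’s actual test; the function and numbers are made up): an upper bound keeps passing when the compiler improves, an approximate-equality check does not:

using Test

x = rand(1000)
f(x) = sum(x[1:500])     # the slice allocates a copy

f(x)                     # warm up (compile) first, so @allocated measures only the call
allocs = @allocated f(x)

# Upper bound (illustrative number): keeps passing when the compiler gets better.
@test allocs <= 5_000

# An "approximately equal" test breaks as soon as allocations go down:
# @test allocs ≈ 4_096 atol = 500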

1 Like

Slices are often not contiguous, so you can’t SIMD or use the cache well there. Just checking NumPy, which generally does not mind doing allocations outside of your control: it seems it does not make internal copies for contiguity and will incur the performance cost:

x = np.arange(0, 1000, 0.001)

y = x[::5] # basic slicing is guaranteed view

yc = np.ascontiguousarray(y) # make contiguous copy

%timeit 2*y
676 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit 2*yc
114 µs ± 620 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
4 Likes

Right, I was thinking of a contiguous slice, but even a seemingly contiguous one isn’t necessarily faster with a @view, so that advice is flawed for generic code (e.g. in packages), unless you know your code is dealing with a plain Julia Array or certain subtypes of AbstractArray such as OffsetArrays. There are, however, many more, e.g. LazyArrays, that will not be faster with a @view, and many more unusual array types where it may not apply (I suppose NumPy effectively only has the equivalent of Julia’s Array, or a COW variant of it).
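The analogous comparison in Julia, as a sketch to run rather than a claim, since which side wins depends on the array type, the slice, and how often the result is reused (assumes BenchmarkTools; sizes are made up):

using BenchmarkTools

A = rand(1000, 1000)

# Contiguous column slice: the view shares memory and stays contiguous.
@btime sum(@view $A[:, 500])
@btime sum($A[:, 500])          # copies first, then sums

# Strided row slice: the copy becomes contiguous, the view stays strided,
# so whether copying pays off depends on how many operations reuse the result.
@btime sum(@view $A[500, :])
@btime sum($A[500, :])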

To be fair (I did my own timing):

In [13]: %timeit yc = np.ascontiguousarray(y)
400 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [14]: %timeit 2*y
375 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [15]: %timeit 2*yc
143 µs ± 705 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The contiguous array will be faster once you have made it contiguous, but the cost of doing that means the combined time is slower. That may no longer hold if you do two or more operations on it. It seems that in Julia you would not want to make such a copy for a single operation either.

In Python (NumPy, that is), do you get a plain view, or a COW view? It seems that if you get a plain view (as in Julia, when you ask for one), that’s not great in the case where you return it from a function (unless it’s COW).

I suppose if we wanted the Python semantics, we could make a PArray.jl, have it live under JuliaArrays, and it would just use NumPy… it would be similar to, but not exactly the same as, using PythonCall.jl directly (or PyCall.jl), though it would likely be implemented with one of them.

[I was using IPython for the first time from the shell; is it basically the REPL most Python users use? It seems better in some, maybe all, cases? Better than the default Python REPL, that is, not better than Julia’s. Is it the best Python REPL? Is it the exact same one you get in Jupyter notebooks?]

AFAIK NumPy doesn’t introduce any copy-on-write semantics.

It’s fine if you know what you’re doing. NumPy guarantees that “basic slicing” returns a view, so you know to deal with, or leverage, the data sharing. Copying and “advanced slicing” are done manually to get independent data. I prefer Julia having separate view and getindex methods, with A[i] defaulting to the latter, but either way it’s explicit control.
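For illustration, how that explicit control looks in Julia, including flipping a whole definition to view semantics with @views (names and sizes made up):

A = rand(100, 100)

# Per-expression control:
col_copy = A[:, 1]          # independent copy
col_view = @view A[:, 1]    # shares memory with A

# Or switch every slice in a whole expression/definition to views at once:
@views function colmeans(A)
    [sum(A[:, j]) / size(A, 1) for j in axes(A, 2)]
end

colmeans(A)   # no per-column copies are made inside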

If I am summing this up right, are you basically asking for getindex to return an Array that is copied on write or on non-contiguity?