const FileData = AbstractVector{UInt8}

Hi,

I thought const FileData = AbstractVector{UInt8} was just an alias, but apparently it is not.

buf = obj.buf  # the obj.buf field is declared as ::FileData
while start < size
    ...
    fn(@view buf[start:stop])
    ...
end

The version above allocates memory on every @view invocation; in my case, that adds up to several GB of data.
The version below allocates (almost) nothing. I expected the same from the version above. Can anyone explain why that is not the case?

buf::Vector{UInt8} = obj.buf  # the obj.buf field is still declared as ::FileData
while start < size
   ...
   fn(@view buf[start:stop])
   ...
end

thanks, Juergen

AbstractVector{UInt8} and Vector{UInt8} are two very different things. Why did you not compare using const FileData = Vector{UInt8} instead of const FileData = AbstractVector{UInt8}?
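To make the difference concrete, here is a small sketch (the names ConcreteData, AbstractData, and Holder are made up for illustration, not from your code):

const ConcreteData = Vector{UInt8}          # alias for one concrete type
const AbstractData = AbstractVector{UInt8}  # alias for a whole family of types

isconcretetype(ConcreteData)  # true
isconcretetype(AbstractData)  # false: Vector{UInt8}, CodeUnits, SubArray views, ... all match

struct Holder
    buf::AbstractData   # the compiler only knows the abstract supertype here
end

h = Holder(rand(UInt8, 8))
fieldtype(Holder, :buf)  # AbstractVector{UInt8}, no matter what was actually stored

With an abstract field the concrete type of buf is only known at run time, so code reading the field cannot be fully specialized.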


Now I completely removed const FileData = AbstractVector{UInt8} from the source code.

This is still slow:

buf = obj.buf  # buf is defined as FileData
while start < size
   ...
   fn(@view buf[start:stop])
   ...
end

This is 30 times faster (and has almost no allocations):

buf::Vector{UInt8} = obj.buf  # buf is defined as FileData
while start < size
   ...
   fn(@view buf[start:stop])
   ...
end

Did you replace const FileData = AbstractVector{UInt8} with const FileData = Vector{UInt8}? That was the suggestion @Henrique_Becker made. You seem to be doing something different.


Try wrapping your calculation in a function. You are working in global scope, and that is not a good idea for performance.
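For example (a generic sketch, not your actual code):

data = rand(UInt8, 10^7)

# At top level, data is an untyped global, so this loop is slow and allocates:
s = 0
for b in data
    global s += b
end

# The same loop inside a function is specialized for Vector{UInt8}
# and runs with essentially no allocations:
function process(buf)
    s = 0
    for b in buf
        s += b
    end
    return s
end

@time process(data)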


@Skoffer: this is just a code snippet; it is actually taken from a function.

@johnmyleswhite / @Henrique_Becker I have now reverted the latest changes and only replaced const FileData = AbstractVector{UInt8} with const FileData = Vector{UInt8}. I had to adjust our test code, which uses b"…" literals a lot; these create CodeUnits{UInt8,String}, which was my original reason for using AbstractVector. With this change the code now runs much faster and with far fewer allocations.
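For reference, the situation in the test code looks roughly like this (values are made up):

literal = b"abc"                   # byte-string literal, as used in the tests

literal isa AbstractVector{UInt8}  # true
literal isa Vector{UInt8}          # false, it is a Base.CodeUnits{UInt8,String}

Vector{UInt8}(literal)             # explicit copy, if a real Vector{UInt8} is needed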

And even though I understand the performance increase, I still don’t understand why Julia needs to do so many allocations (copies, I assume) when passing abstractly typed values around. I thought variables and function arguments just bind to the same “slot”, no matter whether the type is abstract or not, at least as long as I’m not converting to another concrete type.

-Juergen

Hmmm, I am not sure this has anything to do with copies. The allocations usually come from the fact that the value cannot be stored unboxed (that is, inline as plain bytes); instead it lives behind a pointer, together with a tag for its current concrete type, and that indirection layer has to be allocated again and again.

Also, if you have a struct that is itself concrete but has a boxed/abstract field inside, any function receiving that struct will specialize for the concrete struct type, but whenever it touches the inner field it has to handle every possible type that might be stored there, which is known to generate allocations. One way to avoid that is to not manipulate the boxed field directly in the function that takes your struct, but instead to group all computation over that field inside a separate function and pass just the field to it, so that inner function is specialized for the right type. This technique is called a “function barrier” and is described in the docs.
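A minimal sketch of the function-barrier idea (struct and function names are made up here):

struct FileObj
    buf::AbstractVector{UInt8}   # abstractly typed, i.e. boxed, field
end

# Slow pattern: the loop touches obj.buf directly, so every access is
# dynamically dispatched and may allocate.
function process_slow(obj::FileObj)
    s = 0
    for b in obj.buf
        s += b
    end
    return s
end

# Function barrier: pull the field out once and hand it to an inner function,
# which gets compiled for the concrete type of buf.
process_fast(obj::FileObj) = _process(obj.buf)

function _process(buf)   # specializes on typeof(buf)
    s = 0
    for b in buf
        s += b
    end
    return s
end

obj = FileObj(rand(UInt8, 10^6))
@time process_slow(obj)   # many small allocations from the boxed field
@time process_fast(obj)   # one dynamic dispatch at the barrier, then a tight loop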

Note that it would be much easier to help with a self-contained MWE.
