BitVector, Adapt, and GPUs

I have a custom type which contains a BitVector.
struct customType{T<:AbstractFloat}
	bits::BitVector   # placeholder field name; the point is that a BitVector is stored here
	# ... other fields of type T
end


I’m trying to use Adapt.@adapt_structure customType{Float64}

It still fails the isbitstype(customType{Float64}) check, whereas other custom types without the BitVector do not.

I tried adding Adapt.@adapt_structure BitVector, but that doesn’t seem to help.

I imagine I need an adapt_storage method for the BitVector, but as this is the first time I’ve tried to do this, I thought I’d ask for guidance.

What’s the best way to make this work for GPUs? Do BitVectors give any advantage, or should I have just used a Vector{Bool}? Even if that is the case, I’d still like to know how to fix it with Adapt.

Best Regards

BitArrays are not supported on the GPU, as they contain hard-coded Arrays. The structure would need to be generalized in Base first.
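To make that concrete, here is a minimal sketch of why the isbitstype check fails (the struct name is a stand-in, not the actual type from the question):

```julia
# BitVector is a mutable struct whose bits live in a hard-coded
# Vector{UInt64} of chunks, so neither it nor anything containing
# it can ever be isbits.
struct HasBitVector            # stand-in for the custom type above
    bits::BitVector
end

isbitstype(BitVector)          # false: heap-allocated, CPU-only storage
isbitstype(HasBitVector)       # false: inherits the problem from its field
fieldtype(BitVector, :chunks)  # Vector{UInt64}, with no type parameter to swap out
```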

So it’s probably better to do something like Vector{Int} or Vector{UInt8}. Is there a performance difference if all I’m doing is checking true and false on the array?

You’ll want a parametric type so that the Vector can become a CuVector for GPU execution.
It probably won’t be slower, but obviously will consume much more memory.
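That memory claim can be sanity-checked with a quick sketch (using Base.summarysize for the total footprint; the numbers are approximate):

```julia
# BitVector packs 64 elements into each UInt64 chunk; Vector{Bool}
# spends a full byte per element, so it is roughly 8x larger.
bv = trues(10^6)           # BitVector
vb = fill(true, 10^6)      # Vector{Bool}

Base.summarysize(bv)       # ~125 KB of chunk storage plus small overhead
Base.summarysize(vb)       # ~1 MB
```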

I’ll probably just leave it as an unconstrained Vector and see if that solves my problem, and make it more restrictive if necessary.

struct BinaryCode{T<:AbstractFloat,C<:UInt} <: PulseModulationStyle{T,C}
	code::Vector{C}
	s_chipLength::T
	s_totalLength::T
	BinaryCode{T,C}(code::Vector{C}, s_chipLength::T) where {T,C} = new{T,C}(code, s_chipLength, length(code)*s_chipLength)
end

import Adapt
Adapt.@adapt_structure BinaryCode

What’s the syntax to get Adapt.@adapt_structure BinaryCode to work? I’ve been beating my head against this for several hours now. I think I need to do something magical with the Vector, but I’m not quite understanding the flow from the docs. Thank you.

No, you can’t use a Vector. As I mentioned before, it needs to be a parametric type so that (1) you can create a version of your struct containing a CuVector (this is not automatic, as the CPU->GPU upload is costly and hence cannot be done automatically when launching a kernel) and (2) the @cuda macro can convert that object to one containing a CuDeviceVector that’s compatible with on-device execution. See the docs for an example, Using custom structs · CUDA.jl. If you can’t get it to work, please post a MWE (e.g. with simplified struct definitions and dummy kernel launches) so that people can help you out.

That gets tricky. Can it be a Tuple or a StaticArray? It’s static; it won’t change after creation.

You already have a parametric type, so it’s mostly a matter of adding another type param :). This works locally for me:

# Note: I changed UInt to Unsigned here because UInt === UInt64, which is a concrete type.
# If you only care about C == UInt, then you can remove the type param entirely.
# I also removed the <: PulseModulationStyle for local testing, but that shouldn't affect anything.
struct BinaryCode{T<:AbstractFloat,C<:Unsigned,V<:AbstractVector{C}}
    code::V
    s_chipLength::T
    s_totalLength::T
end

BinaryCode(code::AbstractVector, s_chipLength::AbstractFloat) = BinaryCode(code, s_chipLength, length(code)*s_chipLength)

using CUDA
using Adapt
Adapt.@adapt_structure BinaryCode

bc = BinaryCode(UInt[1, 0], 2.)
@show bc
@show cu(bc)

# Output
# bc = BinaryCode{Float64, UInt64, Vector{UInt64}}(UInt64[0x0000000000000001, 0x0000000000000000], 2.0, 4.0)
# cu(bc) = BinaryCode{Float64, UInt64, CuArray{UInt64, 1, CUDA.Mem.DeviceBuffer}}(UInt64[0x0000000000000001, 0x0000000000000000], 2.0, 4.0)

Thank you. I think I like this better. I don’t know why I was over-constraining the AbstractVector type. Should this be replaced with a Tuple or a StaticArray so it doesn’t have to be a CUDA array at all? In this case, the vector is only used for playback, the struct is not mutable, and I don’t especially need it to be a vector while it is on the GPU.

I don’t know how a StaticArray would affect code generation, as I would now expect to generate new code for every size from 0 to N that I might decide to put here. Would that play into the kind of type instability that GPUs don’t seem to care for?

I still get an error:

module ...

export BinaryCode

struct BinaryCode{T<:AbstractFloat,V<:AbstractVector} <: PulseModulationStyle{T,V}
	code::V             # 1 is 0 phase, 0 is pi phase ## Is this the best representation?
	s_chipLength::T     # time to hold each chip
	s_totalLength::T    # time to wrap calculated length
	BinaryCode{T,V}(code::V, s_chipLength::T) where {T,V} = new{T,V}(code, s_chipLength, length(code)*s_chipLength)
end

Adapt.@adapt_structure BinaryCode

.... # stuff

end # module

#In Test

ERROR: MethodError: no method matching BinaryCode(::CuArray{UInt64, 1, CUDA.Mem.DeviceBuffer}, ::Float64, ::Float64)
 [1] adapt_structure(to::CUDA.CuArrayAdaptor{CUDA.Mem.DeviceBuffer}, obj::BinaryCode{Float64, Vector{UInt64}})
   @ Main C:\Users\bakerar\.julia\packages\Adapt\LAQOx\src\macro.jl:11
 [2] adapt(to::CUDA.CuArrayAdaptor{CUDA.Mem.DeviceBuffer}, x::BinaryCode{Float64, Vector{UInt64}})
   @ Adapt C:\Users\bakerar\.julia\packages\Adapt\LAQOx\src\Adapt.jl:40
 [3] #cu#193
   @ C:\Users\bakerar\.julia\packages\CUDA\DfvRa\src\array.jl:591 [inlined]
 [4] cu(xs::BinaryCode{Float64, Vector{UInt64}})
   @ CUDA C:\Users\bakerar\.julia\packages\CUDA\DfvRa\src\array.jl:591
 [5] top-level scope
   @ REPL[18]:1
BinaryCode(code::AbstractVector, s_chipLength::AbstractFloat) = BinaryCode(code,s_chipLength, (length(code))*s_chipLength)

I think this is where my knowledge is deficient. I had the constructor as an inner function calling new. You took care of it outside, without needing to specify the types; it just figured them out. I need to find somewhere to read up on this style, as it seems to work while my style does not.

This line is important:

BinaryCode(code::AbstractVector, s_chipLength::AbstractFloat) = BinaryCode(code, s_chipLength, length(code)*s_chipLength)

I’ve defined an “outer” constructor, whereas you have an “inner” one. Defining an inner constructor overrides the “default” constructor, which is the 3-arg BinaryCode(code, s_chipLength, s_totalLength). If you move that constructor outside of the struct definition and clean it up like so, things should just work:

# Type annotations and generics are not needed here, the default BinaryCode constructor can handle them
BinaryCode(code, s_chipLength) = BinaryCode(code,s_chipLength, (length(code))*s_chipLength)
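The constructor difference can be demonstrated without Adapt at all; here is a minimal sketch with made-up type names:

```julia
# Defining any inner constructor suppresses the default field-wise one;
# an outer constructor just adds a method and leaves the default intact.
struct Inner{T}
    a::T
    b::T
    Inner{T}(a::T) where {T} = new{T}(a, 2a)
end

struct Outer{T}
    a::T
    b::T
end
Outer(a) = Outer(a, 2a)            # the default Outer(a, b) still exists

hasmethod(Outer, Tuple{Int,Int})   # true
hasmethod(Inner, Tuple{Int,Int})   # false: no field-wise constructor,
                                   # which is what @adapt_structure calls
```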

I don’t understand your domain modelling well enough to say, but the parametric type does let you swap in an SArray seamlessly, and Adapt will still work. So it’s worth a try!

I’ve reworked things to follow your example (not Tuples). It converts with cu fine, but it doesn’t pass the isbitstype check, which causes the kernel to reject it.

The basic thing I am trying to do is have a function that uses one of these objects passed into map. A CuArray representing time interacts with the parameters in these objects plus some additional math, but the vector contained in the BinaryCode is not itself parallelized on the GPU in any other way. So I’m wondering if putting it into a StaticVector would solve this problem.

That’s odd, I thought it would be converted if you’re passing a BinaryCode to a kernel. Are you passing an array of them by chance?

Maybe my understanding is flawed, but do I need to convert a custom type if it is only auxiliary to the parallelization, like a parameter set?
I’ve embarked on converting it to a Tuple… but I can’t figure out how to have a custom type with a specific element type but a variable number of elements.

struct dummy{N<:Int, CT<:Unsigned}

This doesn’t dispatch, because N is not a subtype of Int; it is a number (a value, not a type).
Any idea what the syntax is? I can’t find it anywhere in the docs or online.

I just left code as NTuple without additional qualifications…

struct dummy{CT<:NTuple}

My dumbed down way still doesn’t work.
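For the record, the syntax being reached for in the struct above treats N as a value parameter, which takes no <: bound; here is a sketch with a stand-in name:

```julia
# In struct Dummy{N,CT}, N is a plain value (an Int at the type level),
# so only the element type CT gets a subtype constraint.
struct Dummy{N,CT<:Unsigned}
    code::NTuple{N,CT}
end

d = Dummy((0x01, 0x00, 0x01))   # N = 3 and CT = UInt8 are inferred
typeof(d)                       # Dummy{3, UInt8}
isbitstype(typeof(d))           # true: fixed-size tuples of bits types are isbits
```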

The field needs to have either a concrete type annotation or a type parameter. If you don’t add the ::SomeType, it implicitly acts like ::Any (which is not isbits).

Note that for parametric types, you have to fill in the type params to make isbitstype report what you want:

julia> isbitstype(Tuple)
false

julia> isbitstype(Tuple{Int, Int})
true

It’s a little unexpected, but thankfully this rule holds for all parametric types.

But back to the topic at hand, I tried passing a BinaryCode to the following CUDA kernel function and it worked:

bc = BinaryCode(UInt[1, 0], 2.)
@show bc
@show cu(bc)

function gpu_add1!(bc)
    code = bc.code
    for i = 1:length(code)
        @inbounds code[i] += 1
    end
    return nothing
end

@cuda gpu_add1!(bc |> cu)

Have you tried this yourself? If so, did you run into issues with it?

I think I have that BinaryCode working now. I had to go to something like this, but it passes the isbitstype check.

struct BinaryCode{T<:AbstractFloat,CT<:NTuple} <: PulseModulationStyle{T,CT}
	code::CT            # 1 is 0 phase, 0 is pi phase ## Is this the best representation?
	s_chipLength::T     # time to hold each chip
	s_totalLength::T    # time to wrap calculated length
end
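As a standalone sanity check of that shape (a sketch: the PulseModulationStyle supertype is dropped and the type renamed so the snippet is self-contained):

```julia
# NTuple code field plus concrete float parameters -> isbits struct
struct BinaryCodeNT{T<:AbstractFloat,CT<:NTuple}
    code::CT
    s_chipLength::T
    s_totalLength::T
end

bc = BinaryCodeNT((UInt8(1), UInt8(0)), 2.0, 4.0)
isbitstype(typeof(bc))   # true once all parameters are concrete
```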

But there is one more parent struct that contains the PulseModulationStyle{T,CT} type, and I think that is stopping that parent struct from being compliant. And I’m wondering how to best fix that. I’m actually testing at this parent level with the GPU.

I deleted some of the content since I don’t think it is necessary for explanation.

const ValidWFType = AbstractFloat

struct Waveform{T<:ValidWFType,CT<:NTuple}
    pulsed::Bool # If the waveform is pulsed then true, if CW then false (calculated)
    # ... remaining fields elided
end

Adapt.@adapt_structure Waveform

All of the other fields are of type T in the structure.

I’m wondering if it can’t figure things out due to the abstract PulseModulationStyle type in the structure. I don’t have a way of adapting an empty structure.
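A minimal sketch of that suspicion: a field declared with an abstract type makes the struct non-isbits even when the value stored there is isbits (all names here are stand-ins):

```julia
# An abstractly typed field forces boxing, so the struct cannot be isbits;
# a type parameter makes the field concrete once it is filled in.
abstract type Style end
struct ConcreteStyle <: Style end   # isbits singleton

struct AbstractField
    s::Style                        # abstract field type -> not isbits
end
struct ParamField{S<:Style}
    s::S                            # concrete once S is specified -> isbits
end

isbitstype(AbstractField)             # false
isbitstype(ParamField{ConcreteStyle}) # true
```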

Thanks for all of the help. I may try to step through isbits and see why it comes up with false.

This is also interesting.

julia>  wfcw64=Waveform(;Hz_BBRefFreq=3e9,Hz_Frequency=3e9+5e6);

julia> isbits(wfcw64)

julia> isbits(wfcw64.compression)