BitVector Adapt GPUs

Allan_Baker · August 29, 2022, 1:37pm

I have a custom type which contains a BitVector.
struct customType{T<:AbstractFloat}
code::BitVector
s_chipLength::T
…
end

I’m trying to use Adapt.@adapt_structure customType{Float64}

It still fails the isbitstype(customType{Float64}) check, whereas my other custom types without the BitVector pass.

I tried adding Adapt.@adapt_structure BitVector, but that doesn’t seem to help.

I imagine I need an adapt_storage method for the BitVector, but since this is the first time I’ve tried this, I thought I’d ask for guidance.

What’s the best way to make this work on GPUs? Do BitVectors give any advantage, or should I have just used a Vector{Bool}? Even if that is the case, I would still like to know how to fix it with Adapt.

Best Regards
Allan

maleadt · August 29, 2022, 1:59pm

BitArrays are not supported on the GPU, as they contain hard-coded Arrays. The structure would need to be generalized in Base first.

github.com: JuliaLang/julia/blob/91f068c5c219275f1115056084417057a66240b7/base/bitarray.jl#L25

mutable struct BitArray{N} <: AbstractArray{Bool, N}
    chunks::Vector{UInt64}
    len::Int
    dims::NTuple{N,Int}
    …
end
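
The point about hard-coded Arrays can be checked from the REPL without a GPU. This is a minimal sketch using only Base reflection; the variable names are illustrative and not from the thread.

```julia
# BitVector is an alias for BitArray{1}. Its storage field has a fixed,
# concrete CPU array type, so no type parameter exists that Adapt could
# swap for a GPU array type.
chunks_type = fieldtype(BitVector, :chunks)
println(chunks_type)

# BitArray is a mutable struct that holds a Vector, so it is not isbits
# and cannot be passed to a kernel as-is.
bitvector_is_bits = isbitstype(BitVector)
println(bitvector_is_bits)
```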

Allan_Baker · August 29, 2022, 2:01pm

So it’s probably better to use something like Vector{Int} or Vector{UInt8}? Is there a performance difference if all I’m really doing is checking true and false on the array?

maleadt · August 29, 2022, 2:02pm

You’ll want a parametric type so that the Vector can become a CuVector for GPU execution.
It probably won’t be slower, but obviously will consume much more memory.
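
To put a rough number on the memory remark, here is a small CPU-only sketch comparing storage sizes. It reads the internal `chunks` field of a BitVector for illustration only; the variable names are assumptions, not from the thread.

```julia
n = 128

packed = trues(n)           # BitVector: 1 bit per element, packed into UInt64 chunks
as_bool = fill(true, n)     # Vector{Bool}: 1 byte per element
as_uint64 = ones(UInt64, n) # Vector{UInt64}: 8 bytes per element

# `chunks` is an internal field; it is accessed here only to measure storage.
packed_bytes = sizeof(packed.chunks)
bool_bytes = sizeof(as_bool)
uint64_bytes = sizeof(as_uint64)

println((packed_bytes, bool_bytes, uint64_bytes))
```

So for flags that are only read, the unpacked element type mostly costs memory, which matches the comment above.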

Allan_Baker · August 29, 2022, 2:08pm

I’ll probably just leave it as a Vector unspecified and see if that solves my problem, and make it more restrictive if necessary.

Allan_Baker · August 29, 2022, 7:04pm

struct BinaryCode{T<:AbstractFloat,C<:UInt} <:PulseModulationStyle{T,C}
	code::Vector{C}        
	s_chipLength::T     
	s_totalLength::T    
	BinaryCode{T,C}(code::Vector{C}, s_chipLength::T) where {T,C} = new{T,C}(code,s_chipLength, (length(code))*s_chipLength)
end

import Adapt
Adapt.@adapt_structure BinaryCode

What’s the syntax to get Adapt.@adapt_structure BinaryCode to work? I’ve been beating my head against this for several hours now. I think I need to do something magical with the Vector. Not quite understanding the flow from the Docs. Thank you.

maleadt · August 29, 2022, 7:22pm

No, you can’t use a Vector. As I mentioned before, it needs to be a parametric type so that (1) you can create a version of your struct containing a CuVector (this is not automatic, as the CPU->GPU upload is costly and hence cannot be done automatically when launching a kernel) and (2) the @cuda macro can convert that object to one containing a CuDeviceVector that’s compatible with on-device execution. See the docs for an example, Using custom structs · CUDA.jl. If you can’t get it to work, please post a MWE (e.g. with simplified struct definitions and dummy kernel launches) so that people can help you out.

Allan_Baker · August 29, 2022, 9:57pm

That gets tricky. Can it be a Tuple or a StaticArray? It’s static, it won’t change after creation.

ToucheSir · August 29, 2022, 10:54pm

You already have a parametric type, so it’s mostly a matter of adding another type param :). This works locally for me:

# Note: I changed UInt to Unsigned here because UInt === UInt64 which is a concrete type.
# If you only care C == UInt, then you can remove the type param entirely.
# I also removed the <: PulseModulationStyle for local testing, but that shouldn't affect anything.
struct BinaryCode{T<:AbstractFloat,C<:Unsigned,V<:AbstractVector{C}} <: PulseModulationStyle{T,C}
	code::V
	s_chipLength::T
	s_totalLength::T
end

BinaryCode(code::AbstractVector, s_chipLength::AbstractFloat) = BinaryCode(code, s_chipLength, length(code)*s_chipLength)

using CUDA
using Adapt
Adapt.@adapt_structure BinaryCode

bc = BinaryCode(UInt[1, 0], 2.)
@show bc
@show cu(bc)

# Output
# bc = BinaryCode{Float64, UInt64, Vector{UInt64}}(UInt64[0x0000000000000001, 0x0000000000000000], 2.0, 4.0)
# cu(bc) = BinaryCode{Float64, UInt64, CuArray{UInt64, 1, CUDA.Mem.DeviceBuffer}}(UInt64[0x0000000000000001, 0x0000000000000000], 2.0, 4.0)

Allan_Baker · August 30, 2022, 11:09am

Thank you. I think I like this better. I don’t know why I was over-constraining the AbstractVector type. Should this be replaced with a Tuple or a StaticArray so it doesn’t have to be a CUDA array at all? In this case the vector is only used for playback, the struct is not mutable, and I don’t need the vector to be mutable either, especially while it is on the GPU.

I don’t know how a static array would affect code generation, as I would expect to generate new code for every size from 0 to N that I might decide to put here. Would that play into type instability, which GPUs don’t seem to care for?
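
A small sketch of what the size-in-the-type trade-off looks like, using plain tuples and only Base. The function name `count_set_chips` is hypothetical. Each length is a distinct concrete type, so the compiler specializes once per length actually used; that is extra compilation, not type instability, as long as the length is known where the value is created.

```julia
short_code = (0x01, 0x00)
long_code = (0x01, 0x00, 0x01)

# The length is part of the type, so these are two different concrete types.
short_type = typeof(short_code)
long_type = typeof(long_code)

# One generic definition; Julia compiles a separate specialization for each N it sees.
count_set_chips(code::NTuple{N,UInt8}) where {N} = count(==(0x01), code)

short_count = count_set_chips(short_code)
long_count = count_set_chips(long_code)
println((short_type, long_type, short_count, long_count))
```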

Allan_Baker · August 30, 2022, 2:31pm

I still get an error:

module ...
export BinaryCode
struct BinaryCode{T<:AbstractFloat,V<:AbstractVector} <:PulseModulationStyle{T,V}
	code::V        # 1 is 0 phase, 0 is pi phase ## Is this the best representation?
	s_chipLength::T     # time to hold each chip 
	s_totalLength::T    # time to wrap calculated length
	BinaryCode{T,V}(code::V, s_chipLength::T) where {T,V} = new{T,V}(code,s_chipLength, (length(code))*s_chipLength)
end

Adapt.@adapt_structure BinaryCode

.... # stuff

end #module

#In Test
 bc=BinaryCode{Float64,Vector{UInt64}}(UInt64[1,0,1,0,1,0],1e-6)
cu(bc)

ERROR: MethodError: no method matching BinaryCode(::CuArray{UInt64, 1, CUDA.Mem.DeviceBuffer}, ::Float64, ::Float64)
Stacktrace:
 [1] adapt_structure(to::CUDA.CuArrayAdaptor{CUDA.Mem.DeviceBuffer}, obj::BinaryCode{Float64, Vector{UInt64}})
   @ Main C:\Users\bakerar\.julia\packages\Adapt\LAQOx\src\macro.jl:11
 [2] adapt(to::CUDA.CuArrayAdaptor{CUDA.Mem.DeviceBuffer}, x::BinaryCode{Float64, Vector{UInt64}})
   @ Adapt C:\Users\bakerar\.julia\packages\Adapt\LAQOx\src\Adapt.jl:40
 [3] #cu#193
   @ C:\Users\bakerar\.julia\packages\CUDA\DfvRa\src\array.jl:591 [inlined]
 [4] cu(xs::BinaryCode{Float64, Vector{UInt64}})
   @ CUDA C:\Users\bakerar\.julia\packages\CUDA\DfvRa\src\array.jl:591
 [5] top-level scope
   @ REPL[18]:1

Allan_Baker · August 30, 2022, 3:25pm

BinaryCode(code::AbstractVector, s_chipLength::AbstractFloat) = BinaryCode(code,s_chipLength, (length(code))*s_chipLength)

I think this is where my knowledge is deficient. I had the constructor as a new function.
You took care of it outside without needing to specify the types. It just figured them out. I need to find the place to read up on this style as it seems to work, while my style does not.

ToucheSir · August 30, 2022, 3:25pm

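This line is important:

BinaryCode(code::AbstractVector, s_chipLength::AbstractFloat) = BinaryCode(code, s_chipLength, length(code)*s_chipLength)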

I’ve defined an “outer” constructor, whereas you have an “inner” one. Defining an inner constructor will override the “default” constructor, which is the 3-arg BinaryCode(code, s_chipLength, s_totalLength). If you move that constructor outside of the struct definition and clean it up like so, things should just work:

# Type annotations and generics are not needed here, the default BinaryCode constructor can handle them
BinaryCode(code, s_chipLength) = BinaryCode(code,s_chipLength, (length(code))*s_chipLength)

I don’t understand your domain modelling well enough to say, but the parametric type does let you swap in a SArray seamlessly and Adapt will still work. So worth giving a try!
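
The inner versus outer constructor difference can be reproduced on the CPU with two stand-in structs. `InnerOnly` and `WithOuter` are hypothetical names, and the abstract supertype is left out. The 3-argument call below has the same shape as the one in the MethodError from the stack trace earlier in the thread.

```julia
# Defining any inner constructor suppresses the default constructors,
# so no 3-argument InnerOnly(code, s_chipLength, s_totalLength) method exists.
struct InnerOnly{T<:AbstractFloat,V<:AbstractVector}
    code::V
    s_chipLength::T
    s_totalLength::T
    InnerOnly{T,V}(code::V, s_chipLength::T) where {T,V} =
        new{T,V}(code, s_chipLength, length(code) * s_chipLength)
end

# No inner constructor: the default 3-argument constructor is kept,
# and a 2-argument convenience method is added outside the struct.
struct WithOuter{T<:AbstractFloat,V<:AbstractVector}
    code::V
    s_chipLength::T
    s_totalLength::T
end
WithOuter(code, s_chipLength) = WithOuter(code, s_chipLength, length(code) * s_chipLength)

# Calling with all three fields fails for InnerOnly with a MethodError.
inner_threw = try
    InnerOnly(UInt64[1, 0], 2.0, 4.0)
    false
catch err
    err isa MethodError
end

w = WithOuter(UInt64[1, 0], 2.0)
println((inner_threw, w.s_totalLength))
```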

Allan_Baker · August 30, 2022, 5:51pm

I’ve reworked things to follow your example (not Tuples). It converts with cu fine, but it doesn’t pass the isbitstype(BinaryCode) test, which causes the kernel to reject it.

What I am basically trying to do is pass one of these objects into a function used with map. A CuArray representing time interacts with the parameters in these objects and some additional math, but the vector contained in the BinaryCode is not itself parallelized on the GPU in any other way. So I’m wondering if putting it into a StaticVector would solve this problem.

ToucheSir · August 30, 2022, 7:14pm

That’s odd, I thought it would be converted if you’re passing a BinaryCode to a kernel. Are you passing an array of them by chance?

Allan_Baker · August 30, 2022, 7:50pm

Maybe my understanding is flawed, but do I need to convert a custom type if it is only auxiliary to the parallelization, like a parameter set?
I’ve embarked on converting it to a Tuple… but I can’t figure out how to write a custom type with a specific element type but a variable number of elements.

struct dummy{ N<:Int, CT<:Unsigned}
     code::NTuple{N,CT}
     var
end

This doesn’t dispatch, because N is not a type that subtypes Int; it is a number.
Any idea what the syntax is? Can’t find it anywhere in docs or online.

I just left code as NTuple without additional qualifications…

struct dummy{ CT<:NTuple}
     code::CT
     var
end
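
For the syntax question, a minimal sketch: the length `N` in `NTuple{N,CT}` is a value parameter, so it can be listed as a type parameter with no `<:` bound. `Dummy` and the extra parameter `T` for `var` are illustrative assumptions, not the thread's actual type.

```julia
# N is an integer *value* carried in the type, so it takes no subtype bound.
# CT constrains the element type; T gives `var` a concrete type per instance.
struct Dummy{N,CT<:Unsigned,T<:AbstractFloat}
    code::NTuple{N,CT}
    var::T
end

# The default constructor infers N, CT, and T from the arguments.
d = Dummy((0x01, 0x00, 0x01), 1.0)
dummy_type = typeof(d)
dummy_is_bits = isbitstype(dummy_type)
println((dummy_type, dummy_is_bits))
```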

Allan_Baker · August 30, 2022, 7:54pm

My dumbed down way still doesn’t work.

ToucheSir · August 30, 2022, 11:26pm

var needs to have either a concrete type annotation or a type parameter. If you don’t add the ::SomeType, it implicitly acts like ::Any (which is not isbits).

Note that for parametric types, you have to fill in the type params to make isbitstype report what you want:

julia> isbitstype(Tuple)
false

julia> isbitstype(Tuple{Int, Int})
true

It’s a little unexpected, but thankfully this rule holds for all parametric types.
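
Both points can be seen side by side in a short CPU-only sketch. `Untyped` and `Typed` are hypothetical stand-ins for the struct above.

```julia
struct Untyped{CT<:NTuple}
    code::CT
    var              # no annotation, so this field is implicitly ::Any
end

struct Typed{CT<:NTuple,T}
    code::CT
    var::T           # concrete once T is filled in
end

# An ::Any field is stored as a reference, so the struct is never isbits.
untyped_is_bits = isbitstype(Untyped{NTuple{2,UInt8}})

# With every parameter filled in, the fully typed struct is isbits.
typed_is_bits = isbitstype(Typed{NTuple{2,UInt8},Float64})

# Without the parameters filled in, isbitstype reports false.
bare_is_bits = isbitstype(Typed)

println((untyped_is_bits, typed_is_bits, bare_is_bits))
```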

But back to the topic at hand, I tried passing a BinaryCode to the following CUDA kernel function and it worked:

...
bc = BinaryCode(UInt[1, 0], 2.)
@show bc
@show cu(bc)

function gpu_add1!(bc)
    code = bc.code
    for i = 1:length(code)
        @inbounds code[i] += 1
    end
    return nothing
end

@cuda gpu_add1!(bc |> cu)

Have you tried this yourself? If so, did you run into issues with it?

Allan_Baker · August 31, 2022, 12:21pm

I think I have that BinaryCode working now. I had to go to something like this, but it passes the isbitstype check.

struct BinaryCode{T<:AbstractFloat,CT<:NTuple} <:PulseModulationStyle{T,CT}
	code::CT        # 1 is 0 phase, 0 is pi phase ## Is this the best representation?
	s_chipLength::T     # time to hold each chip 
	s_totalLength::T    # time to wrap calculated length
end

But there is one more parent struct that contains the PulseModulationStyle{T,CT} type, and I think that is stopping that parent struct from being compliant. And I’m wondering how to best fix that. I’m actually testing at this parent level with the GPU.

I deleted some of the content since I don’t think it is necessary for explanation.

const ValidWFType = AbstractFloat

struct Waveform{T<:ValidWFType,CT<:NTuple}
	Hz_Frequency::T
	pulsed::Bool # If the waveform is pulsed then true, if CW then false (calculated)
	...
	compression::PulseModulationStyle{T,CT}
end


Adapt.@adapt_structure Waveform

All of the other fields in the structure are of type “T”.

I’m wondering if it can’t figure things out because of the abstract PulseModulationStyle type in the structure. I don’t have a way of adapting an empty structure.

Thanks for all of the help. I may try to step through isbits and see why it comes up with false.

Allan_Baker · August 31, 2022, 12:27pm

This is also interesting.

julia>  wfcw64=Waveform(;Hz_BBRefFreq=3e9,Hz_Frequency=3e9+5e6);

julia> isbits(wfcw64)
false

julia> isbits(wfcw64.compression)
true
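
One likely explanation for this result, sketched with stand-in types (`PulseStyle`, `Code`, `AbstractField`, and `ParamField` are hypothetical names, not the actual definitions from this thread): a field annotated with an abstract type is stored as a reference, so the enclosing struct is not isbits even when the value it holds is. Turning the field's type into a type parameter is one common way around that.

```julia
abstract type PulseStyle{T} end

struct Code{T<:AbstractFloat,CT<:NTuple} <: PulseStyle{T}
    code::CT
    s_chipLength::T
end

# Field annotated with an abstract type: stored as a pointer, not inline.
struct AbstractField{T<:AbstractFloat}
    Hz_Frequency::T
    compression::PulseStyle{T}
end

# Field type lifted into a parameter: concrete for each instance.
struct ParamField{T<:AbstractFloat,P<:PulseStyle{T}}
    Hz_Frequency::T
    compression::P
end

c = Code((0x01, 0x00), 1e-6)
a = AbstractField(3e9, c)
p = ParamField(3e9, c)

println(isbits(a))              # parent with the abstract field
println(isbits(a.compression))  # the contained value on its own
println(isbits(p))              # parent with the parametric field
```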
