Error in Cuda function : ERROR: LoadError: CuError(1, nothing)

That’s not a minimal example. Please see the post "Please read: make it easier to help you".

Debugging an issue like this first requires getting rid of as much code as possible while preserving the error. Doing so would reduce your last kernel to SegmentsOnElement*segmentlength, a multiplication of a device array and a scalar. That’s not supported inside a GPU kernel, because it requires allocating a new array for the result.
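As a rough sketch of the usual workaround, you can preallocate the output on the host and have each thread write one element, so nothing is allocated inside the kernel. This assumes the current CUDA.jl package (your code may be using the older CUDAnative/CuArrays packages, where the same pattern applies), and the names scale_kernel!, a, s, and out are hypothetical stand-ins for your own arrays such as SegmentsOnElement and segmentlength.

```julia
using CUDA

# Hypothetical kernel: writes a[i] * s into a preallocated output array,
# instead of evaluating `a * s`, which would need to allocate a new array.
function scale_kernel!(out, a, s)
    # Global 1-based index of this thread.
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds out[i] = a[i] * s
    end
    return nothing
end

a = CUDA.rand(Float32, 1024)   # stand-in for the device array
s = 2f0                        # stand-in for the scalar
out = similar(a)               # allocated on the host side, before the launch

threads = 256
blocks = cld(length(a), threads)
@cuda threads=threads blocks=blocks scale_kernel!(out, a, s)
```

If the multiplication does not have to happen inside a kernel, the broadcast `a .* s` on the host side does the same thing and lets the package generate the kernel for you.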