My test was on CUDA 1.0; retrying on master, I get the same allocation estimate. Maybe this needs to be looked into.