I’m not sure exactly what you mean by “parallel on each column at the same time”, but it absolutely does make use of GPU parallelism (just like the cross entropy loss function in any other ML framework). The easiest way to find out whether it works for your particular use case is just to run it with some GPU arrays. I think you’ll be pleasantly surprised by the results.
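To make the "just try it" suggestion concrete, here is a minimal sketch in JAX (not necessarily the framework you're using, and not that framework's actual implementation). It assumes each column of the input is one sample, and the function name `crossentropy` and the array sizes are made up for illustration. The point is that the loss is written as whole-array operations, so all columns are handled in one vectorized call rather than one at a time.

```python
import jax
import jax.numpy as jnp

def crossentropy(logits, targets):
    # Assumed layout: (num_classes, batch_size), one sample per column.
    # log_softmax normalizes every column in a single array operation.
    log_probs = jax.nn.log_softmax(logits, axis=0)
    # Per-column loss, again computed for all columns at once.
    per_column = -jnp.sum(targets * log_probs, axis=0)
    # Average over the batch to get a scalar loss.
    return jnp.mean(per_column)

num_classes, batch_size = 10, 4096  # illustrative sizes only
key = jax.random.PRNGKey(0)
logits = jax.random.normal(key, (num_classes, batch_size))
labels = jnp.arange(batch_size) % num_classes
targets = jax.nn.one_hot(labels, num_classes).T  # shape (num_classes, batch_size)

# Arrays live on JAX's default device, which is the GPU when one is available.
loss = jax.jit(crossentropy)(logits, targets)
print(jax.devices(), loss)
```

If you run this on a machine with a GPU and then time it for a few batch sizes, you can see for yourself how the cost scales with the number of columns.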