How difficult would it be to do this in Julia?
It looks like it might be possible to do this with PyTorch:
And here are some papers on the topic:
I currently use StringDistances.jl but I occasionally need to do comparisons across really large data sets so I’ve started looking into the possibility of accelerating these calculations on a GPU.
Has anyone else explored this? I’m not sure I would know where to start to be honest but I would certainly be willing to invest some time into this.
Is it just a matter of finding a CUDA implementation and then writing a Julia wrapper around it?
Did you manage to find a good solution? I’m working on a similar problem.
Unfortunately I did not achieve a GPU-based solution. The solution I settled on involved building a vantage point tree with one data set (VPTrees.jl) and then looping through the second data set in parallel to search for matches. This has worked well enough for me so I haven’t explored any alternatives. I would still be very interested in working on a GPU solution though…
It seems the most straightforward (but not easy) approach would be to port the kernels in pytorch-edit-distance/edit-distance.cu at master · 1ytic/pytorch-edit-distance · GitHub to CUDA.jl.
Actually it’s not that hard to translate the stuff. The good thing is that in julia you just need half the code. Just go through it line by line, remove the type annotations, edit the indexing appropriately to account for 0/1 based indexing and take care to get the block/thread indices right. Then go ahead and adjust the data layout for column major, which means that you have to reverse the indexing order. After you have the first pass running, you can adapt one based indexing and try to make it more “julian”.