String Distance Calculations on GPU

mthelm85 · March 19, 2021, 3:32pm

How difficult would it be to do this in Julia?

It looks like it might be possible to do this with PyTorch:

And here are some papers on the topic:

https://www.researchgate.net/publication/300042590_Using_GPUs_to_Speed-Up_Levenshtein_Edit_Distance_Computation

I currently use StringDistances.jl but I occasionally need to do comparisons across really large data sets so I’ve started looking into the possibility of accelerating these calculations on a GPU.

Has anyone else explored this? I’m not sure I would know where to start to be honest but I would certainly be willing to invest some time into this.

Is it just a matter of finding a CUDA implementation and then writing a Julia wrapper around it?

Soren · February 13, 2022, 3:20pm

Hi!

Did you manage to find a good solution? I’m working on a similar problem.

Soren

mthelm85 · February 13, 2022, 3:33pm

Unfortunately I did not achieve a GPU-based solution. The solution I settled on involved building a vantage point tree with one data set (VPTrees.jl) and then looping through the second data set in parallel to search for matches. This has worked well enough for me so I haven’t explored any alternatives. I would still be very interested in working on a GPU solution though…

ToucheSir · February 13, 2022, 9:16pm

It seems the most straightforward (but not easy) approach would be to port the kernels in https://github.com/1ytic/pytorch-edit-distance/blob/master/edit-distance.cu to CUDA.jl.

maxfreu · February 17, 2022, 3:38pm

Actually it’s not that hard to translate the stuff. The good thing is that in julia you just need half the code. Just go through it line by line, remove the type annotations, edit the indexing appropriately to account for 0/1 based indexing and take care to get the block/thread indices right. Then go ahead and adjust the data layout for column major, which means that you have to reverse the indexing order. After you have the first pass running, you can adapt one based indexing and try to make it more “julian”.

Topic		Replies	Views
Is there a GPU optimized `Distances` package? General Usage	4	1009	September 12, 2020
Thinking through distance matrix calculation GPU gpu , multithreading	5	2756	May 16, 2017
How to make LoopVectorization work with computing Euclidean distance for every column? New to Julia tullio , loopvectorization	13	1119	August 21, 2020
Improving an algorithm that compute gps distances Performance tullio	41	1947	April 27, 2020
Fast tile search GPU	6	533	November 11, 2022

String Distance Calculations on GPU

Related topics