Compute-intensive function rewrite

I have been using an R package that really only has one main function. That function does a lot behind the scenes, and a portion of it is prohibitively slow: on the size of data I need to run it with, the timescales involved are far too long to ever get an output. So I want to use this as an opportunity to learn, and I'm hopeful the community is willing to guide me in my inexperience. My plan is as follows:

  1. Profile the existing code to pin down exactly what is taking the most time (I have currently narrowed it down to two functions, which I am now going through)
  2. Write out the process identified above in plain language, ignoring the objects/structures used to implement it in R and focusing on what is actually being executed
  3. Find existing Julia code that is closest to what I need to carry out, based on the outcome of steps 1 and 2
  4. Modify that existing code in small steps, using tests to ensure I am getting the expected results, until the process described in step 2 is achieved. Feed the same input data to both R and the rewrite to confirm they produce the same output (a rough sketch of what I mean follows this list)
  5. Investigate whether the types initially used in Julia can be optimised for serial execution; adjust accordingly
  6. Implement a parallel version on the CPU (a rough sketch of this is at the bottom of this post)
  7. Any hope of executing on GPU?
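
For step 4, this is roughly the kind of check I have in mind, assuming I save the R function's output to CSV first. The file names and `my_port` are just placeholders for whatever the rewrite ends up being called:

```julia
using CSV, DataFrames, Test

# Placeholder for the actual Julia rewrite of the slow R code.
my_port(df) = df

input    = CSV.read("shared_input.csv", DataFrame)  # the same data fed to R
expected = CSV.read("r_output.csv", DataFrame)      # output saved from the R run

result = my_port(input)

@testset "matches R output" begin
    @test size(result) == size(expected)
    # compare with a tolerance: floating-point results will not be
    # bit-identical across languages
    @test all(isapprox.(Matrix(result), Matrix(expected); atol = 1e-8))
end
```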
For those of you who have done this and more before: is this an advisable way to proceed?
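
And for step 6, this is the sort of thing I am picturing, assuming the expensive work is independent per point (`score` below is just a stand-in for the real per-point computation):

```julia
using Base.Threads

score(p) = sum(abs2, p)   # stand-in for the real per-point computation

# Spread the per-point work across CPU threads (start Julia with `julia -t auto`).
function process_all(points)
    out = Vector{Float64}(undef, length(points))
    @threads for i in eachindex(points)   # iterations are independent
        out[i] = score(points[i])
    end
    return out
end

process_all([rand(3) for _ in 1:10_000])
```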

If you’re willing to specify what the R package is doing or even just its domain, there are lots of folks here who can help you track down related and/or similar Julia code!

Yes, certainly. I was just hoping to pinpoint the exact slow code out of respect for everyone's time. It's using convex analysis, crudely as follows:
data → perspective projection → outliers removed → clustering (kmeans) → quickhull → identify points around vertices as cluster centres.
The slow culprit(s) lie somewhere in the final three steps.
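
In Julia terms, my rough reading of those last three steps is something like the sketch below. This is only my guess at how they fit together, not what the R package actually does: it uses Clustering.jl for the k-means step, a monotone-chain convex hull as a stand-in for quickhull (same result in 2D), and then looks up the nearest cluster centre for each hull vertex.

```julia
using Clustering

# Monotone-chain convex hull of 2D points (columns of a 2×n matrix);
# returns indices of the hull vertices. Ignores degenerate cases.
function hull_indices(P::AbstractMatrix)
    order = sort(collect(1:size(P, 2)); by = i -> (P[1, i], P[2, i]))
    turn(o, a, b) = (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])
    function half(idxs)
        h = Int[]
        for i in idxs
            while length(h) >= 2 && turn(P[:, h[end-1]], P[:, h[end]], P[:, i]) <= 0
                pop!(h)
            end
            push!(h, i)
        end
        h
    end
    lower, upper = half(order), half(reverse(order))
    vcat(lower[1:end-1], upper[1:end-1])
end

X  = rand(2, 5_000)          # toy stand-in for the projected, outlier-free points
km = kmeans(X, 8)            # k-means clustering
hv = hull_indices(X)         # convex hull of the points

# which k-means cluster each hull vertex belongs to, i.e. its nearest centre
vertex_clusters = assignments(km)[hv]
```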