I have some sparse matrices and I thought using “MKLSparse.jl” could speed up the execution, but it did not. However, the documentation contains this note:
> The integer type that should be used in order for MKL to be called is the same as used by the Julia BLAS library, see `Base.USE_BLAS64`.
I am not sure if this is the cause of my issue. If so, how do I set it, given that I am using VS Code?
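This is not a VS Code setting: `Base.USE_BLAS64` is fixed by the Julia build itself, so the part you can control is the index type of your sparse matrices. A minimal sketch of checking and fixing the match (assuming, per the note above, that the MKL-backed methods only apply when the matrix's index type equals the BLAS integer width):

```julia
using SparseArrays, LinearAlgebra

# The BLAS integer width is baked into the Julia build; the official
# 64-bit binaries ship with Base.USE_BLAS64 == true, i.e. Int64 indices.
blas_int = Base.USE_BLAS64 ? Int64 : Int32

A = sprand(1_000, 1_000, 0.01)   # SparseMatrixCSC{Float64, Int} by default
if eltype(rowvals(A)) != blas_int
    # Convert the index type so the MKL-backed methods can be dispatched to.
    A = SparseMatrixCSC{eltype(A), blas_int}(A)
end
```

On a standard 64-bit Julia install the default `Int64` indices already match, so a mismatch is most likely if your matrices were built with `Int32` indices (e.g. loaded from an external file format).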
I’d assume yes, but it’s hard to say given that you haven’t provided much information. For example, it would be relevant to know which CPU you’re running on. (My CPU probably has more threads, etc. than yours.)
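For comparing machines, the easiest way to share the relevant details is `versioninfo()`, which prints the CPU model, word size, and thread counts in one go:

```julia
using InteractiveUtils

versioninfo()              # CPU model, OS, word size, thread counts
println(Sys.CPU_THREADS)   # number of logical cores on this machine
```

Posting that output alongside a benchmark makes the numbers comparable across machines.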
@carstenbauer Is there another issue I can check to find the reason for the slow performance?
Is there a special function that I should use in MKLSparse for multiplication rather than `*`, or does it overload `*`?
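As I understand it, MKLSparse.jl works by overloading the existing `*` and `mul!` methods for `SparseMatrixCSC`, so there is no special function to call; loading the package is enough. A sketch (the `using MKLSparse` lines are commented out here, since the point is that the call site does not change):

```julia
using SparseArrays, LinearAlgebra

A = sprand(5_000, 5_000, 1e-3)
x = rand(5_000)

y = A * x          # Julia's native sparse kernel
# using MKLSparse  # after this, the very same `A * x` call is
# y = A * x        # dispatched to MKL's sparse BLAS; no API change

@assert y ≈ Matrix(A) * x   # sanity check against a dense reference
```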
@stevengj @lmiq @DNF @jling
I am sorry if my mention is not proper. I have been stuck on this for a long time and I do not know why I am not getting the same speedup as my colleague after using “MKLSparse”. I really appreciate any help from you. Thank you!
Based on just the benchmark itself without MKL it is likely the case that @carstenbauer has a better CPU, and consequently a CPU that better utilizes MKL functionality over your CPU. We won’t know for sure though unless he posts his CPU info.
From this thread, it seems that he gets a 20x speedup and you get a 4x speedup.
The complicated answer I could give you is that you should check whether you get similar MKL performance gains outside of Julia’s MKLSparse package, i.e. calling MKL’s sparse routines directly, and see if similar times are observed. This would at least rule out the Julia wrapper as the cause of a potential slowdown, if any.
The easy answer I can give you though is get a better CPU.
Excuse me, I did not get it. From the post above, I have results similar to @carstenbauer’s when not using MKLSparse, and slower results than his when using it.
If your sparse matrix does not change for several iterations, consider the CSB (Compressed Sparse Blocks) data structure; the CSB code has been observed to be faster than MKL for many sparse matrix families. Your source code will require a minor modification or abstraction to process the dense matrices in column batches. I think we compiled it with up to 32 dense columns per call.
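The column batching mentioned above can be sketched in plain Julia (the `batched_mul!` name is hypothetical, and `32` matches the batch size stated above; the real CSB kernel would replace the inner `mul!`):

```julia
using SparseArrays, LinearAlgebra

# Multiply A against the dense matrix X at most `batch` columns at a
# time, as a kernel compiled for a fixed column batch would expect.
function batched_mul!(Y, A, X; batch::Int = 32)
    for j in 1:batch:size(X, 2)
        cols = j:min(j + batch - 1, size(X, 2))
        @views mul!(Y[:, cols], A, X[:, cols])
    end
    return Y
end

A = sprand(200, 200, 0.05)
X = rand(200, 50)                     # 50 columns: batches of 32 and 18
Y = batched_mul!(zeros(200, 50), A, X)
@assert Y ≈ A * X                     # same result as the unbatched product
```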
To be precise, you have slower results. If you had the same or better results, that would be alarming, but that is not the case. You should not compare with @carstenbauer’s results until you have his CPU information to compare against.
There is nothing you can do besides getting a better CPU, or trying to find a bug in MKLSparse.jl by comparing your results with MKLSparse.jl against the Intel-provided MKL sparse routines.