How can I take the absolute value of the numbers inside the matrix? If I use this tolerance, even negative numbers are suppressed, but if I write abs(A) it throws an error.
Additionally: does droptol! consider just the absolute value, or does it drop negative numbers too?
Another question, just to push this to its limit: when the condition is met, is this going to overwrite the existing data with the data itself, or is it going to do nothing? Instead of writing 2 loops to retrieve the indices and say at which index I’m putting a 0, is there some predefined syntax, just like the one you used, to explicitly say that when abs.(A) > tol it should do nothing?
This is a bit difficult to explain, but I couldn’t find a good explanation quickly, so I will try my best. Modern CPUs don’t really start executing an instruction, wait for it to finish, and then start executing the next one. Usually they overlap multiple instructions. Note that I am talking about a single CPU core - multiple cores of course work in parallel as well, but even a single core will overlap instructions. This is because many instructions take multiple cycles to complete (e.g. a float division can take up to ~20 clock cycles), so the core can do something else in the meantime. So even if you read some assembly code, you only get a vague idea of how the CPU will actually execute it, because it will likely reorder instructions, execute them out of order, optimize loads/stores etc. in order to maximize throughput.
So why is skipping the store to the array sometimes likely slower? The first part is that a conditional branch can be SLOW. In principle the CPU has to wait for the result of the comparison and then decide whether to store or not to store. Actually it won’t wait: it will try to predict the branch and just go ahead and do something until the result of the comparison arrives. If the branch was predicted right, then everything is fine. If it was predicted wrong, then the CPU has to undo its work and take the correct branch, which is SLOW. If you use ifelse (actually in most cases you could also use the ternary operator ?: because the compiler is smart - however you can’t broadcast it, so ifelse needs to be used in our case), the compiler can simply compute both results and use a conditional move instruction (cmov on x86) to select the correct result without a branch! This is great, because its cost does not depend on how predictable the condition is.
It actually has another huge benefit in that it can be vectorized with special SIMD instructions. That means instead of operating on single numbers your CPU can operate on a whole set of numbers at once (I think up to 8 Float64 - how many depends on your CPU). This only works because we do exactly the same thing for every number (conditional moving is fine - branching destroys SIMD).
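To make the comparison concrete, here is a minimal sketch of the two variants discussed above. The function names zero_small_branch! and zero_small_ifelse! and the tolerance 0.01 are made up for illustration and do not come from the thread. The first version stores only when an entry is at or below the tolerance; the second always stores, writing either the old value back or a zero. Whether the compiler really emits a conditional move or SIMD code for the second one depends on your Julia version and CPU, so check with @code_native and benchmark before relying on it.

```julia
# Hypothetical helper names; the tolerance value below is an assumption.

# Branching version: stores a zero only when the entry is small in magnitude.
function zero_small_branch!(A, tol)
    for i in eachindex(A)
        if abs(A[i]) <= tol
            A[i] = zero(eltype(A))
        end
    end
    return A
end

# Branch-free version: every entry is written, either with its old value
# or with zero. abs.(A) is the elementwise absolute value (note the dot).
function zero_small_ifelse!(A, tol)
    A .= ifelse.(abs.(A) .> tol, A, zero(eltype(A)))
    return A
end

A = [0.5 -0.001; -2.0 0.0005]
B = copy(A)

zero_small_branch!(A, 0.01)
zero_small_ifelse!(B, 0.01)
```

Both calls leave the large negative entry -2.0 untouched and zero out -0.001 and 0.0005, because the comparison is made on the absolute value. In the ifelse version, entries above the tolerance are overwritten with themselves rather than skipped.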
There might even be more that I don’t know of, or can’t think of right now. These all wrap KrylovKit in very nice ways and have many nice additional features, and many people have already spent quite some time optimizing these codes.