On solving sets of linear equations in Fortran vs Julia

Hi, I’d like to translate a Fortran 90 program to Julia and then continue my development in Julia. The bottleneck right now seems to be solving around 100 systems of linear equations where the matrix (~400x400 and complex valued) is general.

In Fortran, I’m using the ZGELSS routine from LAPACK (with ifort and MKL) and everything seems to be alright. After that, the computation continues towards the end goal and I get a meaningful result (the easiest results I can check are some differential cross section plots, which are continuous).

Related to this, I’ve also been trying to call ZGELSS from MKL via ccall and failed (see this other topic I opened: Calling Intel MKL as an external library on Mac).

In Julia, I have very slightly different systems of linear equations and, after trying many ways of solving them, I couldn’t get a meaningful result (the plots were always very discontinuous).

Trying to troubleshoot this, I copied a couple of matrices (the first two are 21x21 and 61x61) from gdb (while debugging the Fortran program) into Julia format. I then tried to solve them with various Julia methods and saw that the solutions differ from the Fortran ones.
I’ve been computing A * x - b and looking at how close to 0 the entries are, and the Fortran solutions seemed weird. For the 21x21 system the last element of A * x - b was ≈1e-6, and for the 61x61 system the elements were predominantly ≈1e-6, with some even ≈1e-4. In contrast, for the Julia solutions, all of the elements were ≈1e-19.

Here’s the code to get these: 21x21 system and 61x61 system

A * x - b for the 21x21 system

[-8.41193052129676e-20 - 1.7618285302889447e-19im; 
 6.021949305641171e-19 - 5.590417451878382e-20im; 
 3.1739912720394736e-19 - 6.776263578034403e-20im; 
 2.030165921001147e-19 + 3.6591823321385775e-19im; 
 6.576416742041591e-20 - 5.421010862427522e-20im; 
 -1.6568096797192123e-19 - 8.131516293641283e-20im; 
 -1.1265538198482195e-19 + 1.0570971181733668e-18im; 
 -6.775469484646352e-19 + 7.453889935837843e-19im; 
 -7.5174174068819155e-19 - 1.3552527156068805e-20im; 
 3.586125740437894e-19 - 2.574980159653073e-19im; 
 1.971469184734384e-18 + 4.912791094074942e-20im; 
 5.22036993304666e-19 - 9.215718466126788e-19im; 
 -4.2118713302220084e-19 - 1.9651164376299768e-19im; 
 1.8550021544869177e-19 + 7.860465750519907e-19im; 
 -2.8079142201480056e-19 + 1.1519648082658485e-19im; 
 -1.2475736521871776e-18 + 2.727446090158847e-19im; 
 -4.1504614415460717e-20 + 3.6422416731934915e-19im; 
 -1.6040686438628313e-19 - 6.2002811739014785e-19im; 
 2.286988957586611e-20 - 6.005463596032989e-19im; 
 -1.0942606887341493e-19 - 4.0149361699853836e-19im; 
 -6.604427503011574e-6 + 6.604427503011174e-6im]

A * x - b for the 61x61 system

[9.760892516039532e-5 - 4.100798101845093e-6im; 
 -2.730647765114168e-20 - 1.571678266618155e-20im; 
 6.400431078830215e-5 - 3.805928084635793e-6im; 
 -3.756187174301844e-6 + 5.638383581659189e-6im; 
 -5.342260391612202e-20 + 1.7658319322163377e-20im; 
 -2.438728221806456e-5 - 6.025413522853987e-6im; 
 -1.3821389038091633e-5 + 8.612209815336237e-6im; 
 -2.398256312587934e-19 + 1.013825737990661e-19im; 
 4.1161593406279174e-5 + 9.141185893385266e-6im; 
 -1.0274252913931014e-5 - 1.0543996746264763e-5im; 
 -3.3539866302940466e-6 + 1.1066685915407524e-6im; 
 0.00010069331314181374 - 8.424775358141123e-21im; 
 -1.8432641825977152e-6 + 1.9122637919002016e-5im; 
 7.829953301362379e-6 + 2.957411639832971e-6im; 
 1.3820346824677052e-5 + 9.304126363287537e-20im; 
 5.119050267405865e-6 + 1.002847074080797e-5im; 
 9.555217621769329e-20 + 2.2136043405215362e-20im; 
 7.154835356177473e-5 + 1.124904278191871e-5im; 
 -6.495542158257637e-7 - 6.597301609668962e-6im; 
 -3.0183157578901532e-5 - 2.245133389592671e-5im; 
 -0.0001472892965390413 + 4.473295925708644e-19im; 
 5.73771726031555e-6 + 3.4225865943845064e-5im; 
 -1.2946684480996267e-19 - 3.854230380851119e-19im; 
 -0.00016055757130969993 - 1.87858585318712e-6im; 
 1.6468286127333356e-5 + 1.5551326830996722e-5im; 
 3.566446962470552e-5 + 4.730876177151483e-6im; 
 -0.0003120660188802475 - 5.685219811429178e-20im; 
 4.99975369828618e-6 + 5.945819516277557e-6im; 
 9.574364267929141e-6 + 6.952092670467297e-7im; 
 0.0001976285845226872 - 3.096925363013367e-20im; 
 1.590357534547442e-5 + 5.943175664586429e-6im; 
 8.319340956772255e-6 - 2.3804324829714663e-6im; 
 0.00023714506325843435 + 3.211160373721208e-5im; 
 3.2669298156510517e-5 + 2.179302362121413e-6im; 
 -6.487695623290713e-19 + 1.5985781545548807e-20im; 
 -0.00015523211767915195 + 1.9044901694239245e-19im; 
 -4.452933517091035e-5 + 1.0297354528050546e-5im; 
 -3.018943989950913e-18 - 2.407119728549568e-19im; 
 0.00010356311401058251 + 1.1135757924987326e-5im; 
 -1.27620363060983e-5 + 1.3228694198667444e-5im; 
 -6.33205758091314e-21 + 2.515222138802148e-19im; 
 -3.0092554624420232e-5 - 1.6746107095485413e-6im; 
 8.919079144532807e-19 - 1.963827789169692e-19im; 
 -1.9880646201407765e-5 + 2.1287508248804676e-6im; 
 -0.00015814947471110417 + 3.710669092613512e-6im; 
 -0.00013754026748623695 + 3.7235210514262154e-8im; 
 -6.419798592577473e-5 - 1.138124089567215e-5im; 
 7.043353985198681e-5 - 3.450212697203164e-20im; 
 -4.62290385179595e-5 - 1.579528329359978e-5im; 
 -7.705308154600581e-7 - 9.43115835040066e-6im; 
 2.886030190722491e-5 + 1.6078917622759995e-19im; 
 5.00536886289637e-6 - 7.031114175103631e-6im; 
 5.065176850986559e-5 + 2.1900483801697587e-5im; 
 -3.767142276049058e-5 + 3.1906763381359277e-19im; 
 -5.5691324017020326e-5 - 3.281499493800018e-5im; 
 -0.00019680591760458494 + 2.2159419005093834e-6im; 
 1.776298006268921e-5 + 6.98001345549784e-19im; 
 -0.0002201167291087681 + 9.764302999792899e-5im; 
 -0.00012155856835381963 - 0.00012906230166654346im; 
 2.187084283257905e-6 - 4.6928553719359486e-18im; 
 0.00018884657326180516 + 0.00022475453546830226im]

At this point I started looking at the floating point standards used, but they seemed to be the same. I then looked into the stability of linear systems, and the matrices I copied from Fortran to Julia seem to have condition numbers of ≈42.4 and ≈1.000000000000013 (surely this can’t be right?).
(By the way, the linear systems I was trying to solve in Julia had condition numbers between 1 and 10.)
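
For a quick sanity check of those numbers, Julia’s `LinearAlgebra.cond` computes the 2-norm condition number directly; a minimal sketch on a small made-up matrix (not one from the thread):

```julia
using LinearAlgebra

# 2-norm condition number: ratio of largest to smallest singular value.
# This Hermitian example has singular values 3 and 1, so cond(A) == 3.
A = ComplexF64[2 1; 1 2]
κ = cond(A)    # ≈ 3.0
```

A condition number within rounding error of 1 is not impossible by itself; it just means the matrix is numerically a scalar multiple of a unitary matrix.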

I’d like to figure out the source of the difference between the solutions of the linear systems before I move on to anything else, and I don’t know how to troubleshoot this any further. Any input is appreciated.

Also, some questions I’d like to find the answer to:

  • What’s the best way to quantify how good a solution is?
  • Is there a good way of moving large complex-valued matrices from Fortran to Julia (other than writing Fortran code to output arrays that Julia can parse)? And is moving the systems to Julia the best way to run the Julia solvers on them?

From what you describe it does seem that something wrong is happening on the FORTRAN side. However, to rule out that you’re making an error in the process of extracting data from FORTRAN to Julia, I’d suggest taking a look at two options:

  • Compile your FORTRAN code into a shared object and call it directly from Julia with ccall; that way you can get the system matrices, the right-hand sides and the solutions from FORTRAN and check them bit by bit in Julia.
  • Perhaps a bit easier: use an established file format. MatrixMarket, for example, has a FORTRAN implementation and a Julia one, and it’s pretty simple to work with. Maybe that helps you get some insight?
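
To illustrate the second option, here is a sketch of a round trip through the MatrixMarket.jl package (assuming it is installed; it uses the standard Matrix Market exchange format, which the Fortran mmio routines can also read and write):

```julia
using MatrixMarket, SparseArrays

# Write a small complex matrix to a .mtx file and read it back.
A = sparse(ComplexF64[1+2im 0; 0 3-1im])
MatrixMarket.mmwrite("A.mtx", A)
B = MatrixMarket.mmread("A.mtx")
# B should match A; exactness depends on how many digits mmwrite emits
```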

For using MKL to solve your problem, take a look at MKL.jl … they do all the hard work of rerouting Julia’s default linear algebra operations to MKL.

With respect to quantifying the quality of a solution, looking at the residual Ax - b as you describe is exactly what I would do.
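
One refinement worth mentioning: the raw residual depends on the scale of the problem, so a scale-free variant is the relative residual. A minimal sketch (the example system is made up, and the scaling shown is one common convention, not the only one):

```julia
using LinearAlgebra

# Backward-error style check: the residual norm scaled by the sizes of
# A, x and b, so "small" means small relative to the problem itself.
A = ComplexF64[4 1; 2 3]
b = ComplexF64[1, 2]
x = A \ b
relres = norm(A * x - b) / (norm(A) * norm(x) + norm(b))
```

For a backward-stable solver, relres should come out within a modest multiple of eps(Float64) ≈ 2.2e-16, independent of the conditioning.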

Thanks for the information. I’m already using MKL.jl, but zgelss doesn’t seem to be wrapped in Julia. The closest is gelsd!, which I’ve already tried, but there’s no other way to call zgelss as far as I know.
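
For reference, a gelsd! call looks like this (a minimal sketch on a tiny made-up system, not one from the actual program):

```julia
using LinearAlgebra

# LAPACK.gelsd! solves the same least-squares problem as ZGELSS, but via
# divide-and-conquer SVD; it overwrites its arguments, hence the copies.
A = ComplexF64[2 0; 0 3]
b = ComplexF64[2, 3]
x, rnk = LinearAlgebra.LAPACK.gelsd!(copy(A), copy(b))
# x ≈ [1.0 + 0.0im, 1.0 + 0.0im], rnk == 2
```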

The FORTRAN program is organized into relatively few subroutines, so I can’t easily ccall just the part I’m interested in.

I briefly tried looking up file formats, but I didn’t find anything. MatrixMarket looks good.

I am slightly confused about what help you would like from the Julia forum. I think the best thing would be to figure out why your Julia program produces discontinuous plots. The Julia solver routines, including the calls into LAPACK, seem to be working correctly.

Perhaps, to transfer numbers more smoothly between Julia and Fortran, you should use ccall to call directly into your Fortran routine.
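
As a sketch of what that could look like (all names and paths here are hypothetical, and the Fortran side needs bind(c) or a known mangled symbol name):

```julia
# Fortran side (hypothetical solver.f90), compiled with
#   gfortran -shared -fPIC -o libsolver.so solver.f90
#
#   subroutine get_system(n, A, b) bind(c, name="get_system")
#     use iso_c_binding
#     integer(c_int), value :: n
#     complex(c_double_complex) :: A(n, n), b(n)
#     ...
#   end subroutine

n = 21
A = Matrix{ComplexF64}(undef, n, n)   # column-major in both languages,
b = Vector{ComplexF64}(undef, n)      # so no transposition is needed
ccall((:get_system, "./libsolver.so"), Cvoid,
      (Cint, Ptr{ComplexF64}, Ptr{ComplexF64}),
      n, A, b)
```

After the call, A and b hold exactly the bits the Fortran program produced, so the Julia and Fortran solvers can be compared on identical inputs.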

To help move this along and make it more transparent what is happening, I made some modifications to your 21x21 code, using deepcopy everywhere to be safe so that I wouldn’t have to do a bunch of commenting and uncommenting.

Also I modified the output as such:

using Printf   # needed for @printf

@printf("Σ|x_0| = %g\n", sum(abs.(x0)))
@printf("Σ|x_1| = %g\n", sum(abs.(x1)))
@printf("Σ|x_2| = %g\n", sum(abs.(x2)))
@printf("Σ|x_3| = %g\n", sum(abs.(x3)))
@printf("Σ|x_4| = %g\n", sum(abs.(x4)))
@printf("Σ|x_5| = %g\n", sum(abs.(x5)))

for i in 1:21
    println(abs(x0[i] - x5[i]))
end

julia> include("21x21.jl")
Σ|x_0| = 9.34007e-06
Σ|x_1| = 6.80532e-19
Σ|x_2| = 2.99872e-18
Σ|x_3| = 5.97152e-19
Σ|x_4| = 1.97918e-19
Σ|x_5| = 7.5936e-19
1.8107496234338362e-19
6.214326071409102e-19
3.1071680867848654e-19
4.185954212237005e-19
9.354250834943223e-20
1.7618285302889447e-19
1.0369727249755868e-18
9.914084242815347e-19
7.589717713136635e-19
4.474386903525676e-19
1.9794114297145343e-18
1.0825916136606102e-18
4.642111499465803e-19
7.721670536909471e-19
2.9583648834882694e-19
1.205676349236012e-18
3.7738145890631644e-19
5.764605301733078e-19
5.611041877966055e-19
4.819371838949341e-19
9.340070946468615e-6

Thanks for the modification! Well, my first assumption was that the discontinuous plots are caused by the differences in the solutions of the linear systems (since not much happens after solving the systems, and it’s very unlikely that there are errors in that bit of code).
I didn’t know a good way to troubleshoot this, but I’ll try to use MatrixMarket today and, if successful, it will at least show whether the bit of code from the linear equations to the result is correct or not.

I’m really curious why the FORTRAN subroutine zgelss gives such poor solutions compared to the Julia methods. Since I can’t get a ccall to zgelss working in Julia, I can’t really compare apples to apples.

Also, I looked up ZGELSS in LAPACK, and ZGELSD seems to be a newer direct replacement, which you can call from Julia using gelsd!.

https://www.netlib.org/lapack/lug/node27.html

I suspect the calling convention of ZGELSS is similar to ZGELSD’s, so you might be able to adapt this code to call ZGELSS. Otherwise, I would just proceed with gelsd! in your Julia port. If you use ZGELSD in your Fortran code, does it still work?
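
For what it’s worth, here is an untested sketch of such an adaptation, modeled on the gelsd! wrapper in stdlib’s lapack.jl. The internals used here (@blasfunc, liblapack, chklapackerror) are not public API and may differ between Julia versions, and the sketch assumes a square or overdetermined system (size(B, 1) >= n):

```julia
using LinearAlgebra
using LinearAlgebra: BlasInt
using LinearAlgebra.BLAS: @blasfunc
using LinearAlgebra.LAPACK: liblapack, chklapackerror

# ZGELSS(M, N, NRHS, A, LDA, B, LDB, S, RCOND, RANK, WORK, LWORK, RWORK, INFO)
# Same argument list as ZGELSD minus IWORK. Overwrites A and B.
function zgelss!(A::Matrix{ComplexF64}, B::Matrix{ComplexF64};
                 rcond::Float64 = -1.0)
    m, n  = size(A)
    nrhs  = size(B, 2)
    s     = Vector{Float64}(undef, min(m, n))
    rwork = Vector{Float64}(undef, 5 * min(m, n))
    rnk   = Ref{BlasInt}()
    info  = Ref{BlasInt}()
    work  = Vector{ComplexF64}(undef, 1)
    lwork = BlasInt(-1)                  # pass -1 first: workspace query
    for i in 1:2
        ccall((@blasfunc(zgelss_), liblapack), Cvoid,
              (Ref{BlasInt}, Ref{BlasInt}, Ref{BlasInt},
               Ptr{ComplexF64}, Ref{BlasInt}, Ptr{ComplexF64}, Ref{BlasInt},
               Ptr{Float64}, Ref{Float64}, Ptr{BlasInt},
               Ptr{ComplexF64}, Ref{BlasInt}, Ptr{Float64}, Ptr{BlasInt}),
              m, n, nrhs, A, max(1, m), B, max(1, size(B, 1)),
              s, rcond, rnk, work, lwork, rwork, info)
        chklapackerror(info[])
        if i == 1                        # LAPACK stored the optimal size
            lwork = BlasInt(real(work[1]))
            work  = Vector{ComplexF64}(undef, lwork)
        end
    end
    B[1:n, :], s, rnk[]
end
```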

When I tried to change it to ZGELSD in Fortran, I got my favourite type of error ever: a segmentation fault.

I’ll try to call ZGELSS as they do in lapack.jl and I’ll update here.

Note there is one argument difference: iwork

Yeah, I had taken that into account. Thanks for noticing, though!

I got ZGELSS working in Julia.

Thanks for taking the time! I meant I had taken iwork into account when trying to use ZGELSD in FORTRAN.

Anyway, as I expected, the results are still different, and I’m very interested in why that could be.

To advance on this question, I think we now need a Minimum Working Example of your Fortran code, along with compiler instructions, to proceed.

I wrote a short script to transfer the matrices from the Fortran program to the Matrix Market format and then tested the zgelss! Julia implementation. It gives larger residuals than gelsd!, but they are basically identical to the real Fortran ones.
I’ve checked the code up to the first proper result, and that seems to be accurate (i.e. the plot is both continuous and correct). I now know for sure that I have to be more careful with how I create the linear systems.

There was a small error in the first matrix (a missing ‘im’ at the end of the last element), and the second matrix was pretty much wrong altogether; getting it from gdb was such a tedious process that I didn’t check it a second time.

Many thanks to both @mfh and @mkitti!

Sorry for “resurrecting” this question, but it seems that Julia is using double precision (aka Float64), its default, whereas your FORTRAN code seems to use single precision (as suggested by the terminating “S” and by the fact that “just changing” to ZGELSD got you memory problems). Single vs double precision can indeed lead to very different truncation errors, which is probably what you’re seeing.

Looking at the code, it is indeed double precision on the Julia side: you declare your data to be ComplexF64. If you changed that to ComplexF32 (if that’s even possible), you might get the same range of errors. Or you could change your FORTRAN code to use “double precision complex”.

@bernardofpc, I don’t think that’s the problem. The first letter indicates the precision, not the last one: Z means Complex*16, i.e. 16 bytes (2 × Float64).

You can see that both ZGELSS and ZGELSD are using Complex*16 inputs.