Thank you very much!
Indeed, I was thinking of some more generic/native solutions such as just using for _ in 1:2 as you suggested, something similar with the link below,
Sorry for another stupid question, uhm, do you perhaps know why
www .= [My_w(1.0)]
will just make www[1] and www[2] all the same all the time? What is the logic?
The single operator
.=
seems acting on the stuff on the left?
It seems when using .= , the elements on the left hand side (LHS) and right hand side (RHS) need to match.
for
www .= [My_w(1.0)]
the LHS www has two elements, while RHS [My_w(1.0)] is just one element, therefore the two www elements both point to the same memory address as [My_w(1.0)] all the time?
While in the example you give, the elements of LHS and RHS are match.
Also, what does the symbol _ mean?
I am sorry for those sloppy lazy questions.
That is due to how mutable objects are represented in Julia (nothing new, the same is true for Python or Common Lisp, for example). It is a reference to some data stored in memory (typically on the heap, may be on stack but that is unimportant for the explanation). On broadcast, only the reference is copied multiple times, all copies pointing to the same data.
Not exactly, itās either match or the size of the RHS in some dimension is 1, in which case the rhs value is broadcasted over the lhs axis.
It means itās a dummy variable. Just a hint to a reader that the generator expression (or a loop body) doesnāt depend on the iteration variable. Any other name would be fine as well.
it was not mentioned because the OP insisted a pre-allocated www
, now, I donāt know if itās their need or just Fortran habit, but I didnāt want to guess their intention because last time we collectively failed at doing that by saying: donāt allocate long vector just make one when you actually need it.
Sorry, my bad for being so stubborn.
But I am glad and grateful that all of your guys still provided very useful suggestions and examples, and basically allowing me to find a way to write my Julia code basically the same as my stupid version of modern Fortran. Like a 1;1 translation. I cannot imagine other language can do such a thing.
The āglobalā variables or allocating array stuff in Fortran is slight different than those in Julia, though. The global variable in my Fortran code is actually module variables, it is only āglobalā within the module, and it does not impact performance. Though you may say this could be a somewhat bad habit. I got it. It just that sometimes, in the code, I know what I am doing, so I store them as module variables, and so I can always access them without having to put them in the arguments.
In the small piece of module below, eg, i calculate path integral propagation of the Argonne v6/v8, or ChEFT N2LO nucleon-nucleon interaction, for the central part of the interaction like f(r), I just tablute like 10^4 points of them before the calculation, store them as vtab(:, or something, When I need to really calcualte f(r) on the fly, I take take 4 relevant points and do Lagrange 4-point interpolation etc.
In this example, I feel it is not a very bad idea to store those tabulated points as some āglobalā array like vtab(:, within the module, since in Fortran, it does not influence the performance.
module v6stepcalc
implicit none
integer, private, parameter :: i4=selected_int_kind(9)
integer, private, parameter :: i8=selected_int_kind(15)
integer, private, parameter :: r8=selected_real_kind(15,9)
complex(kind=r8), private, parameter :: czero=(0.0_r8,0.0_r8)
complex(kind=r8), private, parameter :: ci=(0.0_r8,1.0_r8),forthci=(0.0_r8,-0.25_r8) ! forthci=1/(4i)
complex(kind=r8), private, parameter :: cone=(1.0_r8,0.0_r8)
integer(kind=i4), private, save :: npart,nprot,nneut,n2p
integer(kind=i4), private, save :: lpot,lpotpr
integer(kind=i4), private, save :: nisospin,nspin,nbasis,niso
integer(kind=i4), private, save :: mmax,nbisect,ntab,ntabrsmall,ntabrbig,ntablag4,nrepmax
real(kind=r8), private, save :: scalep,rangev,rangevsafe,rangevsmall,rangevbig,drlag4
real(kind=r8), private, save :: hbar,dt
complex(kind=r8), private, save, allocatable :: cwt(:,:),cwtgnd(:,:)
real(kind=r8), private, allocatable, save :: utab(:,:,:),vtab(:,:),vtabls(:,:),evdttab(:,:,:),vemtab(:) &
,utabpr(:,:,:),evdttabpr(:,:,:) &
,utabrsmall(:,:,:),utabrsmallpr(:,:,:),utabrbig(:,:,:),utabrbigpr(:,:,:) &
,vtabrsmall(:,:),vtabrbig(:,:) &
,laginterpc(:,:),v6taborigin(:,:)
real(kind=r8), private,save, dimension(6) :: u0tab
complex(kind=r8), private, save :: cls1 ! coefficient for LS term.
integer(kind=i4), private, allocatable, save :: ignd(:),invspin(:),invispin(:),liso(:),lisoi(:) &
,invcwt(:,:) ! invcwt given nspin,niso, give nstate accordingly.
logical, private, save :: iem
integer(kind=i4), private, save :: irep
! 0: n, down.
! 1: p, up.
! nisospin 1: 0011 =3
! 2: 0101 =5
! 3: 0110 =6
! 4: 1001 =9
! 5: 1010 =10
! 6: 1100 =12
! nspin 0:15
contains
subroutine v6stepcalcinit(npartin,nprotin,lpotin,dtin,hbarin,ntabin,rangevin,mmaxin,nrepmaxin,iemin,irepin)
use brut
use math
use v6pot
use mympi
real(kind=r8) :: dtin,hbarin,rangevin,dr,drc
integer(kind=i4) :: npartin,nprotin,mmaxin,ntabin,nrepmaxin,lpotin
real(kind=r8) :: v(6),e(6) ! v(6) should already absorbed the stime step dt.
integer(kind=i4) :: i,j,ibox,irepin
integer(kind=i4) :: myunit
real(kind=r8) :: vfact(8)
logical :: iemin
integer(kind=i4) :: iispin,ispin,nstate
irep=irepin
iem=iemin ! em force switch
npart=npartin
n2p=6*npart+1
nprot=nprotin
nneut=npart-nprot
nspin=2**npart ! n=4, it is 16.
call combin(npart,nprot,nisospin) ! give value for nisospin, n=4, it is 4C2= 6.
nbasis=nspin*nisospin ! n=4, it is 96.
allocate(invspin(nbasis))
allocate(invispin(nbasis))
allocate(cwt(0:nspin-1,nisospin))
allocate(invcwt(0:nspin-1,nisospin))
allocate(liso(nisospin))
allocate(lisoi(0:nspin-1))
allocate(cwtgnd(0:nspin-1,nisospin))
allocate(ignd(npart))
!do iispin=1,nisospin
! do ispin=0,nspin-1
! nstate=(iispin-1)*nspin+ispin+1 ! 1:96
! invcwt(ispin,iispin)=nstate
! invspin(nstate)=ispin ! 0:15
! invispin(nstate)=iispin ! 1:6
! enddo
!enddo
!call spinit(npartin,nprotin)
call getinvstarrays(nspin,nisospin,invcwt,invspin,invispin)
call spinittest(npart,nprot,nspin,nisospin,liso,lisoi,ignd,cwtgnd)
dt=dtin
hbar=hbarin
ntab=ntabin
rangev=rangevin
mmax=mmaxin
nrepmax=nrepmaxin
nbisect=2**mmax
dr=rangev/ntab
scalep=1/dr
! make a table for u for different r and dt. time step=2**j dt.
!write(6,*) 'u table start!'
vfact=1
ibox=0
lpot=lpotin ! 3 for v6
lpotpr=lpot
ntabrsmall=5000 ! small v range table
ntabrbig=5000 ! big v range table
rangevsmall=2*dr
rangevbig=4*rangev
call v6potinit(npart,ibox,-rangev,lpot,ntab,vfact,lpotpr,iem &
,ntabrsmall,ntabrbig,rangevsmall,rangevbig)
if (.not.allocated(vtab)) allocate(vtab(6,0:ntab))
if (.not.allocated(vtabls)) allocate(vtabls(2,0:ntab))
if (.not.allocated(v6taborigin)) allocate(v6taborigin(6,0:ntab))
if (.not.allocated(vtabrsmall)) allocate(vtabrsmall(6,0:ntabrsmall))
if (.not.allocated(vtabrbig)) allocate(vtabrbig(6,0:ntabrbig))
call getvtab(v6taborigin)
call getvtabschmidt(vtab,vtabrsmall,vtabrbig) ! vtab is v6 schmidt order, this is what we use in the code.
call getvtabls(vtabls)
if (iem) then
allocate(vemtab(0:ntab))
call getvemtab(vemtab)
endif
!vtab=0.
!vtab(6,:)=1.
!write(6,*) 'get v table done!'
if (.not.allocated(utab)) allocate(utab(6,0:ntab,-mmax:mmax))
if (.not.allocated(utabpr)) allocate(utabpr(6,0:ntab,-mmax:mmax)) ! utabpr deal with +delta t
!if (.not.allocated(evdttab)) allocate(evdttab(6,0:ntab,-mmax:mmax))
!if (.not.allocated(evdttabpr)) allocate(evdttabpr(6,0:ntab,-mmax:mmax))
if (.not.allocated(utabrsmall)) allocate(utabrsmall(6,0:ntabrsmall,-mmax:mmax))
if (.not.allocated(utabrsmallpr)) allocate(utabrsmallpr(6,0:ntabrsmall,-mmax:mmax)) ! utabpr deal with +delta t
if (.not.allocated(utabrbig)) allocate(utabrbig(6,0:ntabrbig,-mmax:mmax))
if (.not.allocated(utabrbigpr)) allocate(utabrbigpr(6,0:ntabrbig,-mmax:mmax)) ! utabpr deal with +delta t
do j=-mmax,mmax ! dt=2**j
do i=0,ntab
! we use Schmidt order, note that it is different from the order in v18pot. so need to change it.
v(:)=vtab(:,i)*dt*2.0_r8**j
e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t
e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau
!evdttab(:,i,j)=e(:)
utab(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
utab(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
utab(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
utab(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
utab(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
utab(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24
v(:)=vtab(:,i)*(-dt)*2.0_r8**j
e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t
e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau
!evdttabpr(:,i,j)=e(:)
utabpr(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
utabpr(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
utabpr(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
utabpr(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
utabpr(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
utabpr(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24
enddo
do i=0,ntabrbig
! we use Schmidt order, note that it is different from the order in v18pot. so need to change it.
v(:)=vtabrbig(:,i)*dt*2.0_r8**j
e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t
e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau
!evdttab(:,i,j)=e(:)
utabrbig(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
utabrbig(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
utabrbig(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
utabrbig(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
utabrbig(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
utabrbig(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24
v(:)=vtabrbig(:,i)*(-dt)*2.0_r8**j
e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t
e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau
!evdttabpr(:,i,j)=e(:)
utabrbigpr(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
utabrbigpr(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
utabrbigpr(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
utabrbigpr(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
utabrbigpr(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
utabrbigpr(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24
enddo
enddo
u0tab(1)=1
u0tab(2:6)=0
! calculate lagrange c table. however this does not have any performance increase.
ntablag4=10000
if (.not.allocated(laginterpc)) allocate(laginterpc(4,0:ntablag4))
drlag4=1.0_r8/ntablag4
do i=0,ntablag4
drc=i*drlag4
laginterpc(1,i)=-drc*(drc-1)*(drc-2)/6
laginterpc(2,i)=(drc+1)*(drc-1)*(drc-2)/2
laginterpc(3,i)=-(drc+1)*drc*(drc-2)/2
laginterpc(4,i)=(drc+1)*drc*(drc-1)/6
enddo
if (myrank().eq.0) then
open(unit=9,form='formatted',file='v6tableorigin.dat')
rewind 9
do i=0,ntab
write(9,'(7(e15.7,2x))') i*dr,v6taborigin(:,i)
enddo
close(9)
open(unit=9,form='formatted',file='vlstableorigin.dat')
rewind 9
do i=0,ntab
write(9,'(7(e15.7,2x))') i*dr,vtabls(:,i)
enddo
close(9)
open(newunit=myunit,form='formatted',file='vtableSchmidt.dat')
rewind myunit
do i=0,ntab
write(myunit,'(7(e15.7,2x))') i*dr,vtab(:,i)
enddo
close(myunit)
open(newunit=myunit,form='formatted',file='vrsmalltableSchmidt.dat')
rewind myunit
do i=0,ntabrsmall
write(myunit,'(6(e15.7,2x))') vtabrsmall(:,i)
enddo
close(myunit)
return
end subroutine
subroutine opexsigma(i,j,nstate,nstatenew) ! spin exchange, I think this is the fastest.
integer(kind=i4) :: i,j,nstate,nstatenew,spn,newspn
spn=invspin(nstate) ! spn, 0:15.
!iv=ibits(spn,i-1,1)
!jv=ibits(spn,j-1,1)
!iv=shiftr(and(spn,shiftl(1,i-1)),i-1)
!jv=shiftr(and(spn,shiftl(1,j-1)),j-1)
!newspn=spn+(jv-iv)*2**(i-1)+(iv-jv)*2**(j-1)
newspn=spn
call mvbits(spn,i-1,1,newspn,j-1)
call mvbits(spn,j-1,1,newspn,i-1)
nstatenew=invcwt(newspn,invispin(nstate))
return
end subroutine opexsigma
subroutine opextau(i,j,nstate,nstatenew) ! tau exchange
integer(kind=i4) :: i,j,nstate,nstatenew,ispn,newispn
ispn=liso(invispin(nstate))
!iv=ibits(ispn,i-1,1)
!jv=ibits(ispn,j-1,1)
!iv=shiftr(and(ispn,shiftl(1,i-1)),i-1)
!jv=shiftr(and(ispn,shiftl(1,j-1)),j-1)
!newispn=ispn+(jv-iv)*2**(i-1)+(iv-jv)*2**(j-1)
newispn=ispn
call mvbits(ispn,i-1,1,newispn,j-1)
call mvbits(ispn,j-1,1,newispn,i-1)
nstatenew=invcwt(invspin(nstate),lisoi(newispn))
return
end subroutine opextau
subroutine opexsigmatau(i,j,nstate,nstatenew) ! sigma, tau exchange at the same time.
integer(kind=i4) :: i,j,nstate,nstatenew,ispn,newispn,spn,newspn
spn=invspin(nstate) ! spn, 0:15.
newspn=spn
call mvbits(spn,i-1,1,newspn,j-1)
call mvbits(spn,j-1,1,newspn,i-1)
ispn=liso(invispin(nstate))
newispn=ispn
call mvbits(ispn,i-1,1,newispn,j-1)
call mvbits(ispn,j-1,1,newispn,i-1)
nstatenew=invcwt(newspn,lisoi(newispn))
return
end subroutine opexsigmatau
subroutine v6prop1(x,i,j,u,cwtold,cwtnew) ! optimized version 20190703, 30 times faster than old verison.
integer(kind=i4) :: i,j,nstate,nstatenew,nstatenew1,nstatenew2
real(kind=r8) :: x(3,npart),r(3),rsq(3),dx(3),r3r1
real(kind=r8) :: u(6) ! v6 so 6, coefficients after conversion. check if real or complex, I think real.
complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
,fceff,fsex,fiex,fsiex,fteff,fteffiex,f(6) &
,cir2,cir1,c(4),cfteff(4),cfteffiex(4),r2cir1,r3cir2
integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij
if (irep.eq.4) then
cwtnew(:,:)=cwtold(:,:)
return
endif
dx(:)=x(:,i)-x(:,j)
r(:)=dx(:)/sqrt(dot_product(dx,dx)) !unit vector
rsq(:)=r(:)**2
r3r1=r(3)*r(1)
cir1=ci*r(1)
cir2=ci*r(2)
r2cir1=r(2)*cir1
r3cir2=r(3)*cir2
cwtnew(:,:)=0
do nstate=1,nbasis
f(:)=u(:)*cwtold(invspin(nstate),invispin(nstate))
fceff=f(1)-f(2)-f(4)+f(5)+f(3)-f(6)
fsex=2*(f(2)-f(5)-f(3)+f(6))
fiex=2*(f(4)-f(5)+f(6))
fsiex=4*(f(5)-f(6))
fteff=3*(f(3)-f(6))
fteffiex=6*f(6)
! fceff,op1
cwtnew(invspin(nstate),invispin(nstate))=cwtnew(invspin(nstate),invispin(nstate))+fceff
! fsex,opexsigma
call opexsigma(i,j,nstate,nstatenew1)
cwtnew(invspin(nstatenew1),invispin(nstatenew1))=cwtnew(invspin(nstatenew1),invispin(nstatenew1))+fsex
! fiex,opextau
call opextau(i,j,nstate,nstatenew2)
cwtnew(invspin(nstatenew2),invispin(nstatenew2))=cwtnew(invspin(nstatenew2),invispin(nstatenew2))+fiex
! fsiex
call opextau(i,j,nstatenew1,nstatenew)
cwtnew(invspin(nstatenew),invispin(nstatenew))=cwtnew(invspin(nstatenew),invispin(nstatenew))+fsiex
! fteff,op3
!call op3pure(r,i,j,nstate,fteff,cwtnew)
spn=invspin(nstate) ! spn, 0:15.
ispn=invispin(nstate)
signi=2*ibits(spn,i-1,1)-1
signj=2*ibits(spn,j-1,1)-1
newspni=xor(spn,shiftl(1,i-1))
newspnj=xor(spn,shiftl(1,j-1))
newspn=xor(newspni,shiftl(1,j-1))
signij=signi*signj
!c(1)=rsq(1)-r(2)*(signij*r(2)-(signi+signj)*cir1)
c(1)=rsq(1)-rsq(2)*signij+(signi+signj)*r2cir1
!c(2)=signj*r(3)*(r(1)+signi*cir2)
c(2)=signj*r3r1+signij*r3cir2
!c(3)=signi*r(3)*(r(1)+signj*cir2)
c(3)=signi*r3r1+signij*r3cir2
c(4)=signij*rsq(3)
cfteff(:)=fteff*c(:)
cfteffiex(:)=fteffiex*c(:)
cwtnew(newspn,ispn)=cfteff(1)+cwtnew(newspn,ispn)
cwtnew(newspni,ispn)=cfteff(2)+cwtnew(newspni,ispn)
cwtnew(newspnj,ispn)=cfteff(3)+cwtnew(newspnj,ispn)
cwtnew(spn,ispn)=cfteff(4)+cwtnew(spn,ispn)
! fteffiex,opextau,op3
!call opextau(i,j,nstate,nstatenew)
!call op3pure(r,i,j,nstatenew2,fteffiex,cwtnew)
ispn=invispin(nstatenew2)
!spn=invspin(nstatenew2) ! spn, 0:15.
!signi=2*ibits(spn,i-1,1)-1
!signj=2*ibits(spn,j-1,1)-1
!newspni=xor(spn,shiftl(1,i-1))
!newspnj=xor(spn,shiftl(1,j-1))
!newspn=xor(newspni,shiftl(1,j-1))
cwtnew(newspn,ispn)=cfteffiex(1)+cwtnew(newspn,ispn)
cwtnew(newspni,ispn)=cfteffiex(2)+cwtnew(newspni,ispn)
cwtnew(newspnj,ispn)=cfteffiex(3)+cwtnew(newspnj,ispn)
cwtnew(spn,ispn)=cfteffiex(4)+cwtnew(spn,ispn)
enddo
return
end subroutine v6prop1
subroutine vemprop1(expvemcdt,i,j,cwtold,cwtnew)
integer(kind=i4) :: i,j,l,it,lspin,ipi,ipj,index
real(kind=r8) :: expvemcdt
complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin)
cwtnew(:,:)=cwtold(:,:)
if (irep.eq.4) return
do l=1,nisospin
lspin=liso(l)
ipi=and(1,shiftr(lspin,i-1)) !(2*(and(1,shiftr(lspin,i-1))))/2
ipj=and(1,shiftr(lspin,j-1)) !(2*(and(1,shiftr(lspin,j-1))))/2
if ((ipi*ipj).eq.1) then
cwtnew(:,l)=cwtold(:,l)*expvemcdt
endif
enddo
return
end subroutine vemprop1
subroutine vlsprop1(x0,x1,i,j,vls,sign,cwtold,cwtnew)
integer(kind=i4) :: i,j,nstate,nstatenew,nstatenew1,nstatenew2,sign
real(kind=r8) :: x0(3,npart),x1(3,npart),rij(3),dr01i(3),dr01j(3),dr01ij(3)
real(kind=r8) :: vls
complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
,fx,fyi,fz,fold,cls1vls
integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij
if (irep.eq.4) then
cwtnew(:,:)=cwtold(:,:)
return
endif
rij(:)=x0(:,i)-x0(:,j)
dr01i(:)=x0(:,i)-x1(:,i)
dr01j(:)=x0(:,j)-x1(:,j)
dr01ij(:)=dr01i(:)-dr01j(:)
cls1vls=cls1*vls*sign
fx=(rij(2)*dr01ij(3)-rij(3)*dr01ij(2))*cls1vls
fyi=-(rij(1)*dr01ij(3)-rij(3)*dr01ij(1))*cls1vls*ci
fz=(rij(1)*dr01ij(2)-rij(2)*dr01ij(1))*cls1vls
cwtnew(:,:)=0
do nstate=1,nbasis
spn=invspin(nstate) ! spn, 0:15.
ispn=invispin(nstate)
fold=cwtold(spn,ispn)
signi=2*ibits(spn,i-1,1)-1
signj=2*ibits(spn,j-1,1)-1
newspni=xor(spn,shiftl(1,i-1)) ! flip spin i
newspnj=xor(spn,shiftl(1,j-1)) ! flip spin j
cwtnew(newspni,ispn)=cwtnew(newspni,ispn)+(fx+signi*fyi)*fold
cwtnew(newspnj,ispn)=cwtnew(newspnj,ispn)+(fx+signj*fyi)*fold
cwtnew(spn,ispn)=cwtnew(spn,ispn)+fz*(signi+signj)*fold
enddo
return
end subroutine vlsprop1
subroutine vlsprop0(spinorbit,sign,x0,x1,cwtold,cwtnew) ! without no LS*t1*t2. old version and works. 2020/5/20
! only LS for now. no LS*t1*t2. Linear terms. If cwtold and cwtl2r, then sign=-1. If cwtr2l, sign=1.
use cheft
integer(kind=i4) :: i,j,it,index,indexlag4,sign,nstate
real(kind=r8) :: x0(3,npart),x1(3,npart) ! the configuration for the system
real(kind=r8) :: r,c1,c2,c3,c4,dr,rij(3),dr01i(3),dr01j(3),dr01ij(3)
real(kind=r8) :: vls
complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
,cwtnew1(0:nspin-1,nisospin),cwt1(0:nspin-1,nisospin)
complex(kind=r8) :: fx,fyi,fz,fold,cls1vls
integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij
logical :: spinorbit
if (irep.eq.4) then
cwtnew(:,:)=cwtold(:,:) ! because for vmc, dt=0.
return
endif
cwtnew(:,:)=cwtold(:,:)
if (spinorbit) then
do i=1,npart-1
do j=i+1,npart
rij(:)=x0(:,i)-x0(:,j)
r=sqrt(dot_product(rij,rij)) !r=sqrt(sum(dx(:)**2))
if (r.lt.rangev) then
dr=scalep*r
index=dr
index=max(1,min(index,ntab-2))
dr=dr-index
c1=-dr*(dr-1)*(dr-2)/6
c2=(dr+1)*(dr-1)*(dr-2)/2
c3=-(dr+1)*dr*(dr-2)/2
c4=(dr+1)*dr*(dr-1)/6
vls=c1*vtabls(1,index-1)+c2*vtabls(1,index)+c3*vtabls(1,index+1)+c4*vtabls(1,index+2)
else
!vls=0
call cheft_pot_ls(lpot,r,vls)
endif
dr01i(:)=x0(:,i)-x1(:,i)
dr01j(:)=x0(:,j)-x1(:,j)
dr01ij(:)=dr01i(:)-dr01j(:)
cls1vls=cls1*vls*sign
fx=(rij(2)*dr01ij(3)-rij(3)*dr01ij(2))*cls1vls
fyi=-(rij(1)*dr01ij(3)-rij(3)*dr01ij(1))*cls1vls*ci
fz=(rij(1)*dr01ij(2)-rij(2)*dr01ij(1))*cls1vls
do spn=0,nspin-1
signi=2*ibits(spn,i-1,1)-1
signj=2*ibits(spn,j-1,1)-1
newspni=xor(spn,shiftl(1,i-1)) ! flip spin i
newspnj=xor(spn,shiftl(1,j-1)) ! flip spin j
cwtnew(newspni,:)=cwtnew(newspni,:)-(fx+signi*fyi)*cwtold(spn,:)
cwtnew(newspnj,:)=cwtnew(newspnj,:)-(fx+signj*fyi)*cwtold(spn,:)
cwtnew(spn,:)=cwtnew(spn,:)-fz*(signi+signj)*cwtold(spn,:)
enddo
!call vlsprop1(x0,x1,i,j,vls,sign,cwtold,cwtnew1)
!cwtnew(:,:)=cwtnew(:,:)+cwtnew1(:,:)
enddo
enddo
endif
return
end subroutine vlsprop0
subroutine vlsprop(spinorbit,sign,x0,x1,cwtold,cwtnew) ! added LS*t1*t2. 2020/5/20
! only LS for now. and LS*t1*t2. Linear terms. If cwtold and cwtl2r, then sign=-1. If cwtr2l, sign=1.
use cheft
integer(kind=i4) :: i,j,it,index,indexlag4,sign,nstate
real(kind=r8) :: x0(3,npart),x1(3,npart) ! the configuration for the system
real(kind=r8) :: r,c1,c2,c3,c4,dr,rij(3),dr01i(3),dr01j(3),dr01ij(3),rdx,rdy,rdz
real(kind=r8) :: vls(2),vv(18)
complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
,cwtnew1(0:nspin-1,nisospin),cwt1(0:nspin-1,nisospin)
complex(kind=r8) :: fx,fyi,fz,f2x,f2yi,f2z,fold,cls1vls,cls1vls2
integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij,nsnew
logical :: spinorbit
if (irep.eq.4) then
cwtnew(:,:)=cwtold(:,:) ! because for vmc, dt=0.
return
endif
if (spinorbit) then
if (lpot>2) then
call vlsprop0(spinorbit,sign,x0,x1,cwtold,cwtnew)
return
endif
! AV8' added LS*t1*t2
cwtnew(:,:)=cwtold(:,:)
do i=1,npart-1 ! all the rij thing for a bead can be optimized, no need to repeatedly calculate them.
do j=i+1,npart
rij(:)=x0(:,i)-x0(:,j)
r=sqrt(dot_product(rij,rij)) !r=sqrt(sum(dx(:)**2))
if (r.lt.rangev) then
dr=scalep*r
index=dr
index=max(1,min(index,ntab-2))
dr=dr-index
c1=-dr*(dr-1)*(dr-2)/6
c2=(dr+1)*(dr-1)*(dr-2)/2
c3=-(dr+1)*dr*(dr-2)/2
c4=(dr+1)*dr*(dr-1)/6
vls(:)=c1*vtabls(:,index-1)+c2*vtabls(:,index)+c3*vtabls(:,index+1)+c4*vtabls(:,index+2)
else
!vls(:)=0
call av18op(lpot,r,vv)
vls(1:2)=vv(7:8)
endif
!write(6,*) 'vls in prop=',r, vls(1), vls(2)
dr01i(:)=x0(:,i)-x1(:,i)
dr01j(:)=x0(:,j)-x1(:,j)
dr01ij(:)=dr01i(:)-dr01j(:)
rdx=rij(2)*dr01ij(3)-rij(3)*dr01ij(2)
rdy=-(rij(1)*dr01ij(3)-rij(3)*dr01ij(1))
rdz=rij(1)*dr01ij(2)-rij(2)*dr01ij(1)
cls1vls=cls1*(vls(1)-vls(2))*sign
cls1vls2=cls1*vls(2)*2*sign
fx=rdx*cls1vls
fyi=rdy*cls1vls*ci
fz=rdz*cls1vls
f2x=rdx*cls1vls2
f2yi=rdy*cls1vls2*ci
f2z=rdz*cls1vls2
!!!!! check
!write (6,'( ''rdx, rdy, rdz = '', 3(g15.7,1x) )')rdx,rdy,rdz
!write (6,'( ''vls1,2 = '', 3(g15.7,1x) )') vls(1:2),vls(1)-vls(2)
!write (6,'( ''cls1vls = '', 2g15.7 )') cls1vls
!write (6,'( ''cls1vls2 = '', 2g15.7 )') cls1vls2
!!!!!
do ispn=1,nisospin
call opextau(i,j,invcwt(0,ispn),nsnew) ! take spn as 0 here bc we only care about ispin exchange.
newispn=invispin(nsnew)
do spn=0,nspin-1
fold=cwtold(spn,ispn)
signi=2*ibits(spn,i-1,1)-1
signj=2*ibits(spn,j-1,1)-1
newspni=xor(spn,shiftl(1,i-1)) ! flip spin i
newspnj=xor(spn,shiftl(1,j-1)) ! flip spin j
cwtnew(newspni,ispn)=cwtnew(newspni,ispn)-(fx+signi*fyi)*fold
cwtnew(newspnj,ispn)=cwtnew(newspnj,ispn)-(fx+signj*fyi)*fold
cwtnew(spn,ispn)=cwtnew(spn,ispn)-fz*(signi+signj)*fold
!call opextau(i,j,invcwt(spn,ispn),nsnew)
!newispn=invispin(nsnew)
!write(6,*) 'isospin in prop:',i,j,ispn,newispn,nisospin
cwtnew(newspni,newispn)=cwtnew(newspni,newispn)-(f2x+signi*f2yi)*fold
cwtnew(newspnj,newispn)=cwtnew(newspnj,newispn)-(f2x+signj*f2yi)*fold
cwtnew(spn,newispn)=cwtnew(spn,newispn)-f2z*(signi+signj)*fold
enddo
enddo
!call vlsprop1(x0,x1,i,j,vls,sign,cwtold,cwtnew1)
!cwtnew(:,:)=cwtnew(:,:)+cwtnew1(:,:)
enddo
enddo
else
cwtnew(:,:)=cwtold(:,:)
endif
return
end subroutine vlsprop
end module v6stepcalc
But I agree there can be more smart ways to do that, and using global variables make the code a little sloppy. But overall, I feel that modern Fortran is OK, and its grammar makes it harder to write a slow code.
Thank you all very much!
So, Iāve tried to translate the Fortran code in Julia. This is what I got:
using LinearAlgebra
struct MeanCovar{T<:AbstractFloat}
Ī¼::Matrix{T}
Ļ::Matrix{T}
w::T
end
function MeanCovar(Ī¼::AbstractVector{Tv}, Ļ::AbstractMatrix{Tm}, w::Tw) where {Tv, Tm, Tw}
T = float(promote_type(Tv,Tm,Tw))
dim_p = LinearAlgebra.checksquare(Ļ)
length(Ī¼) == dim_p || error("Incompatible sizes")
newĪ¼ = copyto!(Matrix{T}(undef, (dim_p, 1)), Ī¼)
newĻ = Matrix{T}(Ļ)
return MeanCovar{T}(newĪ¼, newĻ, w)
end
function MeanCovar{T}(dim_p::Integer) where {T}
return MeanCovar{T}(zeros(T, (dim_p, 1)), Matrix{T}(I(dim_p)), zero(T))
end
MeanCovar(dim_p) = MeanCovar{Float64}(dim_p)
function init_mean_covar(
kmix::Integer,
dim_p::Integer,
weight::AbstractVector{T},
Ļ::AbstractArray{T,3},
Ī¼::AbstractMatrix{T}
) where {T}
Ī¼Ļ = [MeanCovar(@view(Ī¼[k,:]), @view(Ļ[k,1:dim_p,1:dim_p]), weight[k]) for k in 1:kmix]
return Ī¼Ļ
end
What I donāt understand is why mu
is a matrix with one column and not a vector but I kept that.
From what I see, mean_covar_init
cannot be called multiple times in one program because thatāll trigger an error in allocate
. You may be a person who implicitly knows about that, otherwise it is not documented, and other may have troubles. The reason globals are frowned upon is primarily not the performance but that they introduce implicit rules of using everything else in the module.
Thank you very much grandmaster!
- You are absoutely right, mu should be a vector. I make it is n*1 matrix simply because I am sloppy.
- You are absoutely right,
mean_covar_init
can only be called once, and there is no need to call it twice. Because it is just doing initialization at the beginning.
It is just like I read from a file about all the dimensions of the arrays i need to use in the program. Then at the beginning of the code I initialize/allocate the arrays just once. Then I just use the arrays, no need to reallocate them.
In Fortran, in the init function, I can add things like
if ( .not. allocated(a) ) allocate( a(100) )
so that even if I repeatedly call init function, it will not trigger error because a is already allocated as 100 element array at the beginning.
These init function only needs to be called once in the program.
Below is my module
module Mixture
export musigma, mean_covar_init
mutable struct Mean_covar
mu::Array{Float64,2}
sigma::Array{Float64,2}
w::Float64
end
const musigma = Array{Mean_covar,1}() # const is global
function mean_covar_init(kmix::Int64,dim_p::Int64
,weight::Array{Float64,1}
,sigma::Array{Float64,3}
,mu::Array{Float64,2})
@assert length(weight) == kmix
@assert size(sigma) == (kmix,dim_p,dim_p)
@assert size(mu) == (kmix,dim_p)
resize!(musigma, kmix)
for k in 1:kmix
musigma[k] = Mean_covar(zeros(dim_p,1),zeros(dim_p,dim_p),0.0)
musigma[k].mu[1,1] = mu[k,1]
musigma[k].mu[2,1] = mu[k,2]
musigma[k].sigma[1,1] = sigma[k,1,1]
musigma[k].sigma[2,2] = sigma[k,2,2]
musigma[k].w = weight[k]
end
return nothing
end
end
No, it has an undocumented feature that it can only be called once in the program without triggering an error.
It may be unimportant for your workflow here and now, but another person (including yourself in the future) might have a use case where that feature is an unpleasant surprise.
But imagine someone builds your module as a library, makes a Python binding and tries to process multiple files from Python.
Now, for Julia module, Iām genuinely puzzled by your choice to keep the design with a global variable. Changing
resize!(musigma, kmix)
to
musigma = Vector{MeanCovar}(undef, kmix)
and getting rid of the global variable definition has a win of not using globals and has less LOC.
Here, you wonāt have memory management errors, but instead you limit the module to being only able of loading one dataset at any moment.
Thank you very much!
Uhm, may I ask, is there advantage using
musigma = Vector{MeanCovar}(undef, kmix)
over
resize!(musigma, kmix)
When initializing an array of non-isbits
structs thatās immediately overwritten with other structs, I would suggest neither - just use an array comprehension as suggested here:
In this line,
resize!(musigma, kmix)
youāre creating a bunch of references to uninitialized arrays that are immediately overwritten here:
musigma[k] = Mean_covar(zeros(dim_p,1),zeros(dim_p,dim_p),0.0)
Itās not a performance-killer, but it creates pointless repetition of work. With the array comprehension, the array bindings are created where and when theyāre needed.
Separately, you may want to investigate StaticArrays.jl for your mu
and sigma
arrays (provided dim_p
is less than 10 or so), which will allow a more-compact isbits
memory representation of Mean_covar
and generation of more-efficient code thanks to the fixed array dimensions.
The first one creates a new array (which has then to be returned, of course), the second mutates a global variable which is not an explicit argument of the function.
Again, why do you keep using the global variable at all? Iād assume that your client code uses that array without explicitly passing it as argument as well. While that isnāt great either, you can easily keep that behavior in client code by declaring
musigma = mean_covar_init(something)
in the client code toplevel instead of library toplevel.
Further, the global array antipattern keeps you from using generic types. In my example, as you can see, mu
and sigma
within the struct can be arrays of single or half precision floats if you need that for performance on GPUs, or have extended precision if needed. Having a pre-defined array for each possible type parameter is just infeasible. An example of such difficulty even with immutable scalar globals is the definition of pi
in the Julia standard library. It has to have a special type to be used efficiently in a generically-written code.
In that case, it is fine to use global caches since they are not exported, not used implicitly as return values etc. But again, it may be better to use the approach employed e.g. in the form of āplansā in FFTW: create an object containing the pre-compiled cache for a specific problem and use it explicitly. In FFTW.jl, plans can be conveniently applied via the overloaded multiplication. Functions using caches are naturally implemented via structs with overloaded call operator. So, thereās some design space to explore even for such tasks.