How to set the elements of an array of defined type in one line?

Thank you very much!
Indeed, I was thinking of some more generic/native solutions such as just using for _ in 1:2 as you suggested, something similar with the link below,

Sorry for another stupid question, uhm, do you perhaps know why

www .= [My_w(1.0)]

will just make www[1] and www[2] all the same all the time? What is the logic?

The single operator

.=

seems acting on the stuff on the left?

It seems when using .= , the elements on the left hand side (LHS) and right hand side (RHS) need to match.

for
www .= [My_w(1.0)]

the LHS www has two elements, while RHS [My_w(1.0)] is just one element, therefore the two www elements both point to the same memory address as [My_w(1.0)] all the time?

While in the example you give, the elements of LHS and RHS are match.

Also, what does the symbol _ mean?

I am sorry for those sloppy lazy questions.

That is due to how mutable objects are represented in Julia (nothing new, the same is true for Python or Common Lisp, for example). It is a reference to some data stored in memory (typically on the heap, may be on stack but that is unimportant for the explanation). On broadcast, only the reference is copied multiple times, all copies pointing to the same data.

Not exactly, itā€™s either match or the size of the RHS in some dimension is 1, in which case the rhs value is broadcasted over the lhs axis.

It means itā€™s a dummy variable. Just a hint to a reader that the generator expression (or a loop body) doesnā€™t depend on the iteration variable. Any other name would be fine as well.

4 Likes

it was not mentioned because the OP insisted a pre-allocated www, now, I donā€™t know if itā€™s their need or just Fortran habit, but I didnā€™t want to guess their intention because last time we collectively failed at doing that by saying: donā€™t allocate long vector just make one when you actually need it.

1 Like

Sorry, my bad for being so stubborn.

But I am glad and grateful that all of your guys still provided very useful suggestions and examples, and basically allowing me to find a way to write my Julia code basically the same as my stupid version of modern Fortran. Like a 1;1 translation. I cannot imagine other language can do such a thing.

The ā€˜globalā€™ variables or allocating array stuff in Fortran is slight different than those in Julia, though. The global variable in my Fortran code is actually module variables, it is only ā€˜globalā€™ within the module, and it does not impact performance. Though you may say this could be a somewhat bad habit. I got it. It just that sometimes, in the code, I know what I am doing, so I store them as module variables, and so I can always access them without having to put them in the arguments.

In the small piece of module below, eg, i calculate path integral propagation of the Argonne v6/v8, or ChEFT N2LO nucleon-nucleon interaction, for the central part of the interaction like f(r), I just tablute like 10^4 points of them before the calculation, store them as vtab(:,:slight_smile: or something, When I need to really calcualte f(r) on the fly, I take take 4 relevant points and do Lagrange 4-point interpolation etc.
In this example, I feel it is not a very bad idea to store those tabulated points as some ā€˜globalā€™ array like vtab(:,:slight_smile: within the module, since in Fortran, it does not influence the performance.

module v6stepcalc
   implicit none
   integer, private, parameter :: i4=selected_int_kind(9)
   integer, private, parameter :: i8=selected_int_kind(15)
   integer, private, parameter :: r8=selected_real_kind(15,9)
   complex(kind=r8), private, parameter :: czero=(0.0_r8,0.0_r8)
   complex(kind=r8), private, parameter :: ci=(0.0_r8,1.0_r8),forthci=(0.0_r8,-0.25_r8) ! forthci=1/(4i)
   complex(kind=r8), private, parameter :: cone=(1.0_r8,0.0_r8)
   integer(kind=i4), private, save :: npart,nprot,nneut,n2p
   integer(kind=i4), private, save :: lpot,lpotpr
   integer(kind=i4), private, save :: nisospin,nspin,nbasis,niso
   integer(kind=i4), private, save :: mmax,nbisect,ntab,ntabrsmall,ntabrbig,ntablag4,nrepmax
   real(kind=r8), private, save :: scalep,rangev,rangevsafe,rangevsmall,rangevbig,drlag4
   real(kind=r8), private, save :: hbar,dt
   complex(kind=r8), private, save, allocatable :: cwt(:,:),cwtgnd(:,:)  
   real(kind=r8), private, allocatable, save :: utab(:,:,:),vtab(:,:),vtabls(:,:),evdttab(:,:,:),vemtab(:) &
                                               ,utabpr(:,:,:),evdttabpr(:,:,:) &
                                               ,utabrsmall(:,:,:),utabrsmallpr(:,:,:),utabrbig(:,:,:),utabrbigpr(:,:,:) &
                                               ,vtabrsmall(:,:),vtabrbig(:,:) &
                                               ,laginterpc(:,:),v6taborigin(:,:)
   real(kind=r8), private,save, dimension(6) :: u0tab
   complex(kind=r8), private, save :: cls1 ! coefficient for LS term.
   integer(kind=i4), private, allocatable, save :: ignd(:),invspin(:),invispin(:),liso(:),lisoi(:) &
                                                  ,invcwt(:,:) ! invcwt given nspin,niso, give nstate accordingly.  
   logical, private, save :: iem
   integer(kind=i4), private, save :: irep
! 0: n, down. 
! 1: p, up. 
! nisospin 1: 0011  =3  
!          2: 0101  =5
!          3: 0110  =6
!          4: 1001  =9
!          5: 1010  =10
!          6: 1100  =12
! nspin 0:15
contains	   
    subroutine v6stepcalcinit(npartin,nprotin,lpotin,dtin,hbarin,ntabin,rangevin,mmaxin,nrepmaxin,iemin,irepin)
    use brut
    use math
    use v6pot
    use mympi
    real(kind=r8) :: dtin,hbarin,rangevin,dr,drc
    integer(kind=i4) :: npartin,nprotin,mmaxin,ntabin,nrepmaxin,lpotin
    real(kind=r8) :: v(6),e(6) ! v(6) should already absorbed the stime step dt. 
    integer(kind=i4) :: i,j,ibox,irepin
    integer(kind=i4) :: myunit
    real(kind=r8) :: vfact(8)   
    logical :: iemin
    integer(kind=i4) :: iispin,ispin,nstate
    
    irep=irepin
    iem=iemin ! em force switch

    npart=npartin
    n2p=6*npart+1
    nprot=nprotin
    nneut=npart-nprot
    nspin=2**npart ! n=4, it is 16.
    call combin(npart,nprot,nisospin) ! give value for nisospin, n=4, it is 4C2= 6.
    nbasis=nspin*nisospin ! n=4, it is 96.
    allocate(invspin(nbasis))	
    allocate(invispin(nbasis))
    allocate(cwt(0:nspin-1,nisospin))
    allocate(invcwt(0:nspin-1,nisospin))
    allocate(liso(nisospin))
    allocate(lisoi(0:nspin-1))
    allocate(cwtgnd(0:nspin-1,nisospin))
    allocate(ignd(npart))
      
    !do iispin=1,nisospin
    !    do ispin=0,nspin-1
	   !     nstate=(iispin-1)*nspin+ispin+1 ! 1:96
	   !     invcwt(ispin,iispin)=nstate
	   !     invspin(nstate)=ispin ! 0:15
	   !     invispin(nstate)=iispin ! 1:6
    !    enddo
    !enddo    

    !call spinit(npartin,nprotin)
    call getinvstarrays(nspin,nisospin,invcwt,invspin,invispin)
    call spinittest(npart,nprot,nspin,nisospin,liso,lisoi,ignd,cwtgnd) 
    
    dt=dtin
    hbar=hbarin
    ntab=ntabin
    rangev=rangevin
    mmax=mmaxin
    nrepmax=nrepmaxin
    nbisect=2**mmax
    dr=rangev/ntab 
    scalep=1/dr 
    ! make a table for u for different r and dt.  time step=2**j dt.
    !write(6,*) 'u table start!'
    vfact=1
    ibox=0
    lpot=lpotin ! 3 for v6
    lpotpr=lpot

    ntabrsmall=5000 ! small v range table
    ntabrbig=5000  ! big v range table    
    
    rangevsmall=2*dr
    rangevbig=4*rangev
    
    call v6potinit(npart,ibox,-rangev,lpot,ntab,vfact,lpotpr,iem &
                   ,ntabrsmall,ntabrbig,rangevsmall,rangevbig) 
    
    if (.not.allocated(vtab)) allocate(vtab(6,0:ntab)) 
    if (.not.allocated(vtabls)) allocate(vtabls(2,0:ntab)) 
    if (.not.allocated(v6taborigin)) allocate(v6taborigin(6,0:ntab)) 

    if (.not.allocated(vtabrsmall)) allocate(vtabrsmall(6,0:ntabrsmall)) 
    if (.not.allocated(vtabrbig)) allocate(vtabrbig(6,0:ntabrbig)) 
	  
    
    call getvtab(v6taborigin)
    call getvtabschmidt(vtab,vtabrsmall,vtabrbig) ! vtab is v6 schmidt order, this is what we use in the code.

    call getvtabls(vtabls)
    
    if (iem) then
    allocate(vemtab(0:ntab)) 
    call getvemtab(vemtab) 
    endif
	  
    !vtab=0.
    !vtab(6,:)=1.	 
	    
    !write(6,*) 'get v table done!'  
    if (.not.allocated(utab)) allocate(utab(6,0:ntab,-mmax:mmax))
    if (.not.allocated(utabpr)) allocate(utabpr(6,0:ntab,-mmax:mmax)) ! utabpr deal with +delta t
    !if (.not.allocated(evdttab)) allocate(evdttab(6,0:ntab,-mmax:mmax))
    !if (.not.allocated(evdttabpr)) allocate(evdttabpr(6,0:ntab,-mmax:mmax))	
    
    if (.not.allocated(utabrsmall)) allocate(utabrsmall(6,0:ntabrsmall,-mmax:mmax))
    if (.not.allocated(utabrsmallpr)) allocate(utabrsmallpr(6,0:ntabrsmall,-mmax:mmax)) ! utabpr deal with +delta t
    if (.not.allocated(utabrbig)) allocate(utabrbig(6,0:ntabrbig,-mmax:mmax))
    if (.not.allocated(utabrbigpr)) allocate(utabrbigpr(6,0:ntabrbig,-mmax:mmax)) ! utabpr deal with +delta t    
    
    do j=-mmax,mmax ! dt=2**j
        do i=0,ntab
            ! we use Schmidt order, note that it is different from the order in v18pot. so need to change it.    
            v(:)=vtab(:,i)*dt*2.0_r8**j  
            e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
            e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
            e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t	
            e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
            e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
            e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau	
            !evdttab(:,i,j)=e(:)	  
            utab(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
            utab(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
            utab(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
            utab(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
            utab(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
            utab(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24	
       
            v(:)=vtab(:,i)*(-dt)*2.0_r8**j  
            e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
            e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
            e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t	
            e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
            e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
            e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau	
            !evdttabpr(:,i,j)=e(:)	
            utabpr(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
            utabpr(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
            utabpr(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
            utabpr(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
            utabpr(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
            utabpr(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24	    
        enddo     
        
        
        
        do i=0,ntabrbig
            ! we use Schmidt order, note that it is different from the order in v18pot. so need to change it.    
            v(:)=vtabrbig(:,i)*dt*2.0_r8**j  
            e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
            e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
            e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t	
            e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
            e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
            e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau	
            !evdttab(:,i,j)=e(:)	  
            utabrbig(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
            utabrbig(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
            utabrbig(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
            utabrbig(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
            utabrbig(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
            utabrbig(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24	
       
            v(:)=vtabrbig(:,i)*(-dt)*2.0_r8**j  
            e(1)=exp(-(v(1)+v(2)+2*v(3)+v(4)+v(5)+2*v(6))) !c
            e(2)=exp(-(v(1)+v(2)-4*v(3)+v(4)+v(5)-4*v(6))) !sigma
            e(3)=exp(-(v(1)+v(2)+2*v(3)-3*v(4)-3*v(5)-6*v(6))) !t	
            e(4)=exp(-(v(1)+v(2)-4*v(3)-3*v(4)-3*v(5)+12*v(6))) !tau
            e(5)=exp(-(v(1)-3*v(2)+v(4)-3*v(5))) !sigma tau
            e(6)=exp(-(v(1)-3*v(2)-3*v(4)+9*v(5))) ! t tau	
            !evdttabpr(:,i,j)=e(:)	
            utabrbigpr(1,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)+3*e(5)+e(6))/16
            utabrbigpr(2,i,j)=(6*e(1)+3*e(2)+2*e(3)+e(4)-9*e(5)-3*e(6))/48
            utabrbigpr(3,i,j)=(3*e(1)-3*e(2)+e(3)-e(4))/24
            utabrbigpr(4,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)+e(5)-e(6))/16
            utabrbigpr(5,i,j)=(2*e(1)+e(2)-2*e(3)-e(4)-3*e(5)+3*e(6))/48
            utabrbigpr(6,i,j)=(e(1)-e(2)-e(3)+e(4))/24	    
        enddo        
 
    enddo
    u0tab(1)=1
    u0tab(2:6)=0	
    
! calculate lagrange c table. however this does not have any performance increase.
    
    ntablag4=10000
    if (.not.allocated(laginterpc)) allocate(laginterpc(4,0:ntablag4)) 
    drlag4=1.0_r8/ntablag4  
    do i=0,ntablag4    
        drc=i*drlag4
        laginterpc(1,i)=-drc*(drc-1)*(drc-2)/6
        laginterpc(2,i)=(drc+1)*(drc-1)*(drc-2)/2
        laginterpc(3,i)=-(drc+1)*drc*(drc-2)/2
        laginterpc(4,i)=(drc+1)*drc*(drc-1)/6            
    enddo
    
    if (myrank().eq.0) then 
        open(unit=9,form='formatted',file='v6tableorigin.dat')
        rewind 9
        do i=0,ntab
            write(9,'(7(e15.7,2x))') i*dr,v6taborigin(:,i)  
        enddo
        close(9) 
        
        open(unit=9,form='formatted',file='vlstableorigin.dat')
        rewind 9
        do i=0,ntab
            write(9,'(7(e15.7,2x))') i*dr,vtabls(:,i)  
        enddo
        close(9)
        
        open(newunit=myunit,form='formatted',file='vtableSchmidt.dat')
        rewind myunit
        do i=0,ntab
	        write(myunit,'(7(e15.7,2x))') i*dr,vtab(:,i)
        enddo
        close(myunit) 
       
        open(newunit=myunit,form='formatted',file='vrsmalltableSchmidt.dat')
        rewind myunit
        do i=0,ntabrsmall
	        write(myunit,'(6(e15.7,2x))') vtabrsmall(:,i)
        enddo
        close(myunit)        
       
        
    return
    end subroutine

	
  
    subroutine opexsigma(i,j,nstate,nstatenew) ! spin exchange, I think this is the fastest.
    integer(kind=i4) :: i,j,nstate,nstatenew,spn,newspn
    spn=invspin(nstate) ! spn, 0:15.
    
    !iv=ibits(spn,i-1,1)
    !jv=ibits(spn,j-1,1)
    
    !iv=shiftr(and(spn,shiftl(1,i-1)),i-1)
    !jv=shiftr(and(spn,shiftl(1,j-1)),j-1)
    
    !newspn=spn+(jv-iv)*2**(i-1)+(iv-jv)*2**(j-1)

    newspn=spn
    call mvbits(spn,i-1,1,newspn,j-1)
    call mvbits(spn,j-1,1,newspn,i-1)    
    nstatenew=invcwt(newspn,invispin(nstate))
    return
    end subroutine opexsigma
 
    subroutine opextau(i,j,nstate,nstatenew) ! tau exchange 
    integer(kind=i4) :: i,j,nstate,nstatenew,ispn,newispn
    ispn=liso(invispin(nstate)) 
    
    !iv=ibits(ispn,i-1,1)
    !jv=ibits(ispn,j-1,1)    

    !iv=shiftr(and(ispn,shiftl(1,i-1)),i-1)
    !jv=shiftr(and(ispn,shiftl(1,j-1)),j-1)
    
    !newispn=ispn+(jv-iv)*2**(i-1)+(iv-jv)*2**(j-1)

    newispn=ispn
    call mvbits(ispn,i-1,1,newispn,j-1)
    call mvbits(ispn,j-1,1,newispn,i-1)
    nstatenew=invcwt(invspin(nstate),lisoi(newispn))
    return
    end subroutine opextau     
    
    subroutine opexsigmatau(i,j,nstate,nstatenew) ! sigma, tau exchange at the same time. 
    integer(kind=i4) :: i,j,nstate,nstatenew,ispn,newispn,spn,newspn
    spn=invspin(nstate) ! spn, 0:15.
    newspn=spn
    call mvbits(spn,i-1,1,newspn,j-1)
    call mvbits(spn,j-1,1,newspn,i-1)     
    ispn=liso(invispin(nstate)) 
    newispn=ispn
    call mvbits(ispn,i-1,1,newispn,j-1)
    call mvbits(ispn,j-1,1,newispn,i-1)   
    nstatenew=invcwt(newspn,lisoi(newispn))
    return
    end subroutine opexsigmatau    
  
    subroutine v6prop1(x,i,j,u,cwtold,cwtnew) ! optimized version 20190703, 30 times faster than old verison.
    integer(kind=i4) :: i,j,nstate,nstatenew,nstatenew1,nstatenew2
    
    real(kind=r8) :: x(3,npart),r(3),rsq(3),dx(3),r3r1
    real(kind=r8) :: u(6) ! v6 so 6, coefficients after conversion. check if real or complex, I think real.
    complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
                       ,fceff,fsex,fiex,fsiex,fteff,fteffiex,f(6) &
                       ,cir2,cir1,c(4),cfteff(4),cfteffiex(4),r2cir1,r3cir2
    integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij  
    
    if (irep.eq.4) then
        cwtnew(:,:)=cwtold(:,:)
        return  
    endif
     
    dx(:)=x(:,i)-x(:,j)
    r(:)=dx(:)/sqrt(dot_product(dx,dx)) !unit vector
    rsq(:)=r(:)**2
    r3r1=r(3)*r(1)
    cir1=ci*r(1)
    cir2=ci*r(2)
    r2cir1=r(2)*cir1
    r3cir2=r(3)*cir2
    cwtnew(:,:)=0 
    do nstate=1,nbasis   
        f(:)=u(:)*cwtold(invspin(nstate),invispin(nstate))
        fceff=f(1)-f(2)-f(4)+f(5)+f(3)-f(6)
        fsex=2*(f(2)-f(5)-f(3)+f(6))
        fiex=2*(f(4)-f(5)+f(6))
        fsiex=4*(f(5)-f(6))
        fteff=3*(f(3)-f(6))
        fteffiex=6*f(6)
        ! fceff,op1
        cwtnew(invspin(nstate),invispin(nstate))=cwtnew(invspin(nstate),invispin(nstate))+fceff
        ! fsex,opexsigma     
        call opexsigma(i,j,nstate,nstatenew1)
        cwtnew(invspin(nstatenew1),invispin(nstatenew1))=cwtnew(invspin(nstatenew1),invispin(nstatenew1))+fsex
        ! fiex,opextau        
        call opextau(i,j,nstate,nstatenew2)  
        cwtnew(invspin(nstatenew2),invispin(nstatenew2))=cwtnew(invspin(nstatenew2),invispin(nstatenew2))+fiex        
        ! fsiex
        call opextau(i,j,nstatenew1,nstatenew)  
        cwtnew(invspin(nstatenew),invispin(nstatenew))=cwtnew(invspin(nstatenew),invispin(nstatenew))+fsiex      
        ! fteff,op3 
        !call op3pure(r,i,j,nstate,fteff,cwtnew) 
        spn=invspin(nstate) ! spn, 0:15.
        ispn=invispin(nstate) 
        signi=2*ibits(spn,i-1,1)-1
        signj=2*ibits(spn,j-1,1)-1    
        newspni=xor(spn,shiftl(1,i-1))
        newspnj=xor(spn,shiftl(1,j-1))
        newspn=xor(newspni,shiftl(1,j-1))
        signij=signi*signj
        !c(1)=rsq(1)-r(2)*(signij*r(2)-(signi+signj)*cir1) 
        c(1)=rsq(1)-rsq(2)*signij+(signi+signj)*r2cir1
        !c(2)=signj*r(3)*(r(1)+signi*cir2)
        c(2)=signj*r3r1+signij*r3cir2
        !c(3)=signi*r(3)*(r(1)+signj*cir2)
        c(3)=signi*r3r1+signij*r3cir2
        c(4)=signij*rsq(3) 
        cfteff(:)=fteff*c(:)
        cfteffiex(:)=fteffiex*c(:)   
        cwtnew(newspn,ispn)=cfteff(1)+cwtnew(newspn,ispn)    
        cwtnew(newspni,ispn)=cfteff(2)+cwtnew(newspni,ispn)
        cwtnew(newspnj,ispn)=cfteff(3)+cwtnew(newspnj,ispn)    
        cwtnew(spn,ispn)=cfteff(4)+cwtnew(spn,ispn)        
        ! fteffiex,opextau,op3
        !call opextau(i,j,nstate,nstatenew)
        !call op3pure(r,i,j,nstatenew2,fteffiex,cwtnew)    
        ispn=invispin(nstatenew2)      
        !spn=invspin(nstatenew2) ! spn, 0:15.
        !signi=2*ibits(spn,i-1,1)-1
        !signj=2*ibits(spn,j-1,1)-1  
        !newspni=xor(spn,shiftl(1,i-1))
        !newspnj=xor(spn,shiftl(1,j-1))
        !newspn=xor(newspni,shiftl(1,j-1))  
        cwtnew(newspn,ispn)=cfteffiex(1)+cwtnew(newspn,ispn)    
        cwtnew(newspni,ispn)=cfteffiex(2)+cwtnew(newspni,ispn)
        cwtnew(newspnj,ispn)=cfteffiex(3)+cwtnew(newspnj,ispn)    
        cwtnew(spn,ispn)=cfteffiex(4)+cwtnew(spn,ispn)         
    enddo 
    return
    end subroutine v6prop1         
   
    subroutine vemprop1(expvemcdt,i,j,cwtold,cwtnew)
    integer(kind=i4) :: i,j,l,it,lspin,ipi,ipj,index
    real(kind=r8) :: expvemcdt  
    complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin)
   
    cwtnew(:,:)=cwtold(:,:)
    if (irep.eq.4) return    
    
    do l=1,nisospin
        lspin=liso(l)       
	    ipi=and(1,shiftr(lspin,i-1))  !(2*(and(1,shiftr(lspin,i-1))))/2
	    ipj=and(1,shiftr(lspin,j-1))    !(2*(and(1,shiftr(lspin,j-1))))/2  
	    if ((ipi*ipj).eq.1) then
            cwtnew(:,l)=cwtold(:,l)*expvemcdt        
	    endif               
    enddo
    return
    end subroutine vemprop1
    
    subroutine vlsprop1(x0,x1,i,j,vls,sign,cwtold,cwtnew) 
    integer(kind=i4) :: i,j,nstate,nstatenew,nstatenew1,nstatenew2,sign
    real(kind=r8) :: x0(3,npart),x1(3,npart),rij(3),dr01i(3),dr01j(3),dr01ij(3)
    real(kind=r8) :: vls 
    complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
                       ,fx,fyi,fz,fold,cls1vls
    integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij  
    
    if (irep.eq.4) then
        cwtnew(:,:)=cwtold(:,:)
        return  
    endif
    
    rij(:)=x0(:,i)-x0(:,j)
    
    dr01i(:)=x0(:,i)-x1(:,i)
    dr01j(:)=x0(:,j)-x1(:,j) 
    dr01ij(:)=dr01i(:)-dr01j(:)
    
    cls1vls=cls1*vls*sign
    fx=(rij(2)*dr01ij(3)-rij(3)*dr01ij(2))*cls1vls
    fyi=-(rij(1)*dr01ij(3)-rij(3)*dr01ij(1))*cls1vls*ci
    fz=(rij(1)*dr01ij(2)-rij(2)*dr01ij(1))*cls1vls
      
    cwtnew(:,:)=0 
    do nstate=1,nbasis 
        spn=invspin(nstate) ! spn, 0:15.
        ispn=invispin(nstate) 
        fold=cwtold(spn,ispn)
       
        signi=2*ibits(spn,i-1,1)-1
        signj=2*ibits(spn,j-1,1)-1 
       
        newspni=xor(spn,shiftl(1,i-1)) ! flip spin i
        newspnj=xor(spn,shiftl(1,j-1)) ! flip spin j
        
        cwtnew(newspni,ispn)=cwtnew(newspni,ispn)+(fx+signi*fyi)*fold 
        cwtnew(newspnj,ispn)=cwtnew(newspnj,ispn)+(fx+signj*fyi)*fold  
        cwtnew(spn,ispn)=cwtnew(spn,ispn)+fz*(signi+signj)*fold                   
    enddo 
    
    return
    end subroutine vlsprop1   
       
    subroutine vlsprop0(spinorbit,sign,x0,x1,cwtold,cwtnew) ! without no LS*t1*t2. old version and works. 2020/5/20
! only LS for now. no LS*t1*t2. Linear terms. If cwtold and cwtl2r, then sign=-1. If cwtr2l, sign=1.
    use cheft
    integer(kind=i4) :: i,j,it,index,indexlag4,sign,nstate
    real(kind=r8) :: x0(3,npart),x1(3,npart) ! the configuration for the system
    real(kind=r8) :: r,c1,c2,c3,c4,dr,rij(3),dr01i(3),dr01j(3),dr01ij(3)
    real(kind=r8) :: vls
    complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
		                ,cwtnew1(0:nspin-1,nisospin),cwt1(0:nspin-1,nisospin)
    complex(kind=r8) :: fx,fyi,fz,fold,cls1vls
    integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij  
    logical :: spinorbit
    if (irep.eq.4) then
	    cwtnew(:,:)=cwtold(:,:) ! because for vmc, dt=0.	
	    return	
    endif
    
    cwtnew(:,:)=cwtold(:,:)
    if (spinorbit) then    
        do i=1,npart-1
	        do j=i+1,npart 		
                rij(:)=x0(:,i)-x0(:,j)		   
		        r=sqrt(dot_product(rij,rij)) !r=sqrt(sum(dx(:)**2))
                if (r.lt.rangev) then
			        dr=scalep*r
			        index=dr
			        index=max(1,min(index,ntab-2)) 
			        dr=dr-index
                    c1=-dr*(dr-1)*(dr-2)/6
                    c2=(dr+1)*(dr-1)*(dr-2)/2
                    c3=-(dr+1)*dr*(dr-2)/2
                    c4=(dr+1)*dr*(dr-1)/6
                    vls=c1*vtabls(1,index-1)+c2*vtabls(1,index)+c3*vtabls(1,index+1)+c4*vtabls(1,index+2)      
                else
                    !vls=0
                    call cheft_pot_ls(lpot,r,vls)
                endif	
                dr01i(:)=x0(:,i)-x1(:,i)
                dr01j(:)=x0(:,j)-x1(:,j) 
                dr01ij(:)=dr01i(:)-dr01j(:)  
                cls1vls=cls1*vls*sign
                fx=(rij(2)*dr01ij(3)-rij(3)*dr01ij(2))*cls1vls
                fyi=-(rij(1)*dr01ij(3)-rij(3)*dr01ij(1))*cls1vls*ci
                fz=(rij(1)*dr01ij(2)-rij(2)*dr01ij(1))*cls1vls    
                do spn=0,nspin-1      
                    signi=2*ibits(spn,i-1,1)-1
                    signj=2*ibits(spn,j-1,1)-1     
                    newspni=xor(spn,shiftl(1,i-1)) ! flip spin i
                    newspnj=xor(spn,shiftl(1,j-1)) ! flip spin j
                    cwtnew(newspni,:)=cwtnew(newspni,:)-(fx+signi*fyi)*cwtold(spn,:)
                    cwtnew(newspnj,:)=cwtnew(newspnj,:)-(fx+signj*fyi)*cwtold(spn,:)
                    cwtnew(spn,:)=cwtnew(spn,:)-fz*(signi+signj)*cwtold(spn,:)           
                enddo                 
                !call vlsprop1(x0,x1,i,j,vls,sign,cwtold,cwtnew1)                   
		        !cwtnew(:,:)=cwtnew(:,:)+cwtnew1(:,:)	
	        enddo
        enddo
    endif
    return
    end subroutine vlsprop0
    
    subroutine vlsprop(spinorbit,sign,x0,x1,cwtold,cwtnew) ! added LS*t1*t2. 2020/5/20
! only LS for now. and LS*t1*t2. Linear terms. If cwtold and cwtl2r, then sign=-1. If cwtr2l, sign=1.
    use cheft
    integer(kind=i4) :: i,j,it,index,indexlag4,sign,nstate
    real(kind=r8) :: x0(3,npart),x1(3,npart) ! the configuration for the system
    real(kind=r8) :: r,c1,c2,c3,c4,dr,rij(3),dr01i(3),dr01j(3),dr01ij(3),rdx,rdy,rdz
    real(kind=r8) :: vls(2),vv(18)
    complex(kind=r8) :: cwtnew(0:nspin-1,nisospin),cwtold(0:nspin-1,nisospin) &
		                ,cwtnew1(0:nspin-1,nisospin),cwt1(0:nspin-1,nisospin)
    complex(kind=r8) :: fx,fyi,fz,f2x,f2yi,f2z,fold,cls1vls,cls1vls2
    integer(kind=i4) :: spn,ispn,newspn,newispn,newspni,newspnj,signi,signj,signij,nsnew
    logical :: spinorbit
    if (irep.eq.4) then
	    cwtnew(:,:)=cwtold(:,:) ! because for vmc, dt=0.	
	    return	
    endif
       
    if (spinorbit) then
        if (lpot>2) then
            call vlsprop0(spinorbit,sign,x0,x1,cwtold,cwtnew)
            return
        endif     
        ! AV8' added LS*t1*t2
        cwtnew(:,:)=cwtold(:,:)
        do i=1,npart-1 ! all the rij thing for a bead can be optimized, no need to repeatedly calculate them.
	        do j=i+1,npart 		
                rij(:)=x0(:,i)-x0(:,j)		   
		        r=sqrt(dot_product(rij,rij)) !r=sqrt(sum(dx(:)**2))
                if (r.lt.rangev) then
			        dr=scalep*r
			        index=dr
			        index=max(1,min(index,ntab-2)) 
			        dr=dr-index
                    c1=-dr*(dr-1)*(dr-2)/6
                    c2=(dr+1)*(dr-1)*(dr-2)/2
                    c3=-(dr+1)*dr*(dr-2)/2
                    c4=(dr+1)*dr*(dr-1)/6
                    vls(:)=c1*vtabls(:,index-1)+c2*vtabls(:,index)+c3*vtabls(:,index+1)+c4*vtabls(:,index+2)       
                else
                    !vls(:)=0              
                    call av18op(lpot,r,vv)
                    vls(1:2)=vv(7:8)
                endif	
                         
                !write(6,*) 'vls in prop=',r, vls(1), vls(2)
                
                dr01i(:)=x0(:,i)-x1(:,i)
                dr01j(:)=x0(:,j)-x1(:,j) 
                dr01ij(:)=dr01i(:)-dr01j(:)  

                rdx=rij(2)*dr01ij(3)-rij(3)*dr01ij(2)
                rdy=-(rij(1)*dr01ij(3)-rij(3)*dr01ij(1))
                rdz=rij(1)*dr01ij(2)-rij(2)*dr01ij(1)                
                
                cls1vls=cls1*(vls(1)-vls(2))*sign
                cls1vls2=cls1*vls(2)*2*sign
                         
                fx=rdx*cls1vls
                fyi=rdy*cls1vls*ci
                fz=rdz*cls1vls 
                
                f2x=rdx*cls1vls2
                f2yi=rdy*cls1vls2*ci
                f2z=rdz*cls1vls2
                             
                !!!!! check
                !write (6,'( ''rdx, rdy, rdz = '', 3(g15.7,1x)  )')rdx,rdy,rdz
                !write (6,'( ''vls1,2 = '', 3(g15.7,1x)  )') vls(1:2),vls(1)-vls(2)  
                !write (6,'( ''cls1vls = '', 2g15.7  )') cls1vls
                !write (6,'( ''cls1vls2 = '', 2g15.7  )') cls1vls2 
                !!!!!
                          
                do ispn=1,nisospin
                    call opextau(i,j,invcwt(0,ispn),nsnew)        ! take spn as 0 here bc we only care about ispin exchange.             
                    newispn=invispin(nsnew)                      
                    do spn=0,nspin-1                      
                        fold=cwtold(spn,ispn)
                        signi=2*ibits(spn,i-1,1)-1
                        signj=2*ibits(spn,j-1,1)-1     
                        newspni=xor(spn,shiftl(1,i-1)) ! flip spin i
                        newspnj=xor(spn,shiftl(1,j-1)) ! flip spin j
                        cwtnew(newspni,ispn)=cwtnew(newspni,ispn)-(fx+signi*fyi)*fold
                        cwtnew(newspnj,ispn)=cwtnew(newspnj,ispn)-(fx+signj*fyi)*fold
                        cwtnew(spn,ispn)=cwtnew(spn,ispn)-fz*(signi+signj)*fold
                                     
                    !call opextau(i,j,invcwt(spn,ispn),nsnew)                    
                    !newispn=invispin(nsnew) 
                        
                        !write(6,*) 'isospin in prop:',i,j,ispn,newispn,nisospin
                        
                        cwtnew(newspni,newispn)=cwtnew(newspni,newispn)-(f2x+signi*f2yi)*fold
                        cwtnew(newspnj,newispn)=cwtnew(newspnj,newispn)-(f2x+signj*f2yi)*fold
                        cwtnew(spn,newispn)=cwtnew(spn,newispn)-f2z*(signi+signj)*fold                                
                    enddo   
                enddo                 
                !call vlsprop1(x0,x1,i,j,vls,sign,cwtold,cwtnew1)                   
		        !cwtnew(:,:)=cwtnew(:,:)+cwtnew1(:,:)	
	        enddo
        enddo
    else
        cwtnew(:,:)=cwtold(:,:)
    endif
    return
    end subroutine vlsprop


end module v6stepcalc

But I agree there can be more smart ways to do that, and using global variables make the code a little sloppy. But overall, I feel that modern Fortran is OK, and its grammar makes it harder to write a slow code.
Thank you all very much!

So, Iā€™ve tried to translate the Fortran code in Julia. This is what I got:

using LinearAlgebra

struct MeanCovar{T<:AbstractFloat}
	Ī¼::Matrix{T}
	Ļƒ::Matrix{T}
	w::T
end

function MeanCovar(Ī¼::AbstractVector{Tv}, Ļƒ::AbstractMatrix{Tm}, w::Tw) where {Tv, Tm, Tw}
    T = float(promote_type(Tv,Tm,Tw))
    dim_p = LinearAlgebra.checksquare(Ļƒ)
    length(Ī¼) == dim_p || error("Incompatible sizes")
    newĪ¼ = copyto!(Matrix{T}(undef, (dim_p, 1)), Ī¼)
    newĻƒ = Matrix{T}(Ļƒ)
    return MeanCovar{T}(newĪ¼, newĻƒ, w)
end

function MeanCovar{T}(dim_p::Integer) where {T}
	return MeanCovar{T}(zeros(T, (dim_p, 1)), Matrix{T}(I(dim_p)), zero(T))
end

MeanCovar(dim_p) = MeanCovar{Float64}(dim_p)

function init_mean_covar(
	kmix::Integer,
	dim_p::Integer,
	weight::AbstractVector{T},
	Ļƒ::AbstractArray{T,3},
	Ī¼::AbstractMatrix{T}
) where {T}
	Ī¼Ļƒ = [MeanCovar(@view(Ī¼[k,:]), @view(Ļƒ[k,1:dim_p,1:dim_p]), weight[k]) for k in 1:kmix]
	return Ī¼Ļƒ
end

What I donā€™t understand is why mu is a matrix with one column and not a vector but I kept that.

4 Likes

From what I see, mean_covar_init cannot be called multiple times in one program because thatā€™ll trigger an error in allocate. You may be a person who implicitly knows about that, otherwise it is not documented, and other may have troubles. The reason globals are frowned upon is primarily not the performance but that they introduce implicit rules of using everything else in the module.

3 Likes

Thank you very much grandmaster!

  1. You are absoutely right, mu should be a vector. I make it is n*1 matrix simply because I am sloppy.
  2. You are absoutely right, mean_covar_init can only be called once, and there is no need to call it twice. Because it is just doing initialization at the beginning.
    It is just like I read from a file about all the dimensions of the arrays i need to use in the program. Then at the beginning of the code I initialize/allocate the arrays just once. Then I just use the arrays, no need to reallocate them.

In Fortran, in the init function, I can add things like

if ( .not. allocated(a) ) allocate( a(100) )

so that even if I repeatedly call init function, it will not trigger error because a is already allocated as 100 element array at the beginning.

These init function only needs to be called once in the program.

Below is my module

module Mixture

export musigma, mean_covar_init

mutable struct Mean_covar

    mu::Array{Float64,2}

    sigma::Array{Float64,2}

    w::Float64

end

const musigma = Array{Mean_covar,1}() # const is global

function mean_covar_init(kmix::Int64,dim_p::Int64

                        ,weight::Array{Float64,1}

                        ,sigma::Array{Float64,3}

                        ,mu::Array{Float64,2})

    @assert length(weight) == kmix

    @assert size(sigma) == (kmix,dim_p,dim_p)

    @assert size(mu) == (kmix,dim_p)

    resize!(musigma, kmix) 

    for k in 1:kmix

        musigma[k] = Mean_covar(zeros(dim_p,1),zeros(dim_p,dim_p),0.0)

        musigma[k].mu[1,1] = mu[k,1]

        musigma[k].mu[2,1] = mu[k,2]

        musigma[k].sigma[1,1] = sigma[k,1,1]

        musigma[k].sigma[2,2] = sigma[k,2,2]

        musigma[k].w = weight[k]

    end

    return nothing

end

end

No, it has an undocumented feature that it can only be called once in the program without triggering an error.
It may be unimportant for your workflow here and now, but another person (including yourself in the future) might have a use case where that feature is an unpleasant surprise.
But imagine someone builds your module as a library, makes a Python binding and tries to process multiple files from Python.

Now, for Julia module, Iā€™m genuinely puzzled by your choice to keep the design with a global variable. Changing

resize!(musigma, kmix)

to

musigma = Vector{MeanCovar}(undef, kmix)

and getting rid of the global variable definition has a win of not using globals and has less LOC.
Here, you wonā€™t have memory management errors, but instead you limit the module to being only able of loading one dataset at any moment.

6 Likes

Thank you very much!
Uhm, may I ask, is there advantage using

musigma = Vector{MeanCovar}(undef, kmix)

over

resize!(musigma, kmix)

When initializing an array of non-isbits structs thatā€™s immediately overwritten with other structs, I would suggest neither - just use an array comprehension as suggested here:

In this line,

resize!(musigma, kmix) 

youā€™re creating a bunch of references to uninitialized arrays that are immediately overwritten here:

musigma[k] = Mean_covar(zeros(dim_p,1),zeros(dim_p,dim_p),0.0)

Itā€™s not a performance-killer, but it creates pointless repetition of work. With the array comprehension, the array bindings are created where and when theyā€™re needed.

Separately, you may want to investigate StaticArrays.jl for your mu and sigma arrays (provided dim_p is less than 10 or so), which will allow a more-compact isbits memory representation of Mean_covar and generation of more-efficient code thanks to the fixed array dimensions.

2 Likes

The first one creates a new array (which has then to be returned, of course), the second mutates a global variable which is not an explicit argument of the function.
Again, why do you keep using the global variable at all? Iā€™d assume that your client code uses that array without explicitly passing it as argument as well. While that isnā€™t great either, you can easily keep that behavior in client code by declaring

musigma = mean_covar_init(something) 

in the client code toplevel instead of library toplevel.
Further, the global array antipattern keeps you from using generic types. In my example, as you can see, mu and sigma within the struct can be arrays of single or half precision floats if you need that for performance on GPUs, or have extended precision if needed. Having a pre-defined array for each possible type parameter is just infeasible. An example of such difficulty even with immutable scalar globals is the definition of pi in the Julia standard library. It has to have a special type to be used efficiently in a generically-written code.

2 Likes

In that case, it is fine to use global caches since they are not exported, not used implicitly as return values etc. But again, it may be better to use the approach employed e.g. in the form of ā€œplansā€ in FFTW: create an object containing the pre-compiled cache for a specific problem and use it explicitly. In FFTW.jl, plans can be conveniently applied via the overloaded multiplication. Functions using caches are naturally implemented via structs with overloaded call operator. So, thereā€™s some design space to explore even for such tasks.