Solvent Accessible Surface Area

For anyone that might be interested, we implemented a molecular Solvent Accessible Surface Area calculator in PDBTools.jl:

The interface is useful (for us) for allowing extracting contributions to the SASA very easily:

julia> using PDBTools

julia> pdb = wget("1BSX", "chain A");

julia> sasa_per_atom = atomic_sasa(pdb) # array of atoms containing SASA per atom 
   Vector{Atom{PDBTools.SASA}} with 1905 atoms with fields:
   index name resname chain   resnum  residue        x        y        z occup  beta model segname index_pdb
       1    N     LYS     A      211        1   52.884   24.022   35.587  1.00 53.10     1                 1
       2   CA     LYS     A      211        1   52.916   24.598   36.993  1.00 53.10     1                 2
       3    C     LYS     A      211        1   52.017   23.754   37.944  1.00 53.10     1                 3
       4    O     LYS     A      211        1   51.522   22.675   37.539  1.00 53.10     1                 4
â‹®
    1902   CG     ASP     A      461      243   16.862   50.003   45.475  1.00 97.43     1              1902
    1903  OD1     ASP     A      461      243   16.488   49.747   44.304  1.00 97.43     1              1903
    1904  OD2     ASP     A      461      243   17.538   51.009   45.748  1.00 97.43     1              1904
    1905  OXT     ASP     A      461      243   14.506   47.082   47.528  1.00 97.43     1              1905

julia> sasa(sasa_per_atom) # all atoms (Angs^2)
11900.575f0

julia> sasa(sasa_per_atom, "backbone") 
2391.8325f0

julia> sasa(sasa_per_atom, "sidechain and resname GLU") 
1632.0673f0

This can also be useful to run calculations on trajectories within Julia, like in the following example.

SASA on trajectory
using MolSimToolkit # uses Chemfiles.jl under the hood
using PDBTools
using ProgressMeter
using DelimitedFiles

function sasa_traj(sim)
    prot = select(atoms(sim), "protein")
    index_prot = index.(prot)
    sasa_total = Float32[]
    @showprogress for frame in sim
        p = @view(positions(frame)[index_prot])
        for i in eachindex(prot, p)
            prot[i].x = p[i].x
            prot[i].y = p[i].y
            prot[i].z = p[i].z
        end
        sasa_atoms = atomic_sasa(prot; n_dots=100)
        push!(sasa_total, sasa(sasa_atoms))
    end
    return sasa_total
end

# Rodar com julia -t auto --project sasa.jl
function (@main)(ARGS)
    sim = Simulation(
        "/home/leandro/Documents/vinicius/EMIMClBF4/3.00/08/system.pdb",
        "/home/leandro/Documents/vinicius/EMIMClBF4/3.00/08/processed.xtc"
    )
    sasa_total = sasa_traj(sim)
    writedlm("sasa_julia.txt", sasa_total)
end

Performance is “ok” (something like 2x slower than what Gromacs or VMD tools provide, considering the same “precision” - number of dots - used).

One thing is that moving the calculations out of PDBTools.jl into an independent package is very easy, and probably something I will do in the future, such that other packages, like BioStructures.jl, can just plug it in with their own atom data structures.

4 Likes

Update:

After the help from you, now we are in a different ballpark here.

For small structures the SASA here implemented takes similar time than Gromacs and VMD.

For large structures, it is much faster than Gromacs, and similar to VMD without the julia startup time (6co8.pdb is a ~700k virus capsid):

Gromacs:

$ time gmx sasa -s ./6co8.pdb -o sasa_output.xvg -ndots 100
real    1m15,536s
user    1m15,412s
sys     0m0,123s

VMD:

$ time vmd -dispdev text -e sasa_big.vmd
real    0m3,941s
user    0m3,716s
sys     0m0,543s

SASA from PDBTools.jl:

$ time julia --project -e "using PDBTools; sasa(atomic_sasa(read_pdb(\"6co8.pdb\")))"
real    0m7,134s
user    0m8,000s
sys     0m0,400s

#or
julia> @b sasa(atomic_sasa(pdb; parallel=false))
3.635 s (40045204 allocs: 1.186 GiB, 17.71% gc time, without a warmup)

To be released soon in PDBTools.jl 3.5.2

(all these are single-threaded, I’m not sure if VMD is using the GPU)

2 Likes

Looks like a useful tool!

1 Like