Issue with atom selection in mddf calculation (ComplexMixtures.jl from Python)

Hi @lmiq

I’m encountering an issue when calculating the center of mass using cm.select(). Specifically, it seems that the selection is not correctly including all the specified residues.

For example, whether I use:
protein = cm.select(atoms, "protein and resnum 7")

or
protein = cm.select(atoms, "protein and resnum 7 8 9 391 392 393 72")

I observe that a similar number of solute atoms are being selected. Upon closer inspection, it looks like only the first residue ID is being considered during the calculation, even when multiple IDs are specified.

I’ve tried using a comma-separated list and other formats, but nothing seems to change the behavior. The script below runs without errors, but the selection appears incorrect:
atoms = cm.readPDB("*.pdb")
protein = cm.select(atoms, "protein and resnum 7 8 9 391 392 393 72")
bgc = cm.select(atoms, "resname bgc")

solute = cm.AtomSelection(protein, nmols=1)
solvent = cm.AtomSelection(bgc, natomspermol=24)  # 24 atoms per BGC molecule

trajectory = cm.Trajectory("../*.xtc", solute, solvent)
options4 = cm.Options(bulk_range=(8.0, 12.0))
results = cm.mddf(trajectory, options4)

cm.save(results, "./save.json")
print("Results saved to save.json")

Could you please advise on the correct syntax for selecting multiple residues, or let me know if this is a known issue?

Best regards,
Sneha Sahu

Thanks for the feedback!

Actually that is a limitation of the selection syntax we implemented so far in the PDBTools.jl package, which is a dependency of ComplexMixtures.jl.

The best alternative is to provide, to the select function, an anonymous Julia function, as:

using PDBTools
protein = select(atoms, at -> isprotein(at) && resnum(at) in (7, 8, 9, 391, 392, 393, 72))

The at -> ... should be read as "given the atom at, return true if at satisfies the conditions provided", and && means "and".

You can also define your own functions to select interesting things, and then use them as arguments to the select function, such as:

using PDBTools
select_my_residues(at) = isprotein(at) && resnum(at) in (7, 8, 9, 391, 392, 393, 72)
protein = select(atoms, select_my_residues)

FWIW, the issue is that, with the string selection, you would need to write "resnum 7 or resnum 8 or resnum 9 ... etc", which is of course annoying in this case.

Let me know if this solves your issue.

P.S. The select and read_pdb functions are from the PDBTools package, and I'm sort of surprised that read_pdb("*.pdb") is working for you; I would expect an explicit file name there.

The use of Julia functions to define selections is documented here: Selections · PDBTools.jl

Dear Leandro

Thanks a lot for your quick response. I will try that first thing in the morning.

Best regards
Sneha

Dear Leandro,

I'm explicitly giving the PDB file name in read_pdb("*.pdb"), and I'm using Python for the job. Could you please suggest how to achieve multiple-residue selection in Python?

Thank you!
Best regards
Sneha

One alternative is to use a Python function to create the required string. For example, here it could be:

import ComplexMixtures as cm
atoms = cm.readPDB("system.pdb")
protein = cm.select(atoms, "protein")

def select_my_residues(protein) :
    s = "resnum 7"
    for i in (8, 9, 391, 392, 393, 72) :
        s += " or resnum " + str(i)
    return cm.select(protein, s)

my_residues = select_my_residues(protein)

The select_my_residues function just creates the string "resnum 7 or resnum 8 or resnum 9 ... etc" and uses it to select the corresponding atoms from the protein object, which contains the protein atoms.
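
For what it's worth, the string-building part can be separated from the selection itself and written with str.join. This is only a sketch: resnum_selection is a hypothetical helper name (not part of ComplexMixtures or PDBTools), and it only produces the "resnum ... or resnum ..." string described above, without calling the package.

```python
def resnum_selection(residues):
    """Return a string like 'resnum 7 or resnum 8' for the given residue numbers.

    Hypothetical helper: it only builds the selection string and does not
    depend on ComplexMixtures being installed.
    """
    residues = list(residues)
    if not residues:
        raise ValueError("at least one residue number is required")
    # int(r) rejects values that are not valid residue numbers
    return " or ".join(f"resnum {int(r)}" for r in residues)


selection = resnum_selection([7, 8, 9, 391, 392, 393, 72])
print(selection)
```

The resulting string would then be passed to the selection call in the same way as in the function above, i.e. cm.select(protein, selection).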

I’ll see if we can provide better alternatives.

Hi again. I’ve made some updates on the packages that might be helpful.

First, update the packages with (in python):

import juliacall as jl
jl.Pkg.update()

You should get ComplexMixtures.jl version 2.13.0 and PDBTools.jl version 3.0.0 among the updates.

Then, update the ComplexMixtures.py Python module by copying the new version (updated in the manual here).

The script is mostly identical. But the news is that, if you have VMD installed, you can use all the features of the VMD selection syntax with the cm.select_with_vmd function. For example:

>>> import ComplexMixtures as cm
>>> atoms = cm.read_pdb("system.pdb")
>>> my_residues = cm.select_with_vmd(atoms, "resid 7 8 9 391 392 393 72")
>>> my_residues
Julia: [ Atom(95N-GLY7P), Atom(96HN-GLY7P), Atom(97CA-GLY7P), Atom(98HA1-GLY7P)…

The selection string resid ... is just the exact same string that would be used within VMD to select the desired subgroup.
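
Since this route depends on VMD being installed, it may be worth checking for it before running a long script. The sketch below uses only the Python standard library and assumes the executable is called vmd and is on the PATH, which may not hold for every installation; find_vmd is a hypothetical helper name.

```python
import shutil


def find_vmd(executable="vmd"):
    """Return the full path of the VMD executable, or None if it is not on PATH.

    Assumption: the executable is named 'vmd'; adjust the name if your
    installation uses a different one.
    """
    return shutil.which(executable)


vmd_path = find_vmd()
if vmd_path is None:
    print("VMD not found on PATH; fall back to cm.select with the plain selection syntax.")
else:
    print(f"VMD found at {vmd_path}")
```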

I hope this helps. The scripts in the manual were updated to provide this information.

Hi Leandro

Thank you so much for fixing the errors and updating the app. I really appreciate your support!

Best,
Sneha

FWIW, the PDBTools.jl package has now been updated (to version 3.1.0) and the selection syntax is more powerful: it accepts the original code you posted here, and also parentheses, etc. You can now do:

cm.select(atoms, "(residue 1 3 7 5) and backbone")
# or
cm.select(atoms, "protein and resnum 2 7 39 81")

for example.

Just update the packages using

import juliacall as jl
jl.Pkg.update()