Accessing values from XML files using XML.jl

Hello, I’m struggling to understand how to use XML.jl to read a file and extract data from it using the tags at specific depths.
Here is an abbreviated version of the XML file I am currently trying to extract data from; it shows the file's basic structure.

The basic goal here is to get the values of <ElementName> and <MZ> from each IcpmsElement.
From there I should be able to understand how it works.

<?xml version="1.0" encoding="utf-8"?>
<AcquisitionDataSet SchemaVersion="65555" DataVersion="55" SIVersion="D.1.1.653.14" AuditTrail="false" UserEdited="true" EstimatedBatchSize="608249778" BatchDataPath="C:\ExampleDirectory\ExampeBatch .b" PresetMethodType="" InstrumentType="ICPQQQ" ModuleID="4" HashCode="12C1869425524D04E71E272EA6C48A155AFF2BFC332A83710CA801C4DE8E4D6A2E94F5F1FD1C3550E5725435BD3162B030D304D8D23918E790249F2F24EF4537" xmlns="Acquisition">
  <AcquisitionMethod>
    <AcqID>-1</AcqID>
    <Replicate>3</Replicate>
    <AcqMode>TRA</AcqMode>
    <IsotopeAnalysis>false</IsotopeAnalysis>
    <PeakPattern>3</PeakPattern>
    <CreatedFromPresetMethod>false</CreatedFromPresetMethod>
    <AutoOptimizationSelectable>true</AutoOptimizationSelectable>
  </AcquisitionMethod>
  <IcpmsElement>
    <AcqID>-1</AcqID>
    <TuneID>1</TuneID>
    <ElementID>0</ElementID>
    <MZ>39</MZ>
    <SelectedMZ>39</SelectedMZ>
    <DetectorMode>Auto</DetectorMode>
    <IntegrationTime>0.002</IntegrationTime>
    <ElementName>K</ElementName>
    <AcqMonitor>true</AcqMonitor>
  </IcpmsElement>
  <IcpmsElement>
    <AcqID>-1</AcqID>
    <TuneID>1</TuneID>
    <ElementID>1</ElementID>
    <MZ>24</MZ>
    <SelectedMZ>24</SelectedMZ>
    <DetectorMode>Auto</DetectorMode>
    <IntegrationTime>0.002</IntegrationTime>
    <ElementName>Mg</ElementName>
    <AcqMonitor>false</AcqMonitor>
  </IcpmsElement>
  <PAFDetectorElement>
    <AcqID>-1</AcqID>
    <TuneID>0</TuneID>
    <PAFDetElementID>77</PAFDetElementID>
    <MZ>39</MZ>
    <PAReference>0.1227</PAReference>
  </PAFDetectorElement>
</AcquisitionDataSet>

So the very basic layout of the XML is:

<?xml version="1.0" encoding="utf-8"?>
<AcquisitionDataSet key=val>
  <child_element_d2>
      <child_element_d3>value</child_element_d3>
  </child_element_d2>
</AcquisitionDataSet>
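To see how the nesting depths map onto nodes, here is a minimal sketch that parses the skeleton above (with the closing tag matched and the attribute value quoted) using XML.jl and lists every element together with its depth. It assumes XML.jl's read(path, Node), XML.children, XML.nodetype and XML.tag; element_depths is a hypothetical helper, and it counts the root element as depth 1.

```julia
using XML

# The skeleton layout, made well-formed: matched closing tag, quoted attribute value.
skeleton = """
<?xml version="1.0" encoding="utf-8"?>
<AcquisitionDataSet key="val">
  <child_element_d2>
      <child_element_d3>value</child_element_d3>
  </child_element_d2>
</AcquisitionDataSet>
"""

# Write to a temporary file so the example only relies on read(path, Node).
path = tempname()
write(path, skeleton)
doc = read(path, XML.Node)

# Hypothetical helper: recursively collect (depth, tag) for every element node.
# Text nodes, the declaration, etc. are skipped; the root element gets depth 1.
function element_depths(node, depth = 0, out = Tuple{Int,String}[])
    for child in XML.children(node)
        if XML.nodetype(child) == XML.Element
            push!(out, (depth + 1, XML.tag(child)))
            element_depths(child, depth + 1, out)
        end
    end
    return out
end

# Print each element indented by its depth.
for (d, t) in element_depths(doc)
    println("  "^(d - 1), d, ": ", t)
end
```

Seen this way, IcpmsElement sits at depth 2 and MZ or ElementName at depth 3, which is what the positional indices in f[3][2][4][1] walk through.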

Given that there will be multiple <IcpmsElement> children of <AcquisitionDataSet>, I realise I will need to loop through them (I need all of them anyway), but I only want the values of specific children of <IcpmsElement>.

I can access the MZ value of the first <IcpmsElement> with:

using XML

f = read("ExampleFile.xml", LazyNode)

f[3][2][4][1].value
"39"

and write a simple loop that accesses all the values I want:

for i in findall(==("IcpmsElement"), tag.(f[3][:]))
    println("$(f[3][i][8][1].value): $(f[3][i][4][1].value)")
end
K: 39
Mg: 24
Y: 105

But is it possible to access them using the tags instead? E.g.:

for i in findall(==("IcpmsElement"), tag.(f["AcquisitionDataSet"][:]))
    println("$(f["AcquisitionDataSet"][i]["ElementName"][1].value): $(f["AcquisitionDataSet"][i]["MZ"][1].value)")
end

Or am I going about this the completely wrong way?
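As a sketch of tag-based access in plain XML.jl, the helpers below select children by comparing XML.tag instead of using positions. children_by_tag, child_by_tag and text_of are hypothetical names, not XML.jl functions; the sketch assumes XML.jl's read(path, Node), XML.children, XML.tag and XML.value, and that XML.jl keeps xmlns as an ordinary attribute without resolving namespaces. It uses a trimmed copy of the sample above written to a temporary file.

```julia
using XML

# Trimmed copy of the sample above (assumption: XML.jl treats xmlns as a plain attribute).
sample = """
<?xml version="1.0" encoding="utf-8"?>
<AcquisitionDataSet xmlns="Acquisition">
  <AcquisitionMethod>
    <AcqID>-1</AcqID>
  </AcquisitionMethod>
  <IcpmsElement>
    <MZ>39</MZ>
    <ElementName>K</ElementName>
  </IcpmsElement>
  <IcpmsElement>
    <MZ>24</MZ>
    <ElementName>Mg</ElementName>
  </IcpmsElement>
</AcquisitionDataSet>
"""
path = tempname()
write(path, sample)
f = read(path, XML.Node)

# Hypothetical helpers (not part of XML.jl): select child nodes by tag name.
# Text and other non-element nodes don't carry these tag names, so they never match.
children_by_tag(node, name) = filter(c -> XML.tag(c) == name, XML.children(node))
child_by_tag(node, name) = only(children_by_tag(node, name))  # throws unless exactly one match

# Text content of a simple element such as <MZ>39</MZ>.
text_of(el) = XML.value(only(XML.children(el)))

root = child_by_tag(f, "AcquisitionDataSet")
for el in children_by_tag(root, "IcpmsElement")
    println(text_of(child_by_tag(el, "ElementName")), ": ", text_of(child_by_tag(el, "MZ")))
end
```

Because the lookup goes by tag name, it keeps working if the instrument software adds or reorders child elements, unlike positional indices such as f[3][i][4][1].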

I’ve been using EzXML.jl instead and it has a nice XPath feature, which simplifies getting these nodes somewhat:

julia> using EzXML

julia> xml = readxml("test.xml")
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000386ae31b0>))

julia> findall("//AcquisitionDataSet/IcpmsElement", xml)
2-element Vector{EzXML.Node}:
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386ae55a0>)
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386af8940>)

julia> findall("//IcpmsElement", xml)
2-element Vector{EzXML.Node}:
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386ae55a0>)
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386af8940>)

Note that this only worked once I removed the xmlns attribute from the root node. I don’t really understand what it’s for, but it results in a warning at parse time and the queries then return nothing.

Ah, it seems that if the namespace is present, you have to bind a prefix to it and use that prefix in the query, like:

julia> xml = readxml("test.xml")
┌ Warning: XMLError: xmlns: URI Acquisition is not absolute from XML Namespace module (code: 100, line: 2)
└ @ EzXML ~/.julia/packages/EzXML/qbIRq/src/error.jl:97
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000386aa3430>))

julia> findall("//x:AcquisitionDataSet/x:IcpmsElement", xml.root, ["x" => "Acquisition"])
2-element Vector{EzXML.Node}:
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386ae5620>)
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386af64d0>)

julia> findall("//x:IcpmsElement", xml.root, ["x" => "Acquisition"])
2-element Vector{EzXML.Node}:
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386ae5620>)
 EzXML.Node(<ELEMENT_NODE[IcpmsElement]@0x0000000386af64d0>)
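Building on the namespaced query, here is a sketch that pulls ElementName and MZ out of each IcpmsElement with EzXML. It assumes EzXML's parsexml, findall and findfirst with a namespace vector, and nodecontent; the prefix x is arbitrary and only has to map to the URI Acquisition. The relative paths x:ElementName and x:MZ are evaluated with each IcpmsElement as the context node.

```julia
using EzXML

# Trimmed copy of the sample, keeping the default namespace xmlns="Acquisition".
# Parsing it may still print the "URI Acquisition is not absolute" warning.
sample = """
<?xml version="1.0" encoding="utf-8"?>
<AcquisitionDataSet xmlns="Acquisition">
  <IcpmsElement>
    <MZ>39</MZ>
    <ElementName>K</ElementName>
  </IcpmsElement>
  <IcpmsElement>
    <MZ>24</MZ>
    <ElementName>Mg</ElementName>
  </IcpmsElement>
</AcquisitionDataSet>
"""
doc = parsexml(sample)

# Every element name in the XPath needs the prefix bound to the namespace URI.
ns = ["x" => "Acquisition"]

results = Tuple{String,Int}[]
for el in findall("//x:IcpmsElement", doc.root, ns)
    # Relative paths are evaluated with the current IcpmsElement as context node.
    name = nodecontent(findfirst("x:ElementName", el, ns))
    mz = parse(Int, nodecontent(findfirst("x:MZ", el, ns)))
    push!(results, (name, mz))
    println(name, ": ", mz)
end
```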

@jules thanks for that reply. I’m trying to steer away from libraries with external dependencies (granted, EzXML bundles libxml2 via XML2_jll.jl) to keep as much of the codebase as possible in the Julia ecosystem, for performance and debugging purposes.

EzXML isn’t off the table though if needed.


I’m away at the moment so can’t currently access Julia. These answers are from memory…

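c = XML.children(root) will return a vector of all the child nodes of root, where root is the <AcquisitionDataSet> element (found by its tag rather than by position). You can then iterate through the elements of c and compare tags:

root = only(filter(n -> XML.tag(n) == "AcquisitionDataSet", XML.children(f)))
c = XML.children(root)

for el in c
    if XML.tag(el) == "IcpmsElement"
        # match the child elements of each IcpmsElement by tag name
        for child in XML.children(el)
            if XML.tag(child) == "MZ"
                myMz = XML.simple_value(child)
            end
        end
    end
end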

Of course, you’ll have multiple IcpmsElement nodes and so multiple sets of parameters to keep track of, too.
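One way to keep track of those parameter sets is to turn each IcpmsElement into a NamedTuple with typed fields. This is a sketch that assumes XML.jl's read(path, Node), XML.children, XML.nodetype, XML.tag and XML.value; the helper fields and the field names name, mz and integration_time are hypothetical choices, and every child of IcpmsElement is assumed to be a simple element holding one text node.

```julia
using XML

sample = """
<?xml version="1.0" encoding="utf-8"?>
<AcquisitionDataSet xmlns="Acquisition">
  <IcpmsElement>
    <MZ>39</MZ>
    <IntegrationTime>0.002</IntegrationTime>
    <ElementName>K</ElementName>
  </IcpmsElement>
  <IcpmsElement>
    <MZ>24</MZ>
    <IntegrationTime>0.002</IntegrationTime>
    <ElementName>Mg</ElementName>
  </IcpmsElement>
</AcquisitionDataSet>
"""
path = tempname()
write(path, sample)
doc = read(path, XML.Node)

# Hypothetical helper: map child tag => text for a flat element like <IcpmsElement>.
# Assumes each child element holds exactly one text node (only() throws otherwise).
function fields(el)
    d = Dict{String,String}()
    for c in XML.children(el)
        if XML.nodetype(c) == XML.Element
            d[XML.tag(c)] = XML.value(only(XML.children(c)))
        end
    end
    return d
end

root = only(filter(n -> XML.tag(n) == "AcquisitionDataSet", XML.children(doc)))
icpms = filter(n -> XML.tag(n) == "IcpmsElement", XML.children(root))

# One NamedTuple per IcpmsElement, with the text converted to numbers.
elements = map(icpms) do el
    d = fields(el)
    (name = d["ElementName"], mz = parse(Int, d["MZ"]),
     integration_time = parse(Float64, d["IntegrationTime"]))
end

for e in elements
    println(e.name, ": ", e.mz)
end
```

A Vector of NamedTuples like this can then be filtered or sorted directly, e.g. sort(elements; by = e -> e.mz).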

Hope this makes sense and isn’t too far from being right!
Edit: the editor on my phone inserts spaces where I don’t want them.