Many instruments write data in XML format, and I often find that I spend way too much time writing parsers to convert it to a DataFrame.
Do we have a package that can take a file like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ExperimentResults>
  <ExperimentInfo Name="Assay001">
    <ExpDescription/>
    <RunID>81EC636D</RunID>
  </ExperimentInfo>
  <AssayParamsInfo/>
  <KineticsData>
    <Step>
      <CommonData>
        <SampleLocation>1</SampleLocation>
        <Temperature>20.6</Temperature>
      </CommonData>
      <CycleTime>0.2</CycleTime>
    </Step>
    <Step>
      <CommonData>
        <SampleLocation>2</SampleLocation>
        <Temperature>20</Temperature>
      </CommonData>
      <CycleTime>0.2</CycleTime>
    </Step>
    <Step>
      <CommonData>
        <SampleLocation>9</SampleLocation>
        <Temperature>20</Temperature>
      </CommonData>
      <CycleTime>0.2</CycleTime>
    </Step>
  </KineticsData>
</ExperimentResults>
and give a result like this:
3×5 DataFrame
 Row │ ExperimentInfo_Name  RunID     SampleLocation  Temperature  CycleTime
     │ String               String    Int64           Float64      Float64
─────┼───────────────────────────────────────────────────────────────────────
   1 │ Assay001             81EC636D               1         20.6        0.2
   2 │ Assay001             81EC636D               2         20.0        0.2
   3 │ Assay001             81EC636D               9         20.0        0.2
without having to explicitly point to the individual elements with XPath like this:
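using EzXML, DataFrames
doc = EzXML.readxml("ex1.xml")
# Scalar values are repeated on every row; per-Step values are parsed to get the column types shown above.
DataFrame(
    ExperimentInfo_Name = findfirst("//ExperimentInfo", doc)["Name"],
    RunID = findfirst("//RunID", doc).content,
    SampleLocation = map(x -> parse(Int, x.content), findall("//SampleLocation", doc)),
    Temperature = map(x -> parse(Float64, x.content), findall("//Temperature", doc)),
    CycleTime = map(x -> parse(Float64, x.content), findall("//CycleTime", doc)),
)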
It would be great to have a generic function that just does XML2DataFrame("ex1.xml") and returns the result.
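To make the request concrete, here is a rough sketch of the behaviour I have in mind, not an existing package. The names `xml2dataframe` and `collect_leaves!` are made up for illustration, and I am assuming the caller still has to name the repeating element (`"Step"` here), that leaf names are unique within a record, and that at least one record exists. Everything outside the repeating element is treated as shared and copied onto each row.

```julia
using EzXML, DataFrames

# Append every attribute and every non-empty, text-only element below `node`
# to `row`, skipping subtrees whose tag name equals `skip`.
function collect_leaves!(row, node; skip = nothing)
    for attr in eachattribute(node)
        # Attribute columns are named <Element>_<Attribute>, e.g. ExperimentInfo_Name.
        push!(row, Symbol(nodename(node), "_", nodename(attr)) => nodecontent(attr))
    end
    for child in eachelement(node)
        nodename(child) == skip && continue
        text = strip(nodecontent(child))
        if countelements(child) == 0 && !isempty(text)
            push!(row, Symbol(nodename(child)) => String(text))
        end
        collect_leaves!(row, child; skip = skip)
    end
    return row
end

# Hypothetical generic converter: one row per `record` element.
function xml2dataframe(doc::EzXML.Document, record::AbstractString)
    # Values found outside the repeating element are shared by all rows.
    shared = collect_leaves!(Pair{Symbol,String}[], root(doc); skip = record)
    rows = map(findall("//" * record, doc)) do node
        (; shared..., collect_leaves!(Pair{Symbol,String}[], node)...)
    end
    return DataFrame(rows)
end

# Convenience method that reads the document from a file path.
xml2dataframe(path::AbstractString, record::AbstractString) =
    xml2dataframe(readxml(path), record)
```

With the file above this would be called as `xml2dataframe("ex1.xml", "Step")`. In this sketch every column comes back as `String`, so the numeric columns would still need a conversion step such as `df.Temperature = parse.(Float64, df.Temperature)`; guessing the column types automatically is the part I would most like a package to handle.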