XML parsing with Requests + LibExpat: OS-dependent?


#1

It seems that LibExpat or Requests has some OS-dependent behavior that completely breaks my XML parsing. The example below works on Windows but fails on Linux. Could someone please tell me how to parse an XML file like this in an OS-independent way, and explain what is causing the OS-dependent behavior?

Thank you.

MWE

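using Requests
using LibExpat
# FDSN event query: magnitude 7.0 to 9.9, 2011-03-11 05:37 to 05:57, QuakeML output
url = "http://service.iris.edu/fdsnws/event/1/query?starttime=2011-03-11T05:37:00&endtime=2011-03-11T05:57:00&minmag=7.0&maxmag=9.9&format=xml"
R = get(url)                    # HTTP GET via Requests.jl
xstr = String(IOBuffer(R.data)) # response body as a String
tmp = LibExpat.xp_parse(xstr)   # parse into a LibExpat.ETree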

LINUX RESULTS (Ubuntu 16.04, Julia 0.5.0)
ERROR: "BoundsError(Union{AbstractString,LibExpat.ETree}[],(1,)), , 0, 0, 0" in xp_parse(::String) at /home/josh/.julia/v0.5/LibExpat/src/LibExpat.jl:276

WINDOWS RESULTS (Windows 10, Julia 0.5.0)

<q:quakeml xmlns:q="http://quakeml.org/xmlns/quakeml/1.2" xmlns:catalog="http://anss.org/xmlns/catalog/0.1" xmlns="http://quakeml.org/xmlns/bed/1.2">
 <eventParameters publicID="quakeml:nc.anss.org/Event/NC/71541116#148486928789">
  <creationInfo>
   <agencyID>NC</agencyID>
   <creationTime>2017-01-19T23:41:27.89</creationTime>
  </creationInfo>
  <event catalog:eventid="71541116" catalog:datasource="nc" publicID="quakeml:nc.anss.org/Event/NC/71541116" catalog:eventsource="nc" catalog:dataid="nc71541116">
   <preferredOriginID>quakeml:nc.anss.org/Origin/NC/7205099</preferredOriginID>
   <preferredMagnitudeID>quakeml:nc.anss.org/Netmag/NC/4276694</preferredMagnitudeID>
   <creationInfo>
    <agencyID>NC</agencyID>
    <creationTime>2011-03-11T18:04:37.00</creationTime>
    <version>4</version>
   </creationInfo>
   <type>earthquake</type>
   <origin catalog:eventid="71541116" catalog:datasource="nc" publicID="quakeml:nc.anss.org/Origin/NC/7205099" catalog:eventsource="nc" catalog:dataid="nc71541116">
    <time>
     <value>2011-03-11T04:51:24.93</value>
    </time>
    <timeFixed>0</timeFixed>
    <latitude>
     <value>35.3568333</value>
    </latitude>
    <longitude>
     <value>-118.5495</value>
    </longitude>
    <epicenterFixed>0</epicenterFixed>
    <depth>
     <value>9314</value>
     <uncertainty>810</uncertainty>
    </depth>
    <depthType>from location</depthType>
    <type>hypocenter</type>
    <evaluationMode>manual</evaluationMode>
    <evaluationStatus>final</evaluationStatus>
    <creationInfo>
     <agencyID>NC</agencyID>
     <creationTime>2011-03-11T18:04:34.00</creationTime>
    </creationInfo>
    <originUncertainty>
     <confidenceEllipsoid>
      <semiMajorAxisLength>1944</semiMajorAxisLength>
      <semiMinorAxisLength>504</semiMinorAxisLength>
      <semiIntermediateAxisLength>984</semiIntermediateAxisLength>
      <majorAxisPlunge>89</majorAxisPlunge>
      <majorAxisAzimuth>279</majorAxisAzimuth>
      <majorAxisRotation>210</majorAxisRotation>
     </confidenceEllipsoid>
     <preferredDescription>confidence ellipsoid</preferredDescription>
     <confidenceLevel>95</confidenceLevel>
     <horizontalUncertainty>410</horizontalUncertainty>
    </originUncertainty>
    <methodID>smi:nc.anss.org/origin/HYP2000_m2g</methodID>
   </origin>
   <magnitude publicID="quakeml:nc.anss.org/Netmag/NC/4276694">
    <mag>
     <value>3.36</value>
     <uncertainty>.115</uncertainty>
    </mag>
    <type>Ml</type>
    <originID>quakeml:nc.anss.org/Origin/NC/7205099</originID>
    <stationCount>32</stationCount>
    <azimuthalGap>80.9</azimuthalGap>
    <evaluationMode>manual</evaluationMode>
    <evaluationStatus>reviewed</evaluationStatus>
    <creationInfo>
     <agencyID>NC</agencyID>
     <creationTime>2011-03-11T18:04:37.00</creationTime>
    </creationInfo>
    <methodID>smi:nc.anss.org/magnitude/CISNml2</methodID>
   </magnitude>
  </event>
  <event catalog:eventid="71540846" catalog:datasource="nc" publicID="quakeml:nc.anss.org/Event/NC/71540846" catalog:eventsource="nc" catalog:dataid="nc71540846">
   <preferredOriginID>quakeml:nc.anss.org/Origin/NC/7205529</preferredOriginID>
   <preferredMagnitudeID>quakeml:nc.anss.org/Netmag/NC/4277044</preferredMagnitudeID>
   <creationInfo>
    <agencyID>NC</agencyID>
    <creationTime>2011-03-12T01:03:14.00</creationTime>
    <version>3</version>
   </creationInfo>
   <type>earthquake</type>
   <origin catalog:eventid="71540846" catalog:datasource="nc" publicID="quakeml:nc.anss.org/Origin/NC/7205529" catalog:eventsource="nc" catalog:dataid="nc71540846">
    <time>
     <value>2011-03-10T15:56:24.75</value>
    </time>
    <timeFixed>0</timeFixed>
    <latitude>
     <value>36.0316667</value>
    </latitude>
    <longitude>
     <value>-117.4256667</value>
    </longitude>
    <epicenterFixed>0</epicenterFixed>
    <depth>
     <value>5531</value>
     <uncertainty>1030</uncertainty>
    </depth>
    <depthType>from location</depthType>
    <type>hypocenter</type>
    <evaluationMode>manual</evaluationMode>
    <evaluationStatus>final</evaluationStatus>
    <creationInfo>
     <agencyID>NC</agencyID>
     <creationTime>2011-03-12T01:03:14.00</creationTime>
    </creationInfo>
    <originUncertainty>
     <confidenceEllipsoid>
      <semiMajorAxisLength>4416</semiMajorAxisLength>
      <semiMinorAxisLength>912</semiMinorAxisLength>
      <semiIntermediateAxisLength>2688</semiIntermediateAxisLength>
      <majorAxisPlunge>34</majorAxisPlunge>
      <majorAxisAzimuth>85</majorAxisAzimuth>
      <majorAxisRotation>2</majorAxisRotation>
     </confidenceEllipsoid>
     <preferredDescription>confidence ellipsoid</preferredDescription>
     <confidenceLevel>95</confidenceLevel>
     <horizontalUncertainty>1530</horizontalUncertainty>
    </originUncertainty>
    <methodID>smi:nc.anss.org/origin/HYP2000_m2g</methodID>
   </origin>
   <magnitude publicID="quakeml:nc.anss.org/Netmag/NC/4277044">
    <mag>
     <value>3.08</value>
     <uncertainty>.152</uncertainty>
    </mag>
    <type>Ml</type>
    <originID>quakeml:nc.anss.org/Origin/NC/7205529</originID>
    <stationCount>5</stationCount>
    <azimuthalGap>175.8</azimuthalGap>
    <evaluationMode>manual</evaluationMode>
    <evaluationStatus>reviewed</evaluationStatus>
    <creationInfo>
     <agencyID>NC</agencyID>
     <creationTime>2011-03-12T01:03:14.00</creationTime>
    </creationInfo>
    <methodID>smi:nc.anss.org/magnitude/CISNml2</methodID>
   </magnitude>
  </event>
 </eventParameters>
</q:quakeml>

#2

This is not an answer to your question, but a suggestion that may solve your problem.

EzXML.jl is another cross-platform package for parsing and handling XML, with support for XPath, namespaces, and streaming parsing. If you haven't tried it yet, I think it is worth a look (admittedly my opinion is biased, since I wrote it ;)).

julia> using EzXML, Requests

julia> url = "http://service.iris.edu/fdsnws/event/1/query?starttime=2011-03-11T05:37:00&endtime=2011-03-11T05:57:00&minmag=7.0&maxmag=9.9&format=xml"
"http://service.iris.edu/fdsnws/event/1/query?starttime=2011-03-11T05:37:00&endtime=2011-03-11T05:57:00&minmag=7.0&maxmag=9.9&format=xml"

julia> res = get(url)
Response(200 OK, 10 headers, 2033 bytes in body)

julia> parsexml(res.data)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x00007f7f8ad3ca30>))

#3

I’m sorry, but switching XML parsers is not an option. I realize that there are at least three known Julia XML parsers, but I don’t have the time or patience to recode half a dozen functions in a fairly substantive package right now.


#4

Looking into this a little more: when I make the C call manually, LibExpat returns an empty ETree for that particular XML file:

julia> xph = LibExpat.xp_make_parser()
LibExpat.XPHandle(Ptr{Void} @0x0000000004611b90,</>,false)

julia> test_xml = LibExpat.XML_Parse(xph.parser, xstr, sizeof(xstr), 1)
1

julia> xph.pdata.elements
0-element Array{Union{AbstractString,LibExpat.ETree},1}

So this is what's causing the error: XML_Parse returns 1 (success), yet no elements are collected. Since xstr is exactly the same on Windows and Linux, I can only conclude that the problem lies with XML_Parse on Linux. I know this XML parses correctly with other bindings to libexpat, because libexpat is effectively the standard for this data format. What I don't know is what is different about the Julia wrapper that makes it break on Linux but not on Windows.
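One way to confirm that xstr really is byte-identical on both machines is to fingerprint it on each and compare. The sketch below assumes Julia 1.x (where SHA is a standard library); `payload_fingerprint` is a hypothetical helper, not part of Requests or LibExpat. Besides a SHA-256 digest it reports UTF-8 validity, a leading byte-order mark, and CRLF line endings, since those are the usual suspects when text behaves differently across operating systems.

```julia
# Minimal sketch, assuming Julia 1.x (SHA ships as a standard library).
# payload_fingerprint is a hypothetical helper, not a Requests/LibExpat API.
using SHA

function payload_fingerprint(s::AbstractString)
    bytes = Vector{UInt8}(codeunits(String(s)))
    bom = UInt8[0xef, 0xbb, 0xbf]  # UTF-8 byte-order mark
    return (nbytes     = length(bytes),
            valid_utf8 = isvalid(String, bytes),
            has_bom    = length(bytes) >= 3 && bytes[1:3] == bom,
            # number of "\r\n" pairs (Windows-style line endings)
            crlf_count = count(i -> bytes[i] == 0x0d && bytes[i+1] == 0x0a,
                               1:length(bytes)-1),
            sha256     = bytes2hex(sha256(bytes)))
end
```

Running `println(payload_fingerprint(xstr))` on each machine and getting matching `sha256` fields would show the input is identical, which points the investigation downstream of the string.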


#5

Maybe a difference in libexpat versions? This works for me on JuliaBox, which I believe runs Ubuntu 14.04.


#6

Maybe? Both LightXML and EzXML parse this without difficulty. LibExpat alone fails, and only on Ubuntu (…and only 16.04, from what you just wrote). So if there's some hidden dependency that Ubuntu botched, I'd love to know what it is.


#7

Could you create a C repro, step through it in gdb to see whether anything interesting jumps out, and report it as a bug either to libexpat's mailing list (if there is one) or against the Ubuntu package?
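A similar isolation test can be sketched from Julia without the LibExpat.jl wrapper: call the system libexpat directly through `ccall`, register a single start-element callback, and count how many elements expat reports. This assumes Julia 1.x and a libexpat shared library findable under one of the listed names (the names are guesses; adjust for your system). `on_start` and `count_elements` are hypothetical helpers. The expat functions used (`XML_ParserCreate`, `XML_SetUserData`, `XML_SetStartElementHandler`, `XML_Parse`, `XML_GetErrorCode`, `XML_ErrorString`, `XML_ParserFree`) are part of the public expat C API.

```julia
# Minimal repro sketch, assuming Julia 1.x and a system libexpat.
# Bypasses LibExpat.jl: counts start-element callbacks fired by expat itself.
using Libdl

# Assumption: library names differ by platform; adjust this list if needed.
const EXPAT_NAME = Libdl.find_library(["libexpat", "libexpat.so.1", "libexpat-1"])
isempty(EXPAT_NAME) && error("could not locate a libexpat shared library")
const EXPAT = Libdl.dlopen(EXPAT_NAME)

# Called by expat for every opening tag; userdata points at a Ref{Int} counter.
function on_start(userdata::Ptr{Cvoid}, name::Ptr{UInt8}, atts::Ptr{Ptr{UInt8}})::Cvoid
    counter = unsafe_pointer_to_objref(userdata)::Base.RefValue{Int}
    counter[] += 1
    return nothing
end

# Hypothetical helper: returns (status, elements, error) for one in-memory document.
function count_elements(xml::String)
    f_create = Libdl.dlsym(EXPAT, :XML_ParserCreate)
    f_userdata = Libdl.dlsym(EXPAT, :XML_SetUserData)
    f_handler = Libdl.dlsym(EXPAT, :XML_SetStartElementHandler)
    f_parse = Libdl.dlsym(EXPAT, :XML_Parse)
    f_errcode = Libdl.dlsym(EXPAT, :XML_GetErrorCode)
    f_errstr = Libdl.dlsym(EXPAT, :XML_ErrorString)
    f_free = Libdl.dlsym(EXPAT, :XML_ParserFree)

    parser = ccall(f_create, Ptr{Cvoid}, (Ptr{UInt8},), C_NULL)  # NULL = default encoding
    parser == C_NULL && error("XML_ParserCreate failed")
    counter = Ref(0)
    cb = @cfunction(on_start, Cvoid, (Ptr{Cvoid}, Ptr{UInt8}, Ptr{Ptr{UInt8}}))
    try
        GC.@preserve counter begin
            ccall(f_userdata, Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), parser, pointer_from_objref(counter))
            ccall(f_handler, Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), parser, cb)
            # isFinal = 1: whole document in one call, as in the manual test above.
            status = ccall(f_parse, Cint, (Ptr{Cvoid}, Ptr{UInt8}, Cint, Cint),
                           parser, xml, sizeof(xml), 1)
            msg = ""
            if status != 1  # 1 == XML_STATUS_OK
                code = ccall(f_errcode, Cint, (Ptr{Cvoid},), parser)
                msg = unsafe_string(ccall(f_errstr, Ptr{UInt8}, (Cint,), code))
            end
            return (status = Int(status), elements = counter[], error = msg)
        end
    finally
        ccall(f_free, Cvoid, (Ptr{Cvoid},), parser)
    end
end
```

If `count_elements(xstr)` reports a nonzero element count on the affected Linux machine, the system libexpat is probably fine and the fault more likely sits in how the wrapper (or its build) registers callbacks; if it reports zero, that would be worth reporting upstream.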


#8

Well, removing and reinstalling Julia through apt fixed the problem, whatever it was. But apparently I can't precompile a module that uses LibExpat? Is that correct?


#9

I don’t see how reinstalling Julia would make much difference here, but okay…

What are you referring to about precompilation? Can you rephrase the question to be more specific?


#10

I also didn’t expect reinstalling to make a difference.

Regarding precompilation: as far as I can tell, LibExpat doesn't declare __precompile__(true), so a module that uses LibExpat can't be precompiled. Right? Or is there a workaround?


#11

It doesn't explicitly set __precompile__(false), so there's a chance it could work if you're lucky. I would have to test whether the package does anything that would break if precompiled.


#12

I tried that before writing my last reply. No luck: it produces a warning and the module fails to precompile. So maybe I need a different XML parser after all. That's more than a little disappointing; being under JuliaIO suggests this is meant to be the "official" XML parser, and yet it seems incredibly fragile…