I have a lot of gzipped XML files (about 1.5 GB each) containing approximately 400,000 elements that I am interested in.
The structure of each file is:

```xml
<records>
  <REC> ... </REC>
  ...
</records>
```
I first looked at LightXML but could not figure out how to do streaming with it. Then I tried EzXML as follows; it worked for a few records but then just hangs after a random number of records, at the line marked in the code below:
```julia
using EzXML
using CodecZlib

function eenrekord(x)
    r = parsexml(string(x))   # This is where the process hangs - sometimes after 20 - 80 iterations.
    ut = nodecontent(findfirst(root(r), "//UID"))
    ut, string(r)
end

function xml_records(gzipfile, tabel, zipfile)
    l = GzipDecompressorStream(open(gzipfile))
    reader = EzXML.StreamReader(l)
    count = 0
    rcount = 0
    conn = dbcon()   # opens the database connection (function not shown here)
    list = []
    while !done(reader)
        if nodetype(reader) == 1 && nodename(reader) == "REC"   # 1 == element start
            rcount += 1
            if rcount % 2 == 1   # each REC seems to trigger the element event twice
                x = expandtree(reader)
                ut, xml = eenrekord(x)
                push!(list, [zipfile, gzipfile, ut, xml])
                count += 1
                if count % 20000 == 0
                    println(count)
                    # write_list_sql_table - another function not shown here.
                    list = []
                end
            end
        end
    end
    if count % 20000 != 0
        println(count)
        # write_list_sql_table - another function not shown here.
    end
    close(conn)
end

gzipfile = "/tmp/some.xml.gz"
zipfile = "somezipfile.zip"
tabel = "core_2018"
xml_records(gzipfile, tabel, zipfile)
```
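The EzXML documentation also shows an iteration interface for the reader, so for reference here is a stripped-down sketch of the same loop written that way. It is untested on the full files, and `handle_rec` is just a placeholder for the real processing:

```julia
using EzXML
using CodecZlib

# Minimal sketch using EzXML's iteration interface for StreamReader.
# handle_rec is a placeholder; the expanded node is only valid until
# the next read, so it is serialized to a string immediately.
function stream_recs(gzipfile, handle_rec)
    io = GzipDecompressorStream(open(gzipfile))
    reader = EzXML.StreamReader(io)
    for typ in reader
        if typ == EzXML.READER_ELEMENT && nodename(reader) == "REC"
            handle_rec(string(expandtree(reader)))
        end
    end
    close(reader)
end

# Usage: count the records in one file.
n = Ref(0)
stream_recs("/tmp/some.xml.gz", _ -> n[] += 1)
println(n[])
```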
I have also looked at LibExpat.jl, whose README indicates streaming from a file using `xp_streaming_parsefile`, but I have no idea how to use it and could not find any examples.
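From the README I pieced together the sketch below, but it is untested and the callback signature is my guess. It also assumes an already-decompressed file, since `xp_streaming_parsefile` appears to take a file name rather than an IO stream:

```julia
using LibExpat

# Untested guess at LibExpat's streaming interface: count <REC>
# start tags via the start_element callback. Assumes the file has
# already been decompressed to plain XML.
counter = Ref(0)
cbs = XPCallbacks()
cbs.start_element = (handler, name, attrs) -> begin
    name == "REC" && (counter[] += 1)
end
xp_streaming_parsefile("/tmp/some.xml", cbs, nothing)
println(counter[])
```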
At this stage I just want to store each `<REC> ... </REC>` element as an XML-typed record in the SQL table, from where I will use PostgreSQL's XPath capabilities to manipulate it; an example query is sketched below. In total there are about 60 million records to process this way.
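For that later step I have something like this PostgreSQL query in mind, where `xml_column` is a placeholder name for the XML-typed column:

```sql
-- Sketch of the kind of XPath query I plan to run in PostgreSQL;
-- xml_column is a placeholder for the XML-typed column holding one REC per row.
SELECT (xpath('/REC/UID/text()', xml_column))[1] AS uid
FROM core_2018
LIMIT 10;
```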
I would appreciate any references to good examples or tutorials on how to do this, as well as any comments on the mistakes I made in the code shown above.