Iβm trying to use LightXML to extract this, but I canβt find the command to extract this. Iβd like to extract this into a DataFrame but an array of Dicts would be OK also.
this code produces a dataframe from a xml structured in that way
using LightXML, DataFrames
#length of an unkwown iterable, maybe implementing this in LIghtXML would help
function ilength(iter)
n=0
for i in iter
n+=1
end
return n
end
function xml_parse_main()
obj_vector = xroot["Obj"] #obtain all Obj in the xml
attr_length = ilength(attributes(obj_vector[1]))
#all xml things are strings, so we are gonna store all as a string first and transform later
attr_vector = Vector{String}(undef,attr_length)
for (i,a) in enumerate(attributes(obj_vector[1]))
attr_vector[i] = name(a)
end
datamatrix = Matrix{String}(undef,length(obj_vector),attr_length)
for (i,ai) in enumerate(attributes(obj_vector[1]))
for j in 1:length(obj_vector)
datamatrix[j,i] = value(ai)
end
end
return (attr_vector,datamatrix)
end
titles,data = xml_parse_main()
df = DataFrame(titles)
names!(df, Symbol.(titles))
the result is this:
4Γ4 DataFrame
β Row β Time β attr1 β attr2 β attr3 β
β β String β String β String β String β
βββββββΌββββββββββββββββββββββββββββΌβββββββββΌβββββββββΌβββββββββ€
β 1 β 2019-09-01T23:40:13+08:00 β L β 1 β a β
β 2 β 2019-09-01T23:40:13+08:00 β L β 1 β a β
β 3 β 2019-09-01T23:40:13+08:00 β L β 1 β a β
β 4 β 2019-09-01T23:40:13+08:00 β L β 1 β a β
probably and surely, the next steps is converting all the values to the corresponding types (DateTime, char,Int64,char)
EzXML.jl supports XPath and thus it is very easy to extract <Obj> tags. Also, you donβt need to care about memory leak because EzXML.jl does memory management for you.
Had a similar situation, and just share my solution here in case it can be useful for others.
I need the numerical values to be parsed as Float64.
import EzXML, DataFrames
function XPathDataFrame(path,doc)
elements = findall(path, doc)
reduce(vcat, DataFrames.DataFrame.(nodeDict.(elements)))
end
function nodeDict(n::EzXML.Node)
my_keys = map(x -> x.name, EzXML.attributes(n))
my_vals = map(my_keys) do x
if tryparse(Float64, n[x]) == nothing
return(n[x])
end
parse(Float64, n[x])
end
Dict{String, Union{String, Float64}}(zip(my_keys,my_vals))
end
Then we get
doc = EzXML.readxml("test.xml")
XPathDataFrame("//Obj",doc)
4Γ4 DataFrame
Row β Time attr1 attr2 attr3
β String String Float64 String
ββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 2019-09-01T23:40:13+08:00 L 1.0 a
2 β 2019-09-01T23:40:14+08:00 R 1.0 a
3 β 2019-09-01T23:40:15+08:00 L 11.0 d
4 β 2019-09-01T23:40:16+08:00 L 13.0 c