Simplify JSON while deserializing into struct with StructTypes.jl

Hi everyone,

I have a relatively complex and nested JSON object (representing a patent application) which I would like to deserialize into a struct.
However, I would like to “de-nest” the structure somewhat. The awesome StructTypes.jl offers several options, such as wrapper types or field-name mapping, but I am still unsure which of them leads to a feasible solution.

Here’s a simplified example of what the JSON looks like:

    {
        "biblio": {
            "parties": {
                "applicants": [
                    {
                        "extracted_name": {
                            "value": "REAVELEY LAWRENCE D"
                        }
                    },
                    {
                        "extracted_name": {
                            "value": "BRYANT MARK"
                        }
                    }
                ],
                "inventors": [
                    {
                        "extracted_name": {
                            "value": "REAVELEY LAWRENCE D"
                        }
                    },
                    {
                        "extracted_name": {
                            "value": "BRYANT MARK"
                        }
                    }
                ]
            }
        }
    }

And this would be the corresponding target struct:

    mutable struct Test
        applicants::Vector{String}
        inventors::Vector{String}
        Test() = new()
    end

    StructTypes.StructType(::Type{Test}) = StructTypes.Mutable()

Does anyone have an idea how to best approach this?

Firstly, it’s not clear to me why your struct is mutable and what the point of the internal constructor is. We can simplify to

    struct Test
        applicants::Vector{String}
        inventors::Vector{String}
    end

The naive approach would be an external constructor

    get_names(dat) = map(x -> x["extracted_name"]["value"], dat)

    Test(parties) = Test(get_names(parties["applicants"]), get_names(parties["inventors"]))

Thanks for the quick reply. As I understand it, the mutable struct and the inner constructor are StructTypes.jl’s way of robustly dealing with e.g. missing fields in the JSON (see the docs).
I guess doing the parsing and dealing with exceptions manually would be a possibility, but I would rather use the StructTypes machinery, as the actual JSON is much more complex and the manual route would probably be more error-prone and less efficient.
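
For reference, here is a minimal sketch of that pattern. It assumes a flat JSON input whose keys already match the field names; the input string is made up for illustration. With `StructTypes.Mutable()`, the empty constructor is called first and only the fields found in the JSON are set afterwards, so a missing key leaves the field undefined instead of failing during parsing:

```julia
using JSON3, StructTypes

mutable struct Test
    applicants::Vector{String}
    inventors::Vector{String}
    Test() = new()  # empty constructor required by StructTypes.Mutable()
end

StructTypes.StructType(::Type{Test}) = StructTypes.Mutable()

# Illustrative input (an assumption): "inventors" is deliberately missing.
json = """{"applicants": ["BRYANT MARK"]}"""

t = JSON3.read(json, Test)
t.applicants              # the field that was present in the JSON
isdefined(t, :inventors)  # false: the field was never assigned
```

Reading `t.inventors` in that state throws an `UndefRefError`, so any code that consumes the struct still has to check which fields were actually filled.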

I see. Sorry, I’m not experienced with StructTypes.jl. I wonder, though, if that doesn’t cause a lot of problems because you’re splitting up parsing and validation…

I do not think it is possible right now; at least there were no replies in this issue: https://github.com/JuliaData/StructTypes.jl/issues/10

On the other hand, I doubt that it would be easy to work with vector types anyway.

So, probably the only way you have now is to define all intermediate structures and then flatten them by hand.
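
A rough sketch of what that could look like for the simplified JSON from the question. The intermediate struct names (`ExtractedName`, `Party`, `Parties`, `Biblio`, `Application`) and the helper `flatten_names` are hypothetical, chosen only for illustration, and the immutable `Test` variant is assumed:

```julia
using JSON3, StructTypes

# Hypothetical intermediate structs mirroring the JSON nesting one-to-one.
struct ExtractedName
    value::String
end

struct Party
    extracted_name::ExtractedName
end

struct Parties
    applicants::Vector{Party}
    inventors::Vector{Party}
end

struct Biblio
    parties::Parties
end

struct Application
    biblio::Biblio
end

StructTypes.StructType(::Type{ExtractedName}) = StructTypes.Struct()
StructTypes.StructType(::Type{Party}) = StructTypes.Struct()
StructTypes.StructType(::Type{Parties}) = StructTypes.Struct()
StructTypes.StructType(::Type{Biblio}) = StructTypes.Struct()
StructTypes.StructType(::Type{Application}) = StructTypes.Struct()

# Flat target struct.
struct Test
    applicants::Vector{String}
    inventors::Vector{String}
end

# Flatten by hand: keep only the name strings.
flatten_names(ps::Vector{Party}) = [p.extracted_name.value for p in ps]

Test(app::Application) = Test(
    flatten_names(app.biblio.parties.applicants),
    flatten_names(app.biblio.parties.inventors),
)

json = """
{"biblio": {"parties": {
    "applicants": [{"extracted_name": {"value": "REAVELEY LAWRENCE D"}},
                   {"extracted_name": {"value": "BRYANT MARK"}}],
    "inventors":  [{"extracted_name": {"value": "REAVELEY LAWRENCE D"}},
                   {"extracted_name": {"value": "BRYANT MARK"}}]
}}}
"""

# Typed parse into the nested structs, then flatten.
t = Test(JSON3.read(json, Application))
```

The typed parse keeps JSON3 and StructTypes in charge of the nested part, and the flattening itself is one small outer constructor.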

I see, thanks. Creating all the intermediate structs was my first approach and JSON3.@generatetypes actually turned that from a very tedious task into a breeze. But parsing directly into the final structure would have been more elegant.

Strapping.jl (JuliaData/Strapping.jl on GitHub: tools for mapping between Julia structs and 2D tabular data) might be useful; I’ve tried it a couple of times and ran into some issues, which I think have been fixed now.

Strapping.jl looks nice, but in this case neither the input JSON nor the targeted output struct is really flat/tabular. It’s just that the JSON is more nested than necessary for the target struct, so I would like to skip some of the nesting levels while keeping others.

Yeah, Strapping.jl works well when you need to completely unnest, but it’s certainly tricky when you only want to selectively unnest. I dealt with this once in a project, but not quite as complex as your case; in my case, I just did JSON3.read(json), which returns a JSON3.Object. In StructTypes, we have a StructTypes.constructfrom(T, x) method that allows constructing T from x, where x is some kind of AbstractDict (like JSON3.Object).

Then you just define your own constructing machinery that does something like:

    function unnest(json)
        x = JSON3.read(json)
        # pull the pieces out of x that are needed
    end
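
Filled in for the simplified JSON from the question, this could look roughly as follows. It is only a sketch: the helper `party_names` and the intermediate `flat` dictionary are illustrative assumptions, and the immutable `Test` variant with `StructTypes.Struct()` is assumed. The generic parse gives a `JSON3.Object`, the needed pieces are pulled out by property access, and `StructTypes.constructfrom` builds the target struct from the resulting dictionary:

```julia
using JSON3, StructTypes

struct Test
    applicants::Vector{String}
    inventors::Vector{String}
end

StructTypes.StructType(::Type{Test}) = StructTypes.Struct()

# Hypothetical helper: pull the name strings out of a list of party objects.
party_names(list) = String[p.extracted_name.value for p in list]

function unnest(json)
    x = JSON3.read(json)        # generic parse into a JSON3.Object
    parties = x.biblio.parties  # skip the levels the target struct does not need
    # Flat dictionary holding only the pieces Test needs.
    flat = Dict(
        :applicants => party_names(parties.applicants),
        :inventors => party_names(parties.inventors),
    )
    return StructTypes.constructfrom(Test, flat)
end

json = """
{"biblio": {"parties": {
    "applicants": [{"extracted_name": {"value": "REAVELEY LAWRENCE D"}},
                   {"extracted_name": {"value": "BRYANT MARK"}}],
    "inventors":  [{"extracted_name": {"value": "REAVELEY LAWRENCE D"}},
                   {"extracted_name": {"value": "BRYANT MARK"}}]
}}}
"""

t = unnest(json)
```

For a struct this small, calling `Test(...)` directly on the two vectors would do the same job; `constructfrom` mainly pays off when the target type relies on StructTypes features such as field-name mappings.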

Thanks, I’ll give that a try.