[pre-ANN] RestClient.jl

Don’t stress about web APIs

I’ve had to deal with a bunch of Web APIs as of late, and because I was getting fed up with the boilerplate I’ve put together RestClient.jl, which aims to make interfacing with Web APIs as simple as reasonably possible, with sensible out-of-the-box behaviour.

To that end, it takes care of URI encoding and rate limiting, comes with basic debugging utilities, and makes serialisation/deserialisation as simple as slapping @jsondef / @xmldef in front of a struct (with an optional syntax for specifying what JSON/XML element should map to a field).

A great place to start is by looking at the tutorial, but here’s an example from the readme for what it takes to wrap a JSON API:

using RestClient, JSON3

@globalconfig RequestConfig("https://api.sampleapis.com/coffee")

@jsondef struct Coffee
   title::String
   description::String
   ingredients::Vector{String}
   image::String
   id::Int
end

@endpoint hot() -> "hot" -> Vector{Coffee}
@endpoint iced() -> "iced" -> Vector{Coffee}

To give an XML example too:

using RestClient, XML

@globalconfig RequestConfig("https://boardgamegeek.com/xmlapi2")

@xmldef struct SearchItem
    type."@type"::Symbol
    id."@id"::Int
    name."name[1]/@value"::String
    year."yearpublished[1]/@value"::Union{Int, Nothing}
end

@xmldef struct SearchResponse <: ListResponse{SearchItem}
    items."items[1]/item"::Vector{SearchItem}
end

@endpoint search(query::String) -> "search?{query}" -> SearchResponse

All these macros are shorthand for creating methods and types for the RestClient request/response process, which looks a bit like this:

Invocation: apido(x) -> perform(Request(...)):

         ╭─╴config╶────────────────────────────╮
         │     ╎                               │
         │     ╎        ╭─▶ responsetype ╾─────┼────────────────┬──▶ dataformat ╾───╮
Request╶─┤     ╰╶╶╶╶╶╶╶╶│                      │                ╰─────────╮         │
         │              ├─▶ pagename ╾───╮     │      ┌┄┄┄┄debug┄┄┄┐      │  ╭──────╯
         │              │                ├──▶ url ╾─┬─━─▶ request ╾━┬─▶ interpret ╾──▶ data
         ├─╴endpoint╶───┼─▶ parameters ╾─╯          │               │                   │
         │              │                           │               │             ╭─────╯
         │              ├─▶ parameters ╾────────────┤               ╰─────────╮   │
         │              │                           │                      postprocess ╾──▶ result
         │             *╰─▶ payload ─▶ writepayload╶╯                           │
         │                    ╰─▶ dataformat ╾╯                                 │
         ╰─────────┬────────────────────────────────────────────────────────────╯
                   ╰────▶ validate (before initiating the request)

 * Only for POST requests   ╶╶ Optional first argument

At this point I’m pretty happy with it, but before I register the package I’d like to check to see if anybody else writing (or thinking about writing) API clients has thoughts on the design.

I like that it’s useful for me, I’d love it if it was useful to more people too :slight_smile:

How did you make that textual diagram? Did you do it by hand, or is there a tool that generated it? (Once I clicked on the expanding arrows found by hovering over the top-right corner, the text stopped wrapping, and it looked good.)

I’m afraid the answer to this is Box-drawing characters - Wikipedia + Geometric Shapes (Unicode block) - Wikipedia + C-c C-v :sweat_smile:

You can see more of this in some of my other packages, e.g. the docs of GitHub - tecosaur/KangarooTwelve.jl: Hashing with hopping (particularly the internals page).

Hi there, this looks really promising. I am currently working quite a bit with a REST interface to a biology-centric database (KEGG API Manual). I have implemented my own KEGG-specific functions in a project I am dealing with, but something like what you have here would’ve been super nice to use :slight_smile:

That being said, I think 2 things may help future users as well:

  1. The ability to cache results. Sometimes you want to download the same thing 1000 times, e.g. when building a model on-the-fly, which is kinda slow if you need to do the same request 1000 times. I currently use Scratch.jl for this, maybe this would be a useful addition?
  2. The ability to handle “flat files”, i.e. text file output. For example, KEGG has very spotty coverage of useful output in JSON format (e.g. https://rest.kegg.jp/get/C00001, which lists some information about a metabolite).

Otherwise, I really like the direction of your package!

Thanks for the feedback @Elmo!

Ah yes, this makes a lot of sense. Since we have the Request object, it should be easy to use that as a cache key. This probably just requires an extra method like shouldcache or cacheduration, and then some sort of filesystem cache for the information. I’ll probably extend @endpoint somehow for this too (syntax suggestions welcome!).

I’m tempted to just save the raw response (meaning it would be re-parsed) and use BaseDirs.jl for this so I can actually put this in the system-appropriate cache directory as opposed to scratch (arguably scratch itself should do this, but Julia tends to be annoyingly anti-system-integration).
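To make the idea concrete, here is a minimal sketch of a filesystem cache for raw response bodies, keyed by a hash of the request. It does not use any RestClient.jl API: `cache_path`, `cached_raw`, the `maxage` keyword, and the injected `fetch` callable are all hypothetical names, and a plain URL string stands in for the Request object. The cache directory is passed in explicitly; in practice it might come from BaseDirs.jl or Scratch.jl.

```julia
using SHA  # standard library, used to derive a stable file name from the request

# Hypothetical helper: map a request (here just its URL) to a file in `cachedir`.
cache_path(cachedir::AbstractString, url::String) =
    joinpath(cachedir, bytes2hex(sha256(url)))

# Hypothetical helper: return the raw response body for `url`.
# `fetch(url)` is only called when there is no cached copy, or when the
# cached copy is older than `maxage` seconds.
function cached_raw(fetch, cachedir::AbstractString, url::String; maxage::Real = Inf)
    path = cache_path(cachedir, url)
    if isfile(path) && time() - mtime(path) <= maxage
        return read(path, String)   # cache hit: the caller re-parses the stored body
    end
    body = fetch(url)               # cache miss: perform the real request
    mkpath(cachedir)
    write(path, body)
    return body
end
```

Because only the raw body is stored, changing how a response is parsed later does not require downloading anything again.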

As for flat files, I think this can be done at the moment: you’d just want to either use RawFormat and a custom post-processing step, or define a custom format.
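As a rough illustration of what such a post-processing step might do with a flat file, the sketch below parses KEGG-style text into a Dict. It is plain Julia and does not touch RestClient.jl; `parse_flatfile` is a hypothetical name, and the layout (field name in the first 12 columns, continuation lines leaving those columns blank, `///` ending the entry) is an assumption about the format.

```julia
# Hypothetical post-processing step: turn a KEGG-style flat file into a
# Dict mapping each field name to its lines of content.
# Assumes the field name sits in the first 12 columns and that
# continuation lines leave those columns blank.
function parse_flatfile(text::AbstractString)
    fields = Dict{String, Vector{String}}()
    current = ""
    for line in split(text, '\n')
        startswith(line, "///") && break       # end-of-entry marker
        isempty(strip(line)) && continue       # skip blank lines
        key = strip(first(line, 12))
        value = strip(chop(line; head = min(12, length(line)), tail = 0))
        if !isempty(key)
            current = String(key)              # a new field starts on this line
        end
        isempty(current) && continue           # content before any field name
        isempty(value) || push!(get!(fields, current, String[]), String(value))
    end
    return fields
end
```

How the result of such a function is wired into an endpoint depends on the RestClient.jl interface, which is not shown here.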

Ah, BaseDirs.jl looks really cool, I might switch in the future actually :slight_smile:

I don’t really have any strong opinions about the syntax, but I would say that it’s more useful to just cache the entire downloaded result, and not just the parsed object. It has happened multiple times now that I parse what I need from the downloaded object, and then a few days later realize that I need more info than I originally parsed. I then needed to re-download all 5000+ objects. It’s not a major inconvenience, but it would be a nice feature to just be able to re-interpret the downloaded object, exactly as you suggest.

Ah perfect, then I will look more closely at RawFormat, thanks!

JuliaHub assumed the README was in Markdown format, but it was really an Org document.

For the drawing, some time ago I used https://draw.io/. I think it was pretty cool; we could do drawings with ASCII characters.

By the way, I think this caching is nearly what you have already done, just extended to a persistent cache. So you literally created @mcache, and then we could add a @pcache with BaseDirs.jl later on and easily put it into the logic.
But this is just a guess! (It is really important, though, to see that the caching was appended to the logic, and where it was appended. The logic could stay visible, which is a nice pattern I think.) It would also stay modular, so it doesn’t have to be an internal caching functionality with a flag.

This looks really cool.
Do I need to define the structs, or can it just return the result as a Dict or Vector{Dict}?

If JSON3.jl knows what to do with the type, you can do just that. All you’ll need to do is declare that the response is JSON-formatted by defining RestClient.dataformat(::MyEndpoint, ::Type{Dict}) = RestClient.JSONFormat(), etc.

Without this, RestClient can’t know how the response data should be interpreted.

Thanks for the reply @tecosaur. I need a bit more help to understand it. How should I modify the draw endpoint from the tutorial to get it as a Dict?

You could do:

@endpoint draw(deck::Deck, count::Int) -> "deck/{deck.id}/draw?{count}" -> Dict
RestClient.dataformat(::DrawEndpoint, ::Type{Dict}) = RestClient.JSONFormat()

My experience level is not high: I did some very simple GET requests about 7 years ago (probably with Python). I started a Julia package to read a particular REST API recently.

Could RestClient.jl be used as a replacement for the client part of HTTP.jl and, say, URITemplate.jl (and maybe other packages)?

In case it informs your design: this is how I handled caching. I did a rather ad-hoc version of this for my project. I thought about which level to use for caching. I ended up caching the raw body because it was simplest to implement. But it also looks like a cleaner design, because details of caching are implemented in a smaller amount of code and don’t appear higher up where the code is more complicated and subject to change. I also have the example of a Python version of the same project that could have made this choice, but didn’t. So I was motivated to do it from the outset.

In my case, reparsing is fast, and is a small price to pay for the simplicity.

The low-level functions that make GET/POST requests take a keyword arg refresh=false. Higher-level functions also take this keyword and pass it to the lower-level functions. In my application, responses are generated from data written by a job that runs for a while and then is finished. So I need a cache that persists indefinitely.
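The pattern described here can be sketched as follows. All names (`RAW_CACHE`, `get_raw`, `get_lines`, and the injected `doget` callable that performs the real request) are hypothetical, and an in-memory Dict stands in for the persistent cache so that the sketch runs without a network or filesystem.

```julia
# In-memory stand-in for a persistent cache of raw response bodies.
const RAW_CACHE = Dict{String, String}()

# Low-level: return the raw body for `url`, consulting the cache first.
# `doget(url)` performs the real request; with `refresh = true` it is always
# called and the cached body is overwritten.
function get_raw(doget, url::AbstractString; refresh::Bool = false)
    if refresh || !haskey(RAW_CACHE, url)
        RAW_CACHE[url] = doget(url)
    end
    return RAW_CACHE[url]
end

# Higher-level: parses the body, and only forwards `refresh` downwards.
# Caching details stay out of this layer entirely.
function get_lines(doget, url::AbstractString; refresh::Bool = false)
    return split(get_raw(doget, url; refresh = refresh), '\n')
end
```

Since the higher-level function only passes the keyword through, the parsing code can change freely without touching how or where bodies are cached.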

My project has an externally imposed config directory. I abuse this slightly by storing mutable data (the cache) as well.

I wonder to what extent Cache-Control and Expires headers can be used for reasonable default behaviour. :thinking:
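As a starting point for the Cache-Control half of that question, here is a small sketch that turns a header value into a reuse lifetime. `cache_lifetime` is a hypothetical name, not part of RestClient.jl or HTTP.jl. The sketch deliberately simplifies: it treats `no-store` and `no-cache` alike as "do not reuse", ignores every other directive, and does not handle the Expires header at all.

```julia
# Hypothetical helper: decide how long (in seconds) a response may be reused,
# based only on the value of its Cache-Control header.
# Returns `nothing` when the header gives no usable lifetime.
function cache_lifetime(cache_control::AbstractString)
    lifetime = nothing
    for directive in split(cache_control, ',')
        d = lowercase(strip(directive))
        if d == "no-store" || d == "no-cache"
            return 0    # simplification: treat both as "do not reuse"
        elseif startswith(d, "max-age=")
            # `tryparse` yields `nothing` for a malformed number
            lifetime = tryparse(Int, d[length("max-age=")+1:end])
        end
    end
    return lifetime
end
```

A `nothing` result leaves the decision to whatever default the client chooses, which is where an explicit per-endpoint setting could take over.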
