[pre-ANN] RestClient.jl

Don’t stress about web APIs

I’ve had to deal with a bunch of Web APIs as of late, and because I was getting fed up with the boilerplate I’ve put together RestClient.jl, which aims to make interfacing with Web APIs as simple as reasonably possible, with sensible out-of-the-box behaviour.

To that end, it takes care of URI encoding and rate limiting, comes with basic debugging utilities, and makes serialisation/deserialisation as simple as slapping @jsondef / @xmldef in front of a struct (with an optional syntax for specifying what JSON/XML element should map to a field).

A great place to start is by looking at the tutorial, but here’s an example from the readme for what it takes to wrap a JSON API:

using RestClient, JSON3

@globalconfig RequestConfig("https://api.sampleapis.com/coffee")

@jsondef struct Coffee
   title::String
   description::String
   ingredients::Vector{String}
   image::String
   id::Int
end

@endpoint hot() -> "hot" -> Vector{Coffee}
@endpoint iced() -> "iced" -> Vector{Coffee}

To give an XML example too:

using RestClient, XML

@globalconfig RequestConfig("https://boardgamegeek.com/xmlapi2")

@xmldef struct SearchItem
    type."@type"::Symbol
    id."@id"::Int
    name."name[1]/@value"::String
    year."yearpublished[1]/@value"::Union{Int, Nothing}
end

@xmldef struct SearchResponse <: ListResponse{SearchItem}
    items."items[1]/item"::Vector{SearchItem}
end

@endpoint search(query::String) -> "search?{query}" -> SearchResponse

All these macros are shorthand for creating methods and types for the RestClient request/response process, which looks a bit like this:

Invocation: apido(x) -> perform(Request(...)):

         ╭─╴config╶────────────────────────────╮
         │     ╎                               │
         │     ╎        ╭─▶ responsetype ╾─────┼────────────────┬──▶ dataformat ╾───╮
Request╶─┤     ╰╶╶╶╶╶╶╶╶│                      │                ╰─────────╮         │
         │              ├─▶ pagename ╾───╮     │      ┌┄┄┄┄debug┄┄┄┐      │  ╭──────╯
         │              │                ├──▶ url ╾─┬─━─▶ request ╾━┬─▶ interpret ╾──▶ data
         ├─╴endpoint╶───┼─▶ parameters ╾─╯          │               │                   │
         │              │                           │               │             ╭─────╯
         │              ├─▶ parameters ╾────────────┤               ╰─────────╮   │
         │              │                           │                      postprocess ╾──▶ result
         │             *╰─▶ payload ─▶ writepayload╶╯                           │
         │                    ╰─▶ dataformat ╾╯                                 │
         ╰─────────┬────────────────────────────────────────────────────────────╯
                   ╰────▶ validate (before initiating the request)

 * Only for POST requests   ╶╶ Optional first argument

At this point I’m pretty happy with it, but before I register the package I’d like to check to see if anybody else writing (or thinking about writing) API clients has thoughts on the design.

I like that it’s useful for me, I’d love it if it was useful to more people too :slight_smile:

How did you make that textual diagram? Did you do it by hand, or is there a tool that generated it? (Once I clicked on the expanding arrows found by hovering over the top-right corner, the text stopped wrapping, and it looked good.)

I’m afraid the answer to this is the Wikipedia pages for Box-drawing characters and Geometric Shapes (Unicode block), plus C-c C-v :sweat_smile:

You can see more of this in some of my other packages, e.g. the docs of KangarooTwelve.jl (tecosaur/KangarooTwelve.jl on GitHub), particularly the internals page.

Hi there, this looks really promising. I am currently working quite a bit with a REST interface to a biology-centric database (KEGG; see the KEGG API Manual). I have implemented my own KEGG-specific functions in a project I am dealing with, but something like what you have here would’ve been super nice to use :slight_smile:

That being said, I think 2 things may help future users as well:

  1. The ability to cache results. Sometimes you want to download the same thing 1000 times, e.g. when building a model on-the-fly, which is kinda slow if every one of those is a fresh request. I currently use Scratch.jl for this; maybe this would be a useful addition?
  2. The ability to handle “flat files”, i.e. text file output. For example, KEGG has very spotty coverage of useful output in JSON format (e.g. https://rest.kegg.jp/get/C00001, which lists some information about a metabolite).

Otherwise, I really like the direction of your package!

Thanks for the feedback @Elmo!

Ah yes, caching makes a lot of sense. Since we have the Request object, it should be easy to use that as a cache key. This probably just requires an extra method like shouldcache or cacheduration, and then some sort of filesystem cache for the information. I’ll probably extend @endpoint somehow for this too (syntax suggestions welcome!).

I’m tempted to just save the raw response (meaning it would be re-parsed) and use BaseDirs.jl for this so I can actually put this in the system-appropriate cache directory as opposed to scratch (arguably scratch itself should do this, but Julia tends to be annoyingly anti-system-integration).
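
A minimal sketch of the "save the raw response and re-parse it" idea, using only the standard library. This is not RestClient's implementation: `cachepath`, `cachedbody`, and the stand-in fetcher are hypothetical names, the cache key is simply a hash of the URL, and a temporary directory takes the place of a system cache directory.

```julia
using SHA  # standard library, provides sha256

# Hypothetical helper: derive a stable file name from the request URL.
cachepath(dir::AbstractString, url::AbstractString) =
    joinpath(dir, bytes2hex(sha256(url)) * ".raw")

# Return the cached raw body for `url`, calling `fetcher()` only on a miss.
function cachedbody(fetcher::Function, dir::AbstractString, url::AbstractString)
    path = cachepath(dir, url)
    isfile(path) && return read(path)   # hit: raw bytes, re-parsed by the caller
    body = Vector{UInt8}(fetcher())     # miss: perform the (stand-in) request
    mkpath(dir)
    write(path, body)
    return body
end

# Demo with a stand-in fetcher instead of a real HTTP request.
ncalls = Ref(0)
fakefetch() = (ncalls[] += 1; "ENTRY C00001")
dir = mktempdir()
first_body = cachedbody(fakefetch, dir, "https://rest.kegg.jp/get/C00001")
second_body = cachedbody(fakefetch, dir, "https://rest.kegg.jp/get/C00001")
```

Because only the raw bytes are stored, changing how a response is interpreted later does not require downloading anything again.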

As for flat files, I think this can be done at the moment: you’d just want to either use RawFormat and a custom post-processing step, or define a custom format.
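
To give a feel for what such a post-processing step might involve, here is a rough sketch of a flat-file parser in plain Julia. It assumes a KEGG-style layout (field name in the first 12 columns, continuation lines with those columns blank, `///` ending the record); `parseflat` is a hypothetical name, the sample text is a hand-written abbreviation rather than real API output, and how the function would be hooked into RestClient is left open.

```julia
# Parse one flat-file record into a mapping of field name => content lines.
function parseflat(text::AbstractString)
    fields = Dict{String, Vector{String}}()
    current = ""
    for line in split(text, '\n')
        startswith(line, "///") && break      # end-of-record marker
        isempty(strip(line)) && continue
        key = strip(first(line, 12))          # assumed 12-column key area
        value = strip(join(Iterators.drop(line, 12)))
        if !isempty(key)
            current = String(key)             # a new field starts here
        end
        isempty(current) && continue          # content before any field name
        push!(get!(fields, current, String[]), String(value))
    end
    return fields
end

# Hand-written, abbreviated sample in the assumed layout.
sample = """
ENTRY       C00001                      Compound
NAME        H2O;
            Water
FORMULA     H2O
///
"""

record = parseflat(sample)
```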

Ah, BaseDirs.jl looks really cool, I might switch in the future actually :slight_smile:

I don’t really have any strong opinions about the syntax, but I would say that it’s more useful to cache the entire downloaded result, and not just the parsed object. It has happened multiple times now that I parse what I need from the downloaded object, and then a few days later realize that I need more info than I originally parsed. I then needed to re-download all 5000+ objects. It’s not a major inconvenience, but it would be a nice feature to be able to re-interpret the downloaded object, exactly as you suggest.

Ah perfect, then I will look more closely at RawFormat, thanks!

JuliaHub assumed the README was in Markdown format, but it was really an Org document.

For the drawing: some time ago I used https://draw.io/, which I think was pretty cool; we could make drawings with ASCII characters.

By the way, I think this caching is nearly what you have already done, just extended to a persistent cache. So effectively you created @mcache, and later on we could add a @pcache using BaseDirs.jl and easily slot it into the logic.
But this is just a guess! (It is important, though, to see that the caching was appended to the logic, and where it was appended. The logic stays visible, which I think is a nice pattern.) It would also stay modular, so it doesn’t have to be an internal caching feature behind a flag.

This looks really cool.
Do I need to define the structs, or can it just return the result as a Dict or Vector{Dict}?

If JSON3.jl knows what to do with the type, you can do just that. All you’ll need to do is declare that the response is JSON-formatted by defining RestClient.dataformat(::MyEndpoint, ::Type{Dict}) = RestClient.JSONFormat() etc

Without this, RestClient can’t know how the response data should be interpreted.

Thanks for the reply @tecosaur. I need a bit more help to understand it. How should I modify the draw endpoint from the tutorial to get it as a Dict?

You could do:

@endpoint draw(deck::Deck, count::Int) -> "deck/{deck.id}/draw?{count}" -> Dict
RestClient.dataformat(::DrawEndpoint, ::Type{Dict}) = RestClient.JSONFormat()
My experience level is not high: I did some very simple GET requests about 7 years ago (probably with Python). I started a Julia package to read a particular REST API recently.

Could RestClient.jl be used as a replacement for the client part of HTTP.jl and, say, URITemplate.jl (and maybe other packages)?

In case it informs your design: this is how I handled caching. I did a rather ad-hoc version of this for my project. I thought about which level to use for caching. I ended up caching the raw body because it was simplest to implement. But it also looks like a cleaner design, because details of caching are implemented in a smaller amount of code and don’t appear higher up where the code is more complicated and subject to change. I also have the example of a Python version of the same project that could have made this choice, but didn’t. So I was motivated to do it from the outset.

In my case, reparsing is fast, and is a small price to pay for the simplicity.

The low-level functions that make GET/POST requests take a keyword arg refresh=false. Higher-level functions also take this keyword and pass it to the lower-level functions. In my application, responses are generated from data written by a job that runs for a while and then is finished. So I need a cache that persists indefinitely.
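
The keyword-forwarding pattern described here could look roughly like the following. It is only a sketch: `fetchraw`, `getraw`, and `jobstatus` are hypothetical names, the URL is a placeholder, and an in-memory Dict stands in for the persistent cache of raw bodies.

```julia
# In-memory stand-in for a persistent cache of raw response bodies.
const BODIES = Dict{String, String}()
const FETCHES = Ref(0)

# Stand-in for the real GET request (hypothetical; no network access).
function fetchraw(url::AbstractString)
    FETCHES[] += 1
    return "body of $url (fetch #$(FETCHES[]))"
end

# Low-level: consult the cache unless `refresh` is set.
function getraw(url::AbstractString; refresh::Bool = false)
    if refresh || !haskey(BODIES, url)
        BODIES[url] = fetchraw(url)
    end
    return BODIES[url]
end

# Higher-level: "parse" the raw body, forwarding `refresh` unchanged.
jobstatus(id::Integer; refresh::Bool = false) =
    uppercase(getraw("https://example.invalid/jobs/$id"; refresh))

a = jobstatus(1)                  # first call fetches
b = jobstatus(1)                  # second call is served from the cache
c = jobstatus(1; refresh = true)  # explicit refresh fetches again
```

Since the cache sits below the parsing layer, the higher-level functions only need to pass the keyword along.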

My project has an externally imposed config directory. I abuse this slightly by storing mutable data (the cache) as well.

I wonder to what extent Cache-Control and Expires headers can be used for reasonable default behaviour. :thinking:

As of commit cdf8f42 of tecosaur/RestClient.jl (“Initial introduction of request caching”), caching behaviour is controlled via the cache field of RequestConfig (defaulting to true), and unless otherwise customised, content is kept for as long as it is considered valid according to Cache-Control directives and the Expires header :slight_smile:
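
For readers unfamiliar with these headers, here is a simplified sketch of turning a Cache-Control value into a lifetime. It is not RestClient's logic: `maxage` is a hypothetical helper, it ignores Expires, Age, and s-maxage entirely, and it treats no-cache (which strictly means "revalidate before reuse") the same as no-store.

```julia
# Return the cache lifetime in seconds implied by a Cache-Control header value,
# or `nothing` when the header gives no usable max-age.
function maxage(cachecontrol::AbstractString)
    directives = lowercase.(strip.(split(cachecontrol, ',')))
    # Simplification: both directives are treated as "do not reuse".
    any(in(("no-store", "no-cache")), directives) && return 0
    for directive in directives
        m = match(r"^max-age=(\d+)$", directive)
        m === nothing || return parse(Int, m.captures[1])
    end
    return nothing
end

hour = maxage("public, max-age=3600")
never = maxage("no-store")
unknown = maxage("private")
```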

If this seems to be working solidly a few weeks from now, I think this capability puts RestClient in the sort of state that might be worth registering as v1.0.

I can clarify what I wrote above: The caching I do understands the domain and makes decisions based on details of the content. If data is not in a final state, it may be changing on the scale of minutes, or hours. So by default, the cache is always refreshed from the server. But there is some useful information that you don’t need to refresh, so you can explicitly turn it off, per request.
Then there are markers in the data itself that say that the data is in its final state. It will never change, so the cache should never be refreshed automatically. Just for completeness, I allow the user to force it.
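
The policy described above can be written down as a small decision function. This is a sketch under assumptions: `JobData` and its `final` field are hypothetical stand-ins for the markers in the data, and `shouldrefresh` is an illustrative name rather than anything from an existing package.

```julia
# Hypothetical record: `final` stands in for a marker in the data itself.
struct JobData
    final::Bool
end

# Decide whether a cached record should be re-fetched.
# `refresh === nothing` means "use the default policy".
function shouldrefresh(cached::Union{JobData, Nothing};
                       refresh::Union{Bool, Nothing} = nothing)
    cached === nothing && return true       # nothing cached yet
    refresh !== nothing && return refresh   # explicit per-request choice wins
    return !cached.final                    # default: refresh until data is final
end

running = JobData(false)
finished = JobData(true)
```

With this, `shouldrefresh(running)` says to refresh, `shouldrefresh(running; refresh = false)` turns that off for one request, and `shouldrefresh(finished; refresh = true)` forces a refresh of final data.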

I suppose one could make a general framework that allows you to specify these details concisely. But it might be rather complicated. And for all I know, there is very little demand for this sort of thing.

That makes sense. I’ve made the expiry age determination extensible, so you can implement custom logic for specific endpoints with cachelifetime([conf::RequestConfig], endpoint::AbstractEndpoint, res::Response).

This is basically how my implementation operates, except instead of a kwarg you have to (slightly more awkwardly) provide an explicit RequestConfig(..., cache=false) as the first argument.

Mmm, I haven’t really seen this pushed in similar packages for other languages, but I think it makes a lot of sense to cache requests that explicitly say (through the response headers) that they can be cached. Expired cache items are automatically incrementally removed once they’re more than a fortnight past their expiry time (cacheclean is added as a Julia exit hook), which should help avoid the cache growing out of control.
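
As an illustration of the fortnight grace period, a cleanup pass over an index of expiry times might look like the sketch below. It is not the package's cacheclean: `cleanexpired!` and the in-memory index are hypothetical, and a real cache would also have to delete the stored files.

```julia
using Dates

# Remove entries that expired more than `grace` before `asof`,
# returning how many were removed.
function cleanexpired!(expiries::Dict{String, DateTime}, asof::DateTime;
                       grace::Period = Day(14))
    before = length(expiries)
    filter!(kv -> kv.second + grace >= asof, expiries)
    return before - length(expiries)
end

asof = DateTime(2025, 3, 1)
index = Dict(
    "fresh" => asof + Day(1),    # not yet expired
    "recent" => asof - Day(3),   # expired, but within the grace period
    "stale" => asof - Day(30),   # expired long ago
)
removed = cleanexpired!(index, asof)
```

A function like this could be registered with `atexit` so that it runs when the Julia session ends.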