AbstractVector{<:Pair} and AbstractDict


#1

This was discussed on Slack, and I wanted to capture it somewhere more permanent.

HTTP is now using an Array{Pair{String,String},1} to store headers for performance reasons, but for many use cases (like Mux) people would like to read headers without needing to search through an array of pairs.

@oxinabox mentioned the possibility of defining Base.getindex(xs::AbstractVector{Pair{T, V}}, key) where {T<:Union{Symbol, AbstractString}, V} so that these vectors could be indexed directly, as in headers["Origin"]. @andyferris suggested a lightweight AbstractDict wrapper over AbstractVector{<:Pair} instead.
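For comparison, here is one possible shape of the wrapper idea. This is a minimal, read-only sketch that assumes the headers are held in a plain Vector{Pair{K,V}}. PairsDict is a hypothetical name, not an existing package type, and each lookup is a linear scan that returns the first match.

```julia
# Hypothetical read-only AbstractDict view over a vector of pairs.
# Lookups are O(n) linear scans; the first matching key wins.
struct PairsDict{K,V} <: AbstractDict{K,V}
    pairs::Vector{Pair{K,V}}
end

# Iteration and length forward to the wrapped vector, so repeated keys
# (e.g. several "Set-Cookie" headers) are all visited, in wire order.
Base.iterate(d::PairsDict, state...) = iterate(d.pairs, state...)
Base.length(d::PairsDict) = length(d.pairs)

# Value of the first pair whose key matches, or `default` if none does.
function Base.get(d::PairsDict, key, default)
    for p in d.pairs
        isequal(first(p), key) && return last(p)
    end
    return default
end

# Same scan, but a missing key throws KeyError like a Dict would.
function Base.getindex(d::PairsDict, key)
    for p in d.pairs
        isequal(first(p), key) && return last(p)
    end
    throw(KeyError(key))
end

Base.haskey(d::PairsDict, key) = any(p -> isequal(first(p), key), d.pairs)

headers = ["Origin" => "abc", "Content-Type" => "def"]
h = PairsDict(headers)
h["Origin"]               # "abc"
get(h, "Accept", "none")  # "none"
```

Because iteration forwards to the vector, repeated keys show up more than once. That suits headers, but it departs from the usual AbstractDict assumption that keys are unique.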

What’s the best option?


#2

What would

```julia
getindex([3 => 1, 2 => 2, 1 => 3], 1)
```

do? A wrapper might be a better option.

EDIT: Sorry, I missed the T<:Union{Symbol, AbstractString} restriction. Reconsider the example with a NamedArray or something similar. I still think it is too clever to special-case generic accessors for a restricted set of types.
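To make the ambiguity concrete: with integer keys, positional indexing and key lookup give different answers for the same argument. Dict is used here only to show what a key lookup would return.

```julia
v = [3 => 1, 2 => 2, 1 => 3]

v[1]         # 3 => 1  (positional: the first element of the vector)
Dict(v)[1]   # 3       (key lookup: the value stored under key 1)
```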


#3

I agree.


#4

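Just a thought:

```julia
headers = sort!(["Origin" => "abc", "Content-Type" => "def", "H" => "1", "H" => "2", "H" => "3"])
# Index of the first "H" header (pairs compare by key, then by value)
firstH = searchsortedfirst(headers, "H" => "")
while firstH <= lastindex(headers) && headers[firstH].first == "H"
    println(headers[firstH].second)
    global firstH += 1   # `global` is required for top-level code in a script file
end
```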
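One caveat about this snippet: a plain sort! compares whole pairs, so repeated headers end up ordered by value rather than in their original order. The sketch below sorts by name only, relying on the default sort for this element type being stable, and collects every value for one name with searchsorted. header_values is a hypothetical helper name.

```julia
# Hypothetical helper: all values stored under `name` in a vector that is
# already sorted by header name. `by = first` is applied to the probe pair
# as well, so its value part ("") is ignored.
function header_values(sorted::AbstractVector{<:Pair}, name)
    r = searchsorted(sorted, name => ""; by = first)
    return [last(p) for p in view(sorted, r)]
end

headers = ["Origin" => "abc", "H" => "2", "Content-Type" => "def", "H" => "1"]
sort!(headers; by = first)    # stable: "H" => "2" stays ahead of "H" => "1"
header_values(headers, "H")   # ["2", "1"]
```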

#5

What about

```julia
struct As_assoc end
const as_assoc = As_assoc()

Base.getindex(xs::AbstractVector{Pair{T, V}}, ::As_assoc, key::T) where {T, V} = ...
```

This would allow headers[as_assoc, "Origin"] or, with one extra definition, headers[as_assoc, :Origin].
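A filled-in version of this tag-dispatch idea, as a sketch: the linear-scan body and the KeyError on a missing key are assumptions, not part of the proposal, and the Symbol method is one way to write the extra definition for String-keyed vectors. Dispatching on the owned As_assoc type keeps ordinary positional indexing untouched.

```julia
struct As_assoc end
const as_assoc = As_assoc()

# Assumed semantics: value of the first pair whose key matches, else KeyError.
function Base.getindex(xs::AbstractVector{Pair{T,V}}, ::As_assoc, key::T) where {T,V}
    i = findfirst(p -> isequal(first(p), key), xs)
    i === nothing && throw(KeyError(key))
    return last(xs[i])
end

# The extra definition: accept a Symbol on String-keyed vectors.
Base.getindex(xs::AbstractVector{Pair{String,V}}, a::As_assoc, key::Symbol) where {V} =
    xs[a, String(key)]

headers = ["Origin" => "abc", "Content-Type" => "def"]
headers[as_assoc, "Origin"]   # "abc"
headers[as_assoc, :Origin]    # "abc"
headers[1]                    # still positional: "Origin" => "abc"
```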

Re sorting: losing the order of headers as seen on the wire does not seem like a good idea, but it might be worth sorting and also storing the on-wire order.


#6

Agreed, although the default sort! algorithm is stable, so sorting by header name alone (by=first) at least preserves the relative order of any repeated header. We could also keep the original array untouched and search a sorted view of it:

```julia
julia> sorted = view(headers, sortperm(headers, by=first));

julia> searchsorted(sorted, "H" => "", by=first)
2:4
```
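Building on this, a sketch of a container that keeps the headers exactly as received and stores only the sortperm alongside them, so lookups can binary-search without reordering the original vector. IndexedHeaders is a hypothetical name; it relies on sortperm being stable, which is what keeps repeated headers in wire order.

```julia
# Hypothetical container: wire order plus a stable by-name permutation.
struct IndexedHeaders
    wire::Vector{Pair{String,String}}   # exactly as received
    perm::Vector{Int}                   # sortperm of `wire` by header name
end

IndexedHeaders(wire::Vector{Pair{String,String}}) =
    IndexedHeaders(wire, sortperm(wire; by = first))

# All values for `name`, in the order they appeared on the wire.
function Base.getindex(h::IndexedHeaders, name::AbstractString)
    sorted = view(h.wire, h.perm)
    r = searchsorted(sorted, name => ""; by = first)
    return [last(p) for p in view(sorted, r)]
end

h = IndexedHeaders(["Origin" => "abc", "H" => "2", "Content-Type" => "def", "H" => "1"])
h["H"]    # ["2", "1"]
h.wire    # unchanged, still in wire order
```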

#7

Why not store something like Vector{Tuple{key_T, val_T, Pair{UInt32, UInt32}}}, where the pair holds the offsets of the first and last byte of the header line relative to the start of the request? I almost said UInt16, but apparently longer headers are technically valid HTTP (or may at least be seen in the wild).

That way you can binary search for specific headers without losing any information. For example, you probably need to case-normalize header names anyway, and it is convenient to be able to look up the on-the-wire spelling in the original header when needed. I know I would curse if any information were discarded and I then needed to reproduce how some weird old or custom piece of software interprets malformed HTTP (or wanted to script some adversarial fingerprinting).

But you’re of course right that preserving the order of repeated headers is most important, and AFAIK these are technically the only parts of the order that MUST be preserved according to RFC 7230.
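A sketch of the record idea, under some assumptions: a small struct stands in for the triple, names are lowercased for lookup, and the raw request is ASCII text with CRLF line endings whose first line is the request line. HeaderField, parse_fields, lookup and original_line are hypothetical names, and lines without a colon are simply skipped rather than handled as malformed input.

```julia
# Hypothetical record: normalized name, value, and byte span of the line.
struct HeaderField
    name::String                 # lowercase-normalized field name
    value::String                # value with surrounding whitespace stripped
    span::Pair{UInt32,UInt32}    # first and last byte of the line (1-based)
end

# Parse the header lines following the request line, then sort by name.
# The default sort for this element type is stable, so repeated fields
# keep their on-the-wire order.
function parse_fields(raw::String)
    fields = HeaderField[]
    crlf = findnext("\r\n", raw, 1)          # end of the request line
    while crlf !== nothing
        start = last(crlf) + 1
        crlf = findnext("\r\n", raw, start)
        stop = crlf === nothing ? ncodeunits(raw) : first(crlf) - 1
        stop < start && break                # blank line ends the header block
        line = SubString(raw, start, stop)
        colon = findfirst(':', line)
        colon === nothing && continue        # skipped in this sketch
        name = lowercase(SubString(line, 1, colon - 1))
        value = String(strip(SubString(line, colon + 1)))
        push!(fields, HeaderField(name, value, UInt32(start) => UInt32(stop)))
    end
    return sort!(fields; by = f -> f.name)
end

# Binary search on the normalized name. The probe is a HeaderField because
# `by` is applied to the search key as well as to the elements.
function lookup(fields::Vector{HeaderField}, name::AbstractString)
    probe = HeaderField(lowercase(name), "", UInt32(0) => UInt32(0))
    return view(fields, searchsorted(fields, probe; by = f -> f.name))
end

# Recover the on-the-wire spelling of a field from the raw request.
original_line(raw::String, f::HeaderField) = raw[Int(f.span.first):Int(f.span.second)]

raw = "GET / HTTP/1.1\r\nHost: example.com\r\nX-Foo: 1\r\nx-foo: 2\r\n\r\n"
fields = parse_fields(raw)
[f.value for f in lookup(fields, "X-FOO")]     # ["1", "2"]
original_line(raw, lookup(fields, "x-foo")[1])  # "X-Foo: 1"
```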