How to decode an HTTP payload to a string?

suppose i have the following call to HTTP.jl:

using HTTP
rsp = HTTP.request("GET", "https://www.google.com")

rsp appears to contain the target HTML document, since a call to print displays what look like the first few lines of a well-formed HTML payload:

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Wed, 12 May 2021 05:24:04 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2021-05-12-05; expires=Fri, 11-Jun-2021 05:24:04 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=215=rhS6AL0sDWhXxPPKRXMq4IUucLNO6fPZYKQgM_NIDYbhgJus66teBrucY9Wji3h3iXvymdE0_uD-oDcCl-fEEXnbmkHMg88cR1-XhQl5yHqtGXZN7_r4f07mUbTkva97KMXPsMxIoNNUAFS7ovNTVm9MWpkYOiRHW6ytlGHMt-k; expires=Thu, 11-Nov-2021 05:24:04 GMT; path=/; domain=.google.com; HttpOnly
Alt-Svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="GyBQgellYHlWflgrWWbR7Q==">(function(){window.google={kEI:'dGabYIfRE8GqtQaqyKyADA',kEXPI:'0,778759,523777,56872,955,5104,207,4804,926,1390,383,246,5,1354,4936,314,6385,1116131,1232,1196472,578,43,1,328941,51223,16115,19397,9287,17572,4859,1361,9291,3023,4744,12841,4020,978,13228,2054,1793,10622,1142,13385,4517,2778,919,2277,8,2796,1593,1279,2212,530,149,1103,842,515,1466,56,157,4101,3514,606,2023,1777,520,4269,328,1285,8788,3227,1989,856,7,12354,5096,7877,4928,108,1483,1371,553,908
⋮
13790-byte body
"""

i’d like to inspect the full text – but rsp.body has type Vector{UInt8}, and i’m not sure how to convert that type into a human-readable string.

does Julia have a function analogous to str.encode and str.decode in Python? i.e. an idiom that would allow me to turn the binary-encoded payload from an HTTP call back into a readable string?

thanks in advance!

String(rsp.body)
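
For reference, here is a small sketch of the round trip in both directions, using only Base. The byte values are made up for illustration and stand in for rsp.body. Two details are worth knowing: String(bytes) interprets the bytes as UTF-8 and takes ownership of the vector it is given, and a payload declared as ISO-8859-1 (like the Google response above) can be decoded by mapping each byte to the code point of the same value.

```julia
# Stand-in for rsp.body: "café" encoded as UTF-8 (é is the two bytes 0xc3 0xa9).
utf8_bytes = UInt8[0x63, 0x61, 0x66, 0xc3, 0xa9]

# Decode: copy first, because String(::Vector{UInt8}) may empty the vector it is given.
text = String(copy(utf8_bytes))

# Encode: the rough counterpart of Python's str.encode("utf-8").
round_trip = Vector{UInt8}(text)

# The same word in ISO-8859-1 (Latin-1), where é is the single byte 0xe9.
latin1_bytes = UInt8[0x63, 0x61, 0x66, 0xe9]

# Latin-1 bytes map one-to-one onto the first 256 Unicode code points.
latin1_text = join(Char(b) for b in latin1_bytes)
```

If the text is plain ASCII, as most HTML markup is, both decodings give the same result, so the distinction only matters for accented or other non-ASCII characters.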


thanks for this!

i’ve gotten inconsistent results; this works with some requests but not others, and i’ve been unable to identify the determining factor. i’m new enough to working in Julia that i wasn’t sure if this was something wrong with my usage, or something complicated about the requests themselves

at any rate, it gives me a clear direction to know that the common idiom is expected to be this simple.

much appreciated!

That should always work - do you have an example request where it reproducibly doesn’t? In what ways does it fail?

i do have an example – though it’s not self-contained, and i’ve been unable so far to condense it to something that is.

fwiw, this is the exact request ( modulo the value of cookie, which expires roughly every 24 hours; i have been re-pasting new values as i get them from a web debugger ):

# Julia 1.6.1
using HTTP

url = "https://www.courts.mo.gov/casenet/cases/nameSearch.do"

cookie = "JSESSIONID=0002Sp1GeM9iTDeFJsEkifl7nLz:3JVJRVVP27:37NHJRO1KC; UJID=fe9c31ec-df08-401d-80fd-35a8cdd06d24; UJIA=1193593738; visid_incap_2409832=Ul3GekvET/SFlIuy+fYY4fIAImAAAAAAQUIPAAAAAABSSxaSgs8Qsh9cpSRZ4d5r; visitorid=20210502002746420305; visid_incap_1275915=7r+4HtEIQq6iBhPJees8B+MFkmAAAAAAQUIPAAAAAAAzv2CXwqFZSbiEyaYmbJ83; visid_incap_2223984=E/EOPCaHRUuNFok2TOSPuO8FkmAAAAAAQUIPAAAAAABIOzgdpNbAsHbuTEgCk9Oj; visid_incap_1692788=UmE9dQYuR3Gj24RhY6lkWt0IkmAAAAAAQUIPAAAAAAA8Wx9hgglXFyurN5SqP8rs; visid_incap_2285056=0uQJWF5cRXmkYce5Ql5P85szmGAAAAAAQUIPAAAAAAAjKh4lnPSzlK+9hAymt6gY; visid_incap_1276241=lQD9zXJdTDm/0T5TRdvV5GuRnGAAAAAAQUIPAAAAAADF3pfdceY/MtPIYu3vMx8p; incap_ses_8216_1276241=ic60TEjHKgQvNDyPnRgFcmuRnGAAAAAA+cXOU8aEyaSZCj3LEXGipg==; visid_incap_2154775=VgSr4W4RRvKS11mybNzgLGuRnGAAAAAAQUIPAAAAAABDK64xLGqasNBJHdzy5/ZE; incap_ses_1422_2154775=rFbfQtHZiCjAkvWiefW7E2uRnGAAAAAA32DoAVBbaZSXjg1oQz6KUg==; JSESSIONID=0001CjLwYADvIQ1eXmV0GPa8y4d:-GH4COV"

referer = "https://www.courts.mo.gov/casenet/cases/searchCases.do?searchType=name"

headers = [
    "Host" => "www.courts.mo.gov",
    "User-Agent" => "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language" => "en-US,en;q=0.5",
    "Accept-Encoding" => "gzip, deflate, br",
    "Referer" => referer,
    "Content-Type" => "application/x-www-form-urlencoded",
    "Origin" => "https://www.courts.mo.gov",
    "Connection" => "keep-alive",
    "Cookie" => cookie,
]


body = "inputVO.subAction=search&inputVO.type=SW&inputVO.courtId=SW&inputVO.startingRecord=1&inputVO.totalRecord=0&inputVO.blockNo=0&inputVO.selectedStatus=A&inputVO.aliasFlag=N&inputVO.judgmentAgainstFlag=N&inputVO.selectedIndexCourt=0&courtId=SW&inputVO.lastName=Wilson&inputVO.firstName=Eldra&inputVO.middleName=&inputVO.caseType=All&inputVO.yearFiled="


rsp = HTTP.request("POST", url, headers, body)

payload = String(rsp.body)

print(payload)

the response returns with a status of 200 – but the printed payload looks like garbage bytes.

by contrast, the web debugger ( in this case, F12 on Firefox ) clearly shows a well-formatted HTML page.

i also tried an identical request (same URL, headers, body) using Python’s requests library – and ( although i hate to say it ) it appears to return the page text with no extra calls or manipulation of the encoding.

if there’s something i’m doing wrong here, i’d love to know – because tbh i’d rather do this in Julia.

Perhaps the content is compressed? Check the response headers. HTTP.jl does not automatically decompress the payload (unlike browsers and, apparently, Python's requests), so you have to do that yourself.

Here is an example of detecting Transfer-Encoding: gzip and decompressing:

julia> using HTTP, CodecZlib

julia> r = HTTP.get("http://localhost:8080");

julia> if HTTP.header(r, "Transfer-Encoding") == "gzip"
           @info "decompressing payload"
           bytes = transcode(GzipDecompressor, r.body)
       else
           bytes = r.body
       end;
[ Info: decompressing payload

julia> String(bytes)
"hello, world\n"
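
Servers more commonly announce body compression in the Content-Encoding header, so it may be worth checking both headers and, failing that, the first bytes of the body. The following is a minimal sketch using only Base; header_value, looks_gzipped and declared_compression are hypothetical helper names, and the sample data is made up for illustration.

```julia
# Hypothetical helper: case-insensitive lookup in a list of header pairs,
# shaped like the `headers` vector in the request above.
function header_value(headers, name)
    for (key, value) in headers
        lowercase(key) == lowercase(name) && return value
    end
    return ""
end

# gzip streams always start with the two magic bytes 0x1f 0x8b.
looks_gzipped(bytes) = length(bytes) >= 2 && bytes[1] == 0x1f && bytes[2] == 0x8b

# Hypothetical helper: report which compression, if any, a response declares.
function declared_compression(headers, body)
    for name in ("Content-Encoding", "Transfer-Encoding")
        value = lowercase(header_value(headers, name))
        for codec in ("gzip", "deflate", "br")
            occursin(codec, value) && return codec
        end
    end
    # No header mentions a codec; fall back to sniffing the body itself.
    return looks_gzipped(body) ? "gzip" : "none"
end

# Made-up sample data for illustration.
sample_headers = ["content-encoding" => "gzip", "Transfer-Encoding" => "chunked"]
sample_body = UInt8[0x1f, 0x8b, 0x08, 0x00]
```

With an HTTP.jl response this could be called as declared_compression(rsp.headers, rsp.body). A result of "br" would mean Brotli, which GzipDecompressor cannot decode.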

thanks!

that didn’t resolve the issue, but it did give me an idea of where to look next: the Transfer-Encoding header is chunked in this case – and i assume i’ll have to do some reading and figure out how to handle that case

so, at the very least, i feel like i have a productive direction to go in to figure this out
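
Worth noting: the Google response at the top of the thread was also sent with Transfer-Encoding: chunked and still decoded cleanly, which suggests chunking alone is not the cause. The request above does advertise "gzip, deflate, br" in Accept-Encoding, so the server may be replying with a compressed body. One thing to try, sketched below with a subset of the original headers, is to ask for an uncompressed response instead. Whether this particular server honors that is an assumption that would need checking.

```julia
# Subset of the request headers from the post above.
headers = [
    "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding" => "gzip, deflate, br",
    "Content-Type" => "application/x-www-form-urlencoded",
]

# Swap the Accept-Encoding value for "identity", which requests an uncompressed body.
# All other headers are passed through unchanged.
uncompressed_headers = [
    lowercase(name) == "accept-encoding" ? (name => "identity") : (name => value)
    for (name, value) in headers
]
```

The resulting vector can be passed to HTTP.request in place of the original headers.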

You can try Downloads, which uses libcurl and should handle this.
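
A minimal sketch of that suggestion, assuming a plain GET is enough to test with. fetch_text is a hypothetical helper name, and whether the body arrives already decompressed depends on how libcurl is configured, so that is worth verifying against the response headers.

```julia
using Downloads

# Hypothetical helper: fetch a URL into memory with Downloads and return the body as a String.
function fetch_text(url; headers = Pair{String,String}[])
    buffer = IOBuffer()
    # Downloads.download accepts an IO object as the output destination.
    Downloads.download(url, buffer; headers = headers)
    return String(take!(buffer))
end
```

For example, print(fetch_text("https://www.google.com")) should show the page text. For a request like the POST above, Downloads.request takes method and input keyword arguments.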