How to decode an HTTP payload to a string?

ben-schulz · May 12, 2021, 5:33am

suppose i have the following call to HTTP.jl:

using HTTP
rsp = HTTP.request("GET", "https://www.google.com")

rsp appears to contain the target HTML document, since a call to print displays what look like the first few lines of a well-formed HTML payload:

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Wed, 12 May 2021 05:24:04 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2021-05-12-05; expires=Fri, 11-Jun-2021 05:24:04 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=215=rhS6AL0sDWhXxPPKRXMq4IUucLNO6fPZYKQgM_NIDYbhgJus66teBrucY9Wji3h3iXvymdE0_uD-oDcCl-fEEXnbmkHMg88cR1-XhQl5yHqtGXZN7_r4f07mUbTkva97KMXPsMxIoNNUAFS7ovNTVm9MWpkYOiRHW6ytlGHMt-k; expires=Thu, 11-Nov-2021 05:24:04 GMT; path=/; domain=.google.com; HttpOnly
Alt-Svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="GyBQgellYHlWflgrWWbR7Q==">(function(){window.google={kEI:'dGabYIfRE8GqtQaqyKyADA',kEXPI:'0,778759,523777,56872,955,5104,207,4804,926,1390,383,246,5,1354,4936,314,6385,1116131,1232,1196472,578,43,1,328941,51223,16115,19397,9287,17572,4859,1361,9291,3023,4744,12841,4020,978,13228,2054,1793,10622,1142,13385,4517,2778,919,2277,8,2796,1593,1279,2212,530,149,1103,842,515,1466,56,157,4101,3514,606,2023,1777,520,4269,328,1285,8788,3227,1989,856,7,12354,5096,7877,4928,108,1483,1371,553,908
⋮
13790-byte body
"""

i’d like to inspect the full text – but rsp.body has type Vector{UInt8}, and i’m not sure how to convert that type into a human-readable string.

does Julia have a function analogous to str.encode and str.decode in Python? i.e. an idiom that would allow me to turn the binary-encoded payload from an HTTP call back into a readable string?

thanks in advance!

fredrikekre · May 12, 2021, 6:00am

String(rsp.body)
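One caveat that can make this look inconsistent: calling String on a Vector{UInt8} takes ownership of the bytes and leaves the vector empty, so a second String(rsp.body) on the same response returns "". A minimal sketch on a local byte vector (the variable names are just for illustration, no request involved):

```julia
# Stand-in for rsp.body: a plain byte vector.
bytes = Vector{UInt8}("hello, world\n")

# Copy first if the bytes are needed again afterwards.
text = String(copy(bytes))
remaining = length(bytes)      # still 13, the original vector is untouched

# Without the copy, String() takes over the buffer and empties the vector.
consumed = String(bytes)
emptied = isempty(bytes)       # true
```

So if the body is needed more than once, either keep the resulting string around or convert a copy.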

ben-schulz · May 13, 2021, 4:57am

thanks for this!

i’ve gotten inconsistent results; this works with some requests but not others, and i’ve been unable to identify the determining factor. i’m new enough to working in Julia that i wasn’t sure if this was something wrong with my usage, or something complicated about the requests themselves

at any rate, it gives me a clear direction to know that the common idiom is expected to be this simple.

much appreciated!

Sukera · May 13, 2021, 5:48am

That should always work - do you have an example request where it reproducibly doesn’t? In what ways does it fail?

ben-schulz · May 15, 2021, 6:06pm

i do have an example – though it’s not self-contained, and i’ve been unable so far to condense it to something that is.

fwiw, this is the exact request ( modulo the value of cookie, which expires on a ~24 hour period; i have been re-pasting new values as i get them from a web debugger ):

# Julia 1.6.1
using HTTP

url = "https://www.courts.mo.gov/casenet/cases/nameSearch.do"

cookie = "JSESSIONID=0002Sp1GeM9iTDeFJsEkifl7nLz:3JVJRVVP27:37NHJRO1KC; UJID=fe9c31ec-df08-401d-80fd-35a8cdd06d24; UJIA=1193593738; visid_incap_2409832=Ul3GekvET/SFlIuy+fYY4fIAImAAAAAAQUIPAAAAAABSSxaSgs8Qsh9cpSRZ4d5r; visitorid=20210502002746420305; visid_incap_1275915=7r+4HtEIQq6iBhPJees8B+MFkmAAAAAAQUIPAAAAAAAzv2CXwqFZSbiEyaYmbJ83; visid_incap_2223984=E/EOPCaHRUuNFok2TOSPuO8FkmAAAAAAQUIPAAAAAABIOzgdpNbAsHbuTEgCk9Oj; visid_incap_1692788=UmE9dQYuR3Gj24RhY6lkWt0IkmAAAAAAQUIPAAAAAAA8Wx9hgglXFyurN5SqP8rs; visid_incap_2285056=0uQJWF5cRXmkYce5Ql5P85szmGAAAAAAQUIPAAAAAAAjKh4lnPSzlK+9hAymt6gY; visid_incap_1276241=lQD9zXJdTDm/0T5TRdvV5GuRnGAAAAAAQUIPAAAAAADF3pfdceY/MtPIYu3vMx8p; incap_ses_8216_1276241=ic60TEjHKgQvNDyPnRgFcmuRnGAAAAAA+cXOU8aEyaSZCj3LEXGipg==; visid_incap_2154775=VgSr4W4RRvKS11mybNzgLGuRnGAAAAAAQUIPAAAAAABDK64xLGqasNBJHdzy5/ZE; incap_ses_1422_2154775=rFbfQtHZiCjAkvWiefW7E2uRnGAAAAAA32DoAVBbaZSXjg1oQz6KUg==; JSESSIONID=0001CjLwYADvIQ1eXmV0GPa8y4d:-GH4COV"

referer = "https://www.courts.mo.gov/casenet/cases/searchCases.do?searchType=name"

headers = [
    "Host" => "www.courts.mo.gov",
    "User-Agent" => "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language" => "en-US,en;q=0.5",
    "Accept-Encoding" => "gzip, deflate, br",
    "Referer" => referer,
    "Content-Type" => "application/x-www-form-urlencoded",
    "Origin" => "https://www.courts.mo.gov",
    "Connection" => "keep-alive",
    "Cookie" => cookie,
]


body = "inputVO.subAction=search&inputVO.type=SW&inputVO.courtId=SW&inputVO.startingRecord=1&inputVO.totalRecord=0&inputVO.blockNo=0&inputVO.selectedStatus=A&inputVO.aliasFlag=N&inputVO.judgmentAgainstFlag=N&inputVO.selectedIndexCourt=0&courtId=SW&inputVO.lastName=Wilson&inputVO.firstName=Eldra&inputVO.middleName=&inputVO.caseType=All&inputVO.yearFiled="


rsp = HTTP.request("POST", url, headers, body)

payload = String(rsp.body)

print(payload)

the response returns with a status of 200 – but the printed payload looks like garbage bytes.

by contrast, the web debugger ( in this case, F12 on Firefox ) clearly shows a well-formatted HTML page.

i also tried an identical request (same URL, headers, body) using Python’s requests library – and ( although i hate to say it ) it appears to return the page text with no extra calls or manipulation of the encoding.

if there’s something i’m doing wrong here, i’d love to know – because tbh i’d rather do this in Julia.

fredrikekre · May 15, 2021, 6:33pm

Perhaps the content is compressed? Check the response headers. HTTP.jl does not automatically decompress the payload (unlike browsers and apparently requests), so you have to do that yourself.

Here is an example of detecting Transfer-Encoding: gzip and decompressing:

julia> using HTTP, CodecZlib

julia> r = HTTP.get("http://localhost:8080");

julia> if HTTP.header(r, "Transfer-Encoding") == "gzip"
           @info "decompressing payload"
           bytes = transcode(GzipDecompressor, r.body)
       else
           bytes = r.body
       end;
[ Info: decompressing payload

julia> String(bytes)
"hello, world\n"
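Servers more commonly signal a compressed body with the Content-Encoding header than with Transfer-Encoding. The POST request earlier in the thread sends Accept-Encoding: gzip, deflate, br, so that header is worth checking too. Below is a rough sketch of a helper that decodes a body from its Content-Encoding value. body_text is a made-up name, not part of HTTP.jl. It handles only gzip and unencoded bodies, and the round trip uses local data rather than a real response.

```julia
using CodecZlib

# Hypothetical helper (not part of HTTP.jl): turn a response body into text,
# given the value of the Content-Encoding header ("" when the header is absent).
function body_text(content_encoding::AbstractString, body::Vector{UInt8})
    encoding = lowercase(strip(content_encoding))
    if encoding == "gzip"
        bytes = transcode(GzipDecompressor, body)
    elseif encoding == "" || encoding == "identity"
        bytes = copy(body)   # copy so String() does not empty the caller's vector
    else
        # deflate, br, etc. are deliberately left out of this sketch
        error("unhandled Content-Encoding: $encoding")
    end
    return String(bytes)
end

# Round trip on local data, no network needed.
original = "<!doctype html><html><body>hello, world</body></html>"
compressed = transcode(GzipCompressor, Vector{UInt8}(original))
decoded = body_text("gzip", compressed)
plain = body_text("", Vector{UInt8}(original))
```

With a real response this would be called as body_text(HTTP.header(rsp, "Content-Encoding"), rsp.body). Dropping the Accept-Encoding header from the request is another option, since the server then has less reason to compress at all.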

ben-schulz · May 18, 2021, 3:11am

thanks!

that didn’t resolve the issue, but it did give me an idea of where to look next: the Transfer-Encoding header is chunked in this case – and i assume i’ll have to do some reading and figure out how to handle that case

so, at the very least, i feel like i have a productive direction to go in to figure this out

StefanKarpinski · May 18, 2021, 3:25am

You can try Downloads, which uses libcurl and should handle this.
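For the POST request above, that could look roughly like the following. It is a sketch only: post_text is a made-up helper name, and whether the body arrives already decompressed depends on how Downloads configures libcurl, so it is still worth checking the Content-Encoding response header.

```julia
using Downloads

# Hypothetical helper: POST a form-encoded body and return the status code
# together with the response body as a String.
function post_text(url::AbstractString, headers, form::AbstractString)
    out = IOBuffer()
    rsp = Downloads.request(url;
        method = "POST",
        headers = headers,        # vector of "Name" => "value" pairs
        input = IOBuffer(form),   # request body is read from this IO
        output = out,             # response body is written here
    )
    return rsp.status, String(take!(out))
end

# Usage, with url, headers and body defined as in the request above:
# status, payload = post_text(url, headers, body)
# print(payload)
```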
