Help with web forms

I’m trying to learn how to do some really simple web stuff with Julia, like automatically filling out forms, and am kind of at a loss. I found this Python article, which has a simple example using DuckDuckGo that I am trying to replicate. The HTTP.jl examples mostly cover servers, and the POST example is for a REST API, so I’ve only gotten as far as:

using HTTP

url = "https://html.duckduckgo.com/html"   # URL for submitting the POST
payload = Dict("q" => "julialang")   # name of the search field => the search term
req = HTTP.request("POST", url,... ????)   # Proper format here?
results = String(req.body)  # Figure out way to extract results

As another example, the forms I would like to access are more like web calculators, such as this one, where the inputs/outputs are better defined. For that calculator, the parameters (modeled on the DuckDuckGo example) would be:

url = "https://www.brewingcalculators.com/celsius-c-to-fahrenheit-f/?"   # URL for submitting the POST
payload = Dict("?" => 34)   # name of the field => a number

So, my first question is, what would be the proper format for the POST in these cases? And then, is there a well-defined way to get the results out of the response?

Thanks for any help!

HTTP.jl could use more docs/examples, but this should work: wrap your payload in an HTTP.Form (from the API docs here).

using HTTP
payload = Dict("q" => "julialang")
form = HTTP.Form(payload)

resp = HTTP.post("https://html.duckduckgo.com/html", [], form)
html = String(resp.body)
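
One caveat: HTTP.Form sends the fields as multipart/form-data, while a plain HTML form normally submits application/x-www-form-urlencoded. If a site rejects the multipart version, a sketch along these lines may help. The helper names `build_form_body` and `submit_form` are made up for illustration, and I am assuming `HTTP.URIs.escapeuri` (the URIs.jl function that HTTP.jl loads) is reachable in your HTTP.jl version.

```julia
using HTTP

# Hypothetical helper: percent-encode each field and join them as key=value&key=value.
# Accepts a Dict or a vector of pairs; a vector keeps the field order fixed.
function build_form_body(payload)
    esc(x) = HTTP.URIs.escapeuri(string(x))
    return join(("$(esc(k))=$(esc(v))" for (k, v) in payload), "&")
end

# Hypothetical helper: POST the encoded fields with the matching Content-Type header.
function submit_form(url, payload)
    headers = ["Content-Type" => "application/x-www-form-urlencoded"]
    resp = HTTP.post(url, headers, build_form_body(payload))
    return String(resp.body)
end

# Usage (needs network access):
# html = submit_form("https://html.duckduckgo.com/html", Dict("q" => "julialang"))
```

Building the body by hand keeps it obvious what actually goes over the wire, which makes it easier to compare against what the browser sends.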

As for doing something with the results, that’s a project of its own, but you could use something like EzXML.jl to parse the HTML and pull out data. Python has some more mature HTML/XML scraping libraries, so you could also call out to them with PyCall.jl.

But here’s a simple example with EzXML, pulling the text out of all the h2 tags in the HTML:

using EzXML
tree = parsehtml(html)
titles = strip.(nodecontent.(findall("//h2", tree)))

25-element Array{SubString{String},1}:
 "The Julia Programming Language"
 "Download Julia"
 "GitHub - JuliaLang/julia: The Julia Programming Language"
 "The Julia Language - Posts | Facebook"
 "The Julia Language (@JuliaLanguage) | Твиттер"

Great, thanks! I’ll play around with EzXML.

Look at Chrome’s developer tools (the Network tab): you can record the outgoing POST from your form submission and see the exact format of what is sent, along with the headers. It’s a bit brute force, but you can then lovingly handcraft the header and data portions of the POST request.

That at least gets you a minimum working example to expand from.
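
Replaying a recorded request could look roughly like the sketch below. The header values and body here are placeholders, not what any particular site expects, and `replay_post` is a made-up helper name; copy the real values from the request you recorded.

```julia
using HTTP

# Placeholder values: replace with the ones shown for the recorded request.
recorded_headers = [
    "Content-Type" => "application/x-www-form-urlencoded",
    "User-Agent"   => "Mozilla/5.0",   # assumption: some sites respond differently per client
]
recorded_body = "q=julialang"          # form data exactly as the browser sent it

# Hypothetical helper: send the recorded request and return the status and body.
# status_exception = false hands back 4xx/5xx responses instead of throwing,
# which makes it easier to see why a handcrafted request was rejected.
function replay_post(url, headers, body)
    resp = HTTP.request("POST", url, headers, body; status_exception = false)
    return resp.status, String(resp.body)
end

# Usage (needs network access):
# status, html = replay_post("https://html.duckduckgo.com/html", recorded_headers, recorded_body)
```

Once that works, you can start dropping headers one at a time to find the minimum the server actually needs.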

Thanks for this example. I understand XPath far better than anything else, and it helped me pull the hrefs out of a huge HTML table:

for n in findall("//tr/td/a/@href", doc)
  println(nodecontent(n))
end
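
For anyone who wants to try the same idea without downloading a page first, here is a small self-contained sketch. The HTML string is made up and just stands in for a real response body.

```julia
using EzXML

# Made-up HTML standing in for a downloaded page
sample = """
<html><body><table>
  <tr><td><a href="https://julialang.org/">Julia</a></td></tr>
  <tr><td><a href="https://docs.julialang.org/">Docs</a></td></tr>
</table></body></html>
"""

doc = parsehtml(sample)

# Selecting attribute nodes: nodecontent returns the attribute value
hrefs = [nodecontent(n) for n in findall("//tr/td/a/@href", doc)]

# Selecting element nodes: a["href"] reads the attribute, nodecontent gives the link text
links = [nodecontent(a) => a["href"] for a in findall("//tr/td/a", doc)]
```

Selecting the `a` elements instead of the `@href` attributes is handy when you want the link text and the target together.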