Help with web forms

mihalybaci · February 25, 2021, 4:20pm

I’m trying to learn how to do some really simple web stuff, like automatically filling out forms, with Julia and am kind of at loss. I found this python article, which has a simple example using DuckDuckGo that I am trying to replicate. The HTTP.jl examples mostly cover servers and the POST example is for a REST API, so I’ve only gotten as far as

using HTTP

url = "https://html.duckduckgo.com/html"   # URL for submitting the POST
payload = Dict("q" => "julialang")   # name of the search field => the search term
req = HTTP.request("POST", url,... ????)   # Proper format here?
results = String(req.body)  # Figure out way to extract results

As another example, the forms I would like to access are more in the form of web calculators, like this one for example, where the inputs/outputs are more well-defined. For that calculator (based on the DuckDuckGo one), the parameters would be

url = "https://www.brewingcalculators.com/celsius-c-to-fahrenheit-f/?"   # URL for submitting the POST
payload = Dict("?" => 34)   # name of the field => a number

So, my first question is, what would be the proper format for the POST in these cases? And then, is there a well-defined way to get the results out of the response?

Thanks for any help!

chris-b1 · February 25, 2021, 6:06pm

HTTP.jl could use more docs/examples - but this should work, wrapping your payload in a HTTP.Form - from api docs here

using HTTP
payload = Dict("q" => "julialang")
form = HTTP.Form(payload)

resp = HTTP.post("https://html.duckduckgo.com/html", [], form)
html = String(resp.body)

In terms of doing something with the results - that’s a project of its own - but could use something like EzXML.jl to parse the html and pull out data. Python has some more mature html/xml scraping libraries so could also call out to them with PyCall.jl

But here’s a simple example with EzXML, pulling the text out all the h2 tags in the html

using EzXML
tree = parsehtml(html)
titles = strip.(nodecontent.(findall("//h2", tree)))

25-element Array{SubString{String},1}:
 "The Julia Programming Language"
 "Download Julia"
 "GitHub - JuliaLang/julia: The Julia Programming Language"
 "The Julia Language - Posts | Facebook"
 "The Julia Language (@JuliaLanguage) | Твиттер"

mihalybaci · February 25, 2021, 7:23pm

Great thanks! I’ll play around with EzXML.

fyzycyst · May 14, 2021, 8:41pm

Look to Chrome’s developer tools… you can actually record the outgoing POST with your form submission and see the exact format of what is sent (along with the headers). It’s a bit brute force, but you can then lovingly handcraft the header and data portions of the POST request.

That at least gets you a minimum working example to expand from.

sampope · May 18, 2021, 11:11pm

thanks for this example. I understand XPath far better than anything else, and it helped me pull out the href’s from a huge html table:

for n in findall("//tr/td/a/@href", doc)
  println(nodecontent(n))
end

Topic		Replies	Views
Get POST parameters using a HTTP server Web Stack question , package	5	3106	May 9, 2020
How to send a simple "GET" or "POST" with parameters Web Stack	2	2065	January 31, 2021
Form encoded data Web Stack question	0	653	December 4, 2019
HTTP POST xml request with headers and authentification General Usage http	6	1960	May 13, 2021
Send file with HTTP.post General Usage question , http , file	4	1000	October 22, 2024

Help with web forms

Related topics