Automating download of a fetch api in a website

tom-plaa · August 26, 2022, 1:11pm

I’m trying to automate a procedure in a Julia script which is equivalent to going on this page:
https://mercado.ren.pt/EN/Electr/MarketInfo/MarketResults/OMIE/Pages/Prices.aspx
and downloading a .xls file for the day, and I was hoping someone here had any previous successful experience in doing something similar.

Let me start by “disclaiming” that I have zero experience with webdev apart from making really basic plain html + css. My programming background is related to numerical/scientific computing, so please anticipate stupid questions/remarks.

Picture attached with the order of clicks to recreate the desired download.

After messing around a bit in the devtools on my browser, I believe the request that is performed is (for today, 27/08/2022):
“https://mercadoservices.ren.pt/api/Exports/GetExports?language=EN&nome=PrecoMerc&dia1=2022-08-27&dia2=2022-08-27&dia3=&dia4=&ordem=&tipo=”

In the browser, if I try to go directly to that link, I get

<string>API key is invalid.</string>

And in Julia, an extremely naive attempt went like

download("https://mercadoservices.ren.pt/api/PrecosMercadoSpot/GetPrecosMercadoSpotLatest?language=EN")
ERROR: HTTP/1.1 403 Forbidden while requesting https://mercadoservices.ren.pt/api/PrecosMercadoSpot/GetPrecosMercadoSpotLatest?language=EN
Stacktrace:
 [1] #3
   @ ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:233 [inlined]
 [2] arg_write(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, String}, arg::Nothing)
   @ ArgTools ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:101
 [3] #download#2
   @ ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:221 [inlined]
 [4] download(url::String, output::Nothing)
   @ Downloads ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:221
 [5] #invokelatest#2
   @ ./essentials.jl:716 [inlined]
 [6] invokelatest
   @ ./essentials.jl:714 [inlined]
 [7] do_download
   @ ./download.jl:24 [inlined]
 [8] download(url::String)
   @ Base ./download.jl:20
 [9] top-level scope
   @ REPL[16]:1

I believe the request on this API requires that I pass some kind of key. I have tried to google around but I am so out of my zone that I probably don’t know the correct keywords to google for. Any pointers?

EDIT: Copy pasting the Options/Request headers/Response headers of this link

scheme	https
host	mercadoservices.ren.pt
filename	/api/Exports/GetExports

language	EN
nome	PrecoMerc
dia1	2022-08-27
dia2	2022-08-27
dia3
dia4
ordem
tipo

Address	185.165.107.11:443

Status

200

OK

VersionHTTP/1.1

Transferred618 B (0 B size)

Referrer Policystrict-origin-when-cross-origin


Access-Control-Allow-Credentials	true

Access-Control-Allow-Headers	Origin, Content-Type, X-Auth-Token, X-ApiKey
Access-Control-Allow-Methods	GET, POST, PATCH, PUT, DELETE, OPTIONS
Access-Control-Allow-Origin	https://mercado.ren.pt
Allow	OPTIONS, TRACE, GET, HEAD, POST
Content-Length	0
Content-Security-Policy	default-src ‘none’
Date	Fri, 26 Aug 2022 12:31:56 GMT
Public	OPTIONS, TRACE, GET, HEAD, POST
Strict-Transport-Security	max-age=31536000
X-Content-Type-Options	nosniff
X-Content-Type-Options	nosniff
X-MS-InvokeApp	1; RequireReadOnly
X-Powered-By	ARR/3.0
X-XSS-Protection	1;mode=block


Accept	/

Accept-Encoding	gzip, deflate, br
Accept-Language	en-US,en;q=0.5
Access-Control-Request-Headers	content-type,x-apikey
Access-Control-Request-Method	GET
Connection	keep-alive
DNT	1
Host	mercadoservices.ren.pt
Origin	https://mercado.ren.pt
Referer	https://mercado.ren.pt/
Sec-Fetch-Dest	empty
Sec-Fetch-Mode	cors
Sec-Fetch-Site	same-site
User-Agent	Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0

tbeason · August 26, 2022, 1:36pm

You must have crashed their website

Won’t load for me to play around with it.

tom-plaa · August 26, 2022, 1:43pm

But it’s actually loading on my side still. What about the root links, like https://mercado.ren.pt/EN ?

tbeason · August 26, 2022, 1:44pm

Nope nothing for me.

tom-plaa · August 26, 2022, 1:48pm

Let me try using a vpn of some sort to check if it’s geographically blocked, although I would find that to be extremely weird…

tom-plaa · August 26, 2022, 1:56pm

I’m getting a timeout if I use an US VPN, but with an UK VPN it’s working. I tried it using the Browsec extension for Firefox.
EDIT: Wait, no, it’s working on a US VPN as well, just took longer.

Jeff_Emanuel · August 26, 2022, 2:15pm

FWIW, when I downloaded the xlsx using the browser (Chrome), it sent an x-apikey header. Sorry, I can’t tell yet where it gets the key value. Maybe, but unlikely, it’s hard-coded and you can reuse the value in your own request.

tom-plaa · August 26, 2022, 2:22pm

That is my current intuition as well, but I’m at a loss as to:

how to look on google for methods of searching the api key sent when you click on the button in your browser (all I get are marketing “blog posts”);
if I happen to find that key, how to include it in a julia script for the GET request.

Additionally, I would also be open to just using Julia to building a robot that recreates opening the browser and clicking the link (although I have no idea where I would start with that), but I’m just afraid this would be robust enough to deploy on some random server in the future that uses this request for doing some other stuff.

tom-plaa · August 26, 2022, 2:26pm

I think I found the key used in the raw request:

GET /api/Exports/GetExports?language=EN&nome=PrecoMerc&dia1=2022-08-27&dia2=2022-08-27&dia3=&dia4=&ordem=&tipo= HTTP/1.1
Host: mercadoservices.ren.pt
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0
Accept: application/octet-stream
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://mercado.ren.pt/
content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
**x-apikey: bWVyY2Fkb19tTDI3M0J0aUxlUmNxZnFCcUltV0JmNXV2UFRtZFc0Vkh4YjRFZUQ2**
Origin: https://mercado.ren.pt
DNT: 1
Connection: keep-alive
Cookie: cookiesession1=678B29657898901234ABCDEFHIJK7788
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site

Furthermore, it seems it is set up from only accepting requests from the company’s website, which it seems to say in the raw response:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Expires: -1
Content-Disposition: attachment; filename=PrecoMerc.xlsx
X-AspNet-Version: 4.0.30319
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE, OPTIONS
Access-Control-Allow-Headers: Origin, Content-Type, X-Auth-Token, X-ApiKey
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'none'
X-MS-InvokeApp: 1; RequireReadOnly
X-XSS-Protection: 1;mode=block
**Access-Control-Allow-Origin: https://mercado.ren.pt**
X-Powered-By: ARR/3.0
X-Content-Type-Options: nosniff
Date: Fri, 26 Aug 2022 14:05:41 GMT
Content-Length: 9810
Strict-Transport-Security: max-age=31536000

Jeff_Emanuel · August 26, 2022, 2:26pm

Same key as my request. Use it until it doesn’t work any longer.

Regarding specifying it in Julia: Client · HTTP.jl

Jeff_Emanuel · August 26, 2022, 2:30pm

Not surprising. They probably charge for feeds and wouldn’t want to give that away for free.

tom-plaa · August 26, 2022, 2:30pm

Thanks for the help, I’ll give it a try and I’ll report back.

tom-plaa · August 26, 2022, 2:38pm

They are obliged to provide this info for free for transparency reasons, I believe, at least that’s my understanding:
https://mercado.ren.pt/EN/Electr/Communication/WebServices/Pages/default.aspx

Jeff_Emanuel · August 26, 2022, 2:43pm

See https://mercado.ren.pt/PT/Electr/Comunicacao/WebServ/WSDoc/REN%20–%20SpecWSInfoMercado.pdf

tom-plaa · August 26, 2022, 2:46pm

Phenomenal, I’m such an idiot
Thanks Jeff, I have much to read!
I did manage to get some “nonerrored” response with the following:

headers = ["Origin" => "https://mercado.ren.pt",
          "x-apikey" => "bWVyY2Fkb19tTDI3M0J0aUxlUmNxZnFCcUltV0JmNXV2UFRtZFc0Vkh4YjRFZUQ2"
          ]

HTTP.get("https://mercadoservices.ren.pt/api/Exports/GetExports?language=EN&nome=PrecoMerc&dia1=2022-08-27&dia2=2022-08-27&dia3=&dia4=&ordem=&tipo=",
         headers
         )

Jeff_Emanuel · August 26, 2022, 2:47pm

It’s probably easier in the long run to use their API, including that you won’t have to decode xlsx

tom-plaa · August 26, 2022, 2:49pm

That’s a very good point, as that was a crutch for my current workflow, particularly because of the character set used in the default xlsx format they encode the download. The column names have tons of special characters that error out when converting to DataFrame.

tom-plaa · August 26, 2022, 2:55pm

Another question, on using this type of API with Julia, are there any particular resources I should use? Am I still looking at mainly HTTP.jl? Anything I should read (blog posts, tutorials, guides)?

Jeff_Emanuel · August 26, 2022, 4:23pm

It’s typical to map an API like this into function calls for each endpoint to hide all the http request handling. There are code generation tools that take an API documentation in some format such as OpenAPI (aka swagger) and generate at least stubs for those functions. It looks like there is https://github.com/JuliaComputing/Swagger.jl for Julia.

tom-plaa · August 26, 2022, 4:49pm

I’m my case, I only want to do a specific type of POST request to a particular Function “GetMarketPrice”, so I don’t think I need something as complex (fortunately).

I have been getting HTTP 400 errors when I specify the body argument for HTTP.post() as a Dict, because there’s a main method “GetInfoForTimeFrameByInfoType” that wraps all other methods. I tried a Dict of Dict to see if it would work but it doesn’t. It’s not a real problem because I can write a hacky parser that takes the string arguments I need and it will get the job done, but I’d still like to try and make it prettier before I resort to the pragmatic approach.

Here’s what I mean; this works:

using HTTP

soap_endpoint = "https://ws-mercado.ren.pt/MarketInfoService.asmx"

content_type = "application/soap+xml;charset=UTF-8;action=\"https://ws-mercado.ren.pt/GetInfoForTimeFrameByInfoType\""

headers = Dict("Content-Type" => content_type,
               "Host" => "ws-mercado.ren.pt",
               "Connection" =>  "Keep-Alive"
               )

body = "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\" xmlns:ws=\"https://ws-mercado.ren.pt\">
<soap:Header/>
<soap:Body>
<ws:GetInfoForTimeFrameByInfoType>
<ws:StartDay>2022-08-27</ws:StartDay>
<ws:EndDay>2022-08-27</ws:EndDay>
<ws:InfoType>GerMarketPrice</ws:InfoType>
</ws:GetInfoForTimeFrameByInfoType>
</soap:Body>
</soap:Envelope>"

HTTP.post(soap_endpoint, headers, body)

However, this does not:

using HTTP

soap_endpoint = "https://ws-mercado.ren.pt/MarketInfoService.asmx"

content_type = "application/soap+xml;charset=UTF-8;action=\"https://ws-mercado.ren.pt/GetInfoForTimeFrameByInfoType\""

headers = Dict("Content-Type" => content_type,
               "Host" => "ws-mercado.ren.pt",
               "Connection" =>  "Keep-Alive"
               )

wrapped = Dict("StartDay" => "2022-08-27",
               "EndDay" => "2022-08-27",
               "InfoType" => "GetMarketPrice"
               )

body = Dict("GetInfoForTimeFrameByInfoType" => wrapped)

HTTP.post(soap_endpoint, headers, body)

EDIT: Scratch that none of the above work, I get some response in the first one, but it’s not what it’s supposed to be…

Topic		Replies	Views
Code sample for logging into a web site and downloading a csv file New to Julia	4	617	April 27, 2021
How to download the most recent CSV from a link New to Julia question	8	661	June 26, 2021
Download a url (with a password) New to Julia	7	1341	January 31, 2023
Downloads.download() error General Usage http , download	2	50	August 7, 2025
Getting data directly from a website Performance	9	6392	July 9, 2018

Automating download of a fetch api in a website

Related topics