Automating download of a fetch api in a website

I’m trying to automate a procedure in a Julia script which is equivalent to going on this page:
https://mercado.ren.pt/EN/Electr/MarketInfo/MarketResults/OMIE/Pages/Prices.aspx
and downloading a .xls file for the day, and I was hoping someone here had any previous successful experience in doing something similar.

Let me start by “disclaiming” that I have zero experience with webdev apart from making really basic plain html + css. My programming background is related to numerical/scientific computing, so please anticipate stupid questions/remarks.

Picture attached with the order of clicks to recreate the desired download.

After messing around a bit in the devtools on my browser, I believe the request that is performed is (for today, 27/08/2022):
https://mercadoservices.ren.pt/api/Exports/GetExports?language=EN&nome=PrecoMerc&dia1=2022-08-27&dia2=2022-08-27&dia3=&dia4=&ordem=&tipo=

In the browser, if I try to go directly to that link, I get

<string>API key is invalid.</string>

And in Julia, an extremely naive attempt went like

download("https://mercadoservices.ren.pt/api/PrecosMercadoSpot/GetPrecosMercadoSpotLatest?language=EN")
ERROR: HTTP/1.1 403 Forbidden while requesting https://mercadoservices.ren.pt/api/PrecosMercadoSpot/GetPrecosMercadoSpotLatest?language=EN
Stacktrace:
 [1] #3
   @ ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:233 [inlined]
 [2] arg_write(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, String}, arg::Nothing)
   @ ArgTools ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:101
 [3] #download#2
   @ ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:221 [inlined]
 [4] download(url::String, output::Nothing)
   @ Downloads ~/packages/julias/julia-1.7.2/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:221
 [5] #invokelatest#2
   @ ./essentials.jl:716 [inlined]
 [6] invokelatest
   @ ./essentials.jl:714 [inlined]
 [7] do_download
   @ ./download.jl:24 [inlined]
 [8] download(url::String)
   @ Base ./download.jl:20
 [9] top-level scope
   @ REPL[16]:1

I believe the request on this API requires that I pass some kind of key. I have tried to google around but I am so out of my zone that I probably don’t know the correct keywords to google for. Any pointers?

EDIT: Copy pasting the Options/Request headers/Response headers of this link

scheme https
host mercadoservices.ren.pt
filename /api/Exports/GetExports
language EN
nome PrecoMerc
dia1 2022-08-27
dia2 2022-08-27
dia3
dia4
ordem
tipo
Address 185.165.107.11:443

Status

200

OK

VersionHTTP/1.1

Transferred618 B (0 B size)

Referrer Policystrict-origin-when-cross-origin

Access-Control-Allow-Credentials true
Access-Control-Allow-Headers Origin, Content-Type, X-Auth-Token, X-ApiKey
Access-Control-Allow-Methods GET, POST, PATCH, PUT, DELETE, OPTIONS
Access-Control-Allow-Origin https://mercado.ren.pt
Allow OPTIONS, TRACE, GET, HEAD, POST
Content-Length 0
Content-Security-Policy default-src ‘none’
Date Fri, 26 Aug 2022 12:31:56 GMT
Public OPTIONS, TRACE, GET, HEAD, POST
Strict-Transport-Security max-age=31536000
X-Content-Type-Options nosniff
X-Content-Type-Options nosniff
X-MS-InvokeApp 1; RequireReadOnly
X-Powered-By ARR/3.0
X-XSS-Protection 1;mode=block
Accept /
Accept-Encoding gzip, deflate, br
Accept-Language en-US,en;q=0.5
Access-Control-Request-Headers content-type,x-apikey
Access-Control-Request-Method GET
Connection keep-alive
DNT 1
Host mercadoservices.ren.pt
Origin https://mercado.ren.pt
Referer https://mercado.ren.pt/
Sec-Fetch-Dest empty
Sec-Fetch-Mode cors
Sec-Fetch-Site same-site
User-Agent Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0

You must have crashed their website :laughing:

Won’t load for me to play around with it.

:joy:
But it’s actually loading on my side still. What about the root links, like https://mercado.ren.pt/EN ?

Nope nothing for me.

Let me try using a vpn of some sort to check if it’s geographically blocked, although I would find that to be extremely weird…

I’m getting a timeout if I use an US VPN, but with an UK VPN it’s working. I tried it using the Browsec extension for Firefox.
EDIT: Wait, no, it’s working on a US VPN as well, just took longer.

FWIW, when I downloaded the xlsx using the browser (Chrome), it sent an x-apikey header. Sorry, I can’t tell yet where it gets the key value. Maybe, but unlikely, it’s hard-coded and you can reuse the value in your own request.

That is my current intuition as well, but I’m at a loss as to:

  1. how to look on google for methods of searching the api key sent when you click on the button in your browser (all I get are marketing “blog posts”);
  2. if I happen to find that key, how to include it in a julia script for the GET request.

Additionally, I would also be open to just using Julia to building a robot that recreates opening the browser and clicking the link (although I have no idea where I would start with that), but I’m just afraid this would be robust enough to deploy on some random server in the future that uses this request for doing some other stuff.

I think I found the key used in the raw request:

GET /api/Exports/GetExports?language=EN&nome=PrecoMerc&dia1=2022-08-27&dia2=2022-08-27&dia3=&dia4=&ordem=&tipo= HTTP/1.1
Host: mercadoservices.ren.pt
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0
Accept: application/octet-stream
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://mercado.ren.pt/
content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
**x-apikey: bWVyY2Fkb19tTDI3M0J0aUxlUmNxZnFCcUltV0JmNXV2UFRtZFc0Vkh4YjRFZUQ2**
Origin: https://mercado.ren.pt
DNT: 1
Connection: keep-alive
Cookie: cookiesession1=678B29657898901234ABCDEFHIJK7788
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site

Furthermore, it seems it is set up from only accepting requests from the company’s website, which it seems to say in the raw response:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Expires: -1
Content-Disposition: attachment; filename=PrecoMerc.xlsx
X-AspNet-Version: 4.0.30319
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE, OPTIONS
Access-Control-Allow-Headers: Origin, Content-Type, X-Auth-Token, X-ApiKey
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'none'
X-MS-InvokeApp: 1; RequireReadOnly
X-XSS-Protection: 1;mode=block
**Access-Control-Allow-Origin: https://mercado.ren.pt**
X-Powered-By: ARR/3.0
X-Content-Type-Options: nosniff
Date: Fri, 26 Aug 2022 14:05:41 GMT
Content-Length: 9810
Strict-Transport-Security: max-age=31536000

Same key as my request. Use it until it doesn’t work any longer.

Regarding specifying it in Julia: Client · HTTP.jl

Not surprising. They probably charge for feeds and wouldn’t want to give that away for free.

Thanks for the help, I’ll give it a try and I’ll report back. :+1:

They are obliged to provide this info for free for transparency reasons, I believe, at least that’s my understanding:
https://mercado.ren.pt/EN/Electr/Communication/WebServices/Pages/default.aspx

See https://mercado.ren.pt/PT/Electr/Comunicacao/WebServ/WSDoc/REN%20–%20SpecWSInfoMercado.pdf

1 Like

Phenomenal, I’m such an idiot :joy: :joy: :joy:
Thanks Jeff, I have much to read!
I did manage to get some “nonerrored” response with the following:

headers = ["Origin" => "https://mercado.ren.pt",
          "x-apikey" => "bWVyY2Fkb19tTDI3M0J0aUxlUmNxZnFCcUltV0JmNXV2UFRtZFc0Vkh4YjRFZUQ2"
          ]

HTTP.get("https://mercadoservices.ren.pt/api/Exports/GetExports?language=EN&nome=PrecoMerc&dia1=2022-08-27&dia2=2022-08-27&dia3=&dia4=&ordem=&tipo=",
         headers
         )

It’s probably easier in the long run to use their API, including that you won’t have to decode xlsx

That’s a very good point, as that was a crutch for my current workflow, particularly because of the character set used in the default xlsx format they encode the download. The column names have tons of special characters that error out when converting to DataFrame.

Another question, on using this type of API with Julia, are there any particular resources I should use? Am I still looking at mainly HTTP.jl? Anything I should read (blog posts, tutorials, guides)?

It’s typical to map an API like this into function calls for each endpoint to hide all the http request handling. There are code generation tools that take an API documentation in some format such as OpenAPI (aka swagger) and generate at least stubs for those functions. It looks like there is https://github.com/JuliaComputing/Swagger.jl for Julia.

I’m my case, I only want to do a specific type of POST request to a particular Function “GetMarketPrice”, so I don’t think I need something as complex (fortunately).

I have been getting HTTP 400 errors when I specify the body argument for HTTP.post() as a Dict, because there’s a main method “GetInfoForTimeFrameByInfoType” that wraps all other methods. I tried a Dict of Dict to see if it would work but it doesn’t. It’s not a real problem because I can write a hacky parser that takes the string arguments I need and it will get the job done, but I’d still like to try and make it prettier before I resort to the pragmatic approach.

Here’s what I mean; this works:

using HTTP

soap_endpoint = "https://ws-mercado.ren.pt/MarketInfoService.asmx"

content_type = "application/soap+xml;charset=UTF-8;action=\"https://ws-mercado.ren.pt/GetInfoForTimeFrameByInfoType\""

headers = Dict("Content-Type" => content_type,
               "Host" => "ws-mercado.ren.pt",
               "Connection" =>  "Keep-Alive"
               )

body = "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\" xmlns:ws=\"https://ws-mercado.ren.pt\">
<soap:Header/>
<soap:Body>
<ws:GetInfoForTimeFrameByInfoType>
<ws:StartDay>2022-08-27</ws:StartDay>
<ws:EndDay>2022-08-27</ws:EndDay>
<ws:InfoType>GerMarketPrice</ws:InfoType>
</ws:GetInfoForTimeFrameByInfoType>
</soap:Body>
</soap:Envelope>"

HTTP.post(soap_endpoint, headers, body)

However, this does not:

using HTTP

soap_endpoint = "https://ws-mercado.ren.pt/MarketInfoService.asmx"

content_type = "application/soap+xml;charset=UTF-8;action=\"https://ws-mercado.ren.pt/GetInfoForTimeFrameByInfoType\""

headers = Dict("Content-Type" => content_type,
               "Host" => "ws-mercado.ren.pt",
               "Connection" =>  "Keep-Alive"
               )

wrapped = Dict("StartDay" => "2022-08-27",
               "EndDay" => "2022-08-27",
               "InfoType" => "GetMarketPrice"
               )

body = Dict("GetInfoForTimeFrameByInfoType" => wrapped)

HTTP.post(soap_endpoint, headers, body)

EDIT: Scratch that none of the above work, I get some response in the first one, but it’s not what it’s supposed to be…