How to bypass the block for Julia on HTTP servers

How can I download HTML pages when the domain treats this request

headers=["User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"]
HTTP.get(url, headers)

as coming from a robot and returns an error instead of the HTML:
julia> response = HTTP.get(url, headers)
ERROR: HTTP.Exceptions.StatusError(403, "GET", "/", HTTP.Messages.Response:
"""

HTTP status 403 (Forbidden) means the server understood the request but refuses to serve it. That usually points to missing permissions, such as a login cookie or API token, rather than proof that the website has decided your request is from a robot. Pass the required information (usually an authentication header or a cookie) with your request.
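
If the site does expect credentials, a minimal sketch of passing them with HTTP.jl might look like the following. The `Bearer` scheme, the cookie name `sessionid`, and the helper names `auth_headers` and `fetch_with_credentials` are assumptions for illustration; the real values depend on the website.

```julia
using HTTP

# Build an Authorization header. The "Bearer" scheme is an assumption;
# some sites expect "Basic ..." or a custom header instead.
auth_headers(token::AbstractString) = ["Authorization" => "Bearer $token"]

# Hypothetical helper: send the token and any cookies along with the request.
# `cookies` is a Dict of cookie name => value, which HTTP.jl sends in the
# "Cookie" request header.
function fetch_with_credentials(url, token, cookies::Dict{String,String})
    response = HTTP.get(url, auth_headers(token); cookies=cookies)
    return String(response.body)
end

# Usage (placeholders, not real credentials):
# html = fetch_with_credentials("https://example.com/protected",
#                               "YOUR_API_TOKEN",
#                               Dict("sessionid" => "YOUR_SESSION_COOKIE"))
```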

This is my first time visiting the site, so maybe I need to accept a cookie first? How should that be done?
Paul

I don’t know - it depends on the website you’re trying to access.

Give this a try:

HEADERS = [
    "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36",
    "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language" => "en-US,en;q=0.9",
    "Accept-Encoding" => "gzip, deflate, br",
    "Connection" => "keep-alive",
    "Cache-Control" => "max-age=0",
]

response = HTTP.get(url, HEADERS; cookies=true)

You can also copy the requisite headers from a browser (like Chrome) directly.

Thanks, but:

julia> url
"https://dziennikpolski24.pl"

ERROR: HTTP.Exceptions.StatusError(403, "GET", "/", HTTP.Messages.Response:
"""
HTTP/1.1 403 Forbidden
Date: Sat, 24 Aug 2024 08:16:44 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Accept-CH: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-
UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Ve
rsion, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
Critical-CH: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-C
H-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-
Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Origin-Agent-Cluster: ?1
Permissions-Policy: accelerometer=(),autoplay=(),browsing-topics=(),camera=(),clipboard-read=(),clipboard-write=
(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-cred
entials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
Referrer-Policy: same-origin
X-Content-Options: nosniff
X-Frame-Options: SAMEORIGIN
cf-mitigated: challenge
cf-chl-out: boPcmEAtSjCfhoxHOE8fd88RsXn16y+CNFpuHX8ebQu5DQVkMi3xT/97MNSXJR2o0KT6XDuYEBg2mZnZfJQBB0RlE6vlXjGSNKsO
623ZPtKzgck4qFWebli8cvSkODx4JphMLiD+1hBwcpVFU8NPBA==$5oEsrfT5KtRHqWxL/OYHDQ==
Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Vary: Accept-Encoding
Server: cloudflare
CF-RAY: 8b81e6e4fd8dbbc9-WAW
Content-Encoding: br
alt-svc: h3=":443"; ma=86400
...

Thanks, but I need to do this automatically; I need the response body so I can read it.
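
The `Server: cloudflare` and `cf-mitigated: challenge` headers in the 403 response above suggest that the page is behind a Cloudflare challenge rather than waiting for a login. A minimal sketch, assuming HTTP.jl 1.x, that returns the body even for a non-2xx status (via `status_exception=false`) and flags this case; `is_cloudflare_challenge` and `fetch_body` are hypothetical helper names:

```julia
using HTTP

# True when the response carries the `cf-mitigated: challenge` header
# seen in the 403 output above. Header lookup is case-insensitive.
is_cloudflare_challenge(resp::HTTP.Response) =
    lowercase(HTTP.header(resp, "cf-mitigated", "")) == "challenge"

# Fetch a page without throwing on 4xx/5xx, so the body is always returned.
function fetch_body(url, headers=Pair{String,String}[])
    resp = HTTP.get(url, headers; status_exception=false)
    if is_cloudflare_challenge(resp)
        # The body is then the challenge page, not the article HTML.
        @warn "Cloudflare challenge page returned" status=resp.status
    end
    return resp.status, String(resp.body)
end

# Usage:
# status, body = fetch_body("https://dziennikpolski24.pl", HEADERS)
```

With this the body can at least be read, but in the challenge case it will be the challenge page itself, and request headers alone may not be enough to get past it. Note also that the response above is `Content-Encoding: br`; as far as I know HTTP.jl only decompresses gzip automatically, so removing `br` from `Accept-Encoding` may be needed to get readable text.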